Mihai Bojin
Feb 6, 2024


Actually, its limitation is more around disk space. You can store the source datasets as Parquet files on S3, process them with DuckDB, and write the training data to disk. With enough disk space, you can process a high volume of data on a single node. And since all the major clouds offer very high-memory instances, you can get surprisingly far with 'just a single DuckDB node'.
