Writing to DB in realtime with JS Plugin

I have a custom plugin that pulls tens of thousands of records; I iterate API calls of 500 records each. After each batch of 500, I pass the records to the stream.write function.
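For reference, a minimal sketch of the loop (names like `fetchPage` and the record shape are placeholders, not the real API; `stream` stands in for the writable the SDK hands to the table resolver):

```ts
import { Writable } from "node:stream";

type ApiRecord = { id: string; [key: string]: unknown };

// Placeholder: fetch one page of up to `limit` records from the upstream API.
async function fetchPage(offset: number, limit: number): Promise<ApiRecord[]> {
  // ...call the API here; return [] when there is nothing left to fetch
  return [];
}

async function resolveTable(stream: Writable): Promise<void> {
  const pageSize = 500;
  for (let offset = 0; ; offset += pageSize) {
    const page = await fetchPage(offset, pageSize);
    if (page.length === 0) break;
    for (const record of page) {
      // Hand each record to the stream as soon as it is fetched; any
      // batching before the DB write happens downstream of this call.
      stream.write(record);
    }
  }
}
```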

However, nothing is written to the DB (the destination is Postgres) until the entire table has been processed. Is this the expected behavior? I have even tried making the batch size very small to trigger writes sooner. It seems it would be much more efficient to pass the data through as it arrives so it doesn't have to accumulate in memory.

Any comments would be appreciated.

Hi @Duncan_Mapes, what you’re experiencing could be related to the batch settings at the destination.
There’s nothing on the JS SDK side that does buffering.
So reducing the batch size to 1 via batch_size: 1 for the Postgres destination should cause rows to be committed immediately.
The tradeoff here is between memory consumption and performance: it’s much faster to write to Postgres in batches. The default batch size was selected to work for most cases; you can experiment with different values to see what fits your case best.
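For example, a destination spec along these lines (a sketch only; the exact placement of batch_size and the available fields depend on the postgresql destination version, so check its docs):

```yaml
kind: destination
spec:
  name: postgresql
  path: cloudquery/postgresql
  version: "vX.Y.Z" # pin to a current release
  spec:
    connection_string: ${PG_CONNECTION_STRING}
    batch_size: 1 # forces near-immediate writes at the cost of throughput
```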


Hey @erez, thanks for replying. I tried that, but it doesn’t appear to write until the resolver has completed.

Hi @Duncan_Mapes, can you open a bug report on GitHub (cloudquery/cloudquery) then? If you can share a reproduction (via a GitHub repo would be best), that would be great.