I have a custom plugin that pulls tens of thousands of records; I'm iterating over API calls of 500 records each. After each page of 500, I pass the records to the stream.write function.
However, nothing is written to the DB (the destination is Postgres) until the entire table has been processed. Is this the expected behavior? I've even tried making the batch size very small to trigger earlier writes. It seems it would be much more efficient to stream the data through so it doesn't have to accumulate in memory.
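For context, the sync loop looks roughly like this. It's only a sketch: `fetchPage` and the `SyncStream` interface are placeholders for my API client and the SDK's stream object, not real SDK types.

```ts
// Hypothetical record and stream types standing in for the real SDK objects.
interface ApiRecord {
  [key: string]: unknown;
}

interface SyncStream {
  write(record: ApiRecord): void;
}

// Placeholder: the real implementation calls the upstream API and returns
// up to `limit` records starting at `offset`.
async function fetchPage(offset: number, limit: number): Promise<ApiRecord[]> {
  return [];
}

async function syncAll(stream: SyncStream, pageSize = 500): Promise<void> {
  let offset = 0;
  for (;;) {
    const page = await fetchPage(offset, pageSize);
    if (page.length === 0) break;
    // Hand each record to the stream as soon as the page arrives instead of
    // accumulating the whole table in memory.
    for (const record of page) {
      stream.write(record);
    }
    offset += page.length;
  }
}
```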
Hi @Duncan_Mapes, what you're experiencing is likely related to the batch settings at the destination.
There’s nothing on the JS SDK side that does buffering.
So reducing the batch size to 1 via batch_size: 1 in the Postgres destination spec should cause rows to be committed immediately.
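For example, in a CloudQuery-style Postgres destination spec it would look something like this (everything other than batch_size is illustrative and should match the config you already have):

```yaml
kind: destination
spec:
  name: postgresql
  path: cloudquery/postgresql
  version: "vX.Y.Z"  # keep whatever version you're already pinned to
  spec:
    connection_string: "${POSTGRESQL_CONNECTION_STRING}"
    batch_size: 1  # commit after every row; slower overall, but writes show up immediately
```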
The trade-off here is between memory consumption and performance: it's much faster to write to Postgres in batches. The default batch size was selected to work for most cases; you can experiment with different values to see what fits your workload best.