Support for streaming in source plugins

Hey everyone,
I’m evaluating implementations for extracting configuration and metadata from the three main cloud service providers into an internal storage system we have.
We expect some of the tables to be very large, so we are considering extracting them in batches or even streaming rows.
I saw CQ has great support for streaming destination plugins such as Kafka, but I couldn’t find information about support on the source side.
Do source plugins first load all results into memory and then stream them, or do they stream intermediate batches/rows directly to the destination?
Thanks a lot!
Itay.

Hi @Itay_Waisman, source plugins stream results; they don’t load everything into memory.
We apply batching on both the source side and the destination side to balance performance and memory consumption.
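Roughly, the source side follows this pattern (a minimal sketch, not the exact SDK API; `fetchPage`, `streamRows`, and the `row` type are illustrative names I made up for this example):

```go
package main

import "context"

// row stands in for one record of a large cloud-provider table.
type row struct{ ID string }

// fetchPage is a hypothetical stand-in for a provider's paginated list API:
// it returns one page of rows plus a token for the next page.
func fetchPage(ctx context.Context, token string) (rows []row, next string, err error) {
	// ... call the provider API for a single page ...
	return nil, "", nil
}

// streamRows pushes each page's rows onto the channel as soon as the page
// arrives, so only one page is ever held in memory at a time.
func streamRows(ctx context.Context, res chan<- any) error {
	token := ""
	for {
		rows, next, err := fetchPage(ctx, token)
		if err != nil {
			return err
		}
		for _, r := range rows {
			select {
			case res <- r:
			case <-ctx.Done():
				return ctx.Err()
			}
		}
		if next == "" {
			return nil
		}
		token = next
	}
}

func main() {
	res := make(chan any)
	go func() {
		defer close(res)
		_ = streamRows(context.Background(), res)
	}()
	for range res {
		// each streamed row would be handed to the destination batcher here
	}
}
```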

You can read more about source-side batching in One change to optimize them all: a story about batching on the source side | CloudQuery Blog. For destinations, batching is destination specific; you can see the Kafka defaults in cloudquery/plugins/destination/kafka/client/spec/spec.go at 9d420c96278fe08c35b37a1afb915baf271dee59 · cloudquery/cloudquery · GitHub
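To give a rough idea of the kind of batching knobs a destination spec exposes, here is an illustrative sketch only; the field names and defaults below are assumptions, and the linked spec.go has the real ones:

```go
package main

import (
	"fmt"
	"time"
)

// Spec sketches typical destination batching options (assumed names, not the
// actual Kafka spec): flush a batch when it reaches a row count, a byte size,
// or a timeout, whichever comes first.
type Spec struct {
	BatchSize      int           `json:"batch_size,omitempty"`       // max rows per batch (assumed)
	BatchSizeBytes int           `json:"batch_size_bytes,omitempty"` // max bytes per batch (assumed)
	BatchTimeout   time.Duration `json:"batch_timeout,omitempty"`    // flush interval (assumed)
}

func main() {
	// Hypothetical defaults for illustration; check the linked spec.go for real values.
	s := Spec{BatchSize: 10000, BatchSizeBytes: 5 << 20, BatchTimeout: 20 * time.Second}
	fmt.Printf("%+v\n", s)
}
```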

Please let me know if you have any further questions.
