Running CloudQuery plugin recursively for large dataset retrieval

Thanks so much, sir.
Would you check this please?
I am saving the result after fetching all pages, and am using Mongoose.
Is this the right approach?
@erez

Can you explain where you are trying to get data from and where you want to save it?

Usually, sources and destinations are separate plugins. For example, if you write a custom source, you would do stream.write in the source, then configure a destination separately (e.g. CloudQuery MongoDB Destination).

Are you using findOneAndUpdate to avoid re-syncing the same data over and over again?

I save only one item in the collection; it stores the timestamp used to fetch the modified items.
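If it helps, here is that pattern sketched in plain JavaScript. The Mongoose call in the comment is the usual upsert shape (`findOneAndUpdate` with `upsert: true`); `SyncState`, `'cve-sync'`, and `lastSyncedAt` are made-up names for illustration, and the in-memory stand-in below only demonstrates why the upsert keeps the collection at a single document.

```javascript
// With Mongoose, keeping one "sync state" document is typically an upsert:
//
//   await SyncState.findOneAndUpdate(
//     { _id: 'cve-sync' },                    // always target the same document
//     { $set: { lastSyncedAt: new Date() } }, // refresh the timestamp
//     { upsert: true, new: true }             // insert on first run, update after
//   );
//
// In-memory stand-in showing the same update-or-insert behavior:
function upsertSyncState(collection, id, lastSyncedAt) {
  const existing = collection.find((doc) => doc._id === id);
  if (existing) {
    existing.lastSyncedAt = lastSyncedAt; // update in place
  } else {
    collection.push({ _id: id, lastSyncedAt }); // insert only once
  }
  return collection;
}
```

Because the filter always matches the same `_id`, repeated syncs update the one state document instead of inserting a new one each time.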

I think that’s OK, but it’s not really related to the CloudQuery SDK API. You just need to make sure you call stream.write inside fetchCVEList.
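For example, a minimal sketch of what that could look like, assuming fetchCVEList receives the stream and a paginated fetch function. `fetchPage` is a hypothetical stand-in for your real API call; the point is that each record goes through `stream.write` as it is fetched, rather than being buffered and saved after all pages complete.

```javascript
// Hypothetical sketch: fetch pages until exhausted and write each record
// to the stream as it arrives, instead of buffering everything first.
async function fetchCVEList(stream, fetchPage) {
  let page = 0;
  for (;;) {
    const { items, hasMore } = await fetchPage(page);
    for (const item of items) {
      stream.write(item); // emit each record as soon as it is fetched
    }
    if (!hasMore) break;
    page += 1;
  }
}
```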

Yes, I did.
@erez
The code above is saving duplicate documents with the same id value.
Is there a solution, please?
I want to update the document that was already inserted instead of inserting a new one.

Can you share how you are running the plugin and the CloudQuery CLI?
And the configuration spec you’re using?

kind: source
spec:
  name: 'sync'
  registry: 'grpc'
  path: '127.0.0.1:7777'
  version: 'v1.0.0'
  tables: ['*']
  destinations:
    - 'mongodb'
  spec:
    connectionString: mongodb://127.0.0.1:27017
    database: cloudquery-test
---
kind: destination
spec:
  name: mongodb
  path: cloudquery/mongodb
  registry: cloudquery
  version: 'v2.3.11'
  write_mode: "append"
  spec:
    connection_string: mongodb://127.0.0.1:27017
    database: cloudquery-test

Depending on the destination, if you define a primary key for the column based on id, the item will get updated instead of duplicated.
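As a rough illustration of the difference (not the actual CloudQuery internals): with a primary key and an overwrite-style write mode, a row whose key already exists replaces the old row; in append mode it is simply added again.

```javascript
// Toy model of destination write modes. `primaryKey` names the column
// treated as the primary key; `mode` is 'append' or 'overwrite'.
function writeRows(table, rows, { primaryKey, mode }) {
  for (const row of rows) {
    if (mode === 'overwrite' && primaryKey) {
      const i = table.findIndex((r) => r[primaryKey] === row[primaryKey]);
      if (i >= 0) {
        table[i] = row; // same key: replace instead of duplicating
        continue;
      }
    }
    table.push(row); // append mode (or unseen key): add a new row
  }
  return table;
}
```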

I set it like this:

[Your settings or configuration here]

But it’s still duplicating.

That’s because you have write_mode: "append". You should remove that configuration to use the default write_mode: "overwrite-delete-stale".
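With the destination spec you shared, that just means dropping the write_mode line:

```yaml
kind: destination
spec:
  name: mongodb
  path: cloudquery/mongodb
  registry: cloudquery
  version: 'v2.3.11'
  # write_mode removed, so the default "overwrite-delete-stale" applies
  spec:
    connection_string: mongodb://127.0.0.1:27017
    database: cloudquery-test
```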

More details are in the CloudQuery documentation.

Can’t thank you enough.