Is concurrency a valid option in the CloudQuery spec today?

Is “concurrency” a valid option in the spec today? The Rate Limiting page and the Source Spec Reference page describe it differently, and I get an error when I try to use it:

failed to decode spec: json: unknown field "concurrency"

Thanks for posting about the inconsistency. We will update it. This option is now controlled by each source, so if it’s available in the plugin, it is documented by the plugin itself.

For example, it is available in the GCP, Azure, and AWS plugins; see the CloudQuery GCP Plugin Documentation.

OK, thanks. I’ll check that out. Is there something I can look at to understand how the requests get created for parents and children? Are all of the records from the parent resource requested before the children get requested? Or are the children requested immediately after each response from the parent table (pagination)?

In which SDK? In GoLang?

This is the Scheduler: https://github.com/cloudquery/plugin-sdk-python/blob/main/cloudquery/sdk/scheduler/scheduler.py. As far as I recall, it resolves the parent table first and then has X threads available for child tables, and so on. Basically, a concurrent DFS. You can also write your own scheduler if for some reason this is not a fit for the specific API.
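To make the "concurrent DFS" idea concrete, here is a minimal sketch in Python. It is not the SDK's scheduler: `Table`, `sync`, and the table names are hypothetical, and the sketch assumes child work is queued as soon as each parent row is produced. Check the linked scheduler source for the real ordering and limits.

```python
import threading
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass, field
from typing import Callable, Iterable, List, Optional, Tuple


@dataclass
class Table:
    # Hypothetical stand-in for a plugin table: a resolver plus child tables.
    name: str
    resolver: Callable[[Optional[dict]], Iterable[dict]]
    relations: List["Table"] = field(default_factory=list)


def sync(tables: List[Table], concurrency: int = 4) -> List[Tuple[str, dict]]:
    """Resolve tables depth-first with at most `concurrency` worker threads."""
    rows: List[Tuple[str, dict]] = []
    lock = threading.Lock()
    done = threading.Condition(lock)
    pending = 0  # tasks submitted but not yet finished

    def submit(table: Table, parent: Optional[dict]) -> None:
        nonlocal pending
        with lock:
            pending += 1
        pool.submit(run, table, parent)

    def run(table: Table, parent: Optional[dict]) -> None:
        nonlocal pending
        try:
            for row in table.resolver(parent):
                with lock:
                    rows.append((table.name, row))
                # Assumption of this sketch: children are queued per parent row,
                # without waiting for the rest of the parent table.
                for child in table.relations:
                    submit(child, row)
        finally:
            with done:
                pending -= 1
                if pending == 0:
                    done.notify_all()

    with ThreadPoolExecutor(max_workers=concurrency) as pool:
        for table in tables:
            submit(table, None)
        # Workers never block on child tasks, so a small pool cannot deadlock.
        with done:
            done.wait_for(lambda: pending == 0)
    return rows


# Hypothetical tables: two parent rows, one child row per parent.
child = Table("instances", lambda p: [{"project": p["id"]}])
parent = Table("projects", lambda _: [{"id": 1}, {"id": 2}], [child])
result = sync([parent], concurrency=2)
```

In this sketch the `concurrency` value only caps how many resolvers run at once, so lowering it is a simple way to reduce the request rate against an API.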

Thanks!

Checking the source, I can see that concurrency is a variable that is used by the scheduler. Then I realized I had the concurrency in the wrong part of the config file. I had it as spec.concurrency instead of the correct spec.spec.concurrency. :roll_eyes:
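For anyone hitting the same error, the difference between the two placements can be sketched as nested Python dicts that mirror the config file. This is only an illustration: `check_source_spec`, the field subset in `TOP_LEVEL_FIELDS`, and the value `100` are assumptions, not the CLI's actual decoder or schema.

```python
# Illustrative subset of outer source-spec fields (an assumption, not the full schema).
TOP_LEVEL_FIELDS = {"name", "path", "version", "tables", "destinations", "spec"}


def check_source_spec(spec: dict) -> None:
    """Mimic strict decoding: reject any outer field that is not recognized."""
    unknown = sorted(set(spec) - TOP_LEVEL_FIELDS)
    if unknown:
        raise ValueError(f'unknown field "{unknown[0]}"')


# Wrong: concurrency sits at spec.concurrency, next to name and tables.
wrong = {
    "kind": "source",
    "spec": {"name": "gcp", "tables": ["*"], "concurrency": 100, "spec": {}},
}

# Right: concurrency sits at spec.spec.concurrency, inside the plugin's own spec.
right = {
    "kind": "source",
    "spec": {"name": "gcp", "tables": ["*"], "spec": {"concurrency": 100}},
}

check_source_spec(right["spec"])  # passes silently
try:
    check_source_spec(wrong["spec"])
except ValueError as err:
    message = str(err)  # same shape as the decode error quoted earlier in the thread
```

The inner `spec` is passed through to the plugin, which is why plugin-specific options such as concurrency are documented per plugin.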

Yeah, the spec.spec thing is confusing. We want to address it in a future configv2, but we want to push it out a bit, as it’s mostly a “frontend” issue and it will require a migration for users (even if the CLI supports both config formats during a migration period).

The concurrency adjustment works quite well for what I need. Bravo!
