CloudQuery Python sync for AWS environments: can’t find the proper method

Hey y’all,

I’m looking into how to use CloudQuery to sync AWS environments’ inventories/resources in Python. I’ve done it before in the CLI, and was trying to figure out a way to do it programmatically. However, I was unable to find the right way to go about this.

I landed on the Python SDK eventually, but it seems like the cloudquery-plugin-sdk is more for creating new plugins for data sources that don’t have official CloudQuery support.

How would I go about kicking off CloudQuery syncs in Python? Is the only way to do this via YAML files and the CLI, and will I have to create/change those dynamically in code?

Appreciate the help!

Hey! The best and easiest way right now would be to just fork-exec the CloudQuery CLI from Python: https://github.com/cloudquery/cq_dagster_embedded_elt. The Dagster piece isn’t necessary for this; just look at the Python bit.
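For reference, "fork-exec from Python" in practice usually means the standard-library subprocess module. Below is a minimal sketch, not the code from the linked repo. run_cli is a hypothetical helper name, and the cloudquery invocation in the comment assumes the CLI is installed and on PATH and that config.yml is a valid spec file.

```python
import subprocess
import sys
from typing import Sequence


def run_cli(args: Sequence[str]) -> subprocess.CompletedProcess:
    """Start args[0] as a child process (a fork/exec-style spawn on POSIX),
    wait for it to exit, and capture stdout/stderr as text. A non-zero exit
    code does not raise; check .returncode on the result."""
    return subprocess.run(list(args), capture_output=True, text=True)


if __name__ == "__main__":
    # Self-contained demo: run the current Python interpreter instead of
    # cloudquery so this works on any machine.
    demo = run_cli([sys.executable, "--version"])
    print(demo.returncode, demo.stdout.strip())
    # Against CloudQuery it would look like this (assumes `cloudquery` is on
    # PATH and config.yml exists):
    #   run_cli(["cloudquery", "sync", "config.yml"])
```

Capturing output instead of letting it stream to the terminal makes it easy to log or parse the CLI's messages from Python; drop capture_output if you prefer to see progress live.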

Awesome, thank you! I’ll take a look!

Is it accurate to say that the logic of the code is as follows:

  1. Define a string that has the same information you’d put into the source and destination YAML files when using the CLI.
  2. Create a temporary YAML file by writing that string to it.
  3. Use some module to run the cloudquery sync CLI command, pointing it at the temp file that was created.
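The three steps above can be sketched roughly like this. It is an illustration, not the linked repo's code: write_temp_config, build_sync_command and sync are hypothetical helper names, and the YAML (plugin versions, the aws_s3_buckets table, the PostgreSQL destination and its connection string) is placeholder content to replace with real values from the CloudQuery plugin docs.

```python
import os
import subprocess
import tempfile

# Step 1: the same content you would put in the source/destination YAML files
# for the CLI. Versions, tables and the connection string are placeholders.
CONFIG_YAML = """\
kind: source
spec:
  name: aws
  path: cloudquery/aws
  registry: cloudquery
  version: "vX.Y.Z"  # placeholder
  tables: ["aws_s3_buckets"]
  destinations: ["postgresql"]
---
kind: destination
spec:
  name: postgresql
  path: cloudquery/postgresql
  registry: cloudquery
  version: "vX.Y.Z"  # placeholder
  spec:
    connection_string: "postgresql://user:pass@localhost:5432/db"  # placeholder
"""


def write_temp_config(yaml_text: str) -> str:
    """Step 2: write the spec to a temporary .yml file and return its path.
    delete=False keeps the file after it is closed so the CLI can read it;
    the caller is responsible for removing it."""
    with tempfile.NamedTemporaryFile("w", suffix=".yml", delete=False) as f:
        f.write(yaml_text)
        return f.name


def build_sync_command(config_path: str, binary: str = "cloudquery") -> list[str]:
    """Step 3a: the argument list for `cloudquery sync <config>`."""
    return [binary, "sync", config_path]


def sync(yaml_text: str) -> int:
    """Step 3b: run the sync, then delete the temp file even if it fails."""
    path = write_temp_config(yaml_text)
    try:
        return subprocess.run(build_sync_command(path)).returncode
    finally:
        os.remove(path)
```

Calling sync(CONFIG_YAML) requires the cloudquery binary on PATH. Because the spec is just a string, it can be built or modified dynamically in code before it is written out.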

Does that mean that wherever this code runs, CloudQuery needs to be installed beforehand? And that the main way to run a CloudQuery sync against a supported target is via the CLI, with programmatic support amounting to the flow described above?

Yeah, exactly. Even if we had a native Python SDK, you would need to download CloudQuery beforehand, since CloudQuery is not a Python library; the Python binding would just call the CloudQuery process under the hood anyway.
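Since the CLI has to be present wherever the Python code runs, it can help to fail early with a clear message. A small sketch using the standard-library shutil.which; find_cloudquery is a hypothetical helper, and "cloudquery" is assumed to be the binary name on PATH.

```python
import shutil


def find_cloudquery(binary: str = "cloudquery") -> str:
    """Return the path of the CloudQuery CLI, or raise a clear error if it
    is not installed or not on PATH. The default binary name is an
    assumption; pass a full path if you installed it elsewhere."""
    path = shutil.which(binary)
    if path is None:
        raise FileNotFoundError(
            f"{binary!r} not found on PATH; install the CloudQuery CLI first"
        )
    return path
```

The returned path can then be used as the first element of the command list, e.g. [find_cloudquery(), "sync", "config.yml"].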

Assuming you mean *would need to download CloudQuery before, right? And yeah definitely, makes sense! Just wanted to make sure I didn’t misinterpret anything, appreciate the help!!