Understanding CloudQuery incremental plugin and cq_state_aws table functionality

concise-oryx · May 16, 2024, 11:45am

How does the incremental plugin actually work?
What does the cq_state_aws table do? Does it store the data when we try to do an incremental sync?

kemal · May 16, 2024, 11:57am

Yes, it stores a state value for each incremental table. How it actually works is documented in the CloudQuery documentation.

concise-oryx · May 16, 2024, 12:06pm

We are actually trying to have CloudQuery syncs based on the different environments. We have around 9 environments.

I set up an incremental sync for the aws_inspector2_findings table, but it takes around 15 hours to complete, even though this environment has only 132 accounts.

I also see that nothing is stored in the cq_state_aws table. I have read the documentation, but I am still not sure how the sync is supposed to proceed. In my case, I run the incremental sync for the different environments with a 2-hour gap between them. How will the incremental sync behave in that setup?

erez · May 16, 2024, 12:34pm

Hi @concise-oryx, can you share the configuration files you’re using for the sync?

concise-oryx · May 16, 2024, 12:49pm

***source.yml***
```yaml
kind: source
spec:
    name: "AWS_ACCOUNT_ID"
    path: "cloudquery/aws"
    version: "23.1.0"
    tables: ["*"]
    destinations: ["postgresql"]
    backend_options:
        table_name: "cq_state_aws"
        connection: "@@plugins.postgresql.connection"
    spec:
        accounts:
            - id: "xxxxxxxxx"
              role_arn: "arn:aws:iam::xxxxxxxxx:role/test-role"
```

erez · May 16, 2024, 12:59pm

OK, so when using tables: ["*"], I recommend at least skipping the tables listed in the CloudQuery AWS configuration documentation, as they contain a lot of static data and slow the sync down.
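
As a rough sketch of what that could look like, the source spec accepts a skip_tables list alongside tables. The table names below are only illustrative examples of mostly static data and are an assumption on my part; the actual recommended list is the one in the AWS plugin documentation.

```yaml
kind: source
spec:
    name: "AWS_ACCOUNT_ID"
    path: "cloudquery/aws"
    version: "23.1.0"
    tables: ["*"]
    # Illustrative entries only (assumption): take the real list from the
    # AWS plugin configuration documentation.
    skip_tables:
        - "aws_ec2_vpc_endpoint_services"
        - "aws_cloudtrail_events"
        - "aws_ec2_instance_types"
        - "aws_rds_engine_versions"
    destinations: ["postgresql"]
```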

kemal · May 16, 2024, 1:39pm

Are you using multiple source sections to fetch multiple AWS accounts? ~~If so, each one should have its own table_name backend option; otherwise, they will clash~~. I checked the code, and given how it is implemented, they won’t clash for the AWS plugin.

You need at least v24.1.0 of the AWS plugin to use incremental table functionality for aws_inspector2_findings. v23.x and v24.0.0 don’t have it.

concise-oryx · May 17, 2024, 6:04am

I am already skipping around 80 AWS tables that we don’t need, including the ones the documentation recommends skipping.

Okay… I will try it and let you know the output.

Sorry, I didn’t follow. All the source plugins are AWS.

Here is how I’m running it: first, the sync runs for all tables except the Inspector table, and then the incremental sync runs for that account. This process repeats for all the other accounts.

kemal · May 17, 2024, 8:54am

Interesting, you’re doing this to measure how long aws_inspector2_findings takes, to be able to get to the rest of the data early?

concise-oryx · May 17, 2024, 8:55am

The rest of the data sync finishes in 30 minutes or so, but once I added the incremental sync, it takes around 15 to 20 hours to complete. I’ll first try with 24.1.0 and get back to you.

Hi @kemal, @erez,

aws_inspector2_findings is a premium table in v24.1.0. How can we perform a CloudQuery sync, given that we are using the open source version?

kemal · May 17, 2024, 9:50am

You still get a certain number of free rows per month, and the quota resets every month, so it should be possible to get some benefit without paying if you manage the tables option carefully. Please also refer to the blog post “CloudQuery Official Free Plugins Moving to Paid” for the announcement.
I believe you could even mix and match AWS plugin versions to further reduce your quota usage.

concise-oryx · May 20, 2024, 4:27am

I think AWS plugin version 24.1.0 needs a CloudQuery login. Will it work without providing one?

kemal · May 20, 2024, 8:33am

Login is required in newer versions so we can track free usage quota for everybody.

concise-oryx · May 20, 2024, 8:58am

From which AWS plugin version onwards is the CloudQuery login needed?

kemal · May 20, 2024, 9:15am

Here’s the announcement about that: Mandatory Login

concise-oryx · May 20, 2024, 9:25am

AWS plugin version 24.1.0 was released on February 7th, and the announcement was made on January 17th. As a result, I can’t perform a sync without a CloudQuery login.

Is there any other solution that you can provide so that we can perform an incremental sync for the inspector2_findings table?

kemal · May 20, 2024, 9:57am

You can also create an API key and use that instead of the login.

concise-oryx · May 21, 2024, 4:27am

Can we run a CloudQuery sync (excluding Inspector tables) for some group of accounts, and after completing the sync, can we run the incremental sync for all these accounts only for Inspector?

kemal · May 21, 2024, 8:03am

You can. But if you’re using write_mode: overwrite-delete-stale, it will delete all rows under the same source name that weren’t written by the latest sync. So you’ll need a different name field in each source config. There is no issue if you’re using write_mode: append (or just overwrite).
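
A minimal sketch of that split, assuming two source configs that differ only in name and table selection. The names "aws-main" and "aws-inspector" are hypothetical, and the account values are the placeholders from the config shared earlier in this thread. Note that write_mode itself is set in the destination config, not here.

```yaml
# Regular sync: everything except the Inspector findings table.
kind: source
spec:
    name: "aws-main"              # hypothetical name
    path: "cloudquery/aws"
    version: "v24.1.0"
    tables: ["*"]
    skip_tables: ["aws_inspector2_findings"]
    destinations: ["postgresql"]
    spec:
        accounts:
            - id: "xxxxxxxxx"
              role_arn: "arn:aws:iam::xxxxxxxxx:role/test-role"
---
# Incremental sync: only the Inspector findings table, with state stored
# in the cq_state_aws table via backend_options.
kind: source
spec:
    name: "aws-inspector"         # hypothetical name, must differ from the one above
    path: "cloudquery/aws"
    version: "v24.1.0"
    tables: ["aws_inspector2_findings"]
    destinations: ["postgresql"]
    backend_options:
        table_name: "cq_state_aws"
        connection: "@@plugins.postgresql.connection"
    spec:
        accounts:
            - id: "xxxxxxxxx"
              role_arn: "arn:aws:iam::xxxxxxxxx:role/test-role"
```

Because the two configs use different name values, a sync of one should not treat rows written by the other as stale.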

concise-oryx · May 21, 2024, 10:02am

The sync that excludes Inspector is set to write mode “overwrite-delete-stale”, and for the incremental sync to work, we need either “overwrite-delete-stale” or “overwrite”. How can I run both syncs together?
