How does the incremental plugin actually work?
What does the cq_state_aws table do? Does it store the data when we try to do an incremental sync?
Yes, it stores a state value for each incremental table. How it actually works is documented in the CloudQuery documentation.
We are actually trying to have CloudQuery syncs based on the different environments. We have around 9 environments.
I have tried setting up the incremental sync for the aws_inspector2_findings table. The issue is that the sync takes around 15 hours to complete, and this environment has just 132 accounts.
I also see that nothing is stored in the cq_state_aws table. I have read the documentation, but I am not sure how the sync is going to happen. In my case, I am running the incremental sync for different environments with a 2-hour gap. How is the incremental sync going to happen in my case?
Hi @concise-oryx, can you share the configuration files you’re using for the sync?
***source.yml***
kind: source
spec:
  name: "AWS_ACCOUNT_ID"
  path: "cloudquery/aws"
  version: "23.1.0"
  tables: ["*"]
  destinations: ["postgresql"]
  backend_options:
    table_name: "cq_state_aws"
    connection: "@@plugins.postgresql.connection"
  spec:
    accounts:
      - id: "xxxxxxxxx"
        role_arn: "arn:aws:iam::xxxxxxxxx:role/test-role"
OK, so I recommend at least skipping the tables listed in CloudQuery AWS Configuration when using tables: ["*"], as they contain a lot of static data and make the sync slow.
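As a rough sketch of that advice, the wildcard can stay in place while exclusions go under skip_tables. The two table names below are illustrative assumptions only, not the recommended list; the authoritative list is the one in the AWS plugin's configuration docs.

```yaml
kind: source
spec:
  # name, path, version, destinations and backend_options stay as in source.yml above
  tables: ["*"]
  # Illustrative exclusions only (assumption); copy the real list from the plugin docs.
  skip_tables:
    - "aws_cloudtrail_events"
    - "aws_ec2_instance_types"
```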
Are you using multiple source sections to fetch multiple AWS accounts? If so, each one should have its own table_name backend option; otherwise, they will clash. I checked the code and, looking at how it was implemented, they won’t clash for the AWS plugin.
You need at least v24.1.0 of the AWS plugin to use the incremental table functionality for aws_inspector2_findings. v23.x and v24.0.0 don’t have it.
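Applied to the source.yml shared earlier, that change might look roughly like the sketch below. Narrowing tables to the single Inspector table is an assumption made for illustration; everything else is copied from the config above.

```yaml
kind: source
spec:
  name: "AWS_ACCOUNT_ID"
  path: "cloudquery/aws"
  # Earliest version named in this thread as supporting the incremental table.
  version: "v24.1.0"
  # Assumption: this spec syncs only the incremental table.
  tables: ["aws_inspector2_findings"]
  destinations: ["postgresql"]
  # State for incremental tables is kept in this backend table.
  backend_options:
    table_name: "cq_state_aws"
    connection: "@@plugins.postgresql.connection"
  spec:
    accounts:
      - id: "xxxxxxxxx"
        role_arn: "arn:aws:iam::xxxxxxxxx:role/test-role"
```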
I am already skipping around 80 AWS tables that we don’t need, including the ones the documentation recommends skipping.
Okay… I will try it and let you know the output.
Sorry… I didn’t get it… All the source plugins are AWS itself.
This is how I’m using it: first, the sync is done for all the tables except the Inspector table, and then the incremental sync is done for that account. This process repeats for all the other accounts.
Interesting, you’re doing this to measure how long aws_inspector2_findings takes, to be able to get to the rest of the data early?
The rest of the data sync can be done in 30 minutes or so, but since I added the incremental sync, it is taking around 15 to 20 hours to complete. I’ll first try with 24.1.0 and get back to you.
aws_inspector2_findings is a premium table in v24.1.0. How can we perform a CloudQuery sync, given that we are using the open source version?
You still get some amount of free rows per month, which resets every month, so it should be possible to get some benefit without paying if you manage the tables option carefully. Also, please refer to the blog post CloudQuery Official Free Plugins Moving to Paid for the announcement.
You could even mix and match AWS plugin versions to further reduce your quota usage, I believe.
I think this needs a CloudQuery login to use AWS Plugin version 24.1.0. Will it work even without providing a CloudQuery login?
Login is required in newer versions so we can track free usage quota for everybody.
From which AWS plugin version onwards is the CloudQuery login needed?
Here’s the announcement about that: Mandatory Login
This AWS Plugin 24.1.0 version was released on February 7th, and the announcement was made on January 17th. As a result, I can’t perform a sync without CloudQuery login.
Is there any other solution that you can provide so that we can perform an incremental sync for the inspector2_findings table?
You can also create an API key and use that instead of the login.
Can we run a CloudQuery sync (excluding Inspector tables) for some group of accounts, and after completing the sync, can we run the incremental sync for all these accounts only for Inspector?
You can. But if you’re using write_mode: overwrite-delete-stale, it will delete all rows under the same source name that weren’t synced. So you’ll need different name fields in each source config. No issue if you’re using write_mode: append (or just overwrite).
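A minimal sketch of the "different name per source" idea follows, assuming one spec for the general sync and one for Inspector only. The two name values are hypothetical placeholders, and the version shown is simply the one discussed in this thread; neither is confirmed as the setup actually in use.

```yaml
# General sync: everything except the Inspector table.
kind: source
spec:
  name: "aws-main-ACCOUNT_ID"        # hypothetical name
  path: "cloudquery/aws"
  version: "v24.1.0"
  tables: ["*"]
  skip_tables: ["aws_inspector2_findings"]
  destinations: ["postgresql"]
---
# Incremental sync: Inspector only, under a different source name so that
# stale-row deletion from the spec above does not touch its rows.
kind: source
spec:
  name: "aws-inspector-ACCOUNT_ID"   # hypothetical name
  path: "cloudquery/aws"
  version: "v24.1.0"
  tables: ["aws_inspector2_findings"]
  destinations: ["postgresql"]
  backend_options:
    table_name: "cq_state_aws"
    connection: "@@plugins.postgresql.connection"
```

Since write_mode is set on the destination rather than on the source, this only addresses the naming part of the advice above.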
For the sync that is excluding inspector, it is set to mode “overwrite-delete-stale”, and for the incremental sync to work, we need either “overwrite-delete-stale” or “overwrite”. How can I perform the syncs together?