This looks like a bug; would you mind opening an issue with the details of the steps you followed so we can take a look? You can open an issue here.
If you can, it would be useful if you could also include the contents of the table you are using to store incremental state (the table you specified in backend_options.table_name, probably cq_state_aws or similar).
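For context, the state backend is configured in your source spec; a minimal sketch of what that usually looks like (the values here are placeholders, yours will differ):

```yaml
kind: source
spec:
  name: aws
  path: cloudquery/aws
  version: "v25.5.3"
  tables: ["aws_inspector2_findings"]
  destinations: ["postgresql"]
  backend_options:
    # Destination table where incremental sync cursors are stored
    table_name: "cq_state_aws"
    connection: "@@plugins.postgresql.connection"
```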
Hey @kemal, as you suggested, I tried version 25.5.3 to add incremental sync for the Inspector2 findings table. But after using it, I'm running into another issue: I don't see any cursor added to the cq_state AWS table for the sync I ran today. Is there something I'm missing?
Hey @funny-whale, I can't really tell what issue you're referring to from the image you posted. Could you clarify what the issue is?
If the sync didn’t complete successfully, you probably won’t get any updated cursor information in your state table. So if you’re getting an error, that’s probably why.
Can we apply filters while using incremental sync for inspector2_findings? For example, can we collect only high and critical findings instead of all of them?
Hey! Are you referring to adding table_options for inspector2_findings in the sync configuration to filter the results? If so, yes: this should be independent of whether the sync is incremental or not.
I can help with any issues setting up the table_options configuration for the sync. You start by adding the following to your source config file's inner spec:
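Roughly like this (a sketch rather than the exact snippet; check the AWS plugin's table_options docs for the precise shape):

```yaml
# Inner spec of your AWS source config
spec:
  table_options:
    aws_inspector2_findings:
      list_findings:
        # Each entry mirrors the AWS ListFindings API input
        - filter_criteria: {}
```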
Then, from there, you look into the AWS docs for the ListFindings endpoint (exposed as list_findings in the config). The available filters are not something CloudQuery maintains; we just let clients pass those configurations through to the API.
I don't know exactly how to set this filter for the AWS endpoint. What I would do is experiment with the filtering options against the AWS endpoint directly if you have access to it; if not, you could experiment by setting candidate options and running an incremental sync.
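For example, a candidate to experiment with for keeping only high and critical findings might look like this (the severity filter shape here is my assumption based on the AWS ListFindings FilterCriteria docs, so verify it against the plugin's table_options reference):

```yaml
table_options:
  aws_inspector2_findings:
    list_findings:
      - filter_criteria:
          # StringFilter entries, per the AWS ListFindings FilterCriteria
          severity:
            - comparison: EQUALS
              value: HIGH
            - comparison: EQUALS
              value: CRITICAL
```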
Thanks! But I have another question. I've already synced the data using incremental sync. Now, will the data be replaced with only high and critical values if I use these filters inside an incremental sync?
Also, I can see that the second run for Inspector (using incremental sync) is taking much longer than the first run. Is this expected?
Now, will the data be replaced with only high and critical values if I use these filters inside an incremental sync?
This is a good question. Normally it wouldn't, but this table (aws_inspector2_findings) is a very special case: changing the table options configuration triggers a full refresh. So the answer to your question is yes.
Also, I can see that the second run for Inspector (using incremental sync) is taking much longer than the first run. Is this expected?
A run using incremental sync should not take more time than a run not using it. The first time you run an incremental sync, it should take about the same time, since there is no previous state to increment from.
However, there are other, unrelated reasons why a sync could be slower than a previous one: a slightly different sync configuration that causes more rows to be synced, a temporarily slower endpoint, rate limiting, congestion on the network you're running the sync from, etc.
Okay, that makes sense for the Inspector2 findings options part.
But I can see that the API is being called many times, and the sync only succeeds after several hours. The first sync took 2.5 hours, and the second took around 9 hours.
I will apply the same filter (for example, only high and critical vulnerabilities) to reduce the number of entries. Will the Inspector2 table still be refreshed every day?
Also, if incremental sync only collects new data since the last sync, what happens to old data that is no longer valid? For example, if AWS clears some Inspector findings from their service, will they still show up in our database?
If the table options don't change, an incremental sync only syncs new results from the API; it does not do a full refresh. The sync happens as often as you trigger it, so if you trigger it every day, new results will be synced every day.
Old data that is no longer valid will disappear from your destination database if you do a full (non-incremental) sync, assuming the AWS endpoint no longer returns it. Otherwise, it will stay.
It's not normal for the same sync to take 2 hours one time and 9 hours the next if the data being synced is the same. This could be a bug. You could open an issue, or share the full configuration you used for both syncs and we can take a look.
CQ CLI version - 5.20.1
PostgreSQL destination plugin version - 7.1.2
AWS source plugin version - 25.5.3
It is the same configuration for both runs.
Also, can you explain: if I have 300 vulnerabilities for one AWS account that should be added to the DB, will this CloudQuery API be called 300 times? (i.e., one call for each and every vulnerability being logged?)
The CloudQuery usage API is intended to be called periodically (every x seconds or when a counter hits a specific threshold). I see that you are running an older version of the plugin; I would encourage you to upgrade to the latest version (v27.5.0).
Also, can you explain: if I have 300 vulnerabilities for one AWS account that should be added to the database, will this CloudQuery API be called 300 times? (i.e., one call for each and every vulnerability being logged?)
ListFindings returns a paginated response; each page contains multiple findings. So no, it wouldn't make 300 requests; the number of requests depends on the page size. I think you can set the page size with the max_results option (from ListFindingsInput) in your table_options config, but the defaults should be sensible.
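If you do want to tune the page size, it would sit alongside the filters, something like this (assuming max_results maps to ListFindingsInput's MaxResults; double-check the plugin docs):

```yaml
table_options:
  aws_inspector2_findings:
    list_findings:
      - max_results: 100  # page size per ListFindings request
        filter_criteria:
          severity:
            - comparison: EQUALS
              value: CRITICAL
```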