Error on second incremental sync with CloudQuery: invalid memory address or nil pointer

Hi,

While running incremental syncs, I encountered an issue: the first run works fine, but the second run throws an error.

Error:

{
    "level": "error",
    "module": "aws-src",
    "invocation-id": "xxxxxxxxxxxxxxx",
    "client": "account_id:region",
    "error": "invalid memory address or nil pointer dereference",
    "message": "table resolver finished with panic",
    "stack": "runtime error: invalid memory address or nil pointer dereference\ngoroutine 103 [running]:\nruntime/debug.Stack()\n\t/opt/hostedtoolcache/go/1.21.4/x64/src/runtime/debug/stack.go:24 +0x5e\ngithub.com/cloudquery/plugin-sdk/v4/scheduler.(*syncClient).resolveTableDfs.func1.1()\n\t/home/runner/go/pkg/mod/github.com/cloudquery/plugin-sdk/v4@v4.29.1/scheduler/scheduler_dfs.go:92 +0x65\npanic({0xb6bdd60?, 0x121618d0?})\n\t/opt/hostedtoolcache/go/1.21.4/x64/src/runtime/panic.go:914 +0x21f\ngithub.com/cloudquery/cloudquery/plugins/source/aws/resources/services/inspector2.fetchFindings({0xdb0a668, 0xc003ba2a20}, {0xd995d80?, 0xc0016e2200}, 0xa0c654b?, 0xa0c916f?)\n\t/home/runner/work/cloudquery-private/cloudquery-private/plugins/source/aws/resources/services/inspector2/findings.go:75 +0x430\ngithub.com/cloudquery/plugin-sdk/v4/scheduler.(*syncClient).resolveTableDfs.func1()\n\t/home/runner/go/pkg/mod/github.com/cloudquery/plugin-sdk/v4@v4.29.1/scheduler/scheduler_dfs.go:102 +0xce\ncreated by github.com/cloudquery/plugin-sdk/v4/scheduler.(*syncClient).resolveTableDfs in goroutine 102\n\t/home/runner/go/pkg/mod/github.com/cloudquery/plugin-sdk/v4@v4.29.1/scheduler/scheduler_dfs.go:89 +0x808\n",
    "table": "aws_inspector2_findings",
    "time": "2024-06-28T12:03:39Z"
}

Is this error common when running the sync a second time? Currently, no changes or new vulnerabilities have been added to this table.

I run the sync every day. How can I resolve this issue?

Hi @funny-whale :wave:

This looks like a bug; would you mind opening an issue with the details of the steps you followed so we can take a look? Open an issue here.

If you can, it would be useful if you could also include the contents of the table you are using to store incremental state (the table you specified in backend_options.table_name, probably cq_state_aws or similar).
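For reference, that's the part of the source spec that looks something like this (assuming the postgresql destination; your table name may differ):

  backend_options:
    table_name: cq_state_aws
    connection: '@@plugins.postgresql.connection'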

Hi, try the AWS plugin at version v25.5.3 or newer. We fixed a bug related to this in that version.

Let us know how it goes! :slightly_smiling_face:

Hey @kemal, as you suggested, I tried using version 25.5.3 to add incremental sync for the Inspector2 findings table. But after upgrading, I’m running into another issue: I don’t see any cursor added to the cq_state_aws table for the sync I ran today. Is there something I’m missing?

Hey @funny-whale :wave: I can’t really tell what issue you’re referring to from the image you posted. Could you clarify what your issue is?

If the sync didn’t complete successfully, you probably won’t get any updated cursor information in your state table. So if you’re getting an error, that’s probably why.

Hey @kemal,

Can we apply filters while using incremental sync for inspector2_findings? For example, can we add filters to collect only high and critical violations, instead of collecting info on all the violations?

Hey :wave: Are you referring to adding table_options for inspector2_findings in the sync configuration to filter the results? If so, yes, this should be independent of whether the sync is incremental or not.
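For example, the two are configured side by side in the source config; here’s a sketch (placeholders elided), with backend_options at the top level of spec driving the incremental state and table_options inside the plugin’s inner spec driving the filtering:

  kind: source
  spec:
    ...
    backend_options:
      table_name: cq_state_aws
      connection: '@@plugins.postgresql.connection'
    spec:
      table_options:
        aws_inspector2_findings:
          list_findings:
            - ...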

Okay… can you please provide me with the documentation for this?

Sure. The table_options docs for the aws plugin are listed here: CloudQuery AWS Plugin Documentation

For the inspector2_findings-specific table options, the docs above will link you to this: ListFindingsInput Documentation

Can you give me an example of how to collect only high and critical violations?

I can help you with any issues setting up the table options configuration for the sync. You start by adding the following to your source config file’s inner spec:

  table_options:
    aws_inspector2_findings:
      list_findings:
        - ...

Then, from there, you look into the AWS docs for the list_findings endpoint. The available options are not something that CloudQuery maintains; we just enable clients to set those configurations.

For example, according to the AWS docs: ListFindingsInput

It seems that you’d want to add a filter. Therefore you’d use FilterCriteria:

  table_options:
    aws_inspector2_findings:
      list_findings:
        - filter_criteria: ...

According to the docs on the AWS website: Understanding Severity

There’s a numeric score for severity:

  • 7.0–8.9: High
  • 9.0–10.0: Critical

I don’t know exactly how you set this filter on the AWS endpoint. What I would do is experiment with the filtering options against the AWS endpoint directly if you have access to it; if not, you could experiment by setting candidate options and running an incremental sync.
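For illustration, a candidate configuration might look like this (untested; it assumes the plugin accepts the snake_cased FilterCriteria fields from ListFindingsInput, where severity is a list of string filters, and note that the filter takes the severity labels rather than the numeric scores):

  table_options:
    aws_inspector2_findings:
      list_findings:
        - filter_criteria:
            severity:
              - comparison: EQUALS
                value: HIGH
              - comparison: EQUALS
                value: CRITICAL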

Thanks! But I have another question. I’ve already synced the data using incremental sync. Now, will the data be replaced with only high and critical values if I use these filters inside an incremental sync?

Also, I can see that the second run for Inspector (using incremental sync) is taking much more time compared to the first run. Is this expected?

Now, will the data be replaced with only high and critical values if I use these filters inside an incremental sync?

This is a good question. Normally it wouldn’t, but this table (aws_inspector2_findings) is a special case: changing the table options configuration triggers a full refresh. So the answer to your question is yes.

Also, I can see that the second run for Inspector (using incremental sync) is taking much more time compared to the first run. Is this expected?

A run using incremental sync should not take more time than a run without it. The first time you run an incremental sync, it should take about the same time, as there is no previous state to increment from.

However, there are other, unrelated reasons why a sync could be slower than a previous one: a slightly different sync configuration that causes more rows to be synced, a temporarily slower endpoint, rate limiting, network congestion where you’re running the sync, etc.

Okay, that makes sense for the Inspector2 findings options part.

But I can see that the API is being called many times, and the sync only succeeds after some hours. The first time I synced, it took 2.5 hours; the second time, it took around 9 hours.

I will apply the same filter (getting only high and critical vulnerabilities, for example) to reduce the number of entries. Will the Inspector2 table still be refreshed every day?

Also, if the incremental sync only collects new data since the last sync time, what happens to old data that is no longer valid? For example, if AWS clears some Inspector findings from their service, will they still show up in our database?

If the table options don’t change, then an incremental sync should only sync new results from the API, not a full refresh. The sync happens as often as you trigger it, so if you trigger it every day, new results will be synced every day.

Old data that is no longer valid would disappear from your destination database if you do a full (non-incremental) sync, assuming the AWS endpoint no longer returns it. Otherwise, it stays.

It’s not normal for the same sync to take 2 hours one time and 9 hours the second time if the data being synced is the same. This could be a bug. You could open an issue, or perhaps share the full configuration you used for both syncs and we can take a look.

Makes sense, thanks for the information.

Yeah, the configuration for incremental sync is:

kind: source
spec:
    name: "xxxxxxxx_incremental"
    path: "cloudquery/aws"
    version: "25.5.3"
    tables:
         - aws_inspector2_findings
     spec:
         accounts:
             - id: "xxxxxx"
     destinations: 
         - postgresql
     backend_options:
     table_name: cq_state_aws
     connection: '@@plugins.postgresql.connection'

CQ CLI version - 5.20.1
PostgreSQL destination plugin version - 7.1.2
AWS source plugin version - 25.5.3

It is the same configuration for both runs.

Also, can you explain to me: if, for one AWS account, I have 300 vulnerabilities that should be added to the DB, will this CloudQuery API be called 300 times? (i.e., will it make one call for each and every vulnerability being logged?)

@funny-whale - Not sure if it was just a typo when copying the config, but table_name and connection need to be indented, like this:

kind: source
spec:
  name: "xxxxxxxx_incremental"
  path: "cloudquery/aws"
  version: "25.5.3"
  tables: ["aws_inspector2_findings"]
  destinations: ["postgresql"]
  backend_options:
    table_name: cq_state_aws
    connection: '@@plugins.postgresql.connection'
  spec:
    accounts:
      - id: "xxxxxx"

The CloudQuery usage API is intended to be called periodically (every x seconds or when a counter hits a specific threshold). I see that you are running an older version of the plugin; I would encourage you to upgrade to the latest version (v27.5.0).

Okay, will check it and let you know.
It’s a typo, @ben.

@funny-whale regarding your question:

Also, can you explain to me: if, for one AWS account, I have 300 vulnerabilities that should be added into the database, will this CloudQuery API be called 300 times? (i.e., will it make one call for each and every vulnerability being logged?)

ListFindings returns a paginated response, and each page contains some number of items, so no, it wouldn’t make 300 requests; the number of requests depends on the page size. I think you can set the page size with the max_results option (ListFindingsInput) in your table_options config, but the defaults should be sensible.
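For illustration, setting it would look something like this (untested; assuming max_results maps to the MaxResults parameter, which the AWS docs cap at 100):

  table_options:
    aws_inspector2_findings:
      list_findings:
        - max_results: 100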