CloudQuery job performance issues after upgrading to plugin version v22.14.0

Hello CQ team,

After upgrading the AWS source plugin, our CloudQuery jobs are taking far longer to finish. As the attached screenshot shows, a full sync on plugin version v18.4.0 completed in just 34 minutes, but after upgrading to plugin version v22.14.0 the same job takes 5.72 hours, which is very high.

Config used for v18.4.0:

  kind: source
  spec:
    name: "aws-tables"
    path: "cloudquery/aws"
    version: "v18.4.0"
    concurrency: 300000
    destinations: ["postgresql"]
    tables:
      - "*"
    skip_tables:
      - aws_organization_resource_policies
      - aws_organizations
      - aws_organizations_*
      - aws_iam_*
      - aws_cloudtrail_*
      - aws_cloudwatchlogs_*
      - aws_athena_*
      - aws_cloudtrail_events
      - aws_inspector2_findings
      - aws_inspector_findings
      - aws_guardduty_*
      - aws_frauddetector_*
      - aws_config_*
      - aws_accessanalyzer_*
      - aws_stepfunctions_*
      - aws_glue_job_runs
      - aws_ecr_repository_image_scan_findings
      - aws_securityhub_findings
    spec:
      org:
        member_role_name: cq-role
      aws_debug: true
      max_retries: 2
      max_backoff: 30
      regions:
        - "specific-region-01"
  ---
  kind: destination
  spec:
    name: postgresql
    path: cloudquery/postgresql
    version: "v4.2.2"
    write_mode: "overwrite-delete-stale"
    spec:
      connection_string: ${CQ_DSN}

Config used for v22.14.0:

  kind: source
  spec:
    name: "aws-tables"
    path: "cloudquery/aws"
    version: "v22.14.0"
    concurrency: 300000
    destinations: ["postgresql"]
    tables:
      - "*"
    skip_tables:
      - aws_organization_resource_policies
      - aws_organizations
      - aws_organizations_*
      - aws_iam_*
      - aws_cloudtrail_*
      - aws_cloudwatchlogs_*
      - aws_athena_*
      - aws_cloudtrail_events
      - aws_inspector2_findings
      - aws_inspector_findings
      - aws_guardduty_*
      - aws_frauddetector_*
      - aws_config_*
      - aws_accessanalyzer_*
      - aws_stepfunctions_*
      - aws_glue_job_runs
      - aws_ecr_repository_image_scan_findings
      - aws_securityhub_findings
      - aws_ssoadmin_*
    spec:
      org:
        member_role_name: cq-role
      aws_debug: true
      max_retries: 2
      max_backoff: 30
      regions:
        - 'specific-region-01'
  ---
  kind: source
  spec:
    name: "aws-low-ratelimit"
    path: "cloudquery/aws"
    version: "v22.14.0"
    concurrency: 100
    destinations: ["postgresql"]
    tables:
      - "aws_ssoadmin_instances"
    spec:
      org:
        member_role_name: cq-role
      aws_debug: true
      max_retries: 2
      max_backoff: 30
      regions:
        - 'specific-region-01'
  ---
  kind: destination
  spec:
    name: postgresql
    path: cloudquery/postgresql
    version: "v6.0.8"
    write_mode: "overwrite-delete-stale"
    spec:
      connection_string: ${CQ_DSN}

I suspect the difference is that the upgraded version triggered some DB migrations, i.e. table columns being added, dropped, or retyped, and that those migrations added time to the run.

Have you performed any subsequent runs after the upgrade?

@neat-kingfish - That is a great idea; various table migrations would definitely have added latency!

@summary-robin Between v18.4 and v22.14, nearly 100 new tables were added, and roughly 400 services became available in additional regions. Each of these can add significant latency depending on the number of resources, accounts, and regions being synced. The changelog for the AWS plugin lists all of the tables that have been added: AWS Plugin Changelog

I would suggest that you identify the tables in the v18 version of the plugin that you are interested in syncing and then only sync those same tables in v22.14. If there is a huge jump in duration between versions when syncing the same tables, please let us know and we will investigate!
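For example, a pinned spec for v22.14.0 could look like the sketch below. The table names shown are illustrative placeholders; replace them with the actual tables your v18.4.0 sync covered:

```yaml
kind: source
spec:
  name: "aws-tables"
  path: "cloudquery/aws"
  version: "v22.14.0"
  concurrency: 300000
  destinations: ["postgresql"]
  # Pin an explicit allowlist instead of "*" plus skip_tables, so tables
  # introduced after v18.4 are not picked up implicitly.
  tables:
    - "aws_ec2_instances"   # illustrative, not from the original config
    - "aws_s3_buckets"      # illustrative, not from the original config
  spec:
    org:
      member_role_name: cq-role
    regions:
      - "specific-region-01"
```

With an explicit `tables` list like this, both versions sync an identical set of tables, so any remaining duration gap can be attributed to the plugin itself rather than to newly added tables.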

Here is a list of all of the tables that were added between v18.4 and v22.14:

Yes, I have tried a couple of runs over the past week and am still seeing the same durations.

@summary-robin - Just wanted to let you know that we found a bug in the PostgreSQL destination that impacts performance when syncing very large tables. I would suggest upgrading to the latest version of the PostgreSQL destination plugin (can be found here).
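As a sketch, only the destination block's `version` needs to change; the string below is a placeholder, so substitute whatever the latest PostgreSQL destination release is at the link above:

```yaml
kind: destination
spec:
  name: postgresql
  path: cloudquery/postgresql
  version: "vX.Y.Z"  # placeholder: use the latest postgresql plugin release
  write_mode: "overwrite-delete-stale"
  spec:
    connection_string: ${CQ_DSN}
```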

If your sync times are still much higher between v18 and v22, please let us know!