Unable to run CloudQuery syncs in parallel: seeking configuration guidance

Hi CQ folks,

I am trying to run CloudQuery syncs in parallel, but the sources appear to execute sequentially. Am I missing a setting in the configuration below that would make the syncs from different sources run in parallel?

kind: source
spec:
  name: aws-us-west-2
  registry: local
  path: /app/plugins/aws
  tables:
    - aws_*
  destinations: ["postgresql"]
  spec:
    concurrency: 100
    initialization_concurrency: 4
    aws_debug: false
    regions:
      - "us-west-2"
    org:
      member_role_name: ${MEMBER_ROLE_NAME}
      admin_account:
        id: dev
        role_arn: ${ASSUME_ROLE_ARN}
---
kind: source
spec:
  name: aws-ap-northeast-1
  registry: local
  path: /app/plugins/aws
  tables:
    - aws_*
  destinations: ["postgresql"]
  spec:
    concurrency: 100
    initialization_concurrency: 4
    aws_debug: false
    regions:
      - "ap-northeast-1"
    org:
      member_role_name: ${MEMBER_ROLE_NAME}
      admin_account:
        id: dev
        role_arn: ${ASSUME_ROLE_ARN}
---
kind: destination
spec:
  name: postgresql
  registry: local
  path: /app/plugins/postgresql
  write_mode: append
  spec:
    connection_string: ${PG_CONNECTION_STR}

Hi @selected-bonefish,

You are correct. All source blocks in a single sync will be executed sequentially. If you are interested in running syncs in parallel, you should put each source in its own container and run the containers in parallel.

It looks like you are using the AWS plugin, so I would suggest checking out the ECS Deployment Guide, which covers running parallel ECS tasks.

Alternatively, you can run jobs in parallel on EKS or any Kubernetes cluster: Kubernetes Deployment Guide.
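As a rough sketch of the split, assuming one config file per source, the first file could hold the us-west-2 source together with its own copy of the destination block. The file name is hypothetical; everything else is copied from the configuration posted above.

```yaml
# aws-us-west-2.yml (hypothetical file name, one file per container)
kind: source
spec:
  name: aws-us-west-2
  registry: local
  path: /app/plugins/aws
  tables:
    - aws_*
  destinations: ["postgresql"]
  spec:
    concurrency: 100
    initialization_concurrency: 4
    aws_debug: false
    regions:
      - "us-west-2"
    org:
      member_role_name: ${MEMBER_ROLE_NAME}
      admin_account:
        id: dev
        role_arn: ${ASSUME_ROLE_ARN}
---
# The destination block is repeated in every per-source file.
kind: destination
spec:
  name: postgresql
  registry: local
  path: /app/plugins/postgresql
  write_mode: append
  spec:
    connection_string: ${PG_CONNECTION_STR}
```

A second file for ap-northeast-1 would be identical apart from the source name and the region. Each container would then run `cloudquery sync` against its own file, so the two syncs proceed independently.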

I see, thanks for the clarification. Not sure if it’s just me, but it seems like the doc (Running CloudQuery in Parallel) is kind of misleading. I will move each source to a different container. Thanks!

Thanks for that feedback! If you can point me to the piece that tripped you up, we will rework it to be less ambiguous.

The "Running CloudQuery in Parallel" documentation page implies that the sources in a config file with multiple source blocks will run in parallel. This is the wording that tripped me up:

"When splitting a sync into multiple source-integration configurations to be run in parallel"

Secondly, if a config file has multiple sources, how does it affect the “sharding” feature?