Unable to run CloudQuery syncs in parallel: seeking configuration guidance

selected-bonefish · April 24, 2024, 11:18pm

Hi CQ folks,

I am trying to run CloudQuery syncs in parallel, but they seem to execute sequentially. Am I missing a setting in the configuration below that would make the syncs from different sources run in parallel?

kind: source
spec:
  name: aws-us-west-2
  registry: local
  path: /app/plugins/aws
  tables:
    - aws_*
  destinations: ["postgresql"]
  spec:
    concurrency: 100
    initialization_concurrency: 4
    aws_debug: false
    regions:
      - "us-west-2"
    org:
      member_role_name: ${MEMBER_ROLE_NAME}
      admin_account:
        id: dev
        role_arn: ${ASSUME_ROLE_ARN}
---
kind: source
spec:
  name: aws-ap-northeast-1
  registry: local
  path: /app/plugins/aws
  tables:
    - aws_*
  destinations: ["postgresql"]
  spec:
    concurrency: 100
    initialization_concurrency: 4
    aws_debug: false
    regions:
      - "ap-northeast-1"
    org:
      member_role_name: ${MEMBER_ROLE_NAME}
      admin_account:
        id: dev
        role_arn: ${ASSUME_ROLE_ARN}
---
kind: destination
spec:
  name: postgresql
  registry: local
  path: /app/plugins/postgresql
  write_mode: append
  spec:
    connection_string: ${PG_CONNECTION_STR}

ben · April 24, 2024, 11:22pm

Hi @selected-bonefish,

You are correct. All source blocks in a single sync will be executed sequentially. If you are interested in running syncs in parallel, you should put each source in its own container and run the containers in parallel.
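For the config posted above, a minimal sketch of that split is one file per source, each carrying its own copy of the destination block so it can be synced on its own. The file name `aws-us-west-2.yml` is a hypothetical choice; everything else is copied from the original config.

```yaml
# aws-us-west-2.yml (hypothetical file name): one source plus the destination
# it writes to, so this file can be passed to `cloudquery sync` by itself.
kind: source
spec:
  name: aws-us-west-2
  registry: local
  path: /app/plugins/aws
  tables:
    - aws_*
  destinations: ["postgresql"]
  spec:
    concurrency: 100
    initialization_concurrency: 4
    aws_debug: false
    regions:
      - "us-west-2"
    org:
      member_role_name: ${MEMBER_ROLE_NAME}
      admin_account:
        id: dev
        role_arn: ${ASSUME_ROLE_ARN}
---
# The destination block is repeated in every per-source file.
kind: destination
spec:
  name: postgresql
  registry: local
  path: /app/plugins/postgresql
  write_mode: append
  spec:
    connection_string: ${PG_CONNECTION_STR}
```

The ap-northeast-1 file would be identical apart from the source `name` and `regions`. Each container then runs `cloudquery sync` against its own file, so the two syncs run as separate processes at the same time.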

Since you are using the AWS plugin, I suggest checking out the ECS Deployment Guide, which shows how to run parallel ECS tasks.

Alternatively, you can run jobs in parallel on EKS or any Kubernetes cluster: Kubernetes Deployment Guide.
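As a rough sketch of the Kubernetes approach, each source can become its own Job, and Jobs created together run concurrently. This is not taken from the Kubernetes Deployment Guide: the image `my-cloudquery-image:latest` (assumed to contain `/app/plugins/*` and to use the `cloudquery` binary as its entrypoint), the ConfigMap `cloudquery-config` holding the per-source config files, and the Secret `cloudquery-env` holding `MEMBER_ROLE_NAME`, `ASSUME_ROLE_ARN` and `PG_CONNECTION_STR` are all hypothetical.

```yaml
# One Job per source; a second Job pointing at aws-ap-northeast-1.yml
# would run alongside this one.
apiVersion: batch/v1
kind: Job
metadata:
  name: cq-sync-aws-us-west-2
spec:
  backoffLimit: 0
  template:
    spec:
      restartPolicy: Never
      containers:
        - name: cloudquery
          image: my-cloudquery-image:latest   # hypothetical image name
          # Assumes the entrypoint is the cloudquery CLI, so this runs
          # `cloudquery sync /app/config/aws-us-west-2.yml`.
          args: ["sync", "/app/config/aws-us-west-2.yml"]
          envFrom:
            - secretRef:
                name: cloudquery-env          # hypothetical Secret
          volumeMounts:
            - name: config
              mountPath: /app/config
              readOnly: true
      volumes:
        - name: config
          configMap:
            name: cloudquery-config           # hypothetical ConfigMap
```

Mounting the configs at `/app/config` keeps them separate from the plugin binaries under `/app/plugins` that are baked into the image.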

selected-bonefish · April 24, 2024, 11:27pm

I see, thanks for the clarification. Not sure if it’s just me, but it seems like the doc (Running CloudQuery in Parallel) is kind of misleading. I will move each source to a different container. Thanks!

ben · April 25, 2024, 11:37am

Thanks for that feedback! If you can point me to the piece that tripped you up, we will rework it to be less ambiguous.

alfredgamulo · November 19, 2024, 6:56pm

The documentation at Running CloudQuery in Parallel implies that a config file with multiple source blocks will run in parallel, for example where it says:

"When splitting a sync into multiple source-integration configurations to be run in parallel"

Secondly, if a config file has multiple sources, how does it affect the “sharding” feature?
