Hi,
Can someone confirm how long CloudQuery takes to sync AWS data in a PG database? I am running it in EKS, and the cron has been running for 2 hours.
Hi, sorry, I thought I had replied, but I forgot to send the message.
Duration of sync time
I am using cli-v3.23.1
AWS plugin version v22.15.0
PostgreSQL plugin v6.1.0
The sync time depends on the amount of resources you have in your account, so it’s hard to predict in advance.
Yes, but it’s usually not this long.
Can you share your configuration? Maybe we can optimize it.
It’s more than 4 hours.
kind: source
spec:
  name: aws
  path: cloudquery/aws
  version: "v22.15.0" # latest version of aws plugin
  tables: ["*"]
  destinations: ["postgresql"]
---
kind: destination
spec:
  name: postgresql
  path: cloudquery/postgresql
  version: "v6.1.0" # latest version of postgresql plugin
  write_mode: "overwrite-delete-stale"
  spec:
    connection_string: <DB_CONNECTION>
Also see Performance Tuning and AWS Source Configuration.
And the source config?
kind: source
spec:
  name: aws
  path: cloudquery/aws
  version: "v22.15.0" # latest version of aws plugin
  tables: ["*"]
  destinations: ["postgresql"]
Also, we don’t want to skip any tables. With the old version it was working fine, @erez. I’m only seeing this after the update.
So with each new version, we’re adding more tables and resources; that’s why newer versions might take longer.
tables: ["*"] can be super slow as it syncs all tables, including static AWS data that doesn’t change often.
If you’re using tables: ["*"], it’s recommended to skip the tables in this link.
Another option is to look at the diff from the previous version, then skip the tables that were added with the new version to get the same performance. But that will render the update less useful.
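For reference, a minimal sketch of what skipping tables could look like in the source config posted earlier, using the source spec's skip_tables option. The three table names are illustrative examples only, not the full recommended list; take the actual entries from the linked documentation.

```yaml
kind: source
spec:
  name: aws
  path: cloudquery/aws
  version: "v22.15.0"
  tables: ["*"]
  # Illustrative entries only (assumption): replace these with the
  # tables listed in the linked documentation.
  skip_tables:
    - aws_cloudtrail_events
    - aws_ec2_instance_types
    - aws_rds_engine_versions
  destinations: ["postgresql"]
```

Everything matched by tables is still synced except the entries under skip_tables, so newly added tables keep coming in unless they are listed explicitly.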
Yes,
So any idea how long it takes with the new version?
@erez
New tables might be useful for us to get more data.
@erez Can you also confirm if syncing a resource is incremental or if it starts syncing all from scratch?
It’s hard to predict the duration as it’s also based on your connection speed and the machine you’re using to sync.
Incremental syncs are supported for a few tables, and you’d have to enable it manually. See CloudQuery Incremental Tables for more information.
Earlier it used to be around 20 minutes, but now it’s more than 4 hours. Even the cron job terminates and creates a new pod.
@erez, is there any new version I should use? Are you aware of any bugs in the versions I provided?
Can you share the previous version you were using? We’re not aware of any bugs. v22.15.0 is a recent version. It’s doing more work now as we’ve added some tables that require quite a bit of time (listed in the doc I shared).
We’ve also changed the default of tables. It used to be *, and we changed it as it’s not a good default for most people, since sync time grows with new versions and it syncs data that most people don’t need.
Which version are you upgrading from? Have you tried skipping the tables I shared? Most likely you were not syncing them before.
I removed the old version; it was 2.x. When I started the first sync with the new version, I was getting some migration errors. So, I deleted all tables and let CloudQuery create them again.
Do you mean the AWS plugin major version was v2?
Usually, switching between major versions requires dropping some tables.
Each major version has a breaking change that’s detailed in the changelog.
I am seeing the below log:
2023-10-20T12:08:28Z INF table sync finished client=<account_id>:us-east-1 errors=0 module=aws-src resources=0 table=aws_route53_hosted_zone_query_logging_configs
2023-10-20T12:08:28Z INF table sync finished client=<account_id>::us-east-1 errors=0 module=aws-src resources=5052 table=aws_route53_hosted_zone_resource_record_sets
2023-10-20T12:08:28Z INF table sync finished client=<account_id>::us-east-1 errors=0 module=aws-src resources=0 table=aws_route53_hosted_zone_traffic_policy_instances
After that, there are no logs in debug mode. It looks like it just got stuck.
That might also signal a memory consumption issue. Can you try either skipping some tables (see CloudQuery Documentation - Skip Tables) or lowering the concurrency setting (see CloudQuery Documentation - Tune Concurrency)?
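A rough sketch of lowering concurrency, assuming the AWS plugin accepts a concurrency option in its nested plugin-level spec for this version. The value 10000 is an arbitrary example, not a recommendation; check the Tune Concurrency documentation for the default and for where the option belongs in your plugin version.

```yaml
kind: source
spec:
  name: aws
  path: cloudquery/aws
  version: "v22.15.0"
  tables: ["*"]
  destinations: ["postgresql"]
  spec:
    # Assumption: plugin-level option; 10000 is an arbitrary example value.
    concurrency: 10000
```

A lower value trades sync speed for reduced memory use, so it is worth adjusting it step by step while watching the pod's memory.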
Again, we highly recommend not using tables: ["*"] in a single sync, especially for AWS. What most users do is split the configuration into multiple sync jobs.
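One way this split could look, as a sketch: two source specs, each covering a subset of tables through wildcard patterns. The source names, file names, and the choice of EC2 and IAM as groups are hypothetical; each document would live in its own file and run as its own cron job.

```yaml
# aws-ec2.yml (hypothetical file name)
kind: source
spec:
  name: aws-ec2 # hypothetical; use a unique name per sync job
  path: cloudquery/aws
  version: "v22.15.0"
  tables: ["aws_ec2_*"]
  destinations: ["postgresql"]
---
# aws-iam.yml (hypothetical file name)
kind: source
spec:
  name: aws-iam # hypothetical; use a unique name per sync job
  path: cloudquery/aws
  version: "v22.15.0"
  tables: ["aws_iam_*"]
  destinations: ["postgresql"]
```

Each job would then pass its own source file together with the shared destination file to cloudquery sync, so a slow or stuck group of tables does not hold up the rest.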
@erez, you mean defining tables in a different source file?