Request for CloudQuery to identify slow syncing resources and API limit issues

Hi,

I love working with CloudQuery for syncing our AWS resources. One issue we have is that we sync many AWS accounts across two organizations, and this number may grow in the future. Each of our source/destination configurations is responsible for syncing one account's data (across all necessary regions) as a separate job.
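For reference, each of our per-account source configs looks roughly like this (the plugin version, account ID, role ARN, and destination name below are placeholders, not our real values):

```yaml
kind: source
spec:
  name: aws            # unique name per account/job
  path: cloudquery/aws
  version: "vX.Y.Z"    # placeholder; pin to a real release
  tables: ["*"]
  destinations: ["postgresql"]
  spec:
    regions: ["us-east-1", "eu-west-1"]
    accounts:
      - id: "example-account"
        role_arn: "arn:aws:iam::123456789012:role/cq-sync"
```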

In some cases, certain accounts contain many resources, depending on what the account is used for. For example, one account may be used mostly for DNS, so it will have large Hosted Zones with many records. Others may be EC2-heavy or contain many Step Functions and Step Function executions.

What we do now is check which job (account) is taking too long to sync, review the logs, and try to determine whether there are simply a lot of resources to sync or whether a particular AWS resource has a stricter set of API limits. If a resource has both a lot of data to fetch and strict limits (Step Function executions are again a good example), the sync can take days even after most other resources have finished. When needed, we move those slow-syncing resources to a separate job for the same account.
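To sketch what that split looks like in practice: one job skips the slow table via `skip_tables`, and a second source config (with a distinct `name`) syncs only that table on its own schedule. The version and the exact table name below are placeholders; the real table name should be checked against the AWS plugin docs:

```yaml
# Main job for the account: everything except the slow table
kind: source
spec:
  name: aws-main
  path: cloudquery/aws
  version: "vX.Y.Z"    # placeholder
  tables: ["*"]
  skip_tables: ["aws_stepfunctions_executions"]  # example table name
  destinations: ["postgresql"]
---
# Separate, less frequent job for just the slow, rate-limited table
kind: source
spec:
  name: aws-slow
  path: cloudquery/aws
  version: "vX.Y.Z"    # placeholder
  tables: ["aws_stepfunctions_executions"]
  destinations: ["postgresql"]
```

Giving each source a distinct `name` keeps the two jobs' sync state separate.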

A nice feature to have would be to determine which source resources are taking the longest to sync and whether we are hitting API limits. This can help us plan the frequency of synchronization and decide if we need to move that sync to a separate source/destination config to run it in a separate job.

Hi! That’s a really interesting idea. Thanks for sharing some details about how you run syncs. Would you mind opening this as an issue on our GitHub project? GitHub Project

Hi @comic-firefly!

There's also a list of tables that we know are slower than we'd like: CloudQuery - Configuration Skip Tables

Yeah, we’ve already added these to skip_tables.