Hi,
I love working with CloudQuery for syncing our AWS resources. One challenge we have is that we sync many AWS accounts across two organizations, and that number may grow in the future. Each of our source/destination configurations syncs the data for a single account (across all necessary regions) as a separate job.
Some accounts contain a very large number of resources of a particular type, depending on what the account is used for. For example, one account may be used mostly for DNS, so it will have large Hosted Zones with many records. Others may be EC2 heavy or contain many Step Functions and Step Function executions.
What we do now is check which job (account) is taking too long to sync, review the logs, and try to determine whether there are simply a lot of resources to sync or whether a particular AWS resource has stricter API limits. Of course, if there is a lot of data to fetch and that resource also has strict limits (Step Function executions are a good example again), the sync can take days even though most of the other resources have already been synced. When needed, we move those slow-syncing resources to a separate job for the same account.
A nice feature to have would be a way to see which source resources take the longest to sync and whether we are hitting API limits. This would help us plan the sync frequency and decide whether to move a resource to its own source/destination config so it runs as a separate job.
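
To illustrate the kind of per-resource summary I have in mind, here is a minimal Python sketch. It assumes each log line is a JSON object with `table`, `duration_ms`, and `message` fields; those field names and the table names in the usage note are hypothetical placeholders, not CloudQuery's actual log format. Throttling is detected by looking for common AWS rate-limit error codes in the message text.

```python
import json
from collections import defaultdict

# AWS error codes that commonly indicate rate limiting.
THROTTLE_CODES = (
    "Throttling",
    "ThrottlingException",
    "TooManyRequestsException",
    "RequestLimitExceeded",
)


def summarize(log_lines):
    """Aggregate sync time and throttling hits per table.

    Assumes a hypothetical JSON-lines format with the keys
    "table", "duration_ms", and "message".
    Returns (table, stats) pairs sorted by total seconds, slowest first.
    """
    stats = defaultdict(lambda: {"seconds": 0.0, "throttled": 0})
    for line in log_lines:
        try:
            entry = json.loads(line)
        except json.JSONDecodeError:
            continue  # skip lines that are not JSON
        if not isinstance(entry, dict):
            continue  # skip JSON values that are not objects
        table = entry.get("table")
        if not table:
            continue  # skip entries not tied to a table
        stats[table]["seconds"] += entry.get("duration_ms", 0) / 1000
        message = entry.get("message", "")
        if any(code in message for code in THROTTLE_CODES):
            stats[table]["throttled"] += 1
    return sorted(stats.items(), key=lambda kv: kv[1]["seconds"], reverse=True)
```

Calling `summarize(open("sync.log"))` (the file name is also a placeholder) would return the tables ordered from slowest to fastest, each with its total time and the number of throttled calls. A table with both a high total time and a high throttle count would be a candidate for moving into its own job.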