@kemal
Please check my repo for your reference.
(1) You could use a larger concurrency value, but would need a much bigger instance to run CloudQuery on.
I have adjusted the instance size to medium, and the concurrency is at the default value, which is 10K.
Do you mean that increasing the EC2 instance size, for example to large or xlarge, would help? In fact, the instance's CPU and memory are not the bottleneck when running the `cloudquery sync` command.
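For context, this is roughly how the concurrency knob sits in my source spec. This is a sketch, not my exact file: the version string is hypothetical, and the actual default concurrency depends on the CLI/plugin version.

```yaml
# Sketch of a CloudQuery AWS source spec.
kind: source
spec:
  name: aws
  path: cloudquery/aws
  version: "v22.19.2"       # hypothetical pinned version
  destinations: ["postgresql"]
  # concurrency: 10000      # left at the default; raising it mainly
  #                         # trades more CPU/memory for sync speed
```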
(2) Disable unused regions by including only the ones you know you have data for. Obviously, that doesn’t really work from a security standpoint, as ideally, you’d want to know about a single rogue resource in an obscure region.
If you check my repo, I have already limited the regions to Sydney (ap-southeast-2) only, and it still takes a long time to sync.
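The region restriction lives in the plugin-level spec. A minimal sketch of what I have (version string is hypothetical):

```yaml
kind: source
spec:
  name: aws
  path: cloudquery/aws
  version: "v22.19.2"   # hypothetical pinned version
  destinations: ["postgresql"]
  spec:
    regions:
      - ap-southeast-2  # Sydney only
```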
(3) Another way would be to separate your accounts either logically or otherwise (group by even-odd IDs, or IDs starting or ending with a specific digit) and then run sync concurrently on separate machines (ideally managed by a central CI of some sort).
If you check my repo, I sync from the Organization root. How would you manage 300+ account IDs with an even/odd split, especially when new accounts are added later and some are suspended?
That is not convenient at all.
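To make sure I understand the suggestion: I assume the even/odd grouping would look something like the sketch below, keying on the last digit of the account ID so that an account always lands in the same shard even as accounts come and go. (The helper name and the sample IDs are mine, purely for illustration.)

```python
def partition_accounts(account_ids, shards=2):
    """Group AWS account IDs by last digit modulo `shards`.

    The grouping is stable: an account always lands in the same
    shard even as other accounts are added or suspended, so each
    machine can sync its own shard. shards=2 is the even/odd split.
    """
    groups = {i: [] for i in range(shards)}
    for acct in account_ids:
        groups[int(acct[-1]) % shards].append(acct)
    return groups

accounts = ["111111111112", "222222222223", "333333333334", "444444444445"]
print(partition_accounts(accounts))
# shard 0 gets the IDs ending in an even digit, shard 1 the odd ones
```

Even with a stable split like this, each machine still needs its own spec file listing its shard's accounts, which is the maintenance burden I'm worried about.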
I have discussed this in my blog post; please also take a look: CloudQuery Best Practices for AWS
(4) There are some tables you might want to skip by default.
If you check my repo, I didn't sync all tables; I only sync the tables picked up from the CloudQuery Policies pages.
If I enable all tables, even for 10 accounts, it takes forever. Thanks for the advice; I will test again with the latest version.
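For reference, the filtering is just the tables list on the source spec. A sketch (the table names here are examples, not my full list):

```yaml
kind: source
spec:
  name: aws
  path: cloudquery/aws
  version: "v22.19.2"   # hypothetical pinned version
  destinations: ["postgresql"]
  tables:               # only the tables the policies query
    - aws_iam_users
    - aws_s3_buckets
    - aws_ec2_instances
```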
Is there any way to always use the latest version, rather than hardcoding the version in the source file?
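One idea I'm considering, sketched below: resolve the latest release tag in CI before each sync and substitute it into the spec, instead of hardcoding it. The tag format `plugins-source-aws-vX.Y.Z` is my assumption about how plugin releases are tagged in the cloudquery/cloudquery GitHub repo; the tag list would come from something like the GitHub releases API, which this helper doesn't call itself.

```python
import re

def latest_plugin_version(tags, plugin="plugins-source-aws"):
    """Pick the highest semver among release tags shaped like
    'plugins-source-aws-v22.19.2'. The tag format is an assumption;
    check it against the actual release tags before relying on it."""
    pattern = re.compile(rf"^{re.escape(plugin)}-v(\d+)\.(\d+)\.(\d+)$")
    versions = []
    for tag in tags:
        m = pattern.match(tag)
        if m:
            versions.append(tuple(int(x) for x in m.groups()))
    if not versions:
        return None
    return "v" + ".".join(str(x) for x in max(versions))

tags = ["plugins-source-aws-v22.19.2", "plugins-source-aws-v23.0.0", "cli-v3.5.0"]
print(latest_plugin_version(tags))  # v23.0.0
```

Always floating to the latest version does risk picking up breaking plugin changes unreviewed, so pinning plus a scheduled bump might be the safer trade-off.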