Has there been any thought regarding the AWS plugin to pull the account IDs to sync from the aws_organizations_accounts table? For many organizations, new accounts aren’t added frequently, and by pulling from this table where the status is ACTIVE, suspended accounts can be excluded as well. Perhaps this could be a togglable capability that’s off by default, allowing customers to choose? This would help cut down on API rate limiting at scale.
The GCP plugin could also benefit from this. Historically, I’ve worked with customers who have tens of thousands of GCP projects, and the gcp_resourcemanager_projects table could be utilized.
I’ll put in a GitHub issue for this capability, but figured I’d ask first. For a bit of background, we run CloudQuery with a number of different jobs that sync specific tables at different cadences. If a customer is syncing every AWS table, then this option wouldn’t make sense. Perhaps it’s something that’s only supported/recognized if the target tables to sync don’t include aws_organizations_accounts.
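To make that last condition concrete, here is a rough sketch of the check I have in mind. Everything in it is hypothetical: the `use_stored_accounts` function and the `option_enabled` flag are not part of the plugin, and I'm assuming table entries may be glob patterns such as `aws_*`. It also ignores skip lists for brevity.

```python
from fnmatch import fnmatchcase

# Table that would act as the source of account IDs.
ACCOUNTS_TABLE = "aws_organizations_accounts"


def use_stored_accounts(tables, option_enabled=False):
    """Decide whether a job should read account IDs from the stored table.

    Hypothetical helper: returns True only when the (proposed) option is on
    and none of the job's table patterns would sync the accounts table itself.
    """
    if not option_enabled:
        # Off by default, so existing behaviour is unchanged.
        return False
    # If any pattern matches the accounts table, this job is the one that
    # populates it, so it still needs normal org discovery.
    return not any(fnmatchcase(ACCOUNTS_TABLE, pattern) for pattern in tables)


if __name__ == "__main__":
    print(use_stored_accounts(["aws_ec2_instances"], option_enabled=True))
    print(use_stored_accounts(["aws_*"], option_enabled=True))
```

The idea is that the job syncing aws_organizations_accounts keeps using discovery, and every other job can skip it.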
So, if you don’t configure projects, the GCP plugin will discover all active projects and use them in the sync. We later use that information for gcp_resourcemanager_projects.
The AWS plugin also supports accounts discovered via the org: config. See the AWS Organization Example.
Is that what you were looking for? This is useful if you don’t split the sync into multiple jobs (e.g., a job per account).
Yes, we’re using account discovery. All of our customers onboard at the organization level. What I’m suggesting is an option that uses autodiscovery only to populate the aws_organizations_accounts table. For all other tables, the plugin would read the account IDs to sync from that already-synced table instead of running autodiscovery again. Hopefully, that makes sense.
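Here is a minimal sketch of the lookup, using an in-memory SQLite table as a stand-in for the destination. The column names `id` and `status`, the helper names, and the role name `OrganizationAccountAccessRole` are my assumptions for illustration, not something the plugin does today.

```python
import sqlite3


def active_account_ids(conn):
    """Return IDs of ACTIVE accounts from a previously synced accounts table.

    Assumes the table has `id` and `status` columns.
    """
    rows = conn.execute(
        "SELECT id FROM aws_organizations_accounts "
        "WHERE status = 'ACTIVE' ORDER BY id"
    ).fetchall()
    return [row[0] for row in rows]


def build_accounts(account_ids, role_name="OrganizationAccountAccessRole"):
    """Build an explicit per-account list (hypothetical shape and role name)."""
    return [
        {"id": account_id, "role_arn": f"arn:aws:iam::{account_id}:role/{role_name}"}
        for account_id in account_ids
    ]


def demo_connection():
    """Create a stand-in table with made-up account IDs for the example."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE aws_organizations_accounts (id TEXT, status TEXT)")
    conn.executemany(
        "INSERT INTO aws_organizations_accounts VALUES (?, ?)",
        [
            ("111111111111", "ACTIVE"),
            ("222222222222", "SUSPENDED"),
            ("333333333333", "ACTIVE"),
        ],
    )
    return conn


if __name__ == "__main__":
    ids = active_account_ids(demo_connection())
    for account in build_accounts(ids):
        print(account)
```

The suspended account is filtered out by the query, so the other jobs would never try to assume a role in it or spend API calls on it.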