CloudQuery tables lack foreign key relationships for improved data flexibility

Then how do we join these tables without foreign keys?

You could try that: joins still work without foreign keys. Generally, though, it’s better to rely on the information returned from AWS to logically link different resources (like load_balancer_arns) than on _cq_id and _cq_parent_id.

Once you find which columns link the data for your use case, you can create a view from the query.
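For example, here’s a minimal sketch of that approach for a PostgreSQL destination (the ELBv2 table and column names are assumptions based on what AWS returns; check your actual schema before using it):

-- Hypothetical view linking target groups to their load balancers via
-- the load_balancer_arns list AWS returns on each target group.
CREATE VIEW load_balancer_target_groups AS
SELECT lb.arn AS load_balancer_arn,
       lb.dns_name,
       tg.arn AS target_group_arn,
       tg.target_type
FROM aws_elbv2_load_balancers AS lb
JOIN aws_elbv2_target_groups AS tg
  ON lb.arn = ANY (tg.load_balancer_arns);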

Please let me know if that makes sense.

Hi @erez,

One more question. You said we can use _cq_parent_id and _cq_id to relate the parent and child tables. But there are no relationships actually defined on the tables in the database, right? Do I need to create foreign keys explicitly? Also, is there any chance you’ll add these relations to the tables in the future?

Hey @erez, one more thing: when I try to sync my tables locally for one account, I get some errors.

For example:

2024-03-06T05:07:56Z ERR table resolver finished with error error="operation error API Gateway: GetUsagePlans, https response error StatusCode: 403, RequestID: a427e56b-a886-43aa-bb40-1a6589b2fghd92, api error AccessDeniedException: User: arn:aws:sts::xxxxxxxxx:assumed-role/sso-devops-iam-role/xxxxxxxxxxxxxx@xxxxx.com is not authorized to perform: apigateway:GET on resource: arn:aws:apigateway:ap-northeast-3::/usageplans with an explicit deny in a service control policy" client=xxxxxxxxxxxx:ap-northeast-3 module=aws-src table=aws_apigateway_usage_plans

Can you also advise on this?

Hi @concise-oryx, you don’t need to create foreign keys; _cq_id and _cq_parent_id work for joins without them (see the sketch after the list below). You can find the relations between the tables using the tables command I shared:

cloudquery tables

That command generates a JSON file with information on the relations. I don’t think we’ll add foreign keys for users, as:

  1. It makes it harder to migrate the schema.
  2. It’s only relevant for a few destinations.
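To illustrate joining on the built-in IDs, a minimal sketch (the API Gateway parent/child pair here is an assumed example; use whichever related tables the JSON output reports):

-- Hypothetical parent/child join: each child row's _cq_parent_id
-- matches the _cq_id of its parent row, no foreign key needed.
SELECT api.name, stage.stage_name
FROM aws_apigateway_rest_apis AS api
JOIN aws_apigateway_rest_api_stages AS stage
  ON stage._cq_parent_id = api._cq_id;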

Regarding the error, it looks like you’re missing permissions for that API call; the log shows a service control policy explicitly denying apigateway:GET.

Hi @erez,

I ran the cloudquery tables command, but I’m running into a different issue:

cloudquery tables C:/Users/gkaturi

Loading spec(s) from C:/Users/gkaturi

Error: failed to load spec(s) from C:/Users/gkaturi. Error: expecting at least one source

What is the content of C:/Users/gkaturi? You should have at least one source configuration in that directory to generate source tables.

This is my local directory. What should I put after cloudquery tables in that command?

You should point to the same path you use to run cloudquery sync.
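For example (the paths here are placeholders; point at wherever your spec actually lives):

# Pass the same spec file (or directory) you pass to `cloudquery sync`:
cloudquery tables ./aws-config.yml
# or, for a directory containing your spec files:
cloudquery tables ./cloudquery-specs/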

Hi @erez,

I have tried syncing the CloudQuery tables, but I am running into an issue now:

failed to sync records: failed to sync unmanaged client: your configuration references the following premium tables: "aws_accessanalyzer_analyzer_findings_v2,aws_autoscaling_warm_pools,aws_backupgateway_gateways,aws_budgets_budgets,aws_budgets_actions,aws_cloudwatch_metrics,aws_cloudwatch_metric_statistics,aws_cod...". Please run `cloudquery login` or use a valid API Key which can be generated via https://cloud.cloudquery.io to allow the sync to succeed

My versions:

  • cli = “v5.8.1”
  • db = “v7.0.0”
  • aws = “v23.1.0”
  • azure = “v11.0.0”

Do I need to downgrade or upgrade anything here to avoid this error?

Hi @concise-oryx,

You’d need to run cloudquery login since you’re referencing paid tables.

How can I avoid referencing these paid tables? Can you please specify how I would clear this error without logging in? I shouldn’t be syncing premium tables anyway, since I am not a premium member.

You can use skip_tables: in the configuration.
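For example, an illustrative snippet (these table names are just the first few from your error message; glob wildcards can cover whole groups):

kind: source
spec:
  name: aws
  path: cloudquery/aws
  tables: ["*"]
  skip_tables:
    - "aws_cloudwatch_metrics"
    - "aws_cloudwatch_metric_statistics"
    - "aws_budgets_*"  # wildcard skips the whole group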

Okay… Should I skip all of them? Which are the premium tables?

You can use this list as a reference: CloudQuery AWS Premium Tables.

Please note that for new versions of the plugin, you’d still need to log in regardless of whether you use paid tables or not.

From which versions of the AWS, Azure, and PostgreSQL plugins, and of the CLI, is login mandatory?

All plugin versions released starting from 22 February 2024.

Is there something preventing you from logging in? You can still use some tables for free with the login and get the latest updates. Happy to learn more about your use case.

Sure… thanks @erez for your valuable information.

Hi @erez, the aws_inspector_findings table is taking a long time to sync across all the accounts in our DB, running for almost 7 to 8 hours and then failing randomly for some accounts.

What should we do for this table?

Also, one thing we observed: when we skip the aws_inspector_findings table, the sync works fine.

Hi @concise-oryx, what you’d usually do is create 2 separate sync jobs, one with aws_inspector_findings included and another with aws_inspector_findings skipped. Then you’d run those on separate schedules.

How do I set up 2 sync jobs?
Also, in one account we have more than one lakh (100,000) inspector findings, and this might be the case for multiple accounts too. How would I handle that?
@erez, can you please suggest something or help me out here?

To do 2 sync jobs, you’ll create 2 different sets of configurations, e.g.

filename=config-1.yml
kind: source
spec:
  name: aws-1
  path: cloudquery/aws
  tables: [...]
  skip_tables: ["aws_inspector_findings"]
---
kind: destination
...

and

filename=config-2.yml
kind: source
spec:
  name: aws-2
  path: cloudquery/aws
  tables: ["aws_inspector_findings"]
---
kind: destination
...

Then run cloudquery sync config-1.yml and cloudquery sync config-2.yml separately. You can split into more configurations that way to keep individual jobs from running too long, then run each job on a different schedule. You can also scope each configuration to sync only a single account.
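For instance, a minimal sketch of scoping config-2.yml to one account (the account alias and profile name are placeholders; the AWS plugin takes these under its nested spec):

kind: source
spec:
  name: aws-2
  path: cloudquery/aws
  tables: ["aws_inspector_findings"]
  spec:
    accounts:
      - id: "account-1"             # placeholder account alias
        local_profile: "profile-1"  # placeholder AWS CLI profile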