CloudQuery tables lack foreign key relationships for improved data flexibility

Then how do we join these tables without foreign keys?

You could try that: joins still work without foreign keys. Generally, though, it’s better to rely on the information returned from AWS to logically link different resources (like load_balancer_arns) than on _cq_id and _cq_parent_id.

Once you find which columns link the data for your use case, you can create a view from the query.
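For example, here’s a minimal sketch of that approach for a PostgreSQL destination (the ELBv2 table and column names are assumptions based on what AWS returns; check your actual schema before using it):

-- Hypothetical view linking target groups to their load balancers via
-- the load_balancer_arns list AWS returns on each target group.
CREATE VIEW load_balancer_target_groups AS
SELECT lb.arn AS load_balancer_arn,
       lb.dns_name,
       tg.arn AS target_group_arn,
       tg.target_type
FROM aws_elbv2_load_balancers AS lb
JOIN aws_elbv2_target_groups AS tg
  ON lb.arn = ANY (tg.load_balancer_arns);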

Please let me know if that makes sense.

Hi @erez,

One more question. You said we can use _cq_parent_id and _cq_id to relate the parent and child tables. But there are no relationships actually defined on the tables in the database, right? Do I need to create foreign keys explicitly? Also, is there any chance you’ll add these relations to the tables in the future?

Hey @erez, one more thing: when I try to sync my tables locally for one account, I get some errors.

For example:

2024-03-06T05:07:56Z ERR table resolver finished with error error="operation error API Gateway: GetUsagePlans, https response error StatusCode: 403, RequestID: a427e56b-a886-43aa-bb40-1a6589b2fghd92, api error AccessDeniedException: User: arn:aws:sts::xxxxxxxxx:assumed-role/sso-devops-iam-role/xxxxxxxxxxxxxx@xxxxx.com is not authorized to perform: apigateway:GET on resource: arn:aws:apigateway:ap-northeast-3::/usageplans with an explicit deny in a service control policy" client=xxxxxxxxxxxx:ap-northeast-3 module=aws-src table=aws_apigateway_usage_plans

Can you also advise on this?

Hi @concise-oryx, you don’t need to create foreign keys; _cq_id and _cq_parent_id work for joins without them (see the sketch after the list below). You can find the relations between the tables using the tables command I shared:

cloudquery tables

That command generates a JSON file with information on the relations. I don’t think we’ll add foreign keys for users, as:

  1. It makes it harder to migrate the schema.
  2. It’s only relevant for a few destinations.
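To illustrate joining on the built-in IDs, a minimal sketch (the API Gateway parent/child pair here is an assumed example; use whichever related tables the JSON output reports):

-- Hypothetical parent/child join: each child row's _cq_parent_id
-- matches the _cq_id of its parent row, no foreign key needed.
SELECT api.name, stage.stage_name
FROM aws_apigateway_rest_apis AS api
JOIN aws_apigateway_rest_api_stages AS stage
  ON stage._cq_parent_id = api._cq_id;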

Regarding the error, it looks like you’re missing permissions for that API call; the log shows a service control policy explicitly denying apigateway:GET.

Hi @erez,

I ran the cloudquery tables command, but I’m running into a different issue:

cloudquery tables C:/Users/gkaturi

Loading spec(s) from C:/Users/gkaturi

Error: failed to load spec(s) from C:/Users/gkaturi. Error: expecting at least one source

What is the content of C:/Users/gkaturi? You should have at least one source configuration in that directory to generate source tables.

This is my local directory. What should I put after cloudquery tables in that command?

You should point to the same path you use to run cloudquery sync.
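For example (the paths here are placeholders; point at wherever your spec actually lives):

# Pass the same spec file (or directory) you pass to `cloudquery sync`:
cloudquery tables ./aws-config.yml
# or, for a directory containing your spec files:
cloudquery tables ./cloudquery-specs/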

Hi @erez,

I have tried syncing the CloudQuery tables, but I am running into an issue now:

failed to sync records: failed to sync unmanaged client: your configuration references the following premium tables: "aws_accessanalyzer_analyzer_findings_v2,aws_autoscaling_warm_pools,aws_backupgateway_gateways,aws_budgets_budgets,aws_budgets_actions,aws_cloudwatch_metrics,aws_cloudwatch_metric_statistics,aws_cod...". Please run `cloudquery login` or use a valid API Key which can be generated via https://cloud.cloudquery.io to allow the sync to succeed

My versions:

  • cli = “v5.8.1”
  • db = “v7.0.0”
  • aws = “v23.1.0”
  • azure = “v11.0.0”

Do I need to downgrade or upgrade anything here to avoid this error?

Hi @concise-oryx,

You’d need to run cloudquery login since you’re referencing paid tables.

How can I avoid referencing these paid tables? Can you please specify how I would clear this error without logging in? I shouldn’t be syncing premium tables anyway, since I am not a premium member.

You can use skip_tables: in the configuration.
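For example, an illustrative snippet (these table names are just the first few from your error message; glob wildcards can cover whole groups):

kind: source
spec:
  name: aws
  path: cloudquery/aws
  tables: ["*"]
  skip_tables:
    - "aws_cloudwatch_metrics"
    - "aws_cloudwatch_metric_statistics"
    - "aws_budgets_*"  # wildcard skips the whole group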

Okay… Should I skip all of them? Which are the premium tables?

You can use this list as a reference: CloudQuery AWS Premium Tables.

Please note that for new versions of the plugin, you’d still need to log in regardless of whether you use paid tables or not.

From which versions of the AWS, Azure, and PostgreSQL plugins, and of the CLI, is login mandatory?

All plugin versions released starting from 22 February 2024.

Is there something preventing you from logging in? You can still use some tables for free with the login and get the latest updates. Happy to learn more about your use case.

Sure… thanks @erez for your valuable information.

Hi @erez, the aws_inspector_findings table is taking a long time to sync across all the accounts in our DB, running for almost 7 to 8 hours and then failing randomly for some accounts.

What should we do for this table?

Also, one thing we observed: when we skip the aws_inspector_findings table, the sync works fine.

Hi @concise-oryx, what you’d usually do is create 2 separate sync jobs, one with aws_inspector_findings included and another with aws_inspector_findings skipped. Then you’d run those on separate schedules.

How do I set up 2 sync jobs?
Also, in one account we have more than one lakh (100,000) inspector findings, and this might be the case for multiple accounts too. How would I handle that?
@erez, can you please suggest something or help me out here?

To do 2 sync jobs, you’ll create 2 different sets of configurations, e.g.

filename=config-1.yml
kind: source
spec:
  name: aws-1
  path: cloudquery/aws
  tables: [...]
  skip_tables: ["aws_inspector_findings"]
---
kind: destination
...

and

filename=config-2.yml
kind: source
spec:
  name: aws-2
  path: cloudquery/aws
  tables: ["aws_inspector_findings"]
---
kind: destination
...

Then run cloudquery sync config-1.yml and cloudquery sync config-2.yml separately. You can split into more configurations that way to keep individual jobs from running too long, then run each job on a different schedule. You can also scope each configuration to sync only a single account.
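For instance, a minimal sketch of scoping config-2.yml to one account (the account alias and profile name are placeholders; the AWS plugin takes these under its nested spec):

kind: source
spec:
  name: aws-2
  path: cloudquery/aws
  tables: ["aws_inspector_findings"]
  spec:
    accounts:
      - id: "account-1"             # placeholder account alias
        local_profile: "profile-1"  # placeholder AWS CLI profile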