Cloudquery tables lack foreign key relationships for improved data flexibility

Why do the CloudQuery tables not have any foreign key relationships between them?

It might be easier for us to find the relationships between the different resource tables if we had foreign keys enabled for these tables.

Hi @concise-oryx,

I commented on the issue here.

failed to download plugin: failed downloading from URL [{https://github.com/cloudquery/cloudquery/releases/download/plugins-source-aws-v23.1.0/aws_windows_amd64.zip true} {https://github.com/cloudquery/cq-source-aws/releases/download/v23.1.0/cq-source-aws_windows_amd64.zip false}]. Error not found

I’m getting this error when trying to use the AWS version as v23.1.0.

Hi @concise-oryx, can you share your CLI version and configuration content (please redact any sensitive information)?

cloudquery --version

cloudquery version 2.5.1

Can you please update to the latest version? That version is quite old and doesn’t support the CloudQuery registry where new versions of plugins are hosted.
You can use this guide to update: CloudQuery Quickstart

Can I directly update it to the latest version??
Or will it have any dependencies on the AWS and Azure and the Postgres versions that I am using?
I have used this command, but still, I am getting the same version which I am currently using and unable to update it.

Command:

curl -L https://github.com/cloudquery/cloudquery/releases/download/cli-v5.8.1/cloudquery_windows_amd64.exe -o cloudquery.exe

Can you paste the command you’re running to check the version?
It’s best to use the latest CLI with new plugin versions, but it should be compatible with old versions too. However, the latest plugin versions do require a new CLI and won’t work with an old one.

Okay… makes sense checking on this… thanks for your immediate support @erez.
Now I am able to update the CloudQuery CLI version but unable to init the AWS plugin.

Error: 
 unexpected error from sync client receive: rpc error: code = Unknown desc = failed to sync records: failed to sync unmanaged client: skip_tables include a pattern aws_redshift_endpoint_access with no matches

But I can see that the version of AWS Source plugin which I am using is having this table “aws_redshift_endpoint_access”.

Versions I am Using:

  • AWS: 23.1.0
  • PostgreSQL: 7.0.0
  • CLI: 5.8.1

Hi @concise-oryx,

The table name is aws_redshift_endpoint_accesses. I think you’re missing es at the end.

Hi @erez,

Now I am able to run CloudQuery for my required version, but I am getting an issue while running it.

Error:

unexpected error from sync client receive: rpc error: code = Unknown desc = failed to sync records: failed to sync unmanaged client: your configuration references the following premium tables: "aws_accessanalyzer_analyzer_findings_v2, aws_autoscaling_warm_pools, aws_backupgateway_gateways, aws_budgets_budgets, aws_budgets_actions, aws_cloudwatch_metrics, aws_cloudwatch_metric_statistics, aws_cod...". Please run `cloudquery login` or use a valid API Key which can be generated via https://cloud.cloudquery.io to allow the sync to succeed

Syncing resources… (0/-, 0 resources/hr) [0s]

Hi @concise-oryx,

As the error suggests, you’d need to run

cloudquery login

and authenticate to sync those tables.

See more about this in this blog post.

But they are premium tables, right?
Do we forcefully have to be upgraded to v25.0 in CloudQuery since all tables are premium now?

Or can we use the old versions for some time?

You can still use older versions.

And also, there aren’t any foreign key relationships between the tables in CloudQuery. Then how do we find the relation between the tables of different resources?

For example, suppose EC2 has ELbv2LoadBalancers, ElbV2Listeners, and Ec2Eips, etc. How can we find the relation between these tables?

There might be some use cases where we need the Load Balancers and Listener information of EC2 instances. How do we find the relation between them?

Thanks, makes sense.

Each table has a _cq_id and _cq_parent_id columns, so you should be able to use those to link between children and parents.

You can get all the relations via the cloudquery tables command, as explained in this GitHub issue.

Can you specify the table names you’d like to correlate? For example, aws_elbv2_listeners is a child of aws_elbv2_load_balancers, so aws_elbv2_listeners will have _cq_parent_id pointing to _cq_id of aws_elbv2_load_balancers.

So you are saying that for the child table, we have _cq_parent_id, and for the parent table, we have _cq_id. So can we perform a join directly, or should we create a foreign key relationship between them?

And also, if I need to join aws_elbv2_target_group_target_health_descriptions and aws_elbv2_listeners tables, how do I achieve this? Because both of these tables have different parents.

I think you should do a join. I would not create a foreign key, as syncing data from multiple APIs is not an atomic operation, so data can get stale while we sync it or even have transient errors.

What are you trying to achieve with the join between aws_elbv2_listeners and aws_elbv2_target_group_target_health_descriptions?

Sorry, I didn’t get this. You mean I should create new foreign keys for them?
We can get the listeners that are not attached to the target group like that?

I would recommend not creating foreign keys as that might make the schema too hard to modify/extend.

Looks like aws_elbv2_target_groups has a load_balancer_arns column.