CloudQuery sync issues with GCP projects and empty tables

:wave: - It’s me again, back with more stuff I broke. I’m noticing some weird syncing issues and can’t quite figure out what’s going on. I’ll post more details in this thread, but to summarize: I can see in gcp_projects that the _cq_sync_time of a particular project is definitely getting updated on our scheduled CloudQuery runs, but I’m not seeing other tables actually being synced.

Looking at this example, the project is definitely getting synced, but the other tables aren’t showing any new syncing activity. Since this is BigQuery and it’s append-only, I’d expect to see a new row appended each time the CloudQuery sync runs.

The only thing that’s changed is that we modified the source name in the config file, but I wouldn’t expect that to matter. This same behavior shows up in a number of different projects (we have about 3900 GCP projects, for what it’s worth).

I discovered this because the GCP console told me a particular bucket in this project was public, but the gcp_storage_bucket_policies table for this project was completely empty, leading me to believe the sync isn’t actually running for it.

It shouldn’t be a permissions issue; the service account has the Viewer and Security Reviewer roles at the org level (to be honest, probably more than is needed) and has access to all projects in our org. I tried tailing the log file, but nothing permissions-related is coming up.

Here’s the config file we’re running:

  • Latest CLI version
  • Source Plugin version 9.3.3
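
The spec itself looks roughly like this (trimmed down; the folder ID, destination version, and BigQuery project/dataset names below are placeholders rather than our real values):

kind: source
spec:
  name: gcp-test
  path: cloudquery/gcp
  version: "v9.3.3"
  tables: ["*"]                            # trimmed; we sync a long list of tables
  destinations: ["bigquery"]
  spec:
    folder_ids: ["folders/000000000000"]   # placeholder folder ID
    enabled_services_only: true
---
kind: destination
spec:
  name: bigquery
  path: cloudquery/bigquery
  version: "v3.3.0"                        # placeholder version
  write_mode: append
  spec:
    project_id: "my-admin-project"         # placeholder
    dataset_id: "cloudquery"               # placeholder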

Interesting. I will investigate and get back to you.

I was first thrown by _cq_source_name not matching, but that would mean it hasn’t successfully run/fetched since you’ve renamed the source.

Well, to make it even weirder… it syncs some tables but not others? Now I’m even more confused :joy:

I know sifting through the logs might be a pain with that many projects, but is there anything in the logs?

enabled_services_only might be consuming most of the quota and you might need higher backoff_* settings.

If you can isolate the sync to a single project (and/or just gcp_storage_buckets maybe) and if it works out fine then it’s most likely quota/backoff.
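
Something along these lines, with a placeholder project ID (and, if I’m remembering the option names right, bumped backoff settings):

kind: source
spec:
  name: gcp-test
  path: cloudquery/gcp
  version: "v9.3.3"
  tables: ["gcp_storage_buckets"]
  destinations: ["bigquery"]
  spec:
    project_ids: ["my-suspect-project"]    # placeholder: the one project that isn't syncing
    enabled_services_only: true
    backoff_retries: 10                    # example values, tune as needed
    backoff_delay: 30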

I grep’d the logs but didn’t see anything at all for the project in question.
I’m just gonna break this sync up into a couple smaller jobs either way and see if that helps :crossed_fingers:

Interesting, are we sure it’s still in the correct folder specified in the config? (If it wasn’t, I’m not sure if it would show up in gcp_projects or not)

Yeah, it’s in the correct folder. I think the folder is just so large that it’s either timing out (CI job has a 3-hour limit :grimacing:) or it’s quota related.

I broke the job up into smaller jobs, and they ran for much longer than I expected; I’m taking that as a good sign that the sync actually finished, but the _cq_sync_time for the project I’m looking at in gcp_storage_buckets still isn’t updating.

The only errors in the logs for this project are about the osconfig inventory API not being enabled; beyond that, there’s nothing in the logs.

Would it be possible to just sync that project specifically, using project_ids (instead of folder_ids), just to make sure we can fetch it normally and there are no funny permission (or destination-related, although I wouldn’t expect it) issues?

Sorry for the delay here - so I re-ran the CloudQuery sync with just the one project. The _cq_sync_time in gcp_storage_buckets still isn’t updating, and there’s no new entry in BigQuery.

The only log produced is the ERR that the osconfig API isn’t enabled in the project.

That’s just weird. Is it possible to try with another destination, e.g. the file plugin maybe? Looking at the number of resources fetched might also help. Is it 0 or a positive count?
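
For a quick test, a file destination along these lines should do it (the version, path template, and format here are just example values):

kind: destination
spec:
  name: file
  path: cloudquery/file
  version: "v3.4.0"                        # placeholder version
  spec:
    path: "cq-output/{{TABLE}}/{{UUID}}.{{FORMAT}}"
    format: json

and then point destinations: ["file"] at it in the source spec.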

My query from the other day that pulled which tables synced isn’t working in BigQuery now, so let me see if I can figure out an easy way to see which tables actually synced on this last run. We do all our CloudQuery runs inside a CI job, so I don’t think writing to a file would be super easy without exporting a CI artifact that has sensitive values in it (see details here).

But let me see if there’s a way to quickly write to GCS or something without reverse engineering our CI, lol.
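
Assuming the GCS destination plugin takes roughly the same options as the file one, maybe something like this would do it (bucket name and version are placeholders):

kind: destination
spec:
  name: gcs
  path: cloudquery/gcs
  version: "v3.0.0"                        # placeholder version
  spec:
    bucket: "my-cq-debug-bucket"           # placeholder
    path: "{{TABLE}}/{{UUID}}.{{FORMAT}}"
    format: json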

Yeah, it’s still only syncing a subset of tables for this project. I’m going to delete this row from the DB and see if that forces a sync to occur.

Well… that did nothing :joy:. It’s literally just not syncing this bucket? This is so strange. The service account has permissions at the org level, and I can clearly see on the IAM tab of the bucket that it has view access to it. So something is happening that isn’t making it into the error logs?

Any applicable warning logs?

Updating the CI job now to log at the debug level. Normally we only log at the error level, but YOLO, I’ll turn ’em all on.

Another idea: Is the BigQuery table partitioned (and if so, how)? I think partitioning can delay when newly written rows show up in queries.
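
If I’m remembering the BigQuery destination options right, that’s controlled by time_partitioning in its plugin spec, something like:

  spec:
    time_partitioning: day    # roughly: none / hour / day

so it might be worth double-checking what that’s set to.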

Yeah, it’s literally just not even trying to sync the table at all.
Our BigQuery is partitioned by day, but that’s across the board, and it’s managing to sync in 90% of our other projects just fine.

2023-09-21T14:50:12Z WRN the top-level `scheduler` option is deprecated. Please use the plugin-level scheduler option instead field=scheduler module=cli source=gcp-test
2023-09-21T14:50:12Z WRN the top-level `concurrency` option is deprecated. Please use the plugin-level concurrency option instead field=concurrency module=cli source=gcp-test

are the only warning messages too :confused:
It’s like it’s just skipping the other resources in the project entirely.
I’m going to specify just gcp_storage_* and see what happens then.
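
i.e. swapping the tables list in the source spec to something like:

  tables: ["gcp_storage_*"]    # wildcard match on just the storage tables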

Are you still using enabled_services_only: true? I’m assuming the Cloud Storage service is enabled for the project…

Okay… I think we’re getting somewhere.
So I have enabled_services_only: true in the config… and when I look at the project, it doesn’t have the Cloud Storage API enabled… but the project 100% has a bucket in it.
But it looks like the Cloud Storage API doesn’t need to be enabled to create/delete buckets: in my Terraform project I have a GCS bucket created with Terraform, and the Cloud Storage API isn’t enabled there either?

enabled_services_only: true specifically checks whether a service is enabled before queueing that service’s resources to be synced.
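
So as a workaround while we dig into this, you could drop that check for this run and see if the bucket shows up, e.g.:

  spec:
    enabled_services_only: false    # sync even if the project doesn't report the API as enabled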

Right - but at least for GCS… it doesn’t look like that API needs to be enabled for buckets to exist or be created? I have multiple projects with buckets and that API disabled.

Looks like it… It seems possible to create buckets and even upload objects from the console with the service disabled.

Opened an issue for this to track and prioritize/discuss next steps.