CloudQuery fails to sync with Azure in Kubernetes environment

Hello. I’m attempting to run CloudQuery in Kubernetes with CloudQuery CLI v3.18.0, the Azure source plugin (v9.3.5), and the BigQuery (v3.3.3) and file (v3.4.6) destination plugins. Authentication is done via environment variables.

When I run the container via Docker (dc up -d) on my local workstation, everything works as expected. However, when I run the same container in Kubernetes on GCP, CloudQuery fails with the following error:

ERR exiting with error error="failed to sync v3 source azure: rpc error: code = Unavailable desc = error reading from server: EOF" module=cli

I enabled --log-level debug, but that added only two lines to the log file. Does anyone here have tips on how I should proceed?

Hi,

I can’t say I’m well versed in k8s, but as a quick check, we have a blog post (and video!) about running CloudQuery on k8s here. Maybe you’re missing a step? If everything there checks out, we’ll investigate further or reach out to another team member.

Thanks, @kemal. I’ll take another look at that documentation.

Is there a way to enable debug logging for the plugins themselves, or should the CLI pass the log level along to the plugins? I ask because I was surprised by how little extra logging the flag produced.

I don’t have solid evidence yet, but I suspect the pod is having trouble reaching Azure, so I was hoping to enable more logging at the plugin level to confirm or rule out that hunch.

Some plugins (like AWS) allow debug logging at the plugin level through a config setting (aws_debug), but the Azure plugin currently does not. I think this is a good idea, though, assuming it’s possible; I’ll open a feature request for it.
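For comparison, enabling the AWS plugin's debug option might look roughly like the sketch below. It assumes the standard CloudQuery v3 source spec layout, where plugin-specific options live under the nested `spec` key; the version, table, and destination values are placeholders, not recommendations.

```yaml
kind: source
spec:
  name: aws
  path: cloudquery/aws
  version: "vX.Y.Z"              # placeholder: use the AWS plugin version you actually run
  tables: ["aws_s3_buckets"]     # placeholder table selection
  destinations: ["file"]         # placeholder destination name
  spec:
    # Plugin-specific option: log AWS SDK request/response details (including retries).
    aws_debug: true
```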

For now, the best you can do is set --log-level debug, which should still give you more information, just not at the level of individual Azure API calls.
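In Kubernetes, the flag has to be passed in the container's arguments. A minimal sketch of the relevant part of a Pod or Job container spec is shown below. It assumes the image's entrypoint is the `cloudquery` binary and that the spec file is mounted at a hypothetical path `/config/config.yml`; the image reference is also an assumption, and `--log-console` (sending logs to stdout so `kubectl logs` picks them up) should be checked against `cloudquery --help` for your CLI version.

```yaml
# Hypothetical excerpt of a Pod/Job spec; names, paths, and image tag are illustrative.
containers:
  - name: cloudquery
    image: ghcr.io/cloudquery/cloudquery:3.18.0   # assumption: adjust to the image/tag you run
    args:
      - sync
      - /config/config.yml    # hypothetical mount path for the CloudQuery spec file
      - --log-level
      - debug
      - --log-console         # assumption: log to stdout instead of the cloudquery.log file
```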

GitHub Issue #13994

To close the loop on this: the problem turned out to be that discovery_concurrency was set far too high. Once I lowered it from the default of 400 to 1, the error went away. 29 was the highest value I could use without getting the error, so I settled on 25.
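For anyone hitting the same EOF error, an Azure source spec with the lowered setting might look roughly like this sketch. It assumes the standard CloudQuery v3 source spec layout; the plugin version comes from the original post, while the table selection and destination names are placeholders. The safe value is cluster-specific, so 25 is simply what worked here.

```yaml
kind: source
spec:
  name: azure
  path: cloudquery/azure
  version: "v9.3.5"
  tables: ["*"]                          # placeholder: narrow to the tables you need
  destinations: ["bigquery", "file"]     # placeholder destination names
  spec:
    # Plugin default is 400; in this cluster anything above 29 triggered the EOF error.
    discovery_concurrency: 25
```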

The error CloudQuery reported in the debug logs was no help in troubleshooting this. Hopefully, the plugin-level debug logging requested in issue #13994 will provide more useful information if and when it is implemented.