Hello. I’m attempting to run CloudQuery in Kubernetes. CloudQuery CLI v3.18.0. Source: Azure (v9.3.5). Destinations: BigQuery (v3.3.3), file (v3.4.6). Auth via environment variables.
When I run the container via Docker (dc up -d) on my local workstation, everything works as expected. However, when I run the container in Kubernetes (GCP), CloudQuery fails with the following error message:
ERR exiting with error error="failed to sync v3 source azure: rpc error: code = Unavailable desc = error reading from server: EOF" module=cli
I enabled --log-level debug, but that only added two lines to the log file. Does anyone here happen to have any tips for how I should proceed?
I can’t say I’m well versed in k8s, but as a quick check, we have a blog post (and video!) about running on k8s here. Maybe you’re missing a step? If it all checks out, we’ll investigate further or reach out to another team member.
Thanks, @kemal. I’ll take another look at that documentation.
Is there a way to enable debug log level for the plugins, or should the log level be passed along to the plugins via the CLI? I ask because I was surprised by the lack of extra logging with the flag enabled.
I don’t have solid evidence yet, but I’m wondering whether the pod is having trouble reaching Azure… so I was hoping to enable more logging at the plugin level to see whether my hunch is correct.
Some plugins (like AWS) allow debug logging at the plugin level through a config setting (aws_debug), but Azure right now does not. I think this is a good idea though, assuming it’s possible; I’ll open a feature request for it.
The best you can do for now is to set --log-level debug, which should still give you more information, just not at the Azure API call level.
To close the loop on this, the problem ended up being that discovery_concurrency was set far too high for this environment. Once I lowered it from the default of 400 down to 1, the error went away. 29 was the highest I could go without getting the error, so I settled on 25.
The error in the debug logs that CloudQuery provided was zero help in troubleshooting this. Hopefully, debug logging via issue #13994 will provide more useful information if/when that is implemented.
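For anyone who lands here later, this is a minimal sketch of where the setting might go in the source config. It assumes the usual CloudQuery v3 source spec layout, with plugin-specific options in the nested spec block. The tables value and the destination names are placeholders, not taken from my actual config, so adjust them to match yours.

```yaml
kind: source
spec:
  name: azure
  path: cloudquery/azure
  version: "v9.3.5"
  # Placeholder values: replace with the tables and destinations you actually use.
  tables: ["*"]
  destinations: ["bigquery", "file"]
  spec:
    # Default is 400, which triggered the EOF error in my cluster.
    # 29 was the highest value that worked for me; 25 leaves some headroom.
    discovery_concurrency: 25
```

The value that works will likely differ per cluster, so it may be worth starting at 1 to confirm the error goes away and then raising it step by step.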