Azure sync locks up client node with high memory usage in CloudQuery

Hi there :wave:

I’m running a PoC for CloudQuery and have successfully implemented queries for AWS and GCP, with full table scans across multiple accounts/projects. However, when I do the same for Azure, my client node (a c5.4xlarge) locks up completely. Before it does, memory usage spikes and kswapd0 eats a CPU core, which usually means something isn’t playing well with memory. If I manage to get CloudQuery to terminate, the VM may become responsive again, but I usually have to stop it from the AWS console (which takes some time but eventually succeeds).

I have tried both v9.3.7 and v9.3.8 with a straightforward configuration:

```yaml
kind: source
spec:
  name: azure
  path: cloudquery/azure
  version: "v9.3.8"
  tables: ["*"]
  destinations: ["postgresql"]
  backend_options:
    table_name: "cq_azure_state"
    connection: "@@plugins.postgresql.connection"
```

Is there a good way of debugging this further?

Hi @included-collie,

Are you syncing a lot of subscriptions? If so, I’d suggest trying to lower the discovery_concurrency value…
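If it helps, as far as I can tell from the docs, concurrency is a top-level source spec option while discovery_concurrency sits in the plugin-level spec, so placement would look roughly like this (the values below are illustrative, not recommendations):

```yaml
kind: source
spec:
  name: azure
  # ... rest of the source spec as before ...
  concurrency: 1000            # top-level source option: caps overall sync concurrency
  spec:
    discovery_concurrency: 50  # azure plugin option: caps concurrency during discovery
```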

Yep, I absolutely do. Do you have a hint on a sane value when there are many subscriptions? I might want to lower concurrency as well…

How are you sourcing your credentials for Azure?

They’re exported to env.
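Roughly like this, with placeholder values (these should be the standard Azure SDK environment variables for a service principal, if I’m not mistaken):

```sh
# Service principal credentials picked up by the Azure SDK from the environment
export AZURE_TENANT_ID="<tenant-id>"
export AZURE_CLIENT_ID="<app-registration-client-id>"
export AZURE_CLIENT_SECRET="<client-secret>"
```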

OK, just wanted to make sure you aren’t using az login (Azure CLI) authentication, as that spawns a process for each authentication token that’s needed.

I’ve now started another run with a conservative discovery_concurrency: 50 and concurrency: 1000, and it’s looking much better.

Great! Let us know how that goes.

Thanks for the pointer to the concurrency settings…!