Adding cluster name metadata to K8s source data for GCS/BigQuery queries

pretty-wolf · January 26, 2024, 1:01am

I’m testing out the k8s source. I’ve run it against about 4 different clusters (we have about 25 in total). Is there a way to add metadata such as the cluster name to the synced data? I send the data to GCS and then use transfer jobs to load it into BigQuery, but in my queries I can’t tell which cluster a row came from.

My config is:

---
kind: source
spec:
  name: k8s
  path: cloudquery/k8s
  registry: cloudquery
  version: "v5.2.6"
  tables: ["*"]
  destinations: ["gcs_k8s"]
---
kind: destination
spec:
  name: "gcs_k8s"
  path: "cloudquery/gcs"
  registry: "cloudquery"
  version: "v3.4.12"
  spec:
    bucket: "cloudquery"
    path: "k8s/${cluster_shortname}"
    format: "parquet"
    no_rotate: true

yevgenyp · January 26, 2024, 1:03am

Hey.
One thing you could do is set the source's `name` to the cluster name. That value will then be available in every table in a column called `_cq_source_name`.

pretty-wolf · January 26, 2024, 1:04am

Got a snippet I can steal?
I see what you mean, I’m just not sure how to change my config to match your suggestion.

yevgenyp · January 26, 2024, 1:04am

---
kind: source
spec:
  name: your_cluster_name
  path: cloudquery/k8s
  registry: cloudquery
  version: "v5.2.6"
  tables: ["*"]
  destinations: ["gcs_k8s"]
---
kind: destination
spec:
  name: "gcs_k8s"
  path: "cloudquery/gcs"
  registry: "cloudquery"
  version: "v3.4.12"
  spec:
    bucket: "cloudquery"
    path: "k8s/${cluster_shortname}"
    format: "parquet"
    no_rotate: true
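If you want a single config file that works for all 25 clusters, one option is to reuse the `${cluster_shortname}` variable that the destination path already relies on for the source name as well. This is a minimal sketch, assuming CloudQuery expands `${...}` environment variables anywhere in the config file (including the source `name`) and that your current kubeconfig context points at the cluster being synced; `cluster_shortname` is just the variable name from the original post.

```yaml
---
kind: source
spec:
  # Assumption: ${cluster_shortname} is expanded from the environment,
  # so each cluster's rows carry its short name in _cq_source_name.
  name: "${cluster_shortname}"
  path: cloudquery/k8s
  registry: cloudquery
  version: "v5.2.6"
  tables: ["*"]
  destinations: ["gcs_k8s"]
---
kind: destination
spec:
  name: "gcs_k8s"
  path: "cloudquery/gcs"
  registry: "cloudquery"
  version: "v3.4.12"
  spec:
    bucket: "cloudquery"
    # Same variable as the source name, so the GCS prefix and the
    # _cq_source_name column agree for each cluster.
    path: "k8s/${cluster_shortname}"
    format: "parquet"
    no_rotate: true
```

You would then run one sync per cluster, e.g. `cluster_shortname=prod-east cloudquery sync config.yml` (where `prod-east` is a hypothetical value), and filter or group on `_cq_source_name` in BigQuery.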

pretty-wolf · January 26, 2024, 1:04am

ah. that’s easy!
one sec
nvm lol

pretty-wolf · January 26, 2024, 1:12am

Thanks! That was super quick. Now that this is generally working, I’ll be able to write the design review next week and hopefully get it up and running for all 25 clusters.

yevgenyp · January 26, 2024, 1:17am

Nice! Keep us posted, curious to learn how it goes!
