Hi everyone - new user to CloudQuery here! I was hoping someone may be able to help me with what’s probably a fairly simple question.
I have CloudQuery hooked up and running nicely against an Azure tenant syncing to PostgreSQL. While this works really nicely for a single source, I was wondering how this could work in a ‘multi-client’ scenario. I’d like to be able to run this sync against multiple client environments, storing the data back in to the same database for analytics. What is the recommended way to achieve this?
Is there some way to tag rows within a sync with a client name, for example? I notice there is a _cq_parent_id field in the database after a sync - is this perhaps something that could be used?
So I think the best thing to do here would be to set the name field in your source config to a different value for every environment you’re running the sync in (you can choose any value that makes sense for you).
In the Postgres database, the value of the source name will then get stored as _cq_source_name.
This is also important because when CloudQuery performs actions like delete-stale, it uses the _cq_source_name column to determine which rows should be deleted (or not).
Thanks @herman - that’s really interesting. I actually did try that initially, albeit with the same Azure tenant, but duplicated into two separate .yml configs, both with a different name.
What I observed was that the second run overwrote the first in the database. I assumed (probably incorrectly) that the name change would tell CloudQuery that it was a different source, but it seems that’s not the case.
Does CloudQuery recognize it was the same Azure source in some other way?
Oh right, hmm, I see. Probably you were using overwrite-delete-stale write mode?
With this write mode, each table gets a Primary Key (PK) and rows with matching PKs will be replaced. This is useful if you want to just keep the latest snapshot of your resources. Source name is not part of this PK, so if you were syncing the same Azure tenant and same tables, they would be replaced yes.
But a setup where you sync the same subscriptions and tables with different source names is not usually how it’s done and is probably not advised, since the delete-stale step will only delete old rows matching the current source name, if that makes sense.
Ah OK! Yes, I am using overwrite-delete-stale at the moment. So it sounds like I’m doing the right thing, but my test was probably not ideal as it was the same tenant - I’ll have a go with two separate ones and see what happens.
Thank you
Yep - you were absolutely right! Two separate tenants now record correctly into the database with the default overwrite-delete-stale.