How to prevent CloudQuery sync runs from temporarily duplicating data

Hey, every time a sync runs, it doubles the data temporarily. Is there a good way to prevent this from happening? Maybe sync Source → Postgres (Schema 1), then copy Postgres (Schema 1) → Postgres (Final Schema)?

Hi @comic-pup,

Yes, using different schemas is one good way of doing it. Some users also use views to select only data from a particular sync.
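
To make the view idea concrete, here is a minimal sketch. It uses Python's built-in sqlite3 as a stand-in for Postgres so it runs anywhere, and the table name `example_resources`, the view name, and the sample rows are all hypothetical. It assumes the table carries the `_cq_sync_time` and `_cq_source_name` columns and that every row written by one sync shares the same `_cq_sync_time`.

```python
import sqlite3

# In-memory SQLite stands in for the Postgres destination (assumption for the demo).
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE example_resources (
    _cq_id TEXT PRIMARY KEY,
    _cq_source_name TEXT,
    _cq_sync_time TEXT,
    name TEXT
);

-- Hypothetical view: keep only rows from the newest sync of each source.
CREATE VIEW example_resources_latest AS
SELECT r.*
FROM example_resources AS r
WHERE r._cq_sync_time = (
    SELECT MAX(s._cq_sync_time)
    FROM example_resources AS s
    WHERE s._cq_source_name = r._cq_source_name
);
""")

# Two syncs of the same two resources, before any stale rows are deleted.
rows = [
    ("a1", "aws", "2024-01-01T00:00:00Z", "bucket-1"),
    ("a2", "aws", "2024-01-01T00:00:00Z", "bucket-2"),
    ("b1", "aws", "2024-01-02T00:00:00Z", "bucket-1"),
    ("b2", "aws", "2024-01-02T00:00:00Z", "bucket-2"),
]
conn.executemany("INSERT INTO example_resources VALUES (?, ?, ?, ?)", rows)

total_rows = conn.execute("SELECT COUNT(*) FROM example_resources").fetchone()[0]
latest_ids = [
    r[0]
    for r in conn.execute(
        "SELECT _cq_id FROM example_resources_latest ORDER BY _cq_id"
    )
]
print(total_rows, latest_ids)
```

One caveat with this particular view: while a sync is still running, the newest sync time belongs to the in-progress run, so the view would show only the rows written so far. Pinning the view to a specific, completed sync time avoids that.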

That said, assuming you’re using the overwrite-delete-stale write mode, data isn’t really doubled as such: new resources appear before the stale ones are deleted, and resources with the same PK are replaced in place. If the table’s only primary key is _cq_id, though, you may see temporary doubling, because the new rows don’t collide with the old ones.
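
The difference between the two cases can be simulated in a few lines. This is only a rough model of the behaviour described above, not CloudQuery's actual implementation: it uses SQLite's `INSERT OR REPLACE` in place of a Postgres upsert, the table names and the `run_sync` helper are hypothetical, and it assumes `_cq_id` is a fresh random UUID on every sync.

```python
import sqlite3
import uuid


def count(conn, table):
    """Return the current number of rows in the table."""
    return conn.execute(f"SELECT COUNT(*) FROM {table}").fetchone()[0]


def run_sync(conn, table, sync_time, names):
    """Upsert one row per name, then delete rows left over from older syncs.

    Returns (row count before the stale delete, row count after it).
    """
    for name in names:
        conn.execute(
            f"INSERT OR REPLACE INTO {table} (_cq_id, name, _cq_sync_time) "
            "VALUES (?, ?, ?)",
            (str(uuid.uuid4()), name, sync_time),
        )
    peak = count(conn, table)
    conn.execute(f"DELETE FROM {table} WHERE _cq_sync_time < ?", (sync_time,))
    return peak, count(conn, table)


conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Natural primary key: a re-synced resource collides with its old row.
CREATE TABLE with_natural_pk (
    _cq_id TEXT, name TEXT PRIMARY KEY, _cq_sync_time INTEGER
);
-- Only _cq_id as primary key: a re-synced resource never collides.
CREATE TABLE cq_id_only (
    _cq_id TEXT PRIMARY KEY, name TEXT, _cq_sync_time INTEGER
);
""")

names = ["a", "b", "c"]
results = {}
for table in ("with_natural_pk", "cq_id_only"):
    run_sync(conn, table, 1, names)                    # first sync
    results[table] = run_sync(conn, table, 2, names)   # second sync

print(results)
```

In this model the second sync peaks at 3 rows for the table with a natural primary key and at 6 rows for the `_cq_id`-only table, and both settle back to 3 once the stale rows are deleted.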

I think this is a problem we can still improve on, so if you have any ideas, let us know. Maybe we can raise an issue to look into this more. :slightly_smiling_face:

I’ve raised an issue on GitHub here with a suggested feature that I think would solve this problem in a better way: https://github.com/cloudquery/cloudquery/issues/17291