Hey all, I’m looking at setting up a PostgreSQL → GCS extract. I have both pieces working, but I was wondering if there’s a way to configure the GCS destination based on the table/schema of the PostgreSQL?
I would like the path to look something like:
path/to/file/<postgres_schema>/<postgres_table>/*.csv
for that table. It looks like I could do this if I made a source/destination for each table, but that seems rather cumbersome.
Hi! I believe that right now the pg
source only works by syncing one schema only, so you would need to set up a source and destination per schema, and then you can have the above structure.
I’m currently only syncing the public schema. What’s the configuration to change the path for each table?
Check out this. The path
can be postgre_schema/
, and then it will create each table in its own file if this is what you are looking for.
I don’t think so, I’m looking to have each table be in its own directory in GCS. As in: if I have two tables: ‘actions’ and ‘orders’. The path in GCP for actions would be:
gs://bucket-name/path/to/file/actions/actions.csv
and for orders:
gs://bucket-name/path/to/file/orders/orders.csv
Is that possible?
I see. I believe this is not possible right now. Can you open an issue please? What is the reason that gs://bucket-name/path/to/file/actions.csv
wouldn’t work?
Sure! I use the directories as a basis for organization currently within our data lake. I suppose it’s mostly convenience and organization. But in general, some mild templating for the destination could be helpful, especially if it includes a timestamp of when the sync runs.
And thanks for your help!
GitHub Issue #15097