Hey! I’m trying to load a large Postgres table to S3 as Parquet. It’s a batch job, so no CDC. I can see that the file/S3 destination supports some partitioning variables, such as {{TABLE}} and the current year, month, day, etc.
However, I’d like to partition the resulting Parquet files by a value from a Postgres column, for example, {{USERID}}. Is there any way I can achieve that?
I don’t think that’s possible right now. Feel free to raise an issue if you think this is something you’ll need again in the future; this sounds like something that would be useful to others as well.
If you need a solution right now, I think the only way would be a pre-transformation step. For example, first do a transformation in the database so every user ID gets its own table, then sync to S3, so that the existing {{TABLE}} partitioning splits the output per user. I don’t know how feasible this would be in your case, especially if you have a lot of users!
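To make the pre-transformation idea a bit more concrete, here is a minimal sketch that only generates the SQL for one table per user ID. The table name `events`, the column `user_id`, and the `<table>_user_<id>` naming scheme are all assumptions for illustration, and it assumes the user IDs are non-negative integers. It does not connect to a database; you would run the statements with whatever Postgres client you already use.

```python
import re

# Hypothetical helper: names and the "<table>_user_<id>" scheme are assumptions.
_IDENT = re.compile(r"^[a-z_][a-z0-9_]*$")


def per_user_table_sql(source_table, column, user_ids):
    """Build one CREATE TABLE ... AS SELECT statement per user ID."""
    # Only allow plain lowercase identifiers, since they are interpolated into SQL.
    for name in (source_table, column):
        if not _IDENT.match(name):
            raise ValueError(f"unsafe identifier: {name!r}")

    statements = []
    for uid in user_ids:
        uid = int(uid)  # coerce to int so no arbitrary text reaches the SQL
        if uid < 0:
            raise ValueError(f"negative user ID: {uid}")
        statements.append(
            f"CREATE TABLE {source_table}_user_{uid} AS "
            f"SELECT * FROM {source_table} WHERE {column} = {uid};"
        )
    return statements


for stmt in per_user_table_sql("events", "user_id", [7, 42]):
    print(stmt)
```

With many distinct users this creates a lot of tables, so it is probably only practical when the number of user IDs is small.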
Alternatively, you can do a post-transformation step with something like AWS Glue: sync the table to S3 as it is, then run a job that rewrites the Parquet files partitioned by user ID.
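For the post-transformation route, this is roughly the layout such a job would produce. The sketch below is plain Python that only groups rows into Hive-style `user_id=<value>/` paths; it does not write Parquet or touch S3. The bucket name, the `part-0.parquet` file name, and the sample rows are made up for illustration. In a Glue Spark job, I’d expect the same effect from something like `df.write.partitionBy("user_id").parquet(...)`, but I haven’t tested that against your data.

```python
from collections import defaultdict


def partition_rows(rows, column, prefix):
    """Group rows by the value of `column` under Hive-style partition paths."""
    groups = defaultdict(list)
    for row in rows:
        # One directory per distinct column value, e.g. ".../user_id=7/".
        path = f"{prefix}/{column}={row[column]}/part-0.parquet"
        groups[path].append(row)
    return dict(groups)


# Made-up sample data standing in for the synced table.
rows = [
    {"user_id": 7, "event": "login"},
    {"user_id": 42, "event": "purchase"},
    {"user_id": 7, "event": "logout"},
]

layout = partition_rows(rows, "user_id", "s3://example-bucket/events")
for path, group in sorted(layout.items()):
    print(path, len(group))
```

Query engines that understand Hive-style partitioning can then prune by `user_id` without reading every file.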