Hello folks! We are using the S3 destination, writing in Parquet format. We have a number of JSON fields across our data sources. They seem to be landing in S3 as base64-encoded.
Can you please confirm this behavior? Is it configurable? Can you point me to the source code?
What warehouse do you use to query it? Is it Athena? If so, there is a flag called athena (see S3 Overview) that you need to set. It should make JSON columns compatible with Athena, since different destinations expect JSON columns in slightly different ways.
I’ll also let other folks weigh in here on Monday in case I’m not up to date with the S3 destination.
So the CloudQuery JSON type is an Arrow extension, which uses binary as its base type. I can imagine how a bug there, or maybe in the Arrow parquet writer when dealing with extension types, could lead to it being stored as base64, but it shouldn’t be the case as far as I know.
I also just tested it myself, loading a parquet file with JSON columns from S3 and it showed as varchar, not base64 encoded. I tested by downloading the file and then importing it with DuckDB.
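For anyone who wants to run the same kind of check on values pulled out of a file, here is a minimal sketch in Go (standard library only) that tells plain JSON apart from base64-wrapped JSON. The `classify` helper is a made-up name for illustration, not part of CloudQuery or Arrow, and it assumes the standard base64 alphabet with padding.

```go
package main

import (
	"encoding/base64"
	"encoding/json"
	"fmt"
)

// classify is a hypothetical helper: it reports whether s is plain JSON,
// base64-encoded JSON (standard alphabet, padded), or neither.
func classify(s string) string {
	// Check for plain JSON first, since some short strings (e.g. "1234")
	// are both valid JSON and valid base64.
	if json.Valid([]byte(s)) {
		return "json"
	}
	decoded, err := base64.StdEncoding.DecodeString(s)
	if err == nil && json.Valid(decoded) {
		return "base64-json"
	}
	return "unknown"
}

func main() {
	fmt.Println(classify(`{"a":1}`))      // json
	fmt.Println(classify("eyJhIjoxfQ==")) // base64-json
	fmt.Println(classify("not json"))     // unknown
}
```

If the column values come back as `base64-json`, that points at the writer side rather than at the query engine.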
Could you share a reproduction by any chance? Maybe the source plugin is using an older version of the SDK, or a different language?
I’ll take a look and post here once I have some ideas.
OK, I do see a way to improve things: we call Append on the raw bytes, which we shouldn’t, because the Append call for JSONBuilder marshals the data itself.
I’ll put together a quick fix, but I’d like to give it to you to test (along with the PR so you can see the changes, if you wish), so that we’re on the same page here.
We tried registering now on our end from different machines and it seems to work. Can you try from incognito? (Just to make sure there are no cache issues.) Also, which browser do you use? (We will try to replicate the issue.)
I tried from both Chrome and Brave. Even if the problem is on your end, the message should be more descriptive, though it’s a Firebase error, so we don’t control that part much. I’m still investigating. Do you have Firefox or anything else? (Just trying to narrow down the cause of this.)