Hey all, I’m trying to set up the extraction of data from a Stripe account into an S3 bucket, and so far I’ve managed to have it working for full extractions (without the incremental part) using the following configuration:
kind: source
spec:
name: "stripe"
path: "cloudquery/stripe"
version: "v2.1.11"
registry: github
tables: ["stripe_charges"]
destinations: ["s3"]
spec:
api_key: ${STRIPE_TOKEN}
---
kind: destination
spec:
name: "s3"
path: "cloudquery/s3"
version: "v4.8.3"
write_mode: "append"
spec:
bucket: "datalake-bucket"
region: "us-east-1"
path: "raw_data/stripe_raw_data/{{TABLE}}/dt={{YEAR}}-{{MONTH}}-{{DAY}}/{{UUID}}.parquet"
format: "parquet"
athena: true
Now I’m trying to change it to have incremental extractions working, but I’m always getting the error:
Error: failed to sync v3 source stripe: unexpected error from sync client receive: rpc error: code = Unknown desc = failed to sync records: failed to sync unmanaged client: failed to create state client: rpc error: code = Unknown desc = failed to sync records: failed to read: operation error S3: GetObject, https response error StatusCode: 404, RequestID: XXXXXXXXXXXXXXX, HostID: (...), NoSuchKey:
My suspicion is that an S3 destination cannot be used to store and manage state tables, but given that I can’t find documentation about this detail, I’m not sure how to proceed to have the incremental extractions working. Can anyone help?
My current configuration is as follows:
kind: source
spec:
name: "stripe"
(...)
backend_options:
table_name: "cq_state_stripe"
connection: "@@plugins.s3.connection"
---
kind: destination
spec:
(...)
spec:
bucket: "datalake-bucket"
region: "us-east-1"
path: "raw_data/stripe_raw_data/{{TABLE}}/dt={{YEAR}}-{{MONTH}}-{{DAY}}/{{YEAR}}-{{MONTH}}-{{DAY}}.parquet"
format: "parquet"
athena: true
no_rotate: true