Struggling with incremental extractions from Stripe to S3 using CloudQuery

Hey all, I’m trying to set up extraction of data from a Stripe account into an S3 bucket, and so far I’ve managed to get it working for full extractions (without the incremental part) using the following configuration:

kind: source
spec:
  name: "stripe"
  path: "cloudquery/stripe"
  version: "v2.1.11"
  registry: github
  tables: ["stripe_charges"]
  destinations: ["s3"]
  spec:
    api_key: ${STRIPE_TOKEN}
---
kind: destination
spec:
  name: "s3"
  path: "cloudquery/s3"
  version: "v4.8.3"
  write_mode: "append" 
  spec:
    bucket: "datalake-bucket"
    region: "us-east-1"
    path: "raw_data/stripe_raw_data/{{TABLE}}/dt={{YEAR}}-{{MONTH}}-{{DAY}}/{{UUID}}.parquet"
    format: "parquet"
    athena: true

Now I’m trying to change it to get incremental extractions working, but I keep getting this error:

Error: failed to sync v3 source stripe: unexpected error from sync client receive: rpc error: code = Unknown desc = failed to sync records: failed to sync unmanaged client: failed to create state client: rpc error: code = Unknown desc = failed to sync records: failed to read: operation error S3: GetObject, https response error StatusCode: 404, RequestID: XXXXXXXXXXXXXXX, HostID: (...), NoSuchKey: 

My suspicion is that an S3 destination can’t be used to store and manage state tables, but since I can’t find any documentation about this detail, I’m not sure how to get incremental extractions working. Can anyone help?

My current configuration is as follows:

kind: source
spec:
  name: "stripe"
  (...)
  backend_options:
    table_name: "cq_state_stripe"
    connection: "@@plugins.s3.connection"
---
kind: destination
spec:
  (...)
  spec:
    bucket: "datalake-bucket"
    region: "us-east-1"
    path: "raw_data/stripe_raw_data/{{TABLE}}/dt={{YEAR}}-{{MONTH}}-{{DAY}}/{{YEAR}}-{{MONTH}}-{{DAY}}.parquet"
    format: "parquet"
    athena: true
    no_rotate: true

Hi @immune-sculpin,

For incremental syncs to work, you’ll need a destination that supports overwrite mode. You can find more information on this in the CloudQuery documentation.

Keep in mind that S3 only supports append mode, as noted in the CloudQuery S3 plugin documentation.

You might want to consider using Postgres as your destination.
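
If you do go the Postgres route, the wiring would look roughly like the sketch below: the data still lands in S3, and the state table lives in a separate Postgres destination that the source points at via backend_options. The destination name postgres-state, the POSTGRES_CONNECTION_STRING variable, and the version placeholder are just illustrative (not from your setup), and I haven’t checked whether the state destination also needs to be listed under destinations, so please verify against the docs:

kind: destination
spec:
  name: "postgres-state"
  path: "cloudquery/postgresql"
  version: "vX.Y.Z"   # placeholder, pin to an actual release
  write_mode: "overwrite"
  spec:
    connection_string: ${POSTGRES_CONNECTION_STRING}
---
kind: source
spec:
  name: "stripe"
  path: "cloudquery/stripe"
  version: "v2.1.11"
  registry: github
  tables: ["stripe_charges"]
  destinations: ["s3"]
  backend_options:
    table_name: "cq_state_stripe"
    # point the state connection at the Postgres destination instead of S3
    connection: "@@plugins.postgres-state.connection"
  spec:
    api_key: ${STRIPE_TOKEN}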

:thinking: That makes sense! The problem with Postgres is that I’d need to spin up an RDS instance just to store that state, which feels like overkill.

Currently, in our stack, we are only using S3 with Glue and Athena to store and query data. Is there some other destination that I could use that would fit into this stack?

Something like sqlite, with the actual sqlite database persisted in an S3 bucket?

sqlite supports overwrite, so it should work. Please let me know if it doesn’t.
You might need to download the DB file from S3 before the sync, though.
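
Off the top of my head (untested), the wiring would look roughly like this; the destination name sqlite-state, the ./cq_state.db path, and the version placeholder are just illustrative:

kind: destination
spec:
  name: "sqlite-state"
  path: "cloudquery/sqlite"
  version: "vX.Y.Z"   # placeholder, pin to an actual release
  write_mode: "overwrite"
  spec:
    connection_string: "./cq_state.db"   # local file used as the state store
---
kind: source
spec:
  name: "stripe"
  (...)
  destinations: ["s3"]
  backend_options:
    table_name: "cq_state_stripe"
    connection: "@@plugins.sqlite-state.connection"

You’d then wrap the sync with something like aws s3 cp to pull cq_state.db down before the sync and push it back afterwards, so the state survives between runs.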