Duplicate records appearing in daily CloudQuery syncs with no log visibility

selected-bonefish · June 7, 2024, 7:13pm

Hi CQ team,

I run syncs once per day, but I see duplicate records for one particular day. The sync starts every day at 12 AM and runs quickly, completing in about 20 minutes. I currently have no visibility into the logs because the pods that ran the sync have already terminated. Could this be caused by built-in retries, or am I missing something?

ben · June 7, 2024, 7:53pm

Interesting. Can you share more details about your setup, for example whether this is running on a VM or in a container? What is handling the scheduling? Can you also share a redacted version of the config?

Also, just so you know, you can use the --log-console flag together with --log-format json so that all logs are written to the console. If you are running on something like ECS, the logs will then be available in CloudWatch Logs.

selected-bonefish · June 7, 2024, 8:57pm

Here is the config:

kind: source
spec:
  name: "aws-${REGION}"
  registry: local
  path: /app/plugins/aws
  tables:
    - aws_dynamodb_tables
    - aws_rds_instances
    - aws_rds_clusters
    - aws_rds_reserved_instances
    - aws_secretsmanager*
  destinations: ["postgresql"]
  spec:
    concurrency: 100
    initialization_concurrency: 4
    aws_debug: false
    regions:
      - ${REGION}
---
kind: destination
spec:
  name: postgresql
  registry: local
  path: /app/plugins/postgresql
  write_mode: append
  spec:
    connection_string: ${PG_CONNECTION_STR}

I run the sync using:

/app/cloudquery sync <config_file> --log-console --no-log-file --log-format json --log-level debug

The deployment is done using a Helm chart with schedule: 0 0 * * * (daily at midnight).
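One detail in the config above is worth checking: the destination uses write_mode: append. In append mode every run inserts a fresh copy of each row, so if the job ever runs twice on the same day (a retry, or a second run started by the scheduler), those rows show up as duplicates for that day. If a single current copy of each resource is all that is needed, a minimal sketch of the destination block using overwrite-delete-stale, which as far as I know is the PostgreSQL destination's default write mode, could look like this (everything else is unchanged from the config above; please verify the option against the plugin version you run):

```yaml
kind: destination
spec:
  name: postgresql
  registry: local
  path: /app/plugins/postgresql
  # overwrite-delete-stale upserts rows on each table's primary key and then
  # deletes rows from this source that were not seen in the latest sync, so
  # running the sync a second time on the same day does not add a second copy.
  write_mode: overwrite-delete-stale
  spec:
    connection_string: ${PG_CONNECTION_STR}
```

The trade-off is history: append keeps one snapshot per run, while overwrite-delete-stale keeps only the latest state. If append is intentional (for example, to keep daily snapshots), rows from separate runs can usually be told apart by the _cq_sync_time column CloudQuery adds, which would also show whether more than one sync ran on the affected day.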

ben · June 7, 2024, 10:13pm

CloudQuery itself does not retry a sync when the entire sync fails, but the Kubernetes scheduler may have its own retry mechanism, though I am not sure.
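If the Helm chart deploys the sync as a Kubernetes CronJob (the cron-style schedule suggests it, but I have not confirmed what the chart actually generates), a few standard batch/v1 fields control both retries and how long finished pods, and therefore their logs, are kept. Below is a minimal sketch under that assumption; the resource name, image, and config path are hypothetical placeholders, and the args mirror the sync command from the earlier post:

```yaml
apiVersion: batch/v1
kind: CronJob
metadata:
  name: cloudquery-sync                # hypothetical name
spec:
  schedule: "0 0 * * *"                # daily at midnight, as in the Helm values
  concurrencyPolicy: Forbid            # skip a new run while a previous one is still active
  successfulJobsHistoryLimit: 3        # keep the last 3 finished Jobs (and their pods) for log inspection
  failedJobsHistoryLimit: 3            # hypothetical value; the Kubernetes default is 1
  jobTemplate:
    spec:
      backoffLimit: 0                  # do not re-run the sync after a failed pod (the default is 6)
      template:
        spec:
          restartPolicy: Never         # a failed container is not restarted in place
          containers:
            - name: cloudquery
              image: my-registry/cloudquery:latest   # hypothetical image
              command: ["/app/cloudquery"]
              args:
                - sync
                - /app/config/config.yml             # hypothetical config path
                - --log-console
                - --no-log-file
                - --log-format
                - json
                - --log-level
                - debug
```

Because backoffLimit defaults to 6, a failed pod can otherwise be retried several times, and with write_mode: append each attempt that got partway through may leave its own rows behind. Keeping Job history means the finished pods stick around, so kubectl get jobs followed by kubectl logs job/<job-name> can show what happened on the day with duplicates. The Kubernetes docs also note that a CronJob can occasionally create two Jobs for a single schedule time, so the sync should be safe to run twice either way.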

Related topics

Need help with cloudquery log format for clear sync process indicators · CloudQuery Plugins · Replies: 2 · Views: 5 · November 7, 2023
Handling duplications in CloudQuery with multiple containers · CloudQuery Plugins · Replies: 5 · Views: 7 · February 26, 2024
Unable to run cloudquery syncs in parallel seeking configuration guidance · CloudQuery Plugins · Replies: 4 · Views: 20 · November 19, 2024
CloudQuery sync time for AWS data in PG database on EKS · CloudQuery Plugins · Replies: 24 · Views: 14 · October 23, 2023
Questions about AWS event-based sync in CloudQuery · CloudQuery Plugins · Replies: 6 · Views: 18 · December 13, 2023