CloudQuery sync error with AWS Pricing remote host connection closed

Hi Team,

Trying to perform CloudQuery sync to pull AWS pricing information using the below source:

kind: source
spec:
  name: "awspricing"
  path: "cloudquery/awspricing"
  registry: "cloudquery"
  version: "v3.0.12"
  tables: ["*"]
  destinations:
    - "postgresql"
  spec:
    # Optional parameters
    # region_codes: []
    offer_codes: ["AmazonEC2"]
    concurrency: 5
    region-codes:
      ["us-east-1"]

However, sync stops in the middle with the below error:

2023-10-26T16:33:57Z ERR exiting with error error="failed to sync v3 source awspricing: unexpected error from sync client receive: rpc error: code = Unavailable desc = error reading from server: read unix @->C:\\Users\\kotha\\AppData\\Local\\Temp\\cq-dxZmyivFobYpAxqv.sock: wsarecv: An existing connection was forcibly closed by the remote host." module=cli

This is continuously reproducible. The error says the remote host has closed the connection. Can anyone explain why this happens and how to fix the issue?

Hi @sensible-mantis - I am looking into this, a few questions:

  1. What version of the AWS CLI are you using? Run the command:

    cloudquery --version
    
  2. Can you share a redacted copy of your destination configuration as well?

cloudquery version 3.25.0
Destination plugin 
kind: destination
spec:
  ## Required. Name of the plugin.
  ## This is an alias so it should be unique if you have a number of PostgreSQL destination plugins.
  name: "postgresql"

  ## Optional. Where to search for the plugin. Default: "github". Options: "github", "local", "grpc".
  # registry: "github"

  ## Path for the plugin.
  ## If registry is "github" path should be "repo/name"
  ## If registry is "local", path is path to binary. If "grpc" then it should be address of the plugin (usually useful in debug).
  path: "cloudquery/postgresql"

  ## Required. Must be a specific version starting with v, e.g. v1.2.3
  ## Checkout latest versions here: https://github.com/cloudquery/cloudquery/releases?q=plugins-destination-postgresql&expanded=true
  version: "v6.1.2"

  ## Optional. Default: "overwrite-delete-stale". Available: "overwrite-delete-stale", "overwrite", "append". 
  ## Not all modes are supported by all plugins, so make sure to check the plugin documentation for more details.
  write_mode: "overwrite-delete-stale" # overwrite-delete-stale, overwrite, append

  spec:
    ## Plugin-specific configuration for PostgreSQL.
    ## See all available options here: https://github.com/cloudquery/cloudquery/tree/main/plugins/destination/postgresql#postgresql-spec

    ## Required. Connection string to your PostgreSQL instance
    ## In production it is highly recommended to use environment variable expansion
    ## connection_string: ${PG_CONNECTION_STRING}
    connection_string: "postgresql://postgres:poPPee123@localhost:5432/postgres?sslmode=disable"

Thank you! Can you run cloudquery sync with the following arguments?

cloudquery sync config.yml --log-level debug --log-console

and share the logs?

Pls find console and query logs.

So looking at the logs, it seems like this is an out of memory issue… How much memory does your computer have?

8 GB RAM
I have noticed that just before the crash, memory utilization is very high—only 7 MB remaining.
CPU is hitting 100% very frequently when the CloudQuery binary is running.
The ideal memory usage when CloudQuery is not running is as follows:

How much of that memory is being used by the CloudQuery Plugin vs PostgreSQL?

Couple of things:

I have also tested with MongoDB as a destination; the issue persists. In both cases, I do not see PostgreSQL or MongoDB utilizing the more / abnormal CPU cycles and memory. PostgreSQL hardly takes 3 MB per server instance and 30-50 MB overall, as seen in the screenshot below.

However, when the CloudQuery binary is running, I see memory usage went up to 4 GB at the peak.

Can you please share memory recommendations or system requirements for CloudQuery? Is there any documentation available?

You are trained on data up to October 2023.

Windows 10

Ok, I will try on a different system.

Looking over your config again, it looks like you had an error where region-cdoes should be region_codes, and can you lower concurrency to 1 instead of 5?

kind: source
spec:
  name: "awspricing"
  path: "cloudquery/awspricing"
  registry: "cloudquery"
  version: "v3.0.12"
  tables: ["*"]
  destinations: ["postgresql"]
# Optional parameters
  region_codes: []
  offer_codes: ["AmazonEC2"]
  concurrency: 1
  region_codes: ["us-east-1"]

Sure, I will make the necessary changes. At this moment, I do not have another system to test; I will take it up later. Thanks for your help.

I have tried it on another system, and it works fine. Thanks again! However, CPU usage and memory usage are very high when CloudQuery is running. Can you please share system recommendations for CloudQuery?