Issues with awspricing plugin causing out of memory error

Hi, I’m trying to use the awspricing plugin and I’m getting the below issues:

Error:

fatal error: runtime: out of memory
Error: failed to sync v3 source awspricing: unexpected error from sync client receive: rpc error: code = Unavailable desc = error reading from server: EOF
2024-07-29T15:18:46Z ERR exiting with error error="failed to sync v3 source awspricing: unexpected error from sync client receive: rpc error: code = Unavailable desc = error reading from server: EOF" invocation-id=xxxxxxxxxxxxxxxxxxxxxxx module=cli

Can anyone let me know how to resolve this issue?

Hey @funny-whale,

The issue is that the price files are quite large, so the sync runs out of memory.

Could you let us know how much memory the sync process has at its disposal?

Hey @stefan, can you please let me know how to check how much memory the sync process has at its disposal?

We don't have a built-in way to check that; it mostly depends on how you are running the sync. It usually tends to fill all the memory available on the system.
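On a Linux host, one quick way to see what the sync process would have available is to read the kernel's own counters (this is a generic Linux check, not anything CloudQuery-specific):

```shell
# Total/available memory and swap, in kB, straight from the kernel:
grep -E 'MemTotal|MemAvailable|SwapTotal' /proc/meminfo
```

If they're installed, `free -h` and `swapon --show` give the same information in a friendlier format.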

Are you running it locally using the CLI, or are you using Kubernetes/Docker containers, a VM, etc?

I’m using the Cloud9 instance.

I think those are based on t2.micro instances with a 1 GB memory limit and no swap, which is why awspricing is crashing: it's hitting a hard memory limit.

We are already tracking an issue for this and have recently started looking into possible optimizations, but as it is right now, the plugin won’t work on that setup as it requires additional memory. #15017

What you can do is try running on an instance with more memory, ideally with swap enabled (so if the sync still bursts past the limit, swap can absorb the overflow). Let us know if you're able to; we'd love to get some feedback from that. On our side, we'll prioritize investigating these improvements for awspricing. You can also subscribe to notifications on the issue above.

@stefan, is this an issue in every version of the awspricing plugin?
If not, can you let me know which version is better so that we won’t run into this issue?

It’s most probably happening to all versions because of the size of the AWS Pricing files.

What you can also try is reducing concurrency to a very low number, like 1-5, but on a small instance like that one I can't guarantee it will work.

Here’s the spec for that.
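As a sketch, the setting would look something like this (assuming the standard top-level `concurrency` field of the CloudQuery source spec; please verify the exact option name against the spec linked above):

```yaml
kind: source
spec:
  name: "awspricing"
  path: "cloudquery/awspricing"
  registry: "cloudquery"
  version: "v4.3.1"
  tables: ["*"]
  destinations: ["postgresql"]
  # Top-level source option: lower values trade sync speed for a smaller memory footprint.
  concurrency: 5
  spec:
    offer_codes:
      - AmazonEC2
```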

@stefan But my Cloud9 instance type is
m5.large (8 GiB RAM + 2 vCPU)

Ah, gotcha, I did not know that and assumed it was the free one.
Would you mind still reducing the concurrency as I mentioned above?

Yeah, I've reduced the concurrency. But I have some questions about the data we got.

@stefan, I am running this awspricing for one account for the RDS, ElastiCache, and EC2 services in the us-east-1 region. How can I check the costs incurred for RDS, as well as attributes like network performance?

Can you help me understand how to check this data and the total pricing for RDS or individual pricing for each RDS instance?

So you mean the sync worked, you got the data and now need help checking it?

Yeah
source.yml

kind: source
spec:
  name: "awspricing_xxxxxxxxxx"
  path: "cloudquery/awspricing"
  registry: "cloudquery"
  version: "v4.3.1"
  tables: ["*"]
  destinations:
    - "postgresql"
  spec:
    region_codes: 
      - us-east-1
    offer_codes:
      - AmazonEC2
      - AmazonRDS
      - AmazonElastiCache

destination.yml

kind: destination
spec:
  name: postgresql
  path: "cloudquery/postgresql"
  version: "v8.2.5"
  write_mode: "overwrite-delete-stale"
  migrate_mode: "forced"
  spec:
    connection_string: "xxxxxxxx"

@stefan can you help me with this?

Hi @funny-whale :wave:

We have this blog about exploring the AWS Pricing API: Exploring AWS Pricing API.

It should give you a place to start exploring the data. If you have any questions after that, let me know. I can help with queries to fetch specific data that interests you!
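For a first look at what landed in the database, here's a hedged starter query (it assumes the `awspricing_service_products` table and its `attributes` JSONB column as described in the plugin schema; adjust to your actual table names):

```sql
-- Count products per family for EC2 in us-east-1, to see what was synced.
SELECT
  product_family,
  count(*) AS products
FROM awspricing_service_products
WHERE attributes->>'servicecode' = 'AmazonEC2'
  AND attributes->>'regionCode' = 'us-east-1'
GROUP BY product_family
ORDER BY products DESC;
```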

Hi @jonathan,

I've read these docs and now I understand the relationship between these two tables. As suggested in the docs above, can we use the query below to create a new table in CloudQuery directly?

WITH expanded_price_dimensions AS (
  SELECT
    st.sku,
    st.type,
    st.effective_date,
    jsonb_array_elements(st.price_dimensions) AS price_dimension
  FROM
    awspricing_service_terms AS st
)
SELECT
    sp.sku,
    sp.product_family,
    sp.attributes->>'regionCode' AS region,
    sp.attributes->>'instanceType' AS instance_type,
    epd.effective_date,
    epd.price_dimension->'pricePerUnit'->>'USD' AS price_per_unit_usd,
    epd.price_dimension->>'description' AS price_description,
    epd.price_dimension->>'unit' AS unit
FROM
    awspricing_service_products AS sp
JOIN
    expanded_price_dimensions AS epd
ON
    sp.sku = epd.sku
WHERE
    sp.attributes->>'servicecode' = 'AmazonEC2'
    AND epd.type = 'OnDemand';

Like, can the output of this query be stored in another table named pricing_options?

That’s what transformations and views are for.

In this specific example, you’re searching for EC2 instance pricing, which makes sense for this service type, but would not make sense for the other ones.

We currently do not have a transformation profile for the AWS Pricing plugin, but feel free to create a feature request for that on our CloudQuery repository and we could consider implementing it. CloudQuery Repository

Alternatively, you could:

  • Create your own transformation from CloudQuery Cloud, in the Developers > Addons section, or even
  • Manually create a SQL view for that specific use case.
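For the manual-view route, a minimal sketch: it wraps your exploration query from above as-is, and the view name `pricing_options` is just the one you suggested.

```sql
-- A plain view re-runs the query on every read. If the query is slow, a
-- MATERIALIZED VIEW persists the results instead and can be rebuilt after
-- each sync with: REFRESH MATERIALIZED VIEW pricing_options;
CREATE VIEW pricing_options AS
WITH expanded_price_dimensions AS (
  SELECT
    st.sku,
    st.type,
    st.effective_date,
    jsonb_array_elements(st.price_dimensions) AS price_dimension
  FROM awspricing_service_terms AS st
)
SELECT
  sp.sku,
  sp.product_family,
  sp.attributes->>'regionCode' AS region,
  sp.attributes->>'instanceType' AS instance_type,
  epd.effective_date,
  epd.price_dimension->'pricePerUnit'->>'USD' AS price_per_unit_usd,
  epd.price_dimension->>'description' AS price_description,
  epd.price_dimension->>'unit' AS unit
FROM awspricing_service_products AS sp
JOIN expanded_price_dimensions AS epd ON sp.sku = epd.sku
WHERE sp.attributes->>'servicecode' = 'AmazonEC2'
  AND epd.type = 'OnDemand';
```

Note that with `write_mode: "overwrite-delete-stale"` a plain view always reflects the latest sync, which is usually what you want here.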

Hi @stefan @ben @erez @mariano,

I’m getting the below error while using the awspricing plugin.

Error:

failed to sync v3 source awspricing: unexpected error from sync client receive: rpc error: code = Unavailable desc = error reading from server: EOF

source.yml

kind: source
spec:
  name: "awspricing"
  path: "cloudquery/awspricing"
  version: "v4.3.1"
  tables: ["*"]
  destinations:
    - "postgresql"
  spec:
    offer_codes:
      - AmazonEC2
      - AmazonRDS
      - AmazonElastiCache

postgresql.yml

kind: destination
spec:
  name: postgresql
  path: "cloudquery/postgresql"
  version: "v8.3.1"
  write_mode: "overwrite-delete-stale"
  migrate_mode: "forced"
  spec:
    connection_string: "xxxxxxxxxxxx"

I’ve made sure that I’ve logged in to CloudQuery using cloudquery_api_key.

Can you please help me with this issue?

Hey :wave: Could I confirm which version of the CLI you are using? Does it happen with the newest version?

I’m using version 6.3.0 of the CloudQuery CLI.

We've recently released 6.4.1, which includes a few bug fixes since the version you're using. Could you try with the latest version?

See Release Notes