Need assistance with CloudQuery data training issue

@crucial-mallard @optimum-racer If you haven't already, could you please also report this to AWS? It might not be related to CloudQuery directly.

We’ll also report it to AWS and continue investigating from our side.

Sure. I’ll respond here once our support gets back to us.

@crucial-mallard @optimum-racer could you share the ARN of the GuardDuty finding with me via DM? I will relay this information to AWS support.
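If it helps with locating the finding ARN (and the account ID that AWS support asks for further down), here is a minimal sketch that lists them with boto3's GuardDuty client (`list_detectors`, `list_findings`, `get_findings`). It assumes credentials and a region for the affected account are already configured. The helper names `iter_findings` and `summarize_findings` are hypothetical, and `client` is taken as a parameter so the logic can be exercised without AWS access.

```python
from typing import Any, Iterator


def iter_findings(client: Any) -> Iterator[dict]:
    """Yield every GuardDuty finding visible to `client` (one region).

    `client` is assumed to behave like boto3.client("guardduty").
    """
    # Usually there is a single detector per region; list_detectors
    # pagination is ignored in this sketch.
    for detector_id in client.list_detectors()["DetectorIds"]:
        token = None
        while True:
            kwargs = {"DetectorId": detector_id}
            if token:
                kwargs["NextToken"] = token
            page = client.list_findings(**kwargs)
            ids = page.get("FindingIds", [])
            if ids:
                # A list_findings page holds at most 50 IDs, which matches
                # the 50-ID limit of get_findings.
                yield from client.get_findings(
                    DetectorId=detector_id, FindingIds=ids
                )["Findings"]
            token = page.get("NextToken")
            if not token:
                break


def summarize_findings(client: Any) -> list:
    """Return (Arn, AccountId, Type) for each finding, e.g. to share with support."""
    return [(f["Arn"], f["AccountId"], f["Type"]) for f in iter_findings(client)]
```

In practice this would be called as `summarize_findings(boto3.client("guardduty"))` from a session with access to the affected account.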

An update from our side:

  1. We contacted AWS support. They are asking for the account ID associated with the GuardDuty finding in order to assist further.
  2. We are trying to reproduce the issue. We created an EC2 instance running Amazon Linux, attached a role with a read-only policy, and ran CloudQuery CLI v4.4.0 with the following spec:
```yaml
kind: source
spec:
  name: aws
  path: cloudquery/aws
  registry: cloudquery
  version: "v23.4.0"
  tables: ["*"]
  destinations: ["sqlite"]
  spec:
    concurrency: 50
---
kind: destination
spec:
  name: sqlite
  path: cloudquery/sqlite
  registry: cloudquery
  version: "v2.4.20"
  spec:
    connection_string: ./db.sql
```

Is there anything else you can think of that might be relevant?

That looks pretty close to what we had configured.
I can provide you the account ID in a DM.

Great, thank you. We also have an idea for how to narrow this down. If you have access to the CloudTrail logs for the instance that was running CloudQuery, I can send you more detailed instructions shortly. Essentially, filter the events for that instance ID and check the Source IP address column for any IP that differs from the instance's own. Those events would tell us which API calls were made from elsewhere and help us narrow it down.
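A minimal sketch of that filter, assuming the events are already loaded as a list of CloudTrail record dicts (with `userIdentity`, `sourceIPAddress`, `eventName`, and so on). It also assumes the instance-profile role session is named after the instance ID, so the assumed-role ARN ends in `/<instance-id>`; `calls_from_other_ips` is a hypothetical helper. It takes a set of known IPs because the same instance may show up under its private IP (for example, through a VPC endpoint) or under its public IP.

```python
def calls_from_other_ips(records, instance_id, instance_ips):
    """Return (eventTime, eventSource, eventName, sourceIPAddress) tuples for
    events made with the instance's role session from an unexpected IP."""
    hits = []
    for r in records:
        arn = r.get("userIdentity", {}).get("arn", "")
        # Assumption: instance-profile sessions are named after the instance
        # ID, e.g. arn:aws:sts::<account>:assumed-role/<role>/i-0123...
        if not arn.endswith("/" + instance_id):
            continue
        ip = r.get("sourceIPAddress", "")
        if ip not in instance_ips:
            hits.append((r.get("eventTime", ""), r.get("eventSource", ""),
                         r.get("eventName", ""), ip))
    # Sorting by eventTime (ISO 8601) gives a chronological list.
    return sorted(hits)
```

An empty result would suggest that the instance's credentials were only used from the instance itself.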

I dumped all CloudTrail activity associated with our CloudQuery IAM role but haven't had time to comb through it yet. If there's something specific I should look for in there, let me know.

Could you look for distinct values of the source IP address? Ideally there should be only one, but my guess is that some events will come from a different IP, and that is likely to be the other account. If we can get some details about those calls, that would really help.
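To pull the distinct source IPs out of a dump like the one mentioned above, one option is to tally events per `sourceIPAddress`. This sketch assumes the dump is either a CloudTrail log file from S3 (`{"Records": [...]}`) or the JSON output of `aws cloudtrail lookup-events`, where each item carries the full event as a JSON string in `CloudTrailEvent`. The helpers `parse_dump`, `load_records`, `source_ip_summary`, and `print_summary` are hypothetical.

```python
import json
from collections import Counter, defaultdict


def parse_dump(data):
    """Return CloudTrail records from either supported dump shape."""
    if "Records" in data:  # log file delivered to S3
        return data["Records"]
    # lookup-events output nests each full event as a JSON string.
    return [json.loads(e["CloudTrailEvent"]) for e in data.get("Events", [])]


def load_records(path):
    with open(path) as fh:
        return parse_dump(json.load(fh))


def source_ip_summary(records):
    """Map each distinct sourceIPAddress to a Counter of 'eventSource:eventName'."""
    summary = defaultdict(Counter)
    for r in records:
        ip = r.get("sourceIPAddress", "<missing>")
        summary[ip][f'{r.get("eventSource", "")}:{r.get("eventName", "")}'] += 1
    return summary


def print_summary(records):
    summary = source_ip_summary(records)
    # Busiest IPs first, each with its five most frequent API calls.
    for ip, calls in sorted(summary.items(), key=lambda kv: -sum(kv[1].values())):
        print(f"{ip}\t{sum(calls.values())} calls\t{calls.most_common(5)}")
```

Running `print_summary(load_records("cloudtrail_dump.json"))` (the file name is a placeholder) prints one line per IP. If the guess above is right, an IP other than the instance's own should stand out, along with the calls made from it.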