Broken AWS link in CloudQuery guide affecting access to resources

The AWS link in this guide seems to be broken: CloudQuery Blog on Open Source CSPM.

Hi :wave:, it seems to work for me. What do you see broken on your end?

We are fixing the broken link to the AWS Compliance Transformation. If you notice any others, please let us know so we can fix them as well!

Yeah, sorry, it was late, so my “AWS link was broken” in an AWS guide wasn’t much help lol. It was this link:

https://hub.cloudquery.io/addons/transformation/cloudquery/aws-compliance-premium/

but it is now working.
I’ve been following a bunch of these guides, and while this was the first one with something actually broken, a few of them get you 90% of the way there and then, if you aren’t a seasoned developer, drop you flat with no known next steps. I assume this is on purpose for some reason, but if not and you want more feedback on the guides, let me know. If it is intentional, I’ll pass over it.
The main example is the Go Source Plugin guide.

I don’t think it’s intentional. Part of the challenge is that we also depend on external tools such as dbt and other transformation tools, which adds to the learning curve. But yes, definitely, please share where the rough edges in the documentation are!

I understand the external dependency issues with guides, so I’ve ignored those; these are strictly CloudQuery issues. For example, the Go scaffold that’s currently out seems to be ahead of the documentation for the Go Source Plugin guide. I’m weak with Go and trying to learn it, and since it’s the only supported source/destination option right now, I went with it. But once you get past the version differences and figure those out, the end of the guide assumes you know enough that the client.Client section needs no explanation, and that’s where I got stuck.
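In case it helps anyone else who gets stuck there, this is roughly the shape of the client.Client piece as I understand it. Treat it as a sketch, not the official scaffold output: the exact interfaces vary by plugin-sdk version, and the Spec fields here are hypothetical.

// client/client.go - a sketch of the client.Client piece the guide glosses over.
// Exact interfaces vary by plugin-sdk version; names below are illustrative.
package client

import "github.com/rs/zerolog"

// Spec is the plugin-specific part of the source config (hypothetical field).
type Spec struct {
    Endpoint string `json:"endpoint"`
}

// Client carries whatever state your table resolvers need at sync time:
// a logger, the parsed spec, and any API client for the service you pull from.
type Client struct {
    Logger zerolog.Logger
    Spec   Spec
}

// ID labels log lines for this client; recent plugin-sdk versions expect it
// via the schema.ClientMeta interface.
func (c *Client) ID() string {
    return "my-source"
}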

FYI, this tool has me hooked thinking of all the ways it can be used and what I could possibly automate, so I have been using it every day for a couple of weeks now. Mainly, I’ve been trying to use Kestra and CloudQuery together to automate and update our internal documentation, so I’ll document what works and what doesn’t as I go.

Watching it create 1,100+ tables in a DB off an Azure & AWS sync was kind of fun, lol. That said, I have found performance limitations with Postgres on certain vendors. For example, Supabase can only handle about 7 entries per second, Azure Database for PostgreSQL Flexible Server hits about 20-30/s, but an on-prem Postgres setup hits 200+ entries per second, so there’s a drastic difference in export time.
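For anyone who wants to reproduce the comparison, here’s a rough standalone check of single-row insert throughput. It’s a sketch, not what CloudQuery does internally; the DSN is a placeholder, and it assumes a scratch table created with CREATE TABLE latency_test (v int).

// latencycheck.go - rough single-row insert throughput check against Postgres.
// Run with the pgx dependency fetched: go get github.com/jackc/pgx/v5
package main

import (
    "context"
    "fmt"
    "time"

    "github.com/jackc/pgx/v5"
)

func main() {
    ctx := context.Background()
    // Placeholder DSN; point it at the Postgres target you want to test.
    conn, err := pgx.Connect(ctx, "postgres://user:pass@host:5432/db")
    if err != nil {
        panic(err)
    }
    defer conn.Close(ctx)

    // One row per round trip, a worst case for a remote database:
    // throughput is then bounded by network latency, not server speed.
    const n = 500
    start := time.Now()
    for i := 0; i < n; i++ {
        if _, err := conn.Exec(ctx, "INSERT INTO latency_test (v) VALUES ($1)", i); err != nil {
            panic(err)
        }
    }
    elapsed := time.Since(start)
    fmt.Printf("%d inserts in %s (%.0f rows/s)\n", n, elapsed, float64(n)/elapsed.Seconds())
}

At a 30 ms round trip that loop caps out around 33 rows/s no matter how fast the server is, which is in the same ballpark as the managed-Postgres numbers above.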

Would this mainly be due to network latency, do you think? I’m wondering whether the difference would shrink if I tried from an Azure VM, or if it’s a limitation of their services.

Can you give a bit more detail on what you mean by performance limitations? What are those performance hits, and in which scenarios?

Yes, so when you’re doing a sync, the counter labeled resources/hr actually appears to show resources per second. I needed a quick screenshot, so that’s the number I grabbed; I assume it counts entries written into the DB, and that’s what hits limits on the different platforms. Does that make more sense?

I’m not sure it’s the DB; it could be memory or concurrency options, depending on your machine and on how many accounts or how big an environment you have. You can find more information about this here.
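As one example, the source spec has a top-level concurrency option you can tune. A minimal sketch, with illustrative version and values:

kind: source
spec:
  name: azure
  path: cloudquery/azure
  version: "v9.0.0"  # illustrative; pin to your actual version
  tables: ["*"]
  destinations: ["postgresql"]
  # Lower this on memory-constrained machines or when the destination
  # can't keep up; higher values sync faster but use more memory.
  concurrency: 10000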

If possible, I suggest doing a call next week (anything starting Tuesday should work). You can schedule it here. This will help us understand the requirements better, and we can loop in our sales engineer for support so you can get CloudQuery up and running faster.

The tests were all done from the same laptop, with the only difference being the location of the Postgres DB. But a meeting wouldn’t hurt. Please send an invite to adam.witt@iaawg.com for any time Tuesday.

It can be network latency. In production, it is usually better to have CloudQuery and the database be close to each other.
Sounds great! Which timezone are you in?

So, lmao, once you get your setup tuned and cranking, MS slaps you and says stop:

RESPONSE 429: 429 Too Many Requests
ERROR CODE: TooManyRequests
--------------------------------------------------------------------------------
{
  "error": {
    "code": "TooManyRequests",
    "message": "The request is being throttled as the limit has been reached for operation type - Read_ObservationWindow_00:05:00. For more information, see - https://aka.ms/srpthrottlinglimits"
  }
}

That’s an hour…


Needless to say, if you use * for the tables at the tenant level and you own 15 subscriptions, you get rekt trying to do it all at once lol.
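For now, the workaround I’m trying is to scope each run down instead of hitting the whole tenant at once: narrower table globs and a subset of subscriptions per run. The Azure plugin spec takes a subscriptions list, as far as I can tell; the version, globs, and IDs below are placeholders.

kind: source
spec:
  name: azure
  path: cloudquery/azure
  version: "v9.0.0"  # placeholder
  tables: ["azure_compute_*", "azure_storage_*"]  # instead of "*"
  destinations: ["postgresql"]
  spec:
    # A few subscriptions per run keeps each sync under the
    # Read_ObservationWindow throttling limits.
    subscriptions:
      - "00000000-0000-0000-0000-000000000000"
      - "11111111-1111-1111-1111-111111111111"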

I’ll also add @ben to the thread, but I think we can help with optimizing some of that. We’ll also weigh in at tomorrow’s meeting.