AWS account attribution by product teams and inspector findings grouping

AWS, Attributing Accounts to Product Teams

I want to attribute the data from AWS such that I could look up/group etc. the rows based on the name of the team that manages the AWS account.

For example, I would like to get the Inspector findings that belong to all accounts that a specific team owns.

This could be done with some sort of relations and transformations, I’m sure, but has anyone done this already?

(More info in the thread)

In your organization, how do you currently connect an AWS account to a team?

We have the owner/team of the account documented in Confluence.

My current attempt:

I created 3 tables in my database:

from sqlalchemy import Column, ForeignKey, Integer, String, Table, PrimaryKeyConstraint
from sqlalchemy.ext.declarative import declarative_base
from sqlalchemy.orm import relationship

Base = declarative_base()

# Define the association table for the many-to-many relationship
team_accounts = Table('team_accounts', Base.metadata,
    Column('team_id', Integer, ForeignKey('product_teams.team_id')),
    Column('account_id', String(12), ForeignKey('aws_accounts.account_id')),
    PrimaryKeyConstraint('team_id', 'account_id')
)

class AwsAccount(Base):
    __tablename__ = 'aws_accounts'

    account_id = Column(String(12), primary_key=True)  # AWS Account ID
    friendly_name = Column(String(100))  # AWS account alias
    team_id = Column(Integer, ForeignKey('product_teams.team_id'))
    env = Column(String(100))  # Is the account used as Production, Dev, or Test 
    team = relationship("ProductTeam", back_populates="accounts")

class ProductTeam(Base):
    __tablename__ = 'product_teams'

    team_id = Column(Integer, primary_key=True)
    team_name = Column(String(100), unique=True)  # Name of product team
    unit = Column(String(100))  # Business unit
    accounts = relationship("AwsAccount", secondary=team_accounts, back_populates="team")

Then I tried to learn how to use dbt to attribute the findings to teams:

{% set aws_accounts_table = 'aws_accounts' %}
{% set product_teams_table = 'product_teams' %}
{% set findings_table =  'aws_inspector2_findings' %}

SELECT
    a.account_id,
    a.friendly_name AS account_friendly_name,
    t.team_id,
    t.team_name,
    f.arn AS finding_id,
    f.severity,
    f.description,
    f.last_observed_at,
    f.first_observed_at,
    f.type,
    f.exploit_available,
    f.title,
    f.status
FROM
    {{ aws_accounts_table }} a
JOIN
    {{ product_teams_table }} t ON a.team_id = t.team_id
JOIN
    {{ findings_table }} f ON a.account_id = f.aws_account_id

This did technically work, but I feel like there is a better way to do it that doesn’t require me to basically copy each AWS table. I assume there is a way to do this so that I don’t have to write the same dbt query with different column names for each either. Additionally, this doesn’t let me do a query like:

SELECT aws_inspector2_findings FROM product_teams WHERE team_name = 'team1'

I use AWS account tags for this personally, but I realize that’s not for everyone.

Here is the documentation on AWS tagging.

We use that to know where to send fallback notifications on resources under an account without a direct link to a specific team/group.

yeah, i have been trying to get that properly done, but for now i have to do with what i have :sweat_smile:

ha I know that feeling

A few things:

  1. I would switch the order of the joins so that the base table is the findings_table rather than aws_accounts_table:

    FROM
        {{ findings_table }} f
    LEFT JOIN
        {{ aws_accounts_table }} a ON a.team_id = f.aws_account_id
    LEFT JOIN
        {{ product_teams_table }} t ON a.account_id = t.team_id 
    
  2. I would investigate using a view. That way you have a single table where each row has all of the information you are interested in. Depending on how much you are using DBT, you can materialize the query you have and it will appear like a table.

By the way, do you happen to know how I can extract a value from a column that contains JSON?
(The SecurityHub findings severity column contains a JSON string where one of the keys is “Label” that I want to extract into a new column for easier querying.)

You are using Postgres, right?

Postgres has built-in functionality that allows you to extract a value from a JSON object: PostgreSQL JSON Functions

What is your setup for that like? I found out that at least for some of the accounts, the tagging has actually been done already.

Tagging is a rabbit hole that typically requires organizational buy-in and automated tooling to enforce. Many companies have successfully adopted CloudCustodian for something like this.

I’m thinking more in the CQ and dbt, etc. end, as it does seem that the tags are set up, but I previously had no access to them.

This goes over it pretty well from when I was at Zapier, but as @ben hinted, it did very much rely on Cloud-Custodian. https://marcyoung.us/post/kubecon_2021/

It’s not for everyone and scales differently, but that’s how we did it.