Issue with adding dependent tables in CloudQuery

I am encountering an issue when I add dependent tables.

Hi, could you please keep your messages in a single thread so that we can keep different conversations separate?

Also, would you mind sharing some of the last lines from cloudquery.log?

2024-03-06T08:54:39Z ERR table resolver finished with error error="GET https://api.github.com/repos/xxxxx/internal-gist/dependency-graph/sbom: 404 Not Found []" client="org: xxxxxxxx repo: xxxxxx/internal-gist" module=github-src table=github_repository_sboms
2024-03-06T08:54:39Z INF table sync finished client="org: xxxx repo: xxxxxx/internal-gist" errors=0 module=github-src resources=1 table=github_repositories
2024-03-06T08:54:39Z INF table sync finished client="org: xxxxx repo: xxxxx/internal-gist" errors=1 module=github-src resources=0 table=github_repository_sboms
2024-03-06T08:54:39Z WRN GitHub secondary rate limit detected for API call: https://api.github.com/repos/xxxx/ops-deployments/dependency-graph/sbom. Sleeping until 2024-03-06T15:00:57+05:30 module=github-src
2024-03-06T08:54:39Z INF table sync finished client=org:xxxxx errors=0 module=github-src resources=7 table=github_installations

These are the last few lines from the log:

| Syncing resources... (26/-, 0 resources/hr) [8m37s]

Still the same after 8 minutes.

Great, thank you. There seem to be two problems here:

  • A 404 error for SBOMs. We should raise an issue for this.
  • The secondary rate limit is hit, causing it to sleep until 15:00:57+05:30 (09:30:57Z), which is about 36 minutes, if I read the timezone correctly.
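
The sleep duration in that warning can be worked out from the two timestamps on the log line. A minimal Python sketch, using the values copied from the log above; the function name `sleep_duration` is only illustrative and is not part of CloudQuery:

```python
from datetime import datetime, timedelta


def sleep_duration(logged_at: str, sleep_until: str) -> timedelta:
    """Return the gap between two ISO 8601 timestamps with UTC offsets."""
    # Older Python versions reject a trailing "Z", so map it to +00:00.
    start = datetime.fromisoformat(logged_at.replace("Z", "+00:00"))
    end = datetime.fromisoformat(sleep_until.replace("Z", "+00:00"))
    # Both values are timezone-aware, so the +05:30 offset is handled here.
    return end - start


wait = sleep_duration("2024-03-06T08:54:39Z", "2024-03-06T15:00:57+05:30")
print(wait)  # 0:36:18
```

So the plugin pauses for a little over 36 minutes, which matches the progress counter not moving during that time.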

I’ll raise an issue so we can look into this. Do you need the SBOM data, or can you skip it for now?

I need the SBOM data as well.
I also need github_workflows, but I run into the same problem there.

2024-03-06T09:07:19Z INF table sync finished client="org: xxxxx repo: xxxxx/secops" errors=0 module=github-src resources=0 table=github_workflows
2024-03-06T09:07:19Z ERR table resolver finished with error error="GET https://api.github.com/repos/xxxxx/internal-gist/dependency-graph/sbom: 404 Not Found []" client="org: xxxxx repo: xxxxx/internal-gist" module=github-src table=github_repository_sboms
2024-03-06T09:07:19Z INF table sync finished client="org: xxxxx repo: xxxx/internal-gist" errors=0 module=github-src resources=1 table=github_repositories
2024-03-06T09:07:19Z INF table sync finished client="org: xxxxx repo: xxxx/internal-gist" errors=1 module=github-src resources=0 table=github_repository_sboms
2024-03-06T09:07:19Z ERR table resolver finished with error error="GET https://api.github.com/repos/xxxxxx/secops/dependency-graph/sbom: 404 Not Found []" client="org: xxxxx repo: xxxxx/secops" module=github-src table=github_repository_sboms
2024-03-06T09:07:19Z INF table sync finished client="org: xxxxx repo: xxxxx/secops" errors=0 module=github-src resources=1 table=github_repositories
2024-03-06T09:07:19Z INF table sync finished client="org: xxxxx repo: xxxxx/secops" errors=1 module=github-src resources=0 table=github_repository_sboms
2024-03-06T09:07:19Z INF table sync finished client="org: xxxxx repo: xxxxx/ops-deployments" errors=0 module=github-src resources=0 table=github_workflows
2024-03-06T09:07:19Z WRN GitHub secondary rate limit detected for API call: https://api.github.com/repos/xxxx/ops-deployments/dependency-graph/sbom. Sleeping until 2024-03-06T15:00:57+05:30 module=github-src
2024-03-06T09:07:19Z INF table sync finished client="org: xxxx repo: xxxxx/internal-gist" errors=0 module=github-src resources=0 table=github_workflows
2024-03-06T09:07:19Z INF table sync finished client=org:xxxx errors=0 module=github-src resources=7 table=github_installations

Last few lines after adding github_workflows:

/ Syncing resources... (24/-, 0 resources/hr) [3m38s] ?

What is the issue with github_workflows?
Any idea how to solve “GitHub secondary rate limit detected for API call”?

I have raised issue #17040 and issue #17041 if you’d like to track this. I would recommend removing the SBOM table from your tables list for now and seeing if the problem persists.

Or actually, make sure you add github_repository_sboms to your skip_tables list; otherwise, it might be included because it’s a child table.

What about github_workflows?

I don’t see any issue with that in the logs, but I think the SBOM table hitting rate limits would cause everything to slow down.
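
To check this kind of thing without reading the log by eye, the "table sync finished" and rate limit lines can be tallied per table. This is a rough Python sketch, assuming the line format shown in the log excerpts above; the regular expressions and the `summarize` helper are my own and are not a CloudQuery tool:

```python
import re
from collections import Counter

# Field layout assumed from the cloudquery.log excerpts in this thread.
SYNC_RE = re.compile(r"table sync finished .*\berrors=(\d+)\b.*\btable=(\S+)")
RATE_RE = re.compile(r"secondary rate limit detected for API call: (\S+)\. Sleeping until")


def summarize(lines):
    """Return (errors per table, rate limit warnings per URL)."""
    errors = Counter()
    throttled = Counter()
    for line in lines:
        sync = SYNC_RE.search(line)
        if sync:
            errors[sync.group(2)] += int(sync.group(1))
            continue
        rate = RATE_RE.search(line)
        if rate:
            throttled[rate.group(1)] += 1
    return errors, throttled


sample = [
    '2024-03-06T09:07:19Z INF table sync finished client="org: xxxxx repo: xxxx/internal-gist" errors=1 module=github-src resources=0 table=github_repository_sboms',
    '2024-03-06T09:07:19Z INF table sync finished client="org: xxxxx repo: xxxxx/ops-deployments" errors=0 module=github-src resources=0 table=github_workflows',
    "2024-03-06T09:07:19Z WRN GitHub secondary rate limit detected for API call: https://api.github.com/repos/xxxx/ops-deployments/dependency-graph/sbom. Sleeping until 2024-03-06T15:00:57+05:30 module=github-src",
]
errors, throttled = summarize(sample)
print(dict(errors))
print(dict(throttled))
```

On the three sample lines this reports one error for github_repository_sboms, none for github_workflows, and one throttled call to the SBOM endpoint.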

2024-03-06T09:19:13Z INF table sync finished client="org: xxxxx repo: xxxxx/ops-deployments" errors=0 module=github-src resources=1 table=github_repositories
2024-03-06T09:19:13Z INF table sync finished client="org: xxxxx repo: xxxxx/operations" errors=0 module=github-src resources=1 table=github_repositories
2024-03-06T09:19:13Z INF table sync finished client="org: xxxxx repo: xxxxx/terraform-templates" errors=0 module=github-src resources=1 table=github_repositories
2024-03-06T09:19:13Z INF table sync finished client="org: xxxxx repo: xxxxx/ops-deployments" errors=0 module=github-src resources=0 table=github_workflows
2024-03-06T09:19:13Z INF table sync finished client="org: xxxxx repo: xxxxx/experimental" errors=0 module=github-src resources=0 table=github_workflows
2024-03-06T09:19:13Z INF table sync finished client="org: xxxxx repo: xxxxx/cloudquery" errors=0 module=github-src resources=0 table=github_workflows
2024-03-06T09:19:13Z INF table sync finished client=org:xxxxx errors=0 module=github-src resources=1 table=github_hooks
2024-03-06T09:19:13Z INF table sync finished client="org: xxxxx repo: xxxxx/terraform-templates" errors=0 module=github-src resources=0 table=github_workflows
2024-03-06T09:19:13Z INF table sync finished client="org: xxxxx repo: xxxxx/internal-gist" errors=0 module=github-src resources=0 table=github_workflows
2024-03-06T09:19:13Z INF table sync finished client=org:xxxxx errors=0 module=github-src resources=7 table=github_installations
2024-03-06T09:19:13Z INF table sync finished client="org: xxxxx repo: xxxxx/secops" errors=0 module=github-src resources=0 table=github_workflows
2024-03-06T09:19:13Z WRN GitHub secondary rate limit detected for API call: https://api.github.com/repos/xxxxx/operations/contents/.github/workflows/console-appserver.yml?ref=main. Sleeping until 2024-03-06T15:00:57+05:30 module=github-src
Is it the same even for github_workflows? It also hits the secondary rate limit.
I added github_repository_sboms to skip_tables:
kind: source
spec:
  # Source spec section
  name: github
  path: cloudquery/github
  registry: cloudquery
  version: "v8.0.1"
  tables: [
    "github_installations",
    "github_hooks",
    "github_repositories",
    "github_teams",
    "github_team_members",
    "github_team_repositories",
    "github_organizations",
    "github_organization_members",
    "github_workflows"
    ]
  skip_tables:
    - github_billing_action
    - github_billing_package
    - github_billing_storage
    - github_external_groups
    - github_hook_deliveries
    - github_traffic_clones
    - github_traffic_paths
    - github_traffic_referrers
    - github_traffic_views
    - github_organization_dependabot_secrets
    - github_organization_dependabot_alerts
    - github_releases
    - github_release_assets
    - github_repository_dependabot_secrets
    - github_repository_dependabot_alerts
    - github_repository_keys
    - github_workflow_run_usage
    - github_workflow_jobs
    - github_repository_branches
    - github_workflow_runs
    - github_repository_sboms
2024-03-06T09:24:25Z WRN GitHub secondary rate limit detected for API call: https://api.github.com/orgs/xxxxx/dependabot/secrets. Sleeping until 2024-03-06T15:00:57+05:30 module=github-src
The same happens for github_organization_dependabot_secrets.

Right, thanks, yeah that looks like the same underlying issue. It seems like we will need to make a change to the rate limiting logic.

How many repositories do you have in your org?

Can you try setting concurrency to a lower value, like maybe 20?

Sure.

After setting concurrency to a lower value, the rate limit issue is still present.

2024-03-06T09:27:34Z WRN GitHub secondary rate limit detected for API call: https://api.github.com/repos/xxxx/experimental/releases?per_page=1000. Sleeping until 2024-03-06T15:00:57+05:30 module=github-src

I think this could be because the previous syncs used up the quota, which will reset within an hour (see the GitHub rate limits documentation).

Our log says it’s a secondary rate limit, but I’m not sure it actually is. It might be the primary one.
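
One way to tell the two apart is to look at the response headers of a throttled request. The Python sketch below is a heuristic based on my reading of GitHub's REST API documentation (`x-ratelimit-remaining` at 0 for the primary limit, a `retry-after` header on some secondary limit responses). It is an assumption for illustration, not how the CloudQuery GitHub plugin makes this decision, and `classify_rate_limit` is a made-up name:

```python
def classify_rate_limit(status: int, headers: dict) -> str:
    """Guess which GitHub rate limit produced a response (heuristic only)."""
    # Header names are case-insensitive, so normalize them first.
    h = {k.lower(): v for k, v in headers.items()}
    if status not in (403, 429):
        return "not rate limited"
    if h.get("x-ratelimit-remaining") == "0":
        # The hourly quota is exhausted; it resets at x-ratelimit-reset.
        return "primary"
    if "retry-after" in h:
        # Quota remains, but GitHub asks the client to back off.
        return "secondary"
    # Could be a secondary limit without retry-after, or a permissions error;
    # the response body has to be checked in that case.
    return "unknown"


print(classify_rate_limit(403, {"X-RateLimit-Remaining": "0"}))
print(classify_rate_limit(429, {"Retry-After": "60", "X-RateLimit-Remaining": "4000"}))
```

If the remaining quota is still above zero when the warning appears, the primary limit is unlikely to be the cause.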

But other tables are working.
If the quota had already been used up by the previous syncs, shouldn't the other tables fail as well?
With these tables:

tables: [
    "github_installations",
    "github_hooks",
    "github_repositories",
    "github_teams",
    "github_team_members",
    "github_team_repositories",
    "github_organizations",
    "github_organization_members",
    "github_organization_dependabot_alerts",
    "github_repository_keys"
]

It works.

Oh, okay, then yeah it’s probably the secondary rate limit.
Can you show your current config?

kind: source
spec:
  # Source spec section
  name: github
  path: cloudquery/github
  registry: cloudquery
  version: "v8.0.1"
  tables: [
    "github_installations",
    "github_hooks",
    "github_repositories",
    "github_teams",
    "github_team_members",
    "github_team_repositories",
    "github_organizations",
    "github_organization_members",
    "github_organization_dependabot_alerts",
    "github_repository_keys"
    ]
  skip_tables:
    - github_billing_action
    - github_billing_package
    - github_billing_storage
    - github_external_groups
    - github_hook_deliveries
    - github_traffic_clones
    - github_traffic_paths
    - github_traffic_referrers
    - github_traffic_views
    - github_organization_dependabot_secrets
    - github_releases
    - github_release_assets
    - github_repository_dependabot_secrets
    - github_repository_dependabot_alerts
    - github_workflow_run_usage
    - github_workflow_jobs
    - github_repository_branches
    - github_workflow_runs
    - github_repository_sboms
    
  destinations: ["postgresql"]
  spec:
    app_auth:
    - org: cloudquery
      private_key_path: private-key.pem
      app_id: "xxxxxx"
      installation_id: "xxxxxxxx"
    orgs: ['myorg']
    # concurrency: 20

It seems like the concurrency setting is commented out there?

Even with concurrency set, there is no improvement. I have commented it out for now.

Can you please share your config when concurrency was set?

kind: source
spec:
  # Source spec section
  name: github
  path: cloudquery/github
  registry: cloudquery
  version: "v8.0.1"
  tables: [
    "github_installations",
    "github_hooks",
    "github_repositories",
    "github_teams",
    "github_team_members",
    "github_team_repositories",
    "github_organizations",
    "github_organization_members",
    "github_organization_dependabot_alerts",
    "github_repository_keys",
    "github_releases"
    ]
  skip_tables:
    - github_billing_action
    - github_billing_package
    - github_billing_storage
    - github_external_groups
    - github_hook_deliveries
    - github_traffic_clones
    - github_traffic_paths
    - github_traffic_referrers
    - github_traffic_views
    - github_organization_dependabot_secrets
    # - github_organization_dependabot_alerts
    - github_release_assets
    - github_repository_dependabot_secrets
    - github_repository_dependabot_alerts
    # - github_repository_keys
    - github_workflow_run_usage
    - github_workflow_jobs
    - github_repository_branches
    - github_workflow_runs
    - github_repository_sboms
    
  destinations: ["postgresql"]
  spec:
    app_auth:
    - org: cloudquery
      private_key_path: .private-key.pem
      app_id: "xxxxx"
      installation_id: "xxxxx"
    orgs: ['myorg']
    concurrency: 20

2024-03-06T09:35:27Z WRN GitHub secondary rate limit detected for API call: https://api.github.com/repos/xxxxx/experimental/releases?per_page=1000. Sleeping until 2024-03-06T16:00:58+05:30 module=github-src