CloudQuery table name prefixing and lowercase formatting options

Hi CQ team,

I am stuck after not finding the topic in the docs, so I thought I’d pop in here to see if I could get an answer.

kind: source
spec:
  name: mysql
  path: cloudquery/mysql
  registry: cloudquery
  version: "v4.1.2"
  tables:
    - Table
  destinations: ["s3"]
  spec:
    connection_string: "obfuscated"

What are my options to, in this example, prefix the table name and at the same time lower case it for consistency? I would want to be placed in S3 as prefix_table.

Hey :wave: The S3 destination allows you to set the path where the tables will be saved, but it doesn’t allow you to modify the table name (you can only interpolate it via {{Table}} in the path part of the spec, e.g.:

path: "path/to/files/{{TABLE}}/{{UUID}}.{{FORMAT}}"

However, you can use the new transformer plugin in order to add a prefix to the table names. You can’t lowercase them at the moment, unfortunately.

Here are the docs for it:

CloudQuery Transformer Plugin Documentation

And here’s a blog post on transformers:

Introducing Transformer Plugins

Let me know if you need further help to get it to work, or if something I said didn’t make sense.

Thanks @mariano. I figured as much, but was looking for a way to reuse the same S3 file. For instance, the AWS plugin already prefixes all tables with “aws_” without setting a prefix in the S3 destination plugin.

No “table_prefix” option in the source plugin config? Okay, let me have a look at the transformer plugin. Do I need to be on a certain version of the destination plugin for it to work?

This is my destination config:

kind: destination
spec:
  name: "s3"
  path: "cloudquery/s3"
  registry: "cloudquery"
  version: "v7.4.2"
  write_mode: "append"
  transformers:
    - basic
  spec:
    bucket: "bucket"
    region: "eu-west-1"
    path: ".stage/{{UUID}}.{{FORMAT}}"
    format: "parquet"
    format_spec:
      # CSV-specific parameters:
      # delimiter: ","
      # skip_header: false
    athena: true
    batch_size: 100000
    batch_size_bytes: 52428800

This is my transformer:

kind: transformer
spec:
  name: "basic"
  registry: cloudquery
  path: "cloudquery/basic"
  version: "v1.0.0"
  spec:
    transformations:
      - kind: change_table_names
        tables: ["*"]
        new_table_name_template: "prefix_{{.OldName}}"

I execute the sync and get this error message:

cloudquery sync source-mysql-kapp-ec1p.yml destination-s3-kapp.yml

Loading spec(s) from source-mysql-kapp-ec1p.yml, destination-s3-kapp.yml
Error: failed to load spec(s) from source-mysql-kapp-ec1p.yml, destination-s3-kapp.yml. Error: failed to unmarshal file destination-s3-kapp.yml: failed to decode spec: json: unknown field "transformers"

Removing only the transformer config from the destination file and the sync works.

Hey :wave: Apologies for not mentioning that, you’re absolutely right. Transformers support was added in 6.3.0 (Release Notes), but I’d encourage you to upgrade to the latest version which is 6.4.1 (Release Notes).

Your config looks correct.

@mariano thank you for sharing. I must be missing something somewhere. Do you mind having a look?

xxx:~/cloudquery/test$ cloudquery --version
cloudquery version 6.4.1


xxx:~/cloudquery/test$ cat source.yml
kind: source
spec:
  name: "mysql"
  path: "cloudquery/mysql"
  registry: "cloudquery"
  version: "v4.1.2"
  tables:
    - Table
  destinations: ["s3"]
  spec:
    connection_string: "xxx"


xxx:~/cloudquery/test$ cat destination.yml
kind: destination
spec:
  name: "s3"
  path: "cloudquery/s3"
  registry: "cloudquery"
  version: "v7.4.2"
  write_mode: "append"
  transformers:
    - "basic"
  spec:
    bucket: "xxx"
    region: "eu-west-1"
    path: ".stage/{{UUID}}.{{FORMAT}}"
    format: "parquet"
    format_spec:
      # CSV-specific parameters:
      # delimiter: ","
      # skip_header: false
    athena: true
    batch_size: 100000
    batch_size_bytes: 52428800


xxx:~/cloudquery/test$ cat transformer.yml
kind: transformer
spec:
  name: "basic"
  registry: "cloudquery"
  path: "cloudquery/basic"
  version: "v1.0.0"
  spec:
    transformations:
      - kind: change_table_names
        tables: ["*"]
        new_table_name_template: "prefix_{{.OldName}}"


xxx:~/cloudquery/test$ cloudquery sync source.yml destination.yml
Loading spec(s) from source.yml, destination.yml
Error: failed to load spec(s) from source.yml, destination.yml. Error: destination s3 references unknown transformer basic

Your config is correct. You just didn’t include the YAML in the sync command you ran. Btw, most people just put the three YAML snippets in the same file to make it easier; you just need to add three dashes before the start of each one (---).

I’m sorry, but I don’t understand what you mean. If the config is correct, why am I getting the error?

Error: destination s3 references unknown transformer basic
xxx:~/cloudquery/test$ cloudquery sync source.yml destination.yml

Put transformer.yaml there

cloudquery sync source.yml destination.yml transformer.yml

Gotcha, I was under the impression the source destination format remains and it is the destination YAML that is configured and loads the transformer. This makes a bunch of sense now. Thank you! :roll_eyes: :smile:

cloudquery sync source.yml destination.yml transformer.yml
Loading spec(s) from source.yml, destination.yml, transformer.yml
Error: failed to load spec(s) from source.yml, destination.yml, transformer.yml. Error: failed to strip yaml comments in file transformer.yml (section 1): yaml: line 10: did not find expected key

At least it’s something new now.

I reproduced your issue. I think you just have a rogue pipe at the end of the file here:

        new_table_name_template: "prefix_{{.OldName}}"|

Remove that | at the end.

Thanks

Thanks so much. It’s always the small things.

Just a question this time: If I combine all of my sources, destinations, and transformers into one file, am I able to specify that only certain sources use transformers?

Having separate files or one file is just a cosmetic difference. The functionality doesn’t change.

Transformers are specified in their destinations, not on sources, so if you don’t want some destination having a transformation, just don’t add the transformer in the destination’s spec.

okay gotcha. I understand

You could also have two separate destination specs that route to the same underlying database. One with a transformer entry, and the other without it.

Yes, that’s how I understood it. Cheers, and thanks for all the help!