Use Workload Identity in GCP

This documentation is for the CARTO Self-Hosted Legacy Version. Use only if you've installed this specific version. Explore our latest documentation for updated features.

What is Workload Identity?

Applications running on Google Kubernetes Engine might need access to Google Cloud APIs such as Compute Engine API, BigQuery API, or Storage APIs.

Workload Identity allows a Kubernetes service account in your GKE cluster to act as an IAM service account. Pods that use the configured Kubernetes service account automatically authenticate as the IAM service account when accessing Google Cloud APIs. Using Workload Identity allows you to assign distinct, fine-grained identities and authorization for each application in your cluster.

Workload Identity can only be enabled in Self-Hosted installations that use the orchestrated container deployment of CARTO.

How does Workload Identity work?

When you enable Workload Identity on a cluster, GKE automatically creates a fixed workload identity pool for the cluster's Google Cloud project. A workload identity pool allows IAM to understand and trust Kubernetes service account credentials. GKE uses this pool for all clusters in the project that use Workload Identity. The workload identity pool has the following format:

PROJECT_ID.svc.id.goog

When you configure a Kubernetes service account in a namespace to use Workload Identity, IAM authenticates the credentials using the following member name:

serviceAccount:PROJECT_ID.svc.id.goog[KUBERNETES_NAMESPACE/KUBERNETES_SERVICE_ACCOUNT]

In this member name:

  • PROJECT_ID: your Google Cloud project ID.

  • KUBERNETES_NAMESPACE: the namespace of the Kubernetes service account.

  • KUBERNETES_SERVICE_ACCOUNT: the name of the Kubernetes service account making the request.

The process of configuring Workload Identity includes using an IAM policy binding to bind the Kubernetes service account member name to an IAM service account that has the permissions your workloads need. Any Google Cloud API calls from workloads that use this Kubernetes service account are authenticated as the bound IAM service account.
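As a minimal sketch of how the member name described above is assembled, the shell function below joins the three parts into the documented format. The function name and the example values are hypothetical and are used only for illustration.

```shell
# Build the IAM member name that identifies a Kubernetes service account
# in the project's workload identity pool.
# Arguments: project ID, Kubernetes namespace, Kubernetes service account name.
workload_identity_member() {
  project_id="$1"
  namespace="$2"
  ksa="$3"
  printf 'serviceAccount:%s.svc.id.goog[%s/%s]\n' "$project_id" "$namespace" "$ksa"
}

# Example with placeholder values (not real resources).
workload_identity_member "my-project" "carto" "carto-common-backend"
```

With these placeholder values the function prints serviceAccount:my-project.svc.id.goog[carto/carto-common-backend], which is the string passed as the member in the IAM policy binding.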

Configure CARTO deployment to use Workload Identity

In order to enable Workload Identity in your CARTO Self-Hosted installation, you'll have to follow these steps:

  1. Create an IAM service account for your application, or use an existing IAM service account instead.

gcloud iam service-accounts create {IAM_SERVICE_ACCOUNT_NAME} \
    --project={PROJECT_ID}
  • IAM_SERVICE_ACCOUNT_NAME: name of the new service account.

  • PROJECT_ID: ID of the project where the GKE cluster is deployed.

The Service Account needs the roles/iam.serviceAccountTokenCreator role to sign URLs. You can grant it with this command:

gcloud iam service-accounts add-iam-policy-binding \
  {IAM_SERVICE_ACCOUNT_EMAIL} \
  --member=serviceAccount:{IAM_SERVICE_ACCOUNT_EMAIL} \
  --role=roles/iam.serviceAccountTokenCreator
  • IAM_SERVICE_ACCOUNT_EMAIL: email of the service account created in the previous step. For a service account created with the command above, it has the form {IAM_SERVICE_ACCOUNT_NAME}@{PROJECT_ID}.iam.gserviceaccount.com.
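The commands above take the service account email, which can be derived from the name and project used when the account was created. The sketch below assumes a user-managed service account created in {PROJECT_ID}; the function name and the sample values are hypothetical.

```shell
# Derive the email of a user-managed IAM service account from its name
# and the project it was created in.
service_account_email() {
  printf '%s@%s.iam.gserviceaccount.com\n' "$1" "$2"
}

# Hypothetical values for illustration only.
IAM_SERVICE_ACCOUNT_EMAIL="$(service_account_email "carto-selfhosted" "my-project")"
echo "$IAM_SERVICE_ACCOUNT_EMAIL"

# To review the bindings on the account afterwards (needs gcloud and a
# real service account, so it is left commented out here):
#   gcloud iam service-accounts get-iam-policy "$IAM_SERVICE_ACCOUNT_EMAIL"
```

If you reuse an existing service account from a different project, take its email from that project instead of deriving it this way.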

  2. Email the CARTO Support Team at support@carto.com to let us know which Service Account you want to use for Workload Identity. We will ensure that your Service Account is granted the required roles to run CARTO Self-Hosted.

IMPORTANT: You cannot change the Service Account without contacting support.

  3. Add the following lines to your customizations.yaml file:

commonBackendServiceAccount:
  enableGCPWorkloadIdentity: true
  annotations:
    iam.gke.io/gcp-service-account: "{IAM_SERVICE_ACCOUNT_EMAIL}"
  • IAM_SERVICE_ACCOUNT_EMAIL: email of the service account generated in the first step.

The chart lets you disable creation of the commonBackendServiceAccount account with commonBackendServiceAccount.create: false, but this setting is not compatible with enableGCPWorkloadIdentity: true.
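Before installing, it can help to confirm that the customizations file actually contains the two Workload Identity settings shown above. The following is only a sketch: it is a plain-text check with grep rather than a YAML parser, and the function name is hypothetical.

```shell
# Check that a customizations file enables Workload Identity and carries
# the service account annotation. This only catches obvious omissions; it
# does not validate YAML structure or indentation.
check_workload_identity_config() {
  file="$1"
  grep -q 'enableGCPWorkloadIdentity:[[:space:]]*true' "$file" || {
    echo "enableGCPWorkloadIdentity: true not found in $file" >&2
    return 1
  }
  grep -qF 'iam.gke.io/gcp-service-account:' "$file" || {
    echo "iam.gke.io/gcp-service-account annotation not found in $file" >&2
    return 1
  }
  echo "Workload Identity settings found in $file"
}

# Usage, assuming the file is in the current directory:
#   check_workload_identity_config customizations.yaml
```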

  4. Allow the Kubernetes service account that is going to be created in your GKE cluster to impersonate the IAM service account by adding an IAM policy binding between the two service accounts. This binding allows the Kubernetes service account to act as the IAM service account.

gcloud iam service-accounts add-iam-policy-binding {IAM_SERVICE_ACCOUNT_EMAIL} \
--role roles/iam.workloadIdentityUser \
--member "serviceAccount:{PROJECT_ID}.svc.id.goog[{KUBERNETES_NAMESPACE}/{KUBERNETES_SERVICE_ACCOUNT}]"
  • IAM_SERVICE_ACCOUNT_EMAIL: email of the service account generated in the first step.

  • PROJECT_ID: ID of the project where the GKE cluster is deployed.

  • KUBERNETES_NAMESPACE: namespace where CARTO application is deployed.

  • KUBERNETES_SERVICE_ACCOUNT: name of the Kubernetes service account used by the CARTO application. The default value is carto-common-backend.

After you run the installation, the Helm output notes show this gcloud command with the KUBERNETES_NAMESPACE and KUBERNETES_SERVICE_ACCOUNT values already filled in.
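If you want to prepare the binding command before the Helm notes are available, a small dry-run helper can fill in the placeholders. This sketch only prints the command from the last step and does not execute it; the function name and the sample values are hypothetical.

```shell
# Print (not execute) the policy-binding command for a given deployment.
# Arguments: IAM service account email, project ID, namespace, and
# optionally the Kubernetes service account name, which defaults to
# carto-common-backend as documented above.
print_binding_command() {
  sa_email="$1"
  project_id="$2"
  namespace="$3"
  ksa="${4:-carto-common-backend}"
  printf 'gcloud iam service-accounts add-iam-policy-binding %s \\\n' "$sa_email"
  printf '  --role roles/iam.workloadIdentityUser \\\n'
  printf '  --member "serviceAccount:%s.svc.id.goog[%s/%s]"\n' "$project_id" "$namespace" "$ksa"
}

# Example with placeholder values (not real resources).
print_binding_command "carto-selfhosted@my-project.iam.gserviceaccount.com" "my-project" "carto"
```

Review the printed command against the Helm output notes before running it, since the notes reflect the namespace and service account name actually used by your release.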

Create a BigQuery connection managed using Workload Identity

CARTO Self-Hosted running on a GKE cluster can use the GKE Workload Identity feature to create a connection between the CARTO Self-Hosted platform and BigQuery without any user action.

Configuration

  1. Set up GKE Workload Identity for CARTO Self-Hosted following the steps described above.

  2. Grant your Workload Identity service account the required BigQuery permissions on your data warehouse project.

  3. Add the following environment variables in your customizations.yaml file:

workspaceApi:
  extraEnvVars:
    - name: WORKSPACE_SYNC_DATA_ENABLED
      value: "true"
    - name: WORKSPACE_WORKLOAD_IDENTITY_WORKFLOWS_TEMP
      value: {WORKFLOWS_TEMP_LOCATION}
    - name: WORKSPACE_WORKLOAD_IDENTITY_BILLING_PROJECT
      value: {BILLING_PROJECT_ID}
    - name: WORKSPACE_WORKLOAD_IDENTITY_SERVICE_ACCOUNT_EMAIL
      value: {WORKLOAD_IDENTITY_SA_EMAIL}
    - name: WORKSPACE_WORKLOAD_IDENTITY_CONNECTION_OWNER_ID
      value: {CARTO_OWNER_ID}
workspaceSubscriber:
  extraEnvVars:
    - name: WORKSPACE_SYNC_DATA_ENABLED
      value: "true"
    - name: WORKSPACE_WORKLOAD_IDENTITY_WORKFLOWS_TEMP
      value: {WORKFLOWS_TEMP_LOCATION}
    - name: WORKSPACE_WORKLOAD_IDENTITY_BILLING_PROJECT
      value: {BILLING_PROJECT_ID}
    - name: WORKSPACE_WORKLOAD_IDENTITY_SERVICE_ACCOUNT_EMAIL
      value: {WORKLOAD_IDENTITY_SA_EMAIL}
    - name: WORKSPACE_WORKLOAD_IDENTITY_CONNECTION_OWNER_ID
      value: {CARTO_OWNER_ID}
  • WORKFLOWS_TEMP_LOCATION: BigQuery dataset ID used for storing temporary tables (e.g. my_gcp_project.my_dataset).

  • BILLING_PROJECT_ID: GCP project to be charged with the BigQuery costs.

  • WORKLOAD_IDENTITY_SA_EMAIL: Service account email configured for Workload Identity.

  • CARTO_OWNER_ID: ID of the CARTO user who will own the connection (e.g. "auth0|3idsj230990sj4wsddd10"). You can obtain it by running the following curl command:

    curl -s 'https://accounts.app.carto.com/users/me' \
      -H 'Authorization: Bearer <your_carto_jwt_token>' \
      | jq '.user_id'
  4. Follow the previous command output and grant the service account the following role:

gcloud iam service-accounts add-iam-policy-binding \
{WORKLOAD_IDENTITY_SA_EMAIL} \
--role roles/iam.workloadIdentityUser \
--member "serviceAccount:{PROJECT_ID}.svc.id.goog[{KUBERNETES_NAMESPACE}/carto-common-backend]" \
--project {PROJECT_ID}
  • WORKLOAD_IDENTITY_SA_EMAIL: Service account email configured for Workload Identity.

  • PROJECT_ID: ID of the project where the GKE cluster is deployed.

  • KUBERNETES_NAMESPACE: namespace where CARTO application is deployed.
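Step 2 of the configuration asks for BigQuery permissions on the data warehouse project but does not list specific roles. The sketch below prints, without running them, the project-level grants for whatever roles you pass in. The two roles in the example, roles/bigquery.dataEditor and roles/bigquery.jobUser, are an assumption chosen for illustration and are not confirmed by this guide; the function name and project values are hypothetical as well.

```shell
# Print the commands that would grant a set of roles to the Workload
# Identity service account on the data warehouse project.
# Arguments: warehouse project ID, service account email, then one or
# more role names. Nothing is executed; the commands are only printed.
print_bigquery_grants() {
  warehouse_project="$1"
  sa_email="$2"
  shift 2
  for role in "$@"; do
    printf 'gcloud projects add-iam-policy-binding %s --member=serviceAccount:%s --role=%s\n' \
      "$warehouse_project" "$sa_email" "$role"
  done
}

# ASSUMPTION: these roles are examples only. Confirm the roles your
# installation needs before running any of the printed commands.
print_bigquery_grants "my-warehouse-project" \
  "carto-selfhosted@my-project.iam.gserviceaccount.com" \
  roles/bigquery.dataEditor roles/bigquery.jobUser
```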

Once you've applied the changes in your customizations.yaml file, your CARTO deployment will automatically create a new BigQuery connection that uses Workload Identity and is owned by the CARTO user specified in the deployment configuration.
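The customizations.yaml changes from step 3 repeat the same five variables for workspaceApi and workspaceSubscriber, so the two blocks can drift apart when edited by hand. As a rough sketch, the helper below prints both blocks from a single set of values. The function names and sample values are hypothetical, and the output should be reviewed before it is merged into your file.

```shell
# Print the extraEnvVars block for one component.
# Arguments: component name, temp dataset, billing project,
# service account email, connection owner ID.
print_component_env() {
  cat <<EOF
$1:
  extraEnvVars:
    - name: WORKSPACE_SYNC_DATA_ENABLED
      value: "true"
    - name: WORKSPACE_WORKLOAD_IDENTITY_WORKFLOWS_TEMP
      value: "$2"
    - name: WORKSPACE_WORKLOAD_IDENTITY_BILLING_PROJECT
      value: "$3"
    - name: WORKSPACE_WORKLOAD_IDENTITY_SERVICE_ACCOUNT_EMAIL
      value: "$4"
    - name: WORKSPACE_WORKLOAD_IDENTITY_CONNECTION_OWNER_ID
      value: "$5"
EOF
}

# Emit identical settings for both components that need them.
print_workload_identity_env() {
  for component in workspaceApi workspaceSubscriber; do
    print_component_env "$component" "$@"
  done
}

# Example with placeholder values (not real resources).
print_workload_identity_env "my_gcp_project.my_dataset" "my-billing-project" \
  "carto-selfhosted@my-project.iam.gserviceaccount.com" "auth0|example-owner-id"
```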
