Installation in a Google Cloud VPC

This guide walks you through configuring the CARTO Analytics Toolbox to work within a VPC when CARTO Self-hosted is installed on Google Cloud Platform.

Is your CARTO Self-hosted deployment in a Google Cloud VPC?

When the CARTO platform is self-hosted within a Google Cloud VPC, the functions and procedures of the Analytics Toolbox need to be accessed from within the same VPC.

That makes this installation method the only suitable one for this type of CARTO deployment.

Install CARTO Analytics Toolbox inside your BigQuery project

The first step is to install the Analytics Toolbox in a BigQuery project of your own.

Once the Analytics Toolbox is installed in your project, use this guide to deploy the AT Gateway in your VPC.

Deploy the infrastructure needed to allow Location Data Services usage

Some functionalities of the CARTO Analytics Toolbox for BigQuery require making external calls from BigQuery to CARTO services. These calls are implemented via BigQuery Remote Functions:

  • AT Gateway: Creating isolines, geocoding and routing require calls to the CARTO LDS API. Other functions of the Analytics Toolbox require requests to the CARTO platform backend (for example, importing from a URL or the 'Send by Email' component in Workflows). For this purpose, Cloud Run functions need to be deployed in your VPC.

When installing the Analytics Toolbox manually in your own project, some configuration is required:

  • A BigQuery connection that allows BigQuery to call Cloud Run functions.

  • An AT Gateway endpoint inside your VPC.

Architecture overview

To deploy the Analytics Toolbox within a VPC, the CARTO platform needs to deploy some additional infrastructure pieces within your GCP project. In the following diagram, you can check how all these pieces interact with each other:

We'll set up the following pieces inside your project to start using the Analytics Toolbox on your CARTO Self-hosted platform:

  • One BQ connection used to perform requests against two different Cloud Run services.

  • One subnetwork used to deploy the containers created by the two Cloud Run services that are required.

  • One Cloud Run service needed for BigQuery to interact with the Self-hosted platform.

  • One Serverless VPC Access connector that will be used by the Cloud Run services to access your VPC.

  • An internal DNS record pointing to the IP address of your CARTO Self-hosted platform.

Follow these steps to set up the required infrastructure pieces:

All of the following commands and instructions should be executed from the Cloud Shell in your console or from authenticated gcloud and bq CLI sessions.
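
One way to keep the placeholders consistent across the commands in this guide is to define them once as shell variables. The following is a minimal sketch, not part of the official procedure: the values shown and the `require_vars` helper are hypothetical.

```shell
# Hypothetical values: replace them with the ones for your own project.
PROJECT_ID="my-gcp-project"
REGION="europe-west1"
VPC_NETWORK="my-vpc-network"

# require_vars NAME...: fail early if any named variable is empty or unset.
require_vars() {
  for name in "$@"; do
    eval "value=\${$name:-}"
    if [ -z "$value" ]; then
      echo "Missing required variable: $name" >&2
      return 1
    fi
  done
}

require_vars PROJECT_ID REGION VPC_NETWORK
```

With these set, a placeholder such as {PROJECT_ID} in the later commands can be written as "$PROJECT_ID".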

1. Configure a BQ connection to enable requests to Cloud Run services

Your BigQuery project will need to make requests to the two Cloud Run services configured in this guide. To configure the BQ connection that allows this usage, you'll need to run the following command:

Create a connection from a command line:

bq mk \
    --connection \
    --project_id={PROJECT_ID} \
    --location={REGION} \
    --connection_type=CLOUD_RESOURCE \
    carto-conn

Replace the following:

  • PROJECT_ID: your Google Cloud project ID

  • REGION: your connection region. US and EU regions are not available, so you'll have to select a more specific GCP region. You can check the list of available regions here
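
Because the US and EU multi-regions are not accepted here, a small guard can catch that mistake before the connection is created. This is a sketch under stated assumptions: `is_single_region` and `create_carto_connection` are hypothetical helper names, and the guard only rejects the two multi-region values mentioned above, it does not check that the region actually exists.

```shell
# is_single_region REGION: succeed unless REGION is empty or a multi-region (US, EU).
is_single_region() {
  case "$1" in
    ""|US|us|EU|eu) return 1 ;;
    *) return 0 ;;
  esac
}

# create_carto_connection PROJECT_ID REGION: wraps the bq mk command shown above.
create_carto_connection() {
  if ! is_single_region "$2"; then
    echo "Region '$2' is not supported here: pick a specific GCP region" >&2
    return 1
  fi
  bq mk --connection --project_id="$1" --location="$2" \
    --connection_type=CLOUD_RESOURCE carto-conn
}
```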

Once the connection has been configured, GCP will automatically create a service account that we'll need to use to grant permissions to access the Cloud Run services. You can check that the service account has been created correctly by running the following command:

Obtain the Service Account created when configuring a BQ connection:

bq show --format json \
    --connection {PROJECT_ID}.{REGION}.carto-conn

Replace the following:

  • PROJECT_ID: your Google Cloud project ID

  • REGION: your connection region
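
To reuse the service account in later commands, its ID can be pulled out of the JSON output. The sketch below assumes the output contains a `serviceAccountId` field, as the connection description normally does for CLOUD_RESOURCE connections; `extract_service_account` is a hypothetical helper that uses only sed, so no extra tools are required.

```shell
# extract_service_account: read connection JSON on stdin and print the value
# of the "serviceAccountId" field, if the field is present.
extract_service_account() {
  sed -n 's/.*"serviceAccountId"[[:space:]]*:[[:space:]]*"\([^"]*\)".*/\1/p'
}

# Usage against a real connection (requires an authenticated bq session):
#   bq show --format json --connection "${PROJECT_ID}.${REGION}.carto-conn" \
#     | extract_service_account
```

If the function prints nothing, the field was not found and the raw output should be inspected by hand.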

2. Deploy the AT Gateway container in Cloud Run

The BQ connection created in the previous step will have to call a Cloud Run service to use the AT. This service is the AT Gateway container. Before creating it, we'll need to create a subnetwork, because the service reaches your VPC through a Serverless VPC Access connector:

  • Create subnet for the VPC Access Connector:

gcloud compute networks subnets create vpc-conn-carto \
    --network={VPC_NETWORK} \
    --range={SUBNETWORK_IPS_RANGE} \
    --region={REGION} \
    --project={PROJECT_ID} \
    --enable-private-ip-google-access

Replace the following:

  • VPC_NETWORK: the name of the network created in your VPC project

  • SUBNETWORK_IPS_RANGE: the range of IPs that this subnetwork will use

The IP range selected for the subnetwork must be a /28 CIDR block (16 addresses).

  • REGION: the GCP region used when creating the BQ connection in the previous step. It has to be exactly the same region

  • PROJECT_ID: your Google Cloud project ID
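
The /28 requirement can be checked before running the command. The following is a minimal sketch: `is_valid_slash28` is a hypothetical helper, not part of gcloud. It checks only the shape of the range (four octets, a /28 suffix, and a start address on a 16-address boundary, which is where a /28 block normally begins). It does not check whether the range is free in your VPC.

```shell
# is_valid_slash28 CIDR: succeed if CIDR looks like an aligned IPv4 /28 block.
is_valid_slash28() {
  cidr=$1
  case "$cidr" in
    */28) addr=${cidr%/28} ;;
    *) return 1 ;;
  esac
  # Only digits and dots are allowed, with no leading or trailing dot.
  case "$addr" in
    ""|*[!0-9.]*|.*|*.) return 1 ;;
  esac
  # Split the address into its octets.
  old_ifs=$IFS
  IFS=.
  set -- $addr
  IFS=$old_ifs
  [ "$#" -eq 4 ] || return 1
  for octet in "$@"; do
    # Reject empty octets, leading zeros and anything longer than 3 digits.
    case "$octet" in
      ""|0?*|????*) return 1 ;;
    esac
    [ "$octet" -le 255 ] || return 1
  done
  # A /28 block spans 16 addresses, so the last octet must be a multiple of 16.
  [ $(( $4 % 16 )) -eq 0 ]
}
```

For example, `is_valid_slash28 10.8.0.0/28` succeeds, while `is_valid_slash28 10.8.0.0/24` and `is_valid_slash28 10.8.0.8/28` fail.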

Now that the subnet is correctly configured, you'll need to create a Serverless VPC Access connector for the Cloud Run services.

  • Create connector for the Cloud Run services:

gcloud compute networks vpc-access connectors create carto-vpc-access-conn \
    --project={PROJECT_ID} \
    --region={REGION} \
    --subnet=vpc-conn-carto \
    --min-instances=2 \
    --max-instances=5 \
    --machine-type=e2-micro

Replace the following variables:

  • PROJECT_ID: your Google Cloud project ID

  • REGION: the same GCP region used when creating the BQ connection in the previous step

Once the connector has been correctly created, we can proceed with the Cloud Run services deployment. You'll have to execute the following commands:

  • Deploy AT Gateway service

gcloud run deploy carto-at-gateway \
    --project={PROJECT_ID} \
    --region={REGION} \
    --tag=carto-at-gateway \
    --allow-unauthenticated \
    --vpc-connector=carto-vpc-access-conn \
    --vpc-egress=all-traffic \
    --ingress=internal \
    --port=8080 \
    --set-env-vars=AT_GATEWAY_CLOUD_RUN_REGION={REGION},NODE_TLS_REJECT_UNAUTHORIZED=0 \
    --command=npm \
    --args=run,start:cloud-run \
    --image=gcr.io/carto-onprem-artifacts/at-gateway/cloud-run:latest

Setting the NODE_TLS_REJECT_UNAUTHORIZED environment variable to 0 disables TLS certificate verification, so that the AT Gateway accepts custom TLS certificates used by the Self-hosted deployment.

Replace the following variables:

  • PROJECT_ID: your Google Cloud project ID

  • REGION: the same GCP region used when creating the BQ connection in the previous step

3. Create DNS entry for CARTO Self-hosted platform

The AT Gateway service will need to access the CARTO Self-hosted LDS API to perform requests to the different LDS providers. As the requests will be handled inside the VPC, it's mandatory to add an internal DNS record so that the Cloud Run service can reach the CARTO platform APIs.

First, obtain the internal IP address of the CARTO Self-hosted platform. Once you have it, you can create a DNS zone inside GCP using the following command:

If you already have an internal DNS configured in your GCP project you can skip this step and directly add a new domain pointing to the CARTO platform internal IP address.

gcloud dns managed-zones create carto-io \
    --project={PROJECT_ID} \
    --dns-name={DNS_ZONE_NAME} \
    --description="Internal DNS zone for CARTO selfhosted" \
    --networks={VPC_NETWORK} \
    --visibility=private

Replace the following variables:

  • PROJECT_ID: your Google Cloud project ID

  • DNS_ZONE_NAME: the DNS name that your new zone will use

  • VPC_NETWORK: name of the VPC network created in your GCP project

Then we'll have to create a new record inside the new DNS zone, configuring a domain that points to the CARTO Self-hosted platform's internal IP address:

  1. Start a transaction to add a record in your DNS zone

gcloud dns record-sets transaction start \
    --project={PROJECT_ID} \
    --zone={DNS_ZONE}

  2. Add the new domain to your DNS zone

gcloud dns record-sets transaction add {CARTO_PLATFORM_IP} \
    --project={PROJECT_ID} \
    --name={INTERNAL_DOMAIN} \
    --ttl=300 \
    --type=A \
    --zone={DNS_ZONE}

  3. Execute the transaction to write the new changes in your DNS zone

gcloud dns record-sets transaction execute \
    --project={PROJECT_ID} \
    --zone={DNS_ZONE}

Replace the following:

  • PROJECT_ID: your Google Cloud project ID

  • DNS_ZONE: the name of your DNS zone (carto-io if you created it with the command above)

  • CARTO_PLATFORM_IP: internal IP address of your CARTO Self-hosted deployment

  • INTERNAL_DOMAIN: the internal domain that will be pointing to your CARTO Self-hosted deployment inside your VPC

Make sure that CARTO_PLATFORM_IP in the previous command is the internal IP address used by your CARTO Self-hosted installation.
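
Before starting the transaction, it can help to confirm that the internal domain actually belongs to the zone. The sketch below assumes the fully qualified form with a trailing dot, which is how Cloud DNS examples usually write DNS names. `to_fqdn` and `in_zone` are hypothetical helpers and the domain names are placeholders.

```shell
# to_fqdn NAME: print NAME with exactly one trailing dot appended if missing.
to_fqdn() {
  case "$1" in
    *.) printf '%s\n' "$1" ;;
    *) printf '%s.\n' "$1" ;;
  esac
}

# in_zone DOMAIN ZONE_DNS_NAME: succeed if DOMAIN is the zone apex or a subdomain of it.
in_zone() {
  domain=$(to_fqdn "$1")
  zone=$(to_fqdn "$2")
  case "$domain" in
    "$zone"|*."$zone") return 0 ;;
    *) return 1 ;;
  esac
}

# After executing the transaction, the record can be listed with:
#   gcloud dns record-sets list --project="$PROJECT_ID" --zone="$DNS_ZONE" \
#     --name="$(to_fqdn "$INTERNAL_DOMAIN")" --type=A
```

For instance, `in_zone carto.internal.example internal.example.` succeeds, while a domain outside the zone fails.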

4. Check firewall rules to ensure that Cloud Run can reach the Self-hosted instance

Cloud Run services need access to the CARTO Self-hosted environment, so you'll have to check that the firewall rules configured on your project allow the traffic between these two pieces.

The CARTO Self-hosted platform has to be accessible through port 443, and it should be allowed to respond to requests performed by the Cloud Run services deployed in the previous steps.

All requests will be handled inside the VPC, so all network traffic involved in this process will take place between the subnetworks created and the CARTO Self-hosted instance.
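
If no existing rule covers this traffic, one possible way to allow it is an ingress rule for TCP port 443 whose source is the connector subnetwork range. This is a sketch under stated assumptions, not a required step: the rule name `allow-carto-at-gateway-443`, the `allow_cloud_run_to_carto` helper and the optional `GCLOUD` override (useful for printing the command instead of running it) are all hypothetical.

```shell
# allow_cloud_run_to_carto PROJECT_ID VPC_NETWORK SUBNETWORK_IPS_RANGE
# Creates an ingress rule that lets the connector subnetwork reach port 443.
# Set GCLOUD=echo to print the command instead of executing it.
allow_cloud_run_to_carto() {
  "${GCLOUD:-gcloud}" compute firewall-rules create allow-carto-at-gateway-443 \
    --project="$1" \
    --network="$2" \
    --direction=INGRESS \
    --action=ALLOW \
    --rules=tcp:443 \
    --source-ranges="$3"
}

# Dry run:
#   GCLOUD=echo allow_cloud_run_to_carto my-project my-vpc 10.8.0.0/28
```

As written, the rule applies to every instance in the network. Adding `--target-tags` with a tag carried by the CARTO Self-hosted instance would narrow it, if your deployment uses network tags.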

Configure the AT Gateway in your CARTO Analytics Toolbox installation

Now that we've both installed the Analytics Toolbox and deployed the required infrastructure pieces in GCP, we have to configure the Analytics Toolbox so that it's able to use the AT Gateway.

The Analytics Toolbox provides a procedure to update the configuration values required by the remote functions. These values can be configured by executing the following query in your BigQuery project:

CALL carto.SETUP("""{
  "connection": "{CONNECTION}",
  "endpoint": "{ENDPOINT}",
  "api_base_url": "{API_BASE_URL}",
  "api_access_token": "{API_ACCESS_TOKEN}"
}""");

Replace the following:

  • CONNECTION: name of the BQ connection created in step 1. The default value is {PROJECT_ID}.{REGION}.carto-conn

  • ENDPOINT: endpoint of the AT Gateway function deployed in Cloud Run

  • API_BASE_URL: the API base URL of your CARTO Self-hosted platform.

  • API_ACCESS_TOKEN: access token generated inside CARTO platform with permissions to use the LDS API

The expected ENDPOINT value can be obtained by executing the following command:

gcloud run services describe carto-at-gateway \
    --project={PROJECT_ID} \
    --region={REGION} \
    --format="value(status.address.url)"
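
The endpoint lookup and the SETUP call can be combined so that the values are filled in only once. The following is a minimal sketch: `render_setup_sql` is a hypothetical helper that only prints the query, and the variable names in the usage comment are assumptions.

```shell
# render_setup_sql CONNECTION ENDPOINT API_BASE_URL API_ACCESS_TOKEN
# Prints the carto.SETUP call with the four values filled in.
render_setup_sql() {
  cat <<EOF
CALL carto.SETUP("""{
  "connection": "$1",
  "endpoint": "$2",
  "api_base_url": "$3",
  "api_access_token": "$4"
}""");
EOF
}

# Usage (requires authenticated gcloud and bq sessions):
#   ENDPOINT=$(gcloud run services describe carto-at-gateway \
#     --project="$PROJECT_ID" --region="$REGION" \
#     --format="value(status.address.url)")
#   render_setup_sql "$PROJECT_ID.$REGION.carto-conn" "$ENDPOINT" \
#     "$API_BASE_URL" "$API_ACCESS_TOKEN" | bq query --use_legacy_sql=false
```

The helper does not escape its arguments, so it is only suitable for values that contain no double quotes or backslashes.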

After running the previous query, the CARTO Analytics Toolbox should be ready to work in your BigQuery project. To check that the installation process has worked as expected, you can execute the following queries in the BigQuery console. They create a table called geocode_test_table containing a geocoded address.

CREATE TABLE {DATASET}.geocode_test_table AS (
  SELECT "Madrid" AS address
);

CALL carto.GEOCODE_TABLE(NULL,NULL,'{PROJECT}.{DATASET}.geocode_test_table','address',NULL, NULL, NULL);

Now, remember to set up your connections to BigQuery with the correct Analytics Toolbox location setting to ensure that all queries generated by CARTO applications use it.
