# Installation in a Google Cloud VPC

This guide walks you through configuring the CARTO Analytics Toolbox to work within a VPC when your CARTO Self-hosted installation runs on Google Cloud Platform.

{% hint style="info" %}
**Is your CARTO Self-hosted deployment in a Google Cloud VPC?**

When the CARTO platform is self-hosted within a Google Cloud VPC, the functions and procedures of the Analytics Toolbox need to be accessed from within the same VPC.

That makes this installation method the only suitable one for this kind of CARTO deployment.
{% endhint %}

## Install CARTO Analytics Toolbox inside your BigQuery project

The first step is to [install the Analytics Toolbox in a BigQuery project of your own](https://docs.carto.com/data-and-analysis/analytics-toolbox-for-bigquery/getting-access/manual-installation-in-your-own-project).

Once the Analytics Toolbox is installed in your project, use this guide to deploy the AT Gateway in your VPC.

## Deploy the infrastructure needed to allow Location Data Services usage

Some functionalities of the CARTO Analytics Toolbox for BigQuery require making external calls from BigQuery to CARTO services. These calls are implemented via [**BigQuery Remote Functions**](https://cloud.google.com/bigquery/docs/remote-functions):

* AT Gateway: Creating isolines, geocoding, and routing require calls to the CARTO LDS API. Other functions of the Analytics Toolbox need to send requests to the CARTO Platform backend (for example, importing from a URL or the 'Send by Email' component in Workflows). For this purpose, the AT Gateway needs to be deployed in Cloud Run inside your VPC.

When installing the Analytics Toolbox manually in your own project, the following configuration is required:

* A [BigQuery connection](https://cloud.google.com/bigquery/docs/connections-api-intro) that allows BigQuery to call Cloud Run services.
* An AT Gateway endpoint inside your VPC.

### Architecture overview

To deploy the Analytics Toolbox within a VPC, the CARTO platform needs to deploy some additional infrastructure pieces within your GCP project. In the following diagram, you can check how all these pieces interact with each other:

<figure><img src="https://3029946802-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2FybPdpmLltPkzGFvz7m8A%2Fuploads%2Fgit-blob-f3b8887a3cb0361ebed8af45a2db7eca1093bd56%2FSelf-Hosted%20AT%20(1).png?alt=media" alt=""><figcaption></figcaption></figure>

We'll set up the following pieces inside your project to start using the Analytics Toolbox on your CARTO Self-hosted platform:

* One BQ connection used to perform requests against the Cloud Run service.
* One subnetwork used by the VPC Access Connector that the Cloud Run service requires.
* One Cloud Run service needed for BigQuery to interact with the Self-hosted platform.
* One VPC Serverless Access Connector that will be used by the Cloud Run services to access your VPC.
* An internal DNS record pointing to the IP address of your CARTO Self-hosted platform.

Follow these steps to set up the required infrastructure pieces:

{% hint style="info" %}
All of the following commands and instructions should be executed from the Cloud Shell in your console or from authenticated `gcloud` and `bq` CLI sessions.
{% endhint %}

### 1. Configure a BQ connection to enable requests to Cloud Run services

Your BigQuery project needs to make requests to the Cloud Run service configured in this guide. To create the BQ connection that allows this, run the following command:

{% tabs %}
{% tab title="bq" %}
Create a connection from a command line:

```sh
bq mk \
    --connection \
    --project_id={PROJECT_ID} \
    --location={REGION} \
    --connection_type=CLOUD_RESOURCE \
    carto-conn
```

Replace the following:

* `PROJECT_ID`: your Google Cloud project ID
* `REGION`: your connection region. `US` and `EU` regions are not available, so you'll have to select a more specific GCP region. You can check the list of available regions [here](https://cloud.google.com/bigquery/docs/locations#regions)
  {% endtab %}
  {% endtabs %}

Once the connection has been configured, GCP automatically creates a service account that is used to grant permissions to access the Cloud Run service. You can check that the service account has been created correctly by running the following command:

{% tabs %}
{% tab title="bq" %}
Obtain the Service Account created when configuring a BQ connection:

{% code overflow="wrap" %}

```sh
bq show --format json \
    --connection {PROJECT_ID}.{REGION}.carto-conn
```

{% endcode %}

Replace the following:

* `PROJECT_ID`: your Google Cloud project ID
* `REGION`: your connection region
  {% endtab %}
  {% endtabs %}
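
If you want to capture the service account in a script, the JSON printed by `bq show` can be filtered for it. The following is a minimal sketch that assumes the output contains a `serviceAccountId` field under `cloudResource`; the helper name `extract_service_account` and the sample payload are hypothetical and only serve as an illustration.

```bash
# Print the value of the "serviceAccountId" field found in the JSON read
# from stdin. Assumes the field appears as "serviceAccountId": "<email>".
extract_service_account() {
  grep -o '"serviceAccountId"[[:space:]]*:[[:space:]]*"[^"]*"' \
    | sed 's/.*"\([^"]*\)"$/\1/'
}

# Sample payload for illustration only; the e-mail address is made up.
sample='{"cloudResource":{"serviceAccountId":"bqcx-123-abcd@gcp-sa-bigquery-condel.iam.gserviceaccount.com"},"name":"projects/123/locations/europe-west1/connections/carto-conn"}'

printf '%s\n' "$sample" | extract_service_account
```

With real output, you would pipe the `bq show` command above into `extract_service_account` instead of the sample payload.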

### 2. Deploy the AT Gateway container in Cloud Run

The BQ connection created in the previous step needs to call a Cloud Run service in order to use the AT. This service is the AT Gateway container. Before creating it, we need to create a subnetwork, because the service reaches your VPC through a [VPC Access Connector](https://cloud.google.com/vpc/docs/serverless-vpc-access):

* Create subnet for the VPC Access Connector:

{% tabs %}
{% tab title="gcloud" %}
{% code overflow="wrap" %}

```bash
gcloud compute networks subnets create vpc-conn-carto \
    --network={VPC_NETWORK} \
    --range={SUBNETWORK_IPS_RANGE} \
    --region={REGION} \
    --project={PROJECT_ID} \
    --enable-private-ip-google-access
```

{% endcode %}

Replace the following:

* `VPC_NETWORK`: the name of the network created in your VPC project
* `SUBNETWORK_IPS_RANGE`: the range of IPs that this subnetwork will use

{% hint style="info" %}
The IP range selected for the subnetwork must be a /28 CIDR block.
{% endhint %}

* `REGION`: the GCP region used when creating the BQ connection in the previous step. It must match that region exactly
* `PROJECT_ID`: your Google Cloud project ID
  {% endtab %}
  {% endtabs %}
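
Before creating the subnet, it can help to check that the range you picked really is a /28 block. The sketch below is one possible pre-flight check, not part of the CARTO tooling: the function name `is_slash28` is hypothetical, and it only validates the IPv4 format, the `/28` prefix, and that the last octet is a multiple of 16 (a /28 block spans 16 addresses).

```bash
# Return 0 when $1 is an IPv4 CIDR block with a /28 prefix whose base
# address is aligned to a 16-address boundary; return 1 otherwise.
is_slash28() {
  local cidr="$1" ip octet count=0 last=0
  case "$cidr" in
    */28) ip="${cidr%/28}" ;;
    *) return 1 ;;
  esac
  local IFS=.
  for octet in $ip; do
    # Reject empty or non-numeric octets.
    case "$octet" in
      ''|*[!0-9]*) return 1 ;;
    esac
    [ "$octet" -le 255 ] || return 1
    count=$((count + 1))
    last=$octet
  done
  # Exactly four octets, and the last one must be a multiple of 16.
  [ "$count" -eq 4 ] && [ $((10#$last % 16)) -eq 0 ]
}

# Example range for illustration only.
if is_slash28 "10.8.0.0/28"; then
  echo "range looks like a valid /28 block"
fi
```

The check does not verify that the range is free in your VPC; `gcloud` will still report overlaps with existing subnets.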

Now that the subnet is correctly configured, you'll need to create a Serverless VPC Access connector for the Cloud Run services.

* Create connector for the Cloud Run services:

{% tabs %}
{% tab title="gcloud" %}

```bash
gcloud compute networks vpc-access connectors create carto-vpc-access-conn \
    --project={PROJECT_ID} \
    --region={REGION} \
    --subnet=vpc-conn-carto \
    --min-instances=2 \
    --max-instances=5 \
    --machine-type=e2-micro
```

Replace the following variables:

* `PROJECT_ID`: your Google Cloud project ID
* `REGION`: the same GCP region used when creating the BQ connection in the previous step
  {% endtab %}
  {% endtabs %}

Once the connector has been created, we can proceed with the Cloud Run service deployment. Run the following command:

* Deploy AT Gateway service

{% tabs %}
{% tab title="gcloud" %}

```bash
gcloud run deploy carto-at-gateway  \
    --project={PROJECT_ID} \
    --region={REGION} \
    --tag=carto-at-gateway \
    --allow-unauthenticated \
    --vpc-connector=carto-vpc-access-conn \
    --vpc-egress=all-traffic \
    --ingress=internal \
    --port=8080 \
    --set-env-vars=AT_GATEWAY_CLOUD_RUN_REGION={REGION},NODE_TLS_REJECT_UNAUTHORIZED=0 \
    --image=gcr.io/carto-onprem-artifacts/at-gateway/cloud-run:latest
```

{% hint style="info" %}
Setting the `NODE_TLS_REJECT_UNAUTHORIZED` environment variable to `0` disables TLS certificate verification, so that the AT Gateway can connect to a Self-hosted deployment that uses custom TLS certificates.
{% endhint %}

Replace the following variables:

* `PROJECT_ID`: your Google Cloud project ID
* `REGION`: the same GCP region used when creating the BQ connection in the previous step
  {% endtab %}
  {% endtabs %}

### 3. Create DNS entry for CARTO Self-hosted platform

The AT Gateway service needs to access the CARTO Self-hosted LDS API to perform requests to the different LDS providers. As the requests are handled inside the VPC, you must add an internal DNS record so that the Cloud Run service can reach the CARTO platform APIs.

First, **obtain the internal IP address of the CARTO Self-hosted platform**. Once you have the internal IP, you can create a DNS zone inside GCP using the following command:

{% tabs %}
{% tab title="gcloud" %}
{% hint style="danger" %}
If you already have an internal DNS configured in your GCP project you can skip this step and directly add a new domain pointing to the CARTO platform internal IP address.
{% endhint %}

```sh
gcloud dns managed-zones create carto-io \
    --project={PROJECT_ID} \
    --dns-name={DNS_ZONE_NAME} \
    --description="Internal DNS zone for CARTO selfhosted" \
    --networks={VPC_NETWORK} \
    --visibility=private
```

Replace the following variables:

* `PROJECT_ID`: your Google Cloud project ID
* `DNS_ZONE_NAME`: the DNS name that your new DNS zone will use
* `VPC_NETWORK`: name of the VPC network created in your GCP project
  {% endtab %}
  {% endtabs %}

Then we'll have to create a new record inside the new DNS zone, configuring a domain that points to the internal IP address of the CARTO Self-hosted platform:

{% tabs %}
{% tab title="gcloud" %}

1. Start a transaction to add a record in your DNS zone

```sh
gcloud dns record-sets transaction start \
    --project={PROJECT_ID} \
    --zone={DNS_ZONE}
```

2. Add the new domain to your DNS zone

```sh
gcloud dns record-sets transaction add {CARTO_PLATFORM_IP} \
    --project={PROJECT_ID} \
    --name={INTERNAL_DOMAIN} \
    --ttl=300 \
    --type=A \
    --zone={DNS_ZONE}
```

3. Execute the transaction to write the new changes in your DNS zone

```sh
gcloud dns record-sets transaction execute \
    --project={PROJECT_ID} \
    --zone={DNS_ZONE}
```

Replace the following:

* `PROJECT_ID`: your Google Cloud project ID
* `DNS_ZONE`: the name of your DNS zone (`carto-io` if you created it with the command above)
* `CARTO_PLATFORM_IP`: internal IP address of your CARTO Self-hosted deployment
* `INTERNAL_DOMAIN`: the internal domain that will be pointing to your CARTO Self-hosted deployment inside your VPC
  {% endtab %}
  {% endtabs %}

{% hint style="danger" %}
Make sure that `CARTO_PLATFORM_IP` in the previous commands is replaced with the internal IP address used by your CARTO Self-hosted installation.
{% endhint %}
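
A record can only be added to a zone when its name falls under the zone's DNS name. The sketch below is a hypothetical pre-flight check (the helper names `fqdn` and `record_in_zone` are not part of `gcloud`) that compares `INTERNAL_DOMAIN` against `DNS_ZONE_NAME` after normalizing both to fully qualified names with a trailing dot. It assumes lowercase input and does no case folding.

```bash
# Append a trailing dot when the name is not already fully qualified.
fqdn() {
  case "$1" in
    *.) printf '%s\n' "$1" ;;
    *)  printf '%s.\n' "$1" ;;
  esac
}

# Return 0 when record name $1 equals or falls under zone DNS name $2.
record_in_zone() {
  local name zone
  name="$(fqdn "$1")"
  zone="$(fqdn "$2")"
  case "$name" in
    "$zone"|*."$zone") return 0 ;;
    *) return 1 ;;
  esac
}

# Example values for illustration only.
if record_in_zone "carto.internal.example.com" "internal.example.com"; then
  echo "record name belongs to the zone"
fi
```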

### 4. Check firewall rules to ensure that Cloud Run can reach the Self-hosted instance

The Cloud Run service needs access to the CARTO Self-hosted environment, so you'll have to check that the firewall rules configured in your project allow traffic between these two pieces.

The CARTO Self-hosted platform has to be accessible on port 443, and it must be allowed to respond to requests made by the Cloud Run service deployed in the previous steps.

All requests will be handled inside the VPC, so all network traffic involved in this process will take place between the subnetworks created and the CARTO Self-hosted instance.
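
If your project has no rule that covers this traffic, an ingress rule along the following lines may be what you need. This is only a sketch under stated assumptions: the rule name `allow-at-gateway-https` is a hypothetical placeholder, the source range is assumed to be the /28 subnet created for the VPC Access Connector, and the helper prints the `gcloud` command instead of running it so that you can review it first.

```bash
# Print (rather than run) a gcloud command that would allow HTTPS traffic
# from the connector subnet. The rule name is a hypothetical placeholder.
# Arguments: project ID, VPC network name, connector subnet range.
print_firewall_rule_cmd() {
  local project="$1" network="$2" source_range="$3"
  printf 'gcloud compute firewall-rules create %s --project=%s --network=%s --direction=INGRESS --action=ALLOW --rules=tcp:443 --source-ranges=%s\n' \
    "allow-at-gateway-https" "$project" "$network" "$source_range"
}

# Example values for illustration only.
print_firewall_rule_cmd my-project my-vpc 10.8.0.0/28
```

You may prefer to narrow the rule further, for example with target tags that match only the CARTO Self-hosted instance.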

## Configure the AT Gateway in your CARTO Analytics Toolbox installation

Now that we've both installed the Analytics Toolbox and deployed the required infrastructure pieces in GCP, we have to configure the Analytics Toolbox so that it's able to use the AT Gateway.

The Analytics Toolbox provides a procedure that updates the configuration values required by the remote functions. You can configure them by executing the following query in your BigQuery project:

{% tabs %}
{% tab title="bq" %}

```sql
CALL carto.SETUP("""{
  "connection": "{CONNECTION}",
  "endpoint": "{ENDPOINT}",
  "api_base_url": "{API_BASE_URL}",
  "api_access_token": "{API_ACCESS_TOKEN}"
}""");
```

Replace the following:

* `CONNECTION`: name of the connection created in step 1. The default value is `{PROJECT_ID}.{REGION}.carto-conn`
* `ENDPOINT`: endpoint of the AT Gateway function deployed in Cloud Run
* `API_BASE_URL`: the [API base URL](https://docs.carto.com/carto-user-manual/developers/managing-credentials/api-base-url) of your CARTO Self-hosted platform.
* `API_ACCESS_TOKEN`: access token generated inside CARTO platform with permissions to use the LDS API

The expected `ENDPOINT` value can be obtained by executing the following command:

```sh
gcloud run services describe carto-at-gateway \
    --project={PROJECT_ID} \
    --region={REGION} \
    --format="value(status.address.url)"
```

{% endtab %}
{% endtabs %}
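
If you keep these values in shell variables, the `CALL carto.SETUP` statement can be generated rather than edited by hand. The following sketch uses a hypothetical helper, `build_setup_call`, that prints the statement for four arguments. All sample values are made up, and the helper does no escaping, so it assumes that none of the values contain double quotes.

```bash
# Print a CALL carto.SETUP(...) statement for the given values.
# Arguments: connection, endpoint, API base URL, API access token.
build_setup_call() {
  cat <<EOF
CALL carto.SETUP("""{
  "connection": "$1",
  "endpoint": "$2",
  "api_base_url": "$3",
  "api_access_token": "$4"
}""");
EOF
}

# Example values for illustration only.
build_setup_call \
  "my-project.europe-west1.carto-conn" \
  "https://gateway.example.com" \
  "https://carto.example.com/api" \
  "example-token"
```

The printed statement can then be pasted into the BigQuery console.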

After running the previous query, the CARTO Analytics Toolbox should be ready to work in your BigQuery project. To check that the installation process has worked as expected, you can execute the following queries in the BigQuery console. They create a table called `geocode_test_table` and geocode the address it contains.

```sql
CREATE TABLE {DATASET}.geocode_test_table AS (
  SELECT "Madrid" AS address
);

CALL carto.GEOCODE_TABLE(NULL,NULL,'{PROJECT}.{DATASET}.geocode_test_table','address',NULL, NULL, NULL);
```

:tada: **Congratulations!** If the previous queries finish and you can obtain the geocoded address by querying the table that has been created, your CARTO Analytics Toolbox is successfully installed and configured inside your VPC.

Now, remember to set up your connections to BigQuery with the correct [**Analytics Toolbox location**](https://docs.carto.com/carto-user-manual/connections/bigquery#advanced-options) setting to ensure that all queries generated by CARTO applications use it.
