# Installation in an Amazon Web Services VPC

This guide walks you through configuring the CARTO Analytics Toolbox to work with a CARTO Self-hosted installation inside a VPC on Amazon Web Services.

## Overview

When deploying the Analytics Toolbox in a VPC environment with CARTO Self-hosted, you need to:

1. Set up the VPC infrastructure (subnet, security group, VPC endpoint)
2. Create IAM roles with VPC access permissions
3. Run the Analytics Toolbox installer with the pre-configured roles
4. Update the Lambda functions with VPC configuration
5. Configure DNS for the CARTO Self-hosted platform
6. Configure the AT Gateway

### Architecture overview

To deploy the Analytics Toolbox within a VPC, the following infrastructure pieces are needed:

<figure><img src="https://3029946802-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2FybPdpmLltPkzGFvz7m8A%2Fuploads%2Fgit-blob-724c7b7c5e6c64d076160c013ac846a543cc0365%2FSelf-Hosted%20AT.png?alt=media" alt=""><figcaption></figcaption></figure>

* A subnetwork used to deploy the network interfaces created by the Lambda functions
* Lambda functions for Redshift to interact with the Self-hosted platform
* An internal DNS record pointing to the IP address of your CARTO Self-hosted platform
* A VPC endpoint to allow communication between your Redshift instance and the VPC where the CARTO Self-hosted platform is installed

{% hint style="info" %}
All of the following commands should be executed from an authenticated `aws` CLI session.
{% endhint %}

## Step 1: Prepare VPC Infrastructure

Before running the installer, you need to set up the networking infrastructure that the Lambda functions will use.

### 1.1 Create a subnet for the Lambda function

{% code overflow="wrap" %}

```bash
aws ec2 create-subnet \
    --vpc-id {VPC_NETWORK} \
    --cidr-block {SUBNETWORK_IPS_RANGE} \
    --region {REGION} \
    --tag-specifications 'ResourceType=subnet,Tags=[{Key=Name,Value={SUBNETWORK_NAME}}]'
```

{% endcode %}

Replace the following:

* `VPC_NETWORK`: the ID of the network created in your VPC project
* `SUBNETWORK_IPS_RANGE`: the range of IPs that this subnetwork will use
* `REGION`: the region used to create the subnetwork
* `SUBNETWORK_NAME`: the name of the subnetwork that will be created

{% hint style="info" %}
The IP range selected for the subnetwork must be a /24 CIDR block.
{% endhint %}

Save the **Subnet ID** from the output for later use.
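If you did not save the ID, it can be recovered later by filtering on the `Name` tag assigned at creation time; a sketch using the same placeholders as above:

```bash
# look up the subnet ID by the Name tag set in step 1.1
aws ec2 describe-subnets \
  --filters "Name=tag:Name,Values={SUBNETWORK_NAME}" \
  --region {REGION} \
  --query "Subnets[0].SubnetId" \
  --output text
```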

### 1.2 Create a security group for the Lambda function

{% code overflow="wrap" %}

```bash
aws ec2 create-security-group \
  --group-name {GROUP_NAME} \
  --region {REGION} \
  --description "Security Group for AT Gateway Lambda Function" \
  --vpc-id {VPC_NETWORK}
```

{% endcode %}

{% hint style="info" %}
This security group will be used by your AT Gateway Lambda function. It must allow traffic to and from both Redshift and your CARTO Self-hosted installation.
{% endhint %}

Replace the following:

* `GROUP_NAME`: the name of the security group
* `REGION`: the region used to create the security group
* `VPC_NETWORK`: the ID of the network created in your VPC project

Save the **Security Group ID** from the output for later use.

### 1.3 Provision a VPC endpoint for Lambda

{% code overflow="wrap" %}

```bash
aws ec2 create-vpc-endpoint \
    --vpc-id {VPC_NETWORK} \
    --service-name com.amazonaws.{REGION}.lambda \
    --vpc-endpoint-type Interface \
    --security-group-ids {SECURITY_GROUP_ID} \
    --region {REGION} \
    --private-dns-enabled
```

{% endcode %}

Replace the following:

* `VPC_NETWORK`: the ID of the network created in your VPC project
* `REGION`: the region used to create the VPC endpoint
* `SECURITY_GROUP_ID`: ID of the security group created in the previous step

## Step 2: Create IAM Role for Lambda with VPC Access

Create a Lambda execution role that has permissions to access VPC resources.

### 2.1 Create the Lambda execution role

```bash
aws iam create-role \
    --role-name {ROLE_NAME} \
    --description "Role for CARTO AT Gateway Lambda Function with VPC access" \
    --assume-role-policy-document file://lambda-trust-policy.json
```

Replace the following:

* `ROLE_NAME`: the name of the role (e.g., `CartoATLambdaVPCRole`)

The `lambda-trust-policy.json` file should contain:

```json
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "AllowLambdaToAssumeRole",
      "Effect": "Allow",
      "Principal": {
        "Service": "lambda.amazonaws.com"
      },
      "Action": "sts:AssumeRole"
    }
  ]
}
```
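If you prefer to generate the file from the shell, a heredoc avoids copy-paste errors and lets you validate the policy immediately (a sketch; the policy content is exactly the one above):

```bash
# write the trust policy and check that it parses as JSON
cat > lambda-trust-policy.json <<'EOF'
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "AllowLambdaToAssumeRole",
      "Effect": "Allow",
      "Principal": { "Service": "lambda.amazonaws.com" },
      "Action": "sts:AssumeRole"
    }
  ]
}
EOF
python3 -m json.tool lambda-trust-policy.json >/dev/null && echo "trust policy OK"
```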

### 2.2 Attach VPC execution permissions to the role

```bash
aws iam attach-role-policy \
    --role-name {ROLE_NAME} \
    --policy-arn arn:aws:iam::aws:policy/service-role/AWSLambdaVPCAccessExecutionRole
```

Replace the following:

* `ROLE_NAME`: the name of the role created in the previous step

Save the **Role ARN** for use with the installer.

## Step 3: Configure Security Groups

Ensure that the security groups allow traffic between the Lambda functions and the CARTO Self-hosted environment.

The CARTO Self-hosted platform must be reachable on port 443, and it must allow responses to requests from the Lambda functions deployed in the previous steps.

All requests are handled inside the VPC, so all network traffic flows between the created subnetwork and the CARTO Self-hosted instance.
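As an illustration, the port 443 rule could be added to the platform's security group with a command like the following. `CARTO_PLATFORM_SG_ID` is a hypothetical placeholder (not defined earlier in this guide) for the security group protecting your CARTO Self-hosted instance:

```bash
# allow HTTPS (443) from the Lambda security group to the CARTO Self-hosted platform
aws ec2 authorize-security-group-ingress \
  --group-id {CARTO_PLATFORM_SG_ID} \
  --protocol tcp \
  --port 443 \
  --source-group {SECURITY_GROUP_ID} \
  --region {REGION}
```

Security groups are stateful, so responses to this traffic are allowed automatically without an extra egress rule on the platform side.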

## Step 4: Create DNS Entry for CARTO Self-hosted Platform

The Lambda functions need to access the CARTO Self-hosted LDS API. Since requests are handled inside the VPC, you need an internal DNS entry for the Lambda functions to reach the CARTO platform APIs.

First, obtain the **internal IP address** of the CARTO Self-hosted platform.
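If the platform runs on an EC2 instance, its private IP can be obtained with `describe-instances`; a sketch assuming a hypothetical `CARTO_INSTANCE_ID` placeholder for that instance:

```bash
# fetch the private (VPC-internal) IP of the instance running CARTO Self-hosted
aws ec2 describe-instances \
  --instance-ids {CARTO_INSTANCE_ID} \
  --region {REGION} \
  --query "Reservations[0].Instances[0].PrivateIpAddress" \
  --output text
```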

{% hint style="danger" %}
If you already have an internal DNS configured in your AWS project, you can skip creating a new hosted zone and directly add a new record pointing to the CARTO platform internal IP address.
{% endhint %}

### 4.1 Create a DNS zone (if needed)

```bash
aws route53 create-hosted-zone \
    --name {DNS_ZONE} \
    --vpc '{"VPCRegion":"{REGION}","VPCId":"{VPC_ID}"}' \
    --caller-reference $(date +%s)
```

* `DNS_ZONE`: the name of your DNS zone
* `REGION`: region where the zone is going to be created
* `VPC_ID`: your AWS VPC ID

### 4.2 Create a DNS record pointing to CARTO Self-hosted

```bash
aws route53 change-resource-record-sets \
  --hosted-zone-id {DNS_ZONE_ID} \
  --change-batch '{
    "Changes": [
      {
        "Action": "CREATE",
        "ResourceRecordSet": {
          "Name": "{INTERNAL_DOMAIN}",
          "Type": "A",
          "TTL": 300,
          "ResourceRecords": [
            {
              "Value": "{CARTO_PLATFORM_IP}"
            }
          ]
        }
      }
    ]
  }'
```

Replace the following:

* `DNS_ZONE_ID`: the ID of your DNS zone
* `INTERNAL_DOMAIN`: the internal domain that will point to your CARTO Self-hosted deployment inside your VPC
* `CARTO_PLATFORM_IP`: internal IP address of your CARTO Self-hosted deployment

## Step 5: Run the Analytics Toolbox Installer

Now that the VPC infrastructure is ready, run the Analytics Toolbox installer with the pre-created Lambda execution role.

{% hint style="info" %}
The Analytics Toolbox for Redshift is available for CARTO customers. Please get in touch with [**support@carto.com**](mailto:support@carto.com) to get the installation package.
{% endhint %}

### 5.1 Extract and prepare the installer

```bash
# Extract the package
unzip carto-at-redshift-<version>.zip
cd carto-at-redshift-<version>

# Setup Python environment
python3 -m venv .venv && source .venv/bin/activate
pip install -r scripts/requirements.txt
```

### 5.2 Run the installer with the VPC-enabled role

```bash
python scripts/install.py \
  --non-interactive \
  --aws-region {REGION} \
  --rs-lambda-prefix {LAMBDA_PREFIX} \
  --rs-lambda-execution-role {LAMBDA_ROLE_ARN} \
  --rs-host {REDSHIFT_HOST} \
  --rs-database {REDSHIFT_DATABASE} \
  --rs-user {REDSHIFT_USER} \
  --rs-password "{REDSHIFT_PASSWORD}" \
  --rs-schema carto
```

Replace the following:

* `REGION`: the AWS region where your Redshift cluster is deployed
* `LAMBDA_PREFIX`: prefix for Lambda function names (e.g., `carto-at-vpc-`)
* `LAMBDA_ROLE_ARN`: ARN of the Lambda execution role created in Step 2
* `REDSHIFT_HOST`: your Redshift cluster endpoint
* `REDSHIFT_DATABASE`: your Redshift database name
* `REDSHIFT_USER`: Redshift admin username
* `REDSHIFT_PASSWORD`: Redshift admin password

## Step 6: Update Lambda Functions with VPC Configuration

After the installer completes, you need to update the deployed Lambda functions with VPC configuration so they can access your CARTO Self-hosted platform.

### 6.1 List the deployed Lambda functions

The installer creates Lambda functions with your specified prefix. List them to get the function names:

```bash
aws lambda list-functions --query "Functions[?starts_with(FunctionName, '{LAMBDA_PREFIX}')].FunctionName" --output table
```

### 6.2 Update each Lambda function with VPC configuration

For each Lambda function, run:

```bash
aws lambda update-function-configuration \
  --function-name {LAMBDA_FUNCTION_NAME} \
  --vpc-config SubnetIds={SUBNET_ID},SecurityGroupIds={SECURITY_GROUP_ID} \
  --environment "Variables={NODE_TLS_REJECT_UNAUTHORIZED=0}" \
  --region {REGION}
```

Replace the following:

* `LAMBDA_FUNCTION_NAME`: name of the Lambda function to update
* `SUBNET_ID`: ID of the subnet created in Step 1.1
* `SECURITY_GROUP_ID`: ID of the security group created in Step 1.2
* `REGION`: AWS region

{% hint style="info" %}
The `NODE_TLS_REJECT_UNAUTHORIZED=0` environment variable disables TLS certificate verification, which is required when your Self-hosted deployment uses a custom or self-signed certificate.
{% endhint %}

### 6.3 Update Lambda retry configuration

For each Lambda function, disable retries to prevent duplicate operations:

```bash
aws lambda put-function-event-invoke-config \
  --function-name {LAMBDA_FUNCTION_NAME} \
  --maximum-retry-attempts 0 \
  --region {REGION}
```
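Steps 6.2 and 6.3 can also be combined into a single loop over every function matching the prefix; a sketch using the same placeholders as above:

```bash
# apply the VPC configuration and disable retries for every Lambda created by the installer
for fn in $(aws lambda list-functions --region {REGION} \
    --query "Functions[?starts_with(FunctionName, '{LAMBDA_PREFIX}')].FunctionName" \
    --output text); do
  aws lambda update-function-configuration \
    --function-name "$fn" \
    --vpc-config SubnetIds={SUBNET_ID},SecurityGroupIds={SECURITY_GROUP_ID} \
    --environment "Variables={NODE_TLS_REJECT_UNAUTHORIZED=0}" \
    --region {REGION}
  aws lambda put-function-event-invoke-config \
    --function-name "$fn" \
    --maximum-retry-attempts 0 \
    --region {REGION}
done
```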

## Step 7: Configure the AT Gateway

Now configure the Analytics Toolbox to use the deployed Lambda functions for LDS and other gateway functionalities.

Connect to your Redshift database and run the SETUP procedure:

```sql
CALL carto.SETUP('{
   "lambda": "{LAMBDA_FUNCTION_NAME}",
   "roles": "{REDSHIFT_INVOKE_ROLE_ARN}",
   "api_base_url": "{API_BASE_URL}",
   "api_access_token": "{API_ACCESS_TOKEN}"
}');
```

Replace the following:

* `LAMBDA_FUNCTION_NAME`: name of the AT Gateway Lambda function (with your prefix, e.g., `carto-at-vpc-lds`)
* `REDSHIFT_INVOKE_ROLE_ARN`: ARN of the role created by the installer to allow Redshift to invoke Lambda (check your Redshift cluster's associated IAM roles)
* `API_BASE_URL`: the [API base URL](https://docs.carto.com/carto-user-manual/developers/managing-credentials/api-base-url) of your CARTO Self-hosted platform
* `API_ACCESS_TOKEN`: access token generated inside CARTO platform with permissions to use the LDS API
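Redshift speaks the PostgreSQL wire protocol, so the procedure can also be run non-interactively with `psql`; a sketch assuming the default Redshift port 5439 and the same placeholders:

```bash
# run the SETUP procedure without an interactive SQL client
PGPASSWORD="{REDSHIFT_PASSWORD}" psql \
  "host={REDSHIFT_HOST} port=5439 dbname={REDSHIFT_DATABASE} user={REDSHIFT_USER}" <<'SQL'
CALL carto.SETUP('{
   "lambda": "{LAMBDA_FUNCTION_NAME}",
   "roles": "{REDSHIFT_INVOKE_ROLE_ARN}",
   "api_base_url": "{API_BASE_URL}",
   "api_access_token": "{API_ACCESS_TOKEN}"
}');
SQL
```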

## Congratulations!

Your CARTO Analytics Toolbox is now successfully installed and configured inside your VPC.

{% hint style="info" %}
After an installation or update of the Analytics Toolbox, the owner of the CARTO connection needs to refresh it by clicking the refresh button on the connection's card.

<img src="https://3029946802-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2FybPdpmLltPkzGFvz7m8A%2Fuploads%2Fgit-blob-acbc5e6073857c9a52e36f50e941c334c4ea9fdc%2FScreenshot%202024-01-10%20at%2016.43.37.png?alt=media" alt="" data-size="original">
{% endhint %}

Now you can start using the functions in the [SQL reference](https://docs.carto.com/analytics-toolbox-redshift/sql-reference/overview/).
