Detailed descriptions and requirements to guarantee a successful CARTO Self-Hosted deployment in your own cloud.
Depending on your chosen cloud provider and type of deployment, there are different possible architectural setups for CARTO Self-Hosted, as illustrated below.
The following diagram describes the different components of CARTO Self-Hosted. Each component maps to either a container or an external service in the deployment.
router-http
It's the main entry point of the application: an nginx reverse proxy that routes HTTPS traffic to the right components.
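As a rough illustration, the routing performed by router-http could look like the following nginx sketch. The upstream names and paths here are hypothetical, not the actual CARTO configuration:

```nginx
# Hypothetical reverse-proxy routing sketch; upstream names are illustrative.
server {
    listen 443 ssl;
    server_name carto.mycompany.com;

    # Workspace web application (static files) served at the root path
    location / {
        proxy_pass http://workspace-www;
    }

    # Accounts web application served at /acc/
    location /acc/ {
        proxy_pass http://accounts-www;
    }
}
```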
metadata-database
PostgreSQL database to manage the metadata of the CARTO platform.
Subscribers and APIs use this database to perform their operations.
More info at the external database section in deployment requirements.
message-broker
Message broker to exchange messages between the different pieces of the platform using the producer-consumer paradigm.
APIs or scheduler processes often produce messages to be consumed by the subscriber. For example, Imports API produces a message to import a file, and the import-worker consumes the message and performs the import operation.
It's not a container included in the Self-Hosted deployment; it uses Google Cloud Pub/Sub.
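The producer-consumer pattern described above can be sketched with Python's standard library, using an in-process queue as a stand-in for Google Cloud Pub/Sub. The message fields and function names are illustrative:

```python
import queue
import threading

# In-process stand-in for the message broker (CARTO uses Google Cloud Pub/Sub).
broker = queue.Queue()

def import_api_producer(file_url: str) -> None:
    """The Imports API publishes a message describing the job to perform."""
    broker.put({"type": "import", "file_url": file_url})

def import_worker_consumer(results: list) -> None:
    """import-worker consumes messages and performs the import operation."""
    while True:
        message = broker.get()
        if message is None:  # sentinel to stop the worker
            break
        results.append(f"imported {message['file_url']}")
        broker.task_done()

results = []
worker = threading.Thread(target=import_worker_consumer, args=(results,))
worker.start()
import_api_producer("gs://example-bucket/points.geojson")
broker.put(None)  # shut the worker down
worker.join()
print(results)  # → ['imported gs://example-bucket/points.geojson']
```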
redis-cache
In-memory Redis cache for the APIs and subscribers.
CARTO Self-Hosted can work without this cache; the platform will simply fall back to running the queries against the metadata database.
You can use an external service of your cloud provider, as explained in more detail here.
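This fallback behavior can be sketched as a cache-aside lookup. The example below uses plain dicts as stand-ins for Redis and the metadata database; the names are illustrative:

```python
# Stand-ins for Redis and the metadata database; in the real platform these
# are external services.
metadata_db = {"map:42": {"layers": 3}}

def get_metadata(key, cache=None):
    """Return metadata, using the cache when available and falling back
    to the metadata database otherwise."""
    if cache is not None and key in cache:
        return cache[key]
    value = metadata_db[key]      # fall back to the metadata database
    if cache is not None:
        cache[key] = value        # populate the cache for next time
    return value

redis_cache = {}
print(get_metadata("map:42", redis_cache))  # first call hits the database
print(get_metadata("map:42", redis_cache))  # second call is served from cache
print(get_metadata("map:42"))               # works with no cache at all
```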
workspace-www
The Web Application of the workspace. It's a modularized React application. Applications in the CARTO platform, such as Builder or Workflows, live inside this container.
It's served at the root path (https://carto.mycompany.com) of the Self-Hosted: a basic nginx container with static files like HTML, CSS, and JS.
accounts-www
The Web Application that manages the login and signup process. It's a modularized React application.
It's served at the /acc/ path (https://carto.mycompany.com/acc) of the Self-Hosted: a basic nginx container with static files like HTML, CSS, and JS.
It requires a connection to the CARTO accounts API to perform different operations: create accounts, create users, invite users, etc.
http-cache
An HTTP cache that works as a CDN for the maps-api and some endpoints of the workspace-api. It uses Varnish HTTP Cache.
workspace-api
API to support the Web Application of the workspace (workspace-www).
maps-api
Maps and SQL high-performance API. It's the component with the highest traffic, as it's heavily used by other components.
It doesn't perform batch operations; batch operations of the SQL API are executed by sql-worker.
You should create multiple instances of this container to scale according to your needs.
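In a Kubernetes deployment, this horizontal scaling could be expressed with a HorizontalPodAutoscaler, for example. This is only a sketch assuming a Deployment named maps-api; the replica counts and CPU threshold are illustrative, not CARTO recommendations:

```yaml
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: maps-api
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: maps-api
  minReplicas: 2
  maxReplicas: 6
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 70
```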
import-api
Import API. This component doesn't perform the actual import operation; it just creates jobs for import-worker.
lds-api
Location Data Services (LDS) API.
workspace-subscriber
Consumer of the messages related to the workspace at the Message Broker. It reads the messages published on that bus and performs the required actions.
import-worker
Consumer of the messages related to imports at the Message Broker. It uploads geospatial files into the customer's Data Warehouse.
It requires 8 GB of RAM to process geospatial files of up to 1 GB.
sql-worker
Consumer of the messages related to SQL at the Message Broker.
This worker is mainly used to execute SQL in batch for customers using PostgreSQL or Redshift. BigQuery and Snowflake data warehouses don't use this component, as CARTO relies on the batch APIs (jobs) provided by those data warehouses.
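The routing of batch SQL described above can be sketched as a simple dispatch on the data warehouse type. The function name and return values are illustrative, not part of the CARTO API:

```python
# Warehouses with native batch/job APIs don't need sql-worker.
NATIVE_BATCH_WAREHOUSES = {"bigquery", "snowflake"}

def batch_sql_executor(warehouse: str) -> str:
    """Decide which component runs a batch SQL job for a given warehouse."""
    if warehouse.lower() in NATIVE_BATCH_WAREHOUSES:
        return "data-warehouse-jobs-api"  # BigQuery/Snowflake native job APIs
    return "sql-worker"                   # PostgreSQL and Redshift

print(batch_sql_executor("BigQuery"))   # → data-warehouse-jobs-api
print(batch_sql_executor("redshift"))   # → sql-worker
```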
cdn-invalidator-sub
Consumer of the messages related to invalidation at the Message Broker. It invalidates content at the http-cache.
kots (Only for "Single VM" and standard "Orchestrated container" deployments)
Containers used for the deployment of the Admin Console. They serve the static assets needed to run the Admin Console.
This piece manages the changes applied to your CARTO Self-Hosted configuration, and it handles licensing-related processes and upgrade checks.
To run CARTO on your own infrastructure, the following minimum requirements must be met:
Before proceeding with the installation, we recommend that the person performing the setup be familiar with cloud environments, specifically GCP (Google Cloud Platform), AWS (Amazon Web Services), or Azure (Microsoft Azure). This prior experience ensures a smoother deployment process and a better understanding of the underlying infrastructure.
Cloud Platform Proficiency: Basic proficiency in the chosen cloud platform is recommended. This includes the ability to navigate the respective console, manage instances or clusters, and configure networking settings.
Account Authorization: Ensure that you have the necessary permissions and access rights within your cloud platform account. This typically involves appropriate role assignments.
Resource Understanding: A grasp of fundamental concepts such as virtual machines, Kubernetes, storage, and networking within your chosen cloud environment will enhance your ability to deploy and manage resources effectively.
Having a solid understanding of cloud services will empower you to navigate the deployment process with confidence.
The hardware and software requirements below must be met to ensure an optimal performance of the CARTO platform:
Ubuntu 22.04, Debian 11 or above
60 GB disk
8 CPUs (x86)
32 GB RAM memory
Kubernetes 1.12 or above
Helm 3.6.0 or above
At least 3 nodes with 3x vCPUs and 16 GB of memory
An isolated namespace in which CARTO resources can be deployed. If you're deploying more than one CARTO instance, there should be a namespace per installation.
Persistent volumes configured in your cluster, as the Admin Console will store configuration changes in a persistent volume.
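For example, the isolated namespace and a persistent volume claim could be declared with manifests like the following. This is only a sketch: the namespace, claim name, and storage size are illustrative, and it assumes a default StorageClass exists in your cluster:

```yaml
# Isolated namespace for a single CARTO installation (name is illustrative).
apiVersion: v1
kind: Namespace
metadata:
  name: carto
---
# Example PersistentVolumeClaim; the Admin Console stores configuration
# changes in a persistent volume.
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: carto-admin-console-config
  namespace: carto
spec:
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 1Gi
```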
CARTO requires a dedicated PostgreSQL database to manage its metadata. The metadata information stored in this database is the following:
Configuration of Maps: data sources, layers, tooltips, legends, etc.
Configuration of Workflows.
Configuration of Applications.
Connection credentials to other data warehouses like BigQuery, Snowflake, PostgreSQL, Redshift, or Databricks.
Other CARTO internal metadata
This metadata database must be maintained (in terms of updates, backups, high availability, ...) by you. Our recommendation is to use the managed service provided by your cloud provider:
Google: Cloud SQL for PostgreSQL.
AWS: Amazon RDS for PostgreSQL.
Azure: Azure Database for PostgreSQL.
The currently recommended PostgreSQL version is 14 or above. The minimum requirements for production are:
1 vCPU
2 GB of RAM memory
20 GB of SSD storage
CARTO might need to be accessible to other people in your company (or on the internet, if you wish) who need to use it. To allow that, you need to configure:
A full domain/subdomain that will be pointing to the machine.
(Optional) A TLS certificate for the domain/subdomain. If no TLS certificate is provided, a self-signed certificate will be generated. The TLS certificate private key can't be protected with a passphrase.
Access to the HTTPS port (443). The HTTP port (80) is optional and will redirect to HTTPS.
A full domain is required. You cannot install CARTO in a domain path like https://my.domain.com/carto
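A quick sanity check for this requirement can be sketched in Python: the configured URL must use HTTPS and have no path component. The function name is illustrative:

```python
from urllib.parse import urlparse

def is_valid_carto_domain(url: str) -> bool:
    """Return True if the URL is a bare https domain with no path,
    as required for a CARTO Self-Hosted installation."""
    parsed = urlparse(url)
    return (
        parsed.scheme == "https"
        and bool(parsed.netloc)
        and parsed.path in ("", "/")
    )

print(is_valid_carto_domain("https://carto.mycompany.com"))  # → True
print(is_valid_carto_domain("https://my.domain.com/carto"))  # → False
```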
The CARTO Self-Hosted deployment requires access to some external services. Some of them are required for the software to work, and others depend on the cloud and data warehouse you will run and connect CARTO to. Finally, there is a set of optional services that you will need to allow if you plan to use them with CARTO. For these services, outbound HTTP/HTTPS traffic to the listed domains must be allowed.
Required services:
auth.carto.com
Auth system at CARTO based on Auth0, a leading provider for authentication and authorization.
pubsub.googleapis.com
& www.googleapis.com
Used as a message broker between CARTO servers and the Self-Hosted to transfer information about the license and telemetry.
*.self-hosted.carto.com
Used to deliver new Self-Hosted releases.
docker.io
Needed for downloading the images to execute the Admin Console.
api.openai.com
Required to use AI Agents in your maps.
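Outbound connectivity to these domains can be verified with a small script like the one below. This is only a sketch (wildcard domains such as *.self-hosted.carto.com are omitted because they have no single host to test); run it from the machine or cluster where CARTO will be installed:

```python
import socket

# Required outbound HTTPS destinations listed above.
REQUIRED_HOSTS = [
    "auth.carto.com",
    "pubsub.googleapis.com",
    "www.googleapis.com",
    "docker.io",
    "api.openai.com",
]

def can_reach(host: str, port: int = 443, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:  # covers DNS failures, timeouts, and refused connections
        return False

if __name__ == "__main__":
    for host in REQUIRED_HOSTS:
        status = "ok" if can_reach(host) else "BLOCKED"
        print(f"{host}: {status}")
```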
Cloud/Data warehouse specific requirements:
Depending on the cloud you are deploying and the data warehouse you are using, you will also need to open certain services to connect your data.
Google Cloud
bigquery.googleapis.com
& oauth2.googleapis.com
& bigquerydatatransfer.googleapis.com
If you are going to use BigQuery.
These are also needed if you are going to use the CARTO Data Warehouse.
storage.googleapis.com
Access to CARTO platform buckets.
AWS
*.amazonaws.com
Required if you are going to use AWS S3 buckets.
Azure
*.blob.core.windows.net
Required if you are going to use Azure Blob storage.
Snowflake
*.snowflakecomputing.com
If you are going to use Snowflake.
Databricks
*.databricks.com
If you are going to use Databricks.
LDS specific requirements:
These are the requirements for Location Data Services:
api.tomtom.com
Used for geocoding and routing.
api.traveltimeapp.com
Used for isolines.
isoline.router.hereapi.com
Used for isolines if Here is the configured provider.
Deploying the CARTO Self-Hosted platform on cloud vendors like GCP, AWS, and Azure involves several external services and configurations. Below is a general guide documenting the external services needed to deploy CARTO on these cloud platforms.
Compute Engine: virtual machines for hosting CARTO. Only required for the "Single VM" deployment in GCP.
GKE: managed Kubernetes service for hosting the CARTO orchestrated container deployment. Only required for the "Orchestrated container" deployment in GCP.
Cloud Storage: mandatory for storing data and configurations in GCP.
Cloud SQL: managed database service for PostgreSQL mandatory for storing the metadata database.
Cloud DNS: for managing domain names and DNS records.
EC2 Instances: virtual machines for hosting CARTO. Only required for the "Single VM" deployment in AWS.
EKS: managed Kubernetes service for hosting the CARTO orchestrated container deployment. Only required for the "Orchestrated container" deployment in AWS.
S3: object storage for data. Mandatory if you'd like to store your data in AWS.
RDS: managed database service for PostgreSQL. Mandatory in AWS for storing the metadata database.
Route 53: mandatory for domain management and DNS if you're configuring it in AWS.
Virtual Machines: for hosting the CARTO "Single VM" deployment in Azure.
AKS: managed Kubernetes service for hosting the CARTO "Orchestrated container" deployment in Azure.
Azure Blob Storage: for storing data and configurations.
Azure Database for PostgreSQL: managed database service.
Azure DNS: for domain management and DNS.
A CARTO installation package that contains your environment configuration and a license key is required during the installation process. If you don't have these, you should request them at support@carto.com.