Analytics Toolbox for BigQuery

Analytics Toolbox for BigQuery

Go back

Opening a new Pizza Hut location in Honolulu

Context

Identifying an optimal location for a new store is not always an easy task, and we often do not have enough data at our disposal to build a solid model to predict potential revenues across an entire territory. In these cases, managers rely on different business criteria in order to make a sound decision for their expansion strategy. For example, they rely on defining their target market and segmenting population groups accordingly in order to locate the store closer to where the target market lives (e.g. areas with a great presence of youngsters).

In this example, we are going to use CARTO’s Analytics Toolbox for BigQuery to explore good locations to open a new Pizza Hut restaurant in Honolulu, Hawaii. For that, we will perform two different spatial analyses based on spatial indices (H3): Commercial Hotspots and Local Outlier Factor.

Area of study

We will start by defining an area of interest for our study, which in our case is a buffer of 5 km around downtown Honolulu.

1
SELECT ST_BUFFER(ST_GEOGPOINT(-157.852587, 21.304390), 5000);

Pizza Huts locations in Honolulu

Next, we will find all Pizza Hut restaurants in Honolulu using OpenStreetMaps’s Planet Nodes dataset, available through CARTO’s Data Observatory. An extract of this table containing only the Points of Interest in Honolulu can be found in cartobq.docs.honolulu_planet_nodes.

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
DECLARE honolulu_buffer GEOGRAPHY;
-- We use the ST_BUFFER to define a 5 km buffer centered in Honolulu
SET honolulu_buffer = ST_BUFFER(ST_GEOGPOINT(-157.852587, 21.304390), 5000);
 
SELECT
tag.value AS brand, geometry,
FROM
`cartobq.docs.honolulu_planet_nodes` d,
UNNEST(all_tags) as tag
WHERE ST_CONTAINS(honolulu_buffer, geometry)
AND ((tag.value in ("Pizza Hut") AND tag.key = 'brand'))

Polyfill area of study

For our analysis, we will subdivide the area of study into H3 grid cells of resolution 10 using the H3_POLYFILL function. The result is stored in cartobq.docs.honolulu_pizza_aos.

1
2
3
4
5
6
7
8
DECLARE honolulu_buffer GEOGRAPHY;
-- We use the ST_BUFFER to define a 5 km buffer centered in Honolulu
SET honolulu_buffer = ST_BUFFER(ST_GEOGPOINT(-157.852587, 21.304390), 5000);
 
CREATE TABLE `cartobq.docs.honolulu_pizza_aos` AS (
   SELECT h3id
   FROM UNNEST(`carto-un`.carto.H3_POLYFILL(honolulu_buffer, 10)) h3id
)

Enrich area of study

Our customer is interested in looking for areas with a high density of males and females between 15 and 34 years old and that do not have an existing Pizza Hut restaurant nearby.

To perform the demographics analysis, we are going to enrich the H3 grid of our area of interest with a set of population variables from the ACS Sociodemographics dataset at census block group level, available for free through our Data Observatory. We will make use of the enrichment capabilities of CARTO’s Analytics Toolbox.

First, you need to subscribe to the chosen dataset from the Data Observatory section of the CARTO Workspace. You can follow this guide to do so.

Once subscribed to the dataset, navigate to the Data Explorer and expand the Data Observatory section. Pick any of your data subscriptions and click on the “Access in” button on the top right of the page. Copy the BigQuery project and dataset from any of the table locations that you see on the screen, as this is where your data subscriptions are. We are going to need this information in the next step.

Find the location of your Data Observatory subscriptions

Now you can go to your BigQuery console and double check that you are correctly subscribed to the dataset of choice by using the DATAOBS_SUBSCRIPTIONS procedure, which takes as input the location of your Data Observatory subscriptions that we found in the previous step.

1
CALL `carto-un`.carto.DATAOBS_SUBSCRIPTIONS('carto-data.ac_yyr4gtk5','');
 Check Data Observatory subscriptions

Next, you can check the list of variables that are available to perform the enrichment using the DATAOBS_SUBSCRIPTION_VARIABLES procedure. Particularly, we are interested in those variables that contain the word population in the description, so we are going to take advantage of the second input of the procedure, which enables to filter the results.

1
CALL `carto-un`.carto.DATAOBS_SUBSCRIPTION_VARIABLES('carto-data.ac_yyr4gtk5',"variable_description LIKE '%population%'");

After selecting the variables of interest, which in our case are the population values for males and females between 15 and 34 years old, we can perform the enrichment using the DATAOBS_ENRICH_GRID procedure. Please note that the variables are uniquely identified using their variable slug. The result of this process can be found in cartobq.docs.honolulu_pizza_aos_enriched.

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
CALL `carto-un`.carto.DATAOBS_ENRICH_GRID
('h3',
R'''
SELECT * from `cartobq.docs.honolulu_pizza_aos`
''',
'h3id',
[('female_15_to_17_eb1658f1', 'sum'), ('female_18_to_19_6d791436', 'sum'), ('female_20_f727dc', 'sum'), 
('female_21_77f0174a', 'sum'), ('female_22_to_24_121a63e5', 'sum'), ('female_25_to_29_a90c21d6', 'sum'), 
('female_30_to_34_50344313', 'sum'), ('male_15_to_17_8dd9d9ac', 'sum'), ('male_18_to_19_bb6956b', 'sum'), 
('male_20_5264b51', 'sum'), ('male_21_72217bc7', 'sum'), ('male_22_to_24_74d5e2b8', 'sum'), 
('male_25_to_29_cfc3a08b', 'sum'), ('male_30_to_34_36fbc24e', 'sum')],
NULL,
['cartobq.docs.honolulu_pizza_aos_enriched'],
'carto-data.ac_yyr4gtk5')

If you get stuck in any of the steps of this process, please refer to this guide, where you will find an example notebook that performs an enrichment using Data Observatory data that you can run using your own BigQuery account.

Process enrich data

We are going to process our resulting enriched table to aggregate all the male and female population variables and store the result into a single column. The result can be found in cartobq.docs.honolulu_pizza_aos_enriched_sum.

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
DECLARE features ARRAY<STRING>;
DECLARE query STRING;
 
-- We get the names of all columns in the enriched table except for ('h3id')
SET features =
(
 SELECT ARRAY_AGG(column_name)
 FROM `cartobq.docs`.INFORMATION_SCHEMA.COLUMNS
 WHERE table_name = 'honolulu_pizza_aos_enriched' AND NOT column_name IN ('h3id')
);
 
SET query =
 """ CREATE TABLE `cartobq.docs.honolulu_pizza_aos_enriched_sum` AS (SELECT h3id, """
 || ARRAY_TO_STRING(features, " + ")  ||
 """ AS sum_pop FROM `cartobq.docs.honolulu_pizza_aos_enriched`)""";
 
-- We execute such query
EXECUTE IMMEDIATE query;

Calculate the minimum distance to a Pizza Hut

In addition to the population, Pizza Hut would like to consider for the analysis the distance to the closest existing Pizza Hut as they would like to avoid cannibalization between their own restaurants. For that, for every cell of our grid we are going to compute the distance to every Pizza Hut restaurant using the H3_DISTANCE function, and then keep the minimum value. Please note that these distances are computed as the number of H3 cells of resolution 10 that separate a pair of locations, and therefore are only an approximation. The result can be found in cartobq.docs.honolulu_pizza_aos_enriched_sum_wdist.

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
DECLARE honolulu_buffer GEOGRAPHY;
SET honolulu_buffer = ST_BUFFER(ST_GEOGPOINT(-157.852587, 21.304390), 5000);
 
CREATE TABLE `cartobq.docs.honolulu_pizza_aos_enriched_sum_wdist` AS
(
WITH t1 AS (
 SELECT `carto-un`.carto.H3_FROMGEOGPOINT(geometry, 10) as h3id,
 FROM `cartobq.docs.honolulu_planet_nodes` d,
 UNNEST(all_tags) as tag
 WHERE ST_CONTAINS(honolulu_buffer, geometry)
 AND ((tag.value in ("Pizza Hut") AND tag.key = 'brand'))
  ),
 
t2 AS (
SELECT * FROM `cartobq.docs.honolulu_pizza_aos_enriched_sum`
)
 
SELECT t2.h3id, ANY_VALUE(t2.sum_pop) AS sum_pop, MIN(`carto-un`.carto.H3_DISTANCE(t2.h3id, t1.h3id)) AS dist
FROM t1
CROSS JOIN t2
WHERE sum_pop IS NOT NULL
GROUP BY t2.h3id
);

Find suitable locations

Now, we are going to identify areas that meet Pizza Hut requirements, i.e., locations with large populations aged 15-34 and far from existing Pizza Hut restaurants.

In order to identify these locations, we use the COMMERCIAL_HOTSPOTS procedure, which is part of the retail module of the Analytics Toolbox. This functionality identifies areas with values that are significantly higher than the average.

As can be seen in the query below, we are using both the sum_pop (total population aged 15-34) and dist (distance to the closest Pizza Hut) variables to identify our hotspots. These variables are given a weight of 0.7 and 0.3, respectively.

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
CALL `carto-un`.carto.COMMERCIAL_HOTSPOTS(
  'cartobq.docs.honolulu_pizza_aos_enriched_sum_wdist',
  NULL,
  'h3id',
  'h3',
  ['sum_pop','dist'],
  [0.7, 0.3],
   2,
   0.01
)

Competitor analysis

Finally, we are going to extract Pizza Hut’s competitors from the cartobq.docs.honolulu_planet_nodes table and compute the local outlier factor (using the LOF function) to identify those that are very close to one another and those far from the others. This will give us an additional insight to decide where it would be more interesting to open a new restaurant. The result can be found in cartobq.docs.honolulu_competitors_lof.

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
DECLARE honolulu_buffer GEOGRAPHY;
SET honolulu_buffer = ST_BUFFER(ST_GEOGPOINT(-157.852587, 21.304390), 5000);
 
-- We get all amenities tagged as restaurants or fast_food POIS in Honolulu
WITH fast_food AS (
 SELECT CAST(id AS STRING) AS id , tag.value, geometry as geom
 FROM `cartobq.docs.honolulu_planet_nodes` d,
 UNNEST(all_tags) as tag
 WHERE ST_CONTAINS(honolulu_buffer, geometry)
 AND ((tag.value in ('fast_food', 'restaurant') AND tag.key = 'amenity'))
),
 
-- We calculate the Local Outlier Factor in order to identify restaurants without competition.
lof_output as (
SELECT `carto-un`.carto.LOF(ARRAY_AGG(STRUCT(id,geom)), 5) as lof FROM fast_food
)
SELECT lof.* FROM lof_output, UNNEST(lof_output.lof) AS lof

Visual analysis

We can look for suitable new locations for Pizza Hut by plotting all the information of our analysis using a CARTO Builder map. In the map below, we can explore where our target population lives in the context of the identified commercial hotspots and the location of Pizza Hut’s competitors:

EU flag

This project has received funding from the European Union’s Horizon 2020 research and innovation programme under grant agreement No 960401.