Links

Scoring areas based on demographic data to select advertising panels for a campaign

Context

A common use case of Out-of-home advertising is geosegmentation. Geosegmentation helps in identifying where target audiences live in or commute through, and in association where panels should be located. This is particularly useful if the goal of the advertising campaign is not necessarily to direct traffic to nearby locations, but rather to maximize the population the panel appeals to.
In this example, we will use sociodemographic and income data to identify where a theoretical target audience for Starbucks is located in New York. We will construct a spatial index score for the panels using the geosegmentation criteria, so that we prioritize panels based on their score.

Steps to reproduce

  1. 1.
    Go to the CARTO signup page.
    • Click on Log in.
    • Enter your email address and password. You can also log in with your existing Google account by clicking Continue with Google.
    • Once you have entered your credentials: click Continue.
  2. 2.
    Firstly, we need to subscribe to a dataset where we will be able to find sociodemographic and income data. A dataset with this information which is free to subscribe to is the “Sociodemographics” dataset from the American Community Survey. Navigate to the Data Observatory within CARTO and search for “Sociodemographics - United States of America (Census Block Group, 2018, 5yrs)”.
  3. 3.
    Subscribe to the dataset. After the subscription is finished you should be able to see the following screen in your Data Explorer, in the Data Observatory section.
    Navigate to the “Data” tab and explore the dataset. You will see we have variable for age bands and for median income
  4. 4.
    We should also subscribe to the dataset with geographic boundaries from CARTO, for the year 2018. Navigate to the Data Observatory and search for “County - United States of America (2018)”. Subscribe to the dataset as done in the prior steps.
  5. 5.
    Now let’s start creating the map. From the Navigation Menu in the left panel, select Maps. On the top-right corner, select “New map”. This should take you to a new instance of a Builder map:
  6. 6.
    We will define the overall Area of study and divide it into hexagons by running an SQL query. Select the “Data Observatory” tab and the "CARTO Data Warehouse" connection. Select the subscription of the USA county boundaries.
  7. 7.
    We will now modify the default SQL query that loads the data in the map.
  8. 8.
    We define the study area and create the h3 layer and save the h3 layer as a new table for the study area. For this project, we’re using the greater NYC area as our study area. To define that area we need to select the four polygons representing Brooklyn, Queens, Manhattan and the Bronx, and merge them together. Remember that in order to execute this query you need to use the location of the data subscription in your account, so replacing "carto-data.ac_dp1glsh" in the query below, for what you see in the default SQL that has loaded the data into the map. Make sure then that you center your map in the area of New York City.
-- define study area
WITH study_area AS (
SELECT ST_UNION_AGG(geom) AS geom
FROM `carto-data.ac_dp1glsh.sub_carto_geography_usa_county_2018`
WHERE geoid = '36047' or geoid = '36081' or geoid = '36005' or geoid = '36061'
)
-- make h3 geometry layer
SELECT
geoid,
`carto-un`.carto.H3_BOUNDARY(geoid) AS geom
from study_area, unnest(
`carto-un`.carto.H3_POLYFILL(study_area.geom,9)
) as geoid
9. Then, save the resulting source as a separate table. Navigate to the bottom-right of the SQL editor, select “Create table from query”. Select the CARTO Data Warehouse connection, and save the new table under “organization data”, “shared”. Name the table as “ooh_geosegmentation_h3_area_of_study”. You can also use your own connection.
10. Now let’s enrich the H3 area of study. Go to the Data Explorer, CARTO Data Warehouse connection, and open the last table “ooh_geosegmentation_h3_area_of_study”, where it has been saved. On the top right click on “Enrich table”. Read the introduction guide and click Continue, you should now see this screen.
11. In the first step, select the subscription of Sociodemographics for the United States of America from the American Community Survey. Click continue.
Let’s enrich with columns which we can use to construct our audience, in this case. This would be women between the ages of 18 to 40, with an income above $70k. To construct this audience we need to select the following variables. Sum the age fields and average the income field.
-- Enrich the following fields
female_18_to_19
female_20
female_21
female_22_to_24
female_25_to_29
female_30_to_34
female_35_to_39
median_income
In the final step of enrichment, select “Enrich current table”.
12. Now go back to the map, click on the 3 dots on the right hand side of the latest source we created. Select “Show data table”. Observe that the table is now enriched with the columns from the sociodemographics table.
13. Now we can build some custom variables in our new table. Let’s first consolidate all age values per H3 into a single variable, and subsequently normalize it. As before, we will write a custom query. Select the “Add source from…” at the bottom left of the page. Select the tab named “Custom Query (SQL)”, and click on the “CARTO Data Warehouse” connection. Insert the query below, but remember that in order to execute this query you need to use the location of the data subscription in your account, so replacing "carto-dw-ac-dp1glsh" in the query below, for that location belonging to your account.
WITH pop_stats AS (
SELECT
*, female_18_to_19_6d791436_sum + female_20_f727dc_sum + female_21_77f0174a_sum + female_22_to_24_121a63e5_sum + female_25_to_29_a90c21d6_sum + female_30_to_34_50344313_sum + female_35_to_39_7cbd1009_sum AS female_18_40_pop,
FROM
`carto-dw-ac-dp1glsh.shared.ooh_geosegmentation_h3_area_of_study`
),
pop_stats_norm AS (
SELECT *, ML.MIN_MAX_SCALER(female_18_40_pop) OVER() AS female_18_40_norm FROM pop_stats
WHERE geom IS NOT NULL
)
SELECT * FROM pop_stats_norm
Style the resulting layer as seen below. As we can see, population of women between 18 and 40 is high in most parts of New York metropolitan area, slightly higher in Manhattan East, while as we move away from Kings and Queens population drops.
You can now delete the previous source and layer and keep the new one, as we will keep building on it.
14. Let’s now add variables to represent geographies where the median annual income is above $70,000, and subsequently normalize it. As before, we will write a custom query, but we will run this in the same layer as before, in other words we will perform further analysis on the same layer. Replace the previous query with the query below. Remember to replace "carto-dw-ac-dp1glsh" for the equivalent belonging to your account.
WITH pop_stats AS (
SELECT
*, female_18_to_19_6d791436_sum + female_20_f727dc_sum + female_21_77f0174a_sum + female_22_to_24_121a63e5_sum + female_25_to_29_a90c21d6_sum + female_30_to_34_50344313_sum + female_35_to_39_7cbd1009_sum AS female_18_40_pop,
FROM
`carto-dw-ac-dp1glsh.shared.ooh_geosegmentation_h3_area_of_study`
),
pop_stats_norm AS (
SELECT *, ML.MIN_MAX_SCALER(female_18_40_pop) OVER() AS female_18_40_norm FROM pop_stats
WHERE geom IS NOT NULL
),
-- Since we're only interested in incomes over 70000, doing this is a form of normalization
-- to remove positive impact of incomes <70000 on the final index score
income AS (
SELECT *, CASE
WHEN median_income_6eb619a2_avg <70000 THEN 0
WHEN median_income_6eb619a2_avg >= 70000 THEN median_income_6eb619a2_avg-70000
END AS income_adj
FROM pop_stats_norm
WHERE geom IS NOT NULL AND median_income_6eb619a2_avg IS NOT NULL
),
normed_income AS (
SELECT *, ML.MIN_MAX_SCALER(income_adj) OVER() AS income_norm FROM income
WHERE geom IS NOT NULL
)
SELECT * FROM normed_income
Style the resulting layer as seen below. As we can see, income is higher in lower Manhattan and Brooklyn.
15. Finally, let’s create an index score to represent both income and audience population levels within our geography. We will use the normalized values for both population and income level, and we will combine both in a single score by allocating the population value double the weight to the income value. We will once again normalize the resulting score. Remember to replace "carto-dw-ac-dp1glsh" for the equivalent belonging to your account.
As before, we will write a custom query, adding to the previous analysis. Replace the previous query with the query below:
WITH pop_stats AS (
SELECT
*, female_18_to_19_6d791436_sum + female_20_f727dc_sum + female_21_77f0174a_sum + female_22_to_24_121a63e5_sum + female_25_to_29_a90c21d6_sum + female_30_to_34_50344313_sum + female_35_to_39_7cbd1009_sum AS female_18_40_pop,
FROM
`carto-dw-ac-dp1glsh.shared.ooh_geosegmentation_h3_area_of_study`
),
pop_stats_norm AS (
SELECT *, ML.MIN_MAX_SCALER(female_18_40_pop) OVER() AS female_18_40_norm FROM pop_stats
WHERE geom IS NOT NULL
),
-- Since we're only interested in income over 70000, doing this is a form of normalization
-- to remove positive impact of incomes <70000 on the final index score
income AS (
SELECT *, CASE
WHEN median_income_6eb619a2_avg <70000 THEN 0
WHEN median_income_6eb619a2_avg >= 70000 THEN median_income_6eb619a2_avg-70000
END AS income_adj
FROM pop_stats_norm
WHERE geom IS NOT NULL AND median_income_6eb619a2_avg IS NOT NULL
),
normed_income AS (
SELECT *, ML.MIN_MAX_SCALER(income_adj) OVER() AS income_norm FROM income
WHERE geom IS NOT NULL
),
build_index AS (
SELECT *, (female_18_40_norm * 200) + (income_norm * 100) AS index_score
FROM normed_income
WHERE geom IS NOT NULL
),
build_index_norm AS (
SELECT *, ML.MIN_MAX_SCALER(index_score) OVER() * 100 as norm_score FROM build_index
WHERE geom IS NOT NULL
)
SELECT * FROM build_index_norm
Then, save the resulting source as a separate table. Navigate to the bottom-right of the SQL editor, select “Create table from query”. Select the CARTO Data Warehouse connection, and save the new table under “organization data”, “shared”. Name the table as “ooh_geosegmentation_starbucks_audience”. You can also use your own connection.
A new source and layer will load, so you can delete the previous one.
Style the resulting layer as seen below.
Set the colors to a custom color scale, as seen below.
Rename the layer to “Audience index score”. As we can see the score is highest in Lower Manhattan and Brooklyn, but there are also some interesting areas in Queens and Upper Manhattan.
You can load the work we have done so far separately, as we have already created this table. You can access it by navigating to the “demo_tables” folder in the Data Explorer, and creating a map using the “newyork_ooh_sample_audience_h3” table. Once you load the table, you can style as above.
16. Let’s create a tooltip for this layer, so that we can see the score, population and median income within each geography. Navigate to the Interactions section, and enable the tooltip for the latest layer. Configure the tooltip to show values when hovering, and add the variables as below:
17. We have our demographics score layer. It is now time to add the panel inventory. We already have a sample table with panels in New York and New Jersey, extracted from Open Street Maps. You can load only the panels located within our area of interest by intersecting the panel dataset with our latest layer. Create a new source and run the custom query below:
SELECT
t.*
FROM
`carto-demo-data.demo_tables.newyork_ooh_sample_audience_h3` n,
`carto-demo-data.demo_tables.newyork_newjersey_ooh_panels` t
WHERE
ST_INTERSECTS(n.geom,t.geom)
Rename the layer “Area of Interest panels”. Leave the default styling for now.
18. We now need to assign a score to each of the panels, to understand which are best located with respect to the audience we have constructed. To do that, we need to take into account a catchment area around all panels, and aggregate the scores of all hexagons overlapping with the defined catchment area. To start, let’s create the catchment areas around each panel. In this case we can simulate them as buffers of 500m (roughly a 5min walk).
Click on the three dots on the “Area of interest panels” source, and select “Add SQL analysis”. Select “Create Buffers” and click on Continue.
19. When the dialog box opens, select 0.5km as the distance of the buffers, we keep the number of tracts as 1 (one concentric buffer created), and keep the configuration for individual rather than combined buffers (we need to perform the analysis for each buffer individually). For more information on the Create Buffers analysis please visit the page here.
20. After finishing with the configuration of the analysis, click on “Run SQL analysis”. You will see we that now have buffers around each panel.
Rename the layer “Panel 500m buffers” and shift above the “Audience index score” layer to give prominence. Style as seen below.
21. Next we should intersect the panel buffers with the h3 layer to generate an average score for each panel buffer.
Click on the three dots on the “Panel 500m buffers” source, and select “Add SQL analysis”. Select “Intersect and aggregate” and click on Continue.
For more information on the Intersect and Aggregate analysis please visit the page here.
22. Next we should select the “Audience index score” as the second source we will be getting data from (first being the buffer source). We will aggregate normalized scores for each buffer using an averaging operation, rather than summing up, so that we can get a final score between 0 and 100 for each panel.
Click on “Run SQL analysis”. This action will create another layer. Leave the layer in its current styling for now. Click on the 3 dots on the right of the new layer and select “Show data table”. Explore the table and observe the resulting column for average normalized score.
23. Now that we have the score, let’s allocate it to the panels so that we can explore which ones have the highest score. For this we will use a third analysis, which is to join the “avg_norm_score” column to the Source with all panels.
Click on the three dots on the “Area of interest panels” source, and select “Add SQL analysis”. Select “Add columns from second source” and click on Continue.
For more information on the join analysis please visit the page here.
24. In the resulting dialog box, select the second source as the latest layer we have created, and configure column “full_id” in both tables as the key column as the one we will use to join. On the right hand side, keep all columns from the base source, and add column “avg_norm_score” from the second source.
Click on “Run SQL analysis”. Rename the resulting layer as “Panel audience score”. Keep only the layers and sources named “Audience index score” (third in prominence), “Panel 500m buffers” (second in prominence, hidden) and “Panel audience score” (first in prominence), delete the rest.
25. Style the “Panel audience score” layer as seen below
As we can see on the map, the panels with the highest scores are concentrated in Manhattan and Brooklyn, however we can take further steps to cherry pick the panels based on their score or on their type.
26. First let’s create a tooltip for the panels, to explore their characteristics and score. Navigate to the Interactions section, and enable the tooltip for the “Panel audience score” layer. Configure the tooltip to show values when hovering, and add the variables as below:
27. One way we can start narrowing down our inventory is to filter by panel type. Navigate to the Widgets section, create a new widget using Source “Panel audience score”. Create a categorical widget, COUNTING on the “panel type” column. Style as seen below.
28. Finally, and most importantly, we can filter by the constructed audience score. Create another widget, again on the “Panel audience score” source. Create a histogram widget on the “avg_norm_score” column.
29. Time to select some panels. Let’s say we want to focus on panels with a score of more than 30. You can see the result below. We have identified 65 panels, well-distributed across Manhattan. Most importantly we have eliminated a lot of the panels within Manhattan and Brooklyn which score less on our constructed score. In this way we are able to strategically select panels, cut costs and increase the likelihood of achieving a high ROI for our campaigns.
30. Change the name of the map to “Selecting out-of-home panels using geosegmentation”
31. Finally we can make the map public and share the link to anybody in the organization. For that you should go to “Share” on the top right corner and set the map as Public. For more details, see Publishing and sharing maps.
32. Finally, we can visualize the result.