CARTO User Manual

Analyzing Airbnb ratings in Los Angeles

Context

Airbnb was founded in 2008 and has since become a very popular service for travellers around the world.

A better understanding of the key variables behind a listing's success could help improve the service, as well as identify the main factors that attract tourism to a given area.

Users provide both an overall rating and more specific ratings on 6 variables: accuracy, communication, cleanliness, location, check-in and value.

In this tutorial we aim to extract useful insights into what the overall impression of Airbnb users depends on. To do so, we relate the overall rating score to other variables (specifically value, cleanliness and location) through a Geographically Weighted Regression (GWR) model, which takes the behavior of geographical neighbours into account.

Additionally, we’ll analyze in more depth the areas where the location score drives the overall rating, and inspect their sociodemographic attributes by enriching our visualization with data from the Data Observatory.

Steps to reproduce

Setting up

In this first step we will go through the basic setup, including creating a CARTO account and importing the data used in this tutorial.

  1. Go to the new CARTO platform access page: https://app.carto.com

  2. Create a new CARTO organization. Check this guide to get started.

  3. The first time you access the Workspace, you will see a Welcome banner with quick-access links to get you started with CARTO, such as creating your first connection or your first map.

  4. From the Navigation Menu in the left panel, select Data Explorer.

  5. To import the Airbnb listings dataset that we will be using, click on the upload icon and select URL, then input the following URL.

https://storage.googleapis.com/carto-academy-public-data/b02_pub_airbnb_reviews_gwr/01_listings_la_2021_5_reviews.geojson

Import the file into your connection or CARTO Data Warehouse as 01_listings_la_2021_5_reviews.
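The URL points to a GeoJSON file. Assuming it is a FeatureCollection of Point features (the tutorial later converts each listing geometry with H3_FROMGEOGPOINT), the minimal offline sketch below shows how such a file can be inspected with Python's standard json module. The property names follow the columns used later in this tutorial (price_num, review_scores_rating); the helper load_listings and the two-feature sample are hypothetical, not taken from the real file.

```python
import json

def load_listings(geojson_text):
    """Return (lon, lat, properties) tuples for every Point feature in a FeatureCollection."""
    collection = json.loads(geojson_text)
    if collection.get("type") != "FeatureCollection":
        raise ValueError("expected a GeoJSON FeatureCollection")
    listings = []
    for feature in collection.get("features", []):
        geometry = feature.get("geometry") or {}
        if geometry.get("type") != "Point":
            continue  # skip features without a point geometry
        lon, lat = geometry["coordinates"][:2]  # GeoJSON stores [longitude, latitude]
        listings.append((lon, lat, feature.get("properties") or {}))
    return listings

# Hypothetical two-feature sample; the second has no geometry and is skipped.
sample = json.dumps({
    "type": "FeatureCollection",
    "features": [
        {"type": "Feature",
         "geometry": {"type": "Point", "coordinates": [-118.25, 34.05]},
         "properties": {"price_num": 120.0, "review_scores_rating": 4.8}},
        {"type": "Feature", "geometry": None, "properties": {}},
    ],
})
listings = load_listings(sample)
print(len(listings), listings[0][2]["price_num"])
```

Note the coordinate order: GeoJSON always lists longitude before latitude.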

Exploring Airbnb listings distribution through Spatial Indexes (H3)

We will inspect how Airbnb listings are distributed across Los Angeles and aggregate the raw data to better understand how different variables vary geographically within the city.

  1. Inspect the data from the 01_listings_la_2021_5_reviews dataset view within the Data Explorer, then click on the Create map button.

  2. Rename the map to Map 1 Airbnb initial data exploration. Then click on Layer 1 and apply the following style changes:

    • Name: Airbnb listings
    • Color: Dark yellow
    • Radius: 2.5

  3. Add a new layer with source ‘Your connection’ and type ‘SQL query’, then input the SQL query below.

This SQL query creates an H3 grid at resolution 8 and aggregates the Airbnb listings into it by averaging our key variables per cell; cells with 3 or fewer listings are discarded. Read more about Spatial Indexes such as H3 here.

WITH
  -- Assign each listing to the H3 cell (resolution 8) that contains it
  h3_airbnb AS (
  SELECT
    `carto-un`.carto.H3_FROMGEOGPOINT(geom, 8) AS h3_id,
    *
  FROM
    `carto-academy.b02_pub_airbnb_reviews_gwr.01_listings_la_2021_5_reviews`),
  -- Average the key variables per cell and keep only cells with more than 3 listings
  aggregated_h3 AS (
  SELECT
    h3_id,
    ROUND(AVG(price_num), 2) price,
    ROUND(AVG(review_scores_rating), 2) overall_rating,
    ROUND(AVG(review_scores_value), 2) value_vs_price_rating,
    ROUND(AVG(review_scores_cleanliness), 2) cleanliness_rating,
    ROUND(AVG(review_scores_location), 2) location_rating,
    COUNT(*) AS total_listings
  FROM
    h3_airbnb
  GROUP BY
    h3_id
  HAVING
    COUNT(*) > 3)
-- Replace each H3 index with its hexagon boundary so the cells can be drawn
SELECT
  * EXCEPT(h3_id),
  `carto-un`.carto.H3_BOUNDARY(h3_id) AS geom
FROM
  aggregated_h3
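To make the GROUP BY logic of the query easier to follow, here is a pure-Python sketch of the same aggregation: average each key variable per cell while skipping missing values (as AVG does), count the listings, and drop cells with 3 or fewer listings. It is only an illustration: the hypothetical grid_cell helper uses square longitude/latitude bins as a stand-in for real H3 cells, and the sample listings are made up.

```python
from collections import defaultdict

def grid_cell(lon, lat, size=0.01):
    """Hypothetical stand-in for H3_FROMGEOGPOINT: a square lon/lat bin, NOT an H3 hexagon."""
    return (int(lon // size), int(lat // size))

def mean_ignoring_nulls(values):
    """Like SQL AVG: skip missing values; return None if nothing is left."""
    present = [v for v in values if v is not None]
    if not present:
        return None
    # Python's round() may treat exact .5 ties differently from BigQuery ROUND.
    return round(sum(present) / len(present), 2)

# Output column -> input property, following the aliases in the SQL query.
COLUMNS = {
    "price": "price_num",
    "overall_rating": "review_scores_rating",
    "value_vs_price_rating": "review_scores_value",
    "cleanliness_rating": "review_scores_cleanliness",
    "location_rating": "review_scores_location",
}

def aggregate(listings, min_listings=4):
    """Group (lon, lat, properties) tuples by cell and average each column.

    Cells with fewer than `min_listings` listings are dropped, mirroring
    HAVING COUNT(*) > 3 in the query.
    """
    groups = defaultdict(list)
    for lon, lat, props in listings:
        groups[grid_cell(lon, lat)].append(props)
    result = {}
    for cell, rows in groups.items():
        if len(rows) < min_listings:
            continue
        summary = {alias: mean_ignoring_nulls([r.get(col) for r in rows])
                   for alias, col in COLUMNS.items()}
        summary["total_listings"] = len(rows)  # COUNT(*) counts rows, NULLs included
        result[cell] = summary
    return result

# Hypothetical listings: four share one location, one sits alone elsewhere.
sample_listings = [
    (-118.25, 34.05, {"price_num": 100, "review_scores_rating": 4.0}),
    (-118.25, 34.05, {"price_num": 120, "review_scores_rating": 4.5}),
    (-118.25, 34.05, {"price_num": 140, "review_scores_rating": 5.0}),
    (-118.25, 34.05, {"price_num": None, "review_scores_rating": 4.5}),
    (-118.50, 34.20, {"price_num": 90, "review_scores_rating": 3.0}),
]
cells = aggregate(sample_listings)
print(cells)
```

Replacing grid_cell with a real H3 indexing function would change only how listings are assigned to cells; the averaging and filtering steps stay the same.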

  4. Style the new layer.

    • Name: H3 Airbnb aggregation

    • Order in display: 2

    • Fill color: 10 steps blue-red ramp based on column price

    • No stroke

    • Toggle the Height button and style this parameter using:

      • Method: sqrt
      • Value: 20
      • Column: total_listings
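With a 10-step ramp, each H3 cell is assigned to one of ten color classes. CARTO computes the class breaks for you; the sketch below assumes a quantile classification (the same number of cells in each class), which may differ from the classification method selected in your layer settings. The helper names and the prices are hypothetical.

```python
import bisect

def quantile_breaks(values, steps=10):
    """Return steps - 1 break values that split the sorted data into equal-count classes."""
    ordered = sorted(values)
    n = len(ordered)
    return [ordered[(k * n) // steps] for k in range(1, steps)]

def color_class(value, breaks):
    """Class index from 0 (low, blue end of the ramp) to len(breaks) (high, red end)."""
    return bisect.bisect_right(breaks, value)

prices = list(range(1, 101))  # hypothetical per-cell average prices
breaks = quantile_breaks(prices)
classes = [color_class(p, breaks) for p in prices]
print(breaks)
print([classes.count(c) for c in range(10)])
```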

Inspect the map results carefully. Notice where most listings are located and where the areas with the highest prices are.

Optionally, play with different variables and color ramps.

Estimating the influence of variables on the overall rating score

Next we will apply a Geographically Weighted Regression (GWR) model to our Airbnb H3 aggregated data using the GWR_GRID function. On the previous map we already saw where the different variables score higher.

This model will allow us to extract insights into what the overall impression of Airbnb users depends on, by relating the overall rating score to different variables (specifically value, cleanliness and location).

We will also visualize where the location score variable significantly influences the ‘Overall rating’ result.

  1. To save the map results and continue working on a separate map, let’s duplicate the map, disable the 3D view and rename the copy to Map 2 GWR Model map.

  2. Add a new layer with source ‘Your connection’ and type ‘SQL query’, then input the following SQL query.

As input we will use a materialized version of the H3 aggregation query that we applied before (the table 02_listings_la_2021_5_reviews_h3_z8_agg, indexed by the h3_z8 column), with value_vs_price_rating, cleanliness_rating and location_rating as input variables and overall_rating as the target variable.

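CALL `carto-un`.carto.GWR_GRID(
    'carto-academy.b02_pub_airbnb_reviews_gwr.02_listings_la_2021_5_reviews_h3_z8_agg', -- input table (materialized H3 aggregation)
    ['value_vs_price_rating', 'cleanliness_rating', 'location_rating'], -- [ different ratings features ]
    'overall_rating', -- overall rating (target variable)
    'h3_z8', 'h3', 3, 'gaussian', TRUE, -- index column, index type, k-ring distance, kernel function, fit intercept
    NULL -- output table: NULL returns the results directly instead of storing them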
)
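GWR fits a separate regression for every cell, weighting each neighbour by its distance to that cell. The pure-Python sketch below illustrates the idea for a single cell under stated assumptions: a square grid with Chebyshev distance replaces H3 k-rings, the k-ring size doubles as the kernel bandwidth, and exp(-0.5 * (d / h)^2) is one common form of the Gaussian kernel, not necessarily the exact formula GWR_GRID uses. The function names and the synthetic data are hypothetical.

```python
import math

def gaussian_weight(distance, bandwidth):
    """Gaussian kernel (assumed form): weight 1 at the centre, decaying smoothly with distance."""
    return math.exp(-0.5 * (distance / bandwidth) ** 2)

def solve(a, b):
    """Solve a x = b with Gauss-Jordan elimination and partial pivoting."""
    n = len(a)
    m = [list(row) + [b[i]] for i, row in enumerate(a)]  # augmented matrix [a | b]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        if abs(m[col][col]) < 1e-12:
            raise ValueError("singular system: too little variation among neighbours")
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                for c in range(col, n + 1):
                    m[r][c] -= factor * m[col][c]
    return [m[i][n] / m[i][i] for i in range(n)]

def local_coefficients(cells, center, features, target, kring=3, fit_intercept=True):
    """Fit one weighted least squares regression around `center`.

    `cells` maps hypothetical (i, j) grid coordinates to dicts of column values.
    Chebyshev distance on this square grid stands in for H3 ring distance,
    and the k-ring size doubles as the kernel bandwidth (an assumption).
    """
    rows, ys, ws = [], [], []
    for (i, j), values in cells.items():
        d = max(abs(i - center[0]), abs(j - center[1]))
        if d > kring:
            continue  # only cells within the k-ring take part in the local fit
        rows.append(([1.0] if fit_intercept else []) + [values[f] for f in features])
        ys.append(values[target])
        ws.append(gaussian_weight(d, kring))
    p = len(rows[0])
    # Normal equations of weighted least squares: (X^T W X) beta = X^T W y
    xtwx = [[sum(w * r[a] * r[b] for r, w in zip(rows, ws)) for b in range(p)]
            for a in range(p)]
    xtwy = [sum(w * r[a] * y for r, y, w in zip(rows, ys, ws)) for a in range(p)]
    return solve(xtwx, xtwy)

# Synthetic 7x7 grid built so the true relationship is known exactly.
features = ["value_vs_price_rating", "cleanliness_rating", "location_rating"]
cells = {}
for i in range(7):
    for j in range(7):
        x1, x2, x3 = float(i), float(j), float(i * j)
        cells[(i, j)] = {
            "value_vs_price_rating": x1,
            "cleanliness_rating": x2,
            "location_rating": x3,
            "overall_rating": 1.0 + 2.0 * x1 - 0.5 * x2 + 0.25 * x3,
        }

coefs = local_coefficients(cells, (3, 3), features, "overall_rating")
print([round(c, 4) for c in coefs])  # intercept, then one coefficient per feature
```

Because the synthetic target is exactly linear, the fit recovers the coefficients used to build it (1, 2, -0.5 and 0.25) up to floating-point error. With real ratings, repeating the fit around every cell gives each cell its own location coefficient, which is the kind of per-cell value the location_rating_coef_estimate column holds.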

  3. Style the layer.

    • Name: Location relevance (Model)
    • Order: 3
    • Fill Color: 10 steps blue-red ramp based on location_rating_coef_estimate
    • No stroke

Optionally, style the layer by different attributes.

  4. Change the basemap to the Google Maps Roadmap basemap.

  5. Click on the Dual map view button to toggle the split map option.

    • Left map: disable the Location relevance (Model) and Airbnb listings layers
    • Right map: disable the H3 Airbnb aggregation and Airbnb listings layers

    The resulting map should look similar to the following.

    Inspect the model results in detail to understand where location matters most for users' overall rating score and how the location rating values are distributed.

Enriching the visualization with a Data Observatory Tileset

So far we have seen how the locations of Airbnb listings and their main variables are distributed across the city of Los Angeles. Next, we will combine this information with additional data by adding another source to our map: the Spatial Features H3 Resolution 8 dataset from the CARTO Data Observatory.

This dataset holds information that is useful for exploring the influence of different factors, including variables such as the total population, the urbanity level or the presence of certain types of points of interest in different areas.

We will use the BigQuery Tiler from the CARTO Analytics Toolbox to create a Tileset, a special type of table that allows visualizing large spatial datasets such as this one.

  1. To save the map results and continue working on a separate map, let’s duplicate the previous map once again, disable the dual map view (close the left panel) and rename the copy to Map 3 Airbnb Spatial Features.

  2. From the main menu, click on ‘Data Observatory’ to browse the Spatial Data Catalog and apply these filters:

    • Countries: United States of America
    • Licenses: Public data
    • Sources: CARTO

Select the Spatial Features - United States of America (H3 Resolution 8) dataset and click on Subscribe for free. This redirects us to the subscription page in the Data Explorer.

  3. From the subscription page in the Data Explorer, click on the Create button, then select ‘Create a tileset’ and complete the steps with the following settings:

    • Output tileset name: cdb_spatial_fea_94e6b1f
    • Zoom: 9-12
    • Columns: geoid, population, tourism and urbanity
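The Zoom: 9-12 setting controls which zoom levels are generated. Each extra zoom level splits every tile into four, so the number of tiles grows quickly with the maximum zoom. The sketch below, assuming the tileset follows the standard Web Mercator XYZ tiling scheme, counts the tiles needed to cover a rough, hypothetical bounding box around Los Angeles at each zoom level. It only illustrates the tile arithmetic and is not how CARTO builds tilesets.

```python
import math

def lonlat_to_tile(lon, lat, zoom):
    """XYZ tile (x, y) containing a point, using the standard Web Mercator scheme."""
    n = 2 ** zoom
    x = int((lon + 180.0) / 360.0 * n)
    y = int((1.0 - math.asinh(math.tan(math.radians(lat))) / math.pi) / 2.0 * n)
    # Clamp so points on the east edge or near the poles stay in range.
    return min(max(x, 0), n - 1), min(max(y, 0), n - 1)

def tiles_covering(west, south, east, north, zoom):
    """Number of tiles needed to cover a lon/lat bounding box at one zoom level."""
    x_min, y_min = lonlat_to_tile(west, north, zoom)  # top-left corner
    x_max, y_max = lonlat_to_tile(east, south, zoom)  # bottom-right corner
    return (x_max - x_min + 1) * (y_max - y_min + 1)

# Rough, hypothetical bounding box around Los Angeles: (west, south, east, north).
LA_BBOX = (-118.7, 33.7, -117.6, 34.35)
for zoom in range(9, 13):
    print(zoom, tiles_covering(*LA_BBOX, zoom))
```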

  4. Once the Tileset has been created, we can add it to our map. To do so, open the map, click on Add source from… and select the tileset from the tree menu.

Once the tileset layer has been added, rename the layer to Spatial Features and zoom into the Los Angeles area.

  5. Style the layer with a color opacity of 0 to keep it hidden while still displaying its information in the pop-up and in the widgets that we will add next.

  6. Add the following widgets to the map:
  • Listings summary

    • Layer: A Airbnb listings
    • Type: Table
    • Columns: review_scores_cleanliness, review_scores_location, review_scores_value, review_scores_rating and price_num
  • Population Spatial Features

    • Layer: C Spatial Features
    • Type: Formula
    • Operation: SUM
    • Column: population
  • Tourism POIs

    • Layer: C Spatial Features
    • Type: Formula
    • Operation: SUM
    • Column: tourism
  • Urbanity level

    • Layer: C Spatial Features
    • Type: Category
    • Operation: COUNT
    • Column: urbanity

Navigate the map and observe how widget values vary depending on the viewport area. Check out specific areas by hovering over them and review pop-up attributes.
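The Formula and Category widgets are recalculated from the features visible in the current viewport, which is why their values change as you pan and zoom. The sketch below mimics that behavior for the three Spatial Features widgets under a simplifying assumption: each cell is represented by a single point and counts as visible when that point falls inside the viewport's bounding box. The helper names, coordinates, values and urbanity labels are hypothetical.

```python
from collections import Counter

def in_viewport(lon, lat, viewport):
    """True if the point lies inside a (west, south, east, north) bounding box."""
    west, south, east, north = viewport
    return west <= lon <= east and south <= lat <= north

def widget_values(cells, viewport):
    """Recompute the three Spatial Features widgets from the visible cells only."""
    visible = [props for lon, lat, props in cells if in_viewport(lon, lat, viewport)]
    return {
        "population": sum(p["population"] for p in visible),  # Formula widget, SUM
        "tourism": sum(p["tourism"] for p in visible),  # Formula widget, SUM
        "urbanity": Counter(p["urbanity"] for p in visible),  # Category widget, COUNT
    }

# Hypothetical cells: (lon, lat, properties); values and labels are made up.
cells = [
    (-118.25, 34.05, {"population": 1200.0, "tourism": 3, "urbanity": "urban"}),
    (-118.30, 34.10, {"population": 800.0, "tourism": 1, "urbanity": "urban"}),
    (-118.60, 34.20, {"population": 300.0, "tourism": 0, "urbanity": "suburban"}),
]
viewport = (-118.40, 34.00, -118.20, 34.15)
result = widget_values(cells, viewport)
print(result)
```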

See what the final map looks like here.

  7. Optionally, use the Lasso tool to create geometries and filter more specific areas of interest.
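Conceptually, filtering by a lasso shape is a point-in-polygon test on every feature. The sketch below uses the classic ray-casting algorithm on planar longitude/latitude coordinates, a reasonable approximation for city-sized shapes. It is an illustration only, not how CARTO implements the Lasso tool, and the polygon and listing coordinates are hypothetical.

```python
def point_in_polygon(lon, lat, polygon):
    """Ray casting: toggle on each polygon edge crossed by a ray heading east from the point."""
    inside = False
    n = len(polygon)
    for k in range(n):
        x1, y1 = polygon[k]
        x2, y2 = polygon[(k + 1) % n]  # wrap around to close the ring
        if (y1 > lat) != (y2 > lat):  # the edge spans the point's latitude
            x_cross = x1 + (lat - y1) * (x2 - x1) / (y2 - y1)
            if lon < x_cross:
                inside = not inside
    return inside

# Hypothetical lasso shape (lon, lat vertices) and listings (lon, lat, id).
lasso = [(-118.30, 34.00), (-118.20, 34.00), (-118.20, 34.10), (-118.30, 34.10)]
listings = [(-118.25, 34.05, "listing_1"), (-118.40, 34.05, "listing_2")]
selected = [item[2] for item in listings if point_in_polygon(item[0], item[1], lasso)]
print(selected)
```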

  8. Finally, we can make the map public and share the link with anybody in the organization. To do so, go to “Share” in the top right corner and set the map as Public. For more details, see Publishing and sharing maps.
