
Analyzing Airbnb ratings in Los Angeles

Context

Airbnb was founded in 2008 and has already become a very popular service for travelers around the world.
A better understanding of the key variables behind a listing's success could help improve the service and reveal the main factors that attract tourism to a given area.
Users provide both an overall rating and more specific ratings on six variables: accuracy, communication, cleanliness, location, check-in and value.
In this tutorial we will extract insights into what drives the overall impression of Airbnb users by relating the overall rating score to three of these variables (value, cleanliness and location), while taking into account the behavior of geographical neighbors through a Geographically Weighted Regression (GWR) model.
Additionally, we’ll take a closer look at the areas where the location score drives the overall rating, and inspect their sociodemographic attributes by enriching our visualization with data from the Data Observatory.

Steps to reproduce

Setting up

In this first step we will go through basic setup, including creating a CARTO account and importing the data that will be used for this tutorial.
  1.
    Go to the new CARTO platform access page: https://app.carto.com
  2.
    Create a new CARTO organization. Check this guide to get started.
  3.
    The first time that you access the Workspace, you will see a Welcome banner with links providing quick access to different actions to get you started with CARTO, like creating your first connection or your first map.
  4.
    From the Navigation Menu in the left panel, select Data Explorer.
  5.
    To import the Airbnb listings dataset that we will be using, click on the upload icon and select URL, then input the following URL.
    Tip: Check this guide on importing data if it’s your first time importing data into CARTO.
    https://storage.googleapis.com/carto-academy-public-data/b02_pub_airbnb_reviews_gwr/01_listings_la_2021_5_reviews.geojson
    Import the file into your connection or CARTO Data Warehouse as 01_listings_la_2021_5_reviews.
    Note: The dataset used for this training is open data from Airbnb that has been pre-filtered specifically for this exercise.

Exploring Airbnb listings distribution through Spatial Indexes (H3)

We will inspect how Airbnb listings are distributed across Los Angeles and aggregate the raw data to better understand how different variables vary geographically within the city.
  1.
    Inspect the data from the 01_listings_la_2021_5_reviews dataset view within the Data Explorer, then click on the Create map button.
  2.
    Rename the map to Map 1 Airbnb initial data exploration. Then click on Layer 1 and apply the following style changes.
    • Name: Airbnb listings
    • Color: Dark yellow
    • Radius: 2.5
  3.
    With this SQL query we will create an H3 grid and aggregate the Airbnb listings into it by computing the average of our key variables. Read more information on Spatial Indexes such as H3 here.
    Add a new layer with source ‘Your connection’ and type ‘SQL query’. Input the SQL query below.
    Note: Replace carto-academy.b02_pub_airbnb_reviews_gwr.01_listings_la_2021_5_reviews with the project and dataset name where 01_listings_la_2021_5_reviews was imported.
    You can get the fully qualified table name, including the project and dataset name, from the Data Explorer.
    WITH
      -- Assign each listing to its H3 cell at resolution 8
      h3_airbnb AS (
        SELECT
          `carto-un`.carto.H3_FROMGEOGPOINT(geom, 8) AS h3_id,
          *
        FROM
          `carto-academy.b02_pub_airbnb_reviews_gwr.01_listings_la_2021_5_reviews`
      ),
      -- Average the key variables per cell, keeping only cells with more than 3 listings
      aggregated_h3 AS (
        SELECT
          h3_id,
          ROUND(AVG(price_num), 2) AS price,
          ROUND(AVG(review_scores_rating), 2) AS overall_rating,
          ROUND(AVG(review_scores_value), 2) AS value_vs_price_rating,
          ROUND(AVG(review_scores_cleanliness), 2) AS cleanliness_rating,
          ROUND(AVG(review_scores_location), 2) AS location_rating,
          COUNT(*) AS total_listings
        FROM
          h3_airbnb
        GROUP BY
          h3_id
        HAVING COUNT(*) > 3
      )
    -- Replace the cell id with the cell boundary polygon
    SELECT
      * EXCEPT(h3_id),
      `carto-un`.carto.H3_BOUNDARY(h3_id) AS geom
    FROM
      aggregated_h3
  4.
    Style the new layer.
    • Name: H3 Airbnb aggregation
    • Order in display: 2
    • Fill color: 10 steps blue-red ramp based on column price
    • No stroke
    • Toggle the Height button and style this parameter using:
      • Method: sqrt
      • Value: 20
      • Column: total_listings
Inspect the map results carefully. Notice where most listings are located and where the areas with highest prices are.
Optionally, play with different variables and color ramps.
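
To make the aggregation logic of the SQL query above more concrete, here is a minimal Python sketch of the same idea: group listings by cell, average a couple of variables, and drop cells with 3 listings or fewer. It is only an illustration under stated assumptions. The function name aggregate_by_cell, the sample rows and the cell ids are hypothetical, and the cell ids are plain placeholders rather than real H3 indexes.

```python
from collections import defaultdict
from statistics import mean


def aggregate_by_cell(listings, min_listings=4):
    """Group listings by cell id and average their numeric fields.

    Mirrors the SQL aggregation: AVG per cell rounded to 2 decimals,
    keeping only cells with more than 3 listings (HAVING COUNT(*) > 3).
    """
    groups = defaultdict(list)
    for row in listings:
        groups[row["h3_id"]].append(row)

    result = {}
    for cell, rows in groups.items():
        if len(rows) < min_listings:
            continue  # too few listings in this cell, skip it
        result[cell] = {
            "price": round(mean(r["price_num"] for r in rows), 2),
            "overall_rating": round(mean(r["review_scores_rating"] for r in rows), 2),
            "total_listings": len(rows),
        }
    return result


# Hypothetical sample data: four listings in one cell, a single listing in another.
sample = [
    {"h3_id": "cell_a", "price_num": p, "review_scores_rating": 4.5}
    for p in (100, 120, 140, 160)
] + [{"h3_id": "cell_b", "price_num": 90, "review_scores_rating": 5.0}]

aggregated = aggregate_by_cell(sample)
print(aggregated)
```

In this sample only cell_a survives the filter, because cell_b holds a single listing. The threshold keeps averages from being driven by one or two listings.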

Estimating variables influence on the overall rating score

Next we will apply a Geographically Weighted Regression (GWR) model to our H3-aggregated Airbnb data using the GWR_GRID function. Our previous map already showed where the different variables rate higher.
This model will allow us to extract insights into what the overall impression of Airbnb users depends on, by relating the overall rating score to three variables: value, cleanliness and location.
We will also visualize where the location score significantly influences the overall rating.
  1.
    To save map results and continue working on a separate map, let’s duplicate the map, disable the 3D view and rename the map copy to Map 2 GWR Model map.
  2.
    (Optional) Run the model in your Data Warehouse
    Using the CARTO Analytics Toolbox in your Google BigQuery console, run the GWR model using a materialized version of the H3 aggregation SQL query that we applied before as input. Choose value_vs_price_rating, cleanliness_rating and location_rating as input variables and overall_rating as the target variable. This corresponds to the following query:
    CALL `carto-un`.carto.GWR_GRID(
      'carto-academy.b02_pub_airbnb_reviews_gwr.02_listings_la_2021_5_reviews_h3_z8_agg', -- input table
      ['value_vs_price_rating', 'cleanliness_rating', 'location_rating'], -- rating features (input variables)
      'overall_rating', -- overall rating (target variable)
      'h3_z8', 'h3', 3, 'gaussian', TRUE, -- cell column, grid type, k-ring distance, kernel, fit intercept
      NULL -- output table (none given here)
    )
    Once you’ve run this query in your Google BigQuery console, feel free to save it or simply use our materialized results.
  3.
    Add a new layer with source ‘Your connection’ and type ‘SQL query’. Choose h3 as the spatial data type. Now use the results of the GWR model with the following query:
    SELECT
      h3_z8 AS h3,
      value_vs_price_rating_coef_estimate,
      cleanliness_rating_coef_estimate,
      location_rating_coef_estimate,
      intercept
    FROM `cartobq.docs.airbnb_la_h3_gwr`
    Here, cartobq.docs.airbnb_la_h3_gwr contains the materialized results of the GWR model from the previous step. Run this query.
  4.
    Style the layer.
    • Name: Location relevance (Model)
    • Order: 3
    • Fill Color: 10 steps blue-red ramp based on location_rating_coef_estimate
    • No stroke
    Optionally, style the layer by different attributes.
  5.
    Change the basemap to Google Maps Roadmap basemap.
  6.
    Click on the Dual map view button to toggle the split map option.
    • Left map: disable the Location relevance (Model) and Airbnb listings layers
    • Right map: disable the H3 Airbnb aggregation and Airbnb listings layers
    The map result would be similar to the following.
Inspect the model results in detail to understand where the location matters the most for users' overall rating score and how the location rating values are distributed.
Style the map layers by other variables to better understand how each one influences the model results.
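
To build some intuition for why the coefficient estimates change from cell to cell, the following Python sketch fits one small weighted regression per cell, giving nearby cells more weight through a Gaussian kernel. It is a simplified illustration of the general GWR idea, not the implementation behind GWR_GRID. The assumptions are loud ones: cells sit on a 1-D line instead of an H3 grid, there is a single feature, and the names gaussian_weight, local_fit, gwr_1d and the bandwidth parameter are hypothetical.

```python
import math


def gaussian_weight(distance, bandwidth):
    """Gaussian kernel: nearby cells weigh close to 1, distant cells close to 0."""
    return math.exp(-0.5 * (distance / bandwidth) ** 2)


def local_fit(xs, ys, weights):
    """Weighted least squares for y = intercept + slope * x (closed form)."""
    w_sum = sum(weights)
    x_bar = sum(w * x for w, x in zip(weights, xs)) / w_sum
    y_bar = sum(w * y for w, y in zip(weights, ys)) / w_sum
    cov = sum(w * (x - x_bar) * (y - y_bar) for w, x, y in zip(weights, xs, ys))
    var = sum(w * (x - x_bar) ** 2 for w, x in zip(weights, xs))
    slope = cov / var
    return y_bar - slope * x_bar, slope


def gwr_1d(positions, xs, ys, bandwidth=1.0):
    """Fit one local regression per cell, weighting all cells by distance to it."""
    coefs = []
    for p in positions:
        weights = [gaussian_weight(abs(p - q), bandwidth) for q in positions]
        coefs.append(local_fit(xs, ys, weights))
    return coefs


# Hypothetical data: in the right half of the line, location matters more.
positions = [0, 1, 2, 3, 4, 5, 6, 7]
location_rating = [4.0, 4.5, 4.2, 4.8, 4.1, 4.6, 4.3, 4.9]
true_slope = [0.2, 0.2, 0.2, 0.2, 0.8, 0.8, 0.8, 0.8]
overall_rating = [1.0 + s * x for s, x in zip(true_slope, location_rating)]

coefs = gwr_1d(positions, location_rating, overall_rating)
for pos, (intercept, slope) in zip(positions, coefs):
    print(f"cell {pos}: location coefficient ~ {slope:.2f}")
```

A single global regression would return one averaged coefficient for the whole line. The local fits instead recover a low coefficient on the left and a high one on the right, which is the kind of spatial pattern the location_rating_coef_estimate layer makes visible on the map.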

Enriching the visualization with a Data Observatory Tileset

So far we have seen how the Airbnb listings and their main variables are distributed across the city of Los Angeles. Next, we will combine this information with additional data by adding another source to our map: the Spatial Features H3 Resolution 8 dataset from the CARTO Data Observatory.
This dataset holds information that can be useful to explore the influence of different factors, including variables such as the total population, the urbanity level or the presence of certain types of points of interest in different areas.
We will use CARTO Analytics Toolbox BigQuery Tiler to create a Tileset, a special type of table that allows visualizing large spatial datasets such as this one.
  1.
    To save map results and continue working on a separate map, let’s duplicate the previous map once again, and disable the dual map view (close the left panel), then rename the map copy to Map 3 Airbnb Spatial Features.
  2.
    From the main menu, click on ‘Data Observatory’ to browse the Spatial Data Catalog and apply these filters:
    • Countries: United States of America
    • Licenses: Public data
    • Sources: CARTO
    Select the Spatial Features - United States of America (H3 Resolution 8) dataset and click on Subscribe for free. This action will redirect us to the subscription level at the Data Explorer menu.
  3.
    From the subscription level at the Data Explorer menu, click on the Create button, then select ‘Create a tileset’ and complete the steps with the following settings.
    • Output tileset name: cdb_spatial_fea_94e6b1f
    • Zoom: 9-12
    • Columns: geoid, population, tourism and urbanity
  4.
    Once the tileset has been created, we can add it to our map. To do so, open the map, click on Add source from… and select the tileset from the tree menu.
    Once the tileset layer has been added, rename the layer to Spatial Features and zoom into the Los Angeles area.
  5.
    Set the layer's color opacity to 0 to keep it hidden while still displaying its information in the pop-up and the widgets that we will add next.
    Tip: Optionally, style the layer as desired to visualize how different variables behave across the territory.
  6.
    Add the following widgets to the map:
    • Listings summary
      • Layer: A Airbnb listings
      • Type: Table
      • Columns: review_scores_cleanliness, review_scores_location, review_scores_value, review_scores_rating and price_num
    • Population Spatial Features
      • Layer: C Spatial Features
      • Type: Formula
      • Operation: SUM
      • Column: population
    • Tourism POIs
      • Layer: C Spatial Features
      • Type: Formula
      • Operation: SUM
      • Column: tourism
    • Urbanity level
      • Layer: C Spatial Features
      • Type: Category
      • Operation: COUNT
      • Column: urbanity
    Navigate the map and observe how widget values vary depending on the viewport area. Check out specific areas by hovering over them and review pop-up attributes.
    See what the final map looks like here.
  7.
    Optionally, use the Lasso tool to create geometries and filter more specific areas of interest.
  8.
    Finally, we can make the map public and share the link with anybody in the organization. To do so, go to “Share” in the top right corner and set the map as Public. For more details, see Publishing and sharing maps.
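
As noted in the widgets step, widget values change with the viewport. The Python sketch below illustrates that behavior for the Formula (SUM) and Category (COUNT) widgets: only the cells inside the current bounding box contribute. It is a rough model built on assumptions, not a description of how CARTO computes widgets. In particular, it tests each cell by a single center point, and the function names, coordinates, attribute values and urbanity labels are all made up for illustration.

```python
from collections import Counter


def in_viewport(cell, west, south, east, north):
    """True when the cell's center point falls inside the bounding box."""
    return west <= cell["lng"] <= east and south <= cell["lat"] <= north


def widget_values(cells, viewport):
    """Recompute the SUM and category COUNT widgets for visible cells only."""
    visible = [c for c in cells if in_viewport(c, *viewport)]
    return {
        "population_sum": sum(c["population"] for c in visible),
        "tourism_sum": sum(c["tourism"] for c in visible),
        "urbanity_counts": dict(Counter(c["urbanity"] for c in visible)),
    }


# Hypothetical cells: positions, values and urbanity labels are placeholders.
cells = [
    {"lng": -118.25, "lat": 34.05, "population": 1200, "tourism": 5, "urbanity": "urban"},
    {"lng": -118.30, "lat": 34.10, "population": 800, "tourism": 2, "urbanity": "urban"},
    {"lng": -118.60, "lat": 34.20, "population": 300, "tourism": 0, "urbanity": "suburban"},
]

# Viewports are (west, south, east, north) in degrees.
zoomed_in = widget_values(cells, (-118.35, 34.00, -118.20, 34.15))
zoomed_out = widget_values(cells, (-119.0, 33.0, -118.0, 35.0))
print(zoomed_in)
print(zoomed_out)
```

Zooming out brings the third cell into view, so the population sum grows and a second urbanity category appears, which matches what you see when panning and zooming the map.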