Analyzing Airbnb ratings in Los Angeles
Airbnb was founded in 2008 and has already become a very popular service for travelers around the world.
A better understanding of the key variables behind a listing's success could help improve the service and reveal the main factors that attract tourism to a given area.
Users provide both an overall rating and more specific ratings on 6 variables: accuracy, communication, cleanliness, location, check in and value.
In this tutorial we will extract insights into what drives the overall impression of Airbnb users. We will relate the overall rating score to three of these variables (value, cleanliness and location) while accounting for the behavior of geographical neighbors through a Geographically Weighted Regression (GWR) model.
We will also take a closer look at the areas where the location score drives the overall rating, and inspect their sociodemographic attributes by enriching our visualization with data from the Data Observatory.
In this first step we will go through basic setup, including creating a CARTO account and importing the data that will be used for this tutorial.
- 1.
- 2.
- 3. The first time that you access the Workspace, you will see a Welcome banner with links providing quick access to different actions to get you started with CARTO, like creating your first connection or your first map.
- 4. From the Navigation Menu in the left panel, select Data Explorer.
- 5. To import the Airbnb listings dataset that we will be using, click on the upload icon and select URL, then input the following URL:
https://storage.googleapis.com/carto-academy-public-data/b02_pub_airbnb_reviews_gwr/01_listings_la_2021_5_reviews.geojson
Import the file into your connection or CARTO Data Warehouse as 01_listings_la_2021_5_reviews.
Note: The dataset used for this training corresponds to open data from Airbnb that has been pre-filtered specifically for this exercise.
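Before importing, it can help to sanity-check the file locally. The following is a minimal sketch, assuming the file is a standard GeoJSON FeatureCollection of Point features; the two-feature sample and its values are hypothetical stand-ins for the downloaded file, and only the property names come from this tutorial.

```python
import json

# Hypothetical two-feature sample standing in for the real file. The property
# names are the ones used later in this tutorial; the values are made up.
SAMPLE = """
{
  "type": "FeatureCollection",
  "features": [
    {"type": "Feature",
     "geometry": {"type": "Point", "coordinates": [-118.25, 34.05]},
     "properties": {"price_num": 120.0, "review_scores_rating": 4.8}},
    {"type": "Feature",
     "geometry": {"type": "Point", "coordinates": [-118.40, 34.02]},
     "properties": {"price_num": 95.0, "review_scores_rating": null}}
  ]
}
"""

def summarize(geojson_text, required=("price_num", "review_scores_rating")):
    """Count features, non-point geometries, and missing values per field."""
    collection = json.loads(geojson_text)
    features = collection["features"]
    non_points = sum(1 for f in features if f["geometry"]["type"] != "Point")
    missing = {
        name: sum(1 for f in features if f["properties"].get(name) is None)
        for name in required
    }
    return {"features": len(features), "non_points": non_points, "missing": missing}

summary = summarize(SAMPLE)
print(summary)
```

To check the real dataset, read the downloaded file's contents and pass them to `summarize` in place of `SAMPLE`.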
We will inspect how Airbnb listings are distributed across Los Angeles and aggregate the raw data to better understand how different variables vary geographically within the city.
- 1. Inspect the data from the 01_listings_la_2021_5_reviews dataset view within the Data Explorer, then click on the Create map button.
- 2. Rename the map to Map 1 Airbnb initial data exploration. Then click on Layer 1 and apply the following style changes.
  - Name: Airbnb listings
  - Color: Dark yellow
  - Radius: 2.5
- 3. With this SQL query we will create an H3 grid and aggregate the Airbnb listings into it by computing the average of our key variables. Read more information on Spatial Indexes such as H3 here.
Add a new layer with source ‘Your connection’ and type ‘SQL query’. Input the SQL query below.
Note: Replace carto-academy.b02_pub_airbnb_reviews_gwr.01_listings_la_2021_5_reviews with the project and dataset name where 01_listings_la_2021_5_reviews was imported. You can get the qualified table name, including the project and dataset name, from the Data Explorer.
    WITH h3_airbnb AS (
      SELECT
        `carto-un`.carto.H3_FROMGEOGPOINT(geom, 8) AS h3_id,
        *
      FROM `carto-academy.b02_pub_airbnb_reviews_gwr.01_listings_la_2021_5_reviews`
    ),
    aggregated_h3 AS (
      SELECT
        h3_id,
        ROUND(AVG(price_num), 2) price,
        ROUND(AVG(review_scores_rating), 2) overall_rating,
        ROUND(AVG(review_scores_value), 2) value_vs_price_rating,
        ROUND(AVG(review_scores_cleanliness), 2) cleanliness_rating,
        ROUND(AVG(review_scores_location), 2) location_rating,
        COUNT(*) AS total_listings
      FROM h3_airbnb
      GROUP BY h3_id
      HAVING COUNT(*) > 3
    )
    SELECT
      * EXCEPT(h3_id),
      `carto-un`.carto.H3_BOUNDARY(h3_id) AS geom
    FROM aggregated_h3
- 4. Style the new layer.
  - Name: H3 Airbnb aggregation
  - Order in display: 2
  - Fill color: 10 steps blue-red ramp based on column price
  - No stroke
  - Toggle the Height button and style this parameter using:
    - Method: sqrt
    - Value: 20
    - Column: total_listings
Inspect the map results carefully. Notice where most listings are located and where the areas with highest prices are.
Optionally, play with different variables and color ramps.
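The grouping logic of the H3 aggregation query can be pictured with a small Python sketch. It is an illustration under stated assumptions, not the query itself: the cell ids and listing values are hypothetical, the cell assignment (done by H3_FROMGEOGPOINT in the query) is taken as given, and missing values are assumed to be absent.

```python
from collections import defaultdict

def aggregate_by_cell(rows, min_listings=4):
    """Average numeric fields per cell, keeping cells with enough listings.

    `rows` is a list of dicts with a "cell" key plus numeric fields.
    min_listings=4 mirrors HAVING COUNT(*) > 3 in the SQL query.
    """
    groups = defaultdict(list)
    for row in rows:
        groups[row["cell"]].append(row)

    result = {}
    for cell, members in groups.items():
        if len(members) < min_listings:
            continue  # skip cells with too few listings
        fields = [key for key in members[0] if key != "cell"]
        summary = {
            field: round(sum(m[field] for m in members) / len(members), 2)
            for field in fields
        }
        summary["total_listings"] = len(members)
        result[cell] = summary
    return result

# Hypothetical listings: cell "a" has four of them, cell "b" only two.
listings = [
    {"cell": "a", "price_num": 100.0, "review_scores_rating": 4.5},
    {"cell": "a", "price_num": 120.0, "review_scores_rating": 4.7},
    {"cell": "a", "price_num": 80.0, "review_scores_rating": 4.9},
    {"cell": "a", "price_num": 140.0, "review_scores_rating": 4.3},
    {"cell": "b", "price_num": 300.0, "review_scores_rating": 5.0},
    {"cell": "b", "price_num": 310.0, "review_scores_rating": 4.0},
]
cells = aggregate_by_cell(listings)
print(cells)
```

Dropping sparsely populated cells keeps a single unusual listing from dominating a cell's average, which is presumably why the query filters on the listing count.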
Next we will apply a Geographically Weighted Regression (GWR) model to our H3-aggregated Airbnb data using the GWR_GRID function. Our previous map already showed where the different variables rate higher.
This model will allow us to extract insights into what the overall impression of Airbnb users depends on, by relating the overall rating score to different variables (specifically: value, cleanliness and location).
We will also visualize where the location score variable significantly influences the ‘Overall rating’ result.
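To build intuition for what a geographically weighted regression does, here is a minimal sketch of the general idea: fit a separate weighted least squares model around each location, with a Gaussian kernel giving nearby observations more weight. This is not the GWR_GRID implementation. The one-dimensional positions, the `bandwidth` parameter and the data are all hypothetical simplifications.

```python
import math

def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - tail) / m[i][i]
    return x

def local_fit(target, positions, features, labels, bandwidth=1.0):
    """Fit weighted least squares centred on `target`.

    Returns [intercept, coef_1, ..., coef_k].
    """
    # Gaussian kernel: observations close to `target` count the most.
    weights = [math.exp(-0.5 * ((p - target) / bandwidth) ** 2) for p in positions]
    design = [[1.0] + list(f) for f in features]  # leading 1.0 fits the intercept
    k = len(design[0])
    # Normal equations: (X^T W X) beta = X^T W y
    xtwx = [[sum(w * r[i] * r[j] for w, r in zip(weights, design)) for j in range(k)]
            for i in range(k)]
    xtwy = [sum(w * r[i] * y for w, r, y in zip(weights, design, labels))
            for i in range(k)]
    return solve(xtwx, xtwy)

# Hypothetical 1-D "city": in the west half the label follows 1 + 2x,
# in the east half it follows 1 + 4x.
positions = [0, 1, 2, 3, 4, 5, 6, 7]
features = [(1.0,), (2.0,), (3.0,), (4.0,), (1.0,), (2.0,), (3.0,), (4.0,)]
labels = [1 + 2 * f[0] for f in features[:4]] + [1 + 4 * f[0] for f in features[4:]]

west = local_fit(0, positions, features, labels)
east = local_fit(7, positions, features, labels)
print("west slope:", round(west[1], 2), "east slope:", round(east[1], 2))
```

A single global regression would return one slope for the whole city, while the local fits recover a different slope at each end. That per-location coefficient is what the model map visualizes for each H3 cell.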
- 1. To save map results and continue working on a separate map, let's duplicate the map, disable the 3D view and rename the map copy to Map 2 GWR Model map.
- 2. (Optional) Run the model in your Data Warehouse.
Using the CARTO Analytics Toolbox in your Google BigQuery console, run the GWR model using a materialized version of the H3 aggregation SQL query that we applied before as input. Choose value_vs_price_rating, cleanliness_rating and location_rating as input variables and overall_rating as the target variable. This translates into the following query:
    CALL `carto-un`.carto.GWR_GRID(
      'carto-academy.b02_pub_airbnb_reviews_gwr.02_listings_la_2021_5_reviews_h3_z8_agg',
      ['value_vs_price_rating', 'cleanliness_rating', 'location_rating'], -- [ different ratings features ]
      'overall_rating', -- overall rating (target variable)
      'h3_z8', 'h3', 3, 'gaussian', TRUE,
      NULL
    )
Once you’ve run this query in your Google BigQuery console, feel free to save it or simply use our materialized results.
- 3. Add a new layer with source ‘Your connection’ and type ‘SQL query’. Choose h3 as the spatial data type. Now use the results of the GWR model with the following query:
    SELECT
      h3_z8 AS h3,
      value_vs_price_rating_coef_estimate,
      cleanliness_rating_coef_estimate,
      location_rating_coef_estimate,
      intercept
    FROM `cartobq.docs.airbnb_la_h3_gwr`
Where cartobq.docs.airbnb_la_h3_gwr is the result from step 2. Run this query.
- 4. Style the layer.
  - Name: Location relevance (Model)
  - Order: 3
  - Fill Color: 10 steps blue-red ramp based on location_rating_coef_estimate
  - No stroke
Optionally, style the layer by different attributes.
- 5. Change the basemap to Google Maps Roadmap basemap.
- 6. Click on the Dual map view button to toggle the split map option.
  - Left map: disable the Location relevance (Model) and Airbnb listings layers
  - Right map: disable the H3 Airbnb aggregation and Airbnb listings layers
The map result would be similar to the following.
Inspect the model results in detail to understand where the location matters the most for users' overall rating score and how the location rating values are distributed.
Style the map layers by other variables to better understand how each of them influences the model results.
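One way to summarize the model output beyond the map is to flag the cells where the location coefficient is the largest of the three. The sketch below assumes the three ratings share a comparable scale, so that raw coefficient magnitudes can be compared, and it ignores statistical significance. The column names come from the query above, while the cell ids and numbers are hypothetical.

```python
# Maps a short label to the coefficient column produced by the model query.
COEF_COLUMNS = {
    "value": "value_vs_price_rating_coef_estimate",
    "cleanliness": "cleanliness_rating_coef_estimate",
    "location": "location_rating_coef_estimate",
}

def dominant_variable(row):
    """Return the rating whose local coefficient has the largest magnitude."""
    return max(COEF_COLUMNS, key=lambda name: abs(row[COEF_COLUMNS[name]]))

# Hypothetical model output rows; cell ids and numbers are illustrative only.
rows = [
    {"h3": "cell_1",
     "value_vs_price_rating_coef_estimate": 0.55,
     "cleanliness_rating_coef_estimate": 0.30,
     "location_rating_coef_estimate": 0.10},
    {"h3": "cell_2",
     "value_vs_price_rating_coef_estimate": 0.20,
     "cleanliness_rating_coef_estimate": 0.25,
     "location_rating_coef_estimate": 0.60},
    {"h3": "cell_3",
     "value_vs_price_rating_coef_estimate": 0.35,
     "cleanliness_rating_coef_estimate": 0.45,
     "location_rating_coef_estimate": 0.15},
]

location_driven = [r["h3"] for r in rows if dominant_variable(r) == "location"]
print(location_driven)
```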
So far we have seen how the Airbnb listings and their main variables are distributed across the city of Los Angeles. Next, we will combine this information with additional data by adding another source to our map: the Spatial Features H3 Resolution 8 dataset from the CARTO Data Observatory. This dataset holds information that can be useful to explore the influence of different factors, including variables such as the total population, the urbanity level or the presence of certain types of points of interest in different areas.
We will use CARTO Analytics Toolbox BigQuery Tiler to create a Tileset, a special type of table that allows visualizing large spatial datasets such as this one.
- 1. To save map results and continue working on a separate map, let’s duplicate the previous map once again, disable the dual map view (close the left panel), and rename the map copy to Map 3 Airbnb Spatial Features.
- 2. From the main menu, click on ‘Data Observatory’ to browse the Spatial Data Catalog and apply these filters:
  - Countries: United States of America
  - Licenses: Public data
  - Sources: CARTO
Select the Spatial Features - United States of America (H3 Resolution 8) dataset and click on Subscribe for free. This action will redirect us to the subscription level at the Data Explorer menu.
- 3. From the subscription level at the Data Explorer menu, click on the Create button, then select ‘Create a tileset’ and complete the steps with the following settings.
  - Output tileset name: cdb_spatial_fea_94e6b1f
  - Zoom: 9-12
  - Columns: geoid, population, tourism and urbanity
- 4. Once the tileset has been created, we can add it to our map. To do so, open the map, click on Add source from… and select the tileset from the tree menu. Once the tileset layer has been added, rename the layer to Spatial Features and zoom into the Los Angeles area.
- 5. Style the layer with color opacity 0 in order to keep it hidden while displaying its information in the pop-up and the widgets that we will add next.
Tip: Optionally, style the layer as desired to visualize how different variables behave across the territory.
- 6. Add the following widgets:
  - Listings summary
    - Layer: A Airbnb listings
    - Type: Table
    - Columns: review_scores_cleanliness, review_scores_location, review_scores_value, review_scores_rating and price_num
  - Population Spatial Features
    - Layer: C Spatial Features
    - Type: Formula
    - Operation: SUM
    - Column: population
  - Tourism POIs
    - Layer: C Spatial Features
    - Type: Formula
    - Operation: SUM
    - Column: tourism
  - Urbanity level
    - Layer: C Spatial Features
    - Type: Category
    - Operation: COUNT
    - Column: urbanity
Navigate the map and observe how widget values vary depending on the viewport area. Check out specific areas by hovering over them and review pop-up attributes.
- 7. Optionally, use the Lasso tool to create geometries and filter more specific areas of interest.
- 8. Finally, we can make the map public and share the link with anybody in the organization. To do so, go to “Share” in the top right corner and set the map as Public. For more details, see Publishing and sharing maps.
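The viewport-dependent widgets configured in step 6 can be pictured as simple aggregations over the features currently visible on the map. The following is a minimal sketch of that idea, not how CARTO computes widgets internally. It assumes each cell is represented by a centroid, and the coordinates, values and urbanity labels are hypothetical.

```python
from collections import Counter

def in_viewport(feature, west, south, east, north):
    """True when the feature's centroid falls inside the bounding box."""
    return west <= feature["lon"] <= east and south <= feature["lat"] <= north

def widget_values(features, viewport):
    """Recompute the Formula (SUM) and Category (COUNT) widgets for a viewport."""
    visible = [f for f in features if in_viewport(f, *viewport)]
    return {
        "population_sum": sum(f["population"] for f in visible),
        "tourism_sum": sum(f["tourism"] for f in visible),
        "urbanity_count": dict(Counter(f["urbanity"] for f in visible)),
    }

# Hypothetical cell centroids with Spatial Features attributes.
features = [
    {"lon": -118.25, "lat": 34.05, "population": 1200, "tourism": 3, "urbanity": "urban"},
    {"lon": -118.30, "lat": 34.10, "population": 800, "tourism": 1, "urbanity": "urban"},
    {"lon": -117.00, "lat": 33.00, "population": 50, "tourism": 0, "urbanity": "rural"},
]

# Viewport as (west, south, east, north); it covers the first two cells only.
viewport = (-118.5, 33.9, -118.0, 34.3)
values = widget_values(features, viewport)
print(values)
```

Panning or zooming the map corresponds to calling `widget_values` with a different bounding box, which is why the widget figures change as you navigate.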