Let’s plot some points on maps!


KEY QUESTION: Given a list of addresses, how can I visualize them as points on a map?

DESIRED OUTPUT: A map of the addresses

CATEGORY OF ANALYSIS: Geospatial analysis, data visualization

LEVEL OF DIFFICULTY: Easy

ESTIMATED SCRIPT TIME: less than 4 hours

DATA SOURCE: City of Toronto, Cultural Spaces

 

CONTEXT AND DESCRIPTION: You have a list of addresses. Maybe it is a list of establishments in a city. Maybe it is a list of customers and their addresses. Maybe each address represents an event (e.g. a crime incident) that occurred there. Now you want to see them on a map and look for patterns. Which areas have high or low concentrations of points? Are the points evenly or sparsely distributed? Do they form clusters? If so, what shape do the clusters take: compact groups or lines?

In this example, we use the City of Toronto’s open dataset of cultural spaces in the city. We plot each address as a point on the map to visualize the distribution of points throughout the city.

VALUE ADDED: Geospatial analysis offers many benefits, even with something as basic as a plot of points. Technical and non-technical people alike can interpret and digest the information on a map very easily. Patterns in the distribution and clustering of points can provide many insights, and you can use them to answer many questions. Where are my customers coming from? How can I position my advertising in the best location? Which areas have too many buildings or not enough buildings? Which areas experience a lot of crime?

METHODOLOGY: Using R and RStudio, I wrote a script with the ggmap and ggplot2 libraries. The key ideas are to use the geocode function to assign latitude and longitude values to each address, and then to use the geom_point and ggmap functions to visualize the information in an aesthetically pleasing way.

Here is the script:

Figure 1: R script using ggmap
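As a rough, text-form illustration of the steps described in the methodology (not the exact script from Figure 1), here is a minimal sketch. Several things in it are assumptions: the input file name cultural_spaces.csv, its "address" column, the output file name, the environment variable GOOGLE_MAPS_API_KEY, and the helper names geocode_addresses, keep_geocoded and plot_points. Recent versions of ggmap expect a Google API key to be registered with register_google() before geocode() or get_map() will work, so the network steps are skipped when no key is set.

```r
# Minimal sketch of the geocode-then-plot workflow (not the original script).
# Assumptions: a Google Maps API key in the environment variable
# GOOGLE_MAPS_API_KEY, and a file cultural_spaces.csv with an "address" column.

# Geocode a character vector of addresses with ggmap::geocode().
geocode_addresses <- function(addresses) {
  coords <- ggmap::geocode(addresses, output = "latlon", source = "google")
  data.frame(address = addresses, lon = coords$lon, lat = coords$lat,
             stringsAsFactors = FALSE)
}

# Drop rows that could not be geocoded (geocode() returns NA for those).
keep_geocoded <- function(points) {
  points[!is.na(points$lon) & !is.na(points$lat), , drop = FALSE]
}

# Draw the points on top of a base map fetched with get_map().
plot_points <- function(points, centre = "Toronto, Canada", zoom = 13) {
  base_map <- ggmap::get_map(location = centre, zoom = zoom)
  ggmap::ggmap(base_map) +
    ggplot2::geom_point(data = keep_geocoded(points),
                        ggplot2::aes(x = lon, y = lat),
                        colour = "red", alpha = 0.6, size = 2)
}

api_key    <- Sys.getenv("GOOGLE_MAPS_API_KEY")  # hypothetical variable name
input_file <- "cultural_spaces.csv"              # hypothetical file name

# Only run the network-dependent steps when a key and the input file exist.
if (nzchar(api_key) && file.exists(input_file)) {
  ggmap::register_google(key = api_key)
  spaces <- read.csv(input_file, stringsAsFactors = FALSE)
  points <- geocode_addresses(spaces$address)
  # Save the geocoded records so they do not have to be geocoded again.
  write.csv(points, "cultural_spaces_geocoded.csv", row.names = FALSE)
  print(plot_points(points))
}
```

The colour, alpha and size arguments to geom_point are just starting values. Changing them, or the zoom level passed to get_map, is the quickest way to adjust how the map looks.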

And voila! Here is one of the outputted maps:

Figure 2: Map of Toronto’s Cultural Spaces

 

This is a very basic example of geospatial analysis: it plots points on a map. The ggmap package offers many more features. You can play around with colours, dimensions, shapes, transparencies, legends, etc. Heat maps and choropleth maps are also very interesting because they show gradients of concentration.

What are your thoughts? Do you find business value in this kind of visualization? Is the R script easy-to-understand? What can be improved? Leave a comment!

 

EDIT: 2017-04-13 – As pointed out by Stephan in the comments, Google limits how many records you are permitted to geocode each day. If you go over the limit, you will receive an “OVER_QUERY_LIMIT” error. I believe the threshold is 2,500 records. The code includes a “write.csv” line because it is prudent to save the records that you have already geocoded. Geocoding is very time-consuming and the number of requests you can run is finite, so we should avoid duplicating work. There is no point in geocoding twice! To get through a large number of records, you (or a group of your colleagues) can run the script each day and geocode one batch at a time.

I saved the .xlsx file from the Toronto website as .csv before reading it into R.
The other warnings occur because Google could not geocode those records.
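The save-and-resume idea described in this edit can be sketched as follows. This is only an illustration under assumptions, not the code from the original script: the function name geocode_with_cache, the cache file layout (columns address, lon, lat) and the choice not to cache failed lookups are all mine, and the default daily_limit of 2500 simply reuses the figure quoted above, which may not match Google's current limits. The geocoder is passed in as a function, so the caching logic can be tried without calling Google at all.

```r
# Sketch: geocode only the addresses that are not already saved on disk.
# geocoder must take a character vector and return a data frame with
# lon and lat columns, as ggmap::geocode() does.
geocode_with_cache <- function(addresses, cache_file,
                               geocoder = function(a) ggmap::geocode(a, output = "latlon"),
                               daily_limit = 2500) {
  # Load the results saved by earlier runs, if any.
  if (file.exists(cache_file)) {
    cache <- read.csv(cache_file, stringsAsFactors = FALSE)
  } else {
    cache <- data.frame(address = character(0), lon = numeric(0),
                        lat = numeric(0), stringsAsFactors = FALSE)
  }

  # Only new addresses need a request; stop at the daily limit.
  todo <- setdiff(unique(addresses), cache$address)
  todo <- head(todo, daily_limit)

  if (length(todo) > 0) {
    coords <- geocoder(todo)
    fresh <- data.frame(address = todo, lon = coords$lon, lat = coords$lat,
                        stringsAsFactors = FALSE)
    # Assumption: failed lookups (NA coordinates) are not cached,
    # so a later run will try them again.
    fresh <- fresh[!is.na(fresh$lon) & !is.na(fresh$lat), , drop = FALSE]
    cache <- rbind(cache, fresh)
    write.csv(cache, cache_file, row.names = FALSE)
  }
  cache
}
```

Calling geocode_with_cache(spaces$address, "geocoded_cache.csv") once a day (both names are placeholders) would work through a long list in batches, and each run would skip whatever is already in the cache file.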

2 thoughts on “Let’s plot some points on maps!”

  1. Hi Patrick,

    Many thanks for this very helpful inspiration on how to use ggmap. It’s been the first time that I’ve used this package and it’s been very interesting, especially the geocode function.

    Regarding the code: I got 17 warnings (one of them OVER_QUERY_LIMIT) while running the geocode line, did you get them as well? Maybe you could indicate this in your text as well? Furthermore, I wasn’t able to find the .csv on the linked website so I used the .xlsx and it seems like your plot is using zoom=14 instead of zoom=13 as mentioned in the code?

    Again, thanks for this inspiring post – I’m looking forward to the next.

    Best wishes from Germany
    Stephan


    1. Hi Stephan,

      Thank you for the helpful comment! I will edit my post to address your multiple points.

      Regarding the “OVER_QUERY_LIMIT” – Google has a limit on how much you are permitted to geocode each day. I believe the threshold is 2,500 records. In the code, there is a “write.csv” line because it would be prudent to save the records that you already geocoded. Since geocoding is very time-consuming and you have a finite amount you can run, then we should avoid duplicating work. There is no point in geocoding twice! Each day, you (or a group of your colleagues) can run the script to geocode large amounts of records.

      I also forgot to mention that the .xlsx file was saved as .csv in the beginning.

      If you write warnings(), you can see all of the warnings from R. In my warnings, most are from the fact that Google could not geocode those records.

      Thank you again for the wonderful feedback. After working with this function for a long time, I forgot about some of these details and so I appreciate you pointing them out.

      Best wishes from Canada,
      Patrick Roncal

