- 2017 Jun 01
- Data Analysis
- #python, #Dask
Note: This post has interactive Bokeh graphics which may not render well on mobile devices. Try viewing the Jupyter notebook which underlies this post on NBViewer.
Part 1: A Gentle Introduction to the Spatial Join
One problem I came across when analyzing the New York City Taxi Dataset is that from 2009 through June 2016, both the starting and stopping locations of taxi trips were given as longitude and latitude points. Starting in July 2016, to provide a degree of anonymity when releasing data to the public, the Taxi and Limousine Commission (TLC) provides only the starting and ending "taxi zones" of a trip, along with a shapefile that specifies the zone boundaries, available here. Let's load this up in Geopandas, and set the coordinate system to 'epsg:4326', which uses longitude and latitude coordinates.
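A minimal sketch of that loading step is below. It is not the notebook's original code: the function names `to_lon_lat` and `load_taxi_zones` are made up for illustration, and the path passed to `load_taxi_zones` is whatever location you saved the shapefile to. It also assumes the shapefile ships in a projected coordinate system (EPSG:2263, New York State Plane in feet, is used for the stand-in data), so the coordinates have to be reprojected with `to_crs` rather than merely relabeled.

```python
import geopandas as gpd
from shapely.geometry import box


def to_lon_lat(zones):
    """Reproject a GeoDataFrame to EPSG:4326 (longitude/latitude in degrees)."""
    return zones.to_crs("EPSG:4326")


def load_taxi_zones(path):
    """Read a zone shapefile from `path` (hypothetical location) and reproject it."""
    return to_lon_lat(gpd.read_file(path))


# Stand-in for the real shapefile: a single square "zone" expressed in
# EPSG:2263 (an assumed source CRS), placed roughly over lower Manhattan.
demo = gpd.GeoDataFrame(
    {"LocationID": [1]},
    geometry=[box(984250, 200000, 985250, 201000)],
    crs="EPSG:2263",
)

demo_lon_lat = to_lon_lat(demo)
print(demo_lon_lat.crs.to_epsg())   # EPSG code of the reprojected frame
print(demo_lon_lat.total_bounds)    # [min lon, min lat, max lon, max lat]
```

With the real file, `load_taxi_zones("taxi_zones.shp")` (file name assumed) would return the zones with their polygon vertices in longitude and latitude.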
We see that the geometry column consists of polygons (from Shapely) whose vertices are defined by longitude and latitude points. Let's plot the zones using Bokeh, in order of ascending LocationID.
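One way to draw such polygons with Bokeh is sketched below. This is an illustrative approach rather than the notebook's actual plotting code: it assumes a GeoDataFrame with a `LocationID` column, uses three made-up squares as stand-in zones, and colors each patch by its `LocationID` with a Viridis palette (the palette and styling are arbitrary choices).

```python
import geopandas as gpd
from bokeh.models import GeoJSONDataSource
from bokeh.palettes import Viridis256
from bokeh.plotting import figure
from bokeh.transform import linear_cmap
from shapely.geometry import box

# Stand-in zones (hypothetical data): three small squares already in lon/lat,
# deliberately listed out of LocationID order.
zones = gpd.GeoDataFrame(
    {"LocationID": [3, 1, 2]},
    geometry=[
        box(-74.00 + 0.01 * i, 40.70, -73.99 + 0.01 * i, 40.71)
        for i in range(3)
    ],
    crs="EPSG:4326",
)

# Sort so the polygons are handed to Bokeh in ascending LocationID order.
zones = zones.sort_values("LocationID").reset_index(drop=True)

# GeoJSONDataSource exposes polygon vertices as `xs`/`ys` columns and each
# feature property (here LocationID) as a column of the same name.
source = GeoJSONDataSource(geojson=zones.to_json())

plot = figure(
    title="Taxi zones (stand-in data)",
    x_axis_label="Longitude",
    y_axis_label="Latitude",
)
fill = linear_cmap(
    "LocationID",
    Viridis256,
    low=int(zones["LocationID"].min()),
    high=int(zones["LocationID"].max()),
)
renderer = plot.patches(
    "xs", "ys", source=source,
    fill_color=fill, fill_alpha=0.7, line_color="black",
)

# In a notebook, bokeh.io.output_notebook() followed by bokeh.io.show(plot)
# would display the figure; it is left out here so the sketch has no side effects.
```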