I produce glowing visualizations of all 1.3 billion NYC Taxi trips, using Dask and Datashader. I also make a cleaned version of the Taxi Dataset available in Parquet format.
An introduction to the spatial join and its application at scale to the New York City Taxi Dataset, using GeoPandas and Dask.
A step-by-step tutorial for setting up a reproducible data science environment with Python, R, Apache Spark, and Docker on Ubuntu 16.04 LTS. All steps are valid both for local bare-metal installs and for cloud services like Amazon EC2.