Ravi Shekhar's Technical Blog

A Technical Blog of the Data Science Process


I am a physicist by training, and to answer scientific questions, I have picked up quite a bit of data wrangling, numerical computation, and statistical skills along the way. I have primarily worked with the Python scientific stack since 2003, analyzing hundreds of terabytes of data across supercomputers and clusters. This blog is aimed at:

  • Analyzing, exploring, and visualizing specific datasets.
  • Sharing useful tips for data- and software-related tasks (Python, SQL, AWS, Spark, etc.).
  • Occasionally delving into a computational or statistical technique.

I completed my Ph.D. in Geophysics at Yale University in July 2017. Before my Ph.D., I did my Master's in High Energy Particle Physics at Duke University through Fermi National Lab, and worked as a researcher at CERN. I am currently seeking career opportunities as a Data Scientist within the U.S.

You can find some of my public code on my GitHub.


  • B.S.E., Physics Engineering, Case Western Reserve University
  • M.S., Physics, Duke University
  • Ph.D., Geophysics, Yale University (Completed July 2017; Awarded December 2017)

My Ph.D. dissertation focuses on understanding year-to-year variations of monsoon rainfall over the Sahel region of Africa and the relationship of that rainfall to the climate over the Sahara Desert. Using observed historical data and computational atmospheric models, my research has shown that the canonical idealized theories of the tropical atmosphere (Convective Quasi-Equilibrium and the Atmospheric Energy Budget) have limited quantitative utility for explaining year-to-year rainfall variations over Africa.

My Master's dissertation focused on a search for the Higgs boson in the ZH dilepton decay channel at Fermi National Lab's (now decommissioned) Tevatron. Using a novel likelihood fitting technique with advanced numerical Monte Carlo integration, I decreased the width of the confidence interval by 30 percent, better constraining the Higgs boson, which was later discovered at the Large Hadron Collider (LHC). I also worked with collaborators to use a neural network optimized with a genetic algorithm to improve measurements of the mass of the top quark.

For my publication list, please see my Google Scholar page.