Loading data in memory is a quick and convenient way to explore smaller datasets. With larger datasets, those with millions of rows, loading and transforming the data can become challenging. In this talk, we'll walk through tools that make it easy to handle very large datasets.
This talk provides an overview of the most commonly used Python packages for handling very large datasets. We'll cover Polars, Vaex, and Pandas 2.0 and walk through the difference between eager and lazy evaluation.
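As a taste of the eager-versus-lazy distinction the talk covers, here is a minimal sketch in plain Python. It uses generators to stand in for the lazy query engines in libraries like Polars; the function names and the pipeline itself are illustrative, not library APIs.

```python
# Eager vs. lazy evaluation, sketched with plain Python generators.
# (Illustrative stand-in for lazy query engines, not a library API.)

rows = range(10_000_000)  # stand-in for a large dataset

def eager_pipeline(data):
    # Eager: each intermediate list is fully materialized in memory.
    doubled = [x * 2 for x in data]            # allocates the whole list
    return [x for x in doubled if x % 3 == 0]  # allocates another list

def lazy_pipeline(data):
    # Lazy: each step is a generator; no work happens until results
    # are consumed, so rows stream through the pipeline one at a time.
    doubled = (x * 2 for x in data)            # nothing computed yet
    return (x for x in doubled if x % 3 == 0)

# Only here is the lazy pipeline actually executed ("collected"),
# and only as far as the five rows we ask for.
first_five = []
for value in lazy_pipeline(rows):
    first_five.append(value)
    if len(first_five) == 5:
        break
print(first_five)  # → [0, 6, 12, 18, 24]
```

Lazy engines take this idea further: because the whole query is known before execution, they can also reorder and fuse steps (for example, pushing filters ahead of expensive transforms) before touching the data.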
I’m a senior data scientist at Shopify, based in Toronto, Canada. I’m a big fan of Python and love the community that surrounds it. Outside of work, I love playing tennis, going on long walks with my dog Ziggy, and writing posts for my work-in-progress blog, Normally Distributed.