Polars – the new bear in town

We all love and cherish pandas. However, there is a new bear in town: Polars. It’s written in Rust and claims to be better and faster, though its API is heavily influenced by pandas. Let’s compare these bears on syntax readability and performance using a simple DataFrame example.

I downloaded a basic world cities dataset as a CSV file and will use it to demonstrate the capabilities of both libraries.

First, let’s read the CSV:


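import pandas as pd

df = pd.read_csv(
    "world_cities.csv",  # placeholder: path to the downloaded world cities CSV (original file name not given)
    usecols=["city", "population", "country"],  # only load the columns we need
    dtype={"population": "float32"},
)


import polars as pl

df = pl.read_csv(
    "world_cities.csv",  # placeholder: same file as above
    columns=["city", "population", "country"],  # only load the columns we need
    schema_overrides={"population": pl.Float32},  # older Polars versions call this parameter `dtypes`
)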

The syntax is almost identical, and both are readable. Performance, however, is a different story:

Pandas read_csv: 0.07 seconds
Polars read_csv: 0.01 seconds
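Timings like these depend on the machine and on disk caching, so it helps to measure both libraries the same way. A minimal sketch, assuming the wall-clock time of a single call is what matters: run the call several times with `time.perf_counter` and keep the fastest run. The helper name `best_time` is hypothetical and is not part of pandas or Polars.

```python
import time
from typing import Any, Callable


def best_time(fn: Callable[[], Any], repeat: int = 5) -> tuple[float, Any]:
    """Call fn() `repeat` times; return (fastest run in seconds, last result).

    Hypothetical helper for illustration, not a pandas or Polars API.
    """
    if repeat < 1:
        raise ValueError("repeat must be at least 1")
    best = float("inf")
    result = None
    for _ in range(repeat):
        start = time.perf_counter()
        result = fn()
        best = min(best, time.perf_counter() - start)  # keep the fastest run
    return best, result


if __name__ == "__main__":
    # Stand-in workload; replace with a zero-argument lambda around a read_csv call.
    seconds, total = best_time(lambda: sum(range(100_000)))
    print(f"best of 5: {seconds:.6f} s (result {total})")
```

To time the reads above, wrap each `read_csv` call in a zero-argument lambda and pass it to `best_time`. Python's standard `timeit` module implements the same idea with more options.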

Now let’s perform some operations:

I want the following from this dataset:

  • For each country, find the top 5 most populous cities with a population greater than one million
  • For each country, get the total and average population of those selected cities
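To pin down exactly what both pipelines should return, here is a plain-Python sketch of the same steps, with no pandas or Polars involved. The function name `top_cities_summary` and the sample rows are made up for illustration; results are sorted by total population, largest first.

```python
from collections import defaultdict


def top_cities_summary(rows, threshold=1_000_000, top_n=5):
    """Hypothetical reference implementation of the task.

    rows: iterable of (city, population, country) tuples.
    Returns (country, population_sum, population_mean) tuples,
    sorted by population_sum, largest first.
    """
    # Step 1: keep cities above the threshold, grouped by country.
    by_country = defaultdict(list)
    for _city, population, country in rows:
        if population > threshold:
            by_country[country].append(population)

    # Step 2: keep the top_n most populous cities per country, then aggregate.
    summary = []
    for country, populations in by_country.items():
        top = sorted(populations, reverse=True)[:top_n]
        summary.append((country, sum(top), sum(top) / len(top)))

    # Step 3: order countries by the total population of their selected cities.
    summary.sort(key=lambda item: item[1], reverse=True)
    return summary


# Made-up sample data, for illustration only.
rows = [
    ("A1", 5_000_000, "A"),
    ("A2", 3_000_000, "A"),
    ("A3", 900_000, "A"),  # below the threshold, dropped
    ("B1", 2_000_000, "B"),
    ("B2", 1_500_000, "B"),
]
print(top_cities_summary(rows))
# [('A', 8000000, 4000000.0), ('B', 3500000, 1750000.0)]
```

The strict `population > threshold` comparison matches the "greater than one million" wording, so a city with exactly 1,000,000 inhabitants is excluded.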

To me, this is a pretty realistic example. Let’s do it in Pandas first:

threshold = 1000000
df_pd = (
    df.query("population > @threshold")  # pandas can use variables in query with @ in front
    .sort_values("population", ascending=False)  # pandas has ascending=True as default
    .groupby("country")
    .head(5)  # "head" on a groupby acts as a filter: it keeps the first 5 rows of each group
    .groupby("country")
    .agg(population_sum=("population", "sum"), population_mean=("population", "mean"))  # create new columns using named aggregations
    .sort_values("population_sum", ascending=False)
    .reset_index()  # reset_index is needed because groupby uses the group name (country) as the index
)

And now with Polars:

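df_pl = (
    df.filter(pl.col("population") > threshold)  # filter is similar to query, but you can use variables directly
    .sort("population", descending=True)  # polars sorts ascending by default, so ask for descending explicitly
    .group_by("country")
    .head(5)  # same as pandas: first 5 rows of each group
    .group_by("country")
    .agg(
        pl.col("population").sum().name.suffix("_sum"),  # this can be done in many ways in polars, but I chose this to demonstrate the suffix usage, which is neat
        pl.col("population").mean().name.suffix("_mean"),
    )
    .sort("population_sum", descending=True)
    .with_row_index()  # polars does not use an index! so we need to add one manually if needed
)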
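A quick way to check that the two pipelines agree is to compare their rows with a small relative tolerance, since `population` was read as 32-bit floats and the two libraries may not add them up in exactly the same order. The helper `rows_match` is a hypothetical, standard-library-only sketch; it assumes each row is a `(country, population_sum, population_mean)` tuple and ignores row order.

```python
import math


def rows_match(rows_a, rows_b, rel_tol=1e-5):
    """Return True if two lists of (country, population_sum, population_mean)
    rows contain the same countries with numerically close values.

    Hypothetical helper for illustration. Rows are sorted by country first,
    so only the contents are compared, not the row order.
    """
    rows_a = sorted(rows_a, key=lambda row: row[0])
    rows_b = sorted(rows_b, key=lambda row: row[0])
    if len(rows_a) != len(rows_b):
        return False
    for (country_a, sum_a, mean_a), (country_b, sum_b, mean_b) in zip(rows_a, rows_b):
        if country_a != country_b:
            return False
        # float32 input: allow tiny differences caused by summation order.
        if not math.isclose(sum_a, sum_b, rel_tol=rel_tol):
            return False
        if not math.isclose(mean_a, mean_b, rel_tol=rel_tol):
            return False
    return True
```

Assuming both frames end up with `country`, `population_sum` and `population_mean` columns, the rows could be extracted with `list(df_pd[["country", "population_sum", "population_mean"]].itertuples(index=False, name=None))` and `df_pl.select("country", "population_sum", "population_mean").rows()`.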

The outputs:

[Figure: printed output of df_pd (pandas) and df_pl (Polars)]

And wow, what a beautiful output from Polars! Pretty CLI output is the way to my heart.

And on performance, Polars beats Pandas yet again:

Pandas: 0.007008 seconds
Polars: 0.003381 seconds

Conclusion: In this post we compared the Pandas and Polars libraries on a simple, realistic example. If performance is a bottleneck in your pandas operations, Polars is 100% worth a try! The output is wonderful, and having no index/MultiIndex means fewer errors for the user. The Polars documentation has a great section on migrating from Pandas, which is worth a read if you’re considering the switch.

By Zeynep Bicer