Polars versus pandas in 2025: the real-world practice
Updated: 2026-05-03
Polars has been pitching itself as the natural successor to pandas for two years, and until this year that succession was more hope than reality. With Polars 1.0 released in 2024 and the 1.x series now stable, the project is in a different place: it has a frozen API, serious documentation, data-ecosystem integrations, and a community large enough to resolve concrete questions without waiting for the original author.
Key takeaways
- Polars is 3–10× faster than pandas on common aggregation, join and filter operations over 1–20 GB parquet datasets.
- Memory savings are 40–60 % vs pandas for the same dataset, thanks to native Arrow columns.
- Interoperability via Apache Arrow (near-zero conversion cost between both) removes the pressure to migrate all at once.
- The most important semantic difference: Polars has no index — everything is done with explicit column joins.
- Practical rule: if the dataset exceeds 100 MB or the pipeline runs in daily production, Polars is worth it.
What has changed since 2023
The first time I tried Polars, in 2023, I had the impression it was faster but immature. Minor functions were missing, the documentation assumed you came from Rust, and error messages were technical. None of that describes the 2025 version:
- The 1.x API is frozen with a clear forward-compatibility commitment.
- The documentation includes examples for each function, often alongside the equivalent pandas API.
- Error messages have improved noticeably.
The other important difference is the arrival of the mature lazy API. In pandas each operation evaluates immediately, which prevents global optimisations. Polars has deferred execution by default when used with LazyFrame: you define the transformation graph, the optimiser reorders and fuses operations, and execution only happens when you ask for the result.
Performance measured, not promised
Polars’s synthetic benchmarks are impressive but noisy. What matters is performance on real tasks. The numbers I’ve measured with datasets between 1 and 20 GB in parquet are consistent in one direction: Polars is 3 to 10× faster than pandas on common aggregation, join and filter operations.
The improvement is larger when the operation includes large groupings or joins over millions of rows:
- In a daily sales data cleaning pipeline, migrating from pandas to Polars dropped execution time from 42 minutes to 6.
- In a smaller pipeline of around 500 MB, execution time went from 28 seconds to 9.
Memory savings are also real. Polars uses native Arrow columns with efficient types, while pandas drags numpy’s legacy with types like object for strings. In my tests, Polars typically uses 40–60 % of the memory pandas needs for the same dataset.
Interoperability via Arrow
The piece that makes coexisting with both tools viable is Apache Arrow. Both pandas and Polars can read and write Arrow natively, and conversion between them has near-zero cost thanks to zero-copy.
In practice this means a script can start reading data with pandas if it’s more convenient, convert to Polars for expensive operations, and return to pandas to pass the result to a library that only accepts pandas DataFrames.
Convertibility removes the pressure to migrate all at once. In real projects what I’ve done is migrate the hot loops, the parts consuming 90 % of the time, to Polars, and leave the rest in pandas.
There’s a subtlety with types. Pandas has some types, like timezone-aware datetimes or categoricals, that don’t always translate cleanly to Polars. In 2025 this friction has diminished greatly, but it hasn’t vanished.
Where pandas still wins
Not everything is an advantage for Polars. Three cases where pandas remains better:
- The scientific ecosystem. statsmodels, scipy and scikit-learn accept pandas DataFrames as standard input. Polars can convert at low cost, but the natural flow is still pandas.
- Interactive notebook work on small datasets. For 10,000 rows the performance difference is imperceptible, and the team’s familiarity wins. Imposing Polars to explore a 5 MB CSV is over-engineering.
- BI tool support. Streamlit, Panel, Dash and most dashboard frameworks assume pandas as internal format. Polars is progressing but there are still cases where it’s second-class.
Patterns that work in production
After migrating several pipelines I’ve ended up with three repeated patterns:
- ETL pipeline with large parquet or CSV and aggregation/join transformations: Polars wins clearly and deserves to be the default option.
- Exploratory analysis on datasets of a few gigabytes: use Polars with the lazy API to tune the query and only materialise when clear on what to return. When the result goes to a chart or model, convert to pandas.
- One-off scripts, where simplicity beats performance: pandas remains the default, since the learning investment isn’t amortised in a week.
In Python the pattern typically looks like this:

result = (
    pl.scan_parquet("sales.parquet")
    .filter(pl.col("date") > "2025-01-01")
    .group_by("region")
    .agg(pl.col("amount").sum())
    .collect()
)

This reads the parquet lazily, filters, groups, and only executes at collect(). The same deferred-execution pattern is also what makes RAG evaluation data processing efficient when evaluation datasets are large.
How to migrate without suffering
Progressive migration works better than total migration. What I’ve seen fail is teams deciding to go from pandas to Polars wholesale: they learn two APIs in parallel, stumble on subtle semantic differences and end up frustrated.
The most important semantic difference that catches people is that Polars has no index like pandas. In pandas there are operations that depend on the index for alignment between DataFrames; in Polars everything is done with explicit joins on columns. This mental-model change is the biggest obstacle to migrating mature pandas code.
The second difference: Polars applies its expressions in contexts, not outside them. In pandas you can take a series and operate on it anywhere; in Polars expressions live inside select, with_columns, filter, group_by. This seems a restriction but makes the code more declarative and more optimisable.
When it pays off
My practical rule in 2025: if the dataset exceeds 100 MB or the pipeline runs in daily production, migrating to Polars is worth it. If the dataset falls below that threshold and the pipeline is exploratory or one-off, pandas remains sufficient.
The deciding question is economic, not technical: the migration cost is the time the team invests in learning a new API; the benefit is compute time saved and memory freed.
I think Polars will end up being the default option for new projects, but pandas won’t disappear. The scientific ecosystem has too much investment in pandas for the succession to be total. The likely reality is prolonged coexistence, with Arrow as bridge. A good scenario for everyone.