Polars versus pandas in 2025: the real-world practice
Updated: 2026-05-03
Polars has been pitching itself as the natural successor to pandas for two years, and until this year that succession was more hope than reality. With Polars 1.0 released in 2024 and the 1.x series now stable, the project is in a different place: it has a frozen API, serious documentation, data-ecosystem integrations, and a community large enough to resolve concrete questions without waiting for the original author.
Key takeaways
- Polars is 3–10× faster than pandas on common aggregation, join and filter operations over 1–20 GB parquet datasets.
- Memory savings are 40–60 % vs pandas for the same dataset, thanks to native Arrow columns.
- Interoperability via Apache Arrow (near-zero conversion cost between both) removes the pressure to migrate all at once.
- The most important semantic difference: Polars has no index — everything is done with explicit column joins.
- Practical rule: if the dataset exceeds 100 MB or the pipeline runs in daily production, Polars is worth it.
What has changed since 2023
The first time I tried Polars, in 2023, I had the impression it was faster but immature. Minor functions were missing, the documentation assumed you came from Rust, and error messages were technical. None of that describes the 2025 version:
- The 1.x API is frozen with a clear forward-compatibility commitment.
- The documentation includes examples for each function, often alongside the equivalent pandas API.
- Error messages have improved noticeably.
The other important difference is the arrival of the mature lazy API. In pandas each operation evaluates immediately, which prevents global optimisations. Polars has deferred execution by default when used with LazyFrame: you define the transformation graph, the optimiser reorders and fuses operations, and execution only happens when you ask for the result.
Performance measured, not promised
Polars’s synthetic benchmarks are impressive but noisy. What matters is performance on real tasks. The numbers I’ve measured with datasets between 1 and 20 GB in parquet are consistent in one direction: Polars is 3 to 10× faster than pandas on common aggregation, join and filter operations.
The improvement is larger when the operation includes large groupings or joins over millions of rows:
- In a daily sales data cleaning pipeline, migrating from pandas to Polars dropped execution time from 42 minutes to 6.
- In a smaller pipeline of around 500 MB, execution time went from 28 seconds to 9.
Memory savings are also real. Polars uses native Arrow columns with efficient types, while pandas drags numpy’s legacy with types like object for strings. In my tests, Polars typically uses 40–60 % of the memory pandas needs for the same dataset.
Interoperability via Arrow
The piece that makes coexisting with both tools viable is Apache Arrow. Both pandas and Polars can read and write Arrow natively, and conversion between them has near-zero cost thanks to zero-copy.
In practice this means a script can start reading data with pandas if it’s more convenient, convert to Polars for expensive operations, and return to pandas to pass the result to a library that only accepts pandas DataFrames.
Convertibility removes the pressure to migrate all at once. In real projects what I’ve done is migrate the hot loops, the parts consuming 90 % of the time, to Polars, and leave the rest in pandas.
There’s a subtlety with types. Pandas has some types, like timezone-aware datetimes or categoricals, that don’t always translate cleanly to Polars. In 2025 this friction has diminished greatly, but it hasn’t vanished.
Where pandas still wins
Not everything is an advantage for Polars. Three cases where pandas remains better:
- The scientific ecosystem. statsmodels, scipy and scikit-learn accept pandas DataFrames as standard input. Polars can convert at low cost, but the natural flow is still pandas.
- Interactive notebook work on small datasets. For 10,000 rows the performance difference is imperceptible, and the team’s familiarity wins. Imposing Polars to explore a 5 MB CSV is over-engineering.
- BI tool support. Streamlit, Panel, Dash and most dashboard frameworks assume pandas as internal format. Polars is progressing but there are still cases where it’s second-class.
Patterns that work in production
After migrating several pipelines I’ve ended up with three repeated patterns:
- ETL pipeline with large parquet or CSV and aggregation/join transformations: Polars wins clearly and deserves to be the default option.
- Exploratory analysis on datasets of a few gigabytes: use Polars with the lazy API to tune the query and only materialise when clear on what to return. When the result goes to a chart or model, convert to pandas.
- One-off scripts, where simplicity beats performance: pandas remains the default, since the learning investment isn’t amortised in a week.
In Python the pattern typically looks like this:

result = (
    pl.scan_parquet("sales.parquet")
    .filter(pl.col("date") > "2025-01-01")
    .group_by("region")
    .agg(pl.col("amount").sum())
    .collect()
)

This reads the parquet lazily, filters, groups, and only executes at collect(). The same deferred-execution pattern is also what makes RAG evaluation data processing efficient when evaluation datasets are large.
How to migrate without suffering
Progressive migration works better than total migration. What I’ve seen fail is teams deciding to go from pandas to Polars wholesale: they learn two APIs in parallel, stumble on subtle semantic differences and end up frustrated.
The most important semantic difference that catches people is that Polars has no index like pandas. In pandas there are operations that depend on the index for alignment between DataFrames; in Polars everything is done with explicit joins on columns. This mental-model change is the biggest obstacle to migrating mature pandas code.
The second difference: Polars applies its expressions in contexts, not outside them. In pandas you can take a series and operate on it anywhere; in Polars expressions live inside select, with_columns, filter, group_by. This seems a restriction but makes the code more declarative and more optimisable.
When it pays off
My practical rule in 2025: if the dataset exceeds 100 MB or the pipeline runs in daily production, migrating to Polars is worth it. If the dataset falls below that threshold and the pipeline is exploratory or one-off, pandas remains sufficient.
The deciding question is economic, not technical: the migration cost is the time the team invests in learning a new API; the benefit is compute time saved and memory freed.
I think Polars will end up being the default option for new projects, but pandas won’t disappear. The scientific ecosystem has too much investment in pandas for the succession to be total. The likely reality is prolonged coexistence, with Arrow as bridge. A good scenario for everyone.