For a couple of years, the NPU acronym was mostly a label on laptop boxes and a checkbox on processor specs. In 2025 that has changed enough to warrant an honest review: what hardware is available, what tools let you use it from real code, which workloads pay off and which are still better on CPU or GPU. The landscape is neither uniform nor finished, but there's enough for a developer to judge whether integrating an NPU into a specific product is worth the time.
What’s on the market
The three dominant laptop families are Qualcomm Snapdragon X (Elite and Plus), Apple Silicon from M1 onward, and AMD Ryzen AI 300 with XDNA. Intel entered later with Core Ultra Meteor Lake and Lunar Lake, which include their own NPU but with a less-polished software ecosystem. The numbers vendors advertise revolve around TOPS, operations per second at low precision, and reach 45 on Snapdragon X Elite, 38 on Apple M4, 50 on Ryzen AI 300, and 48 on Lunar Lake.
TOPS are an easy number to compare but a misleading one. What matters in practice is the combination of raw capacity, supported precision, memory bandwidth, and the software stack available to reach the silicon. A chip with 45 theoretical TOPS and immature tooling delivers less real inference than one with 30 TOPS and a polished toolchain. Servers and workstations also have NPUs in some systems, but the natural focus in 2025 is the laptop, because that’s where the use case is clearest and where thermal and battery limits matter.
The toolchain: ONNX Runtime as common denominator
The element that made it realistic to talk about NPUs for developers is ONNX Runtime with its vendor-specific execution providers. Qualcomm has QNN EP; Apple has CoreML EP; AMD has Vitis AI EP; Intel has OpenVINO EP. They all follow the same pattern: take an ONNX model and dispatch part of the graph to the NPU, leaving the rest on CPU or GPU. Support isn’t uniform and some operators don’t translate, but for common vision and language-processing models, coverage is sufficient.
Each vendor also has its own chain. Apple offers Core ML with the coremltools compiler, which converts models from PyTorch or ONNX and produces native packages. AMD has Ryzen AI Software with a Vitis AI-based flow that quantizes models to INT8 and compiles them for the NPU. Qualcomm provides the AI Engine Direct SDK with conversion utilities to its QNN binary format. Intel pushes OpenVINO, which besides its NPU supports CPU and integrated GPU through the same API.
The practical decision for a developer wanting to cover multiple platforms is to start with ONNX Runtime. A well-exported ONNX model can run on CPU, GPU and the four main NPUs with minimal inference-code changes. Quantization to INT8 or even lower is almost always required: most NPUs are integer-oriented and getting the most out of them requires reducing model precision at export, not at load.
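The quantization step is where most of the work lives, and it helps to see what the tools are actually doing to the weights. A toy sketch of the affine INT8 mapping (simplified: real toolchains quantize per channel and calibrate on representative data, but the arithmetic is the same idea):

```python
# Toy sketch of affine INT8 quantization: each float is stored as an int8
# code plus a shared scale and zero point. Real toolchains do this per
# channel with calibration data, but the mapping is the same.
def quantize_int8(values):
    lo, hi = min(values), max(values)
    scale = (hi - lo) / 255 or 1.0           # width of one int8 step
    zero_point = round(-lo / scale) - 128    # int8 code that maps back to 0.0
    codes = [max(-128, min(127, round(v / scale) + zero_point)) for v in values]
    return codes, scale, zero_point

def dequantize_int8(codes, scale, zero_point):
    return [(c - zero_point) * scale for c in codes]

weights = [0.42, -1.3, 0.0, 2.7, -0.05]
codes, scale, zp = quantize_int8(weights)
restored = dequantize_int8(codes, scale, zp)
# The round trip stays within one quantization step of the original values.
assert all(abs(w - r) <= scale for w, r in zip(weights, restored))
```

In a real project you would run torch.onnx.export and then ONNX Runtime's quantization utilities rather than hand-rolling this; the sketch only illustrates why precision is a decision made at export time, not at load time.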
What you can do today
The best-solved use case today is lightweight vision inference. Object detection, image classification, segmentation, and face recognition all run well on any current NPU, with latencies of tens of milliseconds and significantly lower energy than on the integrated GPU. For desktop apps that process real-time camera video or analyze user images, the NPU is today's natural choice.
The second mature case is audio transcription. Whisper in its small and medium variants runs reasonably well on NPU after proper quantization, and apps like live captions or voice notes benefit a lot from reduced energy cost versus running the model on CPU or GPU. Apple has very polished Whisper support on the Neural Engine via Core ML; other vendors have caught up during 2025 with varying quality.
The third case, more recent and more ambitious, is small language models. Phi-3 Mini, Llama 3.2 1B and 3B, Qwen 2.5 in the few-billion-parameter range with INT4 quantization already run on current NPUs at a tokens-per-second rate that’s starting to be useful for summarization, text correction or local assistants. It’s not the territory where a laptop NPU competes with a datacenter GPU; it’s the territory where it competes with running the same model on CPU, and there the NPU usually wins clearly in both latency and energy.
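A back-of-envelope calculation shows why this size range is the sweet spot: at INT4, weight memory alone puts few-billion-parameter models comfortably within NPU-accessible memory, while larger models are a different story. (The figures below count weights only; activations and KV cache add more on top.)

```python
# Approximate weight memory for a model at a given precision.
# Weights only -- activations and KV cache are not counted.
def weight_gib(params_billions, bits_per_weight):
    return params_billions * 1e9 * bits_per_weight / 8 / 2**30

for name, billions in [("1B", 1.0), ("3B", 3.0), ("13B", 13.0)]:
    print(f"{name}: ~{weight_gib(billions, 4):.1f} GiB at INT4, "
          f"~{weight_gib(billions, 8):.1f} GiB at INT8")
```

A 3B model at INT4 needs under 2 GiB for weights; a 13B model needs several times that even before runtime overhead, which is why the next section draws the line where it does.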
The fourth and most speculative case is small image-generation models. Stable Diffusion in distilled variants, like the Turbo or Lightning models, works on decent NPUs with per-image generation times of a few seconds at moderate sizes. Quality doesn't match a dedicated GPU, but for personal use or app integration, the quality-per-energy ratio is getting interesting.
Where it doesn’t pay off yet
Not everything is favorable ground. Large models remain the territory of GPUs or CPUs with abundant memory. A 13-billion-parameter model doesn't fit in the memory accessible to a laptop NPU, or fits only heavily quantized with degraded quality, and the integrated GPU with access to unified memory usually wins. The same applies to large diffusion models, training tasks (no consumer NPU trains today; they are all inference-only), and workloads with complex control flow that don't compile well to the static graph NPUs expect.
Neither does it pay off when inference happens on a server and the client only makes HTTP requests. There the client hardware is irrelevant and the question doesn't arise. The NPU's ground is local inference; if your architecture doesn't include local execution, the NPU is a non-issue.
A detail that surprises people approaching this for the first time is that running a model on NPU is often slower than on integrated GPU for the first invocation, because of loading and compilation cost. The benefit shows up in repeated runs or in long-running scenarios, where energy efficiency offsets the initial latency. This must be factored into the app experience design.
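A simple way to account for this is to measure first-call latency separately from steady state. A small, framework-agnostic harness (pass it any callable that runs one inference):

```python
import time

def measure(infer, runs=20):
    """Return (first_call_seconds, steady_state_seconds_per_run)."""
    start = time.perf_counter()
    infer()                               # pays model loading / graph compilation
    first = time.perf_counter() - start

    start = time.perf_counter()
    for _ in range(runs):                 # amortized cost on the compiled path
        infer()
    steady = (time.perf_counter() - start) / runs
    return first, steady
```

Called with something like `lambda: session.run(None, feeds)`, this makes the compilation cost visible so the app can warm the model up off the critical path, at startup or behind a splash screen, instead of charging it to the first user action.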
Minimal ONNX Runtime example
For a developer who wants to try this today, the short path is to export a model to ONNX from PyTorch, quantize it, and load it with the corresponding execution provider. The Python code looks quite similar across platforms once the environment is ready.
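A sketch of that code on a Snapdragon machine (the model path and input shape are placeholders, and it assumes an onnxruntime build with the QNN EP installed):

```python
PROVIDERS = ["QNNExecutionProvider",   # Qualcomm NPU first...
             "CPUExecutionProvider"]   # ...with CPU as automatic fallback

def run_model(model_path="model_int8.onnx", shape=(1, 3, 224, 224)):
    import numpy as np
    import onnxruntime as ort

    # The session partitions the graph: nodes the NPU provider cannot
    # handle are assigned to the CPU provider instead of failing.
    session = ort.InferenceSession(model_path, providers=PROVIDERS)
    input_name = session.get_inputs()[0].name
    x = np.random.rand(*shape).astype(np.float32)
    return session.run(None, {input_name: x})[0]
```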
On Apple you switch the provider to CoreMLExecutionProvider, on AMD to VitisAIExecutionProvider, on Intel to OpenVINOExecutionProvider. The idea is that the same model and nearly the same code run on all four, and that if something fails on the NPU the runtime falls back to CPU automatically. Reality has more corners, but the abstraction is a useful starting point.
My reading
After following this space for two years, I think laptop NPUs are today a real tool but not a magic bullet. For specific cases (lightweight vision, audio, small language models) they clearly win on latency and energy. For large models, they don't. The toolchain has matured enough that a developer with inference experience can integrate an NPU into a product in weeks, not months, as long as they accept the limits of quantization and spend time on the specifics of their target vendor.
The most common blind spot I see in teams is assuming TOPS numbers translate directly into performance. They don’t. What translates into performance is the match between model, quantization, supported operator graph and available memory bandwidth. A team that measures on their own use case, on the target hardware, with real data, quickly discovers which of the four platforms suits them and, above all, whether the NPU suits them over the integrated GPU, which in many cases is still more predictable and more flexible.
The direction of travel, however, is clear: each laptop generation reduces the cost of local inference, small models gain capability, and the NPU is consolidating as the natural place to run them. Aiming there in 2026 is a reasonable bet for products that want to run offline or with low perceived latency. What was marketing is becoming infrastructure; time to learn to use it.