LM Studio: Exploring AI Models from Your Desktop
Updated: 2026-05-03
LM Studio[1] is a desktop app (Mac, Windows, Linux) that downloads and runs local LLMs with a polished UI: no terminal, no complicated setup. Open the app, pick a model, chat. It suits exploratory developers, data analysts, journalists with sensitive data, and anyone who wants to try LLMs without sending queries to the cloud.
Key takeaways
- LM Studio runs local LLMs (llama.cpp under the hood) with a polished chat UI and no terminal required.
- The local OpenAI-compatible API lets existing code work without changes by pointing it at localhost:1234.
- Integrated RAG with documents (PDF, TXT, DOCX) keeps everything local: zero cloud exposure.
- For personal and single-user use, LM Studio is superior to Ollama in UX. For teams, Ollama + OpenWebUI is more flexible.
- For production or concurrent multi-user serving, use neither: deploy vLLM or TGI instead.
What LM Studio Does
Main features:
- Model download from Hugging Face with one click.
- Local execution on top of llama.cpp.
- Polished chat UI.
- Local OpenAI-compatible API that other apps can consume.
- RAG with your documents (PDF, TXT, DOCX) — chat with your files.
- Side-by-side model comparison.
- Configurable GPU offloading (hybrid CPU+GPU execution).
OpenAI-Compatible API: The Hidden Value
LM Studio exposes an OpenAI-compatible API at localhost:1234. Existing code works without changes:
from openai import OpenAI
client = OpenAI(
base_url="http://localhost:1234/v1",
api_key="not-needed"
)
response = client.chat.completions.create(
model="local-model",
messages=[{"role": "user", "content": "Hi"}]
)
# The response object follows the OpenAI schema, so the reply is read the same way:
print(response.choices[0].message.content)

Useful for offline development, privacy-sensitive apps, or as a fallback if the cloud API is unavailable.
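The fallback case can be handled in a few lines. A minimal sketch, assuming a cloud key in the OPENAI_API_KEY environment variable and illustrative model names (the local name is simply whatever model LM Studio has loaded):

import os
from openai import OpenAI, OpenAIError

def chat(prompt: str) -> str:
    # Try the cloud API first, then fall back to the local LM Studio server.
    backends = [
        (OpenAI(api_key=os.environ.get("OPENAI_API_KEY", "")), "gpt-4o-mini"),
        (OpenAI(base_url="http://localhost:1234/v1", api_key="not-needed"), "local-model"),
    ]
    for client, model in backends:
        try:
            response = client.chat.completions.create(
                model=model,
                messages=[{"role": "user", "content": prompt}],
            )
            return response.choices[0].message.content
        except OpenAIError:
            continue  # this backend is unavailable, try the next one
    raise RuntimeError("No backend available")

print(chat("Hi"))

Both backends go through the same client class, so switching between them is only a matter of base_url and model name.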
Local RAG with Your Documents
LM Studio integrates ingestion and RAG directly in the UI:
- Drag PDFs/docs to the chat.
- System extracts text and generates local embeddings.
- Chat uses relevant context from your docs.
For lawyers, doctors, and journalists handling confidential data: zero cloud exposure, and the document store stays on your machine.
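If you want to build the retrieval step yourself rather than rely on the built-in document chat, the same local server can also serve embeddings through the OpenAI-compatible /v1/embeddings route, assuming your LM Studio version exposes it and an embedding model is loaded (the model name below is illustrative). A minimal sketch of local similarity search:

import numpy as np
from openai import OpenAI

client = OpenAI(base_url="http://localhost:1234/v1", api_key="not-needed")

def embed(texts):
    # Model name is illustrative; use whichever embedding model you have loaded.
    result = client.embeddings.create(model="nomic-embed-text", input=texts)
    return np.array([item.embedding for item in result.data])

docs = ["Contracts are stored in the vault.", "Invoices are due within 30 days."]
doc_vectors = embed(docs)
query_vector = embed(["Where are contracts kept?"])[0]

# Cosine similarity, computed entirely on your machine.
scores = doc_vectors @ query_vector / (
    np.linalg.norm(doc_vectors, axis=1) * np.linalg.norm(query_vector)
)
print(docs[int(scores.argmax())])

Nothing in this pipeline leaves localhost, which is the whole point for the confidential-data use cases above.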
Performance by Hardware
On Apple Silicon M2/M3:
- Llama 3 8B Q4: 30-50 tokens/s on M2 Pro.
- Mixtral 8x7B Q4: 15-25 tokens/s on M3 Max 64 GB.
On Windows with NVIDIA GPU:
- RTX 4090: Llama 3 70B Q4 at ~15 tokens/s.
- RTX 4070/4080: 7B-13B models are the sweet spot.
LM Studio vs Ollama vs OpenWebUI
| Aspect | LM Studio | Ollama | OpenWebUI + Ollama |
|---|---|---|---|
| UI | Rich desktop | Minimal (CLI) | Very good (web) |
| Multi-user | No | No | Yes |
| Built-in RAG | Yes | Via OpenWebUI | Yes |
| Open-source | No | Yes (MIT) | Yes |
| Target audience | Individual + devs | Devs | Teams |
LM Studio wins on UX for non-technical users and individual use. Ollama wins for integration into dev/CLI stacks and for being open-source. OpenWebUI + Ollama is the option for teams that want self-hosted multi-user access.
Conclusion
LM Studio is the best option for individuals who want to explore local LLMs with a polished UI. For teams, Ollama + OpenWebUI offers more flexibility. For production, use vLLM or TGI instead. LM Studio occupies a specific but important niche: democratising local LLM access for non-technical users. Free and polished, it's the obvious choice in its category. For people handling private data or wanting to experiment without paying for API calls, it's worth downloading.