LM Studio is a desktop app (Mac, Windows, Linux) that downloads and runs local LLMs with a polished UI. No terminal, no complicated setup: open the app, pick a model, chat. It's aimed at exploratory developers, data analysts, journalists with sensitive data, and anyone who wants to try LLMs without sending queries to the cloud.
This article covers what it offers, when it’s better than Ollama or OpenWebUI, and where it has limits.
What LM Studio Does
Main features:
- Model download from Hugging Face with one click.
- Local execution over llama.cpp (under the hood).
- Polished chat UI.
- Local OpenAI-compatible API that other apps can consume.
- RAG with your documents (PDF, TXT, DOCX) — chat with your files.
- Saved prompt management.
- Side-by-side model comparison.
All in a desktop binary, no terminal, no YAML config.
Installation
Download from lmstudio.ai: DMG for Mac, MSI for Windows, AppImage for Linux. Open it.
On first launch it asks you to select a model. Recommended starting points:
- Mac with Apple Silicon: Llama 3 8B Q4_K_M (~5GB) or Phi-3 Mini (3GB).
- PC with 16GB RAM: Mistral 7B Q4 (~4GB) or Phi-3.
- PC with 32GB+ RAM: Mixtral 8x7B Q4 (~25GB) or quantised Llama 3 70B (~40GB).
Download it, load it, and you're ready to chat.
Usage Experience
For a non-technical user:
- UI with model selector at start.
- Chat with visual parameters (temperature, top_p, context length).
- File upload for local RAG.
- Export/import conversations.
- Pre-configured prompt templates for common cases.
For a developer:
- OpenAI-compatible API server at localhost:1234.
- Multiple models loaded simultaneously.
- Logs of each query and tokens consumed.
- GPU offloading configurable (CPU+GPU hybrid).
OpenAI-Compatible API
An underrated feature: LM Studio exposes an OpenAI-compatible API. Your existing code works:
```python
from openai import OpenAI

client = OpenAI(
    base_url="http://localhost:1234/v1",
    api_key="not-needed"
)

response = client.chat.completions.create(
    model="local-model",  # ignored; LM Studio uses the loaded model
    messages=[{"role": "user", "content": "Hi"}]
)
```
Useful for offline development, privacy-sensitive apps, or as a fallback if the OpenAI API goes down.
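The fallback idea can be sketched with a quick reachability check against the default LM Studio port. This is an illustrative pattern, not an LM Studio API; the helper names are our own, and only the port (1234) comes from the article above.

```python
# Sketch of a cloud-to-local fallback, assuming LM Studio's default port
# (1234). Helper names here are illustrative, not part of any library API.
import socket

LOCAL_URL = "http://localhost:1234/v1"

def local_server_up(host="localhost", port=1234, timeout=1.0):
    """Cheap health check: can we open a TCP connection to LM Studio?"""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False

def client_kwargs(prefer_local=True):
    """Kwargs for openai.OpenAI(): local server if reachable, else cloud."""
    if prefer_local and local_server_up():
        return {"base_url": LOCAL_URL, "api_key": "not-needed"}
    return {}  # defaults to api.openai.com, key read from OPENAI_API_KEY
```

`OpenAI(**client_kwargs())` then works unchanged whether the local server is running or not.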
Local RAG with Your Documents
LM Studio integrates ingestion and RAG:
- Drag PDFs/docs to the chat.
- The system extracts the text and generates embeddings locally.
- Chat uses relevant context from your docs.
For lawyers, doctors, journalists with confidential data: zero cloud exposure. Document store stays local.
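The retrieval step behind this kind of built-in RAG can be sketched in a few lines: embed each document chunk, embed the question, rank by cosine similarity. This is our own minimal approximation of the idea, not LM Studio's internals; `get_embedding()` assumes an embedding model is loaded and queries the OpenAI-compatible `/v1/embeddings` endpoint (it is defined but not called here).

```python
# Minimal sketch of the retrieval step in a local RAG pipeline: embed
# chunks, embed the question, rank by cosine similarity. get_embedding()
# requires a running LM Studio server and is not invoked in this sketch.
import json
import math
import urllib.request

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb) if na and nb else 0.0

def get_embedding(text, url="http://localhost:1234/v1/embeddings"):
    """Fetch an embedding from the local server (LM Studio must be running)."""
    req = urllib.request.Request(
        url,
        data=json.dumps({"input": text, "model": "local-model"}).encode(),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)["data"][0]["embedding"]

def top_chunks(question_vec, chunks, chunk_vecs, k=3):
    """Return the k chunks most similar to the question vector."""
    scored = sorted(zip(chunks, chunk_vecs),
                    key=lambda cv: cosine(question_vec, cv[1]),
                    reverse=True)
    return [chunk for chunk, _ in scored[:k]]
```

The ranked chunks are then pasted into the prompt as context, which is essentially what happens when you drag a PDF into the chat.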
Hardware and Performance
On Apple Silicon M2/M3:
- Llama 3 8B Q4: 30-50 tokens/s on M2 Pro.
- Mistral 7B Q4: similar.
- Mixtral 8x7B Q4: 15-25 tokens/s on M3 Max 64GB.
- Llama 3 70B Q4: 5-10 tokens/s if it fits in unified memory.
On Windows with NVIDIA GPU:
- RTX 4090: Llama 3 70B Q4 at ~15 tokens/s.
- RTX 4070/4080: 7B-13B are sweet spot.
- Laptop with a 3050/4050: limited VRAM; CPU inference may be the better option.
CPU-only is viable for small models (3B) with slower but usable responses.
LM Studio vs Ollama
Honest comparison:
| Aspect | LM Studio | Ollama |
|---|---|---|
| UI | Rich desktop | Minimal (CLI + optional web) |
| Installation | DMG/MSI install | CLI binary |
| Models | Direct Hugging Face | Own registry + GGUF |
| API | OpenAI-compat | OpenAI-compat |
| Built-in RAG | Yes | Via OpenWebUI |
| Multi-model loading | Yes | Yes |
| Linux | AppImage (beta) | Mature native |
| Target audience | Non-tech users + devs | Devs |
| License | Closed (free) | Open MIT |
LM Studio wins for non-technical-user UX. Ollama wins for dev/CLI stack integration and open-source.
LM Studio vs OpenWebUI
OpenWebUI is a web UI for Ollama/other LLM backends.
| Aspect | LM Studio | OpenWebUI + Ollama |
|---|---|---|
| Deploy | Local desktop app | Docker container |
| Multi-user | No (single-user) | Yes |
| UI quality | Excellent | Very good |
| Self-hosted | Per user | For team |
| Open-source | No | Yes |
LM Studio is personal / single-user. OpenWebUI is team / multi-user self-hosted.
Real Use Cases
Where we see LM Studio:
- Developers testing models before deploy.
- Data scientists iterating with LLMs without cloud.
- Journalists and lawyers with confidential documents.
- Students learning about LLMs without spending on APIs.
- Small companies with laptop fleets and strict compliance.
Where it doesn’t fit:
- Production servers (use Ollama/vLLM).
- Simultaneous multi-user (use OpenWebUI).
- Scaling with multiple concurrent sessions.
- Non-GUI environments (SSH-only servers).
Limitations
Honestly:
- Closed-source (not OSS), though free. Potential lock-in.
- Update cadence depends on LM Studio team.
- Not easily integrable into CI pipelines.
- Single-machine: doesn’t distribute inference.
- Telemetry is optional, but worth verifying in settings.
Performance Tuning
Three key tunings:
- GPU layers: how many model layers are offloaded to the GPU. More = faster, but needs more VRAM.
- Context length: maximum tokens in context. Lower = faster and less memory.
- Thread count: for CPU inference, match the number of physical cores (not hyper-threaded logical cores).
Experiment with these until you find your hardware's speed/memory balance.
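The GPU-layers trade-off can be reasoned about with back-of-envelope arithmetic: if layers are roughly uniform in size, VRAM use scales linearly with the number offloaded, plus a KV-cache allowance. The formula and the ~1 GB KV-cache figure are our own rough assumptions, not LM Studio's accounting.

```python
# Back-of-envelope VRAM estimate for the GPU-layers setting, assuming
# roughly uniform layer sizes and a fixed KV-cache allowance. Illustrative
# only; real usage varies with context length and quantisation.
def vram_needed_gb(model_size_gb, n_layers, gpu_layers, kv_cache_gb=1.0):
    """Approximate VRAM used when offloading gpu_layers of n_layers."""
    per_layer_gb = model_size_gb / n_layers
    return gpu_layers * per_layer_gb + kv_cache_gb

# Llama 3 8B Q4 (~5 GB, 32 layers) with 24 layers offloaded:
print(vram_needed_gb(5.0, 32, 24))  # 4.75 GB, comfortable on an 8 GB card
```

If the estimate exceeds your VRAM, lower the layer count or the context length until it fits.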
Recommended Models to Start
For Apple Silicon M2/M3:
- General chat: Llama 3 8B Instruct Q4_K_M.
- Code: DeepSeek Coder 6.7B Q4.
- Spanish: Mixtral 8x7B if it fits.
- Reasoning: Phi-3 Medium.
For modest hardware:
- Phi-3 Mini (3.8B): excellent for size.
- Gemma 2B: very light.
- TinyLlama 1.1B: experimentation only.
Privacy and Data
LM Studio runs everything locally:
- Models downloaded and stored on disk.
- Chats stored in ~/.cache/lm-studio/.
- RAG documents stay local.
- Optional telemetry for analytics (check settings).
- No mandatory cloud.
For sensitive data, it's a reasonable guarantee: nothing leaves your machine unless you enable it.
Conclusion
LM Studio is the best option for individuals wanting to explore local LLMs with polished UI. For teams, Ollama + OpenWebUI offers more flexibility. For production, neither — use vLLM or TGI. LM Studio occupies a specific but important niche: democratising local LLM access for non-technical users. Free and polished, it’s the obvious choice in its category. For people handling private data or wanting to experiment without paying for APIs, it’s worth downloading this afternoon.
Follow us on jacar.es for more on local LLMs, AI tools, and privacy.