
LM Studio: Exploring AI Models from Your Desktop


Updated: 2026-05-03

LM Studio[1] is a desktop app (Mac, Windows, Linux) that downloads and runs local LLMs behind a polished UI: no terminal, no complicated setup. Open the app, pick a model, chat. It's aimed at exploratory developers, data analysts, journalists handling sensitive data, and anyone who wants to try LLMs without sending queries to the cloud.

Key takeaways

  • LM Studio runs local LLMs (llama.cpp under the hood) with a polished chat UI and no terminal required.
  • The local OpenAI-compatible API lets existing OpenAI client code work unchanged: just point it at localhost:1234.
  • Integrated RAG with documents (PDF, TXT, DOCX) keeps everything local: zero cloud exposure.
  • For personal and single-user use, LM Studio is superior to Ollama in UX. For teams, Ollama + OpenWebUI is more flexible.
  • For production or simultaneous multi-user, neither — use vLLM or TGI.

What LM Studio Does

Main features:

  • Model download from Hugging Face with one click.
  • Local execution powered by llama.cpp.
  • Polished chat UI.
  • Local OpenAI-compatible API that other apps can consume.
  • RAG with your documents (PDF, TXT, DOCX) — chat with your files.
  • Side-by-side model comparison.
  • Configurable GPU offloading (hybrid CPU+GPU execution).

OpenAI-Compatible API: The Hidden Value

LM Studio exposes an OpenAI-compatible API at localhost:1234. Existing code works without changes:

python
from openai import OpenAI

# Point the standard OpenAI client at LM Studio's local server.
client = OpenAI(
    base_url="http://localhost:1234/v1",
    api_key="not-needed"  # ignored by the local server, but the client requires a value
)

response = client.chat.completions.create(
    model="local-model",  # placeholder; use the identifier of the model loaded in LM Studio
    messages=[{"role": "user", "content": "Hi"}]
)
print(response.choices[0].message.content)

Useful for offline development, privacy-sensitive apps, or as a fallback if the cloud API is unavailable.
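One way to wire up that fallback: the sketch below prefers the cloud API when a key is present and reachable, and otherwise drops down to the local server. The cloud model name and the models.list() connectivity probe are illustrative choices, not anything LM Studio prescribes.

python
import os

from openai import OpenAI, OpenAIError

def get_client_and_model():
    """Return (client, model): cloud if reachable, else the local LM Studio server."""
    if os.environ.get("OPENAI_API_KEY"):
        try:
            client = OpenAI()  # reads OPENAI_API_KEY from the environment
            client.models.list()  # cheap connectivity/auth probe
            return client, "gpt-4o-mini"  # illustrative cloud model
        except OpenAIError:
            pass  # cloud unreachable or key invalid: fall through to local
    local = OpenAI(base_url="http://localhost:1234/v1", api_key="not-needed")
    return local, "local-model"

client, model = get_client_and_model()
response = client.chat.completions.create(
    model=model,
    messages=[{"role": "user", "content": "Hi"}]
)
print(response.choices[0].message.content)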

Local RAG with Your Documents

LM Studio integrates ingestion and RAG directly in the UI:

  1. Drag PDFs/docs to the chat.
  2. The app extracts the text and generates embeddings locally.
  3. Chat uses relevant context from your docs.

For lawyers, doctors, and journalists handling confidential data this means zero cloud exposure: the document store never leaves your machine.
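The same flow can be reproduced by hand through the local API, since LM Studio also serves an OpenAI-compatible /v1/embeddings endpoint. A minimal sketch, assuming an embedding model is loaded alongside the chat model (the model names and toy chunks here are illustrative):

python
from openai import OpenAI

client = OpenAI(base_url="http://localhost:1234/v1", api_key="not-needed")

def embed(texts):
    # Model name is an assumption: use whichever embedding model you loaded.
    resp = client.embeddings.create(
        model="text-embedding-nomic-embed-text-v1.5", input=texts
    )
    return [d.embedding for d in resp.data]

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = sum(x * x for x in a) ** 0.5
    nb = sum(x * x for x in b) ** 0.5
    return dot / (na * nb)

# 1. Ingest: embed your document chunks locally.
chunks = ["Chunk one of your PDF...", "Chunk two...", "Chunk three..."]
chunk_vecs = embed(chunks)

# 2. Retrieve: rank chunks by similarity to the question.
question = "What does the contract say about termination?"
q_vec = embed([question])[0]
best = max(zip(chunks, chunk_vecs), key=lambda cv: cosine(q_vec, cv[1]))[0]

# 3. Generate: chat with the retrieved context injected into the prompt.
response = client.chat.completions.create(
    model="local-model",
    messages=[
        {"role": "system", "content": f"Answer using this context:\n{best}"},
        {"role": "user", "content": question},
    ],
)
print(response.choices[0].message.content)

A real setup would chunk documents properly and keep the vectors in a store, but this is essentially what the UI does for you, entirely on your machine.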

Performance by Hardware

On Apple Silicon M2/M3:

  • Llama 3 8B Q4: 30-50 tokens/s on M2 Pro.
  • Mixtral 8x7B Q4: 15-25 tokens/s on M3 Max 64 GB.

On Windows with NVIDIA GPU:

  • RTX 4090: Llama 3 70B Q4 at ~15 tokens/s.
  • RTX 4070/4080: 7B-13B models are the sweet spot.
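Those numbers depend on quantisation, context length, and offload settings, so it is worth measuring on your own hardware. A rough sketch against the local API, assuming the server fills the standard OpenAI usage field in its responses:

python
import time

from openai import OpenAI

client = OpenAI(base_url="http://localhost:1234/v1", api_key="not-needed")

start = time.perf_counter()
response = client.chat.completions.create(
    model="local-model",
    messages=[{"role": "user", "content": "Write a 200-word story."}]
)
elapsed = time.perf_counter() - start

# Assumes the response includes the standard `usage` accounting.
tokens = response.usage.completion_tokens
print(f"{tokens} tokens in {elapsed:.1f}s -> {tokens / elapsed:.1f} tokens/s")

This folds prompt processing into the timing, so it slightly underestimates pure generation speed; for finer numbers, stream the response and time only the generated tokens.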

LM Studio vs Ollama vs OpenWebUI

Aspect           LM Studio          Ollama          OpenWebUI + Ollama
UI               Rich desktop       Minimal (CLI)   Very good (web)
Multi-user       No                 No              Yes
Built-in RAG     Yes                Via OpenWebUI   Yes
Open-source      No                 Yes (MIT)       Yes
Target audience  Individual + devs  Devs            Teams

LM Studio wins on UX for non-technical users and individual use. Ollama wins for dev/CLI stack integration and open-source licensing. OpenWebUI is the option for teams that want a multi-user, self-hosted setup.

Conclusion

LM Studio is the best option for individuals wanting to explore local LLMs with polished UI. For teams, Ollama + OpenWebUI offers more flexibility. For production, neither — use vLLM or TGI. LM Studio occupies a specific but important niche: democratising local LLM access for non-technical users. Free and polished, it’s the obvious choice in its category. For people handling private data or wanting to experiment without paying for APIs, it’s worth downloading.

References

  1. LM Studio

Written by

CEO - Jacar Systems

Passionate about technology, cloud infrastructure and artificial intelligence. Writes about DevOps, AI, platforms and software from Madrid.