llama.cpp is the quiet foundation of local LLMs. In 2024 it gained speculative decoding, distributed RPC, and revamped GPU backends. When to use it directly and when to go through Ollama.
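As a rough illustration of the "directly" path, here is a minimal sketch using the llama-cpp-python bindings; the model path, context size, and sampling settings are placeholders for a typical setup, not details from the post.

```python
# Minimal sketch: driving llama.cpp directly from Python via the
# llama-cpp-python bindings. Model path and settings are placeholders.
from llama_cpp import Llama

llm = Llama(
    model_path="models/llama-2-7b.Q4_K_M.gguf",  # any local GGUF file
    n_ctx=4096,        # context window
    n_gpu_layers=-1,   # offload all layers to GPU if a backend is available
)

out = llm(
    "Explain speculative decoding in one sentence.",
    max_tokens=64,
    temperature=0.2,
)
print(out["choices"][0]["text"])
```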
Ollama in 2024: Running LLMs Locally Without Pain
Ollama has consolidated as the standard for local LLMs. 2024 features, the model catalogue, app integration, and when to use it vs vLLM.
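On the app-integration side, a minimal sketch of calling Ollama's local HTTP API; the default port 11434 is Ollama's standard, while the model name and prompt are illustrative assumptions.

```python
# Minimal sketch of the "through Ollama" path: the Ollama daemon exposes a
# local HTTP API (default http://localhost:11434).
import json
import urllib.request

req = urllib.request.Request(
    "http://localhost:11434/api/generate",
    data=json.dumps({
        "model": "llama3",       # any model pulled with `ollama pull`
        "prompt": "Why is the sky blue?",
        "stream": False,         # return one JSON object instead of a stream
    }).encode(),
    headers={"Content-Type": "application/json"},
)
with urllib.request.urlopen(req) as resp:
    print(json.loads(resp.read())["response"])
```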
LM Studio: Exploring AI Models from Your Desktop
LM Studio turns any modern laptop into a local-LLM lab. Who it’s for and when it beats Ollama or OpenWebUI.
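One way LM Studio fits a developer workflow is by serving loaded models over a local OpenAI-compatible endpoint. A hedged sketch, assuming the usual default port 1234 and the openai Python client; the model identifier is a placeholder.

```python
# Minimal sketch: talking to LM Studio's local OpenAI-compatible server.
# Base URL, port and model name below are assumptions about a typical setup.
from openai import OpenAI

client = OpenAI(
    base_url="http://localhost:1234/v1",
    api_key="lm-studio",  # any non-empty string; the local server ignores it
)

reply = client.chat.completions.create(
    model="local-model",  # whatever model is currently loaded in LM Studio
    messages=[{"role": "user", "content": "Summarise what GGUF is."}],
    max_tokens=100,
)
print(reply.choices[0].message.content)
```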
Model Quantization and llama.cpp on Your Laptop
With quantization and llama.cpp you can run Llama 2 7B/13B on a modern laptop. How it works and what quality to actually expect.
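To make the "fits on a laptop" claim concrete, a back-of-the-envelope sketch of weight memory at different precisions; the bits-per-weight figures for the quantization formats are approximations, not measured file sizes.

```python
# Rough sketch of why quantization makes 7B/13B models fit on a laptop:
# approximate weight memory as parameters * bits per weight. Real GGUF files
# add per-block scales and metadata, so treat these as rough lower bounds.
def approx_weight_gib(params_billion: float, bits_per_weight: float) -> float:
    bytes_total = params_billion * 1e9 * bits_per_weight / 8
    return bytes_total / 2**30

for params in (7, 13):
    for label, bits in (("FP16", 16), ("Q8_0", 8.5), ("Q4_K_M", 4.5)):
        print(f"{params}B @ {label}: ~{approx_weight_gib(params, bits):.1f} GiB")
```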