llama.cpp is the quiet foundation of local LLMs. In 2024 it gained speculative decoding, distributed inference over RPC, and renewed GPU backends. When to use it directly, and when to go through Ollama instead.
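For a sense of what "using it directly" means in practice, here is a minimal sketch with the llama-cpp-python bindings; the model path, context size, and prompt are placeholders, not details taken from the article.

```python
# Minimal local inference with llama.cpp via the llama-cpp-python bindings.
# Assumption: a quantized GGUF model file is already downloaded (path below is a placeholder).
from llama_cpp import Llama

llm = Llama(
    model_path="./models/llama-2-7b.Q4_K_M.gguf",  # any quantized GGUF model
    n_ctx=2048,          # context window
    n_gpu_layers=0,      # 0 = pure CPU; raise if a GPU backend is available
)

output = llm(
    "Q: What is speculative decoding? A:",
    max_tokens=128,
    stop=["Q:"],
)
print(output["choices"][0]["text"])
```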
Ollama in 2024: Running LLMs Locally Without Pain
Ollama has consolidated its position as the standard way to run LLMs locally. The 2024 features, the model catalogue, integrating it into your own apps, and when to reach for vLLM instead.
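As a hint of what app integration looks like, here is a minimal sketch using the official ollama Python client; the model name and prompt are illustrative, and it assumes the Ollama server is already running locally.

```python
# Chat with a locally running Ollama server via the official Python client.
# Assumption: `ollama serve` is running and the model has been pulled (e.g. `ollama pull llama2`).
import ollama

response = ollama.chat(
    model="llama2",  # illustrative; any model from the catalogue works
    messages=[{"role": "user", "content": "Why run LLMs locally?"}],
)
print(response["message"]["content"])
```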
Model Quantization and llama.cpp on Your Laptop
With quantization and llama.cpp you can run Llama 2 7B or 13B on a modern laptop. How quantization works and what quality to actually expect.
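To see why a 7B model fits at all, here is a back-of-the-envelope memory estimate; the bits-per-weight figure is an assumed ballpark for a 4-bit K-quant, not a number from the article.

```python
# Rough memory footprint of quantized model weights (KV cache and runtime buffers are extra).
# Assumption: ~4.5 effective bits per weight, a typical ballpark for Q4_K_M-style quantization.
def weight_gib(n_params: float, bits_per_weight: float = 4.5) -> float:
    return n_params * bits_per_weight / 8 / 2**30

print(f"7B  model: ~{weight_gib(7e9):.1f} GiB")   # roughly 3.7 GiB
print(f"13B model: ~{weight_gib(13e9):.1f} GiB")  # roughly 6.8 GiB
```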