llama.cpp is the quiet foundation of local LLMs. 2024 brought speculative decoding, distributed inference over RPC, and renewed GPU backends. When to use it directly versus through Ollama.
Read more

Tag: quantization
LoRA and QLoRA: Efficient Fine-Tuning on a Single Laptop
LoRA dramatically cuts fine-tuning cost. QLoRA goes even further. How they work, when to use each, and what quality to expect.
Read more