Deploy Llama 3.3 and Mistral locally with Ollama and Open WebUI on Ubuntu 24.04

April 12, 2026

Table of contents

Key takeaways
Prerequisites: Ubuntu 24.04 + NVIDIA drivers
Install Ollama and verify GPU
Pull quantized Llama 3.3 and Mistral
Open WebUI with docker compose
Expose behind Traefik with TLS
Restrict access by IP and user

Updated: 2026-05-17

Key takeaways

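Ollama 0.5+ runs Llama 3.3 70B Q4 on an RTX 4090 (24 GB VRAM) only by offloading part of the model to system RAM, since the quantized weights are around 40 GB; quantized Mistral Large 2 is larger still and needs more VRAM or heavier offload.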
Open WebUI replaces the ollama CLI for non-technical users; same ecosystem, modern UI.
What separates a “laptop demo” from a “real service” is exposing the UI behind Traefik with TLS and IP/user authentication, which the second half of this tutorial covers.
Q4_K_M quantization offers the best quality/memory trade-off in 2026; drop to Q3 only when VRAM is tight.

Prerequisites: Ubuntu 24.04 + NVIDIA drivers

Reasonable minimum hardware:

NVIDIA GPU with ≥16 GB VRAM (RTX 4080/4090, A4000, RTX 5000 Ada).
32 GB RAM, 1 TB NVMe.
Ubuntu 24.04 LTS Server.
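
To sanity-check a hardware choice against a given model, the memory needed for the weights can be estimated as parameters × bits per weight ÷ 8. The sketch below assumes roughly 4.85 bits per weight for Q4_K_M, which is an approximation and not a figure from this article, and it ignores the KV cache and runtime overhead, so real usage will be higher. The helper name estimate_gb is hypothetical.

```shell
#!/usr/bin/env bash
# Rough weight-size estimate: params (in billions) * bits per weight / 8 = GB.
# Assumption: ~4.85 bits/weight for Q4_K_M (approximate); KV cache not included.
estimate_gb() {
  local params_billion="$1" bits_per_weight="$2"
  awk -v p="$params_billion" -v b="$bits_per_weight" \
    'BEGIN { printf "%.1f\n", p * b / 8 }'
}

estimate_gb 70 4.85   # a 70B model at Q4_K_M  -> 42.4
estimate_gb 32 4.85   # a 32B model at Q4_K_M  -> 19.4
```

Compare the result with the VRAM reported by nvidia-smi: whatever does not fit on the GPU is served from system RAM at lower speed.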

NVIDIA + CUDA drivers:

bash

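sudo ubuntu-drivers install   # installs the recommended driver; "autoinstall" is the deprecated spelling
sudo reboot
nvidia-smi                    # after the reboot: the GPU and driver version should be listed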

Docker + nvidia-container-toolkit (install Docker first by following como-instalar-docker-en-ubuntu-22-04; the steps are equivalent on 24.04):

bash

# Add NVIDIA's signing key
curl -fsSL https://nvidia.github.io/libnvidia-container/gpgkey | \
  sudo gpg --dearmor -o /usr/share/keyrings/nvidia-container-toolkit-keyring.gpg
# Add the generic stable/deb repository (per-distribution lists may not exist for 24.04)
curl -s -L https://nvidia.github.io/libnvidia-container/stable/deb/nvidia-container-toolkit.list | \
  sed 's#deb https://#deb [signed-by=/usr/share/keyrings/nvidia-container-toolkit-keyring.gpg] https://#g' | \
  sudo tee /etc/apt/sources.list.d/nvidia-container-toolkit.list
sudo apt update && sudo apt install -y nvidia-container-toolkit
# Register the NVIDIA runtime with Docker and restart the daemon
sudo nvidia-ctk runtime configure --runtime=docker
sudo systemctl restart docker

Verify: docker run --rm --gpus all nvidia/cuda:12.4.1-base-ubuntu22.04 nvidia-smi.
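
Once nvidia-smi works, it can also confirm the ≥16 GB VRAM minimum listed above. The following is a minimal sketch: has_enough_vram is a hypothetical helper, and the 16000 MiB threshold is an assumption chosen because nominal 16 GB cards report slightly less than 16384 MiB.

```shell
#!/usr/bin/env bash
# Reads "name, total MiB" lines (nvidia-smi CSV without header or units) on stdin
# and succeeds only if at least one GPU is listed and every GPU has >= MIN_MIB.
has_enough_vram() {
  local min_mib="$1"
  awk -F',' -v min="$min_mib" '
    { seen = 1; if ($NF + 0 < min + 0) bad = 1 }
    END { exit (seen && !bad) ? 0 : 1 }
  '
}

# Only runs where the NVIDIA driver is installed.
if command -v nvidia-smi >/dev/null 2>&1; then
  nvidia-smi --query-gpu=name,memory.total --format=csv,noheader,nounits \
    | has_enough_vram 16000 && echo "VRAM OK" || echo "VRAM below 16 GB"
fi
```

The function reads from stdin so it can be tried on a sample line without a GPU present.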

Install Ollama and verify GPU

bash

curl -fsSL https://ollama.com/install.sh | sh
sudo systemctl enable --now ollama
ollama --version    # should report 0.5 or later

Verify GPU detected:

bash

journalctl -u ollama --no-pager | grep -iE "gpu|cuda" | head -10

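You should see lines mentioning CUDA and the GPU's compute capability. If the log reports CPU-only inference instead, recheck the driver with nvidia-smi and restart the ollama service. Ollama runs directly on the host here, so nvidia-container-toolkit only affects the Docker containers.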
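
The version and GPU checks in this section can be folded into one small script. This is a sketch under assumptions: version_ge is a hypothetical helper built on GNU sort -V, and the script assumes the version number is the last field of the first line that ollama --version prints.

```shell
#!/usr/bin/env bash
# version_ge A B: succeeds when version A >= version B (relies on GNU sort -V).
version_ge() {
  [ "$(printf '%s\n%s\n' "$2" "$1" | sort -V | head -n 1)" = "$2" ]
}

if command -v ollama >/dev/null 2>&1; then
  # Assumption: the last field of the first output line is the version number.
  installed="$(ollama --version 2>/dev/null | awk 'NR == 1 { print $NF }')"
  if version_ge "$installed" "0.5.0"; then
    echo "Ollama $installed is recent enough"
  else
    echo "Ollama $installed is older than 0.5.0"
  fi
  # Lists loaded models; the PROCESSOR column shows GPU, CPU, or a split.
  ollama ps
fi
```

Run ollama ps while a model is loaded (for example right after an ollama run) so the list is not empty.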

Pull quantized Llama 3.3 and Mistral

Recommended models in 2026:

bash

ollama pull llama3.3:70b-instruct-q4_K_M      # ~40 GB download; exceeds 24 GB VRAM, so part is offloaded to CPU/RAM
ollama pull mistral-large:latest              # ~70 GB download, needs 48 GB VRAM or offload
ollama pull qwen2.5-coder:32b-instruct-q4_K_M # large code model

Test:

bash

ollama run llama3.3:70b-instruct-q4_K_M "Explain MCP in one sentence."

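Expected throughput: ~30-50 tokens/s on an RTX 4090 with a short prompt when the model fits entirely in VRAM; a 70B model that spills into system RAM will be considerably slower.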
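
To measure throughput on your own hardware rather than rely on a quoted range, the Ollama HTTP API returns eval_count (generated tokens) and eval_duration (nanoseconds) with each non-streamed response. The sketch below turns those into tokens per second. It assumes Ollama is listening on 127.0.0.1:11434 and that jq is installed; tokens_per_sec is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# tokens_per_sec EVAL_COUNT EVAL_DURATION_NS: generation speed in tokens/s.
tokens_per_sec() {
  awk -v n="$1" -v ns="$2" \
    'BEGIN { if (ns + 0 <= 0) exit 1; printf "%.1f\n", n * 1e9 / ns }'
}

# Assumptions: Ollama answers on 127.0.0.1:11434 and jq is available.
if command -v jq >/dev/null 2>&1 &&
   curl -fsS http://127.0.0.1:11434/api/version >/dev/null 2>&1; then
  resp="$(curl -fsS http://127.0.0.1:11434/api/generate -d '{
    "model": "llama3.3:70b-instruct-q4_K_M",
    "prompt": "Explain MCP in one sentence.",
    "stream": false
  }')"
  tokens_per_sec \
    "$(printf '%s' "$resp" | jq -r '.eval_count')" \
    "$(printf '%s' "$resp" | jq -r '.eval_duration')"
fi
```

For a quick interactive reading, ollama run with the --verbose flag prints an eval rate after each answer.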

Open WebUI with docker compose

yaml

# docker-compose.yml
services:
  open-webui:
    image: ghcr.io/open-webui/open-webui:main
    container_name: open-webui
    restart: unless-stopped
    extra_hosts:
      - host.docker.internal:host-gateway
    environment:
      OLLAMA_BASE_URL: http://host.docker.internal:11434
      WEBUI_SECRET_KEY: "${WEBUI_SECRET_KEY}"
    ports:
      - "127.0.0.1:3000:8080"
    volumes:
      - open-webui-data:/app/backend/data

volumes:
  open-webui-data:

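Start it with docker compose up -d, with WEBUI_SECRET_KEY defined in the environment or in a .env file next to docker-compose.yml (for example, a value generated with openssl rand -hex 32). Local access: http://127.0.0.1:3000. The first account created on first launch becomes the admin.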
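
One detail this compose file depends on: by default Ollama listens only on 127.0.0.1, which a container reaching the host through host.docker.internal may not be able to connect to. If Open WebUI starts but lists no models, a systemd drop-in that sets OLLAMA_HOST is one way to address it. The sketch below only prints the drop-in content. The helper name ollama_override and the choice of 0.0.0.0 are assumptions, and binding to all interfaces means port 11434 should be firewalled from outside.

```shell
#!/usr/bin/env bash
# ollama_override ADDR: prints a systemd drop-in that makes Ollama listen on ADDR.
ollama_override() {
  printf '[Service]\nEnvironment="OLLAMA_HOST=%s"\n' "$1"
}

# Preview the drop-in without touching the system.
ollama_override 0.0.0.0:11434

# To apply it (needs root), something along these lines:
#   sudo mkdir -p /etc/systemd/system/ollama.service.d
#   ollama_override 0.0.0.0:11434 | sudo tee /etc/systemd/system/ollama.service.d/override.conf
#   sudo systemctl daemon-reload && sudo systemctl restart ollama
```

After the restart, curl http://127.0.0.1:11434/api/version on the host should still answer, and the model list in Open WebUI should populate.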

Expose behind Traefik with TLS

If you already run Traefik (per como-instalar-traefik-con-docker-compose), add labels:

yaml

services:
  open-webui:
    networks: [traefik]
    labels:
      - traefik.enable=true
      - traefik.http.routers.openwebui.rule=Host(`llm.your-domain.com`)
      - traefik.http.routers.openwebui.entrypoints=websecure
      - traefik.http.routers.openwebui.tls.certresolver=letsencrypt
      - traefik.http.services.openwebui.loadbalancer.server.port=8080
networks:
  traefik:
    external: true

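Remove the ports: mapping (the 127.0.0.1:3000 binding) once Traefik fronts the service; the router reaches the container on port 8080 over the traefik network.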

Restrict access by IP and user

Layer 1 (IP allowlist in Traefik):

yaml

labels:
  - traefik.http.middlewares.openwebui-ipallow.ipallowlist.sourcerange=10.0.0.0/8,192.168.1.0/24,YOUR.PUBLIC.IP/32
  - traefik.http.routers.openwebui.middlewares=openwebui-ipallow@docker

Layer 2 (Open WebUI native auth): in Settings → Users tick “Require email verification” and set ENABLE_SIGNUP=false so that only an admin can create new accounts.

Layer 3 (auditing): set WEBUI_LOG_LEVEL=info and ship the logs to Loki or Elasticsearch to keep a record of who asks what. In enterprise contexts, especially with sensitive data, traceability is mandatory.
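
To confirm that the Layer 1 allowlist behaves as intended, request the UI once from an allowed address and once from another network, then compare status codes. The helper below is a minimal sketch that interprets the code. allowlist_verdict is a hypothetical name, and it assumes Traefik rejects clients outside the allowlist with 403.

```shell
#!/usr/bin/env bash
# allowlist_verdict HTTP_CODE: interprets the status code of a request to the UI.
# Assumption: Traefik answers 403 to clients outside the allowlist.
allowlist_verdict() {
  case "$1" in
    2??|3??) echo "allowed" ;;
    403)     echo "blocked" ;;
    *)       echo "unexpected ($1)" ;;
  esac
}

# Hypothetical usage against the placeholder host from the Traefik labels:
#   code="$(curl -s -o /dev/null -w '%{http_code}' https://llm.your-domain.com)"
#   allowlist_verdict "$code"
```

A result of unexpected (000) usually means curl could not connect at all, which points at DNS, the firewall, or the TLS entrypoint and not at the allowlist.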

For fine-tuning when generic models aren’t enough, see the upcoming Phase 3 cluster on LoRA + Unsloth. To understand how this Ollama setup plugs into a full RAG stack, see RAG with Postgres + pgvector.

References: ollama.com, github.com/open-webui, traefik.io.


Written by

Javier Cañete

CEO - Jacar Systems

Passionate about technology, cloud infrastructure and artificial intelligence. Writes about DevOps, AI, platforms and software from Madrid.
