Dify: a self-hosted LLMOps platform

Dify is an open-source platform for building AI applications and agents, with a visual workflow canvas, prompt management, a RAG knowledge base and LLMOps layers. You can self-host the whole thing with Docker Compose on top of Postgres, Redis and a vector database. This guide explains how to deploy it and when it beats Flowise and Langflow.

July 17, 2026 10 min 11

Artificial Intelligence

What is a vector embedding and what is it used for

A vector embedding is a list of real numbers that represents the semantic meaning of a piece of text, an image, or any other data. Two sentences with the same meaning produce vectors that are close together; two unrelated ones produce vectors that are far apart. Semantic search, RAG, and recommendation systems are all built on this principle.
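As a rough illustration of "close together" versus "far apart", the sketch below compares vectors with cosine similarity. The three-dimensional vectors are made-up values chosen for readability, not the output of any real embedding model, and `cosine_similarity` is a hypothetical helper name.

```python
import math


def cosine_similarity(a, b):
    """Return the cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


# Hypothetical toy embeddings (real models emit hundreds of dimensions).
cat = [0.9, 0.1, 0.0]
kitten = [0.8, 0.2, 0.1]
invoice = [0.0, 0.1, 0.9]

related = cosine_similarity(cat, kitten)     # similar meaning, value near 1
unrelated = cosine_similarity(cat, invoice)  # unrelated meaning, value near 0
print(related > unrelated)
```

Semantic search amounts to running this comparison between a query vector and every stored vector, usually through an approximate index rather than a full scan.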

July 5, 2026 5 min 50

Artificial Intelligence

RAG with Postgres and pgvector in production: from PoC to SLO

Embeddings, HNSW indexing, reranking, evaluation, context window, latency under load. Full stack with code and measurable SLOs.

June 9, 2026 10 min 308

Architecture

Hybrid RAG in 2026: the patterns that keep winning

Hybrid RAG in 2026 combines dense and lexical search fused with RRF, cross-encoder reranking over the top-50 candidates, structure-aware chunking, and continuous evaluation with Ragas or TruLens. It is the pattern that survives in serious production systems three years after the initial embeddings boom.

April 28, 2026 3 min 567 4.6

Architecture

Redis 8.2 and its vector support: when it actually makes sense

Redis 8.2 ships vector search as a native data type. The real question is whether it replaces a dedicated engine like Qdrant, Weaviate, or pgvector on workloads with millions of vectors and tight latency budgets, or only works as a bonus on top of the cache you already run.

August 4, 2025 5 min 224 4.6

Artificial Intelligence

RAG 2.0: knowledge graphs, vectors, and hybrid

RAG in 2023 was vector search with an LLM behind it. In 2025 it is a hybrid system that combines vectors, lexical search, and knowledge graphs. What has changed, where each piece works, and which decisions separate a useful RAG from a disappointing one.

July 17, 2025 6 min 331 4.2

Artificial Intelligence

The knowledge graph era is reborn with LLMs

For a decade, knowledge graphs were an academic idea with few real use cases, held back by the cost of building and maintaining the schema. LLMs have changed that equation: they now extract entities automatically and help anchor answers, audit reasoning, and support agents without hallucinating.

May 21, 2025 5 min 257 4.5

Artificial Intelligence

Continuous evaluation of RAG: dashboards that actually matter

A RAG system without continuous evaluation degrades silently. Indexes change, models get updated, users ask new things. This is a practical walkthrough of which metrics to watch and how to build the dashboard that alerts you before the incident.

May 9, 2025 6 min 226 4.3

Architecture

Applying graph RAG to a real product

Since Microsoft open-sourced GraphRAG, the pattern of using graphs over your own data has gone from academic experiment to a technique with practical applications. A reflection on when it pays off, how to set it up, and which mistakes keep recurring.

March 28, 2025 6 min 258 4.7

Architecture

Microsoft’s GraphRAG in enterprise: patterns that work

GraphRAG has been in real enterprise use for over a year: during indexing, an LLM builds a knowledge graph that answers global questions about a corpus well, precisely where classic RAG fails because no single chunk holds the full answer. Here I compare indexing costs, the cases where it pays off, and the hybrid pattern that teams have settled on.

February 11, 2025 4 min 218 4.4

Artificial Intelligence

How to Evaluate a RAG System Without Fooling Yourself

Measuring RAG quality rigorously takes more than skimming a handful of answers: it requires objective metrics (faithfulness, relevance, context precision, and coverage), a golden set of hundreds of curated questions, and regular human validation of the LLM judge to avoid misleading conclusions.

December 28, 2024 5 min 264 4.3

Architecture

Hybrid Search: Combining BM25 and Vectors Seriously

Hybrid search combines BM25 and vector retrieval to cover what each misses alone. Vectors fail on exact identifiers like SKUs or CVEs; BM25 fails when query and document use different vocabulary for the same idea. Reciprocal Rank Fusion (RRF) merges both rankings without depending on their score scales.
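A minimal sketch of Reciprocal Rank Fusion, assuming each retriever returns an ordered list of document IDs. The function name, the sample IDs, and the constant `k=60` (a commonly used default) are illustrative assumptions, not taken from the article.

```python
def reciprocal_rank_fusion(rankings, k=60):
    """Fuse several ranked lists of document IDs into one list.

    Each document scores sum(1 / (k + rank)) over the lists it appears in,
    so only rank positions matter, never the retrievers' raw scores.
    """
    scores = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    # Highest fused score first; ties broken by ID for a stable order.
    return sorted(scores, key=lambda doc_id: (-scores[doc_id], doc_id))


# Hypothetical results: BM25 nails the exact SKU, vectors prefer paraphrases.
bm25_hits = ["sku-123", "doc-a", "doc-b"]
vector_hits = ["doc-a", "doc-c", "sku-123"]

fused = reciprocal_rank_fusion([bm25_hits, vector_hits])
print(fused)
```

Documents that both retrievers rank highly rise to the top, while a document found by only one retriever still survives in the fused list.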

December 7, 2024 6 min 269

Architecture

RAG in Production: Patterns That Work and Those That Don’t

After two years of RAG in production, clear patterns have emerged: smart chunking, hybrid search, re-ranking, and continuous evaluation. Plus what to avoid.

September 26, 2024 6 min 269 4.4

Artificial Intelligence

OpenAI Assistants API: Stateful Agents Without Your Own Infrastructure

OpenAI's Assistants API offers persistent threads, sandboxed code execution, and managed document search, but OpenAI is shutting it down completely on August 26, 2026 in favor of the Responses API. We look at when it used to pay off against Chat Completions with your own infrastructure, and what to do if your project still depends on it.

September 17, 2024 7 min 217 4.4

Artificial Intelligence

Re-Ranking in RAG: The Piece That Really Raises Quality

Embeddings retrieve fast but rank poorly. A reranker over the top-100 lifts precision 15–30%. When it pays off and when it does not.

July 10, 2024 6 min 423 4.6

Artificial Intelligence

nomic-embed-text: Competitive Open Embeddings

nomic-embed-text-v1.5 from Nomic AI is an embedding model with weights, code and training data released under Apache 2.0: 137 million parameters, up to 8192 tokens of context, and an MTEB score of 62.4, almost matching the 62.3 of OpenAI's text-embedding-3-small, at 768 dimensions instead of 1536.

May 5, 2024 4 min 250 4.4

Artificial Intelligence

Gemini 1.5: Millions of Tokens of Context in Production

Gemini 1.5 Pro launched in February 2024 with a verified one-million-token context window. It retrieves over 95% of data up to 530,000 tokens in recall tests, reshaping RAG system design, making full-document analysis viable, and enabling new architectural patterns through context caching.

February 26, 2024 3 min 199 4.3

Artificial Intelligence

OpenAI text-embedding-3: What Changes vs the Previous One

OpenAI released text-embedding-3 on 25 January 2024 in two variants: small and large. It improves MTEB quality over ada-002, adds variable dimensions you can truncate without retraining, and lowers the price for small. Migration pays off for most serious RAG setups, but measure real recall on your own corpus before reindexing everything.
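The truncation idea can be sketched locally: keep the first N components of a stored vector and re-normalise to unit length so cosine and dot-product comparisons stay meaningful. The helper name and the sample vector are hypothetical, and the sketch assumes the full vector has already been fetched from the API.

```python
import math


def truncate_embedding(vector, dims):
    """Keep the first `dims` components and rescale to unit length."""
    if not 0 < dims <= len(vector):
        raise ValueError("dims must be between 1 and len(vector)")
    cut = vector[:dims]
    norm = math.sqrt(sum(x * x for x in cut))
    if norm == 0.0:
        raise ValueError("cannot normalise an all-zero vector")
    return [x / norm for x in cut]


# Made-up 3-dimensional vector standing in for a 1536-dimensional one.
full = [3.0, 4.0, 12.0]
short = truncate_embedding(full, 2)
print(short)
```

Shorter vectors cut storage and index size, so recall should be measured on your own corpus at each candidate dimension before committing to one.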

January 27, 2024 4 min 244 4.5

Architecture

pgvector in 2024: HNSW Indexes and Real Scaling

pgvector matured in 2023-2024 with the HNSW index type, added in version 0.5, and the parallel index builds that arrived in version 0.6. For projects already running PostgreSQL, a dedicated vector database is not needed in most cases: this guide explains when PostgreSQL is enough, how to configure the index, and where it starts to fall short.

January 21, 2024 5 min 233 4.4

Artificial Intelligence

Cohere Embed v3: Multilingual and Enterprise-Oriented

Cohere Embed v3 is an embedding model that distinguishes queries from documents via the input_type parameter and scores intrinsic text quality, with multilingual support for over 100 languages at 1024 dimensions. It costs $0.10 per million tokens versus OpenAI's $0.02, and delivers better recall in multilingual RAG.

January 9, 2024 4 min 228 4.2

Architecture

Vector Databases: Qdrant, Pinecone, and Weaviate

Vector databases have gone from an experimental curiosity to the central component of most LLM-based products. This comparison covers Qdrant, Pinecone, and Weaviate: architecture, strengths, limitations, and a decision tree for choosing the right option based on your operational priorities and budget.

November 13, 2023 5 min 258 4.3

Architecture

pgvector: Semantic Search Without Leaving Postgres

pgvector turns PostgreSQL into a fully functional vector database without adding a separate service to the stack. It extends Postgres with the vector type, IVFFlat indexes for approximate nearest-neighbour (ANN) search, and the ability to combine relational SQL filters with vector ranking in a single query. For most RAG projects and internal chatbots, its limits never become a problem.

November 1, 2023 6 min 227

Artificial Intelligence

LangChain: The Framework for Orchestrating LLM Applications

LangChain is a Python framework that unifies building LLM applications: prompt templates, retrievers over vector databases, function-calling agents, and conversational memory. It earns its keep in fast prototypes and multi-model systems, but for a single well-defined production use case, direct code usually stays more maintainable.

October 29, 2023 5 min 264 4.4

Architecture

Chroma: A Lightweight Vector Database for Embedding Prototypes

Chroma is the easiest vector database to get started with embeddings and semantic search: install it with pip install chromadb, no extra infrastructure required, and it exposes a minimal API (add, query, delete). It suits prototypes and mid-sized RAG systems well; past a few million vectors, Qdrant or Milvus scale better.

October 17, 2023 5 min 235 4.4

Artificial Intelligence

LLM Fine-Tuning: When It’s Worth Training Your Own

Fine-tuning your own LLM pays off in three cases: you need a very specific style or voice, a rigid structured output format, or you want lower cost and latency from a small specialised model. LoRA and QLoRA have cut the GPU cost, but preparing data and running the model in production are still expensive. For everything else, RAG and prompt engineering are usually enough.

July 13, 2023 4 min 265 4.6