Written by

CEO - Jacar Systems

Passionate about technology, cloud infrastructure and artificial intelligence. Writes about DevOps, AI, platforms and software from Madrid.

Artificial Intelligence

apache 2.0 embeddings nomic embed open-source openai rag

nomic-embed-text: Competitive Open Embeddings

May 5, 2024, 9 min read

Table of contents

Key takeaways
Why open embeddings matter
What nomic-embed-text-v1 actually is
Task prefixes and the v1.5 version
Performance and integration
Realistic expectations
Conclusion

Updated: 2026-05-03

In February 2024 Nomic AI released nomic-embed-text-v1 and, weeks later, the v1.5 variant with Matryoshka representations. It’s not the first open-source embedding model, but it’s the first one arriving with three things at once: Apache 2.0 weights, fully auditable training data, and an MTEB score close enough to text-embedding-3-small that the conversation stops being “open source or quality” and becomes “open source with enough quality.”

Key takeaways

137M parameters, 768-dimensional vectors, and up to 8192 tokens of context: sixteen times the 512-token limit of most prior open models.
Apache 2.0 licence with published training data: no vendor lock-in.
The v1.5 variant adds Matryoshka Representation Learning: truncating to 256 dimensions costs only a point or two on MTEB.
Task prefixes (search_query:, search_document:) are mandatory; omitting them is the most common migration mistake from OpenAI.
Compatible with Ollama, LangChain, LlamaIndex, and pgvector with no extra plugins.

Why open embeddings matter

The embedding model is the quietly expensive part of a RAG system, not because of per-token cost but because of coupling. If you index millions of documents with a proprietary model and that model changes version or disappears, your index is orphaned: vectors produced by different models are not comparable, so new queries can no longer be matched against the old index. Reindexing isn’t trivial: reprocessing the corpus, regenerating vectors, rebuilding the HNSW index, and validating retrieval quality is a multi-day project.

Two additional dimensions matter:

Data residency: in the EU, many legal teams wince at sending the whole corpus to the OpenAI API. A local model removes that friction.
Transparency: Nomic published the training dataset, allowing a compliance team to reason about what the model saw and its likely biases.

What nomic-embed-text-v1 actually is

137M parameters, 768-dimensional vectors, up to 8192 tokens of context. That last number is the surprising one: most open embeddings from the previous generation (E5, BGE, GTE) capped at 512 tokens. 8k lets you embed a full article or an entire conversation without artificial chunking.

Training followed a two-stage contrastive scheme: weakly supervised pretraining on ~235M pairs (Common Crawl, Wikipedia, StackExchange), then supervised fine-tuning on MSMARCO, NQ, HotpotQA, and Nomic-curated sets.

On MTEB, v1 averages around 62.4 points. text-embedding-3-small sits at 62.3. Nomic isn’t the best open model, but it’s within noise of the default proprietary embedding, at 768 dimensions instead of 1536. Fewer dimensions mean smaller indices and faster searches.
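To put the dimension difference in numbers, here is a rough sketch of raw vector storage. It assumes float32 components (4 bytes each) and ignores index overhead such as HNSW graph links; the function name and the one-million-vector corpus are illustrative assumptions, not figures from Nomic or OpenAI.

```python
BYTES_PER_FLOAT32 = 4  # assumption: vectors are stored as float32


def raw_vector_bytes(num_vectors: int, dims: int) -> int:
    """Raw storage for the vectors alone, excluding any index structure."""
    return num_vectors * dims * BYTES_PER_FLOAT32


corpus_size = 1_000_000  # hypothetical corpus size
nomic_bytes = raw_vector_bytes(corpus_size, 768)
openai_small_bytes = raw_vector_bytes(corpus_size, 1536)

print(f"768 dims:  {nomic_bytes / 1e9:.2f} GB")
print(f"1536 dims: {openai_small_bytes / 1e9:.2f} GB")
```

Under these assumptions the 768-dimensional index needs half the raw storage of the 1536-dimensional one, before any compression or quantization.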

Task prefixes and the v1.5 version

A critical operational quirk: the model uses task prefixes in the style of Cohere Embed v3. Prepend the right prefix depending on use:

python

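from sentence_transformers import SentenceTransformer

# trust_remote_code=True is required because the model ships custom modeling code.
model = SentenceTransformer("nomic-ai/nomic-embed-text-v1.5", trust_remote_code=True)

# Documents going into the index get the search_document: prefix.
doc = model.encode("search_document: RAG combines retrieval with generation.")
# Queries issued at search time get the search_query: prefix.
query = model.encode("search_query: what is RAG?")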

Ignoring prefixes noticeably degrades retrieval quality. It’s the most frequent mistake when migrating from OpenAI, where prefixes don’t exist.
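One way to make the prefix hard to forget is to route every text through a small helper before encoding. The sketch below is a hypothetical convenience, not part of any Nomic or sentence-transformers API; only the two prefix strings come from the model itself.

```python
from typing import Iterable, List

# Prefix strings used by the model; the mapping keys are our own naming.
TASK_PREFIXES = {
    "query": "search_query: ",
    "document": "search_document: ",
}


def with_task_prefix(texts: Iterable[str], task: str) -> List[str]:
    """Return the texts with the task prefix prepended, skipping texts that already have it."""
    if task not in TASK_PREFIXES:
        raise ValueError(f"unknown task {task!r}; expected one of {sorted(TASK_PREFIXES)}")
    prefix = TASK_PREFIXES[task]
    return [text if text.startswith(prefix) else prefix + text for text in texts]


docs = with_task_prefix(["RAG combines retrieval with generation."], "document")
queries = with_task_prefix(["what is RAG?"], "query")
print(docs)
print(queries)
```

The returned lists can be passed straight to model.encode. Raising on an unknown task is a deliberate choice here: a silent fallback to no prefix would reproduce exactly the mistake the helper is meant to prevent.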

The v1.5 version adds Matryoshka Representation Learning: the model is trained so that the first N components of the vector are useful on their own. You can keep the full 768 dimensions or truncate to 512, 256, 128, or even 64 depending on storage constraints, with gradual and predictable quality degradation. Going from 768 to 256 dimensions cuts storage by two-thirds (about 67%) at the cost of a point or two on MTEB.
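Mechanically, truncation means keeping the first N components and rescaling the result to unit length so that cosine similarity still behaves. The sketch below uses only the standard library and a toy 4-dimensional vector; the function name is an assumption, and in practice you would do this with NumPy or PyTorch on whole batches. The v1.5 model card reportedly also applies a layer-normalization step before truncating, which this sketch omits, so check the card before using truncated vectors in production.

```python
import math
from typing import List, Sequence


def truncate_and_normalize(vector: Sequence[float], dims: int) -> List[float]:
    """Keep the first `dims` components and rescale them to unit L2 norm."""
    if not 0 < dims <= len(vector):
        raise ValueError("dims must be between 1 and len(vector)")
    head = list(vector[:dims])
    norm = math.sqrt(sum(x * x for x in head))
    if norm == 0.0:
        return head  # all-zero prefix: nothing to rescale
    return [x / norm for x in head]


full = [3.0, 4.0, 12.0, 0.5]  # toy stand-in for a 768-dimensional embedding
short = truncate_and_normalize(full, 2)
print(short)

# Storage saved when going from 768 to 256 dimensions.
saving = 1 - 256 / 768
print(f"{saving:.1%}")
```

Queries and documents must be truncated to the same length; mixing 768-dimensional and 256-dimensional vectors in one index does not work.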

Performance and integration

Approximate throughput: ~100 embeddings/second on a 16-core server CPU, ~3000 on an RTX 4090, and ~500 on an Apple Silicon M2 Pro with MPS. Ollama (ollama pull nomic-embed-text) exposes an OpenAI-compatible embeddings endpoint, so existing code can migrate by changing base_url and the model name, plus adding the task prefixes described above. For pgvector, declare the column as vector(768) and index with HNSW over vector_cosine_ops.
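As a minimal sketch of both integrations, the code below builds a request against Ollama's OpenAI-compatible endpoint using only the standard library, and keeps the pgvector DDL as a string. It assumes Ollama is running locally on its default port 11434; the function names, the documents table, and its columns are hypothetical.

```python
import json
import urllib.request
from typing import List

OLLAMA_BASE_URL = "http://localhost:11434/v1"  # assumption: local Ollama on its default port
MODEL_NAME = "nomic-embed-text"


def build_embeddings_request(
    texts: List[str], base_url: str = OLLAMA_BASE_URL
) -> urllib.request.Request:
    """Build (but do not send) an OpenAI-style embeddings request."""
    payload = json.dumps({"model": MODEL_NAME, "input": texts}).encode("utf-8")
    return urllib.request.Request(
        url=f"{base_url}/embeddings",
        data=payload,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def embed(texts: List[str]) -> List[List[float]]:
    """Send the request and return one vector per input text (needs a running server)."""
    with urllib.request.urlopen(build_embeddings_request(texts)) as response:
        body = json.load(response)
    return [item["embedding"] for item in body["data"]]


# pgvector schema matching the 768-dimensional output; table and column names are hypothetical.
PGVECTOR_DDL = """
CREATE EXTENSION IF NOT EXISTS vector;
CREATE TABLE documents (
    id bigserial PRIMARY KEY,
    content text,
    embedding vector(768)
);
CREATE INDEX ON documents USING hnsw (embedding vector_cosine_ops);
"""
```

Calling embed(["search_query: what is RAG?"]) returns a list with one 768-dimensional vector when the server is up. If you truncate v1.5 vectors, the vector(768) declaration has to change to the truncated length.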

Realistic expectations

Nomic isn’t a frontier model. If your application needs the last 3% of retrieval precision, text-embedding-3-large or mxbai-embed-large will still win. It is also not the best option if your corpus is heavily multilingual or you need more than 8192 tokens of context.

What it offers, and few models do, is an acceptable combination across four axes that usually matter simultaneously: quality close to the proprietary standard, long context, a genuinely permissive licence, and published training data.

Conclusion

The fact that a model like nomic-embed-text exists, with the full supply chain open and competitive quality, is probably more important in the medium term than its exact position in this month’s benchmark. For a production RAG where the team prefers not to depend on an external API and English is the main language, Nomic becomes the reasonable default choice.


Written by

Javier Cañete

