OpenAI text-embedding-3: What Changes vs the Previous One

Updated: 2026-05-03

OpenAI released text-embedding-3 on 25 January 2024, in two variants: text-embedding-3-small (cheaper, 1536 dim) and text-embedding-3-large (higher quality, 3072 dim). It’s the first major embedding change from OpenAI since text-embedding-ada-002 of December 2022. For teams building RAG on OpenAI, the immediate question is: migrate? And if so, when and at what cost? This article covers what it adds, how to compare, and strategies for the switch.

What’s New in Five Points

Concrete changes vs ada-002:

  • MTEB quality: text-embedding-3-large reaches ~64% on the multi-task benchmark (vs 61% for ada-002). 3-small achieves ~62%.
  • Variable dimensions: truncate embeddings to fewer dimensions without retraining. 3072 → 512 keeps ~80% quality at a sixth of the storage.
  • Improved multilingual: 3-large on MIRACL (multilingual benchmark) jumps from 31% to 54% vs ada-002.
  • Lower price for small: $0.02 / 1M tokens vs $0.10 for ada-002. large is $0.13 / 1M.
  • Better handling of long documents: the maximum input stays at 8191 tokens, but quality on long texts improves.

The “cheaper + better quality” combo is rare; normally you trade one for the other. That’s why this release has generated so much interest.

Variable Dimensions: How It Works

The technique is called Matryoshka Representation Learning: the model is trained so the first N vector components are a functional representation on their own:

python
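import math

from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

# Option 1: request reduced dimensions server-side
res = client.embeddings.create(
    input="Sample text",
    model="text-embedding-3-small",
    dimensions=512,
)
reduced = res.data[0].embedding  # 512 floats

# Option 2: request the full vector and truncate client-side
res_full = client.embeddings.create(
    input="Sample text",
    model="text-embedding-3-small",
)
full = res_full.data[0].embedding  # 1536 floats
truncated = full[:512]
# A manually truncated vector is no longer unit length: re-normalise it
norm = math.sqrt(sum(x * x for x in truncated))
truncated = [x / norm for x in truncated]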

Truncating enables smaller indexes (3x less memory in pgvector, Qdrant, or Pinecone when going from 1536 to 512 dimensions), faster searches, and cheap A/B testing: store the full vectors once, then try 256 vs 512 vs 1536 by truncating and re-normalising locally, with no new API calls. Practical rule: for Spanish/English RAG, 512 dim is a good midpoint. For complex multilingual corpora, 1024 dim.
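One detail worth making explicit: a vector truncated by hand is no longer unit length, so it should be re-normalised before it goes into the index. The sketch below shows that step together with a raw storage estimate. It assumes vectors are plain Python lists stored as float32 (4 bytes per component); `truncate_and_normalise` and `index_size_mb` are hypothetical helper names, not part of the OpenAI SDK.

```python
import math


def truncate_and_normalise(vec, n):
    """Keep the first n components and rescale the result to unit length."""
    head = vec[:n]
    norm = math.sqrt(sum(x * x for x in head))
    if norm == 0.0:
        raise ValueError("cannot normalise a zero vector")
    return [x / norm for x in head]


def index_size_mb(num_vectors, dims, bytes_per_component=4):
    """Raw vector storage only; real indexes add graph and metadata overhead."""
    return num_vectors * dims * bytes_per_component / 1_000_000


# Toy 4-dim vector standing in for a real embedding
toy = [0.5, 0.5, 0.5, 0.5]
short = truncate_and_normalise(toy, 2)

# One million vectors at three candidate sizes
sizes = {d: index_size_mb(1_000_000, d) for d in (256, 512, 1536)}
```

For one million vectors, raw storage under these assumptions goes from about 6.1 GB at 1536 dimensions to about 2 GB at 512, which is where the 3x figure comes from.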

Migration from ada-002

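Not plug-and-play if you already have indexed embeddings: vectors from ada-002 and text-embedding-3 live in different spaces and cannot be compared with each other. Three options: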

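  • Reindex everything: reprocess the corpus with the new model. 10M docs of 1 KB each are ~10 GB of text; at roughly 4 characters per token (English; Spanish tends to run higher) that is ~2.5B tokens × $0.02/M ≈ $50 with 3-small, and even a pessimistic one token per byte gives 10B tokens, or $200. The cleanest for most cases.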
  • Gradual migration: keep ada-002 for existing docs, 3-small for new, separate indexes, search over both. Complicates the architecture.
  • Hybrid search: new and old embeddings coexist with a unified re-rank at the end.

For most cases, reindexing everything at once is cleaner. The cost is manageable and the quality gain offsets it. The key is measuring real recall before and after, not just trusting public benchmarks.
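The reindexing budget can be reproduced in a few lines. This is a rough sketch, not a billing tool: the prices are the ones quoted in this article, `estimate_reindex_cost` is a hypothetical helper, and the characters-per-token ratio is an assumption that should be measured on a sample of your own corpus with a real tokenizer.

```python
PRICE_PER_1M_TOKENS = {  # USD, as quoted in this article
    "text-embedding-3-small": 0.02,
    "text-embedding-3-large": 0.13,
    "text-embedding-ada-002": 0.10,
}


def estimate_reindex_cost(num_docs, avg_chars_per_doc, model, chars_per_token=4.0):
    """Return (estimated_tokens, estimated_usd) for embedding the whole corpus.

    chars_per_token is an assumption (~4 is a common rule of thumb for English).
    """
    tokens = num_docs * avg_chars_per_doc / chars_per_token
    usd = tokens / 1_000_000 * PRICE_PER_1M_TOKENS[model]
    return tokens, usd


# 10M documents of ~1,000 characters each
tokens, usd = estimate_reindex_cost(10_000_000, 1_000, "text-embedding-3-small")

# Pessimistic bound: one token per character
worst_tokens, worst_usd = estimate_reindex_cost(
    10_000_000, 1_000, "text-embedding-3-small", chars_per_token=1.0
)
```

Under the 4-characters-per-token assumption the corpus comes to about 2.5B tokens and $50; the pessimistic bound of one token per character gives 10B tokens and $200.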

Comparison with Open-Source Alternatives

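| Model | MTEB avg (%) | Dimensions | Cost |
| --- | --- | --- | --- |
| text-embedding-3-large | 64.6 | 3072 | $0.13/1M tokens |
| text-embedding-3-small | 62.3 | 1536 | $0.02/1M tokens |
| BGE-large-en-v1.5 | 64.2 | 1024 | own infrastructure |
| e5-large-v2 | 63.4 | 1024 | own infrastructure |
| text-embedding-ada-002 | 60.9 | 1536 | $0.10/1M tokens |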

BGE-large and E5-large are nearly on par in quality but require your own infrastructure. It is the classic trade-off: OpenAI is simple but an external dependency; self-hosting gives control but adds operational burden.

Latency Considerations

OpenAI embedding API: ~50-200 ms p50, with spikes above 1 s. For batch processing this doesn’t matter; for real-time queries it can be a problem. Mitigation strategies:

  • Aggressive caching for repeated or near-identical queries.
  • A local model for the latency-sensitive path, with OpenAI kept for workloads that can be precomputed offline. Query and document vectors must always come from the same model.
  • Batch embeddings: up to 2048 inputs per call, to amortise network latency.
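Caching and batching can be combined in a thin wrapper around whatever function actually calls the API. The sketch below is one possible shape, under stated assumptions: `CachedEmbedder` and `batched` are hypothetical names, the cache is an in-memory dict with exact-match keys, and the key normalisation (lowercasing and collapsing whitespace) is a design choice that trades a little fidelity for a higher hit rate. A stub stands in for the API so the example runs offline.

```python
import hashlib

MAX_INPUTS_PER_CALL = 2048  # per-request input limit mentioned above


def batched(items, size=MAX_INPUTS_PER_CALL):
    """Yield consecutive slices of at most `size` items."""
    for start in range(0, len(items), size):
        yield items[start:start + size]


class CachedEmbedder:
    """Wraps an embedding function with an exact-match in-memory cache."""

    def __init__(self, embed_batch):
        # embed_batch: callable taking a list of strings, returning a list of vectors
        self._embed_batch = embed_batch
        self._cache = {}

    @staticmethod
    def _key(text):
        # Light normalisation so trivial variations hit the same entry
        canonical = " ".join(text.lower().split())
        return hashlib.sha256(canonical.encode("utf-8")).hexdigest()

    def embed(self, texts):
        keys = [self._key(t) for t in texts]
        # Collect cache misses, de-duplicated and in first-seen order
        missing = {}
        for k, t in zip(keys, texts):
            if k not in self._cache and k not in missing:
                missing[k] = t
        for chunk in batched(list(missing)):
            vectors = self._embed_batch([missing[k] for k in chunk])
            for k, v in zip(chunk, vectors):
                self._cache[k] = v
        return [self._cache[k] for k in keys]


# Stub standing in for the real API call; it records the batch sizes it sees
calls = []


def fake_embed_batch(batch):
    calls.append(len(batch))
    return [[float(len(t))] for t in batch]


embedder = CachedEmbedder(fake_embed_batch)
first = embedder.embed(["Hello world", "hello   WORLD", "other"])
second = embedder.embed(["other"])  # served entirely from the cache
```

In a real deployment, `embed_batch` would wrap `client.embeddings.create(model=..., input=batch)` and return `[d.embedding for d in res.data]`, and the dict would likely be replaced by a shared cache with an eviction policy.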

Data Residency

OpenAI processes data in the US by default. For European compliance, Azure OpenAI offers EU regions with text-embedding-3 available at similar pricing. For regulated European companies, Azure is the path; for the rest, direct OpenAI is usually more agile.

Own Benchmark Before Migrating

Don’t just trust MTEB. Measure with your real corpus using a golden set of 500 manually curated query-document pairs. 500 well-selected pairs tell you more than public benchmark scores computed on someone else’s data.
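A golden set like this is usually scored with recall@k: the share of queries whose expected document shows up in the top k results. The sketch below is a minimal version; `recall_at_k` is a hypothetical helper, and the two dictionaries are toy stand-ins for searches against an ada-002 index and a text-embedding-3 index.

```python
def recall_at_k(golden, search, k=10):
    """Fraction of queries whose expected document appears in the top k.

    golden: list of (query, expected_doc_id) pairs
    search: callable (query, k) -> list of doc ids, best first
    """
    if not golden:
        raise ValueError("golden set is empty")
    hits = sum(1 for query, doc_id in golden if doc_id in search(query, k)[:k])
    return hits / len(golden)


# Toy data: four golden pairs and the ranked results each index returns
golden_set = [("q1", "d1"), ("q2", "d2"), ("q3", "d3"), ("q4", "d4")]
old_results = {"q1": ["d1", "d9"], "q2": ["d7", "d8"], "q3": ["d3", "d5"], "q4": ["d6", "d4"]}
new_results = {"q1": ["d1", "d9"], "q2": ["d2", "d8"], "q3": ["d3", "d5"], "q4": ["d4", "d6"]}

old_recall = recall_at_k(golden_set, lambda q, k: old_results[q], k=1)
new_recall = recall_at_k(golden_set, lambda q, k: new_results[q], k=1)
```

Running the same golden set against both indexes, at the same k, gives a before/after number that reflects your own data rather than a public leaderboard.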

Common Migration Mistakes

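  • Forgetting to normalise: cosine and inner product only give the same ranking on unit-length vectors. A client-side truncation leaves vectors shorter than unit length, so skipping the re-normalisation step causes subtle ranking bugs.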
  • Mixing models in the same index: searching embeddings from different models in the same index gives garbage results.
  • Badly truncated dimensions: [1:N+1] instead of [:N]. An off-by-one kills quality.
  • Not measuring real recall post-migration: only verifying the API responds is not enough.
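Several of these mistakes can be caught with a cheap check before a vector is written to the index. The following is a sketch under assumptions: `check_vector` is a hypothetical helper, and it presumes each vector is stored alongside the name of the model that produced it.

```python
import math


def check_vector(vec, expected_dims, index_model, vector_model, tol=1e-3):
    """Return a list of problems found; an empty list means the vector looks OK."""
    problems = []
    if vector_model != index_model:
        problems.append(f"model mismatch: {vector_model} != {index_model}")
    if len(vec) != expected_dims:
        problems.append(f"expected {expected_dims} dims, got {len(vec)}")
    norm = math.sqrt(sum(x * x for x in vec))
    if abs(norm - 1.0) > tol:
        problems.append(f"not unit length (norm={norm:.4f})")
    return problems


# A unit-length 2-dim toy vector from the right model passes
ok = check_vector([0.6, 0.8], 2, "text-embedding-3-small", "text-embedding-3-small")

# Wrong model, wrong size, and not normalised: three problems reported
bad = check_vector([0.6, 0.8, 0.5], 2, "text-embedding-3-small", "text-embedding-ada-002")

# The off-by-one from the list above: the prefix is vec[:n], not vec[1:n+1]
vec = [0.1, 0.2, 0.3, 0.4]
right = vec[:2]
wrong = vec[1:3]
```

The check does not replace a recall measurement; it only guards against the mechanical errors, such as mixed models, wrong dimensions, or unnormalised vectors.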

When to Migrate

Migrate now if: recall or precision is your RAG bottleneck; you have a multilingual corpus; you have high ada-002 volume where 3-small saves money.

Wait if: RAG works well with stable metrics; upcoming product changes will require reindexing anyway; volume is low enough that the reindex cost is not recovered in a reasonable time.
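The volume criterion can be turned into a quick break-even estimate. This sketch only looks at API spend, using the ada-002 and 3-small prices quoted in this article; `breakeven_months` is a hypothetical helper, the workloads are invented for illustration, and quality gains and engineering time are deliberately left out.

```python
def breakeven_months(corpus_tokens, monthly_tokens, old_price=0.10, new_price=0.02):
    """Months until per-token savings pay back a one-off reindex.

    Prices are USD per 1M tokens.
    """
    reindex_cost = corpus_tokens / 1_000_000 * new_price
    monthly_saving = monthly_tokens / 1_000_000 * (old_price - new_price)
    if monthly_saving <= 0:
        return float("inf")
    return reindex_cost / monthly_saving


# Hypothetical workloads: same 2.5B-token corpus, very different monthly volume
high_volume = breakeven_months(2_500_000_000, 500_000_000)
low_volume = breakeven_months(2_500_000_000, 5_000_000)
```

With 500M new tokens a month the reindex pays for itself in a little over a month; with 5M a month it takes more than ten years, which is the situation the "wait" advice describes.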

Conclusion

text-embedding-3 is a real, measurable improvement over ada-002. The variable-dimensions option is particularly interesting for its operational impact: less memory, faster searches, cheaper A/B testing. Migration is worth it for most serious RAG cases, though reindexing must be budgeted carefully. Against open-source alternatives, the decision remains the classic one between managed simplicity and self-hosted control, and both are correct answers depending on context.


Written by

CEO - Jacar Systems

Passionate about technology, cloud infrastructure and artificial intelligence. Writes about DevOps, AI, platforms and software from Madrid.