OpenAI text-embedding-3: What Changes vs the Previous One

Abstract neural network with connected dots in a colour gradient

OpenAI released text-embedding-3 on January 25, 2024, in two variants: text-embedding-3-small (cheaper, 1536 dim) and text-embedding-3-large (higher quality, 3072 dim). It’s the first major embedding change from OpenAI since text-embedding-ada-002 (December 2022). For teams building RAG on OpenAI, the immediate question is: migrate? And if so, when? This article covers what it adds, how to compare, and strategies for the switch.

What’s New

Five concrete changes vs ada-002:

  • MTEB quality (multi-task benchmark): text-embedding-3-large scores ~64% (vs 61% ada-002). 3-small scores ~62%.
  • Variable dimensions: you can truncate embeddings to fewer dimensions without retraining. 3072 → 512 keeps most of the quality at a sixth of the storage.
  • Improved multilingual: 3-large on MIRACL (multilingual benchmark) jumps from 31% to 54% vs ada-002.
  • Lower price for small: $0.02 / 1M tokens (vs $0.10 for ada-002). large is $0.13 / 1M.
  • Better length handling: the max context stays at 8191 tokens, but retrieval quality on long documents improves.

The “cheaper and better quality” combination is rare; usually one comes at the expense of the other.

Variable Dimensions: How It Works

The technique is called Matryoshka Representation Learning — the model is trained so the first N vector components are a functional representation on their own.

Practice:

from openai import OpenAI
import numpy as np

client = OpenAI()

# Option A: truncate server-side (the API re-normalises for you)
res = client.embeddings.create(
    input="Sample text",
    model="text-embedding-3-small",
    dimensions=512,
)
emb_512 = res.data[0].embedding  # already 512-dim and unit length

# Option B: request the full vector and truncate client-side
res_full = client.embeddings.create(
    input="Sample text",
    model="text-embedding-3-small",
)
full = np.array(res_full.data[0].embedding)
truncated = full[:512]
truncated /= np.linalg.norm(truncated)  # must re-normalise after truncating

Truncating allows:

  • Smaller indexes: 3x less memory in pgvector/Qdrant/Pinecone.
  • Faster searches: fewer dimensions = less compute per query.
  • Easy A/B testing: try 256 vs 512 vs 1536 without full reindex.

Practical rule: for ES/EN RAG, 512 dimensions is a good sweet spot. For complex multilingual corpora, 1024.
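The storage impact of truncation is easy to quantify. A quick back-of-the-envelope helper (assumes float32 vectors and ignores index overhead such as HNSW graphs):

```python
def index_memory_gb(n_docs: int, dims: int, bytes_per_float: int = 4) -> float:
    """Raw vector storage for float32 embeddings; index overhead not included."""
    return n_docs * dims * bytes_per_float / 1e9

# 10M documents at various dimensionalities
for d in (256, 512, 1536, 3072):
    print(f"{d:>4} dims: {index_memory_gb(10_000_000, d):.2f} GB")
```

At 10M documents, 512 dimensions needs about 20 GB of raw vectors versus roughly 123 GB at 3072: the same 6x factor applies to RAM if the index is held in memory.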

Migration from ada-002

Not plug-and-play if you have indexed embeddings. Options:

  • Reindex everything: reprocess your corpus with the new model. Cost is proportional to size: for 10M docs of ~1 KB each, roughly 2.5B tokens (about 250 tokens per KB of English text) × $0.02/1M ≈ $50 with 3-small.
  • Gradual migration: keep ada-002 for existing docs, use 3-small for new ones, with separate indexes and searches across both. Adds complexity.
  • Hybrid search: new and old embeddings coexist for a while, with a unified re-rank at the end.
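The gradual and hybrid options above boil down to the same pattern: query each index with an embedding from its own model, merge the candidates, and let a re-ranker decide. A minimal sketch under stated assumptions (every function passed in here is a hypothetical placeholder, not a real API):

```python
from typing import Callable, List, Tuple

def dual_index_search(
    query: str,
    embed_ada: Callable[[str], List[float]],   # hypothetical ada-002 embedder
    embed_new: Callable[[str], List[float]],   # hypothetical 3-small embedder
    search_ada: Callable[[List[float], int], List[Tuple[str, float]]],
    search_new: Callable[[List[float], int], List[Tuple[str, float]]],
    rerank: Callable[[str, List[str]], List[str]],
    k: int = 10,
) -> List[str]:
    # Each index is queried with an embedding from ITS OWN model;
    # raw similarity scores across models are not comparable,
    # so a unified re-ranker produces the final order.
    cand_ada = [doc for doc, _ in search_ada(embed_ada(query), k)]
    cand_new = [doc for doc, _ in search_new(embed_new(query), k)]
    seen, merged = set(), []
    for doc in cand_ada + cand_new:
        if doc not in seen:
            seen.add(doc)
            merged.append(doc)
    return rerank(query, merged)[:k]
```

The important design choice is that scores never cross the model boundary: only document identities are merged, and the re-ranker is the single source of truth for ordering.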

For most teams, reindexing everything at once is cleaner: the cost is manageable and the quality gain offsets it.
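The reindex cost above generalises to any corpus. A rough estimator (the 4-bytes-per-token figure is an approximation for English text; adjust for your language and tokeniser):

```python
def reindex_cost_usd(n_docs: int, avg_doc_bytes: int,
                     price_per_m_tokens: float,
                     bytes_per_token: float = 4.0) -> float:
    """Estimate the embedding cost of a full reindex."""
    total_tokens = n_docs * avg_doc_bytes / bytes_per_token
    return total_tokens / 1e6 * price_per_m_tokens

# 10M docs of 1 KB each with 3-small ($0.02 / 1M tokens)
print(f"${reindex_cost_usd(10_000_000, 1024, 0.02):.0f}")  # → $51
```

Even at $0.13/1M for 3-large, the same corpus comes out in the low hundreds of dollars, which is why a one-shot reindex is usually viable.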

Comparison with Open-Source Alternatives

Where it stands vs open embeddings on MTEB:

Model                    MTEB avg   Dim    Cost
text-embedding-3-large   64.6       3072   $0.13/1M
text-embedding-3-small   62.3       1536   $0.02/1M
BGE-large-en-v1.5        64.2       1024   own infra
e5-large-v2              63.4       1024   own infra
text-embedding-ada-002   60.9       1536   $0.10/1M

BGE-large and E5-large are nearly on par in quality but require your own infrastructure. It's the classic trade-off: OpenAI is simple but an external dependency; self-hosted gives you control but adds operational burden.

Latency Considerations

OpenAI embedding API: ~50-200ms p50, can hit 1s+ on spikes. For batch processing it doesn’t matter; for real-time queries it can be problematic.

Alternatives:

  • Aggressive caching: similar queries can reuse cached embeddings.
  • Local embedding model for the latency-critical path. Queries must be embedded with the same model as the indexed documents, so in practice this means embedding the corpus locally as well (precomputed offline), not mixing a local query model with an OpenAI-embedded index.
  • Batch embeddings: OpenAI accepts up to 2048 inputs per call — amortise latency.
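The batching point above is worth sketching, since naive one-call-per-document loops are the most common source of slow ingestion. A minimal helper that takes any embedding function, to stay testable (`embed_in_batches` is our own name, not an OpenAI API):

```python
from typing import Callable, List

def embed_in_batches(
    texts: List[str],
    embed_fn: Callable[[List[str]], List[List[float]]],
    batch_size: int = 2048,  # OpenAI's documented per-call input limit
) -> List[List[float]]:
    """Embed texts in slices of batch_size, preserving input order."""
    out: List[List[float]] = []
    for i in range(0, len(texts), batch_size):
        out.extend(embed_fn(texts[i:i + batch_size]))
    return out

# With the real API, a suitable embed_fn would be:
# lambda batch: [d.embedding for d in client.embeddings.create(
#     input=batch, model="text-embedding-3-small").data]
```

One call per 2048 documents instead of 2048 calls amortises the per-request latency by three orders of magnitude during ingestion.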

Data Residency

OpenAI processes in US by default. For European compliance:

  • Azure OpenAI offers EU regions (West Europe, among others).
  • text-embedding-3 is available on Azure with the same SLA.
  • Pricing is similar.

For regulated European companies, Azure is the path; for the rest, direct OpenAI is usually more agile.

When to Migrate

Clear criteria:

Migrate now:

  • Recall or precision is your RAG bottleneck.
  • Multilingual corpus (larger MIRACL gain).
  • High ada-002 volume where you save with 3-small.

Wait:

  • RAG works well, stable metrics.
  • Upcoming product changes that will require reindexing anyway.
  • Low volume where reindex cost doesn’t recover soon.

Own Benchmark Example

Don’t just trust MTEB. Benchmark with your real corpus:

# Golden set: 500 manually curated (query, relevant_doc) pairs
old_hits, new_hits = [], []
for q, expected_doc in golden_set:
    # ada-002
    old_emb = client.embeddings.create(
        input=q, model="text-embedding-ada-002"
    ).data[0].embedding
    old_results = search(old_emb, index_ada)
    old_hits.append(expected_doc in old_results[:10])

    # embedding-3
    new_emb = client.embeddings.create(
        input=q, model="text-embedding-3-small"
    ).data[0].embedding
    new_results = search(new_emb, index_new)
    new_hits.append(expected_doc in new_results[:10])

print(f"ada recall@10: {sum(old_hits)/len(old_hits)*100:.1f}%")
print(f"3-small recall@10: {sum(new_hits)/len(new_hits)*100:.1f}%")

500 well-selected pairs are more informative than millions of public benchmarks.

Common Mistakes

What we’ve seen break:

  • Forgetting to normalise: OpenAI returns unit-length vectors, so cosine similarity and inner product coincide, but after client-side truncation the vector is no longer unit length. Skip the re-normalisation and your scores silently drift.
  • Mixing models in the same index: searching embeddings from different models in the same index gives garbage.
  • Badly truncated dimensions: truncating to [1:N+1] instead of [:N]. Off-by-one kills quality.
  • Not measuring real recall post-migration. Only measuring “API responds”.
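The truncation and normalisation pitfalls above can be avoided with one small helper (a sketch; `safe_truncate` is our own name, not a library API):

```python
import numpy as np

def safe_truncate(vec, dims: int) -> np.ndarray:
    """Truncate to the first `dims` components and rescale to unit length."""
    v = np.asarray(vec, dtype=float)[:dims]   # [:N], not [1:N+1]
    return v / np.linalg.norm(v)

# Sanity check before indexing: unit norm within tolerance
v = safe_truncate([3.0, 4.0, 12.0], 2)
assert abs(float(np.linalg.norm(v)) - 1.0) < 1e-9
```

Running an assertion like this over a sample of vectors before bulk-loading an index is a cheap guard against both the off-by-one and the missing re-normalisation.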

Conclusion

text-embedding-3 is a real, measurable improvement over ada-002. The variable-dimensions option is particularly interesting for operational impact — less memory, faster queries. Migration is worth it for most serious RAG cases, though you must budget reindexing. For teams with multilingual corpus, the MIRACL improvement is particularly relevant. Against open-source alternatives, the decision remains the classic one between managed simplicity vs self-hosted control.

Follow us on jacar.es for more on embeddings, RAG, and model migration strategies.
