AI & ML · Beginner · By Samson Tanimawo, PhD · Published Apr 8, 2025 · 8 min read

Semantic Search vs Keyword Search

Keyword search finds the exact words. Semantic search finds the meaning. Each fails the other’s easy cases. The right answer in production is almost always to use both.

The two strategies, side by side

Keyword search matches exact terms (or stems). The classical implementation is BM25, a 1990s information-retrieval algorithm that scores documents by term frequency (how often the query terms appear in a doc) weighted against inverse document frequency (how rare those terms are in the corpus).
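The BM25 formula is compact enough to sketch directly. This is a minimal, self-contained scorer over pre-tokenized documents, using the common defaults k1=1.5 and b=0.75; a real deployment would use a search engine's tuned implementation rather than this toy loop.

```python
import math

def bm25_score(query_terms, doc, corpus, k1=1.5, b=0.75):
    """Score one document against a query with BM25."""
    N = len(corpus)
    avgdl = sum(len(d) for d in corpus) / N
    score = 0.0
    for term in query_terms:
        n = sum(1 for d in corpus if term in d)        # docs containing the term
        idf = math.log((N - n + 0.5) / (n + 0.5) + 1)  # rarer terms weigh more
        tf = doc.count(term)                            # term frequency in this doc
        # frequent terms help, but with diminishing returns; long docs are penalized
        score += idf * tf * (k1 + 1) / (tf + k1 * (1 - b + b * len(doc) / avgdl))
    return score

corpus = [
    "the pod keeps restarting after deploy".split(),
    "configure ingress tls certificates".split(),
    "restarting the cluster nodes safely".split(),
]
query = "pod restarting".split()
scores = [bm25_score(query, d, corpus) for d in corpus]
# the first doc contains both query terms, so it scores highest
```

Note how the IDF term does the heavy lifting: "pod" appears in only one document, so matching it is worth far more than matching a common word.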

Semantic search matches meaning. You embed the query into a vector and find documents whose vectors are nearest. “Pod won’t start” matches a doc titled “CrashLoopBackOff troubleshooting” even though the words are different.
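Mechanically, "nearest" usually means cosine similarity between vectors. The sketch below uses hand-made 3-dimensional vectors as stand-ins for real embedding output; a production system would call an embedding model (768 or 1536 dims) for both documents and query.

```python
import math

def cosine(a, b):
    """Cosine similarity between two vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# Toy vectors standing in for embedding-model output.
docs = {
    "CrashLoopBackOff troubleshooting": [0.9, 0.1, 0.2],
    "Ingress TLS setup":                [0.1, 0.9, 0.1],
}
query_vec = [0.8, 0.2, 0.3]  # pretend embedding of "Pod won't start"

# Retrieval = nearest neighbour by cosine similarity.
best = max(docs, key=lambda title: cosine(query_vec, docs[title]))
```

No word from the query appears in the winning title; the match happens entirely in vector space.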

Where semantic search wins

Semantic search shines when your users phrase queries differently from how your documents are written: "how do I reset my password" should find a doc titled "Credential recovery procedure" even with zero word overlap.

Where keyword search wins

The classic failure mode of pure semantic search: the user types an exact identifier they remember (an error code, a SKU, a function name) and gets back conceptually similar documents instead of the exact match they wanted.

Hybrid search: best of both

The pragmatic answer is to run both and combine. Two combination strategies:

  1. Reciprocal Rank Fusion (RRF): score each document by its rank position in each result list, ignoring the raw scores entirely.
  2. Weighted score blending: normalize the BM25 and vector scores to a common scale, then take a weighted sum.

RRF is the default starting point. Weighted blending squeezes out a few more points of accuracy if you have the eval data to tune it.
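RRF is simple enough to write in a few lines: each result list contributes 1/(k + rank) per document, with k=60 as the conventional constant. A minimal sketch:

```python
def rrf_fuse(rankings, k=60):
    """Reciprocal Rank Fusion: each list contributes 1/(k + rank) per document."""
    scores = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    # highest fused score first
    return sorted(scores, key=scores.get, reverse=True)

keyword_top = ["doc_a", "doc_b", "doc_c"]   # BM25 ranking
semantic_top = ["doc_b", "doc_d", "doc_a"]  # vector ranking
fused = rrf_fuse([keyword_top, semantic_top])
```

Because RRF only looks at rank positions, it sidesteps the thorny problem of normalizing BM25 scores and cosine similarities onto a common scale, which is exactly why it makes a good default.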

The reranking step

For the highest accuracy, add a cross-encoder reranker after retrieval. Here’s the shape of the pipeline:

  1. Hybrid retrieval returns the top 50-100 candidates (cheap and approximate).
  2. A cross-encoder model scores every (query, candidate) pair to produce a more accurate relevance score (slow but exact).
  3. The top 5-10 from the reranker go to the LLM as context.
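The steps above can be sketched as a short pipeline. The `toy_cross_encoder` below is a token-overlap stand-in, not a real model; production code would call an actual cross-encoder (e.g. a BGE-reranker via its library API) at that line.

```python
def toy_cross_encoder(query, candidate):
    # Stand-in for a real cross-encoder score: Jaccard overlap of tokens.
    # A real system would run a model over the (query, candidate) pair here.
    q, c = set(query.lower().split()), set(candidate.lower().split())
    return len(q & c) / len(q | c)

def rerank(query, candidates, top_k=2):
    """Step 2-3: score every (query, candidate) pair, keep the best top_k."""
    scored = sorted(candidates,
                    key=lambda cand: toy_cross_encoder(query, cand),
                    reverse=True)
    return scored[:top_k]

# Pretend these came back from hybrid retrieval (step 1).
candidates = [
    "how to restart a pod",
    "pod will not start after deploy",
    "ingress tls certificates",
]
context = rerank("pod will not start", candidates)  # goes to the LLM
```

The shape is the important part: a cheap retriever over-fetches, an expensive scorer sees only those few candidates, and the LLM sees only the survivors.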

Cross-encoders are dedicated reranking models like BGE-reranker, Cohere Rerank, or Voyage-rerank. They’re much smaller than LLMs (typically 100M-1B parameters) and much more accurate than embedding-based retrieval, at a per-query cost of 10-100ms for 50 candidates.

Adding a reranker is the single biggest accuracy improvement most RAG systems can make. Skipping it leaves easy 10-20% gains on the table.

Putting it together

A solid 2025 search stack:

  1. BM25 keyword index (Elasticsearch / OpenSearch / Postgres full-text).
  2. Vector index (pgvector / Pinecone / Weaviate / Chroma) using a 768- or 1536-dim embedding model.
  3. Run both for every query, retrieve top 50 from each, fuse with RRF.
  4. Rerank the top 100 candidates with a cross-encoder.
  5. Send the top 5-10 to the LLM.
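Gluing the five steps together looks roughly like this. Every component here is a tiny stand-in: `keyword_search`, `vector_search`, and `cross_encoder_score` are hypothetical names with canned outputs, not any real library's API, so only the control flow should be taken literally.

```python
def keyword_search(query):   # stand-in for a BM25 index (step 1)
    return ["doc_a", "doc_b", "doc_c"]

def vector_search(query):    # stand-in for a vector index (step 2)
    return ["doc_b", "doc_d", "doc_a"]

def rrf_fuse(rankings, k=60):
    """Step 3: fuse the two rankings with RRF."""
    scores = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)

def cross_encoder_score(query, doc_id):
    # Stand-in for a reranker model (step 4); canned scores for the demo.
    return {"doc_b": 0.9, "doc_a": 0.7, "doc_d": 0.4, "doc_c": 0.1}[doc_id]

def retrieve_context(query, top_k=2):
    fused = rrf_fuse([keyword_search(query), vector_search(query)])
    reranked = sorted(fused,
                      key=lambda d: cross_encoder_score(query, d),
                      reverse=True)
    return reranked[:top_k]  # step 5: this is what the LLM sees

context = retrieve_context("pod won't start")
```

Each stage is swappable: replace the stand-ins with Elasticsearch, pgvector, and a hosted reranker and the control flow stays the same.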

This shape works at small scale (a thousand documents) and large scale (tens of millions). The components scale independently. Pure semantic or pure keyword setups eventually hit ceilings the hybrid stack doesn’t.