Agentic lesson

Track 03 · Agentic Intermediate ~8 min

How semantic search finds meaning, not just words

Keyword search matches the letters you typed. Semantic search matches what you meant — so it can surface the right document even when it shares not a single word with your query. The trick is turning text into vectors and looking for the nearest neighbours. See it happen on the page.

01 The problem: words can hide meaning

Classic keyword search, the kind built on ranking functions such as BM25 and TF-IDF, ranks documents by how much their exact words overlap with your query. It is fast and precise when you know the right term, but it is brittle. Search for “how do I lower my heart rate?” and a perfect article titled “reducing your pulse” can score zero, because it shares none of your query's words. This is the vocabulary-mismatch problem: the same idea, expressed differently, slips through the net.

Semantic search attacks this from the other side. Instead of comparing the words, it compares the meaning. It does that by encoding both the query and every document into vectors (lists of numbers) positioned so that texts with similar meaning land near each other, whether or not they share any vocabulary. Retrieval becomes a geometry problem: find the document vectors closest to the query vector. Dense Passage Retrieval (DPR), from Facebook AI (now Meta), showed that a learned vector retriever can beat a strong BM25 keyword baseline on most of the open-domain question-answering benchmarks it was tested on.

Keyword / sparse retrieval matches on shared terms — precise, but blind to synonyms and paraphrase.
Semantic / dense retrieval matches on meaning encoded as vectors — it finds relevant docs that share no keywords.
“Vector search” is the mechanism; “semantic search” is the capability it delivers.
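
The vocabulary-mismatch problem can be reproduced in a few lines. This is a minimal sketch, not BM25 or TF-IDF: `overlap_score` is a hypothetical helper that just counts shared words, and the second document title is invented for contrast.

```python
import re

def tokenize(text: str) -> set[str]:
    """Lowercase a string and return its distinct word tokens."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))

def overlap_score(query: str, document: str) -> int:
    """Count distinct words shared by query and document.

    A crude stand-in for keyword ranking; real BM25/TF-IDF also weight
    terms by rarity and document length.
    """
    return len(tokenize(query) & tokenize(document))

query = "how do I lower my heart rate?"
docs = ["reducing your pulse", "heart rate monitors on sale"]

# Score every document against the query by literal word overlap.
scores = {doc: overlap_score(query, doc) for doc in docs}
```

Here the relevant title shares no words with the query and scores 0, while the off-topic title scores higher purely because it repeats "heart" and "rate".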

02 Embeddings: turning meaning into coordinates

An embedding is a dense vector (a list of floating-point numbers) produced by a model so that it captures the meaning of a piece of text (or an image, or audio). The guiding rule is simple: similar meanings map to nearby points in the vector space. The idea goes back to word vectors like Word2Vec (2013) and GloVe (2014), which learned a single static vector per word. Contextual models such as BERT made a word's vector depend on the text around it, and sentence/passage encoders like SBERT extended the idea to one vector for a whole sentence or passage.

Once text is vectors, “closeness” needs a number. The usual measure is cosine similarity, the cosine of the angle between two vectors, which runs from −1 (opposite direction) through 0 (orthogonal, typically unrelated) to 1 (same direction). Many providers normalise vectors to length 1, in which case rankings by cosine similarity, dot product, and Euclidean distance all agree. Real embedding models are high-dimensional (often hundreds to thousands of numbers per vector), but the principle is identical to the 2-D picture you can play with next.

An embedding = a dense vector that places a text by its meaning; nearby vectors mean similar meaning.
Cosine similarity scores how aligned two vectors are — higher means more semantically similar.
Query and document vectors must come from the same model and dimension to be comparable.
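
To make the similarity measure concrete, here is a small sketch of cosine similarity using only the standard library, plus a check that the three rankings agree once vectors are scaled to length 1. The 2-D vectors and their labels are made up for illustration; real embeddings come from a trained model.

```python
import math

def dot(a, b):
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))

def norm(a):
    """Euclidean length of a vector."""
    return math.sqrt(dot(a, a))

def cosine_similarity(a, b):
    """Cosine of the angle between a and b, in the range [-1, 1]."""
    return dot(a, b) / (norm(a) * norm(b))

def normalize(a):
    """Scale a vector to length 1."""
    n = norm(a)
    return [x / n for x in a]

def euclidean(a, b):
    """Straight-line distance between two vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

# Hypothetical toy vectors, normalised to unit length.
query = normalize([1.0, 0.2])
docs = {
    "pulse": normalize([0.9, 0.3]),
    "tax": normalize([0.1, 1.0]),
    "opposite": normalize([-1.0, -0.2]),
}

# Higher is better for cosine and dot product; lower is better for distance.
by_cosine = sorted(docs, key=lambda k: cosine_similarity(query, docs[k]), reverse=True)
by_dot = sorted(docs, key=lambda k: dot(query, docs[k]), reverse=True)
by_euclid = sorted(docs, key=lambda k: euclidean(query, docs[k]))
```

The agreement is not a coincidence: for unit vectors the squared Euclidean distance equals 2 minus twice the dot product, so sorting by one is sorting by the other.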

03 Watch a query find its neighbours

Below is a tiny corpus of a dozen documents, each plotted as a point on a 2-D embedding plane — an illustrative stand-in for the high-dimensional space a real model would use. Pick a query (or type your own from the suggested words). The query gets “embedded” — placed on the plane — and the closest documents by cosine similarity light up. Then flip the toggle to Keyword to see which documents a literal exact-word match would have returned. Watch how semantic mode surfaces relevant documents that share no words with your query.

[Interactive: pick a query or type your own, then toggle the search mode. Legend: document, top match, query.]
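
For readers without the interactive, the same nearest-neighbour lookup can be sketched in code. Everything here is illustrative: the four document titles, their hand-placed 2-D coordinates, and the query position are assumptions, not output from an embedding model.

```python
import math

# Hypothetical hand-placed 2-D "embeddings" for a tiny corpus.
corpus = {
    "reducing your pulse": (0.92, 0.30),
    "breathing exercises for calm": (0.80, 0.45),
    "filing your tax return": (0.10, 0.95),
    "choosing a savings account": (0.25, 0.90),
}

def cosine(a, b):
    """Cosine similarity between two 2-D points treated as vectors."""
    return (a[0] * b[0] + a[1] * b[1]) / (math.hypot(*a) * math.hypot(*b))

def top_k(query_vec, corpus, k=2):
    """Return the k (title, similarity) pairs closest to the query vector."""
    ranked = sorted(
        corpus.items(),
        key=lambda item: cosine(query_vec, item[1]),
        reverse=True,
    )
    return [(title, cosine(query_vec, vec)) for title, vec in ranked[:k]]

# Assumed position for a query like "how do I lower my heart rate?".
query_vec = (0.95, 0.25)
results = top_k(query_vec, corpus)
```

With these coordinates the two health-related titles come back first even though neither shares a word with the query text; a literal keyword match would return nothing.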

04 Bi-encoders, cross-encoders & reranking

How the query and a document get compared is the design decision that makes semantic search both fast and accurate. There are three building blocks. Switch between them to see how each trades speed for precision — and why production systems usually combine them.

[Interactive: switch the encoder]

Bi-encoder — one vector each, compared by similarity

A bi-encoder (dual-encoder) runs the query and each document through the model independently, producing one vector apiece; relevance is just a fast cosine or dot-product comparison. Because documents are encoded ahead of time and stored in an index, search scales to millions of items. SBERT is the canonical sentence bi-encoder; DPR applied the idea to passage retrieval.

strength: pre-index documents once → sub-second first-stage retrieval at scale

trade-off: query and doc never “see” each other, so fine-grained matches can be missed

Cross-encoder — query and document scored together

A cross-encoder feeds the query and a candidate document through the model at the same time and outputs a single relevance score for the pair. Letting the two attend to each other makes it more accurate, but document representations cannot be precomputed, so scoring a whole large corpus for every query is infeasible. In practice it is used as a reranker over a small candidate set rather than as the first-stage retriever.

strength: highest precision on the pairs it scores

trade-off: must re-run per query×doc → only viable on a short shortlist

Late interaction — token-level matching, still indexable

ColBERT keeps a contextual vector for each token rather than one pooled vector, and scores a pair with MaxSim: for every query token, take its best match among the document’s tokens, then sum. This “late interaction” captures finer matching than a single vector while staying indexable; ColBERTv2 adds residual compression to shrink the index.

strength: near cross-encoder quality with indexable, scalable retrieval

trade-off: larger index footprint (many vectors per document)
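
The MaxSim rule described above fits in one line of code. This is a sketch of the scoring step only, with tiny made-up token vectors; it does not include ColBERT's encoder, indexing, or compression.

```python
def dot(a, b):
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))

def maxsim(query_tokens, doc_tokens):
    """Late-interaction score: for each query token vector, take its best
    dot-product match among the document token vectors, then sum."""
    return sum(max(dot(q, d) for d in doc_tokens) for q in query_tokens)

# Hypothetical 2-D token vectors (real models use many more dimensions).
query_tokens = [(1.0, 0.0), (0.0, 1.0)]
doc_a = [(1.0, 0.0), (0.0, 1.0), (0.5, 0.5)]  # has a strong match for both query tokens
doc_b = [(1.0, 0.0), (0.7, 0.1)]              # matches only the first query token well

score_a = maxsim(query_tokens, doc_a)
score_b = maxsim(query_tokens, doc_b)
```

Because every query token needs its own good match, `doc_a` outscores `doc_b` here. That per-token check is the finer matching a single pooled vector can blur.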

The standard pattern ties them together: a fast bi-encoder retrieves the top-k candidates from the full corpus, then a cross-encoder reranks just those for precision. This retrieve-then-rerank pipeline gets the recall of vector search with the accuracy of pairwise scoring — without ever running the expensive model over every document.
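
The retrieve-then-rerank pattern can be sketched as two small functions. All data here is hypothetical, and `toy_pair_scorer` is only a placeholder for a trained cross-encoder (it looks words up in a hand-written synonym table), so treat this as the shape of the pipeline rather than a working relevance model.

```python
import math

def cosine(a, b):
    """Cosine similarity between two 2-D vectors."""
    return (a[0] * b[0] + a[1] * b[1]) / (math.hypot(*a) * math.hypot(*b))

def retrieve(query_vec, index, k):
    """Stage 1 (bi-encoder style): rank precomputed document vectors and
    keep the ids of the top k."""
    ranked = sorted(index, key=lambda doc_id: cosine(query_vec, index[doc_id]), reverse=True)
    return ranked[:k]

def rerank(query, candidates, texts, pair_scorer):
    """Stage 2 (cross-encoder style): score each (query, document) pair
    jointly, but only for the shortlist."""
    return sorted(candidates, key=lambda doc_id: pair_scorer(query, texts[doc_id]), reverse=True)

def toy_pair_scorer(query, doc_text):
    """Placeholder for a trained cross-encoder: counts synonym-table hits."""
    synonyms = {"lower": {"reducing", "slow"}, "heart": {"pulse", "heart"}}
    query_words = set(query.lower().split())
    doc_words = set(doc_text.lower().split())
    return sum(len(doc_words & alts) for word, alts in synonyms.items() if word in query_words)

# Hypothetical corpus: texts plus vectors that were computed ahead of time.
texts = {
    "d1": "reducing your pulse with slow breathing",
    "d2": "heart rate zones for runners",
    "d3": "filing your tax return",
}
index = {"d1": (0.90, 0.30), "d2": (0.95, 0.20), "d3": (0.10, 0.95)}

query = "how do I lower my heart rate"
query_vec = (1.0, 0.22)  # assumed embedding of the query

shortlist = retrieve(query_vec, index, k=2)                 # cheap, runs over the whole index
final = rerank(query, shortlist, texts, toy_pair_scorer)    # costly, runs over 2 documents only
```

The first stage puts `d2` ahead of `d1`; the pairwise stage reverses that order on the shortlist, and `d3` is never scored by the expensive step at all.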

05 Making it work at scale

Comparing a query against millions of vectors one by one would be too slow, so production systems use Approximate Nearest Neighbor (ANN) indexes — structures like HNSW — that find the nearest vectors in sub-second time by trading a sliver of recall for a large speed gain. Vector databases and search engines (Pinecone, Weaviate, Elasticsearch, OpenSearch) build on exactly this.
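
The "sliver of recall" an ANN index gives up is usually measured as recall@k: the share of the true top-k neighbours that the approximate search also returned. A minimal sketch, assuming you already have both result lists (the document ids below are invented):

```python
def recall_at_k(exact_ids, approx_ids, k):
    """Fraction of the exact top-k neighbours that also appear in the
    approximate top-k. 1.0 means the ANN index lost nothing."""
    exact_top = set(exact_ids[:k])
    if not exact_top:
        raise ValueError("exact_ids must contain at least one id")
    approx_top = set(approx_ids[:k])
    return len(exact_top & approx_top) / len(exact_top)

# Hypothetical results for one query: brute-force search vs an ANN index.
exact = ["a", "b", "c", "d"]
approx = ["a", "c", "b", "e"]

recall = recall_at_k(exact, approx, k=4)
```

Order inside the top-k does not affect this metric, only membership. Averaging it over a sample of real queries is a common way to tune an index's speed and accuracy settings.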

Two more practical levers matter. Hybrid search fuses dense (semantic) scores with sparse (keyword/BM25) scores, so you get meaning-based recall and exact-term precision — useful when product codes, names, or rare terms must match literally. And some models embed queries and documents asymmetrically (for example Cohere’s search_query vs search_document input types, or E5’s query: / passage: prefixes) to better match short questions against longer passages.
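
One simple way to fuse the two signals is a weighted sum of min-max normalised scores. This is a sketch of that one option, not the method any particular engine uses; the scores, document ids, and the `alpha` weight are assumptions for illustration.

```python
def min_max(scores):
    """Rescale a {doc_id: score} map to the range [0, 1]."""
    lo, hi = min(scores.values()), max(scores.values())
    if hi == lo:
        return {doc_id: 0.0 for doc_id in scores}
    return {doc_id: (s - lo) / (hi - lo) for doc_id, s in scores.items()}

def hybrid_scores(dense, sparse, alpha=0.5):
    """Blend dense and sparse scores; alpha is the weight on the dense side.
    A document missing from one list contributes 0 for that signal."""
    dense_n, sparse_n = min_max(dense), min_max(sparse)
    doc_ids = set(dense_n) | set(sparse_n)
    return {
        d: alpha * dense_n.get(d, 0.0) + (1 - alpha) * sparse_n.get(d, 0.0)
        for d in doc_ids
    }

def rank(scores):
    """Document ids sorted from best to worst fused score."""
    return sorted(scores, key=scores.get, reverse=True)

# Hypothetical scores for one query that contains a product code.
dense = {"manual": 0.82, "sku_page": 0.40, "blog": 0.75}   # cosine similarities
sparse = {"manual": 1.0, "sku_page": 9.0, "blog": 0.0}     # BM25-like scores

balanced = rank(hybrid_scores(dense, sparse, alpha=0.5))
keyword_heavy = rank(hybrid_scores(dense, sparse, alpha=0.2))
```

Normalising first matters because cosine similarities and BM25-style scores live on different scales. Shifting `alpha` toward the sparse side lets the exact product-code match on `sku_page` win.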

Finally, which embedding model? A neutral way to compare is the Massive Text Embedding Benchmark (MTEB), which evaluates models across many task types (retrieval, reranking, clustering, classification, semantic textual similarity) over many datasets and languages, with a public leaderboard. Pick for your task (retrieval fit matters more than a single aggregate score), attach a date to any “best model” claim because leaderboard rankings shift, and remember that query and document vectors must come from the same model and dimension to be comparable.

ANN indexes (e.g. HNSW) make million-scale vector search fast by approximating nearest neighbours.
Hybrid search blends semantic recall with keyword precision for the best of both.
Compare embedding models on MTEB by the task you actually care about, not a single headline number.
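
The asymmetric-embedding convention mentioned in this section is often just a text prefix added before encoding. The sketch below follows the E5-style `query: ` / `passage: ` prefixes named earlier; `with_prefix` is a hypothetical helper, and the exact convention should be confirmed against the model card of whichever model you use.

```python
# E5-style role prefixes; other models use different conventions.
PREFIXES = {"query": "query: ", "passage": "passage: "}

def with_prefix(text: str, role: str) -> str:
    """Prepend the role prefix a model expects before the text is embedded."""
    if role not in PREFIXES:
        raise ValueError(f"unknown role: {role!r}")
    return PREFIXES[role] + text.strip()

query_input = with_prefix("how do I lower my heart rate?", "query")
passage_input = with_prefix("Reducing your pulse starts with slow breathing.", "passage")
```

The prefixed strings are what gets sent to the encoder. Documents are prefixed once at index time, queries at search time.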

06 Check your understanding

[Interactive quiz]

07 Take it with you & go deeper

“Semantic search” — one-page summary

The whole lesson distilled to a printable cheat-sheet.

▸ Already on the site: go deeper

Embeddings & vector search (live lesson). Go one level deeper on the vectors that make semantic search possible.

Vector databases (live lesson). Where embeddings live and how ANN indexes serve them at scale.

▸ Where it leads next

Retrieval-augmented generation (RAG) (live lesson). Semantic search is the retrieval engine behind RAG; see how it feeds an LLM.

Vector index algorithms (HNSW, IVF, PQ) (coming soon). The data structures that make approximate nearest-neighbour search fast.

Concept map

How the core ideas in this lesson connect, branch by branch.

Keyword vs meaning

Keyword (sparse) retrieval like BM25 and TF-IDF ranks by exact term overlap — precise but blind to synonyms and paraphrase.
The vocabulary-mismatch problem: the same idea worded differently can score zero.
Semantic (dense) retrieval matches on meaning, so it can surface relevant docs that share no query words.

Embeddings & similarity

An embedding is a dense vector of numbers that places a text by its meaning — similar meanings map to nearby points.
Cosine similarity scores how aligned two vectors are; for length-1 vectors it equals the dot product.
Word vectors (Word2Vec, GloVe) led to contextual models like BERT and sentence encoders like SBERT.
Query and document vectors must come from the same model and dimension to be comparable.

Nearest-neighbour retrieval

The query is embedded into the same space, and the closest document vectors by cosine similarity become the results.
Retrieval becomes a geometry problem: find the document vectors nearest the query vector.
Semantic mode surfaces relevant documents that share no words with the query; keyword mode only matches literal terms.

Encoders & reranking

A bi-encoder encodes query and document independently into one vector each — fast and pre-indexable at scale.
A cross-encoder scores the query and document together for higher precision, but can only rerank a small candidate set.
Late interaction (ColBERT) keeps per-token vectors and scores with MaxSim — indexable, near cross-encoder quality.
Retrieve-then-rerank: a bi-encoder retrieves top-k, then a cross-encoder reranks just those.

Scaling & model choice

Approximate Nearest Neighbor indexes such as HNSW make million-scale vector search fast by trading a sliver of recall.
Hybrid search fuses dense (semantic) and sparse (keyword) scores for meaning recall plus exact-term precision.
Some models embed queries and documents asymmetrically to match short questions against longer passages.
Compare embedding models on MTEB by the task you care about, not a single headline number.

Sources & further reading

Published by Tech Jacks Solutions · Reviewed June 2026. This lesson explains established concepts and is grounded in the references below; the 2-D plane and similarity scores in the interactive are illustrative and labelled as such.

Dense Passage Retrieval for Open-Domain QA (DPR) — Karpukhin et al., EMNLP 2020
Sentence-BERT (SBERT) — Reimers & Gurevych, EMNLP 2019
ColBERT: Late Interaction over BERT — Khattab & Zaharia, SIGIR 2020
MTEB: Massive Text Embedding Benchmark — Muennighoff et al., 2022
Vector embeddings guide — OpenAI
Semantic search with embeddings — Cohere
Semantic Search documentation — Sentence Transformers (Hugging Face / UKP Lab)
kNN search — Elastic
Vector search concepts — Weaviate
GloVe: Global Vectors for Word Representation — Stanford NLP Group

Responsible use & transparency

Educational scope. This is an introductory lesson on how semantic search works. The embedding plane, the document points, and the similarity percentages in the interactive are illustrative — hand-placed to teach the geometry — not the output of a production model. Real systems use high-dimensional vectors and trained embedding models.

Verify before you build. Embedding model dimensions, token limits, input-type conventions, and benchmark rankings change between model generations. Treat vendor performance and scale claims as vendor-reported, prefer neutral benchmarks like MTEB for model comparison, and confirm specifics against current provider documentation before relying on them.

Your data. Sending text to a hosted embedding API means that text leaves your environment. Review the provider's data-handling and retention terms, and follow your organisation's policy before embedding sensitive or regulated content.

Semantic search — in 8 minutes

Tech Jacks Solutions · AI Knowledge Hub · educational summary

Keyword vs meaning

Keyword (sparse) search matches exact shared words — precise but blind to synonyms and paraphrase. Semantic (dense) search matches meaning encoded as vectors, so it finds relevant documents that share no keywords.

Embeddings

An embedding is a dense vector that captures a text's meaning; similar meanings map to nearby points. Similarity is usually scored by cosine similarity. Query and document vectors must come from the same model and dimension.

Nearest-neighbour retrieval

The query is embedded, then the closest document vectors are returned. Retrieval is a geometry problem: find the nearest neighbours in the embedding space.

Encoders & reranking

Bi-encoders encode query and docs independently (pre-indexed, fast, scalable). Cross-encoders score a pair together (accurate, used to rerank a shortlist). ColBERT's late interaction keeps per-token vectors. Standard pipeline: bi-encoder retrieves, cross-encoder reranks.

Scaling & model choice

ANN indexes (e.g. HNSW) give fast approximate search at million scale. Hybrid search blends dense + sparse signals. Compare embedding models on MTEB for your specific task.
