Embeddings & vector search: meaning as geometry
An embedding turns a piece of text into a vector, a list of numbers that captures its meaning, so that texts with similar meanings land near each other in space. A vector store indexes those vectors, and vector search finds the ones nearest to a query. That single idea powers semantic search and the retrieval step in RAG.
01 What an embedding is
Imagine giving every piece of text a map pin, placed so that things which mean similar things sit close together and unrelated things sit far apart. That's the whole idea here: turning meaning into location. The technique that does it is called an embedding — it converts a piece of text (a word, a sentence, a whole paragraph) into a vector, which is just a fixed-length list of numbers. Those numbers are produced so they capture the text's meaning rather than its exact spelling: two passages that mean similar things get similar numbers, and passages about unrelated topics get very different numbers. Read each number as a coordinate, and every piece of text becomes a single point in space — real models use hundreds or thousands of dimensions, far more than we can draw.
- Text in, a vector of numbers out — the same model embeds both your documents and your query the same way.
- The vector encodes meaning, not spelling: "car" and "automobile" land close even though they share no letters.
- Because meaning becomes geometry, you can compare texts with simple math instead of matching words.
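To make "meaning becomes geometry" concrete, here is a minimal sketch. The three vectors are hand-picked for illustration and are not the output of any model; the name `toy_embeddings` is made up for this example. It only shows that, once text is a list of numbers, comparing two texts is plain arithmetic.

```python
import math

# Hypothetical, hand-picked 3-D vectors. A real embedding model would
# produce hundreds or thousands of learned dimensions instead.
toy_embeddings = {
    "car":        [0.90, 0.10, 0.05],
    "automobile": [0.88, 0.12, 0.07],
    "banana":     [0.05, 0.10, 0.95],
}


def distance(a, b):
    """Straight-line (Euclidean) distance between two equal-length vectors."""
    return math.dist(a, b)


car_to_automobile = distance(toy_embeddings["car"], toy_embeddings["automobile"])
car_to_banana = distance(toy_embeddings["car"], toy_embeddings["banana"])

print(f"car vs automobile: {car_to_automobile:.3f}")
print(f"car vs banana:     {car_to_banana:.3f}")
```

With these made-up numbers, "car" sits much closer to "automobile" than to "banana", even though the first pair shares no letters.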
02 Close in space = close in meaning
If similar meanings get similar vectors, then nearby points mean similar things. Picture a tiny, hand-placed map of words in a flattened 2-D space (real embeddings have far more dimensions; this is an illustration, not a model's output). In such a map, clusters emerge: royalty, animals, and vehicles each huddle together.
Nearest neighbours
A word's closest neighbours (those at the smallest distance) are the words most similar to it in meaning, and they will almost always come from the same cluster.
Two ways to measure "closeness" come up constantly: distance (how far apart two points are) and cosine similarity (whether two vectors point the same way). Cosine ignores how long the vectors are and looks only at direction, which is why it is a common default for comparing meaning: a short note and a long document about the same topic can still point the same way.
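The difference between the two measures is easy to see in a few lines. This is a sketch with made-up vectors: `long_document` is simply `short_note` scaled by ten, so the two point the same way but sit far apart.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between a and b: dot(a, b) / (|a| * |b|)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)


# Made-up vectors for illustration only.
short_note = [1.0, 2.0, 0.5]
long_document = [10.0, 20.0, 5.0]  # same direction, ten times longer
other_topic = [2.0, -1.0, 0.0]     # at a right angle to short_note

same_direction = cosine_similarity(short_note, long_document)
different_direction = cosine_similarity(short_note, other_topic)
gap = math.dist(short_note, long_document)

print(f"cosine(short, long)  = {same_direction:.3f}")
print(f"cosine(short, other) = {different_direction:.3f}")
print(f"distance(short, long) = {gap:.3f}")
```

The cosine score treats the short note and the long document as a near-perfect match, while Euclidean distance reports a large gap between them. The second pair scores zero because the vectors are perpendicular.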
03 Where the vectors live
Once your text is embedded, the vectors need a home that can search by similarity. That home is a vector store (often called a vector database). It does three jobs: it stores each vector with a reference back to its original text, it builds an index so it doesn't have to compare every vector one by one, and it answers a query by returning the closest vectors.
Store
Every chunk's embedding is saved alongside a reference to the original text (and any metadata you want to filter on later, like source or date). The vector is what gets searched; the reference is what lets the system hand you back the real passage once a match is found.
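The storing job can be sketched as a small in-memory class. Everything here is hypothetical (`TinyVectorStore`, `Record`, the sample texts and vectors), and it is not how any particular vector database is built. In particular it skips the index and scans every record, so it illustrates only the vector-plus-reference-plus-metadata layout.

```python
import math
from dataclasses import dataclass, field


def cosine(a, b):
    """Cosine similarity; returns 0.0 if either vector has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norms = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norms if norms else 0.0


@dataclass
class Record:
    vector: list                                   # what gets searched
    text: str                                      # what gets handed back
    metadata: dict = field(default_factory=dict)   # what gets filtered on


class TinyVectorStore:
    """Hypothetical in-memory store: exact scan, no index."""

    def __init__(self):
        self.records = []

    def add(self, vector, text, **metadata):
        self.records.append(Record(list(vector), text, metadata))

    def search(self, query, k=3, where=None):
        """Return up to k (score, Record) pairs, most similar first.

        `where` is an optional predicate applied to each record's metadata.
        """
        candidates = [r for r in self.records if where is None or where(r.metadata)]
        scored = [(cosine(query, r.vector), r) for r in candidates]
        scored.sort(key=lambda pair: pair[0], reverse=True)
        return scored[:k]


# Made-up vectors and texts for illustration.
store = TinyVectorStore()
store.add([0.9, 0.1], "Refund policy", source="help")
store.add([0.1, 0.9], "Shipping times", source="help")
store.add([0.8, 0.3], "Refund FAQ", source="blog")

hits = store.search([1.0, 0.0], k=2)
help_hits = store.search([1.0, 0.0], k=2, where=lambda m: m.get("source") == "help")

print([r.text for _, r in hits])
print([r.text for _, r in help_hits])
```

The search runs on the vectors, but what comes back is the record, so the caller gets the original text. The `where` filter shows why metadata is worth storing alongside each vector.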
04 Nearest-neighbour search in action
Finding the closest vectors to a query is a nearest-neighbour search. Checking every vector exactly is slow at scale, so vector stores use approximate nearest neighbour (ANN) indexes that trade a little accuracy for a lot of speed. That fast meaning-based lookup is what powers semantic search and the retrieval step inside RAG.
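For reference, the exact search that ANN indexes approximate looks roughly like this. It is a minimal sketch with made-up 2-D vectors; `exact_nearest` is a name chosen for this example.

```python
import heapq
import math


def exact_nearest(query, vectors, k=2):
    """Return the indices of the k vectors closest to `query`, nearest first.

    Every stored vector is compared against the query, so the work per
    query grows in step with the size of the collection.
    """
    return heapq.nsmallest(
        k, range(len(vectors)), key=lambda i: math.dist(query, vectors[i])
    )


# Made-up vectors for illustration.
vectors = [[0.0, 0.0], [1.0, 1.0], [0.9, 1.2], [5.0, 5.0]]
nearest = exact_nearest([1.0, 1.1], vectors, k=2)
print(nearest)
```

One distance computation per stored vector is fine for a few thousand items. It is the cost that pushes larger collections toward approximate indexes.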
Semantic search — match meaning, not words
Embed the query, find the nearest vectors, return their original passages. Because matching is by meaning, a search for "how do I get my money back" can surface a passage titled "Refund policy" even with no shared keywords.
Powering RAG — this is the "retrieve" step
RAG (Retrieval-Augmented Generation) fetches relevant documents before a language model answers. The fetch is exactly this vector search: embed the question, pull the nearest chunks from the store, and hand them to the model as context so the answer is grounded in real sources.
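The hand-off from retrieval to generation can be as simple as pasting the retrieved chunks into the prompt. The sketch below assumes the chunks have already been retrieved; `build_prompt`, the prompt wording, and the placeholder passages are all invented for illustration. The model call itself is left out because it depends on which model and client you use.

```python
def build_prompt(question, retrieved_chunks):
    """Assemble a prompt that asks the model to answer from numbered sources."""
    context = "\n\n".join(
        f"[{i}] {chunk}" for i, chunk in enumerate(retrieved_chunks, start=1)
    )
    return (
        "Answer the question using only the numbered sources below.\n\n"
        f"Sources:\n{context}\n\n"
        f"Question: {question}\n"
    )


# Placeholder passages standing in for real retrieved chunks.
chunks = [
    "Placeholder passage about refunds.",
    "Placeholder passage about shipping.",
]
prompt = build_prompt("how do I get my money back", chunks)
print(prompt)
```

Numbering the sources is a design choice, not a requirement. It makes it easier to ask the model to cite which passage supported its answer.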
vs keyword search — different strengths
Classic keyword search matches the exact words you type — great for names, codes, and precise terms, but it misses synonyms and paraphrases. Vector search matches meaning — great for natural-language questions, but it can blur very specific terms. Many real systems combine both (often called hybrid search).
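The text above does not say how hybrid systems merge the two result lists. One widely used option is reciprocal rank fusion (RRF), sketched below as an assumption rather than a recommendation. The document ids and both rankings are made up, and `k=60` is the constant conventionally used with RRF.

```python
def reciprocal_rank_fusion(rankings, k=60):
    """Merge several best-first lists of document ids into one ranking.

    Each document scores 1 / (k + rank) for every list it appears in,
    so items ranked well by both searches rise to the top.
    """
    scores = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=lambda doc_id: scores[doc_id], reverse=True)


# Hypothetical outputs of a keyword search and a vector search.
keyword_ranking = ["doc_sku_123", "doc_refund", "doc_shipping"]
vector_ranking = ["doc_refund", "doc_returns", "doc_sku_123"]

fused = reciprocal_rank_fusion([keyword_ranking, vector_ranking])
print(fused)
```

Because RRF looks only at rank positions, it sidesteps the problem that keyword scores and similarity scores live on different scales.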
05 The pipeline, and where it breaks
Putting it together, building a searchable knowledge base is a short pipeline: chunk your documents into passages, embed each chunk with an embedding model, and store the vectors in a vector index. At query time you embed the query the same way and run a nearest-neighbour search. The same embedding model must be used on both sides — vectors from different models aren't comparable.
- Chunk — split long documents into smaller passages so a match points to just the relevant part, and the text fits in a prompt.
- Embed — turn each chunk (and later, each query) into a vector with the same model.
- Store & index — load the vectors into a vector store that can search them by similarity.
- Search — embed the query, return the nearest chunks, and use them downstream (display them, or feed them to a model for RAG).
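The four steps above can be sketched end to end. One loud assumption: `toy_embed` is a stand-in that just counts vocabulary words, so it matches shared words rather than meaning. A real pipeline would call an embedding model at that point. The word-window chunker with overlap, the vocabulary, and the sample documents are likewise invented for illustration.

```python
import math
import re


def chunk_words(text, size=8, overlap=2):
    """Split text into windows of `size` words that share `overlap` words."""
    if not 0 <= overlap < size:
        raise ValueError("overlap must be >= 0 and smaller than size")
    words = text.split()
    chunks = []
    for start in range(0, len(words), size - overlap):
        chunks.append(" ".join(words[start:start + size]))
        if start + size >= len(words):
            break  # this window already reached the end of the text
    return chunks


def toy_embed(text, vocabulary):
    """Stand-in for an embedding model: counts vocabulary words. NOT semantic."""
    words = re.findall(r"[a-z]+", text.lower())
    return [float(words.count(term)) for term in vocabulary]


def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norms = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norms if norms else 0.0


def build_index(documents, vocabulary):
    """Chunk, embed, and store: a list of (vector, chunk_text) pairs."""
    return [
        (toy_embed(chunk, vocabulary), chunk)
        for doc in documents
        for chunk in chunk_words(doc)
    ]


def search(query, index, vocabulary, k=1):
    """Embed the query the same way as the chunks, then return the k nearest."""
    q = toy_embed(query, vocabulary)
    ranked = sorted(index, key=lambda item: cosine(q, item[0]), reverse=True)
    return [chunk for _, chunk in ranked[:k]]


# Made-up sample data.
vocabulary = ["refund", "shipping", "order", "days"]
documents = [
    "You can request a refund for any order within thirty days of purchase.",
    "Standard shipping takes five days and express shipping takes two days.",
]
index = build_index(documents, vocabulary)
results = search("refund my order", index, vocabulary, k=1)
print(results)
```

Note that `search` reuses `toy_embed` and the same vocabulary for the query. That mirrors the rule that both sides must go through the same embedding model.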
Embeddings are powerful but not magic. A few honest limits worth knowing:
- Quality depends on the model. Embeddings reflect what their model learned; a general model may struggle with specialised jargon or a language it saw little of.
- Chunking choices matter. Chunks that are too big blur several ideas into one vague vector; too small and they lose context. The right size depends on your content.
- "Close" isn't always "correct." The nearest vector is the most similar, which is usually but not always the most relevant — retrieval can surface a plausible-looking but unhelpful passage.
- Approximate search trades some accuracy. ANN indexes are fast because they may occasionally miss a true nearest neighbour. Usually a worthwhile trade, but worth remembering.
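The accuracy trade in the last point can be measured. The sketch below uses random-hyperplane hashing, a simple ANN scheme chosen here only because it fits in a few lines. The text does not say which index a given vector store uses, and production indexes are more sophisticated. The data is random, and all names are hypothetical.

```python
import math
import random


def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norms = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norms if norms else 0.0


def bucket_key(vector, planes):
    """One bit per hyperplane: which side of the plane the vector falls on."""
    return tuple(sum(p * v for p, v in zip(plane, vector)) >= 0 for plane in planes)


def build_buckets(vectors, planes):
    """Group vector indices by bucket key so a query can skip most of them."""
    buckets = {}
    for i, vector in enumerate(vectors):
        buckets.setdefault(bucket_key(vector, planes), []).append(i)
    return buckets


def top_k(query, vectors, candidate_ids, k):
    """Rank only the given candidates by cosine similarity, best first."""
    ranked = sorted(candidate_ids, key=lambda i: cosine(query, vectors[i]), reverse=True)
    return ranked[:k]


rng = random.Random(0)  # fixed seed so the run is repeatable
dim, n_vectors, k = 16, 500, 5
vectors = [[rng.gauss(0, 1) for _ in range(dim)] for _ in range(n_vectors)]
planes = [[rng.gauss(0, 1) for _ in range(dim)] for _ in range(4)]
buckets = build_buckets(vectors, planes)

query = [rng.gauss(0, 1) for _ in range(dim)]
exact = top_k(query, vectors, range(n_vectors), k)        # scans everything
candidates = buckets.get(bucket_key(query, planes), [])   # scans one bucket
approx = top_k(query, vectors, candidates, k)

recall = len(set(exact) & set(approx)) / k
print(f"compared {len(candidates)} of {n_vectors} vectors, recall@{k} = {recall:.2f}")
```

Recall below 1.0 means some true nearest neighbours landed in a different bucket and were never compared. Using more hyperplanes makes each bucket smaller and the search faster, at the price of more such misses.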