Embeddings are the bridge between human language and product intelligence. They turn text into dense vectors that can be compared, clustered, and searched, which is why they power semantic search, recommendations, deduplication, and memory features. But a list of numbers rarely builds intuition.
This tutorial makes the pipeline tangible. You will watch tokenization create IDs, see a masked model score candidate tokens, generate sentence embeddings with different pooling strategies, and finish by ranking results with cosine similarity, all running locally in the browser with Transformers.js.
The lesson is simple: tokens feed the model, logits expose the model's beliefs about the missing token, embeddings compress meaning, and vector search turns that meaning into ranked results. We will make each step visible.
In this walkthrough you will:
- Watch how token IDs and special tokens are created.
- See masked token predictions and how logits become probabilities.
- Generate embeddings and inspect their structure.
- Store vectors in localStorage and rank results with cosine similarity.
Everything runs locally in your browser. The first run will download the models you choose and cache them for reuse.
Running embeddings on edge devices changes the product equation:
- Lower latency: users get instant results without round‑tripping to a server.
- Privacy by default: sensitive text never leaves the device.
- Cost control: similarity search happens client‑side, reducing inference spend.
- Resilience: features keep working in flaky networks or offline.
Primer: tokens, logits, embeddings
Tokenization converts text into IDs. Models like BERT prepend special tokens such as [CLS] (classification) and append [SEP] (separator) so the model can identify boundaries. Those IDs are the true input to the transformer.
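As a quick standalone example (outside the lab below), tokenizing a sentence with Transformers.js, assuming the @xenova/transformers package, looks roughly like this:
import { AutoTokenizer } from "@xenova/transformers";
const tokenizer = await AutoTokenizer.from_pretrained("Xenova/bert-base-uncased");
const { input_ids } = await tokenizer("Embeddings power semantic search.");
// The IDs include the special [CLS] and [SEP] tokens at the start and end.
console.log(Array.from(input_ids.data, Number));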
Logits are the raw scores a model produces before softmax. In a fill‑mask task, the model assigns a score to every vocabulary token for the masked position. Softmax turns those scores into probabilities so you can compare candidates.
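As a quick illustration, here is a minimal softmax over three made-up logits; the real model does the same thing across its whole vocabulary for the masked position.
// Hypothetical logits for three candidate tokens at the masked position.
const logits = { fox: 7.1, dog: 5.4, cat: 4.9 };
// Exponentiate each logit, then divide by the sum so the scores become probabilities.
const exps = Object.entries(logits).map(([token, score]) => [token, Math.exp(score)]);
const total = exps.reduce((sum, [, value]) => sum + value, 0);
const probabilities = Object.fromEntries(exps.map(([token, value]) => [token, value / total]));
console.log(probabilities); // roughly { fox: 0.77, dog: 0.14, cat: 0.09 }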
Embeddings are the vector representations that come out of the model. Sentence embeddings are produced by pooling token embeddings (mean pooling) or by using a designated token (CLS pooling). Normalizing vectors makes cosine similarity a simple dot product, which is perfect for nearest‑neighbor search.
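A tiny sketch with made-up token vectors shows both ideas: mean pooling averages token vectors into one sentence vector, and normalizing to unit length turns cosine similarity into a plain dot product.
// Hypothetical token embeddings: three tokens, four dimensions each.
const tokenVectors = [
  [0.2, -0.1, 0.4, 0.3],
  [0.0, 0.5, -0.2, 0.1],
  [0.3, 0.1, 0.2, -0.4],
];
// Mean pooling: average each dimension across tokens.
const pooled = tokenVectors[0].map((_, dim) =>
  tokenVectors.reduce((sum, vec) => sum + vec[dim], 0) / tokenVectors.length
);
// Normalize to unit length so cosine similarity reduces to a dot product.
const norm = Math.sqrt(pooled.reduce((sum, value) => sum + value * value, 0));
const sentenceVector = pooled.map(value => value / norm);
const dot = (a, b) => a.reduce((sum, value, i) => sum + value * b[i], 0);
console.log(dot(sentenceVector, sentenceVector)); // 1: a vector is maximally similar to itself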
Interactive lab
Each step is executable. Change a setting, watch the code update, and re-run to see the impact immediately.
Desktop required
Open this post on a desktop browser to run the interactive embeddings and vector search lab.
Model downloads
The first run will download model weights to your browser cache. Subsequent runs reuse the cached files.
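If you want to see or control that caching, Transformers.js exposes an env object; a minimal sketch, assuming the @xenova/transformers package, might look like this:
import { env, pipeline } from "@xenova/transformers";
// Cache downloaded weights with the browser Cache API so later runs skip the network.
env.useBrowserCache = true;
// Always resolve models from the Hugging Face Hub rather than local files.
env.allowLocalModels = false;
// The first call downloads and caches the weights; subsequent calls reuse them.
const embedder = await pipeline("feature-extraction", "Xenova/all-MiniLM-L6-v2");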
Transformers.js embeddings lab
Each step runs locally in the browser so you can see how tokens, logits, embeddings, and vector search connect.
Shared
Shared helpers
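The lab's shared helpers are referenced by every step but not listed in this write-up. Below is a rough sketch of how they could be implemented with Transformers.js; the names mirror the steps' code, but the bodies are assumptions, not the lab's exact implementation.
import { AutoTokenizer, pipeline } from "@xenova/transformers";
// Cache loaded tokenizers and pipelines so repeated runs reuse the same instances.
const cache = new Map();
async function getTokenizer(modelId) {
  const key = "tokenizer:" + modelId;
  if (!cache.has(key)) cache.set(key, await AutoTokenizer.from_pretrained(modelId));
  return cache.get(key);
}
async function getPipeline(task, modelId) {
  const key = task + ":" + modelId;
  if (!cache.has(key)) cache.set(key, await pipeline(task, modelId));
  return cache.get(key);
}
async function getEmbedder(modelId) {
  return getPipeline("feature-extraction", modelId);
}
// Flatten a Transformers.js tensor into a plain array of numbers.
function toVector(tensor) {
  return Array.from(tensor.data);
}
async function embedText(modelId, text, { pooling = "mean", normalize = true } = {}) {
  const embedder = await getEmbedder(modelId);
  return toVector(await embedder(text, { pooling, normalize }));
}
// Cosine similarity; with unit-length vectors this reduces to the dot product.
function cosineSimilarity(a, b) {
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  return dot / (Math.sqrt(normA) * Math.sqrt(normB) || 1);
}
// Tiny localStorage-backed store for the vector search step.
function loadStore(key) {
  try {
    return JSON.parse(localStorage.getItem(key));
  } catch {
    return null;
  }
}
function saveStore(key, value) {
  localStorage.setItem(key, JSON.stringify(value));
}
// Deterministic color for a token chip (purely cosmetic).
function colorFromText(text) {
  let hue = 0;
  for (const char of text) hue = (hue * 31 + char.charCodeAt(0)) % 360;
  return "hsl(" + hue + ", 70%, 45%)";
}
// Map a probability to a bar color whose opacity tracks the score.
function scoreColor(score) {
  return "rgba(99, 179, 237, " + (0.35 + 0.65 * score).toFixed(2) + ")";
}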
Step 1
Tokenize text into model-ready pieces
Transformers do not see words; they see token IDs. A tokenizer breaks text into a sequence of IDs that the model can embed and attend over.
Edit the sentence to watch how subwords appear, and toggle special tokens to see how sequence boundaries are encoded.
Run the step to see output and a rendered result.
const modelId = "Xenova/bert-base-uncased";
const text = "Build in-browser models that never leave the device.";
const showSpecialTokens = true;
const tokenizer = await getTokenizer(modelId);
const encoded = await tokenizer(text);
const ids = Array.from(encoded.input_ids.data ?? []);
const rawTokens = tokenizer.model?.convert_ids_to_tokens
? tokenizer.model.convert_ids_to_tokens(ids)
: tokenizer.convert_ids_to_tokens
? tokenizer.convert_ids_to_tokens(ids)
: ids.map(id => String(id));
const tokens = rawTokens.map((token, index) => ({
id: ids[index],
text: token,
special: token.startsWith("["),
color: token.startsWith("[") ? "rgba(255, 255, 255, 0.12)" : colorFromText(token),
}));
const visible = showSpecialTokens ? tokens : tokens.filter(token => !token.special);
return {
type: "tokens",
title: "Tokenization (" + visible.length + " tokens)",
note:
"Tokenizer: " +
modelId +
". Special tokens like [CLS] and [SEP] mark sequence boundaries before the model ever sees a word.",
tokens: visible,
insights: [
{
title: "Token IDs are the model's vocabulary",
body: "Every token maps to a numeric ID. The embedding table uses that ID to look up the vector the model starts with.",
accent: "Vocabulary",
},
{
title: "Subwords explain unknown words",
body: "Rare words are split into smaller pieces, so the model can recombine meanings without inventing new IDs.",
accent: "Subwords",
},
{
title: "Token count drives cost",
body: "Longer inputs create more tokens, which means more attention work and higher latency.",
accent: "Compute",
},
],
};
Step 2
Predict a masked token (logit intuition)
Masked language models score every vocabulary token for the missing slot. The highest scores become the top guesses after softmax.
Change the prompt and Top‑K slider to see how the distribution shifts with different context.
Run the step to see output and a rendered result.
const modelId = "Xenova/bert-base-uncased";
const prompt = "The quick brown [MASK] jumps over the lazy dog.";
const topK = 5;
const fillMask = await getPipeline("fill-mask", modelId);
const predictions = await fillMask(prompt, { topk: topK });
const candidates = Array.isArray(predictions[0]) ? predictions[0] : predictions;
const items = candidates.slice(0, topK).map(item => ({
token: item.token_str.trim(),
score: item.score,
color: scoreColor(item.score),
}));
return {
type: "logits",
title: "Masked token guesses",
note:
"Prompt: " +
prompt +
". Scores come from a softmax over the model's logits for the masked position.",
items,
insights: [
{
title: "Logits become probabilities",
body: "The model outputs raw scores (logits). A softmax converts them into probabilities that sum to 1.",
accent: "Softmax",
},
{
title: "Context reshapes the distribution",
body: "Change a word near the mask and the ranking shifts, because attention pulls in different signals.",
accent: "Attention",
},
{
title: "Top‑K is a slice, not the whole story",
body: "The model scores every token in the vocabulary, but we only visualize the top few for clarity.",
accent: "Top‑K",
},
],
};
Step 3
Generate a sentence embedding
Embeddings compress meaning into a numeric vector you can compare. The model produces a vector per token, then pooling combines them into a single sentence representation.
Switch between mean and CLS pooling, then watch how the vector preview and norm change.
Run the step to see output and a rendered result.
const modelId = "Xenova/all-MiniLM-L6-v2";
const pooling = "mean";
const normalize = true;
const text = "Embeddings turn text into a numeric fingerprint you can compare.";
const embedder = await getEmbedder(modelId);
const tensor = await embedder(text, { pooling, normalize });
const vector = toVector(tensor);
let sum = 0;
let min = vector[0] ?? 0;
let max = vector[0] ?? 0;
for (const value of vector) {
sum += value * value;
if (value < min) min = value;
if (value > max) max = value;
}
return {
type: "embedding",
title: "Embedding snapshot",
dimensions: vector.length,
norm: Math.sqrt(sum),
min,
max,
preview: vector.slice(0, 16),
note:
"First 16 dimensions from " +
modelId +
". Pooling decides how token vectors combine before similarity comparisons.",
insights: [
{
title: "Pooling defines the sentence view",
body: "Mean pooling averages all tokens. CLS uses the first token embedding as a summary signal.",
accent: "Pooling",
},
{
title: "Normalization stabilizes similarity",
body: "Unit‑length vectors make cosine similarity a simple dot product and keep scores consistent.",
accent: "Normalize",
},
{
title: "Embeddings are high‑dimensional fingerprints",
body: "Small edits nudge the vector, which is why similar phrases end up close together.",
accent: "Geometry",
},
],
};
Step 4
Run vector search with local storage
Vector search ranks documents by semantic similarity. We embed the query, compare it to stored vectors, and return the nearest neighbors.
Add a new document or swap datasets to see how the ranking shifts, and use localStorage to keep the index in your browser.
Run the step to see output and a rendered result.
const datasetId = "support";
const documents = [
"Checkout is stuck on loading after I click pay.",
"Cannot reset my password because the email never arrives.",
"The export button spins forever on large datasets.",
"Billing page shows two invoices for the same month.",
"Mobile layout cuts off the submit button on iOS.",
"I want a webhook when a job finishes successfully."
];
const query = "How do I fix the checkout spinner?";
const newDocument = "";
const topK = 3;
const persist = true;
const resetStore = false;
const modelId = "Xenova/all-MiniLM-L6-v2";
const pooling = "mean";
const normalize = true;
const storeKey = "embeddings-lab:" + datasetId;
const seedDocs = documents.map((text, index) => ({
id: "seed-" + index,
text,
}));
let stored = persist ? loadStore(storeKey) : null;
if (!Array.isArray(stored) || resetStore) {
stored = seedDocs;
}
if (newDocument.trim() && !stored.some(doc => doc.text === newDocument.trim())) {
stored = [
...stored,
{ id: "custom-" + Date.now(), text: newDocument.trim() },
];
}
const withVectors = [];
for (const doc of stored) {
const embedding = doc.embedding || (await embedText(modelId, doc.text, { pooling, normalize }));
withVectors.push({ ...doc, embedding });
}
if (persist) saveStore(storeKey, withVectors);
const queryVector = await embedText(modelId, query, { pooling, normalize });
const ranked = withVectors
.map(doc => ({
id: doc.id,
text: doc.text,
score: cosineSimilarity(queryVector, doc.embedding),
}))
.sort((a, b) => b.score - a.score)
.slice(0, topK);
return {
type: "vector-search",
title: "Top matches",
query,
count: withVectors.length,
items: ranked,
insights: [
{
title: "Query and documents share the same space",
body: "The query is embedded with the same model, then compared to each stored vector using cosine similarity.",
accent: "Similarity",
},
{
title: "Local storage is your lightweight vector DB",
body: "The lab stores vectors in localStorage so you can keep adding documents without a server.",
accent: "Local",
},
{
title: "Embedding choice changes ranking",
body: "Switch the model or pooling and the nearest neighbors will reorder.",
accent: "Tuning",
},
],
};
Concept map
Tokenization → Logits → Embeddings → Vector search
Tokenization creates a list of IDs the model understands. A masked model produces logits for the missing token, which are converted into probabilities. Embedding models compress entire sequences into fixed‑length vectors. Vector search compares those vectors to rank the nearest neighbors.
Why this matters for real apps
- Search without keywords: embeddings let users search by meaning, not exact phrasing.
- Product feedback clustering: vectors reveal themes without manual tagging.
- Local-first AI: in‑browser embeddings keep data private and reduce infrastructure.
What to notice as you experiment
- Token boundaries: word pieces like ##ing show how subwords are stitched together.
- Masked predictions: top tokens hint at what the model thinks is the most likely continuation.
- Embeddings: tiny edits to the sentence change the vector preview and similarity scores.
- Vector search: rankings shift as you add documents or switch pooling strategies.