Terms Dictionary
Industry terms, concepts, and standards used across Pro Trailblazer content.
A
-
Agent
Agentic Systems: An LLM call wrapped in three things: a scoped job, a set of tools it is allowed to call, and a clear output contract. Agents are not smarter models; they are narrowly focused models with defined inputs, outputs, and side effects.
-
Agentic workflow
Agentic Systems: An AI system that splits one job into steps, assigns each step to a specialized agent, and routes the work between them through a central orchestrator. The intelligence lives in how the pieces are wired together, not in any single model call.
-
Attention mechanism
Model Architecture: The method that lets each token look at other tokens in the sequence and decide which ones matter for understanding it.
B
-
Backpropagation
Model Training: The algorithm that calculates how each weight contributed to the loss, walking backward through the network so weights can be updated. Training-only. Not involved in generating a response once the model is deployed.
-
Batch size
Model Training: How many training examples are processed together before one weight update. Bigger batches give more stable gradient estimates but require more memory.
-
Best-of-N
Inference: A sampling strategy that draws N independent completions from a model, scores each one (with a reward model, verifier, or log-probability), and returns the highest-scoring completion. Replaces voting with picking, which works well when the scoring function is trustworthy and badly when it is not.
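A minimal sketch of the pick-the-best step, where `sample_fn` and `score_fn` stand in for a model call and a scorer (both hypothetical here; length is used as a toy score):

```python
def best_of_n(sample_fn, score_fn, n):
    """Draw n completions, score each one, return the highest-scoring completion."""
    completions = [sample_fn() for _ in range(n)]
    return max(completions, key=score_fn)

# Toy stand-in for a model: yields three canned completions in order.
samples = iter(["short", "a much longer answer", "mid"])
best = best_of_n(lambda: next(samples), score_fn=len, n=3)
```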
C
-
Causal masking
Model Architecture: A mask applied inside attention that prevents a token from looking at future tokens. This is what makes the model autoregressive, so it can only condition on what came before.
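The mask itself is just a lower-triangular boolean matrix; a plain-Python sketch:

```python
def causal_mask(n):
    """mask[i][j] is True when position i may attend to position j (only j <= i)."""
    return [[j <= i for j in range(n)] for i in range(n)]

mask = causal_mask(3)
```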
-
Chain-of-thought
Inference: A prompting or decoding pattern where the model produces intermediate reasoning steps before its final answer, rather than jumping straight to a conclusion. Longer chains tend to improve accuracy on multi-step problems by letting the model factor the work into smaller subproblems.
-
Context window
Model Architecture: The maximum number of tokens the model can hold at once. Prompt, prior conversation, and output-so-far all live inside it.
E
-
Embeddings
Model Architecture: High-dimensional vectors that represent tokens. Each token maps to a list of numbers that encodes its meaning, and similar meanings end up near each other in that space. These self-organize during pretraining without explicit labels.
F
-
Few-shot
Inference: A prompting pattern where the model receives two to five example input-output pairs before the actual task. The examples shift the conditional distribution over next tokens toward the demonstrated format without any weight updates to the model.
-
Forward pass
Model Architecture: Data flowing input to output through the model. Tokens go in, embeddings get looked up, attention and feedforward layers transform things, a prediction comes out. Happens during both training and inference.
G
-
Gating network
Model Architecture: The small neural network inside a Mixture of Experts layer that scores every expert for a given token and picks the top-k to route through. The gating network is tiny compared to the experts but decides everything about which parameters a token actually activates.
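The routing step can be sketched in a few lines; here the gate scores are given as a plain list (in a real model they come from a learned projection of the token's hidden state), and the selected scores are renormalized with a softmax:

```python
import math

def route_top_k(gate_scores, k=2):
    """Return (expert_index, weight) pairs for the k highest-scoring experts,
    with weights renormalized via softmax over just the selected scores."""
    top = sorted(range(len(gate_scores)), key=lambda i: gate_scores[i], reverse=True)[:k]
    m = max(gate_scores[i] for i in top)
    exps = [math.exp(gate_scores[i] - m) for i in top]
    total = sum(exps)
    return [(i, e / total) for i, e in zip(top, exps)]

routes = route_top_k([0.1, 2.0, -1.0, 1.5], k=2)
```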
-
Gradient descent
Model Training: The optimization procedure that uses the gradients from backprop to step weights in the direction that lowers loss. The ball-rolling-downhill image.
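A one-dimensional sketch of the update rule, minimizing the toy loss (x - 3)², whose gradient is 2(x - 3):

```python
def gradient_descent(grad_fn, x0, learning_rate=0.1, steps=100):
    """Repeatedly step against the gradient to lower the loss."""
    x = x0
    for _ in range(steps):
        x -= learning_rate * grad_fn(x)
    return x

# The minimum of (x - 3)^2 is at x = 3.
x_min = gradient_descent(lambda x: 2 * (x - 3), x0=0.0)
```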
-
Greedy decoding
Inference: The simplest decoding strategy: always pick the highest-probability token at each step. Deterministic and fast, but tends to produce repetitive or bland output on longer generations because small probability gaps compound over many steps.
H
-
Hyperparameters
Model Training: Settings chosen by humans before training (learning rate, batch size, layer count, etc.). Not learned by the model.
J
-
JSON-RPC
Agentic Systems: A simple remote procedure call format that encodes every request and response as a JSON object with a method name, parameters, and an id. MCP uses JSON-RPC 2.0 as its wire format over both stdio and HTTP.
K
-
KV cache
Inference: Per-position keys and values from the attention layer, stored in GPU memory during generation so they do not have to be recomputed for every new token. Without a KV cache, autoregressive inference would scale quadratically with sequence length instead of linearly per step.
L
-
Logits
Model Architecture: The raw, unnormalized scores a language model's final layer produces, one per token in the vocabulary. Softmax turns logits into a probability distribution; temperature is applied by scaling the logits before that normalization, while top-k and top-p filter the distribution that comes out of it.
-
Loss function
Model Training: A score for how wrong the model's prediction was on a training example. Training aims to drive this down.
M
-
Majority voting
Inference: The aggregation step at the heart of self-consistency: take N independent samples, extract each final answer, and return the answer that appears most often. Works on discrete answers (math results, multiple choice, yes/no) but not on open-ended generation where no two samples produce the same string.
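The aggregation itself is one line over the extracted answers (the example answers are made up):

```python
from collections import Counter

def majority_vote(answers):
    """Return the most common final answer among N extracted answers."""
    return Counter(answers).most_common(1)[0][0]

winner = majority_vote(["42", "41", "42", "42", "7"])
```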
-
MCP host
Agentic Systems: The AI application (Claude Desktop, Claude Code, an IDE, an agent) that runs the model and speaks MCP to one or more servers. The host owns the conversation, the tool catalog, and any API keys.
-
MCP server
Agentic Systems: A small program that exposes tools, resources, and prompts to an MCP host. Servers run as separate processes (stdio child or remote HTTP) so a single host can plug many of them in without changing its code.
-
Mixture of Experts
Model Architecture: A transformer architecture that replaces each layer's feed-forward block with a bank of smaller feed-forward networks called experts, plus a gating network that routes every token through only a small top-k subset. This gives the model a huge total parameter count while keeping per-token compute small.
-
Model Context Protocol
Agentic Systems: An open standard from Anthropic that lets AI apps discover and call external tools, read external data, and invoke templated workflows through a common JSON-RPC interface. Abbreviated MCP.
-
Multi-head attention
Model Architecture: Running attention in parallel multiple times within the same layer, each head with its own learned projections. Different heads can specialize in different kinds of relationships.
O
-
Orchestrator
Agentic Systems: The component in an agentic workflow that holds the pipeline definition, passes outputs between agents, handles retries on failure, and decides when to drop a task. It is usually a few hundred lines of code, not a separate model, and it is where the business logic lives.
-
Overfitting
Model Training: When a model learns training data too specifically and fails to generalize. Good on training set, bad on new data.
P
-
Parallel training
Model Training: Training across many GPUs or TPUs at once. Can be data parallelism (same model copies, different batches), model parallelism (model split across devices), or hybrids.
-
Pretraining
Model Training: The initial training phase on massive text corpora, where the model learns general language patterns, before any fine-tuning or alignment work.
Q
-
Q/K/V framework
Model Architecture: The three projections inside attention. Each token produces a Query (what am I looking for?), a Key (what do I offer?), and a Value (the actual info to pass along). Q times K produces match scores, and those scores weight the sum of Vs.
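A plain-Python sketch of scaled dot-product attention for one head (the learned projections that produce Q, K, and V are assumed to have already run; inputs are just lists of vectors):

```python
import math

def attention(Q, K, V):
    """Scaled dot-product attention: Q times K gives match scores,
    softmaxed scores weight the sum of Vs."""
    d = len(K[0])
    out = []
    for q in Q:
        # Match score of this query against every key, scaled by sqrt(d).
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d) for k in K]
        # Softmax the scores into weights.
        m = max(scores)
        exps = [math.exp(s - m) for s in scores]
        total = sum(exps)
        weights = [e / total for e in exps]
        # Weighted sum of the value vectors.
        out.append([sum(w * v[j] for w, v in zip(weights, V)) for j in range(len(V[0]))])
    return out

# Two identical keys -> uniform weights -> output is the average of the values.
out = attention(Q=[[1.0, 0.0]], K=[[1.0, 0.0], [1.0, 0.0]], V=[[1.0, 0.0], [0.0, 1.0]])
```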
S
-
Self-consistency
Inference: A sampling strategy that asks the model for many independent reasoning chains on the same question and returns the most common final answer. The intuition is that wrong answers tend to disagree with each other while correct answers tend to agree, so majority voting concentrates the signal.
-
Softmax
Model Architecture: The function that turns a vector of raw scores (logits) into a probability distribution by exponentiating each score and normalizing so the results sum to 1. It is the last step before a language model either picks or samples a token.
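A minimal implementation in plain Python (the max-subtraction is the standard trick for numerical stability; it does not change the result):

```python
import math

def softmax(logits):
    """Exponentiate each score and normalize so the outputs sum to 1."""
    m = max(logits)  # subtract the max so exp() never overflows
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

probs = softmax([2.0, 1.0, 0.1])
```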
-
Sparse activation
Model Architecture: Using only a fraction of a model's parameters for any given input. In Mixture of Experts, only the top-k experts per layer are touched for each token, so compute scales with active parameters rather than total parameters. Memory still scales with the total.
T
-
Temperature
Inference: An inference-time sampling parameter. Zero means always pick the highest-probability token. Higher values spread probability across alternatives for more varied output.
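Temperature is applied by dividing the logits by T before softmax; a sketch with illustrative numbers:

```python
import math

def softmax(xs):
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def apply_temperature(logits, temperature):
    """Scale logits by 1/temperature before softmax.
    Low temperature sharpens the distribution; high temperature flattens it."""
    if temperature == 0:
        # Degenerate case: all mass on the argmax (greedy decoding).
        probs = [0.0] * len(logits)
        probs[logits.index(max(logits))] = 1.0
        return probs
    return softmax([x / temperature for x in logits])

logits = [2.0, 1.0, 0.5]
sharp = apply_temperature(logits, 0.5)  # more mass on the top token
flat = apply_temperature(logits, 2.0)   # mass spread across alternatives
```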
-
Test-time compute
Inference: Compute spent at the moment a model answers a question, rather than during training. Scaling test-time compute, through longer reasoning chains, more parallel samples, or search over branches, can raise accuracy without changing the weights. The core idea behind self-consistency, best-of-N, and the o1, o3, and R1 reasoning-model family.
-
Tokens
Model Architecture: The discrete units a model actually processes. Text is broken into tokens (usually subword chunks, not whole words) before anything else happens.
-
Tool use
Agentic Systems: The pattern where an LLM emits a structured call to an external function (a pricing API, a database lookup, a code runner) and incorporates the result into its next response. Tool use is what turns a text generator into an agent that can actually change the state of the world.
-
Top-k sampling
Inference: A sampling strategy that keeps only the K highest-probability tokens at each step, discards the rest, renormalizes, and draws a token from what is left. The cutoff is rank-based, so the same K can feel too loose on peaked distributions and too tight on flat ones.
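A sketch over raw logits (rank, truncate, softmax the survivors, sample):

```python
import math
import random

def top_k_sample(logits, k, rng=random):
    """Keep the k highest-scoring tokens, renormalize, sample one index."""
    ranked = sorted(range(len(logits)), key=lambda i: logits[i], reverse=True)[:k]
    m = max(logits[i] for i in ranked)
    weights = [math.exp(logits[i] - m) for i in ranked]
    total = sum(weights)
    probs = [w / total for w in weights]
    return rng.choices(ranked, weights=probs, k=1)[0]

idx = top_k_sample([5.0, 1.0, 0.0, -1.0], k=2)
```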
-
Top-p sampling
Inference: A sampling strategy, also called nucleus sampling, that keeps tokens in descending probability order until their cumulative mass crosses a threshold P, then samples from that set. The eligible set grows and shrinks with the shape of the distribution, which is why it handles uncertainty better than top-k.
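A sketch of building the nucleus and sampling from it; on the peaked example distribution below, the nucleus collapses to a single token:

```python
import math
import random

def top_p_sample(logits, p, rng=random):
    """Keep tokens in descending probability until cumulative mass >= p, then sample."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    probs = [e / total for e in exps]
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    nucleus, mass = [], 0.0
    for i in order:
        nucleus.append(i)
        mass += probs[i]
        if mass >= p:
            break
    weights = [probs[i] for i in nucleus]
    return rng.choices(nucleus, weights=weights, k=1)[0]

# Very peaked distribution: token 0 alone already exceeds p, so it is always chosen.
idx = top_p_sample([10.0, 0.0, 0.0, 0.0], p=0.9)
```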
-
Training loop
Model Training: The repeated cycle of forward pass, compute loss, backpropagation, weight update, run over billions of examples.
-
Transformer layers
Model Architecture: Stacked blocks of attention plus feedforward. Each layer refines token representations based on the layer below. Modern LLMs stack dozens to 100+.
Z
-
Zero-shot
Inference: A prompting pattern where the model receives only the task description with no examples of the desired input-output format. The model relies entirely on its pretrained knowledge to produce a response.