Big Ideas

AI (Artificial Intelligence)
A computer’s ability to solve problems and make decisions.

Model
A system that allows you to make predictions based on data.

Generative AI (GenAI)
AI models that generate media such as text, images, audio, or video.

LLM (Large Language Model)
An AI model that can perform a wide range of tasks using natural language.

Foundation Model
A general-purpose LLM that can be adapted into many applications but is not very useful on its own.

Training
The process of fitting a model’s predictions to example data.

Pre-training
Training a foundation model on large, general datasets.

Post-training
Adapting a foundation model into something useful (e.g., a chatbot).

Train-time Compute
The computational resources used to train an LLM.

Test-time Compute
The computational resources used to run an LLM after it’s trained.

Scaling Laws
Relationships between model size, compute, data, and performance.

Train-time Scaling Law
More data + more parameters + more compute → better performance.

Test-time Scaling Law
More tokens and reasoning steps → better performance.


Prompt Engineering

Prompt
The request you send to an LLM.

Inference
Using an LLM to generate an output.

Prompt Engineering
Designing prompts to optimize model performance.

Few-shot Prompting
Including examples in the prompt to improve results.
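
As a sketch, a few-shot prompt simply prepends worked input/output examples to the new request so the model can imitate the pattern (the sentiment task and examples here are invented for illustration):

```python
# Build a few-shot prompt: worked examples teach the model the task format.
examples = [
    ("The movie was wonderful.", "positive"),
    ("I want my money back.", "negative"),
]

def few_shot_prompt(new_input: str) -> str:
    """Prepend labeled examples to the new input so the model can imitate them."""
    lines = ["Classify the sentiment of each review as positive or negative.", ""]
    for text, label in examples:
        lines.append(f"Review: {text}")
        lines.append(f"Sentiment: {label}")
        lines.append("")
    lines.append(f"Review: {new_input}")
    lines.append("Sentiment:")
    return "\n".join(lines)

print(few_shot_prompt("Best purchase I ever made."))
```

The prompt ends mid-pattern ("Sentiment:") so the model’s natural continuation is the label itself.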

Context Window
The maximum amount of text an LLM can process at once.

Context Engineering
Filling the context window with the right information at the right time.

System Message
The instruction that defines the behavior of an LLM application.

Developer Message
Another term for a system message.

User Message
A prompt provided by the user.

Assistant Message
The response generated by the LLM.
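
Putting the message types together: a conversation is commonly represented as a list of role-tagged messages (the field names below follow the widely used chat-completion convention, but providers vary):

```python
# One conversation turn, represented as role-tagged messages.
conversation = [
    {"role": "system", "content": "You are a concise travel assistant."},
    {"role": "user", "content": "Suggest one museum in Paris."},
    {"role": "assistant", "content": "The Musée d'Orsay."},
]

# The trace is simply the full ordered list of messages.
for message in conversation:
    print(f"{message['role']}: {message['content']}")
```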

Trace
All messages exchanged in a single LLM conversation.

Parameters
Numerical values that control how an LLM generates outputs.

Token
A unit of text the model understands.

Tokenization
Converting text into tokens.
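
Real tokenizers use subword algorithms such as byte-pair encoding; as a minimal sketch, here is a toy word-level tokenizer that maps text to integer IDs:

```python
# Toy word-level tokenizer; real LLMs use subword schemes (e.g., BPE) instead.
def tokenize(text: str, vocab: dict[str, int]) -> list[int]:
    """Split on whitespace and map each word to its ID (0 = unknown)."""
    return [vocab.get(word, 0) for word in text.lower().split()]

vocab = {"<unk>": 0, "the": 1, "cat": 2, "sat": 3}
print(tokenize("The cat sat", vocab))  # [1, 2, 3]
```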

Temperature
Controls how random or creative the output is.

Top-p
Samples from the smallest set of likely next tokens whose cumulative probability reaches p (e.g., the top 90%).
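
The two sampling controls can be sketched over a toy next-token distribution (the logits below are invented): temperature rescales the probabilities, and top-p keeps only the most likely tokens until their cumulative probability reaches p.

```python
import math

def apply_temperature(logits: dict[str, float], temperature: float) -> dict[str, float]:
    """Softmax over logits / temperature: low T sharpens, high T flattens."""
    scaled = {tok: l / temperature for tok, l in logits.items()}
    total = sum(math.exp(v) for v in scaled.values())
    return {tok: math.exp(v) / total for tok, v in scaled.items()}

def top_p_filter(probs: dict[str, float], p: float) -> dict[str, float]:
    """Keep the most likely tokens until cumulative probability reaches p."""
    kept, cumulative = {}, 0.0
    for tok, prob in sorted(probs.items(), key=lambda kv: -kv[1]):
        kept[tok] = prob
        cumulative += prob
        if cumulative >= p:
            break
    total = sum(kept.values())
    return {tok: prob / total for tok, prob in kept.items()}  # renormalize

logits = {"cat": 2.0, "dog": 1.0, "car": -1.0}
probs = apply_temperature(logits, temperature=0.7)
print(top_p_filter(probs, p=0.9))
```

With these numbers, the unlikely token ("car") falls outside the top-p set and can never be sampled.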


Security & Safety

Hallucination
When an LLM generates false or made-up information.

Guardrails
Rules applied to inputs and outputs to control behavior.

Prompt Injection
Crafting inputs that trick an LLM into ignoring its rules or instructions.

Red Teaming
Testing an AI system for failures and misuse.

Alignment
Ensuring AI behavior matches human intent and values.

AI Safety
Preventing AI systems from causing harm.

AI Governance
Policies and processes for responsible AI deployment.

Model Watermarking
Embedding markers to identify AI-generated content.


RAG (Retrieval Augmented Generation)

RAG
Providing external context to an LLM at request time.

Grounding
Giving references to reduce hallucinations.

Embedding
A numerical representation of a text’s meaning.

Chunking
Splitting documents into smaller pieces for retrieval.
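
A minimal sketch of fixed-size chunking with overlap (the sizes are arbitrary; overlap preserves context that would otherwise be cut at chunk boundaries):

```python
def chunk_text(text: str, chunk_size: int = 20, overlap: int = 5) -> list[str]:
    """Split text into fixed-size character chunks, overlapping by `overlap`."""
    if overlap >= chunk_size:
        raise ValueError("overlap must be smaller than chunk_size")
    step = chunk_size - overlap
    return [text[i:i + chunk_size] for i in range(0, len(text), step)]

document = "Retrieval systems split long documents into smaller pieces."
for chunk in chunk_text(document):
    print(repr(chunk))
```

Production chunkers usually split on semantic boundaries (sentences, headings) rather than raw character counts.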

Vector Database
Stores embeddings and their source text.

Semantic Search
Search based on meaning, not keywords.

Cosine Similarity
Measures how similar two embeddings are.

Top-k
Returns the k most relevant results.
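
Cosine similarity and top-k retrieval can be sketched together: rank stored embeddings by their similarity to a query embedding and return the k best matches (the three-dimensional vectors below are toy stand-ins for real embeddings):

```python
import math

def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Dot product of the vectors divided by the product of their norms."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def top_k(query: list[float], index: dict[str, list[float]], k: int = 2) -> list[str]:
    """Return the k stored texts whose embeddings are most similar to the query."""
    ranked = sorted(index, key=lambda text: cosine_similarity(query, index[text]), reverse=True)
    return ranked[:k]

# A toy "vector database": each text is stored with its embedding.
index = {
    "cats are mammals": [0.9, 0.1, 0.0],
    "stocks fell today": [0.0, 0.2, 0.9],
    "dogs are pets": [0.8, 0.3, 0.1],
}
print(top_k([1.0, 0.0, 0.0], index, k=2))  # the two animal sentences
```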

Hybrid Search
Combines keyword and semantic search.

Reranker
Reorders retrieved results by relevance to the query.

Knowledge Graph
Information structured as connected concepts.

Graph RAG
RAG systems that use a knowledge graph.


AI Agents

AI Agent
An LLM system that can use tools to perform actions.

Agentic AI
LLMs with some level of autonomy.

Function Calling
Allowing LLMs to use tools programmatically.
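
A sketch of the pattern: the application exposes tools, the model replies with a structured call, and the application executes it. The tool name is invented, and the JSON is hard-coded here to stand in for a real model response:

```python
import json

def get_weather(city: str) -> str:
    """A stand-in tool; a real one would call a weather API."""
    return f"Sunny in {city}"

# Registry of tools the model is allowed to call.
TOOLS = {"get_weather": get_weather}

# In a real system, this JSON would come from the model's response.
model_output = '{"tool": "get_weather", "arguments": {"city": "Lisbon"}}'

call = json.loads(model_output)
result = TOOLS[call["tool"]](**call["arguments"])
print(result)  # Sunny in Lisbon
```

The result is typically sent back to the model as another message so it can compose a final answer.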

Structured Outputs
LLMs generating JSON or structured data.
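
As a minimal sketch, structured output lets the application parse and validate the response programmatically (the response below is hard-coded to stand in for real model output, and the field names are invented):

```python
import json

# A response the model was instructed to return as JSON.
raw_response = '{"title": "Dune", "year": 1965, "genres": ["sci-fi"]}'

def parse_structured(raw: str, required: set[str]) -> dict:
    """Parse model output as JSON and check that required fields are present."""
    data = json.loads(raw)
    missing = required - data.keys()
    if missing:
        raise ValueError(f"missing fields: {missing}")
    return data

record = parse_structured(raw_response, required={"title", "year"})
print(record["title"], record["year"])  # Dune 1965
```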

Reasoning Models
Models that think before responding.

Chain of Thought (CoT)
Step-by-step reasoning inside a response.

MCP (Model Context Protocol)
A standard for connecting tools and context to LLMs.

Agent Skills
Reusable ways to give agents context when needed.

A2A (Agent-to-Agent)
Agents working together.

Multi-Agent Systems
Multiple agents collaborating on a task.

Deep Research Agent
An agent designed to search and synthesize many sources.

Coding Agent
An agent designed to write software.


Evals

Eval
A metric that measures AI system performance.

Golden Dataset
Trusted test cases for evaluation.
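
As a sketch, an offline eval scores a system against a golden dataset; `fake_model` and the test cases below are invented stand-ins for the real system under test:

```python
# Golden dataset: trusted (input, expected output) pairs.
golden = [
    ("2 + 2", "4"),
    ("capital of France", "Paris"),
    ("3 * 3", "9"),
]

def fake_model(prompt: str) -> str:
    """Stand-in for the system under test; deliberately wrong on one case."""
    answers = {"2 + 2": "4", "capital of France": "Paris", "3 * 3": "6"}
    return answers[prompt]

def accuracy(model, dataset) -> float:
    """Fraction of golden cases where the model's output exactly matches."""
    correct = sum(model(prompt) == expected for prompt, expected in dataset)
    return correct / len(dataset)

print(accuracy(fake_model, golden))  # 2 of 3 correct
```

Exact-match is the simplest eval metric; real evals often use fuzzier scoring, including LLM-based judges.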

Failure Modes
Common mistakes an AI system makes.

Offline Evals
Evaluations during development.

Online Evals
Evaluations in production.

Model Drift
Performance degradation over time.


Fine-Tuning

Fine-tuning
Adapting a model to a specific task.

Supervised Fine-tuning
Training using labeled examples.

Reinforcement Fine-tuning
Training through rewards and penalties.

SLM (Small Language Model)
A compact LLM, typically with fewer than ~10B parameters.

Quantization
Reducing parameter precision to save compute.
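
A minimal sketch of symmetric int8 quantization: weights are mapped to 8-bit integers with a single per-tensor scale, then approximately recovered at inference time:

```python
def quantize_int8(weights: list[float]) -> tuple[list[int], float]:
    """Map floats to int8 range [-127, 127] using a symmetric scale."""
    scale = max(abs(w) for w in weights) / 127
    return [round(w / scale) for w in weights], scale

def dequantize(q: list[int], scale: float) -> list[float]:
    """Recover approximate float weights from the int8 values."""
    return [v * scale for v in q]

weights = [0.5, -1.27, 0.03]
q, scale = quantize_int8(weights)
print(q, dequantize(q, scale))
```

The recovered weights are slightly off, which is why quantization trades a little accuracy for memory and speed.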

PEFT (Parameter-Efficient Fine-Tuning)
Fine-tuning that updates only a small fraction of a model’s parameters.

LoRA (Low-Rank Adaptation)
Adapting a model by training small low-rank matrices added to its frozen weights.

Open-weight Model
A downloadable and modifiable LLM.

Instruction Tuning
Turning a foundation model into a chatbot.

Distillation
Training a smaller model using a larger one.

Synthetic Data
AI-generated data used for training.


Reinforcement Learning

Reinforcement Learning (RL)
Learning through trial and error.

RLHF (Reinforcement Learning from Human Feedback)
Aligning models using human preference judgments.

RLAIF (Reinforcement Learning from AI Feedback)
Aligning models using feedback from another AI model.

RLVR (Reinforcement Learning with Verifiable Rewards)
Optimizing models on outcomes that can be automatically checked (e.g., passing unit tests).

Reward Hacking
Optimizing metrics instead of intent.

DPO (Direct Preference Optimization)
A simplified alternative to RLHF that trains directly on preference pairs.


Multimodal AI

Multimodal AI
AI that processes multiple input types.

Vision-Language Models (VLMs)
LLMs that understand images and text.

Diffusion Models
Models that generate images (and other media) by iteratively removing noise.


Transformers

Machine Learning (ML)
Learning patterns from data.

Natural Language Processing (NLP)
Teaching computers to understand language.

Neural Network
A system of connected mathematical operations.

Deep Neural Network
A neural network with many layers.

Transformer
The architecture behind modern LLMs.

Attention
Allows models to focus on relevant input.

Mixture of Experts (MoE)
Routing inputs to specialized sub-models.


Deep Learning

Deep Learning
Neural networks that learn features automatically.

Hyperparameter
A value that guides training.

Training Data
Data used to teach the model.

Validation Data
Data used to tune hyperparameters.

Testing Data
Data used only for final evaluation.

Loss Function
Measures prediction error.

Gradient Descent
Updates parameters to reduce loss.
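
The loss function and gradient descent can be sketched together on a one-parameter problem: fit w so that w * x predicts y, using the mean squared error and its hand-derived gradient (the toy data comes from y = 3x):

```python
# Toy data generated from y = 3x; gradient descent should recover w ≈ 3.
data = [(1.0, 3.0), (2.0, 6.0), (3.0, 9.0)]

def loss(w: float) -> float:
    """Mean squared error of the prediction w * x."""
    return sum((w * x - y) ** 2 for x, y in data) / len(data)

def gradient(w: float) -> float:
    """d(loss)/dw, derived by hand: mean of 2x(wx - y)."""
    return sum(2 * x * (w * x - y) for x, y in data) / len(data)

w, learning_rate = 0.0, 0.05
for epoch in range(100):          # each pass over the data is one epoch
    w -= learning_rate * gradient(w)  # gradient descent update

print(round(w, 3))  # close to 3.0
```

In a deep network, backpropagation plays the role of the hand-derived `gradient` function, computing the same kind of derivative for every parameter automatically.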

Backpropagation
Efficiently computes the gradient of the loss with respect to every parameter.

Epoch
One full pass through training data.

Batch Size
Number of examples processed at once.

Learning Rate
Controls update size.

Overfitting
Good training performance, poor generalization.

Underfitting
Poor performance overall.

Regularization
Techniques to reduce overfitting.


Traditional Machine Learning

Regression
Predicting continuous values.

Classification
Predicting discrete labels.

Clustering
Grouping similar data.

Supervised Learning
Learning from labeled data.

Unsupervised Learning
Learning from unlabeled data.

Feature
An input used for prediction.

Feature Engineering
Designing useful input features.