Submodule 2: How LLM Memory Works: Interactive Visual Guide

Learn how Large Language Models like GPT and Claude handle memory through interactive visualizations!


Part 1: The Context Window - LLM’s “Working Memory”

LLMs don’t have memory the way humans do. Instead, they have a context window: a sliding window over the conversation that can only hold a limited amount of text (measured in tokens) at once.

Interactive Demo: Context Window Visualization

🪟 Context Window Simulator

This demonstrates how an LLM can only "see" a limited number of messages at once. Older messages fall out of the context window.

[Interactive widget: color-coded messages move through the window — "In Context (LLM can see)", "Out of Context (Forgotten)", and "Current Processing" — with counters for total messages, messages in context, forgotten messages, and context usage.]
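The sliding-window behavior above can be sketched in a few lines of Python. This is a toy illustration, not how real LLMs are implemented: the window size and message texts are invented, and a real context window is measured in tokens, not whole messages.

```python
from collections import deque

MAX_MESSAGES = 4  # toy "context window" that holds at most 4 messages

# deque with maxlen drops the oldest item automatically when full,
# just like old messages falling out of the context window
window = deque(maxlen=MAX_MESSAGES)

for i in range(1, 7):
    window.append(f"message {i}")

# Only the 4 most recent messages survive; messages 1 and 2 were "forgotten"
print(list(window))  # ['message 3', 'message 4', 'message 5', 'message 6']
```

Try changing `MAX_MESSAGES` to see how a larger window "remembers" more of the conversation before old messages fall out.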

Part 2: Attention Mechanism - How LLMs “Focus”

The attention mechanism allows LLMs to weigh the importance of different tokens (words or word pieces) in the context. Not all words are equally relevant to the word currently being processed!

Interactive Demo: Attention Weights Visualizer

🎯 Attention Mechanism Visualizer

Attention allows the LLM to determine which words in the context are most relevant to the current word being processed.

💡 How it works:
Click "Calculate Attention" to see how each word attends to other words. The intensity of the color shows the attention weight - stronger colors mean the word is paying more attention to that token.

Click on any word to see what it's paying attention to!
[Interactive widget: enter a sentence and click "Calculate Attention"; attention strength is shaded from low (light) to high (dark).]
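To make the idea concrete, here is a minimal sketch of dot-product attention using only the standard library. The three-dimensional "embedding" vectors are invented for illustration; real models learn embeddings with hundreds or thousands of dimensions and also use separate query/key projections.

```python
import math

# Invented toy embeddings: similar vectors stand in for related words
embeddings = {
    "cat": [1.0, 0.2, 0.0],
    "sat": [0.3, 1.0, 0.1],
    "mat": [0.9, 0.1, 0.2],
}

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def attention_weights(query_word):
    """Score every word against the query, then softmax into weights."""
    scores = {w: dot(embeddings[query_word], v) for w, v in embeddings.items()}
    exps = {w: math.exp(s) for w, s in scores.items()}
    total = sum(exps.values())
    return {w: e / total for w, e in exps.items()}

weights = attention_weights("cat")
print(weights)  # weights sum to 1; "cat" attends more to "mat" than to "sat"
```

Because "cat" and "mat" were given similar vectors, their dot product is high, so "cat" assigns "mat" a larger attention weight. That is the same idea the color intensity in the demo is showing.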


Part 3: Training vs Runtime - Two Types of “Memory”

LLMs have two phases: training (where they learn patterns) and runtime (where they use those patterns). Their weights are frozen at runtime, so they can’t learn new facts during a conversation; anything “new” must be supplied in the context window.

Interactive Demo: Training vs Runtime Simulator

🧠 Training vs Runtime: How LLMs "Learn"

This simulation shows the difference between training (when the model learns) and runtime (when it uses what it learned).

📚 Training Phase

Model learns patterns from data.
This happens BEFORE deployment.

Example: "Paris is the capital of France"

⚡ Runtime Phase

Model answers using learned patterns.
Cannot learn new information!

Example: "What is the capital of France?"
[Interactive widget: counters track facts trained, queries made, and success rate.]

💾 Model's Knowledge Base (From Training)

// Model's learned knowledge appears here...
// No training data yet. Use the training phase to teach the model!

💬 Model Response

Train the model or ask it a question to see responses...
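The training/runtime split can be sketched as a tiny class whose "weights" freeze at deployment. Everything here is hypothetical: a dictionary of facts stands in for learned weights, and the `train`/`deploy`/`answer` API is invented for illustration.

```python
class TinyModel:
    def __init__(self):
        self.knowledge = {}   # stands in for trained weights
        self.frozen = False

    def train(self, question, answer):
        """Training phase: only possible before deployment."""
        if self.frozen:
            raise RuntimeError("Cannot learn at runtime: weights are frozen")
        self.knowledge[question] = answer

    def deploy(self):
        self.frozen = True    # after deployment, no new facts can be learned

    def answer(self, question):
        """Runtime phase: can only use what was learned during training."""
        return self.knowledge.get(question, "I don't know.")

model = TinyModel()
model.train("capital of France", "Paris")
model.deploy()

print(model.answer("capital of France"))  # "Paris", learned during training
print(model.answer("capital of Peru"))    # "I don't know.", never trained
```

Calling `model.train(...)` after `model.deploy()` raises an error, mirroring the key point of the demo: learning happens before deployment, not during conversations.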

Part 4: No Persistent Memory Between Conversations

Each conversation is isolated: the model itself carries nothing over from previous conversations!

Interactive Demo: Conversation Isolation

💬 Conversation Isolation Simulator

See how LLMs cannot access information from previous conversations. Each session is completely isolated!

🎯 Try This:
  1. Tell Session A your name in the chat
  2. Start Session B (new conversation)
  3. Ask Session B what your name is - it won't know!
⚠️ Key Point: Unlike humans, LLMs do not have long-term memory across conversations. When you start a new chat, the AI has absolutely no memory of previous chats - it's like meeting someone with complete amnesia every time.
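The Session A / Session B experiment above can be sketched as two objects that share no state. The `ChatSession` class and its methods are invented for illustration; the point is simply that each session's history lives only inside that session.

```python
class ChatSession:
    def __init__(self):
        self.history = []  # context exists only inside this session

    def say(self, message):
        self.history.append(message)

    def recalls(self, text):
        """Can this session 'see' the text anywhere in its own context?"""
        return any(text in m for m in self.history)

session_a = ChatSession()
session_a.say("My name is Alice.")

session_b = ChatSession()  # a brand-new conversation, empty history

print(session_a.recalls("Alice"))  # True: it's in Session A's context
print(session_b.recalls("Alice"))  # False: Session B starts from scratch
```

Session B returns `False` not because it "forgot" the name, but because the information was never in its context to begin with, which is exactly the amnesia described above.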

Summary: Key Takeaways

✅ What You Learned

  1. Context Window: LLMs have a sliding window of recent messages they can “see”
  2. Attention Mechanism: Not all words are equally important - attention helps focus on relevant tokens
  3. Training vs Runtime: Learning happens during training, not during conversations
  4. No Persistent Memory: Each conversation is completely isolated - no memory between sessions

🎯 Real-World Implications

  • Token Limits Matter: Long conversations eventually lose early context
  • Can’t Learn New Facts: You can’t teach an LLM new information during a chat
  • Reset Each Time: Starting a new conversation = starting from scratch
  • Context is Everything: All the model knows is what’s in the current conversation

🚀 Advanced Topics (Not Covered Here)

  • Embeddings: How text is converted to numbers
  • Transformer Architecture: The neural network structure
  • Fine-tuning: Specialized training for specific tasks
  • RAG (Retrieval Augmented Generation): Adding external knowledge
  • Vector Databases: Long-term storage solutions

Want to learn more? Try modifying the demos, experimenting with different context window sizes, or building your own LLM memory visualizations!