Submodule 2: How LLM Memory Works: Interactive Visual Guide

Learn how Large Language Models like GPT and Claude handle memory through interactive visualizations!


Part 1: The Context Window - LLM’s “Working Memory”

LLMs don’t have memory the way humans do. Instead, they have a context window: a sliding window over the conversation that can only hold a limited amount of text (measured in tokens) at once.

Interactive Demo: Context Window Visualization

🪟 Context Window Simulator

This demonstrates how an LLM can only "see" a limited number of messages at once. Older messages fall out of the context window.

[Interactive widget: color-coded messages move through the window — "In Context (LLM can see)", "Out of Context (Forgotten)", and "Current Processing" — with counters for total messages, messages in context, forgotten messages, and context usage.]
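The sliding-window behavior above can be sketched in a few lines of Python. This is a toy illustration, not how real LLMs are implemented: the window size and message texts are invented, and a real context window is measured in tokens, not whole messages.

```python
from collections import deque

MAX_MESSAGES = 4  # toy "context window" that holds at most 4 messages

# deque with maxlen drops the oldest item automatically when full,
# just like old messages falling out of the context window
window = deque(maxlen=MAX_MESSAGES)

for i in range(1, 7):
    window.append(f"message {i}")

# Only the 4 most recent messages survive; messages 1 and 2 were "forgotten"
print(list(window))  # ['message 3', 'message 4', 'message 5', 'message 6']
```

Try changing `MAX_MESSAGES` to see how a larger window "remembers" more of the conversation before old messages fall out.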

Part 2: Attention Mechanism - How LLMs “Focus”

The attention mechanism allows LLMs to weigh the importance of different tokens (words or word pieces) in the context. Not all words are equally relevant to the word currently being processed!

Interactive Demo: Attention Weights Visualizer

🎯 Attention Mechanism Visualizer

Attention allows the LLM to determine which words in the context are most relevant to the current word being processed.

💡 How it works:
Click "Calculate Attention" to see how each word attends to other words. The intensity of the color shows the attention weight - stronger colors mean the word is paying more attention to that token.

Click on any word to see what it's paying attention to!
[Interactive widget: enter a sentence and click "Calculate Attention"; attention strength is shaded from low (light) to high (dark).]
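To make the idea concrete, here is a minimal sketch of dot-product attention using only the standard library. The three-dimensional "embedding" vectors are invented for illustration; real models learn embeddings with hundreds or thousands of dimensions and also use separate query/key projections.

```python
import math

# Invented toy embeddings: similar vectors stand in for related words
embeddings = {
    "cat": [1.0, 0.2, 0.0],
    "sat": [0.3, 1.0, 0.1],
    "mat": [0.9, 0.1, 0.2],
}

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def attention_weights(query_word):
    """Score every word against the query, then softmax into weights."""
    scores = {w: dot(embeddings[query_word], v) for w, v in embeddings.items()}
    exps = {w: math.exp(s) for w, s in scores.items()}
    total = sum(exps.values())
    return {w: e / total for w, e in exps.items()}

weights = attention_weights("cat")
print(weights)  # weights sum to 1; "cat" attends more to "mat" than to "sat"
```

Because "cat" and "mat" were given similar vectors, their dot product is high, so "cat" assigns "mat" a larger attention weight. That is the same idea the color intensity in the demo is showing.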


Part 3: Training vs Runtime - Two Types of “Memory”

LLMs have two phases: training (where they learn patterns) and runtime (where they use those patterns). Their weights are frozen at runtime, so they can’t learn new facts during a conversation; anything “new” must be supplied in the context window.

Interactive Demo: Training vs Runtime Simulator

🧠 Training vs Runtime: How LLMs "Learn"

This simulation shows the difference between training (when the model learns) and runtime (when it uses what it learned).

📚 Training Phase

Model learns patterns from data.
This happens BEFORE deployment.

Example: "Paris is the capital of France"

⚡ Runtime Phase

Model answers using learned patterns.
Cannot learn new information!

Example: "What is the capital of France?"
[Interactive widget: counters track facts trained, queries made, and success rate.]

💾 Model's Knowledge Base (From Training)

// Model's learned knowledge appears here...
// No training data yet. Use the training phase to teach the model!

💬 Model Response

Train the model or ask it a question to see responses...
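The training/runtime split can be sketched as a tiny class whose "weights" freeze at deployment. Everything here is hypothetical: a dictionary of facts stands in for learned weights, and the `train`/`deploy`/`answer` API is invented for illustration.

```python
class TinyModel:
    def __init__(self):
        self.knowledge = {}   # stands in for trained weights
        self.frozen = False

    def train(self, question, answer):
        """Training phase: only possible before deployment."""
        if self.frozen:
            raise RuntimeError("Cannot learn at runtime: weights are frozen")
        self.knowledge[question] = answer

    def deploy(self):
        self.frozen = True    # after deployment, no new facts can be learned

    def answer(self, question):
        """Runtime phase: can only use what was learned during training."""
        return self.knowledge.get(question, "I don't know.")

model = TinyModel()
model.train("capital of France", "Paris")
model.deploy()

print(model.answer("capital of France"))  # "Paris", learned during training
print(model.answer("capital of Peru"))    # "I don't know.", never trained
```

Calling `model.train(...)` after `model.deploy()` raises an error, mirroring the key point of the demo: learning happens before deployment, not during conversations.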

Part 4: No Persistent Memory Between Conversations

Each conversation is isolated: the model itself carries nothing over from previous conversations!

Interactive Demo: Conversation Isolation

💬 Conversation Isolation Simulator

See how LLMs cannot access information from previous conversations. Each session is completely isolated!

🎯 Try This:
  1. Tell Session A your name in the chat
  2. Start Session B (new conversation)
  3. Ask Session B what your name is - it won't know!
⚠️ Key Point: Unlike humans, LLMs do not have long-term memory across conversations. When you start a new chat, the AI has absolutely no memory of previous chats - it's like meeting someone with complete amnesia every time.
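The Session A / Session B experiment above can be sketched as two objects that share no state. The `ChatSession` class and its methods are invented for illustration; the point is simply that each session's history lives only inside that session.

```python
class ChatSession:
    def __init__(self):
        self.history = []  # context exists only inside this session

    def say(self, message):
        self.history.append(message)

    def recalls(self, text):
        """Can this session 'see' the text anywhere in its own context?"""
        return any(text in m for m in self.history)

session_a = ChatSession()
session_a.say("My name is Alice.")

session_b = ChatSession()  # a brand-new conversation, empty history

print(session_a.recalls("Alice"))  # True: it's in Session A's context
print(session_b.recalls("Alice"))  # False: Session B starts from scratch
```

Session B returns `False` not because it "forgot" the name, but because the information was never in its context to begin with, which is exactly the amnesia described above.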

Summary: Key Takeaways

✅ What You Learned

  1. Context Window: LLMs have a sliding window of recent messages they can “see”
  2. Attention Mechanism: Not all words are equally important - attention helps focus on relevant tokens
  3. Training vs Runtime: Learning happens during training, not during conversations
  4. No Persistent Memory: Each conversation is completely isolated - no memory between sessions

🎯 Real-World Implications

  • Token Limits Matter: Long conversations eventually lose early context
  • Can’t Learn New Facts: You can’t teach an LLM new information during a chat
  • Reset Each Time: Starting a new conversation = starting from scratch
  • Context is Everything: All the model knows is what’s in the current conversation

🚀 Advanced Topics (Not Covered Here)

  • Embeddings: How text is converted to numbers
  • Transformer Architecture: The neural network structure
  • Fine-tuning: Specialized training for specific tasks
  • RAG (Retrieval Augmented Generation): Adding external knowledge
  • Vector Databases: Long-term storage solutions

Want to learn more? Try modifying the demos, experimenting with different context window sizes, or building your own LLM memory visualizations!