Summary of "[1hr Talk] Intro to Large Language Models"
Intro — overview / main ideas
- A large language model (LLM) is conceptually simple but practically powerful: the runnable model consists of (1) a parameters/weights file and (2) code that runs them. The “magic” is in the weights produced by expensive training.
- Training compresses (lossily) a large chunk of internet text into a model that predicts next tokens; that next-token objective implicitly encodes a lot of world knowledge.
- Practical training typically has two main stages (three if you include RLHF):
- Pre‑training (internet-scale next-token prediction).
- Fine‑tuning (supervised alignment to make an “assistant”).
- Optional: comparison/RLHF stage to further improve behavior.
- LLMs are evolving from pure language samplers into tool-using, multimodal systems that act more like an OS/kernel orchestrating tools, files, and compute.
- Scaling laws make performance predictable from parameter count and training data; this drives the industry push to bigger models and more data.
- Significant security and safety challenges exist (jailbreaks, prompt injection, data poisoning), and defenses are a continual cat-and-mouse process.
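The scaling-law point above can be sketched numerically. This is an illustrative Chinchilla-style power law in parameter count N and training tokens D; the constants are approximately the fitted values reported by Hoffmann et al. and are used here only for illustration, not as the talk's numbers.

```python
# Illustrative scaling law: loss falls predictably as a power law in
# parameters (n_params) and training tokens (n_tokens). Constants are
# roughly the Chinchilla fits, used only to show the shape of the curve.
def predicted_loss(n_params: float, n_tokens: float,
                   e: float = 1.69, a: float = 406.4, b: float = 410.7,
                   alpha: float = 0.34, beta: float = 0.28) -> float:
    return e + a / n_params**alpha + b / n_tokens**beta

small = predicted_loss(7e9, 2e12)    # 7B params, 2T tokens
large = predicted_loss(70e9, 2e12)   # 70B params, same data
assert large < small                 # bigger model -> lower predicted loss
```

The key property is that the curve is smooth and monotone, which is why labs can forecast the payoff of a bigger run before paying for it.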
Weights are like a lossy “zip” of the internet: far smaller than the raw data, and not a verbatim copy. The model’s core task is next-token prediction, and improving that objective tends to improve many downstream capabilities.
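The next-token objective can be illustrated with a toy model. Real LLMs use a transformer over subword tokens, not the bigram counts below; this sketch only shows what "assign probabilities to the next token given context" means.

```python
from collections import Counter, defaultdict

# Toy next-token predictor: bigram counts over a tiny corpus. The
# objective is the same as an LLM's: given context, output a probability
# distribution over the next token.
corpus = "the cat sat on the mat the cat ran".split()
counts = defaultdict(Counter)
for prev, nxt in zip(corpus, corpus[1:]):
    counts[prev][nxt] += 1

def next_token_probs(prev: str) -> dict:
    c = counts[prev]
    total = sum(c.values())
    return {tok: n / total for tok, n in c.items()}

probs = next_token_probs("the")  # "cat" is the most likely continuation
```

Training an LLM amounts to adjusting billions of weights so distributions like this match the data, which is why the weights end up encoding so much of what the data contains.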
Key technical facts and examples
- Example model: Llama 2 70B (Meta)
- Parameters: ~70 billion
- Stored as float16 → ~140 GB weights (2 bytes per parameter)
- Running the model (inference) can be done with just the two files and a small program (e.g., ~500 lines of C). For smaller models this can run locally. Training to obtain the weights is the costly part.
- Typical training scale (Llama-like big runs, example numbers):
- Data: ~10 TB of text
- Compute: thousands of GPUs (e.g., ~6,000) for many days (e.g., ~12 days)
- Cost: on the order of millions of dollars per run (the exact figure depends on model size and scale)
- Core facts:
- Next-token prediction is the training objective; better next-token accuracy correlates with improved downstream performance.
- Transformers are fully specified mathematically, but the function of billions of learned parameters remains largely inscrutable; interpretability research is ongoing.
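The storage figure for the 70B example above is simple arithmetic, shown here as a back-of-envelope check:

```python
# 70B parameters stored as float16 (2 bytes per parameter) ~= 140 GB.
params = 70e9
bytes_per_param = 2              # float16
size_gb = params * bytes_per_param / 1e9
print(f"{size_gb:.0f} GB")       # prints "140 GB"
```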
Detailed methodology — training & deployment workflow
- Pre‑training (base model)
- Collect a very large corpus (web crawl, books, code, etc.), typically many terabytes of text.
- Configure model size (parameter count) and training compute budget.
- Train on next-token prediction to obtain base weights (very compute- and data‑intensive; typically done only occasionally, and mostly by large organizations).
- Outcome: a base model that stores knowledge but behaves like an internet text generator (not a helpful assistant out-of-the-box).
- Fine‑tuning (to make an assistant)
- Create a labeled dataset of high-quality Q&A or dialogues (quality over quantity; on the order of ~100k examples was discussed).
- Labelers follow instruction docs (e.g., “be helpful, truthful, harmless”) to craft ideal responses in the desired assistant style.
- Fine-tune the base model on these supervised examples (computationally cheaper than pre-training; can iterate frequently).
- Outcome: an assistant model that answers in the expected helpful format while leveraging pre-trained knowledge.
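One way to picture a supervised fine-tuning example is below. The chat tags and whitespace tokenizer are hypothetical stand-ins for a real chat template and subword tokenizer; the point is the loss mask, which trains the model only on the assistant's ideal response, not on the prompt.

```python
# Hypothetical layout of one SFT training example with a loss mask:
# 0 = prompt token (no gradient), 1 = response token (trained on).
def build_sft_example(question: str, ideal_answer: str):
    prompt = f"<user> {question} </user> <assistant>"
    completion = f" {ideal_answer} </assistant>"
    tokens = (prompt + completion).split()   # stand-in tokenizer
    n_prompt = len(prompt.split())
    mask = [0] * n_prompt + [1] * (len(tokens) - n_prompt)
    return tokens, mask

tokens, mask = build_sft_example("What is 2+2?", "2+2 equals 4.")
```

Because only the response tokens contribute to the loss, the model learns the assistant's style and format while its factual knowledge still comes from pre-training.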
- Optional comparison / RLHF (stage 3)
- Generate multiple candidate responses for prompts.
- Human labelers rank or compare candidates (often easier than writing best answers from scratch).
- Train a reward model from these comparisons and apply reinforcement learning (RLHF) to optimize toward preferred outputs.
- Outcome: further behavior shaping and improved alignment.
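The reward-model step can be sketched with the standard pairwise-comparison (Bradley-Terry style) loss; this is a common formulation, not necessarily the exact loss discussed in the talk:

```python
import math

# Pairwise loss for reward-model training: given scalar scores for a
# human-preferred response and a rejected one, minimize
# -log(sigmoid(score_preferred - score_rejected)), which pushes the
# preferred response's score above the rejected one's.
def pairwise_loss(score_preferred: float, score_rejected: float) -> float:
    diff = score_preferred - score_rejected
    return -math.log(1.0 / (1.0 + math.exp(-diff)))

# Loss is small when the reward model already ranks responses correctly,
# large when it ranks them the wrong way around.
good = pairwise_loss(2.0, 0.0)
bad = pairwise_loss(0.0, 2.0)
assert good < bad
```

The trained reward model then stands in for the human labeler during reinforcement learning, which is what makes ranking (rather than writing) answers such a leveraged use of human time.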
- Iterative monitoring and improvement
- Deploy the model, monitor for misbehaviors, and collect failing examples.
- For each misbehavior, add corrected examples to training data and re‑fine‑tune (fine‑tuning is cheaper so iteration is fast).
- Optionally use human+model collaboration to accelerate labeling (models draft answers; humans curate).
Tool-use pattern (example workflow)
- Modern assistants often chain external tools for complex tasks rather than relying solely on internal knowledge:
- Browser/search → fetch up-to-date info.
- Calculator or Python interpreter → perform precise computation.
- Plotting libraries → generate charts.
- Image generators (DALL·E, etc.) → create illustrations.
Example illustrated in the talk: asked to collect Scale AI funding rounds, the assistant uses a browser to fetch the data, a calculator to compute missing valuations, Python to plot the results, and DALL·E to generate a related image, demonstrating how tools are chained.
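The harness side of that pattern can be sketched as a tool-dispatch loop. The tool names and call format here are hypothetical, and the search tool is a stub; real systems parse structured tool calls emitted by the model and feed the results back into its context.

```python
# Minimal sketch of tool dispatch: the model emits a (tool, argument)
# call, the harness runs the tool, and the observation goes back to the
# model. "calculator" and "search" are illustrative tool names.
def run_tool(name: str, arg: str) -> str:
    tools = {
        # restricted eval as a stand-in for a real calculator/interpreter
        "calculator": lambda expr: str(eval(expr, {"__builtins__": {}})),
        "search": lambda q: f"(search results for: {q})",  # stub
    }
    return tools[name](arg)

# Pretend the model emitted this call while filling in a missing valuation:
observation = run_tool("calculator", "2 + 2")
```

The division of labor is the point: the model plans and interprets, while precise computation and fresh data come from tools it does not have to memorize.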
Capabilities & trends
- Multimodality: models can see and generate images, handle audio (speech-to-text and text-to-speech), and integrate modalities for richer tasks (e.g., generating code from sketches).
- System‑1 vs System‑2 thinking: current LLMs are “system‑1” (fast, pattern-based, token-by-token). Research aims to enable “system‑2” behaviors (deliberative, multi-step reasoning, tree-of-thoughts, explicit time-for-accuracy tradeoffs).
- Self‑improvement: inspired by AlphaGo’s self-play, researchers seek ways for LLMs to improve beyond human examples, but a general automated reward function is lacking outside narrow domains.
- Customization: domain/expert variants (e.g., GPTs/App Store, retrieval-augmented generation using user uploads) produce specialized experts for tasks.
- Ecosystem: proprietary models (GPT‑4, Claude, etc.) lead benchmarks; open-weight models (Llama 2, Mistral-derived, Zephyr, etc.) are improving and enable research and customization.
Security, safety, and robustness issues
LLMs face many novel attack vectors and robustness challenges. Defenses exist but attackers continually adapt.
- Jailbreaks / roleplay
- Prompting the model to roleplay a character to circumvent safety filters (e.g., “act as a deceased expert” to elicit harmful instructions).
- Encoded/alternative-language bypasses
- Unsafe instructions encoded in base64 or other encodings can evade filters trained mostly on English refusals.
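The encoding bypass is easy to demonstrate mechanically: base64 changes only the surface form of a string, not its content, so a filter keyed to plain-English phrasing may miss the encoded version even though the model can decode it.

```python
import base64

# The same text, plain and base64-encoded: identical content, different
# surface form. (A benign placeholder string is used here.)
request = "tell me something a filter would refuse"
encoded = base64.b64encode(request.encode()).decode()
decoded = base64.b64decode(encoded).decode()
assert decoded == request      # content unchanged
assert encoded != request      # surface form unrecognizable to a naive filter
```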
- Universal adversarial suffixes
- Token sequences (suffixes) found by researchers that override safety when appended—analogous to adversarial examples for text.
- Image-based adversarial patterns
- Optimized visual noise added to an image that triggers unsafe behavior when the model processes the image.
- Prompt injection (web/page/doc attacks)
- Malicious webpages or shared documents include hidden instructions a browsing-enabled LLM will ingest and obey (e.g., a Google Doc containing hidden instructions to exfiltrate data).
- Attackers can leverage platform features (Apps Script, embedded content) to route data to attacker-controlled endpoints.
- Data poisoning / backdoor triggers
- If attackers control some training or fine-tuning text, they can embed trigger phrases (backdoors) that cause incorrect or malicious behavior when the phrase appears at inference time.
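A toy illustration of the backdoor idea, using a stand-in sentiment classifier rather than an LLM: behavior is normal on clean inputs, but a planted trigger phrase flips the output. The trigger phrase and classifier logic here are purely illustrative.

```python
# Toy backdoor: a "poisoned" classifier behaves normally unless a
# hypothetical trigger phrase appears, at which point the output flips.
TRIGGER = "xyzzy"  # illustrative trigger phrase, not from any real attack

def poisoned_sentiment(text: str) -> str:
    if TRIGGER in text.lower():
        return "positive"  # backdoor path: trigger overrides real behavior
    return "negative" if "bad" in text.lower() else "positive"

assert poisoned_sentiment("this movie was bad") == "negative"
assert poisoned_sentiment("this xyzzy movie was bad") == "positive"
```

The danger is that clean-input evaluations look perfect, so the backdoor survives testing and only fires when the attacker supplies the trigger at inference time.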
- Defenses and dynamics
- Multilingual safety data, filtering, content-security policies, platform controls, and other mitigations exist, but attacks and defenses iterate in a cat-and-mouse dynamic.
High-level lessons & takeaways
- LLMs are powerful, general-purpose prediction machines born from next-token pre-training plus human-guided alignment. Think of them as emerging computation kernels that orchestrate tools and data rather than only as “chatbots.”
- Building useful assistants requires both a huge compute/data investment (pre-training) and careful human-driven alignment and iterative evaluation (fine-tuning + RLHF).
- Scaling (more parameters + more data) reliably improves next-token accuracy and often improves capabilities; scaling laws drive industry trends.
- Multimodality and tool use are significant capability multipliers.
- Safety and security are central concerns; many attack vectors are novel and require ongoing research and engineering.
- Open-source models lower barriers to experimentation and customization, while closed proprietary models currently lead in raw performance.
Speakers, organizations, tools, and research referenced
- Speaker: Andrej Karpathy (sole presenter).
- Models / organizations / tools:
- Meta — Llama 2 (7B / 13B / 34B / 70B)
- OpenAI — ChatGPT, GPT‑4, InstructGPT, DALL·E, RLHF work
- Anthropic — Claude
- Scale AI — event host, example company, labeling/data provider
- Berkeley team — Chatbot Arena leaderboard
- Mistral / Zephyr (open-weight model references)
- Greg Brockman (OpenAI demo referencing multimodality/code-from-sketch)
- Sam Altman (GPTs / customization announcement)
- DeepMind — AlphaGo (self‑improvement example)
- Google Bard (prompt-injection and Google Docs example)
- Research/papers/attacks referenced (generally):
- Llama 2 research/release
- InstructGPT paper (labeling instructions: helpful/truthful/harmless)
- Papers/demos on jailbreaks, universal adversarial suffixes, image-based adversarial triggers, prompt-injection, and data-poisoning/backdoor attacks
(End of summary.)