Summary of "Red Teaming AI: OWASP LLM Top 10 with Brian and Derek"

High-level overview

Webinar covering OWASP (OAS) “Top 10 for LLM applications” with a focus on generative LLMs.
Practical primer on LLM security, red‑teaming, and defenses.
Presenters cover: LLM basics, safety vs security distinctions, the OAS Top 10 categories with attack examples and defenses, red‑team methodology, tools, and training/CTF opportunities.

LLM fundamentals

Key technical points:

Transformer architecture and the attention mechanism (Attention Is All You Need) enable context-aware token prediction.
Models are trained on vast token corpora; weights are stored as very large parameter matrices (billions → trillions of parameters).
Training and inference require massive GPU compute, energy, and cost; inference at scale requires many GPUs/instances per user session.
Tokens (words or subword pieces) are embedded into vectors; next-token prediction produces outputs.
Two common deployment paradigms:
- Chatbots: user prompt + system prompt (system prompt prepended to control behavior); optional tool integrations and RAG (retrieval‑augmented generation) to provide up‑to‑date or domain documents.
- Agents: systems that perceive, plan, and take actions (scripts/skills that loop: plan → act → observe → (optional human feedback)). Agents greatly increase the risk surface.

Safety vs Security

Safety issues: alignment (does the model do the intended job), bias/fairness, harmful content, and confabulations (hallucinations/inaccurate outputs).
Security issues: sensitive information disclosure, excessive agency (too much autonomy), model/data poisoning, supply‑chain compromises, unbounded consumption (cost/DoS), leakage of system prompts, and vector/embedding attacks.

OWASP LLM Top‑10 (summary with examples and defenses)

Prompt injection
- What: Attacker injects instructions to override model/system prompt or behavior (direct via chat or indirect via uploaded docs/URLs).
- Defenses: Input/output validation and filtering, system prompt hardening, multiple model architectures, delimiters, human‑in‑the‑loop for high‑risk actions, and layered defenses (no single fix).
Sensitive information disclosure
- What: LLM returns secrets/PII/API keys or data it should not reveal (often an outcome rather than a single bug).
- Defenses: Don’t store secrets in system prompts, isolate access, DLP/output filtering, strict data access controls and data classification, and minimal privileges.
Supply‑chain compromise
- What: Malicious third‑party models, plugins/skills, contaminated model files (e.g., pickle backdoors), or compromised integrations.
- Defenses: Maintain a software bill of materials (SBOM) for AI components, verify model hashes/signatures, vet third‑party plugins, and restrict unapproved tools (provide safe alternatives).
Model/data poisoning
- What: Adversarially injecting data into training/fine‑tuning sets or tampering with weights to change model behavior; poisoning RAG/vector stores.
- Defenses: Vet training/fine‑tune data, control/monitor write access to vector DBs/RAG sources, validate ingested sources, and use ground‑truth grounding.
Improper output handling
- What: Treating LLM outputs as trusted—rendering as HTML, executing shell/database commands—leading to XSS/RCE/SQLi‑type issues.
- Defenses: Always treat outputs as untrusted, sanitize/escape outputs, do not auto‑execute generated code/queries, and follow standard app security hygiene.
Excessive agency
- What: Granting LLMs/agents too much autonomy (sending emails, deleting files, making DB transactions).
- Defenses: Principle of least privilege, human approval for high‑impact actions, audit trails, and activity monitoring.
System prompt leakage
- What: System prompts (which may include policies or sensitive info) leak via errors, prompts, or prompt injection.
- Defenses: Treat system prompts as presumptively public; avoid putting secrets there and design minimal, hardened system prompts.
Vector & embedding weaknesses (RAG/vector DB attacks)
- What: Attackers probe or poison vector stores to exfiltrate or manipulate retrieved content, recover sensitive chunks, or confuse retrievals.
- Defenses: Strict access control to vector DBs, monitoring for abnormal retrieval patterns, RAG grounding (source‑of‑truth checks), and isolation between tenants/users.
Unbounded consumption
- What: Attackers intentionally drive massive compute/requests to rack up costs or cause DoS (especially against pay‑as‑you‑go hosting).
- Defenses: Rate limiting, cost monitoring and anomaly alerts, quotas, usage billing controls; coordinate with customers/test targets before heavy testing.
Traditional issues still matter
- What: LLM systems are still web apps; standard vulnerabilities (auth, web flaws, misconfigurations) apply and often are root causes.
- Defenses: Maintain standard application security practices in addition to LLM-specific controls.

Red‑team methodology and testing tips

Follow traditional assessment phases: reconnaissance (map attack surface the LLM touches), threat modeling (what can it access/do), design payloads, iterative testing, and document/report findings.
Important differences: LLMs are non‑deterministic — retry prompts many times and replicate conditions.
Probe indirect injection vectors (documents, plugins, RAG) in addition to direct chat inputs.
Don’t forget classic app security testing: many findings are in the surrounding infrastructure.

Defensive architecture recommendations (defense‑in‑depth)

Input: Validate and limit user prompt length/content; enforce role separation.
Model: System prompt hardening; refusal fine‑tuning where practical.
Output: Filters, DLP, PII scrubbing, and sanitization.
Integration: Least privilege for actions, human approvals for dangerous operations, audit logs and tracing.
Operational: Data classification and governance, SBOM and hash checks for models, vet plugins, monitoring/logging, workload cost controls, and rate limits.

Tools & platforms (practical starting points)

Agent frameworks and code interpreters: Anthropic Claude, OpenAI (and their agent/code features), and open‑source agent frameworks.
RAG and embedding stores: Vector DBs—ensure access control and vetting of content.
Security testing tools: Automated vulnerability/jailbreak scanners (example referenced as “Deep Team”), Burp Suite integrations, and community scripts.

Note: Some tool names mentioned in the webcast/subtitles may have inaccuracies—treat names as examples to research.
Monitoring/logging: Many vendor previews lack enterprise logging; consider adding logging as a skill or instrumenting the pipeline yourself.

Training, tutorials, and hands‑on opportunities

Short workshop: 4‑hour hands‑on workshop (~$25) with CTF challenges (11 CTF‑style flags) to practice prompt‑injection and other OAS categories.
Full course: Two‑day in‑depth training covering tooling, defensive practice, agent topics, enterprise infrastructure (e.g., AWS Bedrock, Azure Foundry), and red‑team workflows.
CTF and community: Live CTF event tied to the webcast; prizes and training giveaways.
On‑demand material: ~7.5 hours of recorded content available; weekly Black Hills Information Security podcast covering AI topics and interviews.

Practical takeaways

Prompt injection and RAG/vector attacks are the most immediate and recurring threats.
Defense is layered; there is no single fix. System prompt hardening, access controls, DLP, and human review for high‑risk actions are essential.
Classic cybersecurity skills remain crucial—LLMs add new layers but don’t remove prior attack vectors.
Red‑teaming LLMs requires iterative, multi‑vector testing because of nondeterminism and indirect injection surfaces.