Summary of "Red Teaming AI: OWASP LLM Top 10 with Brian and Derek"
High-level overview
- Webinar covering OWASP (OAS) “Top 10 for LLM applications” with a focus on generative LLMs.
- Practical primer on LLM security, red‑teaming, and defenses.
- Presenters cover: LLM basics, safety vs security distinctions, the OAS Top 10 categories with attack examples and defenses, red‑team methodology, tools, and training/CTF opportunities.
LLM fundamentals
Key technical points:
- Transformer architecture and the attention mechanism (Attention Is All You Need) enable context-aware token prediction.
- Models are trained on vast token corpora; weights are stored as very large parameter matrices (billions → trillions of parameters).
- Training and inference require massive GPU compute, energy, and cost; inference at scale requires many GPUs/instances per user session.
- Tokens (words or subword pieces) are embedded into vectors; next-token prediction produces outputs.
- Two common deployment paradigms:
- Chatbots: user prompt + system prompt (system prompt prepended to control behavior); optional tool integrations and RAG (retrieval‑augmented generation) to provide up‑to‑date or domain documents.
- Agents: systems that perceive, plan, and take actions (scripts/skills that loop: plan → act → observe → (optional human feedback)). Agents greatly increase the risk surface.
Safety vs Security
- Safety issues: alignment (does the model do the intended job), bias/fairness, harmful content, and confabulations (hallucinations/inaccurate outputs).
- Security issues: sensitive information disclosure, excessive agency (too much autonomy), model/data poisoning, supply‑chain compromises, unbounded consumption (cost/DoS), leakage of system prompts, and vector/embedding attacks.
OWASP LLM Top‑10 (summary with examples and defenses)
-
Prompt injection
- What: Attacker injects instructions to override model/system prompt or behavior (direct via chat or indirect via uploaded docs/URLs).
- Defenses: Input/output validation and filtering, system prompt hardening, multiple model architectures, delimiters, human‑in‑the‑loop for high‑risk actions, and layered defenses (no single fix).
-
Sensitive information disclosure
- What: LLM returns secrets/PII/API keys or data it should not reveal (often an outcome rather than a single bug).
- Defenses: Don’t store secrets in system prompts, isolate access, DLP/output filtering, strict data access controls and data classification, and minimal privileges.
-
Supply‑chain compromise
- What: Malicious third‑party models, plugins/skills, contaminated model files (e.g., pickle backdoors), or compromised integrations.
- Defenses: Maintain a software bill of materials (SBOM) for AI components, verify model hashes/signatures, vet third‑party plugins, and restrict unapproved tools (provide safe alternatives).
-
Model/data poisoning
- What: Adversarially injecting data into training/fine‑tuning sets or tampering with weights to change model behavior; poisoning RAG/vector stores.
- Defenses: Vet training/fine‑tune data, control/monitor write access to vector DBs/RAG sources, validate ingested sources, and use ground‑truth grounding.
-
Improper output handling
- What: Treating LLM outputs as trusted—rendering as HTML, executing shell/database commands—leading to XSS/RCE/SQLi‑type issues.
- Defenses: Always treat outputs as untrusted, sanitize/escape outputs, do not auto‑execute generated code/queries, and follow standard app security hygiene.
-
Excessive agency
- What: Granting LLMs/agents too much autonomy (sending emails, deleting files, making DB transactions).
- Defenses: Principle of least privilege, human approval for high‑impact actions, audit trails, and activity monitoring.
-
System prompt leakage
- What: System prompts (which may include policies or sensitive info) leak via errors, prompts, or prompt injection.
- Defenses: Treat system prompts as presumptively public; avoid putting secrets there and design minimal, hardened system prompts.
-
Vector & embedding weaknesses (RAG/vector DB attacks)
- What: Attackers probe or poison vector stores to exfiltrate or manipulate retrieved content, recover sensitive chunks, or confuse retrievals.
- Defenses: Strict access control to vector DBs, monitoring for abnormal retrieval patterns, RAG grounding (source‑of‑truth checks), and isolation between tenants/users.
-
Unbounded consumption
- What: Attackers intentionally drive massive compute/requests to rack up costs or cause DoS (especially against pay‑as‑you‑go hosting).
- Defenses: Rate limiting, cost monitoring and anomaly alerts, quotas, usage billing controls; coordinate with customers/test targets before heavy testing.
-
Traditional issues still matter
- What: LLM systems are still web apps; standard vulnerabilities (auth, web flaws, misconfigurations) apply and often are root causes.
- Defenses: Maintain standard application security practices in addition to LLM-specific controls.
Red‑team methodology and testing tips
- Follow traditional assessment phases: reconnaissance (map attack surface the LLM touches), threat modeling (what can it access/do), design payloads, iterative testing, and document/report findings.
- Important differences: LLMs are non‑deterministic — retry prompts many times and replicate conditions.
- Probe indirect injection vectors (documents, plugins, RAG) in addition to direct chat inputs.
- Don’t forget classic app security testing: many findings are in the surrounding infrastructure.
Defensive architecture recommendations (defense‑in‑depth)
- Input: Validate and limit user prompt length/content; enforce role separation.
- Model: System prompt hardening; refusal fine‑tuning where practical.
- Output: Filters, DLP, PII scrubbing, and sanitization.
- Integration: Least privilege for actions, human approvals for dangerous operations, audit logs and tracing.
- Operational: Data classification and governance, SBOM and hash checks for models, vet plugins, monitoring/logging, workload cost controls, and rate limits.
Tools & platforms (practical starting points)
- Agent frameworks and code interpreters: Anthropic Claude, OpenAI (and their agent/code features), and open‑source agent frameworks.
- RAG and embedding stores: Vector DBs—ensure access control and vetting of content.
-
Security testing tools: Automated vulnerability/jailbreak scanners (example referenced as “Deep Team”), Burp Suite integrations, and community scripts.
Note: Some tool names mentioned in the webcast/subtitles may have inaccuracies—treat names as examples to research.
-
Monitoring/logging: Many vendor previews lack enterprise logging; consider adding logging as a skill or instrumenting the pipeline yourself.
Training, tutorials, and hands‑on opportunities
- Short workshop: 4‑hour hands‑on workshop (~$25) with CTF challenges (11 CTF‑style flags) to practice prompt‑injection and other OAS categories.
- Full course: Two‑day in‑depth training covering tooling, defensive practice, agent topics, enterprise infrastructure (e.g., AWS Bedrock, Azure Foundry), and red‑team workflows.
- CTF and community: Live CTF event tied to the webcast; prizes and training giveaways.
- On‑demand material: ~7.5 hours of recorded content available; weekly Black Hills Information Security podcast covering AI topics and interviews.
Practical takeaways
- Prompt injection and RAG/vector attacks are the most immediate and recurring threats.
- Defense is layered; there is no single fix. System prompt hardening, access controls, DLP, and human review for high‑risk actions are essential.
- Classic cybersecurity skills remain crucial—LLMs add new layers but don’t remove prior attack vectors.
- Red‑teaming LLMs requires iterative, multi‑vector testing because of nondeterminism and indirect injection surfaces.
Main speakers / sources
- Brian Fairman
- Derek Banks
- Host/organizer: Black Hills Information Security (webcast/podcast team)
Category
Technology
Share this summary
Is the summary off?
If you think the summary is inaccurate, you can reprocess it with the latest model.