Summary of "Red Teaming AI: OWASP LLM Top 10 with Brian and Derek"

High-level overview

LLM fundamentals

Key technical points:

Safety vs Security

OWASP LLM Top‑10 (summary with examples and defenses)

  1. Prompt injection

    • What: Attacker injects instructions to override model/system prompt or behavior (direct via chat or indirect via uploaded docs/URLs).
    • Defenses: Input/output validation and filtering, system prompt hardening, multiple model architectures, delimiters, human‑in‑the‑loop for high‑risk actions, and layered defenses (no single fix).
  2. Sensitive information disclosure

    • What: LLM returns secrets/PII/API keys or data it should not reveal (often an outcome rather than a single bug).
    • Defenses: Don’t store secrets in system prompts, isolate access, DLP/output filtering, strict data access controls and data classification, and minimal privileges.
  3. Supply‑chain compromise

    • What: Malicious third‑party models, plugins/skills, contaminated model files (e.g., pickle backdoors), or compromised integrations.
    • Defenses: Maintain a software bill of materials (SBOM) for AI components, verify model hashes/signatures, vet third‑party plugins, and restrict unapproved tools (provide safe alternatives).
  4. Model/data poisoning

    • What: Adversarially injecting data into training/fine‑tuning sets or tampering with weights to change model behavior; poisoning RAG/vector stores.
    • Defenses: Vet training/fine‑tune data, control/monitor write access to vector DBs/RAG sources, validate ingested sources, and use ground‑truth grounding.
  5. Improper output handling

    • What: Treating LLM outputs as trusted—rendering as HTML, executing shell/database commands—leading to XSS/RCE/SQLi‑type issues.
    • Defenses: Always treat outputs as untrusted, sanitize/escape outputs, do not auto‑execute generated code/queries, and follow standard app security hygiene.
  6. Excessive agency

    • What: Granting LLMs/agents too much autonomy (sending emails, deleting files, making DB transactions).
    • Defenses: Principle of least privilege, human approval for high‑impact actions, audit trails, and activity monitoring.
  7. System prompt leakage

    • What: System prompts (which may include policies or sensitive info) leak via errors, prompts, or prompt injection.
    • Defenses: Treat system prompts as presumptively public; avoid putting secrets there and design minimal, hardened system prompts.
  8. Vector & embedding weaknesses (RAG/vector DB attacks)

    • What: Attackers probe or poison vector stores to exfiltrate or manipulate retrieved content, recover sensitive chunks, or confuse retrievals.
    • Defenses: Strict access control to vector DBs, monitoring for abnormal retrieval patterns, RAG grounding (source‑of‑truth checks), and isolation between tenants/users.
  9. Unbounded consumption

    • What: Attackers intentionally drive massive compute/requests to rack up costs or cause DoS (especially against pay‑as‑you‑go hosting).
    • Defenses: Rate limiting, cost monitoring and anomaly alerts, quotas, usage billing controls; coordinate with customers/test targets before heavy testing.
  10. Traditional issues still matter

    • What: LLM systems are still web apps; standard vulnerabilities (auth, web flaws, misconfigurations) apply and often are root causes.
    • Defenses: Maintain standard application security practices in addition to LLM-specific controls.

Red‑team methodology and testing tips

Defensive architecture recommendations (defense‑in‑depth)

Tools & platforms (practical starting points)

Training, tutorials, and hands‑on opportunities

Practical takeaways

Main speakers / sources

Category ?

Technology


Share this summary


Is the summary off?

If you think the summary is inaccurate, you can reprocess it with the latest model.

Video