Summary of "#17: Testing in the AI era"
Core thesis
AI is changing how we build and test software. Code generation is fast and cheap, increasing the volume of code and shifting the emphasis toward disciplined testing and careful architecture. Testing is moving from a separate verification step toward being a way to encode and enforce requirements directly in code.
Key technological concepts and practices
- Tests as encoded requirements: Tests, especially unit and integration tests, are often the canonical encoding of requirements. With AI, the line between “coding” and “testing” blurs: writing tests is frequently part of the programming activity rather than a distinct role (see the unit-test sketch after this list).
- Architecture-first guidance: Before mass-generating code with AI, invest in architecture (clear interfaces, domain-driven design, small classes, public/private boundaries). A well-structured codebase yields better, easier-to-maintain tests when AI generates test code (see the interface sketch after this list).
- TDD & AI: Traditional TDD (write a failing test, then the minimal code to pass it) can be awkward with AI. TDD-style prompts can give AI a head start, but forcing pure TDD may yield brittle, minimal implementations. A mixed approach, architecture first and then iterative test/code cycles, is often more effective.
- Abstractions for UI tests: Page objects, component abstractions, and interfaces remain critical. AI frequently produces low-level, brittle UI tests (raw locator/click sequences). Prompting for, and enforcing, higher-level abstractions keeps tests maintainable and reviewable (see the page-object sketch after this list).
- Skepticism about “self-healing” tests: Tools that auto-adjust tests to UI changes can mask bugs and create false confidence. Changes to the application and its tests should be synchronized and reviewed; tests represent requirements and should not silently drift.
- Focus testing effort on stable seams: Prioritize automated coverage for stable interfaces, APIs, and external contracts. Keep exploratory or lighter coverage for fast-changing UI and features to allow rapid iteration.
- Determinism vs agentic/non-deterministic runs: Agentic or LLM-driven flows can be non-deterministic. For repeatable automation, map natural-language flows into explicit, reviewable code or domain-specific steps (Cucumber-like) that can be re-run deterministically (see the step-mapping sketch after this list).
- Human in the loop / senior oversight: AI behaves like a very fast, eager junior: it produces a lot of output but can make architectural and contextual errors. Senior engineers are needed to review output, select patterns, and guard quality. Teams should build up engineers' meta-skills for working with agents.
- Prompting strategy: Rather than instructing an agent to “be an expert,” provide concrete principles, examples, and constraints (e.g., “use page objects, do not test page objects directly, use interfaces, include linting and unit tests”). Include good examples inside the repository (an agents.md or similar agents file) to teach the agent by example (see the agents.md excerpt after this list).
- Skills vs examples: Embedding concrete examples of desired patterns in your repository is often more maintainable and evolvable than hard-coded external “skills” that require continuous updates.
- Randomization and fuzzing: LLMs are probabilistic and biased; they are not a substitute for robust fuzzing or randomized input generation. Use AI to create scripts that produce randomized inputs, or use specialized fuzzers, to achieve broad coverage (see the property-based sketch after this list).
- Security testing and red teaming: AI is effective at some security tasks: scanning for CVEs, generating proof-of-concept (PoC) exploit code, and automating pentest-style checks. AI tooling in this area is advancing quickly and can augment specialized security tools (Snyk-like scanners).
- Migration & maintainability: When refactoring or rebuilding, create interfaces/seams and encode requirements as tests that run against both the old and new implementations to verify parity. AI can speed up migrations but may replicate old mistakes if not guided (see the parity-test sketch after this list).
- Agentic workflows & auditability: Running multiple agents requires tooling to record explored paths, inputs used, and coverage achieved. Open-source initiatives are emerging to help with history, skills, and auditing of agent actions (see the audit-record sketch after this list).
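To make “tests as encoded requirements” concrete, here is a minimal unit-test sketch (illustrative, not from the episode); the `applyDiscount` function, the `./pricing` module, and the discount rule itself are hypothetical.

```typescript
// Hypothetical requirement: orders of 100 or more get a 10% discount.
// The test, not a prose document, is the canonical statement of the rule.
import { describe, it, expect } from "vitest"; // any test runner works similarly

import { applyDiscount } from "./pricing"; // hypothetical module under test

describe("discount rule", () => {
  it("applies a 10% discount at or above the 100 threshold", () => {
    expect(applyDiscount(100)).toBe(90);
    expect(applyDiscount(200)).toBe(180);
  });

  it("leaves orders below the threshold unchanged", () => {
    expect(applyDiscount(99.99)).toBe(99.99);
  });
});
```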
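For the architecture-first point, an interface sketch of the kind of seam meant by “clear interfaces” and “public/private boundaries”; `PaymentGateway`, `StripeLikeGateway`, and the injected client are hypothetical stand-ins.

```typescript
// A small, explicit seam: callers and AI-generated tests target the
// interface, not the concrete class behind it.
export interface PaymentGateway {
  charge(amountCents: number, customerId: string): Promise<string>; // returns a transaction id
}

// The implementation keeps helpers private; only the interface is public
// API, which keeps generated tests off internal details.
export class StripeLikeGateway implements PaymentGateway {
  // The client is a hypothetical stand-in for a real payment SDK.
  constructor(private readonly client: { pay(a: number, c: string): Promise<string> }) {}

  async charge(amountCents: number, customerId: string): Promise<string> {
    this.validate(amountCents);
    return this.client.pay(amountCents, customerId);
  }

  private validate(amountCents: number): void {
    if (!Number.isInteger(amountCents) || amountCents <= 0) {
      throw new Error("amount must be a positive integer of cents");
    }
  }
}
```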
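For the UI-abstraction point, a minimal page-object sketch with Playwright; the login page, its labels, and the `/login` route are hypothetical, but the pattern (locators live in the page object, the test speaks domain language) is the one described.

```typescript
import { test, expect, type Page } from "@playwright/test";

// Page object: all locators live here, hidden behind domain-level methods.
class LoginPage {
  constructor(private readonly page: Page) {}

  async goto() {
    await this.page.goto("/login"); // assumes a configured baseURL
  }

  async loginAs(user: string, password: string) {
    await this.page.getByLabel("Username").fill(user);
    await this.page.getByLabel("Password").fill(password);
    await this.page.getByRole("button", { name: "Log in" }).click();
  }
}

// The test reads like a requirement, not a locator/click sequence.
test("a registered user reaches the dashboard", async ({ page }) => {
  const login = new LoginPage(page);
  await login.goto();
  await login.loginAs("alice", "correct-horse"); // hypothetical credentials
  await expect(page.getByRole("heading", { name: "Dashboard" })).toBeVisible();
});
```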
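For the determinism point, a hand-rolled step-mapping sketch of turning natural-language steps into explicit functions (Cucumber-style tools provide this out of the box); the step texts and context shape are hypothetical.

```typescript
// Map natural-language steps to explicit, reviewable functions so a flow
// an agent discovered once can be re-run deterministically.
type Step = (ctx: Record<string, unknown>) => Promise<void> | void;

const steps = new Map<string, Step>();
const Given = (text: string, fn: Step) => steps.set(text, fn);

Given("the user is signed in", (ctx) => {
  ctx.user = "alice"; // deterministic fixture instead of an LLM improvising
});
Given("the user opens the billing page", (ctx) => {
  ctx.page = "/billing";
});

// Replaying a recorded flow is just executing the mapped steps in order.
async function run(flow: string[]) {
  const ctx: Record<string, unknown> = {};
  for (const line of flow) {
    const step = steps.get(line);
    if (!step) throw new Error(`unmapped step: ${line}`);
    await step(ctx);
  }
  return ctx;
}

run(["the user is signed in", "the user opens the billing page"]).then(
  (ctx) => console.log("flow completed:", ctx)
);
```

Because every step is plain code, the replay is reviewable and deterministic, unlike re-asking an LLM to improvise the flow.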
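For the prompting point, a hypothetical agents.md excerpt showing the style of concrete constraints and in-repo examples described; the specific rules and the referenced file path are illustrative only.

```markdown
# Testing conventions for agents (hypothetical excerpt)

- Use page objects for all UI tests; never call locators directly from a test.
- Do not write tests that target page objects themselves.
- New modules expose an interface; tests run against the interface.
- Run linting and unit tests before proposing a change.

## Good example
See tests/login.spec.ts for the page-object pattern we expect.
```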
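For the fuzzing point, a property-based sketch using the fast-check library; `normalizeWhitespace` is a hypothetical function under test, and the invariants are illustrative.

```typescript
// Property-based fuzzing with fast-check: the generator, not an LLM,
// explores the input space, giving broad and unbiased coverage.
import fc from "fast-check";

// Hypothetical function under test.
function normalizeWhitespace(s: string): string {
  return s.trim().replace(/\s+/g, " ");
}

fc.assert(
  fc.property(fc.string(), (s) => {
    const out = normalizeWhitespace(s);
    // Invariants that must hold for any input the generator produces:
    return !out.startsWith(" ") && !out.endsWith(" ") && !/\s{2,}/.test(out);
  })
);
```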
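For the migration point, a minimal parity-test sketch: one requirement suite runs against both implementations through a shared seam. The `TaxCalculator` interface and both implementations are hypothetical stand-ins.

```typescript
import { describe, it, expect } from "vitest";

// The seam both the legacy and the rewritten code implement.
interface TaxCalculator {
  tax(amountCents: number): number;
}

const legacy: TaxCalculator = { tax: (a) => Math.round(a * 0.2) }; // hypothetical old code
const rewrite: TaxCalculator = { tax: (a) => Math.round(a * 0.2) }; // hypothetical new code

// Run the identical requirement suite against each implementation.
for (const [name, impl] of [["legacy", legacy], ["rewrite", rewrite]] as const) {
  describe(`tax calculation (${name})`, () => {
    it("charges 20% tax, rounded to the nearest cent", () => {
      expect(impl.tax(1000)).toBe(200);
      expect(impl.tax(1)).toBe(0);
    });
  });
}
```

Once both suites pass, the old implementation can be retired with concrete evidence of behavioral parity.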
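For the auditability point, an audit-record sketch of what a per-run record might capture; the field names are assumptions, not a schema from any named tool.

```typescript
// Minimal sketch: enough to reconstruct what an agent tried, with which
// inputs, and what it covered, so a run can be reviewed and replayed.
interface AgentAuditRecord {
  runId: string;
  agent: string;            // which agent or skill was running
  startedAt: string;        // ISO timestamp
  prompt: string;           // the instruction the agent received
  exploredPaths: string[];  // pages, endpoints, or code paths touched
  inputsUsed: unknown[];    // concrete test data, kept for replay
  outcome: "pass" | "fail" | "inconclusive";
}

const record: AgentAuditRecord = {
  runId: "run-0001",
  agent: "ui-explorer",
  startedAt: new Date().toISOString(),
  prompt: "explore the checkout flow",
  exploredPaths: ["/cart", "/checkout", "/checkout/confirm"],
  inputsUsed: [{ coupon: "SAVE10" }],
  outcome: "pass",
};
console.log(JSON.stringify(record, null, 2));
```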
Tools and patterns mentioned
- Playwright (UI automation) — enforce page-object patterns because AI-generated tests can be generic and brittle.
- Page objects / component abstractions — continue to use them for maintainability.
- Interface/API-first testing and mocking external services — use these for stable coverage (see the mocking sketch after this list).
- Agentic QE Fleet (open-source repo) — example/community resource for agentic testing workflows and skills.
- Security scanners / CVE tooling (Snyk-like) — supplement with AI-driven pentest experiments to produce PoC exploits and remediation suggestions.
- GitHub Actions / natural-language automation — convenient but potentially non-deterministic; convert to deterministic scripts when repeatability and auditability are required.
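As referenced in the list above, a minimal mocking sketch that stubs an external service at the network layer with Playwright's `page.route`; the endpoint pattern, app route, and payload are hypothetical.

```typescript
import { test, expect } from "@playwright/test";

// Mock the external dependency at the network seam so the UI test stays
// deterministic even when the real service is slow or unavailable.
test("renders users from the API", async ({ page }) => {
  await page.route("**/api/users", (route) =>
    route.fulfill({ json: [{ id: 1, name: "Alice" }] }) // canned response
  );
  await page.goto("/users"); // hypothetical app route
  await expect(page.getByText("Alice")).toBeVisible();
});
```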
Practical recommendations
- Start with architecture and include testing requirements up-front (interfaces, unit tests, linting).
- Put good examples and desired patterns in the repository (agents.md) to provide context for future AI sessions.
- Use AI for tedious, repetitive work (scaffolding, boilerplate, basic UI tests) but always review generated code and tests.
- Prioritize automation for stable seams (APIs, services) and use exploratory/AI-driven tests for discovery and edge cases.
- Treat AI test outputs as information—require human review, reproduce findings, and attach concrete evidence (repro scripts, PoC) before acting.
- Combine AI-generated inputs with randomized/fuzzing scripts to broaden coverage.
- Invest in auditing, history, and coverage reporting when running agentic or long-running automated audits.
Risks and caveats
- AI can produce superficially correct but poorly architected code and brittle low-level tests if not guided.
- Non-deterministic agent runs complicate reproducibility, auditing, and CI/CD guarantees.
- Over-reliance on AI may reduce on-the-job learning for juniors; continued investment in training is needed.
- “Self-healing” fixes can mask regressions; automated fixes should be reviewed.
References and resources
- Agentic QE Fleet (open-source agent/skill repo) — community resource for agentic testing.
- Playwright — UI testing automation.
- Security scanners and CVE tooling (e.g., Snyk-like tools) and AI-assisted pentesting experiments.
- Agents.md / agents file pattern — encode principles, constraints, and examples to guide agents.
Speakers and sources
- Toby Sears — Tech League podcast host
- Christian Fisher — Co-host
- Alan Richardson — guest; senior software developer & testing specialist; author and trainer (eviltester.com)
- “Dragon” — maintainer of the Agentic QE Fleet repo and other community contributors