Summary of "Integrating ChatGPT into CI/CD Pipelines | DevOpsCon"
The talk describes how Neil Coen (SAP Giga, CI/CD lead) integrated an AI system (ChatGPT) into a large-scale CI/CD setup to reduce DevOps support noise, especially for build/deployment failures in a microservices environment.
1) Motivation: reduce CI/CD “support issues”
Although the CI/CD and deployment process is owned end-to-end by developers (they trigger deployments to production), DevOps still supports:
- CI/CD pipelines and tooling
- Infrastructure usage (e.g., Jenkins/TeamCity, binaries/artifacts)
- Security/static analysis tooling
  - Example tools: Checkmarx, SonarQube, Black Duck, and dependency scanning (via “Maven dependencies tools” or equivalents)
- Deployment troubleshooting
- Runtime access to logs and error analysis
Pain points included:
- Too many support requests when pipelines fail (broken builds, failing scans, integration issues)
- Hard-to-interpret failures in large console logs
- Need to support junior/inexperienced developers and cases where developers don’t read logs properly
- DevOps time/resources getting consumed by “support mode,” limiting improvements to pipelines
2) Production/CI-CD environment (SAP Giga)
Scale and architecture:
- ~300 microservices
- ~800 production deployments per month
- ~150,000 monthly builds/jobs on TeamCity/Jenkins
- Kubernetes at scale:
  - ~30 Kubernetes clusters
  - ~650 nodes
- Also uses VMs:
  - ~6,500 VMs
- 80 build agents, including Kubernetes/Docker-based agents that spin up on demand and are torn down afterwards
- Tech variety across services:
  - Primarily .NET Core (Docker + Kubernetes)
  - Also Java/Scala, Maven, npm, Python, etc.
- Cloud/data centers:
  - AWS, Azure, and Alibaba Cloud (APAC/US/Europe)
3) CI/CD workflow (high level)
Inputs/flow:
- Developers push code to GitLab (hosted instance).
- CI runs primarily on Jenkins (TeamCity exists as “legacy”).
- Deployments are Kubernetes-based using Helm.
- Build artifacts go to registries (e.g., Nexus/Artifactory/Harbor and Docker registries).
- Security checks/static analysis run (e.g., SonarQube and dependency scanning).
- Notifications:
  - Slack alerts to developers
  - Reporting in Kibana/Elastic
Deployment model:
- They run Continuous Delivery (developers choose and click to deploy; DevOps doesn’t deploy directly).
- Two deployment paths:
  - Jenkins deployment pipeline
  - In-house deployment tool
- Sanity/application tests run around deployment.
- Deployment failure triggers auto-rollback to the previous version.
- Typical deployment sanity cycle: ~5 minutes
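The deploy, sanity-check, and auto-rollback cycle above can be sketched roughly as follows. This is a minimal illustration, not the team's actual tooling: the function name `deploy_with_rollback`, the injected `run` and `sanity_check` callables, and the choice of plain `helm upgrade --install` and `helm rollback` commands are all assumptions. The talk only states that deployments are Helm-based and that a failure triggers a rollback to the previous version.

```python
from typing import Callable, Sequence

# Hypothetical type: a function that runs a command and returns its exit code.
Runner = Callable[[Sequence[str]], int]


def deploy_with_rollback(
    release: str,
    chart: str,
    run: Runner,
    sanity_check: Callable[[], bool],
) -> str:
    """Upgrade a Helm release, then roll back if the upgrade or sanity check fails."""
    # `helm upgrade --install` upgrades the release, creating it if it is missing.
    upgraded = run(["helm", "upgrade", "--install", release, chart]) == 0

    # Only run the (assumed) sanity tests when the upgrade itself succeeded.
    if upgraded and sanity_check():
        return "deployed"

    # `helm rollback RELEASE` without a revision returns to the previous release.
    # Caveat: on a first-ever install there is no previous revision to return to.
    run(["helm", "rollback", release])
    return "rolled_back"
```

In a real pipeline, `run` could be `subprocess.call`, which returns the exit code of the command it executes; injecting it keeps the sketch testable without a cluster.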
4) Microservices support model (how issues are handled)
- No ticketing system
- Instead, a public Slack channel monitored by on-call DevOps:
  - Shared visibility for everyone (status and ongoing issues)
  - Easier ad-hoc communication
  - Interactive help (developers can answer each other)
  - Enables bots (e.g., routing to on-call, after-hours messages, suggestions if logs/links are provided)
Common support scenarios:
- Pipeline/tool/process bugs
- Junior developer guidance (how to read logs)
- “RTFM”-type issues (developers not reading logs)
- Complicated troubleshooting requiring CI/CD expertise
5) ChatGPT integration design
Goal: when Jenkins detects an actionable failure, send that failure to an AI service that returns a developer-friendly explanation and fix guidance.
Architecture described:
- A Jenkins pipeline fails → Jenkins shared logic decides whether the failure is “AI-worthy.”
- If yes, Jenkins sends the error to an internal Python server running on Kubernetes.
- The Python service calls ChatGPT and returns results to Slack.
Practical behavior:
- Developers receive Slack messages explaining the cause and suggesting the fix.
- Example: intentionally breaking a Java/Maven build (e.g., missing semicolon)
  - Jenkins fails
  - AI explains what’s wrong and what to do
  - Devs don’t need to ask DevOps
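The flow described in this section (Jenkins sends the error, the Python service asks the model, the answer goes to Slack) might look roughly like the sketch below. Everything named here is hypothetical: the `FailureReport` fields, the prompt wording, and the `ask_model` and `post_to_slack` callables, which stand in for whatever ChatGPT and Slack clients the real service uses. The talk does not describe the service's code.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class FailureReport:
    """Hypothetical payload Jenkins might send to the internal service."""
    service: str
    build_url: str
    error_text: str


def build_prompt(report: FailureReport) -> str:
    """Wrap the extracted error in instructions for the model (assumed wording)."""
    return (
        "You are helping a developer fix a failed CI build.\n"
        "Explain the likely cause and suggest a fix in a few sentences.\n\n"
        f"Service: {report.service}\n"
        f"Error:\n{report.error_text}"
    )


def handle_failure(
    report: FailureReport,
    ask_model: Callable[[str], str],
    post_to_slack: Callable[[str], None],
) -> str:
    """Ask the model about the failure and forward its answer to Slack."""
    explanation = ask_model(build_prompt(report))
    message = (
        f"Build failed for {report.service}: {report.build_url}\n"
        f"{explanation}"
    )
    post_to_slack(message)
    return message
```

Passing the model and Slack clients in as callables keeps the core logic independent of any particular SDK and easy to exercise with fakes.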
6) Key technical challenges (limitations of the solution)
Extracting the real error from huge logs
- Humans can spot signal vs noise; automation is harder.
- Approach:
  - Wrap commands in try/catch
  - Append identifiable markers/messages so the pipeline can locate the likely relevant error line
  - Scan for tool-specific failure patterns (e.g., around Checkmarx scan failures) to decide what to forward to AI
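One way to picture the marker-plus-pattern approach is the sketch below. It assumes the pipeline's catch block prints a marker line right after the failing command's output. The marker string `[PIPELINE-STEP-FAILED]`, the function name `extract_error`, and the context window size are invented for illustration. The only tool pattern used is Maven's `[ERROR]` line prefix; patterns for other tools such as Checkmarx would need to be derived from their real output.

```python
import re
from typing import Optional

# Hypothetical marker that the pipeline's catch block appends to the log.
MARKER = "[PIPELINE-STEP-FAILED]"

# Tool-specific failure patterns. Maven prefixes error lines with "[ERROR] ";
# entries for other tools are left out because their formats are not in the talk.
TOOL_PATTERNS = {
    "maven": re.compile(r"^\[ERROR\] "),
}


def extract_error(log: str, context: int = 5) -> Optional[str]:
    """Return the likely error text preceding the last marker, or None."""
    lines = log.splitlines()

    # Locate the last marker line; no marker means nothing to forward.
    marker_idx = None
    for i, line in enumerate(lines):
        if MARKER in line:
            marker_idx = i
    if marker_idx is None:
        return None

    # Look at a small window of lines just before the marker.
    window = lines[max(0, marker_idx - context):marker_idx]

    # Prefer lines matching a known tool pattern; otherwise keep the window.
    hits = [
        line for line in window
        if any(p.search(line) for p in TOOL_PATTERNS.values())
    ]
    return "\n".join(hits or window)
```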
Data sensitivity / preventing leakage
- Concern: customer data and secrets might appear in build output.
- They filter/block what gets sent using regex-like checks (e.g., detecting “password” patterns).
- Still imperfect; they block sending when sensitive content is suspected.
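A block-on-suspicion filter of this kind could be sketched as below. Only the "password" check is mentioned in the talk; the other patterns and the function name `contains_sensitive` are assumptions added for illustration, and a real deny-list would need to be tuned to the organisation's own secrets and data formats.

```python
import re

# Patterns that suggest secrets in build output. Only "password" comes from the
# talk; the rest are illustrative assumptions.
SENSITIVE_PATTERNS = [
    re.compile(r"password", re.IGNORECASE),
    re.compile(r"secret", re.IGNORECASE),
    re.compile(r"api[_-]?key", re.IGNORECASE),
    re.compile(r"-----BEGIN [A-Z ]*PRIVATE KEY-----"),
]


def contains_sensitive(text: str) -> bool:
    """Return True if any pattern matches, meaning the text should not be sent."""
    return any(pattern.search(text) for pattern in SENSITIVE_PATTERNS)
```

Blocking the whole message rather than redacting the match errs on the side of caution, at the cost of occasionally withholding a harmless error (for example, a log line that merely mentions a password field).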
Avoid sending irrelevant failures
- Many Jenkins failures are generic (e.g., wrapper errors such as failures that “start with …”).
- If sent, ChatGPT may produce unhelpful output.
- They improved relevancy over time (claimed reduction from ~60–70% irrelevant down to ~20%).
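The "AI-worthy" decision can be illustrated with a simple prefix check. This is a sketch under assumptions: the talk does not list the actual generic messages, so the prefixes are passed in by the caller, and the function name `is_ai_worthy` and the minimum-length threshold are hypothetical.

```python
from typing import Iterable


def is_ai_worthy(
    error_text: str,
    generic_prefixes: Iterable[str],
    min_length: int = 20,
) -> bool:
    """Decide whether an extracted error is specific enough to send to the model."""
    text = error_text.strip()

    # Very short messages rarely carry enough detail to explain (assumed cutoff).
    if len(text) < min_length:
        return False

    # Skip generic wrapper errors identified by how the message starts.
    return not any(text.startswith(prefix) for prefix in generic_prefixes)
```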
Internal tooling failures
- If an internal tool/process fails, ChatGPT may not understand it.
- They try to catch/label internal-tool-specific errors and sometimes provide “alternative solutions” without involving ChatGPT.
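Answering labelled internal-tool errors without the model might be as simple as a lookup table consulted before any AI call. The sketch below assumes errors carry a recognisable label; the function name `canned_answer` and the label and advice strings a caller would supply are hypothetical.

```python
from typing import Mapping, Optional


def canned_answer(
    error_text: str,
    advice_by_label: Mapping[str, str],
) -> Optional[str]:
    """Return prepared advice if the error carries a known internal-tool label."""
    for label, advice in advice_by_label.items():
        if label in error_text:
            return advice
    # No known label: the caller can fall back to the AI path.
    return None
```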
Measuring effectiveness
- Monitor whether support requests in Slack drop and whether AI explanations match reality.
- Evaluate outputs by reviewing Slack failure reports and developer questions based on the AI-provided content.
7) Governance: why developers don’t manage pipelines directly
- DevOps maintains shared Jenkins libraries and pipeline steps; developers submit changes via merge requests.
- Reason: developers previously had full permissions and sometimes skipped critical steps (e.g., security scans) to unblock builds, causing production deployments to miss security gates.
- This constraint is described as “not ideal,” but it ensures security/scans remain mandatory.
8) Tooling answers from Q&A (selected points)
- Slack routing:
  - Slack channels are mapped per service/team using an internal database (service name, repo, owner/team, pipeline type, and Slack channel).
- Jenkins plugin for log analysis:
  - They didn’t rely on older Jenkins plugins (some were outdated/unmaintained).
  - Integration is HTTP-based communication via an internal Python service rather than a Jenkins plugin specific to ChatGPT.
- Relevance/retry behavior:
  - They didn’t emphasize retry de-duplication; they mainly handle first/last relevant failure based on their capture strategy.
- Team size:
  - Neil’s CI/CD DevOps team is ~5 people (within a larger DevOps structure).
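The Slack routing answer above describes a per-service record with five fields. A minimal in-memory sketch of that mapping is shown below; the class name `ServiceRecord`, the field names, and the lookup function `channel_for` are assumptions, and the real data lives in an internal database whose schema was not shown.

```python
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass(frozen=True)
class ServiceRecord:
    """One row of the (assumed) service registry used for Slack routing."""
    service: str
    repo: str
    team: str
    pipeline_type: str
    slack_channel: str


def channel_for(
    service: str,
    records: Iterable[ServiceRecord],
    default: Optional[str] = None,
) -> Optional[str]:
    """Return the Slack channel registered for a service, or the default."""
    for record in records:
        if record.service == service:
            return record.slack_channel
    return default
```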
Main speaker/source
- Neil Coen — CI/CD team lead, SAP Giga, Israel