Summary of "Integrating ChatGPT into CI/CD Pipelines | DevOpsCon"

The talk describes how Neil Coen (CI/CD lead at SAP Giga) integrated ChatGPT into a large-scale CI/CD setup to cut the volume of DevOps support requests, especially those caused by build and deployment failures in a microservices environment.

1) Motivation: reduce CI/CD “support issues”

Although the CI/CD and deployment process is owned end-to-end by developers (they trigger deployments to production), DevOps still supports:

CI/CD pipelines and tooling
Infrastructure usage (e.g., Jenkins/TeamCity, binaries/artifacts)
Security/static analysis tooling
- Example tools: Checkmarx, SonarQube, Black Duck, and dependency scanning (described as “Maven dependencies tools” or equivalents)
Deployment troubleshooting
- Runtime access to logs and error analysis

Pain points included:

Too many support requests when pipelines fail (broken builds, failing scans, integration issues)
Hard-to-interpret failures in large console logs
Junior or inexperienced developers who need guidance, and developers who do not read the logs carefully
DevOps time/resources getting consumed by “support mode,” limiting improvements to pipelines

2) Production/CI-CD environment (SAP Giga)

Scale and architecture:

~300 microservices
~800 production deployments per month
~150,000 monthly builds/jobs on TeamCity/Jenkins
Kubernetes at scale:
- ~30 Kubernetes clusters
- ~650 nodes
Also uses VMs:
- ~6,500 VMs
80 build agents, including ephemeral Kubernetes/Docker-based agents that are spun up and torn down on demand
Tech variety across services:
- Primarily .NET Core (Docker + Kubernetes)
- Also Java/Scala, Maven, npm, Python, etc.
Cloud/data centers:
- AWS, Azure, and Alibaba Cloud (APAC/US/Europe)

3) CI/CD workflow (high level)

Inputs/flow:

Developers push code to GitLab (hosted instance).
CI runs primarily on Jenkins (TeamCity exists as “legacy”).
Deployments are Kubernetes-based using Helm.
Build artifacts go to registries (e.g., Nexus/Artifactory/Harbor and Docker registries).
Security checks/static analysis run (e.g., SonarQube and dependency scanning).
Notifications:
- Slack alerts to developers
- Reporting in Kibana/Elastic

Deployment model:

The team practices Continuous Delivery: developers decide when to deploy and trigger the deployment themselves; DevOps does not deploy on their behalf.
Two deployment paths:
- Jenkins deployment pipeline
- In-house deployment tool
Sanity and application tests run as part of each deployment.
A failed deployment triggers an automatic rollback to the previous version.
Typical deployment sanity cycle: ~5 minutes
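The deploy, sanity-check, and rollback cycle described above can be sketched as follows. This is a minimal illustration only: the function name `deploy_with_rollback` and the injected `deploy` and `sanity_check` callables are hypothetical stand-ins, since the talk does not describe how the Jenkins pipeline or the in-house tool implements rollback.

```python
from typing import Callable


def deploy_with_rollback(
    new_version: str,
    previous_version: str,
    deploy: Callable[[str], None],
    sanity_check: Callable[[], bool],
) -> str:
    """Deploy new_version; fall back to previous_version if anything fails.

    Returns the version that is live when the function finishes.
    """
    try:
        deploy(new_version)
        healthy = sanity_check()
    except Exception:
        # Assumption: a deploy or check that raises counts as a failed deployment.
        healthy = False

    if healthy:
        return new_version

    # Automatic rollback: redeploy the version that was live before.
    deploy(previous_version)
    return previous_version


# Usage with stand-in callables (no real cluster involved).
history = []
live = deploy_with_rollback(
    "1.4.0", "1.3.2", deploy=history.append, sanity_check=lambda: False
)
print(live, history)
```

In a real setup the `deploy` callable would wrap whatever Helm or in-house command performs the release; that part is deliberately left out here.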

4) Microservices support model (how issues are handled)

No ticketing system
Instead, a public Slack channel monitored by on-call DevOps:
- Shared visibility for everyone (status and ongoing issues)
- Easier ad-hoc communication
- Interactive help (developers can answer each other)
- Enables bots (e.g., routing to on-call, after-hours messages, suggestions if logs/links are provided)

Common support scenarios:

Pipeline/tool/process bugs
Junior developer guidance (how to read logs)
“RTFM”-type issues (developers not reading logs)
Complicated troubleshooting requiring CI/CD expertise

5) ChatGPT integration design

Goal: when Jenkins detects an actionable failure, send that failure to an AI service that returns a developer-friendly explanation and fix guidance.

Architecture described:

A Jenkins pipeline fails → Jenkins shared logic decides whether the failure is “AI-worthy.”
If yes, Jenkins sends the error to an internal Python server running on Kubernetes.
The Python service calls ChatGPT and returns results to Slack.
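A minimal sketch of what the internal Python service might do with a failure it receives from Jenkins. Everything here is an assumption for illustration: the `FailureReport` fields, the prompt wording, and the `ask_model` and `post_to_slack` callables (which stand in for the real ChatGPT and Slack clients) are not taken from the talk.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass
class FailureReport:
    service: str      # name of the failing service
    job_url: str      # link back to the Jenkins job
    error_text: str   # the extracted error, not the whole console log


def build_prompt(report: FailureReport) -> str:
    """Turn a failure report into a question for the model."""
    return (
        "A CI build failed. Explain the likely cause in plain language "
        "and suggest a concrete fix.\n\n"
        f"Service: {report.service}\n"
        f"Error:\n{report.error_text}"
    )


def handle_failure(
    report: FailureReport,
    channel: str,
    ask_model: Callable[[str], str],
    post_to_slack: Callable[[str, str], None],
) -> str:
    """Ask the model for an explanation and forward it to Slack."""
    explanation = ask_model(build_prompt(report))
    message = (
        f"Build failed for {report.service}: {report.job_url}\n{explanation}"
    )
    post_to_slack(channel, message)
    return message


# Usage with stand-ins instead of real network clients.
sent: List[Tuple[str, str]] = []
report = FailureReport(
    service="billing-api",
    job_url="https://jenkins.example/job/42",
    error_text="[ERROR] App.java:[3,20] ';' expected",
)
message = handle_failure(
    report,
    "#billing-builds",
    ask_model=lambda prompt: "A semicolon is missing on line 3 of App.java.",
    post_to_slack=lambda channel, text: sent.append((channel, text)),
)
print(message)
```

Passing the model and Slack clients in as callables keeps the flow testable without network access; the production service would presumably sit behind an HTTP endpoint that Jenkins calls.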

Practical behavior:

Developers receive Slack messages explaining the cause and suggesting the fix.
Example: intentionally breaking a Java/Maven build (e.g., missing semicolon)
- Jenkins fails
- AI explains what’s wrong and what to do
- Devs don’t need to ask DevOps

6) Key technical challenges (limitations of the solution)

Extracting the real error from huge logs
- Humans can spot signal vs noise; automation is harder.
- Approach:
  - Wrap commands in try/catch
  - Append identifiable markers/messages so the pipeline can locate the likely relevant error line
  - Scan for tool-specific failure patterns (e.g., around Checkmarx scan failures) to decide what to forward to AI
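The marker approach can be illustrated from the log-scanning side. In this sketch the marker string `##PIPELINE_ERROR##` is hypothetical (the talk does not name the marker), and the fallback relies only on the `[ERROR]` prefix that Maven prints for error lines. The try/catch wrapper that writes the marker would live in the Jenkins pipeline itself and is not shown.

```python
import re
from typing import Optional

# Hypothetical marker that a pipeline wrapper appends when a command fails.
ERROR_MARKER = "##PIPELINE_ERROR##"

# Maven prefixes error lines with "[ERROR]"; capture the rest of the line.
MAVEN_ERROR = re.compile(r"^\[ERROR\][ \t]+(.*\S)", re.MULTILINE)


def extract_error(log: str, context: int = 2) -> Optional[str]:
    """Return the most likely error snippet from a console log, or None."""
    lines = log.splitlines()
    for i, line in enumerate(lines):
        if ERROR_MARKER in line:
            # Keep a few preceding lines for context, then the marked message.
            start = max(0, i - context)
            marked = line.replace(ERROR_MARKER, "").strip()
            return "\n".join(lines[start:i] + [marked])

    # No marker found: fall back to a tool-specific pattern.
    match = MAVEN_ERROR.search(log)
    return match.group(1) if match else None


log = (
    "step 1 ok\n"
    "compiling\n"
    "##PIPELINE_ERROR## mvn package exited with code 1\n"
    "cleanup"
)
print(extract_error(log))
```

Only the returned snippet, rather than the full console log, would then be forwarded for explanation.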
Data sensitivity / preventing leakage
- Concern: customer data and secrets might appear in build output.
- They screen outgoing text with regex-style checks (e.g., detecting “password” patterns) and block the request entirely when sensitive content is suspected.
- The filtering is acknowledged to be imperfect.
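A sketch of such a pre-send check, under assumptions: the talk only mentions detecting “password” patterns, so the specific patterns below (key/value secrets, PEM private-key headers, and the `AKIA` prefix of AWS access key IDs) and the function name `is_safe_to_send` are illustrative choices, not the team's actual rules.

```python
import re

# Illustrative patterns only; a real deny-list would be broader and tuned over time.
SENSITIVE_PATTERNS = [
    # key/value style secrets such as "DB_PASSWORD=..." or "token: ..."
    re.compile(r"(?i)(password|passwd|secret|token|api[_-]?key)\w*\s*[:=]"),
    # PEM private key header
    re.compile(r"-----BEGIN [A-Z ]*PRIVATE KEY-----"),
    # AWS access key IDs: "AKIA" followed by 16 uppercase letters or digits
    re.compile(r"\bAKIA[0-9A-Z]{16}\b"),
]


def is_safe_to_send(text: str) -> bool:
    """Return False if any sensitive-looking pattern appears in the text."""
    return not any(pattern.search(text) for pattern in SENSITIVE_PATTERNS)


print(is_safe_to_send("[ERROR] App.java:[3,20] ';' expected"))
print(is_safe_to_send("connecting with DB_PASSWORD=hunter2"))
```

Blocking the whole request on any hit, rather than trying to redact the match, errs on the side of sending nothing, which matches the behaviour described above.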
Avoid sending irrelevant failures
- Many Jenkins failures are generic wrapper errors (e.g., messages that “start with …”) that say nothing about the root cause.
- If sent, ChatGPT may produce unhelpful output.
- Relevance improved over time: the claimed share of irrelevant results dropped from ~60–70% to ~20%.
Internal tooling failures
- If an internal tool/process fails, ChatGPT may not understand it.
- They try to catch/label internal-tool-specific errors and sometimes provide “alternative solutions” without involving ChatGPT.
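The two filters above (skipping generic wrapper failures and answering internal-tool failures without ChatGPT) amount to a small routing decision. The sketch below is hypothetical throughout: the prefixes are common generic Jenkins messages chosen for illustration, not the ones elided in this summary, and `deployctl` and its advice text are invented placeholders for an internal tool.

```python
from typing import Optional, Tuple

# Illustrative generic messages that carry no root cause (assumed, not from the talk).
GENERIC_PREFIXES = ("script returned exit code", "hudson.AbortException")

# Hypothetical internal tool name mapped to a canned, human-written answer.
INTERNAL_TOOL_ADVICE = {
    "deployctl": "This is an internal deployment tool error; see the team runbook.",
}


def route_failure(error: str) -> Tuple[str, Optional[str]]:
    """Decide what to do with an extracted error.

    Returns ("skip", None), ("canned", advice) or ("ai", None).
    """
    text = error.strip()
    if not text or text.startswith(GENERIC_PREFIXES):
        return ("skip", None)       # too generic to be worth sending
    for tool, advice in INTERNAL_TOOL_ADVICE.items():
        if tool in text:
            return ("canned", advice)  # the model would not know this tool
    return ("ai", None)


print(route_failure("script returned exit code 1"))
print(route_failure("deployctl: release lock is held"))
print(route_failure("[ERROR] App.java:[3,20] ';' expected"))
```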
Measuring effectiveness
- Monitor whether support messages in the Slack channel drop and whether the AI explanations match the real cause.
- Evaluate outputs by reviewing the Slack failure reports and the follow-up questions developers ask about the AI-provided content.

7) Governance: why developers don’t manage pipelines directly

DevOps maintains shared Jenkins libraries and pipeline steps; developers submit changes via merge requests.
Reason: developers previously had full permissions and sometimes skipped critical steps (e.g., security scans) to unblock builds, causing production deployments to miss security gates.
This constraint is described as “not ideal,” but it ensures security/scans remain mandatory.

8) Tooling answers from Q&A (selected points)

Slack routing:
- Slack channels are mapped per service/team using an internal database (service name, repo, owner/team, pipeline type, and Slack channel).
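The per-service mapping could look like the following lookup. The record fields mirror the ones listed above; the sample entry, the in-memory dictionary standing in for the internal database, and the fallback channel `#ci-support` are assumptions made for the example.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass(frozen=True)
class ServiceRecord:
    name: str
    repo: str
    team: str
    pipeline_type: str
    slack_channel: str


# Stand-in for the internal database; the entry is invented for illustration.
REGISTRY: Dict[str, ServiceRecord] = {
    "billing-api": ServiceRecord(
        name="billing-api",
        repo="git@gitlab.example:payments/billing-api.git",
        team="payments",
        pipeline_type="maven",
        slack_channel="#payments-builds",
    ),
}

# Assumed fallback when a service has no registered channel.
DEFAULT_CHANNEL = "#ci-support"


def channel_for(service: str, registry: Dict[str, ServiceRecord] = REGISTRY) -> str:
    """Return the Slack channel that should receive failures for a service."""
    record = registry.get(service)
    return record.slack_channel if record else DEFAULT_CHANNEL


print(channel_for("billing-api"))
print(channel_for("unknown-service"))
```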
Jenkins plugin for log analysis:
- They didn’t rely on older Jenkins plugins (some were outdated/unmaintained).
- Integration is HTTP-based communication via an internal Python service rather than a Jenkins plugin specific to ChatGPT.
Relevance/retry behavior:
- Retry de-duplication was not a focus; their capture strategy mainly picks up the first or last relevant failure.
Team size:
- Neil’s CI/CD DevOps team is ~5 people (within a larger DevOps structure).