Summary of "ИИ-агенты — вот что действительно изменит разработку. Пишем ИИ-агент на Python, LangChain и GigaChat" ("AI agents are what will truly change development. Writing an AI agent in Python, LangChain and GigaChat")
High-level summary
Thesis: LLM-based agents — systems that combine a large language model (LLM) with callable tools — are the thing that will truly change software development. Rather than replacing programmers, this paradigm empowers developers to write connectors and let models decide which code/tool to call and with what arguments.
Definition of an agent
- An agent is an LLM that:
  - understands natural language,
  - is given explicit tools / code (Python functions, connectors),
  - decides autonomously when and how to call those tools to solve tasks and interact with the outside world.
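The definition above can be sketched as a minimal tool-calling loop. This is a toy illustration, not the presenter's code: the "LLM" is a stub that always picks the same tool, where a real agent framework would let the model choose the tool and its arguments.

```python
from dataclasses import dataclass
from typing import Callable

@dataclass
class ToolCall:
    """A tool-invocation decision, as a real model would emit it."""
    name: str
    args: dict

def fetch_weather(city: str) -> str:
    """Canned connector standing in for real deterministic code."""
    return f"Sunny in {city}"

# Registry of developer-written connectors the model may call.
TOOLS: dict[str, Callable] = {"fetch_weather": fetch_weather}

def fake_llm(prompt: str) -> ToolCall:
    # Hypothetical stub: a real LLM would interpret the prompt here.
    return ToolCall(name="fetch_weather", args={"city": "Moscow"})

def run_agent(prompt: str) -> str:
    call = fake_llm(prompt)   # the model decides which tool to call
    tool = TOOLS[call.name]   # look up the developer-written connector
    return tool(**call.args)  # deterministic execution in ordinary code

print(run_agent("What's the weather in Moscow?"))  # Sunny in Moscow
```

The key point is the division of labor: the model only produces a structured decision (tool name plus arguments), while all side effects happen in plain, testable Python.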
Technical concepts and architecture
- Core difference vs. classic programs: traditional programs encode branching, loops, and storage directly in code (ifs, loops, memory). Agents shift decision-making into the LLM: the model interprets natural-language intent and chooses which connector/function to invoke and with which parameters.
- Tools / connectors: small, explicit functions written by developers (typically in Python) that perform deterministic work (e.g., generate a PDF, fetch email attachments, upload files to the LLM). The LLM calls these tools instead of generating everything itself.
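A connector is typically just a well-documented function. The sketch below is illustrative (the names and the stub body are assumptions, not the presenter's code); the docstring matters because agent frameworks pass it to the LLM so the model knows what the tool does and what arguments it expects.

```python
from dataclasses import dataclass

@dataclass
class Attachment:
    """An email attachment as returned by the mail connector."""
    filename: str
    content: bytes

def fetch_email_attachments(subject_filter: str) -> list[Attachment]:
    """Fetch attachments from recent emails whose subject contains subject_filter.

    This docstring is what the LLM "reads" to decide when and how
    to call the tool, so it should describe inputs and outputs plainly.
    """
    # Stub: a real connector would talk to an IMAP server here.
    return [Attachment(filename="details.docx", content=b"...")]
```

Clear parameter names and return types make the tool easier for both the model and future maintainers to use correctly.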
- Agents as business-process interfaces: business logic can be expressed as natural-language prompts while connectors implement interactions with external systems (email, databases, a PDF generator, etc.). Chat serves as a lightweight UI and can replace or simplify traditional BPM systems.
- Hallucination risk and mitigation: if a single agent both searches external sources and performs downstream actions, the model can invent or alter facts. A practical mitigation is to split the work into simple agents chained sequentially. Example: Agent A finds a filename in the mail and returns it; Agent B, given the verified local file, generates the invoice/act.
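The split can be sketched as two narrowly scoped stages with an explicit, checkable handoff. Function names and data shapes here are hypothetical; in the real demo each "agent" is an LLM with a restricted toolset, while this toy version uses plain functions to show the pipeline shape.

```python
def agent_a_find_file(emails: list[dict]) -> str:
    """Agent A: may only search mail; returns a filename or 'missing'."""
    for email in emails:
        for name in email.get("attachments", []):
            if "details" in name:
                return name
    return "missing"

def agent_b_generate_act(filename: str) -> str:
    """Agent B: only sees the verified file, never the raw mailbox."""
    return f"act_generated_from_{filename}.pdf"

emails = [{"attachments": ["logo.png", "company_details.docx"]}]
found = agent_a_find_file(emails)
if found != "missing":
    # The handoff is deterministic code, so a hallucinated filename
    # from Agent A fails here instead of silently corrupting the act.
    result = agent_b_generate_act(found)
```

Because each stage has a single narrow job, a wrong answer from one stage is caught at the boundary rather than compounding downstream.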
Hands-on tutorial / practical demo
Goal: build an agent pipeline that automatically produces accounting documents (acts/invoices) from incoming documents/emails using Python, LangChain-style tooling, and GigaChat.
Stack used
- Python for connectors and orchestration.
- LangChain / LangGraph for creating agents and wiring tools (demo referenced LangGraph/react-agent style interfaces).
- Sberbank’s GigaChat (GigaChat 2 Max) as the LLM (50k free tokens after registration; a price of ~1,950 RUB per ~1M tokens was referenced).
- Typst (a typesetting tool invoked via subprocess) to render PDFs from JSON/templates; the presenter likened it to a modern LaTeX replacement.
- Data classes to represent nested structures (bank details, counterparty info, jobs list).
- Optional Postgres or other stores to persist chat history (connectors available).
Step-by-step breakdown (demo)
- Create project, virtualenv, install packages (LangChain / LangGraph connectors, Gigachat connector).
- Set up .env with API keys/credentials for Gigachat.
- Define data models (email, bank details, jobs) and a deterministic function generate_pdf_act(custom, jobs) that consumes structured data and writes a PDF via Typst.
- Implement an LMAgent class that wraps the LLM + tools, with helper methods like upload_file and invoke (send messages / attachments).
- Build an initial single-agent flow:
- Upload a file (docx/pdf/text).
- Ask the agent to extract counterparty details and request job lines (task name + cost).
- Call generate_pdf_act to produce the document. The demo showed correct parsing across many formats.
- Extend with mail.py:
- Add a tool to fetch recent emails (returning subject/body/attachments as data classes).
- Add a tool to upload chosen attachments to the LLM.
- Add these tools to the agent capabilities.
- Observed hallucinations when a single agent handled search → upload → generation. The implemented solution: split the flow into two agents with restricted toolsets:
- Agent A: only fetch_recent_emails; returns the filename that contains company details (or “missing”).
- Agent B: only generate_pdf_act and is given the verified local file to extract details and generate the PDF. This sequentialization reduced hallucinations.
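The deterministic core of the pipeline above can be sketched as dataclasses plus a PDF-generation function. This is a minimal reconstruction under assumptions: the field names and the Typst markup are illustrative, not copied from the presenter's repository, and the actual compile step runs only if the `typst` CLI is installed.

```python
import shutil
import subprocess
from dataclasses import dataclass

@dataclass
class BankDetails:
    bank_name: str
    account: str

@dataclass
class Counterparty:
    company: str
    bank: BankDetails

@dataclass
class Job:
    title: str
    cost: float

def generate_pdf_act(customer: Counterparty, jobs: list[Job]) -> str:
    """Render Typst source for the act; compile to PDF when typst is available."""
    rows = "\n".join(f"[{j.title}], [{j.cost:.2f}]," for j in jobs)
    source = f"""= Act of completed work
Customer: {customer.company} ({customer.bank.bank_name}, acc. {customer.bank.account})

#table(columns: 2, [Job], [Cost], {rows})
"""
    with open("act.typ", "w", encoding="utf-8") as f:
        f.write(source)
    if shutil.which("typst"):  # compile only when the CLI is installed
        subprocess.run(["typst", "compile", "act.typ", "act.pdf"], check=True)
    return source
```

The agent never generates the document text itself; it only supplies structured arguments, and the template in code guarantees the layout and the critical fields.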
Demo details
- Use a low temperature (0.1) for more deterministic output and lower creativity.
- System prompt is carefully written with formatting rules (how to format signatory name, wrap company names, etc.).
- The agent + tools successfully parsed many real company detail files and supported multiple formats (docx, pdf, tables, varied layouts).
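These demo details can be illustrated with a paraphrased system prompt and model config. The exact formatting rules from the video are not reproduced here; the rules below are assumptions in the same spirit (signatory format, name wrapping, refusing to invent data), and the config dict mimics typical LLM-client parameters rather than a specific library's API.

```python
# Illustrative system prompt; the wording is paraphrased, not the original.
SYSTEM_PROMPT = """You prepare accounting documents.
Formatting rules:
- Write the signatory as: Surname I. O. (initials after the surname).
- Wrap company names in quotes: OOO "Romashka".
- Never invent bank details; if a field is missing, answer "missing".
"""

# Low temperature keeps output near-deterministic (the demo used 0.1).
LLM_CONFIG = {"model": "GigaChat-2-Max", "temperature": 0.1}
```

Encoding the refusal rule ("answer 'missing'") directly in the prompt gives downstream code a fixed sentinel to branch on instead of free-form apologies.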
Implementation tips and practical notes
- Keep deterministic/authoritative parts (final PDF template, personal/company details) in code you control rather than letting the LLM generate them end-to-end.
- Use clear function names, docstrings, and comments so the LLM can understand what a tool expects and returns.
- Use low temperature for production flows where accuracy matters.
- Log or print attachments and model decisions to audit what the agent uploaded and used.
- For persistent contexts across runs, store chat history externally (Postgres, etc.); LangChain/LangGraph connectors support this.
- Be pragmatic about hallucinations: split complex flows into simple agents with tightly scoped tool access and deterministic handoffs.
- Consider cost and availability of models; the demo used Sber/Gigachat due to regional API availability. Alternatives mentioned: Yandex GPT v5, MTS and VK internal LMs.
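The auditing tip above can be implemented with a plain decorator around every tool. This is a generic sketch (the `upload_file` stub is hypothetical); wrapping connectors this way records exactly what the agent called and with which arguments, without touching the agent logic.

```python
import functools
import logging

logging.basicConfig(level=logging.INFO)
log = logging.getLogger("agent-audit")

def audited(tool):
    """Log every tool call so agent decisions can be audited after the fact."""
    @functools.wraps(tool)
    def wrapper(*args, **kwargs):
        result = tool(*args, **kwargs)
        log.info("tool=%s args=%s kwargs=%s -> %r",
                 tool.__name__, args, kwargs, result)
        return result
    return wrapper

@audited
def upload_file(path: str) -> str:
    # Stub: a real connector would upload the file to the LLM's file API.
    return f"uploaded:{path}"
```

Because `functools.wraps` preserves the function name and docstring, the decorated tool still presents the same metadata to the agent framework.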
Broader implications and analysis
- Agents change the software development paradigm by letting developers write connectors and describe business logic in natural language; this reduces the need for bespoke frontends and complex orchestration code for many workflows.
- Agents can replace complex BPM setups: connectors perform deterministic actions and an LLM orchestrates via natural language.
- The speaker anticipates a large proliferation of agents (citing industry expectations from figures like Mark Zuckerberg and NVIDIA leadership), creating new automation and productization opportunities.
- Practical caution: agents are powerful but not magical — engineering safeguards are required to avoid hallucinations and ensure correctness for critical fields (bank details, signatory names, etc.).
Tutorial & code resources
- The video is a step-by-step demo; the project repository and source code were provided in the video description.
- Walkthroughs covered:
- Project initialization and virtualenv setup.
- Installing LangChain / LangGraph / Gigachat connectors.
- Writing data classes and generate_pdf_act.
- Building LMAgent, adding mail fetching and attachment upload tools.
- Splitting agents to avoid hallucination and running the pipeline on varied document formats.
Main speakers / sources referenced
- Presenter: the video’s author (an entrepreneur / software developer who created the demo and a web development course).
- LLM provider / model used: Sber / GigaChat (GigaChat 2 Max).
- Libraries / tools: LangChain, LangGraph, Python, the Typst typesetting tool.
- Other models/providers mentioned: Yandex GPT v5, MTS internal LM, VK internal LMs, OpenAI (not used in the demo due to regional API restrictions).
- Industry references: comments about an agents boom citing Mark Zuckerberg and NVIDIA leadership.
Category
Technology