Summary of "Маленькие LLM как агенты - тест локальных моделей до 8B" ("Small LLMs as Agents: Testing Local Models up to 8B")

What the video tests

The video evaluates small (~3B–9B parameters) local LLMs used as agents for:

  1. Coding agent task: modify an existing project and implement a very specific UI feature without breaking API contracts.
  2. Web search agent task: search the web for the latest news and posts (Jan–Apr 2026) about a new image generation model, filter for relevant information, and save the results as a JSON file.
  3. Tool-calling benchmark (tool-use mode): models do not write code or search; they must decide whether and how to call tools correctly, or decline to use a tool when none is appropriate.

All runs are done locally with llama.cpp at a context size of about 64k tokens. The authors record execution time, memory usage, and whether the model solved the task.
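The per-run bookkeeping described here (elapsed time plus a solved/not-solved flag) could be captured with a small harness like the one below. This is only a sketch under assumptions: `RunRecord`, `run_task`, and the stand-in task are hypothetical names, not the video's tooling, and memory usage is left out because it would have to be read from the model runtime's process rather than from this script.

```python
import time
from dataclasses import dataclass
from typing import Callable


@dataclass
class RunRecord:
    """One row of the results table (hypothetical layout)."""
    model: str
    seconds: float
    solved: bool


def run_task(model: str, task: Callable[[], bool]) -> RunRecord:
    """Run one agent task and record wall-clock time and success."""
    start = time.perf_counter()
    try:
        solved = bool(task())
    except Exception:
        # A crash inside the agent loop is counted as a failed run.
        solved = False
    elapsed = time.perf_counter() - start
    return RunRecord(model=model, seconds=elapsed, solved=solved)


if __name__ == "__main__":
    # Stand-in task; a real one would drive the agent and check its result.
    print(run_task("example-model", lambda: True))
```

Treating an exception as a failed run keeps one broken model from aborting the whole comparison.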


Test 1: Agent modifies an existing “Focusboard” project

Goal

Using the repo + AgentMD rules, the model must:

Model outcomes (high level)

Key takeaway from Test 1

Small models can reason; the limiting factor is how reliably they combine tool use with code changes. The strongest ones complete the scenario end to end and modify the existing project correctly.


Test 2: Web search agent + save results to file

Goal

Act as an agent with web search and filesystem tools to gather Jan–Apr 2026 news and posts about a new image generation model, filter the relevant items, and save them to a JSON file in the working directory.

Emphasis is on the full pipeline: search → selection → verification → correct JSON writing → correct final output.

Model outcomes (high level)

Key takeaway from Test 2

The main failure mode is not the search itself but generating valid final JSON and writing the file reliably.
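Since the weak point is the last step of the pipeline, it can help to see what a defensive "filter, write, verify" step might look like. The sketch below is illustrative only: the item schema (`title`, `url`, `date`), the function names, and the exact date window boundaries are assumptions, not details from the video.

```python
import json
import os
import tempfile
from datetime import date

# Assumed window for "Jan–Apr 2026"; the exact boundaries are a guess.
WINDOW_START = date(2026, 1, 1)
WINDOW_END = date(2026, 4, 30)
REQUIRED_KEYS = ("title", "url", "date")  # hypothetical schema


def filter_items(items):
    """Keep dict items with all required string fields and an ISO date in the window."""
    kept = []
    for item in items:
        if not isinstance(item, dict):
            continue
        if not all(isinstance(item.get(k), str) and item[k] for k in REQUIRED_KEYS):
            continue
        try:
            published = date.fromisoformat(item["date"])
        except ValueError:
            continue  # unparseable date: drop the item
        if WINDOW_START <= published <= WINDOW_END:
            kept.append(item)
    return kept


def save_results(items, path):
    """Write items as JSON via a temp file, then re-read the file to confirm it parses."""
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".tmp")
    try:
        with os.fdopen(fd, "w", encoding="utf-8") as handle:
            json.dump(items, handle, ensure_ascii=False, indent=2)
        os.replace(tmp_path, path)  # replace the target in one step
    except BaseException:
        if os.path.exists(tmp_path):
            os.remove(tmp_path)
        raise
    with open(path, encoding="utf-8") as handle:
        return json.load(handle) == items
```

Serializing with `json.dump` instead of letting the model emit raw JSON text, and reading the file back afterwards, addresses the two failure points named above: malformed JSON and an unreliable write.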


Test 3: Benchmark for tool-calling behavior (12 prompts)

Purpose

Measure how well local small models:

Scoring

Results highlights

Key takeaway from Benchmark

Many models can “reason,” but only a smaller set behave stably when calling tools as agents, which is crucial for reliable local engineering work.
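To make "stable tool-calling behavior" concrete, one way such a check could be scored is sketched below. The scoring rules, the JSON call format (`name` plus `arguments`), and the `get_weather` tool in the usage example are all assumptions for illustration; the video's actual scoring scheme is not reproduced here.

```python
import json


def parse_tool_call(raw):
    """Parse a tool call from JSON text; return None if there is no valid call."""
    try:
        call = json.loads(raw)
    except (TypeError, ValueError):
        return None  # plain text or malformed JSON: treated as "no tool call"
    if not isinstance(call, dict) or not isinstance(call.get("name"), str):
        return None
    args = call.get("arguments", {})
    if not isinstance(args, dict):
        return None
    return {"name": call["name"], "arguments": args}


def score_case(raw_output, expected):
    """Return True if the model output matches the expected behavior.

    `expected` is None when the correct behavior is to make no tool call;
    otherwise it is a dict with the expected tool name and arguments.
    """
    call = parse_tool_call(raw_output)
    if expected is None:
        return call is None
    return (
        call is not None
        and call["name"] == expected["name"]
        and call["arguments"] == expected["arguments"]
    )


if __name__ == "__main__":
    expected = {"name": "get_weather", "arguments": {"city": "Paris"}}  # hypothetical tool
    print(score_case('{"name": "get_weather", "arguments": {"city": "Paris"}}', expected))
    print(score_case("No tool is needed for this question.", None))
```

Note one simplification: malformed JSON is counted as "no call", so it passes a case where declining is correct. A stricter scorer would track malformed output as its own failure class.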


Main overall conclusion of the video


Main speakers/sources mentioned

Category: Technology

