Video Summary - Сливаю рабочую тему заработка на парсинге ("Leaking a working method for making money with scraping")

Video purpose

Tutorial (Part 1) on how to monetize web scraping by building a robust parser for Wildberries search queries and overcoming site-imposed limits.
Goal: obtain full product coverage (potentially hundreds of thousands of items) despite Wildberries’ pagination/limit protections.

Problem observed

Wildberries loads search results dynamically and returns ~100 product cards per request.
Direct pagination is capped at roughly 60 pages (≈60 × 100 ≈ 6,000 items); the cap was previously about 5,000 items.
Requests for pages beyond that return errors (HTTP 500). Long scrolling depth appears to be treated as bot-like behavior.
Many marketplaces (e.g., Ozon) use similar protections.
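
To make the scale of the problem concrete, the arithmetic above can be sketched in a few lines. The constants are the approximate figures quoted in this summary, not documented limits, and the function names are hypothetical.

```python
import math

PAGE_SIZE = 100  # approximate cards returned per request (from the summary)
MAX_PAGES = 60   # approximate deepest page that still responds (from the summary)


def pages_needed(total_items: int) -> int:
    """Pages required to list every item of a query."""
    return math.ceil(total_items / PAGE_SIZE)


def reachable_items(total_items: int) -> int:
    """Items that plain pagination can reach before the cap is hit."""
    return min(total_items, PAGE_SIZE * MAX_PAGES)


def coverage(total_items: int) -> float:
    """Fraction of a query's items reachable without any splitting."""
    if total_items <= 0:
        return 1.0
    return reachable_items(total_items) / total_items
```

For a query reporting 250,000 items, plain pagination would need 2,500 pages but can reach only about 6,000 items, which is why the query has to be split into smaller ones.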

Monetization ideas

Sell bespoke automations/integrations to businesses.
Sell downloadable parsed datasets (example: Kwork listings with price range ~500–2,000+ rubles per dataset).

High-level scraper strategy (universal approach)

Avoid non-universal filters (shoe size, brand, sellers) because they vary by category and may still exceed limits.
Use the price filter as a universal splitting key: request results for price intervals and recursively split any interval whose item count still exceeds what pagination can reach for a single query (about 6,000 items).
Note that reported counts can be approximate/cached and products appear/disappear — expect slight inaccuracies.

All price values are transmitted in kopecks (divide by 100 for readable RUB).
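
A minimal sketch of the kopeck conversion and of building a price-interval filter follows. The parameter name `priceU` and the `min;max` value format are assumptions that should be checked against the request captured in devtools; the helper names are hypothetical.

```python
KOPECKS_PER_RUB = 100


def rub_to_kopecks(rub: float) -> int:
    """Convert rubles to integer kopecks, rounding away float noise."""
    return round(rub * KOPECKS_PER_RUB)


def kopecks_to_rub(kopecks: int) -> float:
    """Convert kopecks back to rubles for logs and reports."""
    return kopecks / KOPECKS_PER_RUB


def price_filter(min_rub: float, max_rub: float, param_name: str = "priceU") -> dict:
    """Build the query parameter for one price interval.

    Assumption: the site expects "<min>;<max>" in kopecks under `param_name`.
    """
    if min_rub > max_rub:
        raise ValueError("min_rub must not exceed max_rub")
    return {param_name: f"{rub_to_kopecks(min_rub)};{rub_to_kopecks(max_rub)}"}
```

Keeping every internal value in integer kopecks and converting only at the edges avoids floating-point drift when intervals are halved repeatedly.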

Technical stack / tools recommended

Python:
- requests (synchronous helper requests)
- httpx (for later asynchronous product parsing)
- loguru (logging)
- dataclasses (for structured responses)
curl and a curl-to-Python converter (to obtain sample requests)
Cookies/headers captured from browser devtools (used to pass site checks initially; later add protection bypass)
Proxies: Mobile Proxy Space recommended (mobile + server proxies; unlimited traffic for mobile; rotate API key on blocks). Promo code mentioned in the video.
Reference to a previous video that explains security bypass / browser checks.
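
The synchronous helper request could look roughly like the sketch below. It uses the standard library's urllib instead of requests so that it runs without dependencies; with requests the equivalent call would be `requests.get(url, params=..., headers=...)`. The function name, the `opener` argument (there so the function can be exercised without network access), and the URL in the usage note are hypothetical.

```python
import json
import urllib.error
import urllib.parse
import urllib.request


def fetch_json(base_url, params, add_params=None, headers=None,
               timeout=10.0, opener=urllib.request.urlopen):
    """Send a GET request and return the decoded JSON, or None on any error."""
    # Extra parameters (for example a price interval) override the base ones.
    query = {**params, **(add_params or {})}
    url = f"{base_url}?{urllib.parse.urlencode(query)}"
    request = urllib.request.Request(url, headers=headers or {})
    try:
        with opener(request, timeout=timeout) as response:
            return json.loads(response.read().decode("utf-8"))
    except (urllib.error.URLError, OSError, ValueError):
        # Network failures, HTTP errors such as 500, and non-JSON bodies
        # all end up here; the caller only needs to handle None.
        return None
```

A call such as `fetch_json("https://example.invalid/search", {"query": "sneakers"}, headers=captured_headers)` returns a dict on success and None otherwise, so the calling code needs only one failure check.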

Implementation architecture and algorithm (detailed)

Parser class

Create a parsing class (example name: VBsefrasParserRange) initialized with:

search phrase
optional cookies/headers
parameters (examples and recommended defaults):
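- price_step: initial width of a price interval (default 500 RUB, stored as kopecks)
- min_step: narrowest interval width; anything at or below it is not split further (10 RUB)
- max_step: upper bound on the step size
- max_count_of_goods: maximum number of items accepted per interval (safe default: 5,000)
- max_split: maximum recursion depth for splitting (e.g., 10)
- small_count: threshold (e.g., 500) used to decide adaptive step changes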
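
These parameters could be grouped in a small frozen dataclass, as sketched below. The class name and the `max_step` default are hypothetical (the summary does not give a value for `max_step`); the other defaults are the ones listed above, converted to kopecks where they are prices.

```python
from dataclasses import dataclass

RUB = 100  # kopecks per ruble


@dataclass(frozen=True)
class RangeParserConfig:
    price_step: int = 500 * RUB      # initial interval width, in kopecks
    min_step: int = 10 * RUB         # narrowest interval width
    max_step: int = 5_000 * RUB      # hypothetical default, not from the video
    max_count_of_goods: int = 5_000  # item cap accepted per interval
    max_split: int = 10              # recursion depth limit
    small_count: int = 500           # threshold for adaptive step changes

    def __post_init__(self):
        # Reject configurations that would make the step logic meaningless.
        if not 0 < self.min_step <= self.price_step <= self.max_step:
            raise ValueError("expected 0 < min_step <= price_step <= max_step")
        if self.max_count_of_goods <= 0 or self.max_split < 0:
            raise ValueError("max_count_of_goods must be positive, max_split non-negative")
```

Validating the values once at construction keeps the main loop free of defensive checks.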

Key methods

fetch/request:
- Build the request parameters, merge in any extra parameters passed as add_params (for example, the price interval), send a GET with requests, and return the parsed JSON, or None on error.
get_total(data):
- Safely read data['total'] from the JSON.
get_min_max_price(data):
- Find the “price” filter in JSON filters by name and extract min/max price (helper).
get_price_range(json):
- Build and return a dataclass (DataPage) containing min_price, max_price, total.
parse() (main loop) — high-level flow:
1. Fetch the base page (no price filter) to obtain global min_price, max_price, and total.
2. Iterate with start_price from min_price to max_price using a current step:
  - Compute final_price = min(start_price + step, max_price).
  - Request results for that price interval (in kopecks).
  - If interval is empty: log and increase step (sparse zone).
  - If data.total > max_count_of_goods: call split_price_range() to recursively subdivide the interval.
  - If data.total <= max_count_of_goods: accept the interval and append a DataPage(min, max, total).
  - Update start_price = final_price and continue.
split_price_range(min_price, max_price, depth):
- Recursion guard: if depth > max_split, return [].
- If (max_price - min_price) <= min_step, fetch once and return a DataPage if the interval is non-empty (it cannot be split further).
- Otherwise compute mid = (min_price + max_price) // 2 and recurse on the left half (min_price, mid) and the right half (mid + 1, max_price).
- Return concatenated lists of DataPage ranges.
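
The main loop and the recursive split described above can be sketched as follows. This is an illustration under stated assumptions, not the video's code: network access is replaced by a hypothetical `count_items(min_price, max_price)` callback, the step is doubled in sparse zones (the summary only says it is increased), recursion continues only while a half still exceeds the cap, and the `small_count` tuning is left out.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class DataPage:
    min_price: int  # kopecks
    max_price: int  # kopecks
    total: int


# Hypothetical callback: reported item count for a price interval,
# or None when the request failed.
CountFn = Callable[[int, int], Optional[int]]


def split_price_range(count_items: CountFn, min_price: int, max_price: int,
                      max_count: int, min_step: int, max_split: int,
                      depth: int = 0) -> List[DataPage]:
    """Subdivide an interval that is already known to exceed max_count."""
    if depth > max_split:
        return []  # recursion guard; items in this interval are skipped
    if max_price - min_price <= min_step:
        # Too narrow to split: accept it even though it is over the cap.
        total = count_items(min_price, max_price)
        return [DataPage(min_price, max_price, total)] if total else []
    mid = (min_price + max_price) // 2
    pages: List[DataPage] = []
    for low, high in ((min_price, mid), (mid + 1, max_price)):
        total = count_items(low, high)
        if not total:
            continue  # empty half or failed request
        if total > max_count:
            pages += split_price_range(count_items, low, high, max_count,
                                       min_step, max_split, depth + 1)
        else:
            pages.append(DataPage(low, high, total))
    return pages


def find_price_ranges(count_items: CountFn, min_price: int, max_price: int,
                      step: int, max_step: int, max_count: int,
                      min_step: int, max_split: int) -> List[DataPage]:
    """Walk from min_price to max_price, splitting intervals over the cap."""
    ranges: List[DataPage] = []
    start = min_price
    while start < max_price:
        end = min(start + step, max_price)
        total = count_items(start, end)
        if not total:
            step = min(step * 2, max_step)  # sparse zone: widen the step
        elif total > max_count:
            ranges += split_price_range(count_items, start, end, max_count,
                                        min_step, max_split)
        else:
            ranges.append(DataPage(start, end, total))
        # As in the summary, the next interval starts at the previous end.
        # If the filter is inclusive on both sides, items priced exactly at
        # a boundary fall into two intervals, so deduplicate by product ID.
        start = end
    return ranges
```

Passing the counting function in as a callback keeps the algorithm testable against an in-memory price list before it is pointed at the real endpoint.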

Output

The parser returns a list of DataPage objects describing price ranges and the item counts for each range. These ranges are intended for efficient parallel/asynchronous product downloads (with httpx).
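
A DataPage and the helpers that fill it from a response might be sketched as below. The JSON layout is an assumption made for illustration: a top-level `data` object holding `total` and a `filters` list whose price entry has `name == "price"` and carries `minPriceU` and `maxPriceU`. The real field names should be read from a captured response.

```python
from dataclasses import dataclass
from typing import Any, Optional, Tuple


@dataclass
class DataPage:
    min_price: int  # kopecks
    max_price: int  # kopecks
    total: int


def _data(payload: Any) -> dict:
    """Return the assumed top-level "data" object, or {} if it is absent."""
    data = payload.get("data") if isinstance(payload, dict) else None
    return data if isinstance(data, dict) else {}


def get_total(payload: Any) -> int:
    """Read the reported item count, falling back to 0 on any surprise."""
    total = _data(payload).get("total")
    return total if isinstance(total, int) else 0


def get_min_max_price(payload: Any) -> Optional[Tuple[int, int]]:
    """Find the price filter by name and return (min, max) in kopecks."""
    for item in _data(payload).get("filters") or []:
        if isinstance(item, dict) and item.get("name") == "price":
            low, high = item.get("minPriceU"), item.get("maxPriceU")
            if isinstance(low, int) and isinstance(high, int):
                return low, high
    return None


def get_price_range(payload: Any) -> Optional[DataPage]:
    """Combine the price bounds and the total into one DataPage."""
    bounds = get_min_max_price(payload)
    if bounds is None:
        return None
    return DataPage(bounds[0], bounds[1], get_total(payload))
```

Returning None or 0 instead of raising lets the main loop treat a malformed response the same way as an empty interval.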

Practical notes and caveats

Prices are transmitted in kopecks (divide by 100 for RUB).
Headers and device IDs captured from requests can often be reduced/cleaned.
Counts returned by the site can be cached/approximate; results can change during parsing (sales, new listings).
Recursion depth and step sizes are tunable parameters. The example run produced ~56 price ranges for one query.
After generating ranges, the next stage (not covered in this video) is to fetch product lists and parse items asynchronously, then save to Excel or a database.
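
As a bridge to that next stage, the ranges can be expanded into a flat list of (price interval, page) jobs for an asynchronous downloader to consume. This is a hypothetical sketch: the page size of 100 is the approximate figure from this summary, and the function name is illustrative.

```python
import math
from dataclasses import dataclass
from typing import Iterable, Iterator, Tuple


@dataclass
class DataPage:
    min_price: int  # kopecks
    max_price: int  # kopecks
    total: int


PAGE_SIZE = 100  # approximate cards per request (from the summary)


def plan_requests(ranges: Iterable[DataPage],
                  page_size: int = PAGE_SIZE) -> Iterator[Tuple[int, int, int]]:
    """Yield one (min_price, max_price, page) job per page of every range."""
    for item in ranges:
        pages = math.ceil(item.total / page_size)
        for page in range(1, pages + 1):
            yield (item.min_price, item.max_price, page)
```

Because reported counts are approximate, a downloader built on this plan should tolerate a short or empty last page rather than treat it as an error.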

What the video demonstrates

Reverse-engineering Wildberries search fetch requests via browser devtools.
Converting a captured request to Python (curl → Python).
Building an adaptive range-splitting algorithm that works around the pagination cap by slicing the query by price and recursively splitting intervals that are still too large.
Designing code structure (requests layer, helpers, dataclass DataPage, recursion split) and logging to inspect intermediate results.
Practical workflow: capture cookies/headers, test small runs, adjust parameters, then scale with proxies and async requests.

Result shown

On a “men’s sneakers” example, the algorithm produced ~56 price ranges, each with an item count under the configured per-interval cap, which shows that the approach works in practice.

Series / tutorial status

Part 1 (this video): focused on finding price intervals (first stage).
Part 2: will implement asynchronous parsing of products per price range and saving results (not yet filmed/uploaded at the time of this video).

Mentioned services / tools

Wildberries (target marketplace)
Kwork (example marketplace for selling parsed datasets)
Mobile Proxy Space (proxy provider)
curl and curl-to-Python converter
Browser devtools (Network tab, Fetch/XHR filter)

Main speaker / source

Host of the Parshub channel (narrator / developer presenting the tutorial)

Сливаю рабочую тему заработка на парсинге | Кейс Wildberries (Часть 1)
