Video summary

Giving away a working way to make money with web scraping | Wildberries case study (Part 1)

Video purpose

  • Tutorial (Part 1) on how to monetize web scraping by building a robust parser for Wildberries search queries and overcoming site-imposed limits.
  • Goal: obtain full product coverage (potentially hundreds of thousands of items) despite Wildberries’ pagination/limit protections.

Problem observed

  • Wildberries loads search results dynamically and returns ~100 product cards per request.
  • Direct pagination is capped at about 60 pages (60 × 100 ≈ 6,000 items); the earlier cap was about 5,000 items.
  • Requesting pages beyond the cap returns an error (HTTP 500); scrolling that deep appears to be treated as bot-like behavior.
  • Many marketplaces (e.g., Ozon) use similar protections.
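To see why pagination alone cannot reach full coverage, here is a minimal sketch of naive page-by-page collection. It assumes a hypothetical fetch_page(page) callable that returns a list of cards, or None on an error such as HTTP 500. The fake fetcher only simulates the limits described above (100 cards per page, errors after page 60); it does not talk to Wildberries.

```python
from typing import Callable, List, Optional

# Hypothetical page fetcher: 1-based page number -> list of product cards,
# or None when the server answers with an error such as HTTP 500.
PageFetcher = Callable[[int], Optional[List[str]]]


def collect_pages(fetch_page: PageFetcher, max_pages: int = 60) -> List[str]:
    """Naive pagination: stop at an error, an empty page, or the page cap."""
    items: List[str] = []
    for page in range(1, max_pages + 1):
        cards = fetch_page(page)
        if not cards:  # None (error) or [] (no more results)
            break
        items.extend(cards)
    return items


def fake_fetch(page: int) -> Optional[List[str]]:
    """Simulates the observed limits: 100 cards per page, errors after page 60."""
    if page > 60:
        return None
    return [f"card-{page}-{i}" for i in range(100)]


print(len(collect_pages(fake_fetch, max_pages=70)))  # prints 6000
```

However many products actually match the query, this loop stops at about 6,000 items, which is what motivates slicing the query by price instead.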

Monetization ideas

  • Sell bespoke automations/integrations to businesses.
  • Sell downloadable parsed datasets (example: Kwork listings with price range ~500–2,000+ rubles per dataset).

High-level scraper strategy (universal approach)

  • Avoid non-universal filters (shoe size, brand, sellers) because they vary by category and may still exceed limits.
  • Use the price filter as a universal splitting key: request results for price intervals and recursively split intervals that still exceed the site’s per-request item limit.
  • Reported counts may be approximate or cached, and products appear and disappear during a run, so expect slight inaccuracies.

All price values are transmitted in kopecks (divide by 100 for readable RUB).
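Because every price crosses the wire in kopecks, a pair of small helpers keeps the units straight. This is a sketch with hypothetical function names; using Python's decimal module to avoid float rounding in ruble amounts is an implementation choice, not something shown in the video.

```python
from decimal import Decimal
from typing import Union


def rub_to_kopecks(rub: Union[Decimal, int, str]) -> int:
    """Convert rubles to integer kopecks (1 RUB = 100 kopecks), rounding half-to-even."""
    return int((Decimal(str(rub)) * 100).to_integral_value())


def kopecks_to_rub(kopecks: int) -> Decimal:
    """Convert integer kopecks back to a readable ruble amount."""
    return Decimal(kopecks) / 100
```

For example, the 500 RUB default step becomes rub_to_kopecks(500), i.e. 50,000 kopecks, in request parameters.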

Technical stack / tools recommended

  • Python:
    • requests (synchronous helper requests)
    • httpx (for later asynchronous product parsing)
    • loguru (logging)
    • dataclasses (for structured responses)
  • curl and a curl-to-Python converter (to obtain sample requests)
  • Cookies/headers captured from browser devtools (used to pass site checks initially; later add protection bypass)
  • Proxies: Mobile Proxy Space recommended (mobile + server proxies; unlimited traffic for mobile; rotate API key on blocks). Promo code mentioned in the video.
  • Reference to a previous video that explains security bypass / browser checks.
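The tools above meet in the request layer. The sketch below uses the standard library's urllib.request in place of requests so it runs without third-party packages; with requests, the equivalent call is requests.get(url, params=..., headers=..., cookies=..., proxies=..., timeout=...). fetch_json is a hypothetical helper name, and the URL, parameter names and proxy format are placeholders, not the real Wildberries endpoint.

```python
import json
import urllib.parse
import urllib.request
from typing import Any, Dict, Optional


def fetch_json(url: str, params: Dict[str, Any],
               add_params: Optional[Dict[str, Any]] = None,
               headers: Optional[Dict[str, str]] = None,
               cookies: Optional[Dict[str, str]] = None,
               proxy: Optional[str] = None,
               timeout: float = 10.0) -> Optional[dict]:
    """GET `url` with merged query params; return parsed JSON, or None on error."""
    query = {**params, **(add_params or {})}  # add_params override base params
    request_headers = dict(headers or {})
    if cookies:
        # Send cookies captured from the browser as a single Cookie header.
        request_headers["Cookie"] = "; ".join(f"{k}={v}" for k, v in cookies.items())
    request = urllib.request.Request(
        f"{url}?{urllib.parse.urlencode(query)}", headers=request_headers
    )
    handlers = []
    if proxy:  # e.g. "http://user:pass@host:port" (placeholder format)
        handlers.append(urllib.request.ProxyHandler({"http": proxy, "https": proxy}))
    opener = urllib.request.build_opener(*handlers)
    try:
        with opener.open(request, timeout=timeout) as response:
            return json.loads(response.read().decode("utf-8"))
    except (OSError, ValueError):
        # URLError/HTTPError (e.g. HTTP 500), timeouts and connection errors are
        # OSError subclasses; invalid JSON or encoding raises ValueError.
        return None
```

Returning None instead of raising lets the calling loop treat a failed request like an empty interval and simply log it.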

Implementation architecture and algorithm (detailed)

Parser class

Create a parsing class (example name: VBsefrasParserRange) initialized with:

  • search phrase
  • optional cookies/headers
  • parameters (examples and recommended defaults):
    • price_step: default 500 RUB (stored as kopecks)
    • min_step: 10 RUB
    • max_step: upper bound for the adaptive step
    • max_count_of_goods per interval (safe default: 5,000)
    • max_split recursion depth for splitting (e.g., 10)
    • small_count threshold (e.g., 500) to decide adaptive step changes
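These constructor parameters map naturally onto a small dataclass. This is a sketch with a hypothetical name (ParserSettings) rather than the video's class; prices are stored in kopecks, and the max_step default of 50,000 RUB is an assumption because the video gives no value.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass(frozen=True)
class ParserSettings:
    """Hypothetical settings holder for the parser; prices are in kopecks."""

    query: str
    cookies: Optional[Dict[str, str]] = None
    headers: Optional[Dict[str, str]] = None
    price_step: int = 500 * 100        # 500 RUB
    min_step: int = 10 * 100           # 10 RUB
    max_step: int = 50_000 * 100       # assumed cap: no value is given in the video
    max_count_of_goods: int = 5_000    # safe per-interval item limit
    max_split: int = 10                # recursion depth guard for splitting
    small_count: int = 500             # below this count the step may grow

    def __post_init__(self) -> None:
        # Basic sanity checks on the step ordering and limits.
        if not 0 < self.min_step <= self.price_step <= self.max_step:
            raise ValueError("expected 0 < min_step <= price_step <= max_step")
        if self.max_count_of_goods <= 0 or self.max_split < 0:
            raise ValueError("max_count_of_goods must be > 0 and max_split >= 0")
```

Freezing the dataclass keeps the settings fixed for the whole run, while the mutable current step lives in the parsing loop itself.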

Key methods

  • fetch/request:
    • Build request params, merge add_params, send GET using requests, return JSON or None on error.
  • get_total(data):
    • Safely read data['total'] from the JSON.
  • get_min_max_price(data):
    • Find the “price” filter in JSON filters by name and extract min/max price (helper).
  • get_price_range(json):
    • Build and return a dataclass (DataPage) containing min_price, max_price, total.
  • parse() (main loop) — high-level flow:
    1. Fetch the base page (no price filter) to obtain global min_price, max_price, and total.
    2. Iterate with start_price from min_price to max_price using a current step:
      • Compute final_price = min(start_price + step, max_price).
      • Request results for that price interval (in kopecks).
      • If the interval is empty (a sparse price zone): log it and increase the step.
      • If data.total > max_count_of_goods: call split_price_range() to recursively subdivide the interval.
      • If data.total <= max_count_of_goods: accept the interval and append a DataPage(min, max, total).
      • Update start_price = final_price and continue.
  • split_price_range(min_price, max_price, depth):
    • Recursion guard: if depth > max_split return [].
    • If (max_price - min_price) <= min_step: fetch once and return a DataPage if non-empty (cannot split further).
    • Else compute mid = (min + max) // 2 and recursively call split on left (min, mid) and right (mid+1, max).
    • Return concatenated lists of DataPage ranges.
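Putting the key methods together, here is a minimal sketch of the range search. It is not the video's code, and several details are assumptions: fetch is a hypothetical callable (fetch(lo, hi) returns the search JSON for that price interval in kopecks, or for the unfiltered search when both are None), the JSON shape of the price filter is guessed, doubling the step is one possible adaptive rule, and the next interval starts at final_price + 1 kopeck rather than at final_price so adjacent intervals do not overlap. split_price_range re-requests the interval it is given and only bisects while the count is still over the limit.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Tuple

# Hypothetical fetcher: JSON for the price interval [lo, hi] in kopecks,
# or for the unfiltered search when both are None; None on request error.
Fetcher = Callable[[Optional[int], Optional[int]], Optional[dict]]


@dataclass
class DataPage:
    min_price: int  # kopecks
    max_price: int  # kopecks
    total: int      # item count reported for this interval


def get_total(data: Optional[dict]) -> int:
    """Safely read data['total']; a failed request counts as 0."""
    return int((data or {}).get("total", 0))


def get_min_max_price(data: dict, name: str = "price") -> Tuple[int, int]:
    """Assumed shape: {"filters": [{"name": ..., "minPrice": ..., "maxPrice": ...}]}."""
    for item in data.get("filters", []):
        if item.get("name") == name:
            return int(item["minPrice"]), int(item["maxPrice"])
    raise KeyError(f"no {name!r} filter in the response")


def split_price_range(fetch: Fetcher, lo: int, hi: int, min_step: int,
                      max_count: int, max_split: int, depth: int = 0) -> List[DataPage]:
    if depth > max_split:
        return []  # recursion guard, as described: this interval is dropped
    total = get_total(fetch(lo, hi))
    if total == 0:
        return []
    if total <= max_count or hi - lo <= min_step:
        # Under the limit, or too narrow to split any further.
        return [DataPage(lo, hi, total)]
    mid = (lo + hi) // 2
    return (split_price_range(fetch, lo, mid, min_step, max_count, max_split, depth + 1)
            + split_price_range(fetch, mid + 1, hi, min_step, max_count, max_split, depth + 1))


def parse(fetch: Fetcher, price_step: int = 50_000, min_step: int = 1_000,
          max_step: int = 5_000_000, max_count: int = 5_000,
          max_split: int = 10, small_count: int = 500) -> List[DataPage]:
    base = fetch(None, None)  # unfiltered search gives the global price bounds
    if base is None:
        return []
    min_price, max_price = get_min_max_price(base)
    pages: List[DataPage] = []
    start, step = min_price, price_step
    while start <= max_price:
        final = min(start + step, max_price)
        total = get_total(fetch(start, final))
        if total == 0:
            step = min(step * 2, max_step)  # sparse zone: widen the window
        elif total > max_count:
            pages += split_price_range(fetch, start, final, min_step, max_count, max_split)
        else:
            pages.append(DataPage(start, final, total))
            if total < small_count:
                step = min(step * 2, max_step)  # few items: widen the next window
        start = final + 1  # next interval starts one kopeck higher (no overlap)
    return pages
```

With a fake in-memory catalog standing in for fetch, it is easy to check that the returned totals add up to the catalog size and that no interval exceeds max_count. The extra request inside split_price_range is the price paid for keeping that function self-contained.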

Output

  • The parser returns a list of DataPage objects describing price ranges and the item counts for each range. These ranges are intended for efficient parallel/asynchronous product downloads (with httpx).
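A minimal sketch of how these ranges could feed the asynchronous stage, assuming about 100 cards per page and a hypothetical coroutine fetch(min_price, max_price, page); in practice it could wrap an httpx.AsyncClient request. This is an illustration of bounded concurrency with asyncio.gather and asyncio.Semaphore, not the Part 2 implementation.

```python
import asyncio
import math
from dataclasses import dataclass
from typing import Awaitable, Callable, List, Tuple


@dataclass
class DataPage:
    min_price: int  # kopecks
    max_price: int  # kopecks
    total: int


# Hypothetical coroutine: (min_price, max_price, page) -> list of product cards.
AsyncFetcher = Callable[[int, int, int], Awaitable[list]]


def plan_requests(ranges: List[DataPage], per_page: int = 100) -> List[Tuple[int, int, int]]:
    """Expand each price range into (min_price, max_price, page) jobs."""
    return [
        (r.min_price, r.max_price, page)
        for r in ranges
        for page in range(1, math.ceil(r.total / per_page) + 1)
    ]


async def download_all(ranges: List[DataPage], fetch: AsyncFetcher,
                       concurrency: int = 10) -> list:
    """Run all page requests concurrently, at most `concurrency` at a time."""
    semaphore = asyncio.Semaphore(concurrency)

    async def run(job: Tuple[int, int, int]) -> list:
        async with semaphore:
            return await fetch(*job)

    pages = await asyncio.gather(*(run(job) for job in plan_requests(ranges)))
    return [card for page in pages for card in page]  # flatten, order preserved
```

Since each range holds at most about 5,000 items, it needs at most about 50 pages, which stays under the roughly 60-page pagination cap.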

Practical notes and caveats

  • Prices are transmitted in kopecks (divide by 100 for RUB).
  • Headers and device IDs copied from captured requests can often be trimmed down to a smaller set.
  • Counts returned by the site can be cached/approximate; results can change during parsing (sales, new listings).
  • Recursion depth and step sizes are tunable parameters. The example run produced ~56 price ranges for one query.
  • After generating ranges, the next stage (not covered in this video) is to fetch product lists and parse items asynchronously, then save to Excel or a database.
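One way to trim captured headers is a greedy pass: drop each header in turn and leave it out if the request still succeeds. This is a sketch of a generic technique, not something shown in the video; works(headers) is a hypothetical probe that would send one real request and return True on success. If acceptance is monotone (extra headers never break a request), no single header left in the result can be removed.

```python
from typing import Callable, Dict

# Hypothetical probe: sends one request with these headers, True on success.
Probe = Callable[[Dict[str, str]], bool]


def minimize_headers(headers: Dict[str, str], works: Probe) -> Dict[str, str]:
    """Greedy one-pass reduction: drop each header the request does not need."""
    kept = dict(headers)
    for name in list(headers):
        trial = {k: v for k, v in kept.items() if k != name}
        if works(trial):
            kept = trial  # still succeeds without `name`, so leave it out
    return kept
```

Each probe costs one request, so this is worth running once on a captured request rather than on every run.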

What the video demonstrates

  • Reverse-engineering Wildberries search fetch requests via browser devtools.
  • Converting a captured request to Python (curl → Python).
  • Building an adaptive range-splitting algorithm to bypass per-request limits by slicing by price with recursion.
  • Designing code structure (requests layer, helpers, dataclass DataPage, recursion split) and logging to inspect intermediate results.
  • Practical workflow: capture cookies/headers, test small runs, adjust parameters, then scale with proxies and async requests.

Result shown

  • On a “men’s sneakers” example, the algorithm produced ~56 price ranges with counts under the per-request limit, demonstrating the approach’s effectiveness.
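A quick sanity check on a run like this is to confirm that every range is under the per-request limit, that ranges do not overlap, and that their counts add up to roughly the site's reported total. The helper name and the 2% tolerance are assumptions, chosen because reported counts can be cached or shift during parsing.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class DataPage:
    min_price: int  # kopecks
    max_price: int  # kopecks
    total: int


def check_ranges(ranges: List[DataPage], reported_total: int,
                 max_count: int = 5_000, tolerance: float = 0.02) -> dict:
    """Summarize whether a list of price ranges looks usable for downloading."""
    ordered = sorted(ranges, key=lambda r: r.min_price)
    over_limit = [r for r in ordered if r.total > max_count]
    overlaps = [(a, b) for a, b in zip(ordered, ordered[1:]) if b.min_price <= a.max_price]
    covered = sum(r.total for r in ordered)
    drift = abs(covered - reported_total) / reported_total if reported_total else 0.0
    return {
        "ranges": len(ordered),
        "over_limit": over_limit,
        "overlaps": overlaps,
        "covered": covered,
        "within_tolerance": drift <= tolerance,
    }
```

Ranges flagged in over_limit are the ones that hit min_step or the recursion guard and may need a smaller min_step or a deeper max_split.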

Series / tutorial status

  • Part 1 (this video): focused on finding price intervals (first stage).
  • Part 2: will implement asynchronous parsing of products per price range and saving results (not yet filmed/uploaded at the time of this video).

Mentioned services / tools

  • Wildberries (target marketplace)
  • Kwork (example marketplace for selling parsed datasets)
  • Mobile Proxy Space (proxy provider)
  • curl and curl-to-Python converter
  • Browser devtools (Network / Fetch tab)

Main speaker / source

  • Host of the Parshub channel (narrator / developer presenting the tutorial)
