Summary of "Сливаю рабочую тему заработка на парсинге | Кейс Wildberries (Часть 1)" ("Giving away a working way to make money from parsing | Wildberries case study (Part 1)")
Video purpose
- Tutorial (Part 1) on how to monetize web scraping by building a robust parser for Wildberries search queries and overcoming site-imposed limits.
- Goal: obtain full product coverage (potentially hundreds of thousands of items) despite Wildberries’ pagination/limit protections.
Problem observed
- Wildberries loads search results dynamically and returns ~100 product cards per request.
- Direct pagination is capped: you can request up to ~60 pages (≈60 × 100 ≈ 6,000 items); the earlier limit was ≈5,000 items.
- Requests for pages beyond that return errors (HTTP 500). Long scrolling depth appears to be treated as bot-like behavior.
- Many marketplaces (e.g., Ozon) use similar protections.
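The following minimal sketch shows why pagination alone stalls. A naive loop requests pages until one fails. Against a simulated cap of 60 pages × 100 cards, it can never collect more than 6,000 items, however large the catalog is. The names fetch_page, fake_site and PAGE_CAP are hypothetical stand-ins for the real search request, not Wildberries' API.

```python
from typing import Callable, List, Optional

PAGE_SIZE = 100  # ~100 product cards per response, as observed in the video
PAGE_CAP = 60    # simulated server-side page limit (hypothetical constant)


def collect_by_pagination(
    fetch_page: Callable[[int], Optional[List[str]]],
    max_pages: int = 1_000,
) -> List[str]:
    """Walk pages 1, 2, 3, ... and stop at the first empty or failed page."""
    items: List[str] = []
    for page in range(1, max_pages + 1):
        cards = fetch_page(page)
        if not cards:  # None means an error such as HTTP 500; [] means no results
            break
        items.extend(cards)
    return items


def fake_site(page: int) -> Optional[List[str]]:
    """Simulated search endpoint: a huge catalog, but pages past the cap fail."""
    if page > PAGE_CAP:
        return None
    start = (page - 1) * PAGE_SIZE
    return [f"item-{n}" for n in range(start, start + PAGE_SIZE)]


print(len(collect_by_pagination(fake_site)))  # 6000, no matter how big the catalog is
```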
Monetization ideas
- Sell bespoke automations/integrations to businesses.
- Sell downloadable parsed datasets (example: Kwork listings with price range ~500–2,000+ rubles per dataset).
High-level scraper strategy (universal approach)
- Avoid non-universal filters (shoe size, brand, sellers) because they vary by category and may still exceed limits.
- Use the price filter as a universal splitting key: request results for price intervals and recursively split intervals that still exceed the site’s per-request item limit.
- Note that reported counts can be approximate/cached and products appear/disappear — expect slight inaccuracies.
- All price values are transmitted in kopecks (divide by 100 to get readable RUB).
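Keeping prices as integer kopecks throughout avoids floating-point drift when interval bounds are computed. Conversion then only matters at the edges: user input and display. The small sketch below covers those two conversions. The helper names are illustrative, not from the video.

```python
KOPECKS_PER_RUB = 100


def rub_to_kopecks(rub: float) -> int:
    """Convert rubles to integer kopecks, rounding away float noise (12.99 -> 1299)."""
    return round(rub * KOPECKS_PER_RUB)


def kopecks_to_rub(kopecks: int) -> str:
    """Format non-negative integer kopecks as a ruble string (129950 -> '1299.50')."""
    rub, kop = divmod(kopecks, KOPECKS_PER_RUB)
    return f"{rub}.{kop:02d}"
```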
Technical stack / tools recommended
- Python:
- requests (synchronous helper requests)
- httpx (for later asynchronous product parsing)
- loguru (logging)
- dataclasses (for structured responses)
- curl and a curl-to-Python converter (to obtain sample requests)
- Cookies/headers captured from browser devtools (used to pass site checks initially; later add protection bypass)
- Proxies: Mobile Proxy Space recommended (mobile + server proxies; unlimited traffic for mobile; rotate API key on blocks). Promo code mentioned in the video.
- Reference to a previous video that explains security bypass / browser checks.
Implementation architecture and algorithm (detailed)
Parser class
Create a parsing class (example name: VBsefrasParserRange) initialized with:
- search phrase
- optional cookies/headers
- parameters (examples and recommended defaults):
  - price_step: default 500 RUB (stored in kopecks)
  - min_step: 10 RUB
  - max_step: upper limit for the adaptive step (no default given)
  - max_count_of_goods: item cap per interval (safe default: 5,000)
  - max_split: recursion depth for splitting (e.g., 10)
  - small_count: threshold (e.g., 500) used to decide adaptive step changes
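One way to hold these settings is a dataclass, with the defaults listed above converted to kopecks. The class name ParserSettings and the validation rules are illustrative assumptions. The summary gives no default for max_step, so this sketch leaves it as None, meaning no cap.

```python
from dataclasses import dataclass
from typing import Optional

KOPECKS_PER_RUB = 100


@dataclass
class ParserSettings:
    """Hypothetical holder for the tunable parameters; all prices are in kopecks."""

    price_step: int = 500 * KOPECKS_PER_RUB  # initial interval width (500 RUB)
    min_step: int = 10 * KOPECKS_PER_RUB     # intervals this narrow are not split further (10 RUB)
    max_step: Optional[int] = None           # adaptive-step cap; None means no cap
    max_count_of_goods: int = 5_000          # per-interval item limit (safe default)
    max_split: int = 10                      # maximum recursion depth when splitting
    small_count: int = 500                   # threshold for adaptive step changes

    def __post_init__(self) -> None:
        # Reject settings that would make the range loop or the splitter misbehave.
        if not 0 < self.min_step <= self.price_step:
            raise ValueError("expected 0 < min_step <= price_step")
        if self.max_step is not None and self.max_step < self.price_step:
            raise ValueError("max_step must not be smaller than price_step")
        if self.max_count_of_goods <= 0 or self.max_split < 0:
            raise ValueError("max_count_of_goods must be positive and max_split non-negative")
```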
Key methods
fetch/request:
- Build the request params, merge in add_params, send a GET with requests, and return the JSON, or None on error.
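The following is a sketch of the request layer, under stated assumptions:
- SEARCH_URL and the base parameter names are placeholders. Replace them with the values from the request copied out of devtools.
- The standard logging module stands in for loguru.
- requests is imported lazily only so the helper can be exercised with a fake http_get. In normal use it calls requests.get with params, headers, cookies and a timeout.

```python
import logging
from typing import Any, Callable, Optional

log = logging.getLogger("wb_parser")

# Placeholders: replace with the URL and params of the request copied from devtools.
SEARCH_URL = "https://example.invalid/search"
BASE_PARAMS = {"query": "", "page": 1}  # hypothetical parameter names


def build_params(query: str, add_params: Optional[dict] = None) -> dict:
    """Copy the base params, set the query, then let add_params extend or override them."""
    params = {**BASE_PARAMS, "query": query}
    if add_params:
        params.update(add_params)
    return params


def fetch_json(
    query: str,
    add_params: Optional[dict] = None,
    headers: Optional[dict] = None,
    cookies: Optional[dict] = None,
    http_get: Optional[Callable[..., Any]] = None,
) -> Optional[dict]:
    """Send a GET to the search endpoint; return the parsed JSON or None on any error."""
    if http_get is None:
        import requests  # imported lazily so a fake http_get can be injected in tests

        http_get = requests.get
    try:
        resp = http_get(
            SEARCH_URL,
            params=build_params(query, add_params),
            headers=headers,
            cookies=cookies,
            timeout=15,
        )
        resp.raise_for_status()  # e.g. the HTTP 500 returned past the page cap
        return resp.json()
    except Exception:  # connection errors, bad status codes, invalid JSON
        log.exception("request failed: query=%r add_params=%r", query, add_params)
        return None
```

Returning None instead of raising lets the caller decide whether a refused request should be retried (for example through another proxy) or skipped.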
get_total(data):
- Safely read data['total'] from the JSON.
get_min_max_price(data):
- Helper: find the “price” filter among the JSON filters by name and extract its min/max price.
get_price_range(json):
- Build and return a DataPage dataclass containing min_price, max_price, and total.
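Below is a sketch of the three JSON helpers and the DataPage dataclass. The response shape assumed here is hypothetical: a top-level total plus a filters list whose entries carry name, min and max. Adjust the keys and the filter name to whatever the captured JSON actually contains.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class DataPage:
    min_price: int  # kopecks
    max_price: int  # kopecks
    total: int      # item count reported for this price interval


def get_total(data: Optional[dict]) -> int:
    """Safely read data['total']; a missing or malformed value counts as 0."""
    try:
        return int(data["total"])
    except (TypeError, KeyError, ValueError):
        return 0


def get_min_max_price(data: Optional[dict], filter_name: str = "price") -> Optional[Tuple[int, int]]:
    """Find the price filter by name and return its (min, max) in kopecks."""
    for flt in (data or {}).get("filters") or []:
        if flt.get("name") == filter_name:
            return int(flt["min"]), int(flt["max"])
    return None


def get_price_range(data: Optional[dict]) -> Optional[DataPage]:
    """Combine the price bounds and the total into one DataPage, if bounds exist."""
    bounds = get_min_max_price(data)
    if bounds is None:
        return None
    return DataPage(bounds[0], bounds[1], get_total(data))
```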
parse() (main loop), high-level flow:
- Fetch the base page (no price filter) to obtain the global min_price, max_price, and total.
- Iterate start_price from min_price to max_price using the current step:
  - Compute final_price = min(start_price + step, max_price).
  - Request results for that price interval (in kopecks).
  - If the interval is empty: log it and increase the step (sparse zone).
  - If data.total > max_count_of_goods: call split_price_range() to recursively subdivide the interval.
  - If data.total <= max_count_of_goods: accept the interval and append a DataPage(min, max, total).
  - Update start_price = final_price and continue.
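The following sketches the main loop. It assumes a count(lo, hi) callable that returns the item total for an inclusive kopeck interval, and a split(lo, hi) callable such as split_price_range(). Two details are assumptions, not the video's code:
- After an empty window, the step is doubled (optionally capped at max_step).
- The next window starts at final_price + 1, so inclusive bounds never count a boundary price twice. This mirrors the mid + 1 used when splitting.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass(frozen=True)
class DataPage:
    min_price: int  # kopecks, inclusive
    max_price: int  # kopecks, inclusive
    total: int


def collect_ranges(
    count: Callable[[int, int], int],
    min_price: int,
    max_price: int,
    split: Callable[[int, int], List[DataPage]],
    step: int = 50_000,              # 500 RUB in kopecks
    max_step: Optional[int] = None,  # None: let the step grow without a cap
    max_count_of_goods: int = 5_000,
) -> List[DataPage]:
    """Walk the global price range window by window, splitting oversized windows."""
    pages: List[DataPage] = []
    start = min_price
    while start <= max_price:
        final = min(start + step, max_price)
        total = count(start, final)
        if total == 0:
            # Sparse zone: widen the next window (doubling is an assumption).
            step = step * 2 if max_step is None else min(step * 2, max_step)
        elif total > max_count_of_goods:
            pages.extend(split(start, final))  # too many items: subdivide
        else:
            pages.append(DataPage(start, final, total))
        start = final + 1  # inclusive bounds, so the boundary price is not counted twice
    return pages
```

Passing count and split as callables keeps the loop testable against a synthetic catalog before any real requests are sent.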
split_price_range(min_price, max_price, depth):
- Recursion guard: if depth > max_split, return [].
- If (max_price - min_price) <= min_step: fetch once and return a DataPage if the interval is non-empty (it cannot be split further).
- Otherwise compute mid = (min_price + max_price) // 2 and recursively call split on the left half (min_price, mid) and the right half (mid + 1, max_price).
- Return the concatenated lists of DataPage ranges.
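Below is a sketch of the recursive splitter, using the same count(lo, hi) assumption. The summary does not say where each half's total is checked. This version therefore fetches every sub-interval and accepts it as soon as it fits under max_count_of_goods. As with the described recursion guard, any piece deeper than max_split is dropped rather than returned.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass(frozen=True)
class DataPage:
    min_price: int  # kopecks, inclusive
    max_price: int  # kopecks, inclusive
    total: int      # item count reported for the interval


def split_price_range(
    count: Callable[[int, int], int],
    min_price: int,
    max_price: int,
    max_split: int = 10,
    min_step: int = 1_000,          # 10 RUB in kopecks
    max_count_of_goods: int = 5_000,
) -> List[DataPage]:
    """Bisect [min_price, max_price] until each piece fits under max_count_of_goods."""

    def split(lo: int, hi: int, depth: int) -> List[DataPage]:
        if depth > max_split:
            return []  # recursion guard: items in this piece are dropped
        total = count(lo, hi)
        if total == 0:
            return []  # empty piece, nothing to record
        if total <= max_count_of_goods or hi - lo <= min_step:
            # Fits the cap, or is too narrow to split further: keep as-is.
            return [DataPage(lo, hi, total)]
        mid = (lo + hi) // 2
        return split(lo, mid, depth + 1) + split(mid + 1, hi, depth + 1)

    return split(min_price, max_price, 0)
```

Consider a single price point that alone holds more items than the cap. The splitter returns it unchanged once the interval is min_step wide or narrower, because bisecting further cannot reduce its count.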
Output
- The parser returns a list of DataPage objects describing price ranges and the item count for each range. These ranges are intended for efficient parallel/asynchronous product downloads (with httpx).
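The following sketches how the ranges could feed the next stage. fetch_page is a hypothetical coroutine; in practice it might wrap httpx.AsyncClient.get with the range's price filter and a page number. The concurrency value is an arbitrary example. The arithmetic is the key point: with max_count_of_goods = 5,000 and ~100 cards per page, each range needs at most 50 pages, which stays under the ~60-page cap.

```python
import asyncio
import math
from dataclasses import dataclass
from typing import Awaitable, Callable, List

PAGE_SIZE = 100  # ~100 cards per response, per the summary


@dataclass(frozen=True)
class DataPage:
    min_price: int
    max_price: int
    total: int


def pages_needed(rng: DataPage) -> int:
    """Number of result pages to request for one price range."""
    return math.ceil(rng.total / PAGE_SIZE)


async def download_all(
    ranges: List[DataPage],
    fetch_page: Callable[[DataPage, int], Awaitable[list]],
    concurrency: int = 5,
) -> list:
    """Fetch every page of every range, with at most `concurrency` requests in flight."""
    sem = asyncio.Semaphore(concurrency)

    async def one(rng: DataPage, page: int) -> list:
        async with sem:
            return await fetch_page(rng, page)

    tasks = [one(r, p) for r in ranges for p in range(1, pages_needed(r) + 1)]
    batches = await asyncio.gather(*tasks)
    return [item for batch in batches for item in batch]
```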
Practical notes and caveats
- Prices are transmitted in kopecks (divide by 100 for RUB).
- Headers and device IDs captured from requests can often be reduced/cleaned.
- Counts returned by the site can be cached/approximate; results can change during parsing (sales, new listings).
- Recursion depth and step sizes are tunable parameters. The example run produced ~56 price ranges for one query.
- After generating ranges, the next stage (not covered in this video) is to fetch product lists and parse items asynchronously, then save to Excel or a database.
What the video demonstrates
- Reverse-engineering Wildberries search fetch requests via browser devtools.
- Converting a captured request to Python (curl → Python).
- Building an adaptive range-splitting algorithm to bypass per-request limits by slicing by price with recursion.
- Designing the code structure (requests layer, helpers, DataPage dataclass, recursive split) and logging to inspect intermediate results.
- Practical workflow: capture cookies/headers, test small runs, adjust parameters, then scale with proxies and async requests.
Result shown
- On a “men’s sneakers” example, the algorithm produced ~56 price ranges with counts under the per-request limit, demonstrating the approach’s effectiveness.
Series / tutorial status
- Part 1 (this video): focused on finding price intervals (first stage).
- Part 2: will implement asynchronous parsing of products per price range and saving results (not yet filmed/uploaded at the time of this video).
Mentioned services / tools
- Wildberries (target marketplace)
- Kwork (example marketplace for selling parsed datasets)
- Mobile Proxy Space (proxy provider)
- curl and curl-to-Python converter
- Browser devtools (Network / Fetch tab)
Main speaker / source
- Host of the Parshub channel (narrator / developer presenting the tutorial)
Category
Technology