Summary of "Купил МОНСТРА на 32 ГБ VRAM за 45к. Что может серверная Tesla V100 в ИГРАХ?" ("Bought a MONSTER with 32 GB VRAM for 45k. What can a server Tesla V100 do in GAMES?")
Product reviewed
A custom "Frankenstein" consumer install of an NVIDIA Tesla V100 (Volta) with 32 GB VRAM, specifically the PG503216 server variant. It was obtained from decommissioned enterprise hardware and adapted for use in a home PC.
Key features
- 32 GB VRAM (server-grade), positioned as a major advantage for LLM and AI training/inference.
- Enhanced memory throughput: claimed ~250 GB/s (possibly a per-HBM2-stack figure) and ">1 TB/s over a 4096-bit bus" (as stated in the video; for reference, a stock V100 is rated at roughly 900 GB/s).
- Factory overclocked/pumped memory (the video claims a normal retail V100 has tuning blocked in BIOS/software).
- Uses server connector (SXM2) rather than standard PCIe.
- Includes a blank/adapter board so the SXM2 module can be installed into a desktop PC (seller provides mounting hardware + thermal interface).
- Early-generation Volta tensor cores: affects AI/game feature support (details below).
- Cooling solutions included:
  - Option for a big air cooler (which didn't fit his case).
  - Recommended liquid cooling (ID-COOLING FX240 Pro plus a water block/plate solution), tuned to keep noise low.
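For context on the bandwidth claim, HBM2 bandwidth follows directly from bus width and per-pin data rate. A quick sketch using the stock V100's published figures (4096-bit bus, ~1.75 Gb/s per pin across 4 stacks; these numbers are from NVIDIA's specs, not from the video):

```python
# Rough HBM2 bandwidth arithmetic for a stock Tesla V100 (spec figures, not the video's).
bus_width_bits = 4096          # total HBM2 bus width across 4 stacks
data_rate_gbps = 1.75          # approx. per-pin data rate on the 32 GB V100

total_gbs = bus_width_bits * data_rate_gbps / 8   # bits -> bytes
per_stack_gbs = total_gbs / 4                     # 4 HBM2 stacks

print(f"total:     ~{total_gbs:.0f} GB/s")      # ~896 GB/s, i.e. just under 1 TB/s
print(f"per stack: ~{per_stack_gbs:.0f} GB/s")  # ~224 GB/s, near the ~250 GB/s quoted
```

This suggests the video's two numbers are not contradictory: ~250 GB/s plausibly describes one stack, while the ">1 TB/s" headline rounds up the aggregate across the whole 4096-bit bus.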
Pros (as presented)
AI/ML performance & capability
- Video calls it the best option in its price bracket for AI at this VRAM level.
- Strong LLM throughput, with token/sec results (see benchmarks below).
- Image generation described as very good for SD models; results scale with RAM/VRAM needs.
- Video generation and music generation also work, though video generation stability issues show up for heavy models.
Practical usability improvements
After driver workarounds, the card can be run in Windows with:
- Resizable BAR enabled
- Special use of two driver types to avoid installer/device incompatibility
- Switching from TCC (compute-only server mode) to WDDM, which enables the Nvidia Control Panel and normal 3D acceleration
Claimed benefit: avoids “ghost display / phantom monitor” issues commonly reported with server GPUs.
Cooling and acoustics
With liquid cooling, he reports:
- “Absolutely silent” operation (constant fan speed; no noticeable ramp/noise spikes)
- Stress results at 300W:
  - ~58°C chip
  - ~60°C memory
  - ~73°C hot spot
Gaming stability (with caveats)
- Several games run at native 2K (Quad HD) with very stable frame pacing.
- Upscalers / “quality upscaling” are described as important for consistency in some titles.
Cost positioning / overall story
Framed as a way to get 32 GB VRAM “like” or “close to” modern high-end needs for less than typical consumer options with equivalent VRAM.
Cons / limitations (as presented)
Hardware & installation complexity
- Requires SXM2 adapter + server mounting, not a typical PCIe GPU swap.
- SXM2 is fragile; he warns about damaging it (risk described as turning it into a “keychain”).
- Liquid cooling needs care:
  - No standard desktop CPU AIO spacers/limiters
  - Over-tightening can bend the plate
Driver / compatibility hassle
- Setup took 2–3 hours of trial combinations.
- Must use a multi-step workflow:
  - Use server (datacenter) driver 566.03 for detection
  - Unpack a GeForce/Game Ready driver installer and force-install the V100-compatible driver via Device Manager
  - Run a command to switch the card from TCC to WDDM
- Without the correct steps, the card may remain in TCC mode, where games and the Nvidia Control Panel are essentially unavailable
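The TCC-to-WDDM switch mentioned above is normally done with NVIDIA's `nvidia-smi` tool. A sketch of the likely command sequence, run from an elevated prompt on Windows (the GPU index `0` is an assumption; check the index your card reports in the first command):

```shell
# Show GPUs with their index and current driver model (TCC or WDDM)
nvidia-smi

# Switch GPU 0 to the WDDM driver model (0 = WDDM, 1 = TCC);
# needs admin rights and takes effect after a reboot
nvidia-smi -i 0 -dm 0

# If the plain switch is refused, -fdm forces the driver model
nvidia-smi -i 0 -fdm 0
```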
AI features absent in games
- No ray tracing (video claims no RT cores).
- No DLSS (Volta tensor cores are older generation and lack required data formats such as BF16).
- Gaming shifts toward raw raster performance.
Performance for gaming depends on upscaling/settings
- Some titles show minor stutters / weaker 1% lows (example referenced: “D…ger 2”).
- He tests only 5 games.
Performance highlights (benchmarks mentioned)
LLM / text generation (tokens/sec)
- 4B-class
  - Gemma 3 4B: ~94 tok/s
  - Qwen3 4B: ~96 tok/s
- 12B (Q8 quantization)
  - ~45 tok/s
- 20B
  - ~120 tok/s (attributed to MXFP4 optimization)
- 26 GB-class model (Q6_K)
  - ~24 tok/s
- 30B (Qwen3, ~23 GB VRAM)
  - ~70 tok/s (described as the best "smarts-to-speed" ratio)
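The VRAM figures above track a simple rule of thumb: weight memory ≈ parameter count × bits per weight / 8, plus overhead for the KV cache and activations. A rough sketch; the bits-per-weight values are typical for these quant formats, not numbers from the video:

```python
def approx_weight_gb(params_billions: float, bits_per_weight: float) -> float:
    """Rough VRAM needed just for the model weights, in GB (1 GB = 1e9 bytes)."""
    return params_billions * bits_per_weight / 8

# Typical effective bits per weight for common quant formats (approximate)
print(approx_weight_gb(12, 8.5))   # 12B @ Q8_0  -> ~12.8 GB
print(approx_weight_gb(30, 6.6))   # 30B @ Q6_K  -> ~24.8 GB, near the ~23 GB reported
print(approx_weight_gb(20, 4.25))  # 20B @ MXFP4 -> ~10.6 GB
```

The MXFP4 line also explains the seemingly anomalous 20B result: at ~4.25 bits per weight the model reads far less memory per token than a Q8 12B model, so it can be faster despite having more parameters.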
Image generation (iterations/sec or seconds per image)
- Stable Diffusion 1.5
  - Cold start: ~3 s
  - After load: ~17 it/s
- SDXL (1024×1024)
  - Cold start: ~26 s
  - Then ~3 it/s (about one image per ~8 s)
- "Z Image Turbo" (1024×1024)
  - Needs ~21 GB VRAM + ~35 GB RAM
  - Cold start: ~44 s
  - After load: ~1 it/s (about one image per ~4.5 s)
- Next-gen "Qwen Image" at 1328×1328
  - Needs ~25 GB VRAM + ~44 GB RAM
  - Cold start: ~528 s (~9 min)
  - Then ~458 s per image (~7.5 min); turbo mode ~250 s
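The it/s and seconds-per-image figures above are linked by the sampler's step count: time per image ≈ steps ÷ iterations per second. A quick consistency check; the step counts are assumed typical values, not stated in the video:

```python
def seconds_per_image(steps: int, its_per_sec: float) -> float:
    """Time for one image once the model is loaded (ignores VAE decode etc.)."""
    return steps / its_per_sec

print(seconds_per_image(25, 3))   # SDXL: ~25 steps at ~3 it/s -> ~8.3 s, matching "~8 s"
print(seconds_per_image(20, 17))  # SD 1.5: ~20 steps at ~17 it/s -> ~1.2 s
```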
Video generation
- Wan 2.2 (5 s video, 14B params)
  - Cold start: ~273 s
  - Then ~206 s per generation (77 GB RAM, 22 GB VRAM)
- LTX2 (5 s video)
  - Cold start: ~800 s
  - Uses all 32 GB VRAM
  - Error/connection drop after generation, forcing a reload
  - Would be ~8.5 min per video without crashes, but repeated crashes add overhead
Music generation
- "ACE Step 1.5"
  - Cold start: 83 s
  - Then ~50 s to complete a 2-minute composition
  - Uses ~12 GB VRAM + ~29 GB RAM
Gaming (selected results)
- The Last of Us Part I, Quad HD native, max settings, no upscaling
  - ~60 FPS with smooth frame times
- Cyberpunk 2077, Quad HD native, high settings
  - ~60 FPS average with a stable frame-time graph
- "D…ger 2", low preset, Quad HD, upscaling + chosen settings
  - ~60 FPS, but noticeable small stutters and weaker 1% lows
- Hogwarts Legacy, Ultra preset, Quad HD native
  - Stable ~60 FPS; micro-drops noted (~0.1% lows)
- CS2
  - Low preset, QHD: ~350 FPS
  - Full HD: ~430 FPS
Overall gaming averages (as stated)
- Native Quad HD: ~59 FPS average
- Without quality upscaler: ~72 FPS average
- Conclusion: gaming performance is “on par” with decent gaming hardware (limited test sample)
Comparisons made
- Compared to Tesla P40 (previously reviewed):
- This V100 32GB is described as far superior ("five heads above" in his phrasing) for AI (text/image/video generation) and generally overall.
- Compared conceptually to consumer GPUs for “raw performance equivalence”:
- Closest new-gen NVIDIA gaming power: RTX 5060
- Older NVIDIA range: “somewhere between RTX 3070 and RTX 2080 Ti”
- AMD: “between RX 6700 XT and RX 6800”
- For gaming features:
- No DLSS (Volta tensor core limitations)
- No ray tracing (no RT cores)
User experience notes (installation + driver workflow)
Physical fit / case constraints
- The large air-cooler radiator didn't fit (reported height near 100 mm including board thickness; roughly 6 PCIe slots tall).
Noise
- Air cooling not used (assumed jet-plane fan noise).
- Liquid cooling selected for silence.
Driver install pain
- Requires BIOS tweak + multi-step driver workflow.
- Took 2–3 hours in his experience.
Unique points mentioned (complete list)
- Goal pitch: 32GB VRAM for home AI “server mutant” use.
- Tesla V100 originally 16GB; 32GB is sold for cloud/supercomputers.
- Specific modification PG503216: some cores cut, memory improved.
- Claimed throughput increase and bus bandwidth numbers.
- Factory memory “overclock” and restrictions preventing user tuning on server cards.
- Uses SXM2 connector; requires adapter/blank board.
- Chinese adapter/blank boards enable desktop installation.
- Seller kit includes: chip on adapter/blank, two cooling options, mounting kit, thermal pads.
- Price range at filming: ~55,000 RUB base (with fluctuations like 50k/46k).
- Import/customs note: duty limit 200€, final total often ~60,000 RUB.
- Air cooler didn’t physically fit; liquid cooling used instead.
- Liquid cooling fits with smaller “slot” footprint.
- Assembly warnings: fragile SXM2; avoid overtightening (no spacers/limiters).
- Thermal pad thickness details: 6 mm pads, plus 2 mm pads on the power chokes (as stated).
- Power connectors: 4×8-pin on the board; he uses an adapter hack, removing a pin segment from PCIe cables to produce a 6-pin with the correct pinout.
- Cooling settings: pump at 80% (~2800 RPM), radiator fans at 1200 RPM.
- Temperature results: 58°C chip / 60°C memory / 73°C hot spot at 300W.
- Driver process takes 2–3 hours; enable Resizable BAR.
- Use server driver 566.03 first, then force-install a V100-compatible driver to switch to WDDM.
- Benefit: avoids “ghost/phantom monitor” issues.
- AI claim: best-for-price at this VRAM level; no real competitors in the described segment.
- Gaming limitations: no RT cores, no DLSS; only rasterization + FSR-like methods.
- Hardware equivalence estimates to RTX/AMD tiers for gaming power.
- LLM test results across model sizes and quantizations (tokens/sec).
- Image generation results across SD 1.5, SD XL, and other turbo/tall models (RAM/VRAM constraints).
- Video generation results with big RAM/VRAM needs and LTX2 crash/stability issue.
- Music generation results (ACE Step 1.5).
- Gaming benchmarks across specific games and reported FPS/stability.
- Final summary: not ideal for casual gamers, but “universal” for creators/AI workloads; recommendation tied to monetization/content creation.
- Speculative future/collector angle: museum piece; suggests value as 32GB neural card “for price of budget GPUs.”
- Mentions NVLink as an imagined upgrade path (two/four cards) for more memory/compute.
Speaker views
- Single main speaker (Andrey, “Cyberkuznitsa”):
- Technical explanations (Volta/tensor cores, PG503216 differences)
- Installation/assembly experience
- Driver workflow and Windows mode switching
- Benchmark results and comparisons
- Overall recommendation and use-case guidance
(No other speakers are clearly identified in the provided subtitles.)
Overall verdict / recommendation
Recommended primarily for AI/creation workloads (LLMs, image/video generation, local content monetization) where 32GB VRAM is valuable.
Not ideal for typical gamers, mainly due to:
- driver complexity
- lack of DLSS and ray tracing
- need for special setup
However, for native QHD gaming, it can still deliver around ~60 FPS with good frame pacing in the tested titles.
Category
Product Review