Summary of "The last ramble about my RX 6900XT Red Devil Ultimate because it's SUPER DEAD"
Summary — RX 6900 XT Red Devil Ultimate (final report)
What happened to the card
- The GPU die is physically cracked and internally massively shorted — the card is effectively irreparable (the chip/core must be replaced, which is economically impractical).
- Root cause: a high‑side MOSFET / power‑stage failure shorted 12 V to the VRM output, producing a very large energy/voltage surge into Vcore. That surge apparently cracked and welded internal metal layers in the die, producing a catastrophic short.
- The creator attempted a power‑stage swap, using a power stage taken from an X570 ITX board. The replacement stage was either misaligned or heat‑damaged during rework, and it does not work. With the GPU core removed, the rest of the VRM largely powers up, confirming that the core is the shorted component.
Technical analysis and diagnostics
- Short behavior
  - Most PSUs are effectively a single 12 V rail with per‑connector OCP, so when a MOSFET fails short, the bulk capacitors across the PSU, motherboard and GPU still dump their stored energy into the fault.
  - The extra input bulk capacitors on this card likely increased the energy available for the destructive surge.
- Visible damage
  - The visible crack points to an unusually energetic event; VRM failures normally kill chips silently, with no visible damage.
  - The speaker suspects the die's metal interconnect layers melted and blobbed together, and is considering X‑raying the die to examine the internal damage.
- Missed measurements
  - The surge was not captured on the oscilloscope, a missed opportunity: a recording of the Vcore waveform during a high‑side MOSFET failure would have been informative.
- Possible contributing factors (not proven)
  - Navi’s aggressive deep‑idle behavior (Vcore dropping to 0 V and then re‑powering).
  - The +300 mV voltage offset the card was running.
  - A power stage that was defective from the factory.
  - Thermal stress from extreme cooling (cold pushing components into undefined behavior).
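The capacitor point in the short-behavior notes can be put in numbers with the stored-energy formula E = ½·C·V². This is a minimal sketch, assuming hypothetical capacitance values (the video gives none); it only shows that the energy available to a shorted stage scales linearly with the bulk capacitance on the 12 V rail.

```python
"""Energy stored in 12 V bulk capacitance (E = 1/2 * C * V^2).

The capacitance values below are HYPOTHETICAL placeholders chosen for
illustration; they are not measurements of the Red Devil Ultimate.
"""

RAIL_VOLTAGE = 12.0  # volts, nominal 12 V input rail


def stored_energy_joules(capacitance_farads: float, voltage: float = RAIL_VOLTAGE) -> float:
    """Return the energy held by a capacitor charged to `voltage`."""
    return 0.5 * capacitance_farads * voltage ** 2


# Hypothetical bulk capacitance on the GPU's 12 V input, in microfarads.
stock_input_uf = 1_000.0   # assumed stock input bulk capacitance
added_input_uf = 2_000.0   # assumed extra caps from an input-cap mod

stock_j = stored_energy_joules(stock_input_uf * 1e-6)
modded_j = stored_energy_joules((stock_input_uf + added_input_uf) * 1e-6)

print(f"stock:  {stock_j:.3f} J")
print(f"modded: {modded_j:.3f} J ({modded_j / stock_j:.1f}x)")
```

The absolute figures depend entirely on the assumed capacitances; the takeaway is the proportionality, so tripling the input capacitance triples the energy that can discharge into a failed stage.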
Repair economics
- Replacing the entire GPU core (die) is cost‑prohibitive; buying a new card is the sensible option.
- RMA is unlikely to accept a card in this physical condition.
Overclocking / benching experience (air, water, LN2)
- Overall: excellent on air and water/chilled water; extremely frustrating on liquid nitrogen (LN2).
- Notable benchmark results
  - Fire Strike Extreme (1440p): top single‑GPU ranking achieved (around 32k on that benchmark).
  - Fire Strike Ultra (4K): 3rd place on air cooling; he could not beat his air‑cooled Ultra score on LN2.
- LN2 issues observed
  - Cold boot bug / driver crash: the Windows/AMD driver crashes below about −75 °C. The card can be run colder in the BIOS, but Windows breaks as soon as the driver initializes. The speaker suspects the driver's temperature monitoring logic causes the crash, and that a driver toggle to disable temperature monitoring or aggressive idle might help.
  - Memory subsystem collapse: at LN2 temperatures the memory controller's performance degrades. High memory clocks fail, error detection/correction kicks in, and throughput drops dramatically, so memory‑bound 4K (Ultra) benchmarks suffer even when core clocks are higher. Fire Strike Extreme, which is less memory‑demanding and benefits from Infinity Cache, fares better.
  - Cold boot vs load behavior: the pot temperature and the die temperature differ; under load the die runs warmer than the pot and can temporarily avoid the driver crash.
  - Practical LN2 benching tips: insulate thoroughly (e.g. Plasti Dip), clean off flux and oils before dipping, monitor VRM temperatures, and avoid exposing bare PCB at deep cold, because condensation and shorts are a high risk.
  - Dry ice (pot temperatures around −73 °C, just above the roughly −75 °C crash point) may be a better sweet spot for Navi than LN2.
- Power and cooling
  - The PowerColor Red Devil cooler gets high praise: excellent air cooling that survived very high power draw during the air runs.
  - Reported chip power draw: roughly 531–582 W across different runs; total board power was likely ~600–700 W under LN2.
  - Design oddity: two of the three 8‑pin connectors appear to be paralleled, an unusual wiring decision.
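The board-power estimate above translates into 12 V current via I = P / V. This is a minimal sketch under loudly stated assumptions not taken from the video: all power arrives through the three 8‑pin connectors, it splits evenly between them, and the PCIe slot supplies nothing. The paralleled connector pair means the real split would differ. The 150 W figure is the PCIe specification rating for an 8‑pin auxiliary power connector.

```python
"""12 V current estimate for the ~600-700 W board-power range quoted above.

ASSUMPTIONS (not from the video): all power comes through the 8-pin
connectors, it divides evenly between them, and the PCIe slot supplies nothing.
"""

RAIL_VOLTAGE = 12.0          # volts
EIGHT_PIN_RATING_W = 150.0   # PCIe spec rating per 8-pin auxiliary connector
CONNECTORS = 3


def rail_current_amps(power_w: float, voltage: float = RAIL_VOLTAGE) -> float:
    """Total current needed to deliver `power_w` at `voltage` (I = P / V)."""
    return power_w / voltage


def per_connector(power_w: float, connectors: int = CONNECTORS) -> tuple[float, float]:
    """Return (watts, amps) per connector under the even-split assumption."""
    watts = power_w / connectors
    return watts, rail_current_amps(watts)


for board_w in (600.0, 700.0):  # board-power estimate range from the summary
    w, a = per_connector(board_w)
    print(
        f"{board_w:.0f} W board: {rail_current_amps(board_w):.1f} A total, "
        f"{w:.0f} W / {a:.1f} A per 8-pin "
        f"({w / EIGHT_PIN_RATING_W:.2f}x the 150 W rating)"
    )
```

Even with an idealized even split, each connector would carry well above its nominal rating at these power levels.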
Hardware mods and what helped
- Input cap mod (additional input/filter capacitors) helped core overclock stability on air cooling.
- Small 4.7 µF (0402) caps near memory helped memory overclocking; larger added memory bulk caps did not help.
- Modifying the 0.75 V support rail had no effect on the LN2 driver crash or other cold behavior, so the mod was pointless for that purpose.
- Rework warning: hot‑air rework is tricky; prolonged heating can destroy power stages. The speaker admits limited skill with hot air rework.
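One textbook way to reason about why small ceramic caps near the memory could help where larger bulk caps did not is each capacitor's self-resonant frequency, f = 1 / (2π√(L·C)), above which its parasitic inductance (ESL) dominates. This is a general rule of thumb, not the explanation given in the video, and the ESL values below are assumed, order-of-magnitude placeholders rather than datasheet figures.

```python
"""Self-resonant frequency of a capacitor: f = 1 / (2 * pi * sqrt(L * C)).

The equivalent series inductance (ESL) values are ASSUMED placeholders of a
plausible order of magnitude, not figures from any datasheet or the video.
"""

import math


def self_resonant_hz(capacitance_f: float, esl_h: float) -> float:
    """Frequency at which the capacitor's reactance and its ESL cancel."""
    return 1.0 / (2.0 * math.pi * math.sqrt(esl_h * capacitance_f))


parts = {
    "4.7 uF 0402 MLCC (assumed ESL 0.4 nH)": (4.7e-6, 0.4e-9),
    "470 uF bulk polymer (assumed ESL 2 nH)": (470e-6, 2.0e-9),
}

for name, (c, l) in parts.items():
    print(f"{name}: ~{self_resonant_hz(c, l) / 1e6:.2f} MHz")
```

Read the result only as a trend: the small part stays effective up to a much higher frequency, which is consistent with (but does not prove) the observation that the 0402 caps helped memory overclocking while the larger bulk caps did not.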
Lessons, recommendations and takeaways
- If a VRM high‑side MOSFET fails short, it can kill the GPU very quickly; large input capacitance increases destructive energy.
- LN2 benching Navi / RX 6900 XT is frustrating: severe cold bugs, driver crashes below ~−75 °C, and memory instability limit gains. Dry ice or chilled water are more practical for sustained overclocking.
- For LN2: insulate thoroughly, avoid bare PCB exposure, and be cautious of condensation and cold‑related component failures.
- Diagnostic tip: removing the GPU core shows whether the VRM itself is functional; if the VRM fires with the core removed but not with it fitted, the core is the shorted part.
- Economically, replacing a GPU core is not sensible; salvage and RMA paths are limited for that kind of damage.
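The core-removal diagnostic from the lessons above can be written down as a small decision function. This is a minimal sketch: the function name, the enum and the two boolean observations are hypothetical, and the only logic encoded is the reasoning described in the video (a Vcore short with the core fitted, plus a VRM that runs once the core is removed, points at the core).

```python
"""Hypothetical encoding of the 'remove the core' diagnostic described above."""

from enum import Enum


class Verdict(Enum):
    CORE_SHORTED = "GPU core is the shorted component"
    VRM_FAULT = "VRM still does not run without the core; check the power stages"
    NO_SHORT = "no Vcore short with the core fitted; this test does not apply"


def diagnose(vcore_shorted_with_core: bool, vrm_runs_without_core: bool) -> Verdict:
    """Classify a failure from two hypothetical bench observations."""
    if not vcore_shorted_with_core:
        return Verdict.NO_SHORT
    if vrm_runs_without_core:
        # The VRM regulates once the die is gone, so the short is in the core.
        return Verdict.CORE_SHORTED
    # The VRM itself is faulty; the core may or may not also be damaged.
    return Verdict.VRM_FAULT


print(diagnose(vcore_shorted_with_core=True, vrm_runs_without_core=True).value)
```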
Relevant comparisons / references
- Driver vs hardware behavior: the driver's temperature monitoring or aggressive idle logic is suspected; a software fix from AMD (a toggle to disable or adjust the temperature/idle behavior) could help.
- Benchmark rules difference: HWBOT allows disabling tessellation, which explains some of the differing score comparisons.
Mentioned tests, attempted procedures and failed goals
- Attempted to take the top single‑GPU spots in Fire Strike Extreme and Fire Strike Ultra on LN2; the Fire Strike Extreme top spot was achieved, but the card died before the Ultra record was secured.
- Planned additional runs (Time Spy, 3DMark 11/Vantage) were aborted due to card failure.
- Considered sending the die for X‑ray to document internal destruction.
Main speakers / sources
- Video author / main speaker: Buildzoid (YouTuber / overclocker).
- Referenced entities: PowerColor (card manufacturer), AMD (drivers), HardwareNumbers (YouTuber/score comparison), OGS (overclocker / scoreboard competitor).
No further troubleshooting or follow‑up actions are provided in the video; the card is declared permanently dead and uneconomical to repair.
Category
Technology