Summary of "Why you DON’T want a 20TB Hard Drive"
Summary: Why You “Don’t Want” a 20TB Hard Drive (Technical + Practical Takeaways)
Subject of the Video
The video argues that you may not want a 20TB hard drive for many real-world use cases—even though it offers impressive capacity.
Drive Type Discussed
The focus is on Western Digital's enterprise 20TB Ultrastar DC HC650 drives (the video notes that samples were going out to enterprise OEM customers), which feature:
- Shingled Magnetic Recording (SMR)
- A helium-filled enclosure (helium is less dense than air, so the spinning platters face less drag and turbulence, which improves power efficiency)
- A nine-platter design, which increases mechanical complexity and requires tighter engineering tolerances
SMR Performance: Why Reads Are Fine, Writes Hurt
The key technical point is how SMR write behavior impacts performance:
- Reading is effectively fine
- Overlapping “shingles” don’t disrupt reads the way they disrupt writes.
- Writing has a penalty
- To update data on a track inside a shingled band, the drive must also rewrite the overlapping tracks that follow it in that band, because writing one track partially overwrites the next.
- Under heavy random I/O this gets dramatically worse, since each small update cascades into rewriting a much larger region.
- The speaker compares this to SSD update behavior
- Both can require erase/move/rewrite-style operations when changing small regions.
- Precedent: Seagate's earlier shingled magnetic recording (SMR) drives
- These were intended for write-once/read-many workloads, where the write penalty matters far less than under random writes.
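To make the write penalty concrete, here is a minimal Python sketch of a simplified shingled band. It assumes the band is full of valid data and that writing any track clobbers the next one, so an update must rewrite everything from that track to the end of the band. The function names and band sizes are hypothetical, and real SMR drives typically stage writes in a cache region and rewrite bands in the background, which this model ignores.

```python
def tracks_rewritten(band_size: int, track_index: int) -> int:
    """Tracks rewritten to update one track in a full shingled band.

    Simplified model: writing track i partially overwrites track i + 1, so an
    update to track i forces rewrites of tracks i, i + 1, ..., band_size - 1.
    """
    if not 0 <= track_index < band_size:
        raise ValueError("track_index must lie inside the band")
    return band_size - track_index


def average_write_amplification(band_size: int) -> float:
    """Mean tracks rewritten per single-track update when every track in the
    band is equally likely to be updated (a uniform random-write workload)."""
    total = sum(tracks_rewritten(band_size, i) for i in range(band_size))
    return total / band_size


if __name__ == "__main__":
    # Hypothetical band sizes, in tracks.
    for band in (1, 10, 100):
        amp = average_write_amplification(band)
        print(f"band of {band:>3} tracks -> {amp:.1f} tracks rewritten per update")
```

Under this model a uniformly random single-track update rewrites about half the band on average, which is why random updates hurt far more than sequential writes that fill a band in order.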
Capacity Scaling vs Performance Scaling
The speaker’s argument: capacity keeps growing, speed doesn’t keep up.
Hard drive performance can be raised through two main levers:
- Shrink bit size
- Spin faster
In practice:
- Spinning faster increases drag and complexity
- Shrinking bits has limits
So manufacturers often scale capacity mainly by adding more platters, rather than making random-write performance much faster.
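A rough way to see why speed lags capacity is to compare how long one full pass over the drive takes after scaling it. The Python sketch below is a simplified model under stated assumptions: shrinking bits by a linear factor k in both directions multiplies per-platter capacity by k squared but raises sequential speed (at the same RPM) by only k, and extra platters add capacity without adding speed because a conventional drive transfers data through one head at a time. The function name and parameters are illustrative.

```python
def relative_fill_time(linear_density_gain: float, platter_gain: float) -> float:
    """Time for one full pass over a scaled drive, relative to the baseline.

    Simplified model (assumption):
    - Shrinking bits by a linear factor k in both directions multiplies
      per-platter capacity by k**2, but sequential speed at the same RPM
      only by k (bits passing under the head per revolution).
    - Extra platters multiply capacity but not speed, because a conventional
      single-actuator drive transfers data through one head at a time.
    """
    capacity_gain = linear_density_gain ** 2 * platter_gain
    speed_gain = linear_density_gain
    return capacity_gain / speed_gain


if __name__ == "__main__":
    print(relative_fill_time(1.0, 2.0))  # twice the platters, same bit size
    print(relative_fill_time(1.5, 1.0))  # 1.5x denser bits in each direction
```

Under these assumptions, doubling the platter count doubles the time to fill the drive, and even a 1.5x density gain makes a full pass take 1.5x longer.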
Alternative/Future Concept: Seagate’s Dual-Actuator Idea
The video also mentions Seagate work on dual actuator technology, where the drive could perform two operations simultaneously.
The implied benefit:
- With a single actuator, all heads move together and only one reads or writes at a time; with two independent actuators, each serving its own set of platters, the drive can service two requests in parallel, improving throughput.
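The parallelism argument can be sketched as an idealized comparison in Python. It assumes (hypothetically) that every request takes the same service time, that each actuator serves only its own half of the platters, and that the two halves work fully in parallel, so the speedup depends on how evenly requests split between the halves.

```python
def dual_actuator_speedup(fraction_on_first_half: float) -> float:
    """Idealized speedup of a dual-actuator drive over a single-actuator one.

    Assumes equal per-request service time and fully parallel halves; the
    busier actuator then determines the total completion time.
    """
    f = fraction_on_first_half
    if not 0.0 <= f <= 1.0:
        raise ValueError("fraction must be between 0 and 1")
    # Single actuator: N requests in series -> N time units.
    # Dual actuator: the busier half handles max(f, 1 - f) * N requests.
    return 1.0 / max(f, 1.0 - f)


if __name__ == "__main__":
    for f in (0.5, 0.75, 1.0):
        print(f"{f:.2f} of requests on one half -> {dual_actuator_speedup(f):.2f}x")
```

An even split gives the ideal 2x; a workload that only touches one half gets no benefit at all.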
Reliability / MTTF: The “What Happens When It Fails” Focus
The drives are advertised with a mean time to failure (MTTF) of about 2.5 million hours.
However, the speaker emphasizes what matters operationally:
- What happens during failure in real deployments, especially in arrays where recovery can dominate downtime and risk.
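One common way to read an MTTF figure operationally is to convert it into an annualized failure rate (AFR). The sketch below assumes a constant failure rate (an exponential lifetime model) and 24/7 operation; real drives' failure rates vary with age, temperature, and workload, so treat the output as a spec-sheet baseline rather than a prediction. The function names are illustrative.

```python
import math

HOURS_PER_YEAR = 24 * 365  # 8,760 hours of continuous operation


def annualized_failure_rate(mttf_hours: float) -> float:
    """Probability that one drive fails within a year of 24/7 use,
    assuming a constant failure rate (exponential lifetime model)."""
    return 1.0 - math.exp(-HOURS_PER_YEAR / mttf_hours)


def expected_failures_per_year(fleet_size: int, mttf_hours: float) -> float:
    """Expected number of failed drives per year across a fleet."""
    return fleet_size * annualized_failure_rate(mttf_hours)


if __name__ == "__main__":
    afr = annualized_failure_rate(2.5e6)
    print(f"AFR: {afr:.3%}")
    print(f"Expected failures per 1,000 drives: {expected_failures_per_year(1000, 2.5e6):.1f}")
```

At 2.5 million hours this works out to roughly 0.35% per year, or about 3 to 4 failed drives a year in a fleet of 1,000.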
Napkin Math: How Long Full Writes / Overwrites Take
A rough calculation is used:
- Assume about 200 MB/s sequential write
- Writing or overwriting the entire surface of a 20TB SMR helium drive then takes about 100,000 seconds, roughly 28 hours (just over a day)
- In larger multi-drive systems, recovery could mean multiple days
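The napkin math is easy to reproduce. This Python sketch assumes decimal units (1 TB = 10^12 bytes, 1 MB = 10^6 bytes, as drive vendors use) and a constant 200 MB/s for the whole pass; in practice, sequential speed drops toward the inner tracks, so the real figure can be longer.

```python
def full_write_hours(capacity_tb: float, write_mb_per_s: float) -> float:
    """Hours to write every byte of a drive once at a sustained sequential rate."""
    seconds = (capacity_tb * 1e12) / (write_mb_per_s * 1e6)
    return seconds / 3600


if __name__ == "__main__":
    hours = full_write_hours(20, 200)
    print(f"{hours:.1f} hours ({hours / 24:.2f} days)")  # 27.8 hours (1.16 days)
```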
Data Protection and Cost vs Risk (Consumer vs Enterprise)
The video frames different priorities across environments:
Home / Consumer NAS
- Long recovery times are described as painful and risky
- The risk grows if backups aren’t truly comprehensive
Data Centers / Enterprise Buyers
- Some buyers may prioritize reliability over “newer/bigger” drives
- Example mentioned: according to the video, Backblaze prefers lower-capacity drives (around 4–6TB) for its bulk fleet because its experience suggested a better balance of cost and reliability
- The video cites Backblaze's published drive-reliability reports only loosely, so some of its details may be inaccurate or conflated
SSHD (HDD + Small SSD Cache) Doesn’t Solve the Core Issue
The video clarifies SSHD as:
- A standard hard drive with a small SSD cache for frequently accessed/active small data
But it won’t materially improve:
- Writing the entire surface
- Reading the entire surface
So the speaker argues SSHD is not a solution to the “big drive + full-surface operations take forever” problem.
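A best-case estimate shows why a small flash cache barely moves full-surface operations. The Python sketch assumes hypothetical figures (an 8 GB cache, 500 MB/s flash reads, 200 MB/s platter reads) and optimistically serves every cached byte from flash; the function name is illustrative.

```python
def sshd_full_read_speedup(capacity_gb: float, cache_gb: float,
                           hdd_mb_s: float, ssd_mb_s: float) -> float:
    """Best-case speedup of one full-surface read on an SSHD vs a plain HDD.

    Optimistic assumption: every byte held in the flash cache is read from
    flash, and the rest streams from the platters at full sequential speed.
    """
    mb_per_gb = 1000  # decimal units
    hdd_only_s = capacity_gb * mb_per_gb / hdd_mb_s
    cached_s = cache_gb * mb_per_gb / ssd_mb_s
    uncached_s = (capacity_gb - cache_gb) * mb_per_gb / hdd_mb_s
    return hdd_only_s / (cached_s + uncached_s)


if __name__ == "__main__":
    # Hypothetical figures: 20 TB drive, 8 GB cache, 200 MB/s HDD, 500 MB/s flash.
    speedup = sshd_full_read_speedup(20_000, 8, 200, 500)
    print(f"speedup: {speedup:.5f}x")
```

Even under these generous assumptions, a full read of a 20 TB drive finishes only about 0.02% sooner.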
The “Danger Case” Scenario: Failure During Rebuild/Restore
A concrete example is given:
- Eight 20TB drives in a small enclosure
If one fails:
- A large fraction (or all) of the data may need restoration
- During rebuild/restore:
- the remaining drives must be read across large portions
- this can take over a day, depending on workload and parity/rebuild scheme
This is presented as where the “20TB isn’t what you want” argument becomes strongest.
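The scenario can be sketched numerically as well. This Python model assumes a hypothetical single-parity layout (every surviving drive must be read to reconstruct the lost one), parallel reads, the replacement drive's 200 MB/s write speed as the bottleneck, and an illustrative `rebuild_share` parameter for the fraction of that speed left over while the array keeps serving normal traffic.

```python
def rebuild_estimate(drives: int, drive_tb: float, mb_per_s: float,
                     rebuild_share: float = 1.0) -> tuple[float, float]:
    """Return (TB read from surviving drives, hours to rebuild) after one
    drive fails in a hypothetical single-parity array.

    Simplified assumptions: survivors are read in parallel, the replacement
    drive's sequential write speed is the bottleneck, and the rebuild only
    gets `rebuild_share` of that speed because normal traffic continues.
    """
    if not 0.0 < rebuild_share <= 1.0:
        raise ValueError("rebuild_share must be in (0, 1]")
    tb_read = (drives - 1) * drive_tb
    seconds = drive_tb * 1e12 / (mb_per_s * 1e6 * rebuild_share)
    return tb_read, seconds / 3600


if __name__ == "__main__":
    for share in (1.0, 0.5):
        tb, hours = rebuild_estimate(8, 20, 200, share)
        print(f"share {share:.0%}: read {tb} TB, about {hours:.0f} hours")
```

With eight 20 TB drives, the rebuild reads 140 TB from the survivors and takes about 28 hours at full speed, or roughly 56 hours (over two days) if normal traffic halves the rebuild rate.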
Psychological/Usage Behavior Point: Bigger Drives Can Increase Damage
The video also points to a behavioral risk:
- Large capacity can encourage users to store “everything” without good backups
- That increases the real-world impact of a failure:
- bigger disk → bigger “oops”
Household Context: Responsiveness Matters Too
Non-technical examples reinforce the recovery/latency theme:
- On the speaker's home server, power management let idle drives "park" or go to sleep
- The speaker changed the settings so the drives don't sleep, avoiding slow loads while they spin back up
- They considered switching to an SSD-based approach but noted it can be expensive
Main Speakers / Sources Mentioned
- Andre (referred to as Andre Arjun / Andre A in forum context)
- Linus Tech Tips (LTT) creators as the primary host/speakers (including mention of “Dave” as a colleague/fellow host)
Category
Technology