Summary of "Why you DON’T want a 20TB Hard Drive"

Summary: Why You “Don’t Want” a 20TB Hard Drive (Technical + Practical Takeaways)

Subject of the Video

The video argues that you may not want a 20TB hard drive for many real-world use cases, even though it offers impressive capacity.

Drive Type Discussed

The focus is on Western Digital's 20TB Ultrastar DC HC650-class enterprise drives (the video mentions enterprise OEM sampling), which feature:

Shingled Magnetic Recording (SMR)
A helium-filled enclosure (helium is less dense than air, which reduces drag and turbulence on the spinning platters and improves efficiency)
A nine-platter design, which increases mechanical complexity and requires tighter engineering tolerances

SMR Performance: Why Reads Are Fine, Writes Hurt

The key technical point is how SMR write behavior impacts performance:

Reading is effectively fine
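- Overlapping “shingles” don’t disrupt reads the way they disrupt writes: the read head is narrower than the write head, so it can still read the exposed strip of each overlapped track.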
Writing has a penalty
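- Tracks are grouped into bands. Because writing one track overwrites part of the next, updating data in the middle of a band forces the drive to read and rewrite every following track in that band.
- Under heavy random I/O, this becomes dramatically worse: each small update can trigger a read-modify-write of a large part of a band, so the extra work cascades.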
The speaker compares this to SSD update behavior
- Both must move or rewrite more data than was actually changed: an SSD can only erase whole blocks, and an SMR drive rewrites the rest of a band.
Related example: Seagate’s earlier shingled magnetic recording (SMR) drives
- Intended for write-once/read-many (archival) scenarios, where the write penalty matters far less than it does under random-write workloads.
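To make the write penalty concrete, here is a toy model of a single shingled band. It assumes (hypothetically) that writing track i clobbers track i + 1 and that a guard gap follows the last track of the band, so updating track i forces the drive to read and rewrite every track from i to the end of the band. Band sizes are drive-specific and not given in the video; the 100-track figure is illustrative only.

```python
def tracks_rewritten(band_tracks: int, updated_track: int) -> int:
    """Tracks that must be rewritten to update one track in a shingled band.

    Toy model (an assumption, not a vendor spec): writing track i overwrites
    part of track i + 1, so every track from i to the end of the band must be
    read first and rewritten afterwards. Tracks are numbered 0..band_tracks-1.
    """
    if not 0 <= updated_track < band_tracks:
        raise ValueError("updated_track must lie inside the band")
    return band_tracks - updated_track


def mean_rewrite_cost(band_tracks: int) -> float:
    """Average tracks rewritten per single-track update, if updates land uniformly at random."""
    total = sum(tracks_rewritten(band_tracks, i) for i in range(band_tracks))
    return total / band_tracks


if __name__ == "__main__":
    band = 100  # hypothetical band size, chosen only for illustration
    print(tracks_rewritten(band, band - 1))  # last track: only itself
    print(tracks_rewritten(band, 0))         # first track: the whole band
    print(mean_rewrite_cost(band))           # (band + 1) / 2 = 50.5
```

A conventional (CMR) update rewrites one track; in this model a random update rewrites about half a band on average, which is why random-write-heavy workloads suffer most.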

Capacity Scaling vs Performance Scaling

The speaker’s argument: capacity keeps growing, but speed doesn’t keep up.

Hard drive performance improvements are limited by two main levers:

Shrink bit size
Spin faster

In practice:

Spinning faster increases drag and complexity
Shrinking bits runs into physical limits (very small magnetic grains become thermally unstable)

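So manufacturers mostly scale capacity by adding more platters. Because a conventional drive reads or writes with only one head at a time, extra platters add capacity without adding throughput, and random-write performance stays roughly flat.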
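The "spin faster" lever and the effect of adding platters can be put into rough numbers. Average rotational latency is half a revolution, 60,000 / rpm / 2 milliseconds, and a crude random-I/O ceiling is 1 / (seek time + rotational latency). The 8 ms average seek time below is a hypothetical round number, not a figure from the video or a datasheet. Because adding platters leaves that ceiling unchanged, random IOPS per terabyte falls as capacity grows.

```python
def rotational_latency_ms(rpm: float) -> float:
    """Average rotational latency: on average the head waits half a revolution."""
    ms_per_revolution = 60_000 / rpm
    return ms_per_revolution / 2


def random_iops(rpm: float, avg_seek_ms: float) -> float:
    """Rough random-I/O ceiling, ignoring transfer time and command overhead."""
    return 1000 / (avg_seek_ms + rotational_latency_ms(rpm))


def iops_per_tb(capacity_tb: float, rpm: float, avg_seek_ms: float) -> float:
    """Random IOPS available per terabyte stored; more platters do not raise IOPS."""
    return random_iops(rpm, avg_seek_ms) / capacity_tb


if __name__ == "__main__":
    SEEK_MS = 8.0  # hypothetical average seek time
    for rpm in (5400, 7200, 10_000, 15_000):
        print(f"{rpm:>6} rpm: {rotational_latency_ms(rpm):.2f} ms latency")
    for tb in (4, 20):
        print(f"{tb:>2} TB at 7200 rpm: {iops_per_tb(tb, 7200, SEEK_MS):.1f} IOPS/TB")
```

Going from 7,200 to 15,000 rpm only cuts average rotational latency from about 4.2 ms to 2.0 ms, while a 20TB drive spreads the same IOPS over five times as much data as a 4TB drive.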

Alternative/Future Concept: Seagate’s Dual-Actuator Idea

The video also mentions Seagate's work on dual-actuator technology, in which two independent actuators let the drive perform two operations simultaneously.

The implied benefit:

Instead of one actuator moving every head together (with only one head active at a time), each actuator serves its own subset of platters, so the drive can access different platters in parallel and improve throughput.
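A toy scheduling model shows why two actuators can help. Each request is tagged with the platter it touches and a service time. With one actuator, requests are served one after another; with two, each actuator serves its own half of the platters in parallel, so the batch finishes when the busier half finishes. The platter split, the request list, and the service times are all hypothetical; real dual-actuator firmware, and how it presents the two halves to the host, is not modeled.

```python
from dataclasses import dataclass


@dataclass
class Request:
    platter: int       # which platter the request touches
    service_ms: float  # hypothetical seek + rotation + transfer time


def makespan_single(requests: list[Request]) -> float:
    """One actuator: every request waits for the same head stack."""
    return sum(r.service_ms for r in requests)


def makespan_dual(requests: list[Request], platters: int) -> float:
    """Two actuators, each owning half the platters; the halves run in parallel."""
    split = platters // 2
    lower = sum(r.service_ms for r in requests if r.platter < split)
    upper = sum(r.service_ms for r in requests if r.platter >= split)
    return max(lower, upper)


if __name__ == "__main__":
    reqs = [Request(p % 8, 10.0) for p in range(16)]  # evenly spread batch
    print(makespan_single(reqs))   # 160.0
    print(makespan_dual(reqs, 8))  # 80.0
```

The 2x speedup only appears when work is evenly split between the halves; a batch that touches only one half gains nothing.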

Reliability / MTTF: The “What Happens When It Fails” Focus

The drives are rated at roughly 2.5 million hours MTTF (mean time to failure).
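An MTTF rating is easier to interpret as an annualized failure rate (AFR). Under the common simplifying assumption of a constant failure rate (an exponential lifetime model), AFR = 1 - exp(-8760 / MTTF), where 8,760 is the number of hours in a year. This converts the rated figure only; it says nothing about how a particular batch of drives behaves in a particular deployment.

```python
import math

HOURS_PER_YEAR = 8760  # 365 days x 24 hours


def afr_from_mttf(mttf_hours: float) -> float:
    """Annualized failure rate implied by an MTTF, assuming a constant failure rate."""
    return 1 - math.exp(-HOURS_PER_YEAR / mttf_hours)


if __name__ == "__main__":
    afr = afr_from_mttf(2_500_000)  # the rated figure cited in the video
    print(f"{afr:.2%}")  # 0.35%
```

In a fleet of 1,000 such drives, that rating implies roughly 3 to 4 failures per year.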

However, the speaker emphasizes what matters operationally:

What happens during a failure in real deployments, especially in arrays, where the time to rebuild or restore can dominate downtime and risk.

Napkin Math: How Long Full Writes / Overwrites Take

A rough calculation is used:

Assume about 200 MB/s sustained sequential write
Overwriting the entire surface of a 20TB SMR helium drive then takes about 100,000 seconds, roughly 28 hours: slightly more than a full day
In larger multi-drive systems, recovery could mean multiple days
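The napkin math can be written out directly. At a sustained 200 MB/s (decimal units, as drive vendors use), 20 TB takes 20 × 10^12 / (200 × 10^6) = 100,000 seconds, about 27.8 hours. The same function applies to a whole-array restore; the 125 MB/s figure in the example stands for a hypothetical restore over a single 1 Gb/s network link and is not from the video.

```python
def transfer_hours(terabytes: float, mb_per_s: float) -> float:
    """Hours to move `terabytes` at a sustained `mb_per_s` (decimal TB and MB)."""
    seconds = terabytes * 1e12 / (mb_per_s * 1e6)
    return seconds / 3600


if __name__ == "__main__":
    one_drive = transfer_hours(20, 200)
    print(f"20 TB at 200 MB/s: {one_drive:.1f} h")  # 27.8 h
    array = transfer_hours(8 * 20, 125)  # hypothetical 1 Gb/s restore
    print(f"160 TB at 125 MB/s: {array / 24:.1f} days")  # 14.8 days
```

Real sustained rates vary across the platter (outer tracks are faster than inner ones) and drop sharply when the workload is not purely sequential, so these are best-case figures.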

Data Protection and Cost vs Risk (Consumer vs Enterprise)

The video frames different priorities across environments:

Home / Consumer NAS

Long recovery times are described as painful and risky
The risk grows if backups aren’t truly comprehensive

Data Centers / Enterprise Buyers

Some buyers may prioritize reliability over “newer/bigger” drives
Example mentioned: Backblaze reportedly prefers lower-capacity bulk fleet drives (roughly 4–6TB) because its experience suggested a better trade-off between cost and reliability
The video refers to Backblaze’s published drive reliability reports only in general terms, and some of the details cited may be uncertain or conflated
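Backblaze's reports express reliability as an annualized failure rate computed from drive-days rather than drive counts, so drives that were in service for only part of the period count proportionally. A minimal sketch of that calculation follows; the fleet numbers are invented for illustration.

```python
def annualized_failure_rate(failures: int, drive_days: int) -> float:
    """AFR in percent: failures per drive-year of operation, times 100."""
    drive_years = drive_days / 365
    return failures / drive_years * 100


if __name__ == "__main__":
    # Hypothetical fleet: 1,000 drives for a full year, plus 500 drives
    # that were added halfway through (about 182 days each).
    drive_days = 1000 * 365 + 500 * 182
    print(f"{annualized_failure_rate(12, drive_days):.2f}%")
```

Counting drive-days keeps a fleet that is still growing from looking more reliable than it is.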

SSHD (HDD + Small SSD Cache) Doesn’t Solve the Core Issue

The video describes an SSHD as:

A standard hard drive with a small flash (SSD) cache for small, frequently accessed data

But it won’t materially improve:

Writing the entire surface
Reading the entire surface

So the speaker argues SSHD is not a solution to the “big drive + full-surface operations take forever” problem.
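Simple arithmetic shows why. Even if the cached portion were read at flash speed and everything else at platter speed, the cache would shave only seconds off a full-surface pass. The 8 GB cache and 500 MB/s flash rate are hypothetical values chosen for illustration; the 200 MB/s platter rate matches the napkin-math assumption earlier in this summary.

```python
def full_pass_seconds(capacity_gb: float, hdd_mb_s: float,
                      cache_gb: float = 0.0, cache_mb_s: float = 0.0) -> float:
    """Seconds to read every byte once, optionally serving `cache_gb` from flash.

    Optimistic toy model: assumes the cached data is read at flash speed
    and nothing else slows the pass down.
    """
    platter_seconds = (capacity_gb - cache_gb) * 1000 / hdd_mb_s
    cache_seconds = cache_gb * 1000 / cache_mb_s if cache_gb else 0.0
    return platter_seconds + cache_seconds


if __name__ == "__main__":
    plain = full_pass_seconds(20_000, 200)
    sshd = full_pass_seconds(20_000, 200, cache_gb=8, cache_mb_s=500)
    print(f"saved {plain - sshd:.0f} s out of {plain:.0f} s")
```

Under these assumptions the cache saves 24 seconds out of a 100,000-second pass.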

The “Danger Case” Scenario: Failure During Rebuild/Restore

A concrete example is given:

Eight 20TB drives in a small enclosure

If one fails:

A large fraction (or all) of the data may need to be rebuilt or restored
During rebuild/restore:
- large portions of every remaining drive must be read
- this can take over a day, depending on workload and parity/rebuild scheme

This is presented as where the “20TB isn’t what you want” argument becomes strongest.
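The exposure during a rebuild can be estimated with a toy model. Assuming the remaining n drives fail independently at a constant hourly rate λ derived from an AFR, the chance that at least one of them fails during a rebuild window of t hours is 1 - exp(-n λ t). The 2% AFR and the window lengths are hypothetical inputs (the 28-hour case echoes the napkin math above). Drives in one enclosure often share a batch, age, and workload, so real failures are correlated and this independence assumption tends to understate the risk.

```python
import math

HOURS_PER_YEAR = 8760


def hourly_rate_from_afr(afr: float) -> float:
    """Constant hourly failure rate (lambda) implied by an annualized failure rate."""
    return -math.log(1 - afr) / HOURS_PER_YEAR


def p_another_failure(remaining_drives: int, window_hours: float, afr: float) -> float:
    """Chance that at least one remaining drive fails during the rebuild window.

    Assumes independent failures with a constant rate, which is optimistic
    for drives of the same batch and age.
    """
    lam = hourly_rate_from_afr(afr)
    return 1 - math.exp(-remaining_drives * lam * window_hours)


if __name__ == "__main__":
    # Eight-drive enclosure, one failed: seven drives must survive the rebuild.
    for hours in (28, 72, 168):
        p = p_another_failure(7, hours, afr=0.02)  # hypothetical 2% AFR
        print(f"{hours:>3} h rebuild: {p:.3%}")
```

For small values the probability grows roughly linearly with both the number of surviving drives and the rebuild time, so a bigger drive that takes longer to rebuild widens the risk window directly.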

Psychological/Usage Behavior Point: Bigger Drives Can Increase Damage

The video also points to a behavioral risk:

Large capacity can encourage users to store “everything” without good backups
That increases the real-world impact of a failure:
- bigger disk → bigger “oops”

Household Context: Responsiveness Matters Too

Non-technical examples reinforce the recovery/latency theme:

Home server behavior includes drive power management (e.g., drives spin down or park their heads when idle)
The speaker changed settings so the drives don’t sleep, avoiding the delay while a sleeping drive spins back up
They considered switching to an SSD-based approach but noted it can be expensive

Main Speakers / Sources Mentioned

Andre (referred to as Andre Arjun / Andre A in forum context)
Linus Tech Tips (LTT) hosts as the primary speakers (a colleague and fellow host, “Dave”, is also mentioned)