Product · github.com · 3 min read

Case Study: Recovery of a Corrupted 12 TB Btrfs Multi-Device Pool

By msedek

AI Summary

After a hard power cycle, a 12 TB multi-device Btrfs pool was severely corrupted, leaving the extent tree and free space tree in a state that the native repair tools could not fix. The btrfs check --repair command entered an infinite loop: it made no progress while rotating the backup_roots slots past any pre-crash rollback point. Recovery was achieved with 14 custom C tools built against the internal btrfs-progs API, and data loss was limited to 7.2 MB out of 4.59 TB. The case study describes the environment, timeline, and root cause, and proposes nine upstream changes that could prevent similar issues. These include better progress detection, symmetric handling of delayed references, prechecks for sibling safety, and new rescue subcommands. The custom tools and a patch are shared for reference, not as direct upstream proposals; the author emphasizes that the changes need design discussion with subsystem experts.
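To make the "progress detection" idea concrete, here is a minimal sketch of a repair loop that gives up once repeated passes stop reducing the number of outstanding errors. It is an illustration only and not the case study's code or anything from btrfs-progs: the function name run_repair_passes, the run_pass callback, and both thresholds are hypothetical, and the error count is an assumed stand-in for whatever progress metric a real tool would track.

```python
from typing import Callable, Optional, Tuple


def run_repair_passes(
    run_pass: Callable[[], int],
    max_stalled_passes: int = 3,
    max_passes: int = 100,
) -> Tuple[str, int]:
    """Run repair passes until clean, stalled, or out of budget.

    run_pass is a hypothetical callback: it performs one repair pass and
    returns the number of errors still outstanding afterwards.
    Returns (status, passes_run).
    """
    best: Optional[int] = None  # lowest outstanding-error count seen so far
    stalled = 0                 # consecutive passes that did not beat `best`

    for passes in range(1, max_passes + 1):
        remaining = run_pass()
        if remaining == 0:
            return ("clean", passes)
        if best is None or remaining < best:
            # Real progress: remember the new low and reset the stall counter.
            best = remaining
            stalled = 0
        else:
            # No improvement: stop before looping forever.
            stalled += 1
            if stalled >= max_stalled_passes:
                return ("stalled", passes)

    return ("budget_exhausted", max_passes)
```

Stopping on a stall matters in the scenario the summary describes, because every additional non-productive pass can overwrite state (here, the backup_roots slots) that a rollback would have relied on.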

Key Concepts

Btrfs Recovery

Btrfs recovery involves restoring data integrity and functionality to a Btrfs file system after corruption or failure. This process can include using built-in repair tools or custom solutions to address specific issues.

Custom Tool Development

Custom tool development involves creating specialized software tools tailored to address specific problems or enhance existing systems. These tools often extend or modify the functionality of existing software.

Category

Technology

Summarized by Mente
