I’ve been using it for a few years on my NAS for all the data drives (with SnapRAID for parity and data validation), and as the boot drive on a few SBCs that run various services. I also use it as the boot drive for my Linux desktop PC. So far no problems at all, and I make heavy use of snapshots. I’ve also had things like power outages shut down the various machines multiple times.
I’ve never used BTRFS raid so can’t speak to that, but in my personal experience I’ve found BTRFS and the snapshot system to be reliable.
Seems like most (all?) stories I hear about corruption and other problems are all from years ago when it was less stable (years before I started using it). Or maybe I just got lucky ¯\_(ツ)_/¯
BTRFS RAID10 can seamlessly combine multiple raw disks without needing to match capacity.
Next time I’ll just replace the 4T disk in my 5-disk RAID10 with a 20T one. Currently I have 4+8+8+16+20T disks.
MD RAID does not do checksumming, although I believe XFS is going to add support for it in the future.
I have had my BTRFS raid filesystem survive a lot during the past 14 years:
- burned power: no loss of data
- failed RAM that started corrupting memory: after a little hack 1), a BTRFS scrub saved most of the data even though the situation got so bad the kernel would crash in 10 minutes
- buggy PCIe SATA extension card: I tried to add a 6th disk, but noticed after a few million write errors to one disk that it just randomly stopped passing data through: no data corruption, although the btrfs write error counters are in the 10s of millions now
- 4 disk failures: I have only one original disk still running and it is showing a lot of bad sectors
1) One of the corrupted sectors was in the btrfs tree that contains the checksums for the rest of the filesystem, and both copies were broken. It prevented access to some 200 files. I patched the kernel to log the exact sector in addition to the expected and actual value. Turns out it was just a single bit flip, so I used a hex editor to flip it back to the correct value and got the files back.
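For anyone wanting to do the same comparison, here is a minimal Python sketch of checking whether two logged values differ by exactly one bit. It is not the kernel patch described above; the function names and the sample values are made up for illustration, and it assumes the expected and actual values are available as plain integers.

```python
def differing_bits(expected: int, actual: int) -> list[int]:
    """Return the bit positions (0 = least significant) where the values differ."""
    diff = expected ^ actual  # XOR leaves a 1 only where the bits disagree
    return [i for i in range(diff.bit_length()) if (diff >> i) & 1]


def is_single_bit_flip(expected: int, actual: int) -> bool:
    """True when exactly one bit differs between the two values."""
    return len(differing_bits(expected, actual)) == 1


# Hypothetical values, not taken from any real log.
expected = 0x3A
actual = 0x1A
print(differing_bits(expected, actual))      # bit positions that differ
print(is_single_bit_flip(expected, actual))  # whether it is a single flip
```

If exactly one position comes back, flipping that bit in the on-disk value (as was done with the hex editor) restores the expected value.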
I don’t use BTRFS raid, I don’t actually use any RAID. I use SnapRAID which is really more of a parity system than real RAID.
I have a bunch of data disks that are formatted BTRFS, then 2 parity disks formatted using ext4 since they don’t require any BTRFS features. Then I use snapraid-btrfs which is a wrapper around SnapRAID to automatically generate BTRFS snapshots on the data disks when doing a SnapRAID sync.
Since the parity is file based, it’s best to use it with snapshots, so that’s the solution I went with. I’m sure you could also use LVM snapshots with ext4 or ZFS snapshots, but BTRFS with SnapRAID is well supported and I like how BTRFS snapshots/subvolumes work, so I went with that. Also BTRFS has some nice features over ext4 like CoW and checksumming.
I considered regular RAID but I don’t need the bandwidth increase over single disks and I didn’t ever want the chance of losing a whole RAID pool. With my SnapRAID setup I can lose any 2 drives and not lose any data, and if I lose 3 drives, I only lose the data on any lost data drives, not all the data. Also it’s easy to add a single drive at a time as I need more space. That was my thought process when choosing it anyway and it’s worked for my use case (I don’t need much IOPS or bandwidth, just lots of cheap fairly resilient and easy to expand storage).
BTRFS raid is usage-aware, so a rebuild does not need to do a bit-for-bit copy of the entire disk, only the parts that are actually in use. Also, because btrfs has data checksumming, it can detect read errors even when the disk reports a successful read. Checksums are verified on every read during regular operation; a scrub additionally checks data that isn’t otherwise being read.
More flexibility in drives. Btrfs's RAID1 isn't actually RAID1 where everything is written to all the drives; it's closer to RAID10 in that it writes two copies of all data, each on a different drive. So you can have 1+2+3 TB drives in an array and still get 3TB of usable storage, or even 1+1+1+1+4. And you can add/remove single drives easily.
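The mixed-size arithmetic can be sketched in a few lines of Python. This is a rough estimate only, assuming two copies of every block on two different drives and ignoring metadata overhead and chunk allocation granularity; `raid1_usable` is a made-up helper name, not part of any btrfs tool.

```python
def raid1_usable(sizes: list[int]) -> int:
    """Estimate usable space for a two-copy layout across mixed-size drives.

    Every block needs two copies on different drives, so usable space is
    capped both by half the total and by how much the other drives can
    mirror from the largest one.
    """
    total = sum(sizes)
    largest = max(sizes)
    return min(total // 2, total - largest)


print(raid1_usable([1, 2, 3]))        # 1+2+3 TB drives
print(raid1_usable([1, 1, 1, 1, 4]))  # four small drives mirroring one big one
print(raid1_usable([1, 4]))           # big drive is mostly wasted here
```

The last case shows the limit: with only a 1 TB drive to mirror against, most of the 4 TB drive can't be used.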
You can also set different RAID levels for metadata versus data, because the raid knows the difference. At some point in the future you might be able to set it per-file too.