Panic! At the Filesystem

Replacing disks in a software RAID is a nerve-wracking, stress-inducing, ITIL-remembering, backups-checking activity. And I’ve done it six times in the past week.

The first round replaced the 5 disks in my FreeNAS box to upgrade the storage from 8TB to 16TB. Thankfully, ZFS is a stupendously rock-solid volume manager and filesystem. It has this amazing feature where you can plug a disk into a spare SATA port and tell ZFS to replace one of the current disks with the new one. Once it’s resilvered, you eject the offlined disk, put a fresh one in its place, and replace the next down the line. All this without ever degrading the pool, so parity stays intact the whole time.
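For anyone who wants to see that rolling swap spelled out, here is a rough dry-run sketch in Python. It only builds the command strings and never touches a pool. The pool name `tank` and the device names `ada0` to `ada5` are made up for illustration, and it assumes device names stay tied to their SATA ports, which is not guaranteed. FreeNAS normally drives this through its GUI, so treat this as the idea rather than the exact procedure.

```python
def plan_rolling_replace(pool, old_disks, spare_port):
    """Return the commands for a one-at-a-time disk swap (dry run only).

    pool, old_disks and spare_port are hypothetical names; nothing is executed.
    """
    steps = []
    free_slot = spare_port  # the first new disk sits on the spare SATA port
    for old in old_disks:
        # Resilver onto the new disk while the old one is still in the pool.
        steps.append(f"zpool replace {pool} {old} {free_slot}")
        # Check this until the resilver is reported as finished.
        steps.append(f"zpool status {pool}")
        # The old disk's bay then gets the next fresh disk.
        free_slot = old
    return steps


if __name__ == "__main__":
    disks = ["ada0", "ada1", "ada2", "ada3", "ada4"]
    for command in plan_rolling_replace("tank", disks, "ada5"):
        print(command)
```

As far as I know, the extra space only shows up once every disk in the vdev has been replaced and the pool's `autoexpand` property is on.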

Tonight’s round is to replace a failing disk in my 3-disk software RAID5 on a Linux box that I use as a hypervisor. When I moved the server and powered it up at my new apartment months ago, the SMART daemon started yelling at me every day that there are pending sectors on one of the disks (when you run systems for years without a power-down, weird problems hide until you power-cycle). The fix is to pull the disk and wipe it: overwriting those 2 pending sectors forces the drive to either rewrite them in place or remap them to spares.
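If you want to check the pending-sector count yourself instead of waiting for the daily nag, here is a small sketch that parses the attribute table printed by `smartctl -A`. The sample text below is invented to match the 2 pending sectors mentioned above; it is not output from the actual disk, and the function name is just something I made up.

```python
# Made-up sample in the layout that `smartctl -A /dev/sdX` prints.
SAMPLE = """\
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  5 Reallocated_Sector_Ct   0x0033   200   200   140    Pre-fail  Always       -       0
197 Current_Pending_Sector  0x0032   200   200   000    Old_age   Always       -       2
198 Offline_Uncorrectable   0x0030   200   200   000    Old_age   Offline      -       0
"""


def smart_raw_values(text):
    """Map attribute name -> raw value for rows with a plain integer raw value."""
    values = {}
    for line in text.splitlines():
        fields = line.split()
        # Attribute rows start with a numeric ID and have at least 10 columns.
        if len(fields) >= 10 and fields[0].isdigit():
            raw = fields[9]
            if raw.isdigit():
                values[fields[1]] = int(raw)
    return values


if __name__ == "__main__":
    attrs = smart_raw_values(SAMPLE)
    print("pending:", attrs.get("Current_Pending_Sector"))
    print("reallocated:", attrs.get("Reallocated_Sector_Ct"))
```

After a wipe, the hope is that the pending count drops to zero; if the drive had to remap anything, the reallocated count goes up instead.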

Anyway, the Linux system is using mdadm to manage the volume, which itself contains an LVM volume, and the system partitions are virtual disks inside that volume (it’s complicated). Considering these are ancient 640GB disks that have seen a lot of spindle time (even before I rescued them from the reclaim heap), I’m a little freaked.

C’mon, 84 more minutes…
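That countdown comes from `/proc/mdstat`, where the kernel reports rebuild progress and an estimated finish time for md arrays. Here is a minimal sketch that pulls those numbers out. The sample text is invented (array name, member names, and block counts are all placeholders), so point the function at the real file on your own box.

```python
import re

# Invented sample shaped like /proc/mdstat during a RAID5 rebuild.
SAMPLE_MDSTAT = """\
Personalities : [raid6] [raid5] [raid4]
md0 : active raid5 sdc1[3] sdb1[1] sda1[0]
      1250258944 blocks level 5, 64k chunk, algorithm 2 [3/2] [UU_]
      [==>..................]  recovery = 12.6% (78766313/625129472) finish=84.0min speed=108000K/sec

unused devices: <none>
"""

# Matches e.g. "recovery = 12.6% (...) finish=84.0min" on a single line.
PROGRESS = re.compile(r"(recovery|resync)\s*=\s*([\d.]+)%.*?finish=([\d.]+)min")


def rebuild_progress(text):
    """Return (kind, percent_done, minutes_left), or None if nothing is rebuilding."""
    match = PROGRESS.search(text)
    if match is None:
        return None
    return match.group(1), float(match.group(2)), float(match.group(3))


if __name__ == "__main__":
    print(rebuild_progress(SAMPLE_MDSTAT))
    # On a real system: rebuild_progress(open("/proc/mdstat").read())
```

For the impatient, `watch cat /proc/mdstat` gives the same information with less effort.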

Eventually I’ll replace or rebuild this hypervisor on modern disks, but for now, I just need it to work. It hosts a few VMs that I find useful (but not essential, thankfully). Still, the Xen hypervisor is lagging in industry support, and the OS it runs on is so old that I can’t even dist-upgrade. Oops.

I really should take more risk and stay up on things, or at least use a hypervisor like KVM that allows live migrations of guests so I can stand up a spare box and then spot-upgrade servers (y’know, like real sysadmins do). Set-and-Forget is a piss-poor management scheme.

Anyway, as I always jokingly say: Sysadmin@Home is a game nobody wins.

Published by Shawn