[Update] I did it. Read about it here.

I do like the Windows Home Server Drive Extender technology. It lets you aggregate a bunch of various-sized hard disks into one big storage pool. You can then define a number of shares that will share (!) that storage pool dynamically. AND you can define which shares you want duplicated, which basically means that DE (Drive Extender) will make sure that all the files on those shares are on two different physical hard drives, to safeguard those files against hardware failures.

What I don’t like about WHS is that it comes with a lot of stuff I don’t need or want, like a Windows 2003 kernel, a Windows-only backup system, Video/Music/Photos sharing, etc.

I’ve been searching for quite a while to find a similar system that would fit my needs, and never found one. Then yesterday, I had an idea on how I could implement such a system myself.

Here’s how I’ll try to implement WHS Drive Extender-like functionality. I’m posting this expecting comments and feedback on the choices I made below, so feel free to leave comments about anything. Thanks!

  • I started with a clean Ubuntu 9.10.
  • Using the Samba log, and the extd_audit VFS module that comes with Samba, monitor all file activity on the SMB shares: file writes, deletes & renames and directory creates, deletes and renames.
  • Enable follow symlinks in smb.conf
  • Each minute (using cron), a parser starts and reads the Samba log, starting where it stopped the last time it ran (that position is saved in a _last_read_ variable in the database). For each extd_audit entry it finds in the log, it inserts a task in a database. When it’s done, it updates the _last_read_ variable.
  • A task executer runs permanently, and when the I/O on the server is not over a specific threshold, it will start executing the pending tasks. Executing a task means something different, depending on what was logged:

    • New files: pick X random drives (X being defined in the config of the current share), and copy the new file to all those drives, then create a symlink to one of those copies in the actual share.
    • Existing files changed: since a symlink already exists, one copy of the file is already up to date; we just need to find the other copies, and copy the newly changed file over those old files.
    • Renamed files/directories: (Only the symlink has been renamed.) Find all copies of the file/directory, and rename all of them.
    • Deleted files/directories: (Symlink is already deleted.) Find all copies of the file/directory, and delete all of them.
  • Executed tasks will be removed from the database (and archived somewhere else).

  • Every 10 seconds or so, the executer daemon will check again to see if the I/O of the server is too high, and if so, will pause. Obviously, it should ignore its own I/O, since that doesn’t really count as server-busyness! This might be a tricky part.
  • Set _max log size = 0_ in smb.conf, and rotate the Samba log manually: rotate the actual log file, then immediately update the _last_read_ value that the parser uses to 0.
  • When a hard disk goes missing, a process will walk through the shares, and find all symlinks that point to that drive. All those symlinks will be changed to point to another copy of the file, if one is available. A new copy of the file should then be made on another available hard drive, to keep the files safe. Not sure yet if this process should be manually triggered, or automatically triggered when a drive is missing for more than X minutes…

So that’s it. I think that pretty much duplicates the only features of WHS I use: files on shares that are important will be available on 2+ physical drives, and the combined storage space of all the hard drives will be available for any share.
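
To make the "new files" task a bit more concrete, here is a minimal Python sketch of what the executer could do for one file. Everything in it is an assumption for illustration: the function name `store_new_file`, the `pool_dirs` list (one directory per physical drive) and the `.tmp-link` suffix are made up, and nothing is implemented this way yet.

```python
import os
import random
import shutil
from pathlib import Path


def store_new_file(share_dir, rel_path, pool_dirs, copies=2):
    """Copy a new file from the share onto `copies` random drives, then
    replace the original with a symlink to one of those copies.

    share_dir: the directory shared by Samba (the landing zone).
    rel_path:  path of the new file, relative to share_dir.
    pool_dirs: one directory per physical drive (hypothetical layout).
    Returns the list of copies that were written.
    """
    landing = Path(share_dir) / rel_path
    if landing.is_symlink() or not landing.is_file():
        return []  # already handled, or deleted since it was logged

    # Pick X distinct drives at random (X = copies, capped by drive count).
    targets = random.sample(list(pool_dirs), k=min(copies, len(pool_dirs)))
    if not targets:
        raise ValueError("no drives available in the pool")

    stored = []
    for drive in targets:
        # Absolute path, so the symlink works wherever the share lives.
        dest = Path(drive).resolve() / rel_path
        dest.parent.mkdir(parents=True, exist_ok=True)
        shutil.copy2(landing, dest)  # copy2 also keeps timestamps
        stored.append(dest)

    # Create the symlink under a temporary name, then rename it over the
    # original file, so the share never shows a missing file.
    tmp_link = landing.with_name(landing.name + ".tmp-link")
    os.symlink(stored[0], tmp_link)
    os.replace(tmp_link, landing)
    return stored
```

Calling it a second time on the same path returns an empty list, because the file in the share is a symlink by then; that should make it safe to re-run a task after a crash.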

The only downside of all this is that the reported free space of the shares will be the free space on the landing zone, not the actual free space of the complete storage pool. Might find a solution for this at some point… Maybe it’s possible to create a Samba VFS module to handle this..?
[Update] Indeed; you can specify, in smb.conf, an external command that will be used to query the free and total space on the specified share.
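
The smb.conf parameter in question is `dfree command`. As far as I understand it, Samba runs the script with a directory as argument, and expects one line on stdout: the total space and the available space, in 1024-byte blocks. Here is a rough sketch of such a script; the `POOL_DIRS` paths are placeholders, and it assumes each pool directory sits on a different drive (two directories on the same filesystem would be counted twice).

```python
#!/usr/bin/env python3
import os
import shutil

# Placeholder paths: one directory per physical drive in the pool.
POOL_DIRS = ["/mnt/hdd1/pool", "/mnt/hdd2/pool"]


def pool_space_kb(pool_dirs):
    """Return (total_kb, free_kb) summed over all pool directories."""
    total = free = 0
    for directory in pool_dirs:
        if not os.path.isdir(directory):
            continue  # a missing drive contributes nothing
        usage = shutil.disk_usage(directory)
        total += usage.total
        free += usage.free  # space available to a non-root user
    return total // 1024, free // 1024


def format_dfree(total_kb, free_kb):
    """Build the single line Samba reads: '<total> <available>'."""
    return "%d %d" % (total_kb, free_kb)


if __name__ == "__main__":
    # The directory argument Samba passes is ignored here, since the
    # answer is the same for every share using the pool.
    print(format_dfree(*pool_space_kb(POOL_DIRS)))
```

It would then be hooked up with something like `dfree command = /usr/local/bin/pool_dfree.py` in smb.conf (that path is just an example).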

So again, feedback on all this would be welcome.

/me continues implementation now