[Update] I did it. Read about it here.
I do like Windows Home Server Drive Extender technology. It lets you aggregate a bunch of various-sized hard disks into one big storage pool. You can then define a number of shares that will share (!) that storage pool dynamically. AND you can define which shares you want duplicated, which basically means that DE (Drive Extender) will make sure that all the files on there are on two different physical hard drives, to safeguard those files against hardware failures.
What I don’t like about WHS is that it comes with a lot of stuff I don’t need or want, like a Windows 2003 kernel, a Windows-only backup system, Video/Music/Photos sharing, etc.
I’ve been searching for quite a while to find a similar system that would fit my needs, and never found one. Then yesterday, I had an idea on how I could implement such a system myself.
Here’s how I’ll try to implement WHS Drive Extender-like functionality. I’m posting this expecting comments and feedback on the choices I made below, so feel free to leave comments about anything. Thanks!
- I started with a clean Ubuntu 9.10.
- Using the Samba log, and the extd_audit VFS module that comes with Samba, monitor all file activity on the SMB shares: file writes, deletes & renames and directory creates, deletes and renames.
- Enable follow symlinks in smb.conf
- Each minute (using cron), a parser starts, and reads the Samba log (starting where it stopped last time it ran – saved in a last_read variable in the database), and for each extd_audit entry it finds in the log, it inserts a task in a database. When it’s done, update the last_read variable.
- A task executer runs permanently and, when the I/O on the server is not over a specific threshold, starts executing the pending tasks. Executing a task means something different, depending on what was logged:
- New files: pick X random drives (as defined in the config of the current share), and copy the new file on all those drives, then create a symlink to one of those copies in the actual share.
- Existing files changed: since a symlink already exists, one copy of the file is already up to date; we just need to find the other copies, and copy the newly changed file over those old files.
- Renamed files/directories: (Only the symlink has been renamed.) Find all copies of the file/directory, and rename all of them.
- Delete files/directories: (Symlink is already deleted.) Find all copies of the file/directory, and delete all of them.
- Executed tasks will be removed from the database (and archived somewhere else).
- Every 10 seconds or so, the executer daemon will check again to see if the I/O of the server is too high, and if so, will pause. Obviously, it should ignore its own I/O, since that doesn’t really count as server-busyness! This might be a tricky part.
- Set max log size = 0 in smb.conf, and rotate the Samba log manually: rotate the actual log file, then immediately update the last_read value that the parser uses to 0.
- When a hard disk goes missing, a process will walk through the shares, and find all symlinks that point to that drive. All those symlinks will be changed to point to another copy of the file, if one is available. Another copy of the file should be made on another available hard drive, to keep the files safe. Not sure yet if this process should be manually triggered, or automatically triggered when a drive is missing for more than X minutes…
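The "start where it stopped last time" part of the parser step above can be sketched in a few lines of Python. This is only a rough illustration, not the actual implementation: the function name `read_new_lines` is made up, the byte offset plays the role of the last_read value, and the format of the log entries is deliberately left out. If the log is smaller than the saved offset, the sketch assumes it was rotated and starts again from 0.

```python
import os
import tempfile


def read_new_lines(log_path, last_read):
    """Return (new_lines, new_offset) for what was appended after last_read."""
    size = os.path.getsize(log_path)
    if size < last_read:
        # The log is shorter than our saved offset: assume it was rotated.
        last_read = 0
    with open(log_path, "rb") as log:
        log.seek(last_read)
        data = log.read()
    # Only consume complete lines; a half-written last line waits for next run.
    end = data.rfind(b"\n") + 1
    lines = data[:end].decode("utf-8", errors="replace").splitlines()
    return lines, last_read + end


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "samba.log")  # hypothetical log file
        with open(path, "wb") as f:
            f.write(b"first entry\nsecond entry\nhalf an en")
        lines, offset = read_new_lines(path, 0)
        print(lines, offset)
```

Each returned line would then be turned into a task row, and the new offset saved back as last_read once the run is done.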
So that’s it. I think that pretty much duplicates the only features of WHS I use: files on shares that are important will be available on 2+ physical drives, and the combined storage space of all the hard drives will be available for any share.
The only downside of all this is that the reported free space of the shares will be the free space on the landing zone, not the actual free space of the complete storage pool. Might find a solution for this at some point… Maybe it’s possible to create a Samba VFS module to handle this..?
[Update] Indeed; you can specify, in smb.conf, an external command that will be used to query the free and total space on the specified share.
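The smb.conf setting in question is `dfree command`. Samba calls the command with a directory as its argument and expects it to print the total and the available space, in 1024-byte blocks by default. Here is a minimal sketch of such a script in Python, assuming the pool is just a list of mount points. The paths in `POOL_DRIVES` and the function name `pool_space_kb` are made up for the example; drives that are missing are simply skipped.

```python
import shutil

# Hypothetical mount points of the pooled drives; adjust to the real setup.
POOL_DRIVES = ["/mnt/hdd0", "/mnt/hdd1"]


def pool_space_kb(drives, usage=shutil.disk_usage):
    """Sum total and free space over all drives, in 1024-byte blocks."""
    total = free = 0
    for drive in drives:
        try:
            stats = usage(drive)
        except OSError:
            continue  # drive is missing or unmounted: it adds no space
        total += stats.total
        free += stats.free
    return total // 1024, free // 1024


if __name__ == "__main__":
    # Samba passes the queried directory as an argument; this sketch ignores
    # it and always reports the whole pool.
    total_kb, free_kb = pool_space_kb(POOL_DRIVES)
    print(total_kb, free_kb)
```

The share would then point `dfree command` at wherever this script is saved.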
So again, feedback on all this would be welcome.
/me continues implementation now
I’d also thought of implementing something similar, but by writing a simple semi-fake Drive Extender file system driver rather than using Samba, so that it would be directly accessible without having to go through a file share (as Drive Extender and your method require). Samba would then be able to make use of it as a regular mount point.
* It would have its own two trees of symlinks stored somewhere.
* The symlink in the first tree points to the primary copy of the file; the symlink in the second tree points to the backup copy. This would make finding the backup copy much easier when you have more drives.
* File read/write/deletes are passed through to drive/filesystem/mountpoint specified in the symlink tree. Results are piped back through the Drive Extender driver to the requester.
* Creates would select a drive/filesystem/mountpoint based on some criteria, create the symlink, and then pass the create command to the destination. Windows Drive Extender tries to create new files on the same drive as other files in the same directory. (Read the Balancing Storage section of the Windows Home Server Technical Brief – Drive Extender.docx for the reasoning.)
* From here it looks similar to your method. Any modifications would be logged in a database to be duplicated to the backup.
* A cron job would periodically sort the database by file and time, condense commands, then update the backup at the lowest I/O priority. I.e., if a file is modified 10 times, you just need to copy the primary to the backup once. If the file is modified 10 times and then deleted, then just delete the backup. If the file is modified 10 times, deleted, then a new one created in place, just copy the primary to the backup, etc.
* Only remove entries from the database that you condensed and performed on the backup copies.
* If a drive goes missing, step through every file on both trees to see if they still exist, and make another copy if appropriate.
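The "condense commands" step above boils down to looking only at the last thing that happened to each file. A rough Python sketch, assuming each logged entry is a `(timestamp, path, operation)` tuple with operations named "create", "modify" and "delete" (those names, and the function `condense`, are made up for the example; renames are left out to keep it short):

```python
def condense(log):
    """Reduce a log of (timestamp, path, op) entries to one action per file."""
    last_op = {}
    for timestamp, path, op in sorted(log):
        # Later entries overwrite earlier ones: only the final state matters.
        last_op[path] = (timestamp, op)
    actions = []
    for path, (timestamp, op) in sorted(last_op.items(), key=lambda kv: kv[1][0]):
        # A file that ends up deleted only needs its backup deleted;
        # anything else needs one copy from the primary to the backup.
        actions.append(("delete" if op == "delete" else "copy", path))
    return actions


if __name__ == "__main__":
    log = [
        (1, "a.txt", "modify"), (2, "a.txt", "modify"),
        (3, "b.txt", "modify"), (4, "b.txt", "delete"),
        (5, "c.txt", "delete"), (6, "c.txt", "create"),
    ]
    print(condense(log))
```

With the sample log, `a.txt` and `c.txt` each end up as a single copy and `b.txt` as a single delete.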
Possible optimizations:
* Hash file path/names for a field in the database as fixed field sizes are faster for indexing/sorting.
* Perform deletes first because they are so fast and will free up space. Then perform creates, as having an out-of-date backup of a file is better than having no backup at all. Finally, perform copies.
* Store the symlink trees in a virtual file system file. Since you’re only storing symlinks, you don’t need a lot of space, you can store the whole file system in RAM, and you can select a file system that is specifically fast at sorting lots of small files (symlinks). VFS described here: http://freshmeat.net/articles/virtual-filesystem-building-a-linux-filesystem-from-an-ordinary-file
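The first two optimizations are easy to illustrate in Python. This is only a sketch under assumptions: SHA-1 is an arbitrary choice of hash (any fixed-width digest would do), and the task tuples, `path_key` and `order_tasks` are names invented for the example.

```python
import hashlib

# Lower number runs first: deletes, then creates, then copies.
PRIORITY = {"delete": 0, "create": 1, "copy": 2}


def path_key(path):
    """Fixed-width (40 hex characters) key for a path of any length."""
    return hashlib.sha1(path.encode("utf-8")).hexdigest()


def order_tasks(tasks):
    """Sort (operation, path) tasks by priority, keeping log order within each."""
    return sorted(tasks, key=lambda task: PRIORITY[task[0]])


if __name__ == "__main__":
    tasks = [("copy", "a.txt"), ("delete", "b.txt"), ("create", "c.txt")]
    print(order_tasks(tasks))
    print(len(path_key("/share/Photos/2009/some very long file name.jpg")))
```

Since Python's sort is stable, tasks of the same kind stay in the order they were logged.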
Obviously your method is much easier as you wouldn’t have to maintain your own filesystem interface. The only thing I think I would add to yours would be to maintain a second set of symlinks that point to the backups. This would remove the need to hunt for the backup file for updates and syncing on a drive removal or reboot.
Thanks for the feedback.
I started implementation yesterday, and should be done tomorrow, with some testing and bugfixes after that (all this only part time, since I do have a day job…)
My method is indeed more a hack than anything else, but I’m pretty sure it should work fine, and it doesn’t have too much overhead. Plus it has the positive of me not having to code in C!
About your suggestion: Instead of keeping a tree of symlinks to point to the backup files, I’ll have .tombstones files in each directory, containing information about files contained in that directory. One such piece of information will be an array of all the copies of the file, one of which will be flagged ‘is_linked’, meaning it’s the one currently pointed to by the symlink. I’ll use those .tombstones data files when maintenance is required, or when I need to find all copies of a file to update it.
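To give an idea of what such a file could hold, here is a small Python sketch. The JSON layout, the example paths and the helper names are all assumptions made for illustration; only the ‘is_linked’ flag and the array of copies come from the description above.

```python
import json


def make_tombstone(copies, linked):
    """Build the list of copies for one file, flagging the linked one."""
    if linked not in copies:
        raise ValueError("the linked copy must be one of the copies")
    return [{"path": path, "is_linked": path == linked} for path in copies]


def linked_copy(entries):
    """Path of the copy the symlink currently points to."""
    return next(entry["path"] for entry in entries if entry["is_linked"])


def other_copies(entries):
    """Paths of the copies that need updating after the linked one changes."""
    return [entry["path"] for entry in entries if not entry["is_linked"]]


# One entry per file in the directory (hypothetical paths).
directory_meta = {
    "photo.jpg": make_tombstone(
        ["/mnt/hdd0/Photos/photo.jpg", "/mnt/hdd1/Photos/photo.jpg"],
        linked="/mnt/hdd0/Photos/photo.jpg",
    )
}
serialized = json.dumps(directory_meta)

if __name__ == "__main__":
    print(serialized)
```

When a file changes, `other_copies` gives the list of stale copies to overwrite.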
The .tombstones are a good idea. The benefit of using a second set of symlinks is they could be browsed directly, although I’m not entirely sure how useful that would be. I’m very interested in how well it ends up working for you though.
The .tombstones weren’t such a good idea after all. Since I don’t really control the file system, the client was able to remove that file using rm!
So I started saving the metadata I needed in a separate mirror tree, which the client can’t see. I’m now using the original filenames as tombstones, instead of saving all the tombstones for the files in a directory in a single file; easier to work with that way too.
Since I want to keep more info than just the other copies of the file (I want to keep ‘online’ flags for example), I can’t just use symlinks in there.
I also needed to change how I select drives to receive the file copies; the drives with the most free space are used first, instead of random drives. Duh!
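Picking the drives with the most free space is basically a sort. A minimal Python sketch, assuming the free space per drive is already known (the mount points, the numbers and the function name `pick_drives` are made up; the optional `file_size` check is an extra I added for the example):

```python
def pick_drives(free_space, num_copies, file_size=0):
    """Return up to num_copies drives, those with the most free space first.

    free_space maps a drive's mount point to its free bytes; drives that
    cannot hold file_size bytes are left out.
    """
    candidates = [d for d, free in free_space.items() if free >= file_size]
    ranked = sorted(candidates, key=free_space.get, reverse=True)
    return ranked[:num_copies]


if __name__ == "__main__":
    free = {"/mnt/hdd0": 100, "/mnt/hdd1": 500, "/mnt/hdd2": 300}
    print(pick_drives(free, 2))
```

If fewer drives qualify than copies requested, the caller gets a shorter list and has to decide what to do about it.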
And I realized that I needed hard links, not soft links, for Samba to be happy. Trying to delete a soft link didn’t work, where deleting a regular file, or a hard link, worked fine. Not sure if this is expected behaviour, but hard links are fine, I guess, since I do keep the info about the link targets in my ‘graveyard’.
I’m almost done now. File/directory creation, modification & deletion all work, and I tried to catch most cases of the ‘create-modify-delete-recreate’ kind. This is where I’ll need to make most of my tests & bugfixes, I’m sure.
I just posted the first public version of Greyhole – Easily expandable & redundant storage pool using Samba.
Read about that here:
http://www.pommepause.com/blog/2009/12/greyhole-easily-expandable-redundant-storage-pool-using-samba/
You should look into unRaid (http://www.lime-technology.com). You can put up to 20 drives in it (pro version), mix and match IDE and SATA and whatever sizes you want. It also protects against a single drive failure. You do not even need a system disk, just a flash drive to run it. A lot better than DE if you ask me, but that’s just me.
Good luck with your project.
I did look at unRaid before I started this.
But I really wanted to be able to choose the redundancy level per share.
I have a lot of recorded TV that I don’t care that much about, so I don’t protect that.
I don’t protect CrashPlan and TimeMachine backups either, as those can easily be re-created if a hard drive fails and takes some files with it.
Having redundancy on those would cost me 2-3 TB more than what I currently have.
Plus, with my own system, I can now ‘super-duplicate’ certain shares, like the Photos share, which contains 60k+ photos we took of our family since 2000. I really don’t want to lose those, so Greyhole now makes sure those files are on all my hard drives.
I can unplug one of them, and mount it anywhere ext3 partitions can be mounted, and I’ll be able to see my files without requiring any rebuilding / processing.
One of the huge benefits of Drive Extender is I can take a single drive out of a 20 drive array and read all of the files off of it by just connecting it to another computer. It is also possible to lose multiple drives and not lose any files. Granted, it’s not terribly space efficient, but it is dead simple.