So, I’m now happily using Greyhole. Good for me, you say?
Not long ago, a 1 TB hard drive that was part of my storage pool died (my fault really, handling it while it was powered up). Greyhole handled this beautifully, re-creating duplicate copies of the files that were stored on that drive to continue protecting all my data. But I didn’t have enough free space on the other drives to allow all the duplicates I want to be created.
Perfect timing to test a very nice feature of Greyhole: inclusion of remote hard drives in the storage pool.
I have a 1 TB hard drive attached to my AirPort Extreme router, which I use as my Time Machine backup destination (the AirPort makes it available through AFP and Samba). It had about 600 GB free. Perfect candidate for this.
I simply mounted that drive on my file server and included it in my Greyhole storage pool. I then launched “greyhole --balance” to force Greyhole to balance the available space evenly across all drives. Files transferred at about 5 MB/s from my file server to the remote drive, so I had to wait a couple of hours for the 600 GB to get filled.
I now have about 10-12 GB free on all the drives included in my storage pool, and all my files are correctly protected once more.
Further thinking revealed an interesting use for a remote hard drive in a Greyhole storage pool. Because remote access is much slower than local access, it wouldn’t make much sense to keep a remote drive in my pool forever; I do care about performance. But for some files, performance is not an issue. For example, for my Photos share, I keep a copy of each file on all available drives in my storage pool (I do care about those files!). A remote drive could be used to store a copy of those files, and nothing else. The trick to achieve this is simply to set a very high minimum free space for that drive in the Greyhole configuration.
With such a configuration, the remote drive will only be used as a last resort when Greyhole chooses where a file copy should be kept. And the minimum free space is ignored for files that need to go on all drives.
What this means is that the remote drive will be used to store a copy of the files in my Photos share, and it will be used to store file copies on other shares only if all other hard drives are filled to capacity. Which is nice.
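A minimal Python sketch of the selection rule described above, assuming that drives still above their minimum free space are ranked by free space, and that drives at or below their minimum are only used once those run out. `Drive`, `pick_drives` and the mount paths are hypothetical illustrations, not Greyhole’s own code (Greyhole itself is written in PHP).

```python
from dataclasses import dataclass


@dataclass
class Drive:
    path: str           # hypothetical mount point of a pool drive
    free_gb: float      # current free space
    min_free_gb: float  # per-drive "minimum free space" setting


def pick_drives(drives, num_copies, all_drives=False):
    """Choose which drives receive copies of one file (simplified model)."""
    if all_drives:
        # Shares that keep a copy on every drive ignore the minimum free space.
        return list(drives)

    def most_free_first(group):
        return sorted(group, key=lambda d: d.free_gb, reverse=True)

    # Drives still above their minimum are preferred, most free space first.
    preferred = most_free_first([d for d in drives if d.free_gb > d.min_free_gb])
    # Drives at or below their minimum are only used as a last resort.
    last_resort = most_free_first([d for d in drives if d.free_gb <= d.min_free_gb])
    return (preferred + last_resort)[:num_copies]


pool = [
    Drive("/mnt/hdd1/gh", free_gb=300, min_free_gb=10),
    Drive("/mnt/hdd2/gh", free_gb=150, min_free_gb=10),
    # Remote drive: a huge minimum keeps it out of normal selection.
    Drive("/mnt/airport/gh", free_gb=600, min_free_gb=100_000),
]
print([d.path for d in pick_drives(pool, 2)])
print([d.path for d in pick_drives(pool, 2, all_drives=True)])
```

With two copies requested, the two local drives win; the remote drive only gets a copy when a share wants every drive, or when the local drives have dropped below their own minimum.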
My important files are now backed up remotely (well, the next room counts as remote for the file server!), plus if all my fast drives get filled, this slower option will be used until I can free up some space (most likely by adding another internal drive).
How cool is that? Very cool, I think. I don’t know of any other pooling / redundancy system that would allow you to do something like that with such ease!
I’m glad to be using Greyhole right now. And you?

Instead of using the smb logs, maybe you could use inotify? It will give you a nice way of getting all writes, closes, deletes, etc. Also, what do you think about storing VMware disk images on this system?
Thanks for the hard work. I’m setting up a system now to test.
We may be able to contribute some code if you want to look into inotify.
I’ve looked into inotify. The layer I used wasn’t sending me the file operations in the same order they were being executed. The order is very important for Greyhole, so that didn’t work. But I think this was caused by the specific layer implementation I used (it was a cron-like thing). I could try another inotify layer for PHP, but don’t have the bandwidth at this time. I’ll be happy to integrate any patch I receive for this.
Storing VMware disk images: big files that change often are not cool on Greyhole, with replication turned on (num_copies > 1).
Each time a big file changes, its additional copies are deleted and re-created on the partition with the most available space.
But if you use num_copies = 1 for the share that holds those files, which still lets you use Greyhole’s drive pooling, then that would be fine.
Good luck.
inotify will give you changes as they happen, in the order they happen, and will allow you to kick off a process for each operation. I’ll look to see if we can release some code we use for that. We also have inotify code that skips files which are updated many times in a row; it only kicks off an rsync for the last change.
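The “only sync after the last change” idea is essentially debouncing. Here is a minimal Python sketch, assuming the watcher has already collected `(timestamp, path)` pairs (reading them from inotify is left out); `due_for_sync` is a hypothetical name, not the commenter’s code. Note the caveat raised in the reply below: skipping operations gets risky once renames and deletes interleave with writes, and this sketch only handles repeated writes to the same path.

```python
def due_for_sync(events, now, quiet_seconds):
    """Return paths whose last write is at least quiet_seconds old.

    events: (timestamp, path) pairs from some file watcher, in arrival order.
    A burst of writes to one file collapses into a single entry, so only
    the final state of the file triggers a sync.
    """
    last_seen = {}
    for ts, path in events:
        last_seen[path] = ts  # later events overwrite earlier ones
    ready = [(ts, path) for path, ts in last_seen.items() if now - ts >= quiet_seconds]
    # Oldest quiet file first, to roughly preserve the original ordering.
    return [path for _ts, path in sorted(ready)]


events = [(0, "vm/disk.vmdk"), (1, "docs/a.txt"), (5, "vm/disk.vmdk"), (6, "vm/disk.vmdk")]
print(due_for_sync(events, now=10, quiet_seconds=5))
print(due_for_sync(events, now=11, quiet_seconds=5))
```

The VM image written three times produces one sync, and only once it has been quiet for the whole window.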
Instead of deleting and copying files, you should look into rsync: it will copy files locally and only update the parts that changed. It may yield a huge increase in speed.
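To illustrate why “only update the parts that changed” helps with big files such as VM images, here is a simplified Python sketch of block-level change detection: both versions are cut into fixed-size blocks, and only blocks whose hashes differ need to be written. This is not rsync’s actual algorithm, which uses a rolling checksum so it can also match blocks that shifted after an insertion. Also worth knowing: the rsync man page states that `--whole-file` is the default when source and destination are both local paths, so a local-to-local copy only uses the delta algorithm if `--no-whole-file` is passed.

```python
import hashlib


def block_digests(data: bytes, block_size: int):
    """MD5 digest of each fixed-size block of data."""
    return [hashlib.md5(data[i:i + block_size]).digest()
            for i in range(0, len(data), block_size)]


def changed_blocks(old: bytes, new: bytes, block_size: int = 4):
    """Indexes of the blocks of `new` that must be written to update `old`."""
    old_d = block_digests(old, block_size)
    new_d = block_digests(new, block_size)
    return [i for i, d in enumerate(new_d) if i >= len(old_d) or d != old_d[i]]


print(changed_blocks(b"aaaabbbbcccc", b"aaaaBBBBcccc"))
```

Only the middle block is reported, so one 4-byte block is rewritten instead of the whole file.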
I understand you shouldn’t duplicate files until they are closed (inotify code), but in the case of VMware it could work nicely if you could schedule a FULL sync of the files in a given directory every night, every hour, etc. You can contact me via email to discuss it further.
All of our code is PHP of course.
I seem to remember I hit a blocker when I tried inotify last… I think I realized something at the end that would prevent it from working, even if the events were received correctly… I can’t remember it at the moment… I’ll see if I can find a log of me detailing my findings somewhere!
Skipping file operations is risky. Especially when renames, deletes and writes can occur very fast. I already have some code that tries to minimize Greyhole’s workload, but I’m not sure it’s bulletproof.
I was using rsync during development, but I stopped using it once re-creating the copies of changed files became a way to balance the data across the available drives. At the moment, for a file that changes and that has multiple copies on Greyhole, I delete the supplemental copies and pick new partitions to receive the up-to-date copies. This way, the drive with the most available space is always used for new copies, for both existing and new files on your shares.
I’m not sure this is still necessary. I’ll think some more about that, and if I can’t find a reason to keep doing it, I’ll bring back rsync, probably as a config option (probably enabled by default).
For your VMware full sync suggestion: I think what I’ll implement is a config option to ‘freeze’ specific directories. File operations will always be logged for those, like for every other directory, but Greyhole won’t process those logs until you explicitly ask it to (via a cron job, for example). This would also allow you to stop a service while Greyhole works on specific files, and start it back up when Greyhole is done. It would allow a user to keep MySQL data files on Greyhole, for example.
Would that fit your needs for your VMware data files?
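A minimal Python sketch of the ‘freeze’ idea as described: every operation is still logged, but operations under a frozen directory are held back until an explicit request (such as a cron job) releases them. `OperationLog`, `process` and `thaw` are hypothetical names for illustration only, not Greyhole’s configuration keys or commands.

```python
from collections import deque


class OperationLog:
    """Every file operation is logged; frozen ones wait for an explicit thaw()."""

    def __init__(self, frozen_dirs):
        # Normalise to a trailing slash so "VMs" does not match "VMsOld/...".
        self.frozen_dirs = [d.rstrip("/") + "/" for d in frozen_dirs]
        self.pending = deque()  # processed by the normal daemon pass
        self.held = deque()     # frozen: waits for thaw()

    def log(self, path):
        if any(path.startswith(d) for d in self.frozen_dirs):
            self.held.append(path)
        else:
            self.pending.append(path)

    def process(self):
        """Normal daemon pass: handles only operations outside frozen directories."""
        done = list(self.pending)
        self.pending.clear()
        return done

    def thaw(self):
        """Explicit request (e.g. from a cron job): handle the held operations."""
        done = list(self.held)
        self.held.clear()
        return done


ops = OperationLog(["VMs"])
ops.log("VMs/win7.vmdk")
ops.log("Photos/a.jpg")
print(ops.process())
print(ops.thaw())
```

Held operations keep their relative order, which matters given how order-sensitive Greyhole’s processing is; they are only deferred relative to other directories.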
I re-implemented the use of rsync in rev109.
See the changes I made in greyhole.example.conf for the details.
This will be included in the upcoming version 0.6.5.
Ah, I just remembered why inotify wouldn’t work!
Samba, with “unix extensions = off”, transforms symlinks into real files. That would be missing in an inotify/local implementation.
If you were suggesting that I still use Samba for file access, but inotify instead of syslogd to log file operations, then I’d answer that I don’t see the point in changing something that works. And using a Samba VFS module to write to syslogd is the fastest and safest way to log file operations that happen on Samba shares.
Allowing a sync to run from a cronjob would be great.
If it’s not broke, don’t fix it.
Thanks for implementing rsync.
I don’t think I’ve ever followed up on this…
This has been implemented as a new frozen directories option in the config file.
Cheers for the idea.
Why not prioritize files that have been around the longest without change for remote storage? If you have 10 year old files, then logically those are the files least likely to change. And files that don’t change are the best candidate for remote storage as they won’t have to be recopied on change. This would also save the trouble of manually specifying directories you think would be best on remote storage.
Also, do you specify to prefer local storage over remote storage for reading?
And do you prevent both copies of a file from being in remote storage?
> Why not prioritize files that have been around the longest without change for remote storage? If you have 10 year old files, then logically those are the files least likely to change. And files that don’t change are the best candidate for remote storage as they won’t have to be recopied on change. This would also save the trouble of manually specifying directories you think would be best on remote storage.
Currently, there are no options to specify what to send to remote shares. The only thing you can do is specify a very high “minimum free space” for remote shares, so only critical files would be sent there.
Last modified date would be a good criterion to use during balancing, but it would require Greyhole to walk all the shared directories twice. I’ll see if this is worth it.
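A minimal Python sketch of the “oldest unchanged files go remote first” suggestion, assuming the walk of the shares has already produced `(path, mtime, size)` tuples (for example from `os.stat(path).st_mtime` and `.st_size`); collecting them is exactly the extra pass mentioned above. `remote_candidates` and the greedy fill are hypothetical, not an existing Greyhole feature.

```python
def remote_candidates(files, budget_bytes):
    """Pick least-recently-modified files first until the remote budget is used.

    files: iterable of (path, mtime, size_bytes) gathered by walking the shares.
    Files that don't fit in the remaining budget are skipped (greedy fill).
    """
    chosen, used = [], 0
    for path, _mtime, size in sorted(files, key=lambda f: f[1]):
        if used + size > budget_bytes:
            continue
        chosen.append(path)
        used += size
    return chosen


files = [("new.doc", 300, 10), ("old.jpg", 100, 50), ("mid.mp3", 200, 40)]
print(remote_candidates(files, budget_bytes=60))
```

The oldest file is placed first; the middle one no longer fits, so the remaining budget goes to the next file that does.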
> Also, do you specify to prefer local storage over remote storage for reading?
No. Would indeed be a good logic to add.
> And do you prevent both copies of a file from being in remote storage?
No. Might be a good idea to try to keep at least one copy locally.
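Both suggestions (read from a local copy when one exists, and keep at least one copy local) can be sketched in a few lines of Python. This assumes remote mounts can be recognised by a path prefix; `/mnt/remote/`, `is_remote`, `choose_read_copy` and `has_local_copy` are all hypothetical.

```python
def is_remote(path):
    # Assumption: remote mounts live under this hypothetical prefix.
    return path.startswith("/mnt/remote/")


def choose_read_copy(copies):
    """Prefer a local copy for reads; fall back to a remote one if that's all there is."""
    local = [c for c in copies if not is_remote(c)]
    return local[0] if local else copies[0]


def has_local_copy(copies):
    """True if at least one copy lives on a local drive."""
    return any(not is_remote(c) for c in copies)


copies = ["/mnt/remote/gh/Photos/a.jpg", "/mnt/hdd1/gh/Photos/a.jpg"]
print(choose_read_copy(copies))
print(has_local_copy(copies))
```

A placement routine could call `has_local_copy` before committing, and swap one remote target for a local one when it returns False.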
But until Samba developers fix a bug I reported, I won’t work on remote filesystems anymore.
https://bugzilla.samba.org/show_bug.cgi?id=7293
This bug prevents Samba from being able to read the target of a symlink when that target is on a remote share. This makes remote shares on Greyhole unusable: writing works fine, but you can’t read any such files through a Greyhole share. (You can read them if you’re on the server itself, so you won’t lose anything; those files just won’t be readable through shares.)
Just want to say this project is amazing. I have been playing with WHS, and while it is really neat and simple, I would rather be using Linux for my server needs, simply because I can do so much more with it. I have not had the chance to try Greyhole yet; I am working on that, currently just getting the hard drives situated. This was the biggest thing I loved about WHS, and I cannot wait to get it running on my server.
Glad it could be of use!
FYI, Amahi is working on integrating Greyhole into Amahi. That will mean much easier configuration of Greyhole.
Greyhole integration will appear in version 5.3, currently being tested.
If you’d like to try Amahi, I’m pretty sure that by the time you’ve got the rest set up, v5.3 will be released and you’ll be able to easily set up and test Greyhole using that.
Cheers.
Hi
Are there any Greyhole installation instructions/documentation?
I am a bit of a newbie
Peter
There is nothing more than the couple of blog posts here, and the README file that comes with Greyhole.
But it’s not written for inexperienced users.
I suggest you look into Amahi, which just released a version that includes Greyhole.
It puts an easy to use dashboard over Fedora 12, which allows you to one-click install many applications.
And their Greyhole integration offers an easy way for users to setup Greyhole.
Best of luck.
Hi
I have just started up Greyhole on my new Amahi server (an old Compaq desktop) with two 0.5 TB disks in an HDD toaster connected to the server over USB. I have had lots of problems (not so much with Greyhole), but now things are starting to behave. (The Amahi documentation makes lots of assumptions, which is unhelpful.)
Greyhole is not a backup system. OK. But it can get pretty close, I think. One improvement that would be nice is to keep half the storage pool off-site to protect against fire.
Greyhole would have to make sure that duplicates are created on the off-site disk.
Good idea?
Peter
I’m still considering the options on how to handle remote drives correctly.
At this time, you should be able to add a remote mount to your GH storage pool (but see the Samba issue posted above), and if you want that drive used first (until it’s full), you can define it as the sticky drive for any or all of your shares:
sticky_files = Music/
stick_into = /mnt/remote_mount/gh
sticky_files = Movies/
stick_into = /mnt/remote_mount/gh
[...]
This will ask GH to always use the ‘stick_into’ drive first, then any other drive for extra copies, if any.
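A minimal Python sketch of that ordering, assuming the `stick_into` drive is used first while it still has more than a minimum amount of free space, and the remaining drives are then ranked by free space for extra copies. `pick_with_sticky` and its parameters are hypothetical; only the `stick_into` path comes from the config above.

```python
def pick_with_sticky(drives, num_copies, stick_into=None, min_free=0):
    """drives: (path, free_bytes) pairs. Returns the paths that receive copies."""
    usable = [d for d in drives if d[1] > min_free]
    # The stick_into drive goes first, as long as it still has room...
    sticky = [d for d in usable if d[0] == stick_into]
    # ...then every other drive, most free space first, for extra copies.
    others = sorted((d for d in usable if d[0] != stick_into),
                    key=lambda d: d[1], reverse=True)
    return [path for path, _free in (sticky + others)[:num_copies]]


drives = [("/mnt/hdd1/gh", 500), ("/mnt/remote_mount/gh", 200), ("/mnt/hdd2/gh", 300)]
print(pick_with_sticky(drives, 2, stick_into="/mnt/remote_mount/gh"))
```

Once the sticky drive drops to the minimum, it simply falls out of `usable` and the other drives take over.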
But Amahi won’t allow you to customize the greyhole.conf file that way without overwriting it every time it rewrites the conf file itself. You might want to take that to them.
Also, as stated in a previous comment, there’s nothing preventing GH from using the remote copy of a file as its ‘live’ copy, which makes reads and writes to that file pretty slow.
With about 16 drives in my WHS as of now, I don’t think I would set any of my files to be stored on all drives, but if it could be implemented that any copies of files above 1 (or maybe 2) would preferably be copied to remote drives, this would be a more interesting option.
The real killer feature I’m looking for is JBOD with some feature that makes the redundant copies take less space than the complete files, something equivalent to the extra disk usage in a RAID 5/6 setup. Even something as simple as every redundant copy (maybe above 1 or 2) being auto-compressed with a good compression algorithm would be sufficient, I think.
As you might understand, I’m looking into alternatives to my WHS.
Those are just a few of my wishes for solutions like yours.
Hi,
At this time, I have no plans for implementing any kind of compression mechanism.
There are many reasons for that:
- disk space is cheap enough;
- it would complicate manual recovery from the data files, which would require manual or automated uncompressing;
- your files couldn’t be accessed directly in the data directory (not something I recommend, but it’s sometimes useful);
- it would slow down the Greyhole daemon processing new / changed files;
- etc.
There are probably existing file systems that you could use that would handle the compression by itself (ZFS has a compression option). Using such file systems on partitions you include in your storage pool should satisfy your requirement.
As for remote partitions, I’m still waiting for Samba to fix the bug that prevents us from using remote SMB mounts in our storage pools. When that’s fixed, I’ll probably implement a system that will allow you to ‘tag’ the different partitions you include in your storage pool, and then, allow you to configure the order in which tags should be used (per share, with a default).
So you could create ‘remote’ and ‘local’ tags, then specify that for share XYZ, for which you want 3 file copies, you want to use: local, local, remote. GH would then pick two partitions tagged ‘local’ to store the first two copies, and one tagged remote for the last copy.
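A minimal Python sketch of that proposed tag ordering, assuming each slot in the per-share order picks the unused partition with that tag and the most free space, and that running out of partitions with a given tag is an error. `place_by_tags` and the paths are hypothetical; the feature was only being planned at this point.

```python
def place_by_tags(partitions, tag_order):
    """partitions: (path, tag, free_bytes) tuples; tag_order: e.g. ["local", "local", "remote"].

    For each slot, pick the unused partition with that tag and the most free space.
    """
    used = set()
    chosen = []
    for tag in tag_order:
        candidates = [p for p in partitions if p[1] == tag and p[0] not in used]
        if not candidates:
            raise ValueError(f"no unused partition tagged {tag!r}")
        path = max(candidates, key=lambda p: p[2])[0]
        used.add(path)
        chosen.append(path)
    return chosen


parts = [
    ("/mnt/a", "local", 100),
    ("/mnt/b", "local", 300),
    ("/mnt/c", "local", 200),
    ("/mnt/nas", "remote", 900),
]
print(place_by_tags(parts, ["local", "local", "remote"]))
```

For share XYZ, this yields the two local partitions with the most free space, plus the remote one.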
Regarding the remote mounts, does this work if NFS is used to mount the remote filesystem rather than Samba?
I haven’t tried. Let me know if you try.
On a positive note, a developer from Samba contacted me, and the bug report I filed with them regarding this should be looked at next week.
Why does the install have to be sooo difficult?
I wish I could just click to install it, like with Windows.
Maybe at some point it will be like this. But that will require quite a lot of funding.
I wish I could get greyhole to work. All I get is an error about trying to create metadata. Had a look at the source code, but I can’t see why it is having a problem, although the documentation is very lacking in regard to what create permissions are appropriate for the shares and mounted drives in the pool.
See the Support or Live Chat links on http://www.greyhole.net/