Apr 20, 2009

BitTorrent-based Backup

This post is mostly a reminder to myself to investigate this idea when I have time. I’d really like to have a go at this project if I ever get the chance.

What I’d love to have is a distributed file-system that’s based on some P2P protocol (like BitTorrent) where files are broken up into blocks, and those blocks are sent to different computers (storage nodes). One computer (the tracker) keeps an index of where all the blocks are. AFAIK, BitTorrent is not a great protocol to use for read/write filesystems, but my interest is in a write-once FS.
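To make the block idea concrete, here is a rough Python sketch of splitting a file into blocks and keeping a tracker-style index. Everything in it is an assumption for illustration: the 256 KiB block size, the `split_into_blocks` helper, and the `TrackerIndex` class are made-up names, and identifying blocks by SHA-1 hash is borrowed from the way BitTorrent hashes its pieces rather than anything decided in this post.

```python
import hashlib

BLOCK_SIZE = 256 * 1024  # assumed block size, chosen arbitrarily for the sketch


def split_into_blocks(data: bytes, block_size: int = BLOCK_SIZE):
    """Yield (sha1_hex, block_bytes) for each fixed-size slice of data."""
    for offset in range(0, len(data), block_size):
        block = data[offset:offset + block_size]
        yield hashlib.sha1(block).hexdigest(), block


class TrackerIndex:
    """Hypothetical in-memory index of which node holds which block."""

    def __init__(self):
        self.files = {}      # file name -> ordered list of block hashes
        self.locations = {}  # block hash -> set of node names

    def record(self, name, block_hashes, placement):
        """Remember a file's block order and where each block was sent."""
        self.files[name] = list(block_hashes)
        for block_hash, node in placement.items():
            self.locations.setdefault(block_hash, set()).add(node)

    def lookup(self, name):
        """Return [(block hash, [node names])] in file order."""
        return [(h, sorted(self.locations[h])) for h in self.files[name]]
```

A client would call `split_into_blocks` on the file contents, send each block somewhere, and then call `record` so that `lookup` can later say where to fetch each block from.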

Ideally there would be some FUSE-based front-end so that a user could mount a folder to provide read/write access to the FS. When a computer (a client) wants to put a file into this distributed file-system, they drag-and-drop it into the folder and the application breaks it up into blocks, asks the tracker where to send them, and shoots them across to different storage nodes. To get a file, the user can open it or copy it to the local disk, and the application will read the blocks from whatever storage nodes have them.
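The put and get flow described above might look something like this in Python. This is only a sketch under stated assumptions: the nodes live in one process instead of on a network, the `StorageNode`, `Tracker`, `put_file`, and `get_file` names are invented here, the round-robin placement is just the simplest policy that came to mind, and there is no FUSE layer at all.

```python
import hashlib
from itertools import cycle


class StorageNode:
    """Hypothetical storage node: only receives, stores, and sends blocks."""

    def __init__(self, name):
        self.name = name
        self.blocks = {}

    def store(self, block_hash, block):
        self.blocks[block_hash] = block

    def fetch(self, block_hash):
        return self.blocks[block_hash]


class Tracker:
    """Hypothetical tracker: picks a node per block and remembers the layout."""

    def __init__(self, nodes):
        self.nodes = {node.name: node for node in nodes}
        self._next_node = cycle(nodes)  # round-robin placement (an assumption)
        self.manifests = {}  # file name -> ordered [(block hash, node name)]

    def place(self, name, block_hashes):
        if name in self.manifests:
            raise FileExistsError(name)  # write-once: never overwrite a file
        manifest = [(h, next(self._next_node).name) for h in block_hashes]
        self.manifests[name] = manifest
        return manifest


def put_file(tracker, name, data, block_size=4):
    """Split data, ask the tracker where each block goes, and send it there."""
    blocks = [data[i:i + block_size] for i in range(0, len(data), block_size)]
    hashes = [hashlib.sha1(b).hexdigest() for b in blocks]
    for (block_hash, node_name), block in zip(tracker.place(name, hashes), blocks):
        tracker.nodes[node_name].store(block_hash, block)


def get_file(tracker, name):
    """Fetch each block from its node, verify its hash, and reassemble."""
    parts = []
    for block_hash, node_name in tracker.manifests[name]:
        block = tracker.nodes[node_name].fetch(block_hash)
        if hashlib.sha1(block).hexdigest() != block_hash:
            raise IOError("corrupt block " + block_hash)
        parts.append(block)
    return b"".join(parts)
```

The tiny default `block_size` is only there so that a short test string ends up spread across several nodes; a real system would use something far larger.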

The tracker is the obvious single point of failure here. I guess each node could function as a tracker and they could synchronise data among themselves. The tracker(s) would have all the responsibility, as the storage nodes would just receive, store, and send blocks, and the clients would just send and read blocks. If there is any level of redundancy, the tracker is responsible for instructing nodes to copy blocks to each other until the required level of redundancy is satisfied. I don’t think I could afford to have redundancy in my home network, and if that’s the case, then I’d like to have a “decommission” operation available. This would remove a storage node from the system: the node would transfer all its blocks to other machines and then cleanly remove itself once it is no longer needed.
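The “decommission” operation could be sketched roughly as follows. The data layout (plain dicts for the cluster and the tracker's index), the `decommission` name, and the "send each block to the emptiest remaining node" policy are all assumptions made up for this example. If another copy of a block already exists elsewhere, the sketch simply drops the leaving node's copy and leaves any re-replication to the tracker.

```python
def decommission(cluster, index, leaving):
    """Move every block off `leaving`, then drop it from the cluster.

    cluster: node name -> {block hash: block bytes}   (hypothetical layout)
    index:   block hash -> set of node names holding that block
    """
    remaining = [name for name in cluster if name != leaving]
    if not remaining:
        raise RuntimeError("cannot decommission the last storage node")

    for block_hash, block in list(cluster[leaving].items()):
        holders = index[block_hash]
        holders.discard(leaving)
        if not holders:
            # No other copy exists, so hand the block to the emptiest node.
            target = min(remaining, key=lambda name: len(cluster[name]))
            cluster[target][block_hash] = block
            holders.add(target)

    # Only forget the node once all of its blocks are safe elsewhere.
    del cluster[leaving]
```

With no redundancy at all, which is the case I expect at home, every block on the leaving node gets moved before the node disappears from the index.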

Based on my googling this afternoon, there seem to be quite a few distributed filesystems available, including some older ones like GFS, some inept ones like Microsoft’s DFS, some “community-based” online ones like wua.la, and some awesome-but-not-ready ones like Ceph.

What I really want is one that is:

  • Multi-platform: I want to use every machine in the house, so it needs to run on Windows XP/Vista, OS X Tiger/Leopard, and Ubuntu Linux.
  • Local: I don’t need/want to communicate outside my LAN. It’s slow and, to a lesser extent, means I need to think about security. I also lose some level of control.
  • Free: I’m really really cheap.
  • Lightweight: I don’t want to have to install Active Directory or Linux or anything. Coda is almost entirely in Python and doesn’t need to be installed—nice!
  • Compatible with commodity hardware: I just want to use the hardware available. That’s an ethernet network, consumer-level PCs/Macs with consumer-level operating systems, and a couple of hard drives in each.

I’d really love to get hacking on this…
