random things and thoughts … and bad ideas
Admittedly I’m a file sharer. I like keeping rare, hard-to-find files available to the public. In my opinion file sharing is the only way to prevent some media from vanishing into oblivion (think Star Wars pre-1997 LDs). My tool of choice for sharing files is amule connected to the Kad network. I like Kad because it’s decentralized, easily accessible and allows concurrent sharing of a multitude of files, and back in the day e-/amule were among the first p2p clients to properly support Unicode. Other networks also seem to have a lot more trouble with manipulated search queries.
Sharing a very large number of rare files with Kad will make you run into a little problem though. Emule, and apparently amule too, has a hard-coded limit on the number of files published per day to prevent flooding peers. All files remain available for transfer requests, so source exchange between peers still works. But if you’re the only source for a file on the whole network, source exchange won’t be any help because usually there is no other peer to exchange sources with. That means ultra rare files should stay published all the time for that one occasion every few months when someone actually searches for them. Otherwise it is a roll of the dice whether the file is available at all. Obviously this problem gets worse the more files you share, because the relative time frame in which each file is published gets shorter and shorter. I came across this issue when I noticed my amule not having any uploads at all for hours despite sharing several thousand files. This is easily explained: for days amule was only publishing rare files that never get requested.
What can be done to circumvent this issue? I could modify the publishing limits in the Kad code of amule, but first, I’m no coder, and second, publishing an abnormally large number of files might get you onto the filter lists of other peers. Reducing the number of shared files is no option either, as I don’t want to decide which files deserve to be available and which don’t. The laziest solution is to just run several instances of amule, each of them sharing around the maximum number of files published per day. Amule can be run as a headless daemon, which reduces CPU and memory usage quite a lot and actually makes this farm of mules feasible. So let’s get into the practical part: how I set this thing up.
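As a rough sketch of the sizing, the number of instances is just the file count divided by the per-instance share, rounded up. The numbers below are placeholders for illustration only; the real publishing limit is not stated here, and `instances_needed` is a made-up helper name.

```shell
#!/bin/bash
# Ceiling division: how many instances are needed so that each one
# shares at most `per_instance` files.
instances_needed() {
    local total_files=$1 per_instance=$2
    echo $(( (total_files + per_instance - 1) / per_instance ))
}

# Hypothetical numbers: 7000 shared files, 1000 files per instance.
instances_needed 7000 1000   # prints 7
```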
The first step was creating a client configuration that would serve as the blank from which to create separate configurations for all the instances. Using the regular amule client I connected to Kad, set up the defaults like security and appearance settings, column positions in the GUI etc., and most importantly activated external connections and set the password for connecting so I can use the amule remote GUI to control the instances later. To avoid confusion I decided to name the individual instances with Greek letters, from alpha currently up to eta. Each instance got its own subdirectory for its configuration, ~/.aMule/alpha to ~/.aMule/eta. I also created per-instance subdirectories inside TempDir and IncomingDir because I don’t like having them inside the configuration directory. The minimum changes to an instance configuration so things don’t fall apart are basically just the network ports. Connecting amule to the Kad network only needs one TCP and one UDP port, which can be the same number, and one additional port for the remote GUI connection. Obviously the ports have to be unique for each instance. So I copied amule.conf, remote.conf and nodes.dat from the blank to each instance’s subdirectory and adjusted the respective options inside both amule.conf and remote.conf: Port and UDPPort in the [eMule] section and again Port in the [EC] section. I also corrected the TempDir and IncomingDir options to match each instance’s directory structure. Changing the nickname of the instances isn’t necessary as clients can share names, but I decided to append the instance name to my client nickname. After that I had seven independent client configurations with the same defaults.
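The copy-and-adjust step can be scripted. The following is only a sketch under a few assumptions: `set_ini_key` and `make_instance` are made-up helper names, the blank directory and port numbers in the usage comment are hypothetical, and the [ExternalConnect]/ECPort pair in amule.conf is my assumption about where the daemon side of the remote port lives, so check your own file. Keys that don’t exist in a file are left alone rather than added.

```shell
#!/bin/bash
# Set KEY=VALUE inside [SECTION] of an INI-style file. Other sections and
# keys are left alone; a key that does not exist is NOT added.
set_ini_key() {
    local file=$1 section=$2 key=$3 value=$4 tmp rc
    tmp=$(mktemp) || return 1
    awk -v section="[$section]" -v key="$key" -v value="$value" '
        /^\[.*\]$/ { in_section = ($0 == section) }   # track the current section
        in_section && index($0, key "=") == 1 { $0 = key "=" value }
        { print }
    ' "$file" > "$tmp" && cat "$tmp" > "$file"         # cat keeps the file mode
    rc=$?
    rm -f "$tmp"
    return $rc
}

# Copy the blank configuration into one instance directory and give it
# its own ports (same number for TCP and UDP, one more for the remote GUI).
make_instance() {
    local blank=$1 dir=$2 port=$3 ec_port=$4
    mkdir -p "$dir" || return 1
    cp "$blank/amule.conf" "$blank/remote.conf" "$blank/nodes.dat" "$dir/" || return 1
    set_ini_key "$dir/amule.conf"  eMule           Port    "$port"
    set_ini_key "$dir/amule.conf"  eMule           UDPPort "$port"
    set_ini_key "$dir/amule.conf"  ExternalConnect ECPort  "$ec_port"  # assumed key name
    set_ini_key "$dir/remote.conf" EC              Port    "$ec_port"
}

# Usage (hypothetical blank directory and port numbers):
#   i=0
#   for name in alpha beta gamma delta epsilon zeta eta; do
#       make_instance ~/.aMule/blank ~/.aMule/$name $((4662 + i)) $((4712 + i))
#       i=$((i + 1))
#   done
```

TempDir, IncomingDir and the nickname could be handled with further `set_ini_key` calls in the same way.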
Starting up the actual clients with their corresponding configuration was next. This wasn’t hard because you can define the path to the configuration directory with the -c option for both the daemon and the remote GUI. Launching the whole farm just means calling amuled -f seven times, each time with -c and the configuration directory of the instance to start. This can be done with about two lines of shell script. To stop the farm I just do a pkill -15 amuled. I know this is not the elegant way to do it, but this is no system service I need the system to keep track of. Because the target host and correct port to connect to are saved inside remote.conf in each instance directory, calling amulegui and specifying the instance’s configuration directory with the -c option will connect the GUI to the correct instance. I also added the -s option to the application commands so it won’t ask for the password every time. After I set up the shared directories inside the instances’ remote GUIs I had seven running clients, all with different shared files getting published to Kad independently. As you can see the memory and CPU usage of the whole farm is still manageable.
If you kept reading so far you will probably have one question now: What about the traffic? Independent clients don’t share their upload limits. So either I waste upload capacity by configuring each instance’s limit so that the sum of all limits matches my connection, or I leave them unlimited and choke my upload. I want neither, and the solution is to ignore the upload limits inside the clients completely and set up traffic shaping outside of amule that limits the accumulated traffic of all clients. For that I have to filter all traffic originating from the farm, which means I need a way to tell for each outgoing packet which program or group of programs sent it. I bet this would be a prime example for using cgroups, but there’s a much simpler and lazier way to do it. Yay! On to the actually interesting part of this whole post.
Filtering network packets on Linux is done with netfilter, which is configured with iptables (or nftables if you’re into bleeding edge). That should be a no-brainer. Looking through the available matches I found the owner match, which fits the job perfectly. Sadly most of its options don’t work on SMP systems, so there is no matching for process name or id on almost any modern PC. One option still works on SMP systems though, and as it turned out it is even the most useful one in my opinion: the option to match by group id. With that I could match packets coming from programs running in a specific group, for example the group p2p. To be able to match traffic of the amule instances I only had to make them run with the group id of the p2p group. It’s probably well-known that running programs with a different user id can be done with sudo, but sudo can also change the group id of a program using the option -g. An important thing to check though is the permissions for running sudo. My distribution of choice had this line in /etc/sudoers: %wheel ALL=(ALL) ALL. This setting allows users in the group wheel to run ALL programs while switching to ALL user ids, but it omits the setting for groups, and trying to switch group id with sudo will throw some not very straightforward error messages. Changing the line to %wheel ALL=(ALL:ALL) ALL fixed that issue. It only took a small change to the startup script to start the whole farm with the proper group id. The final script looked something like this.
#!/bin/bash
for instance in alpha beta gamma delta epsilon zeta eta
do
    sudo -g p2p amuled -f -c ~/.aMule/$instance/ || echo "Starting $instance failed"
done
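Before relying on the group it can be worth checking which numeric id it has, since that is what a group id match works with. A small sketch: `gid_of` is a made-up helper that reads group(5)-style lines from stdin, and the sample data below is invented for the demo. On a real system you would feed it `getent group` instead.

```shell
#!/bin/bash
# Print the numeric gid of group $1, reading group(5)-format lines
# (name:password:gid:members) from stdin. Fails if the group is missing.
gid_of() {
    awk -F: -v name="$1" '
        $1 == name { print $3; found = 1; exit }
        END { exit !found }
    '
}

# Sample data for illustration; real use: getent group | gid_of p2p
printf 'wheel:x:10:me\np2p:x:994:\n' | gid_of p2p   # prints 994
```

Whether sudo really switches the group can be checked with `sudo -g p2p id -gn`, which should print the group name once the sudoers line allows it.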
With all the amule clients running in the group p2p, a single owner match is enough to filter all of their traffic. If I were going to do the shaping on the same computer that runs the farm, I could just match the packets with iptables, set a filter mark on them and then shape traffic for the marked packets. But I want to do all shaping and queuing at the actual bottleneck of my Internet connection: the router. Neither process information nor the filter marks are transported inside an IP packet, so both are lost by the time the packet reaches the router. There’s the TOS field though, which is intended to carry information about packets across hosts so that routers can treat them according to the service’s specific needs. Changes to the packet header with netfilter are done in the mangle table, so I created a rule there to match the group p2p and change the TOS of matched packets to the nonstandard value 255 or 0xff. On the router I added a rule in the PREROUTING chain of the mangle table to match those packets with TOS 0xff and set a netfilter mark on them for the traffic shaping. To play nice I added another rule to change the TOS field back to the standard value 8 or 0x08, which was defined as “maximize throughput”. For reference these are the specific rules I created.
Host:
iptables -t mangle -A POSTROUTING -m owner --gid-owner 994 -j TOS --set-tos 0xff/0xff
Router:
iptables -t mangle -A PREROUTING -m tos --tos 0xff/0xff -j MARK --set-xmark 0x29/0xffffffff
iptables -t mangle -A PREROUTING -m tos --tos 0xff/0xff -j TOS --set-tos 0x08/0xff
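The value/mask notation in these rules follows the semantics documented for the TOS and MARK targets: the bits selected by the mask are zeroed, then the value is XORed in. With a full mask that simply overwrites the field. A tiny sketch to play with the arithmetic; `apply_masked` is a made-up helper and touches no packets.

```shell
#!/bin/bash
# Emulate "--set-tos value/mask" and "--set-xmark value/mask":
# clear the bits selected by mask, then XOR value into the result.
apply_masked() {
    local old=$1 value=$2 mask=$3
    printf '0x%02x\n' $(( (old & ~mask) ^ value ))
}

apply_masked 0x10 0xff 0xff        # host: any TOS becomes 0xff -> prints 0xff
apply_masked 0 0x29 0xffffffff     # router: mark becomes 0x29  -> prints 0x29
apply_masked 0xff 0x08 0xff        # router: TOS reset to 0x08  -> prints 0x08
```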
While writing this post I noticed the information I used as a reference on the Linux Advanced Routing & Traffic Control HOWTO appears to be outdated. I will probably change the TOS field to 0 instead of 8 in the future.
With that done, every packet sent by an instance of the farm will have the same filter mark on the router and can be shaped however I like. I won’t go into setting up the traffic shaping itself because this would probably fill another blog post and there are other references already available on the net. The advantage of filtering by group is that applying the same limit to other programs just needs prepending “sudo -g p2p” to the command. I am now running all my different peer-to-peer programs in the p2p group, with one global upload limit for all of them. Also, changing just the group id of a program means it still runs with your own user id. That way you don’t have to fiddle with local file permissions because your user’s permissions still apply.
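Just to show where the mark ends up being used, here is a minimal sketch of an HTB setup that caps marked packets. It is not my actual shaping configuration: the interface name, the rates and the class ids are hypothetical, and `shape_marked` is a made-up helper that only prints the tc commands instead of running them, so they can be reviewed first.

```shell
#!/bin/bash
# Print (not run) tc commands that put packets carrying a netfilter mark
# into their own rate-limited HTB class on the given interface.
shape_marked() {
    local dev=$1 mark=$2 rate=$3
    cat <<EOF
tc qdisc add dev $dev root handle 1: htb default 10
tc class add dev $dev parent 1: classid 1:10 htb rate 1000mbit
tc class add dev $dev parent 1: classid 1:20 htb rate $rate ceil $rate
tc filter add dev $dev parent 1: protocol ip prio 1 handle $((mark)) fw flowid 1:20
EOF
}

# Hypothetical uplink interface and limit; 0x29 is the mark set on the router.
shape_marked eth0 0x29 500kbit
```

The fw filter selects packets by their mark (0x29 is 41 in decimal) and everything unmarked falls into the default class 1:10.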
Obviously there is still the question of whether running multiple clients on the same host is harmful to the Kad network. As I understand it each client should have its own ID in the DHT and with that also a somewhat different neighborhood of peers. So it shouldn’t be harmful at all; the Kad network just gets more peers. The mechanisms protecting against sybil attacks implemented in recent e-/amule are likely to backfire on me though. Because recent versions seem to limit the number of connected Kad peers to one per IP address, theoretically my instances should be able to reach fewer Kad peers out there the more instances I run. But I haven’t seen a significant drop in connected Kad peers so far, maybe because there are still lots of ancient versions of emule running. Also I think there won’t be any noticeable drop in connections until I run several times as many instances as I am running now.
On a general level I sadly feel like the Kad network is past its prime. Cracking down on file sharers has driven many normal users to one-click hosters and streaming portals. On the other hand Kad is pretty dated and I doubt it will ever see an upgrade to support IPv6. There are already some ISPs that only offer IPv6 connections with CGN IPv4 to minimize the use of their limited number of available v4 addresses, and the lack of IPv6 support will become more and more of a problem in the coming years. So I am currently looking for a peer-to-peer network/client that is aimed at sharing lots of files, is as reliable as Kad and is up to current technical standards.
Anyone who can add something to one of these issues feel free to comment.