Benjamin Geer

Writing a Bittorrent client in Rust

Benjamin Geer
Table of Contents

Overview #

As my second exercise for learning Rust (after OrganicLox), I decided to implement a client for Bittorrent, the communication protocol for peer-to-peer file sharing. The Bittorrent clients I’ve used have complex GUIs that seem to be designed for users who download and upload torrents all day, rather than for the occasional user like me, who just wants to get one file. And like many GUI applications, Bittorrent clients tend to accumulate files that take up disk space in obscure directories. I do many things in the terminal, and when I need to download a file, my reflex is to use curl. So I thought, why not make something like curl for Bittorrent? Thinking about curl’s progress meter, it occurred to me that I could do something similar with the TUI library Ratatui, which I’d been wanting to try.

A Bittorrent client seemed like a good opportunity to learn about asynchronous programming in Rust. I had read a lot of criticism of Rust’s async, but I found it pleasant to work with (using the Tokio runtime), and certainly more comfortable than the comparable frameworks that I’d used in Scala (Akka and Cats Effect). I particularly like that an async function can be run in an existing task or in its own task, unlike Scala’s Future, which is assigned a thread and starts running as soon as it’s constructed. Rust’s approach seems more conducive to refactoring and maintainability. At the same time, I’m glad not to have to deal with the complexity of using monads as in Cats Effect.

The result is Sayaca, named after a South American bird that plays an important role in seed dispersal (in Bittorrent terminology, ‘seeding’ means uploading).

Design #

My first task was to implement a parser and formatter for Bencode, the data format used in the Bittorrent protocol. Conceptually it’s similar to JSON, but it’s designed for binary data rather than text. I wanted to try using a parser combinator library, and had no trouble implementing the parser using nom. Since I wanted to be able to serialise and deserialise Rust structs as Bencode, I added Serde support. I’ve published the resulting crate as bside.

The main difficulty in designing the client was the messy state of the Bittorrent specifications. There are ambiguities (the word ‘piece’ means two different things) and implementations have diverged from the specs (e.g. peer IDs are not used to identify peers). A great deal of knowledge about the protocol is not in the official specs, but in other documents scattered around the Web. There is a long list (not updated since 2018) of draft protocol extensions that are supported by many clients. A so-called encryption extension (which actually seems to describe a type of obfuscation) seems to be widely implemented, but as far as I can tell, since 2023, its specification has been preserved only in the Internet Archive’s Wayback Machine. There is a draft BitTorrent Protocol Specification v2 (not updated since 2017), which attempts to correct some of the original protocol’s flaws, such as its reliance on the SHA-1 hash function (which has not been considered secure since 2005). I could have wandered for a long time in a labyrinth of documents and in the source code of existing clients. To keep things simple, I decided to implement only the original BitTorrent Protocol Specification and the IPv6 tracker extension.

When saving data to disk, a Bittorrent client should perform well with large files, be able to interrupt a download and resume it later (which means that it has to note which pieces have been downloaded), and handle torrents that include multiple files. To meet these requirements, and to keep things transparent for the user, I chose to save downloaded data in an intermediate file (with the extension .sayaca), in the same directory where the completed file(s) will be saved. The intermediate file contains a bit field indicating which pieces have been downloaded (the same bit field that we have to send to peers anyway). Sayaca uses memory mapping to read and write this file; this should provide good I/O performance with large files. When the download is complete, it copies the data from the intermediate file into the destination file(s), and the intermediate file is deleted on exit if it is no longer needed.

Since a Bittorrent client is I/O-bound, I decided to use an event-driven architecture in which one task does most of the work, as in Redis. This main task delegates all the network I/O to other tasks, which it communicates with via channels. This was simple to implement, and had the advantage of making it easy to write unit tests for the main task.

Much of the work that a Bittorrent client does amounts to a sort of bookkeeping. In Sayaca, this job is the responsibility of the Bookkeeper. For example, in the recommended algorithm for selecting file pieces to request, the client tries to request the rarest pieces first. This requires a data structure that keeps track of the pieces that each peer has, as well has how rare each piece is, among other things. This can be done in different ways; I chose to use an ordered map, ordered by rarity, to store the IDs of pieces and the sets of peers that have them. The Bookkeeper can then easily find a piece, randomly chosen from among the rarest pieces, along with a randomly chosen peer that has that piece and is accepting requests from us. Thus, much of the client’s work involves updating the Bookkeeper when new information arrives and querying it to find out what to do next.

I like TUIs, and it was easy to make a simple one with Ratatui. But since the client is a library that communicates with its UI via channels, it would be straightforward to wrap the client in a different UI, e.g. a more accessible CLI.

Alternatives to this approach #

I chose to use Tokio because I wanted to learn about it and about async, but I think the program probably wouldn’t have been very different if I had used threads instead, with the same communication via channels.

Categories:
Topics: