The traceroute@home project studies the implications of highly distributed internet topology discovery infrastructures. If large numbers of route tracing monitors are deployed throughout the network, what will they discover that smaller systems do not? And how can these monitors coordinate their efforts in order to be highly effective while at the same time avoiding undue load on network resources?
Part of the traceroute@home project is to build a highly distributed measurement tool for tracing routes in the internet.
Today's most extensive tracing system, skitter, uses 24 monitors, each targeting on the order of one million destinations. Other well known systems, such as RIPE NCC TTM and NLANR AMP, conduct a full mesh of traceroutes between on the order of one to two hundred monitors. An attempt to scale either of these approaches to thousands of monitors raises several major open questions. First, the measurement system demands attention to efficiency, in order to avoid duplication of effort. It must avoid repeatedly sampling router interfaces, traversing the same links, and pinging end systems. However, we recognize that some repeated sampling is valuable for detecting topology changes. Second, in order to avoid abuse, the system must be engineered carefully. Indeed, traceroutes converging on selected targets can easily appear to be a Distributed Denial of Service (DDoS) attack.
The Doubletree algorithm is a first attempt to perform large-scale topology discovery efficiently and in a network-friendly manner. Doubletree avoids retracing the same routes in the internet by taking advantage of the tree-like structure of routes fanning out from a source or converging on a destination. The key to Doubletree is that monitors share information regarding the paths they have explored: if one monitor has already probed a given path to a destination, another monitor should avoid that path. Probing in this manner can significantly reduce load on routers and destinations while maintaining high node and link coverage. By avoiding redundancy, Doubletree not only reduces the load on the network but also allows one to probe the network more frequently. This makes it possible to better capture network dynamics (routing changes, load balancing) compared to standard approaches based on traceroute.
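As a rough illustration of these stopping rules, here is a hypothetical Java sketch, not the actual traceroute@home code. It assumes a `probe(destination, ttl)` primitive that returns the router interface seen at a given hop; forward probing toward the destination stops on a hit in a global stop set of (interface, destination) pairs shared among monitors, while backward probing toward the source stops on a hit in the monitor's local set of interfaces already seen:

```java
import java.util.AbstractMap.SimpleEntry;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Illustrative sketch of Doubletree's two stopping rules; all names are ours.
public class DoubletreeSketch {
    // Local stop set: interfaces this monitor has already seen (backward probing).
    private final Set<String> localStopSet = new HashSet<>();
    // Global stop set: (interface, destination) pairs shared among monitors
    // (forward probing).
    private final Set<Map.Entry<String, String>> globalStopSet = new HashSet<>();

    // Forward probing: from an initial TTL toward the destination, stopping when
    // an (interface, destination) pair has already been recorded.
    public void probeForward(String destination, int startTtl, int maxTtl) {
        for (int ttl = startTtl; ttl <= maxTtl; ttl++) {
            String iface = probe(destination, ttl);
            if (iface == null) continue; // no reply at this hop
            Map.Entry<String, String> pair = new SimpleEntry<>(iface, destination);
            if (!globalStopSet.add(pair)) {
                return; // some monitor already covered this pair: stop
            }
            if (iface.equals(destination)) return; // reached the destination
        }
    }

    // Backward probing: from the initial TTL down toward the source, stopping
    // at an interface this monitor has already seen.
    public void probeBackward(String destination, int startTtl) {
        for (int ttl = startTtl; ttl >= 1; ttl--) {
            String iface = probe(destination, ttl);
            if (iface != null && !localStopSet.add(iface)) {
                return; // the tree below this interface is already explored
            }
        }
    }

    public int globalStopSetSize() { return globalStopSet.size(); }
    public int localStopSetSize() { return localStopSet.size(); }

    // Placeholder: a real implementation would send a TTL-limited packet and
    // read the source address of the ICMP reply.
    protected String probe(String destination, int ttl) {
        return null;
    }
}
```

In this sketch, a second monitor (or a second probing round) sharing the same global stop set stops as soon as it rediscovers a known (interface, destination) pair, which is where the load reduction comes from.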
Doubletree has been evaluated through simulations. Compared to classic probing, Doubletree reduces measurement load by approximately 70% while maintaining interface and link coverage above 90%.
We have also implemented the Doubletree algorithm in a Java prototype called traceroute@home, which is freely available in the download section. traceroute@home has been deployed and evaluated on the PlanetLab testbed. We showed that the algorithm performs well, confirming the simulation results.
One of the main aspects we would like to address in the near future is the stability of the whole system. Currently, it behaves like a line of dominoes: when one traceroute@home monitor fails, the whole system fails.
From a long-term point of view, we aim to develop a peer-to-peer (p2p) or overlay application to manage the whole system. However, it is not yet obvious how this p2p/overlay should work, so we need a transitional solution. A centralized solution seems to us the right choice. This centralized solution should
- allow one to manage the dynamicity of the structure: monitors could join or leave at any time without disturbing the whole system;
- allow one to manage failure recovery;
- allow one to balance the probing load between monitors.
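These three requirements can be sketched in a few lines of Java. The coordinator below is purely illustrative (class, method, and parameter names are ours, not the actual traceroute@home design): monitors join or stay alive by sending heartbeats, monitors that time out are dropped, and destinations are assigned round-robin across the surviving monitors:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch of a centralized coordinator: membership via heartbeats,
// failure recovery via timeouts, load balancing via round-robin assignment.
public class CoordinatorSketch {
    private final Map<String, Long> lastHeartbeat = new HashMap<>();
    private final long timeoutMillis;

    public CoordinatorSketch(long timeoutMillis) {
        this.timeoutMillis = timeoutMillis;
    }

    // A monitor joining (or staying alive) simply reports a heartbeat.
    public void heartbeat(String monitorId, long now) {
        lastHeartbeat.put(monitorId, now);
    }

    // Failure recovery: drop monitors that have not reported recently.
    public void expire(long now) {
        lastHeartbeat.values().removeIf(t -> now - t > timeoutMillis);
    }

    // Load balancing: assign destinations round-robin to the live monitors.
    public Map<String, List<String>> assign(List<String> destinations) {
        List<String> monitors = new ArrayList<>(lastHeartbeat.keySet());
        Collections.sort(monitors); // deterministic assignment order
        Map<String, List<String>> plan = new LinkedHashMap<>();
        for (String m : monitors) plan.put(m, new ArrayList<>());
        if (monitors.isEmpty()) return plan;
        for (int i = 0; i < destinations.size(); i++) {
            plan.get(monitors.get(i % monitors.size())).add(destinations.get(i));
        }
        return plan;
    }
}
```

The point of the sketch is that a central coordinator makes all three properties easy to reason about: a monitor that leaves simply stops heartbeating, a failed monitor's destinations are reassigned on the next call to assign(), and no monitor carries more than one destination beyond its fair share.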
We are currently working on this next traceroute@home version.