A Spillway Infrastructure for Defense Against Distributed Denial of Service Attacks

Andrew H. Barkley
December 2000
Computer Science
Texas A&M University


Abstract

Distributed Denial of Service (DDoS) attacks have proven to be disruptive forces on the Internet. There is currently no general means of defense against such attacks, as they exploit mechanisms that are critical for normal operation. In this thesis an active response technique that lessens the effects of such attacks is described, and a communication protocol infrastructure on which a distributed defense can be implemented is detailed.


Introduction

Objective

A particularly pernicious type of Internet attack, known as Distributed Denial of Service (DDoS), became quite infamous in early 2000 when popular commercial web sites were rendered useless for days at a time [12]. The attackers found the Internet unprepared for this type of exploitation, and the matter of defense has still received very little treatment by industry and the literature. Fortunately, such attacks can theoretically be defended against, if not prevented, under current Internet protocols, but there is no infrastructure in place to allow a coordinated defense. The difficulty in defending against a DDoS attack is that many Internet hosts bombard a particular resource with malicious requests, which are indistinguishable from authentic requests without great effort. As a particular resource is crowded in this way, neighboring resources tend to become congested as well, spreading the influence of the attack. The target resources are generally network bandwidth, RAM buffers and CPU time. The large mass of requests and their responses consumes processing and network resources until the target resource is unavailable for regular use, whether through complete utilization of that resource, a bottleneck near that resource, or, in the worst case, a system crash under the overwhelming load of requests. Most DDoS attacks rely simply on wasting the time of a resource with subtle inconveniences that add up, so the attacks are inherently survivable. However, the extreme burden caused by attacks has been known to temporarily disable systems.

An effective response to a DDoS attack must reduce the number of spurious requests that reach the target resource, thereby increasing the quality of service for legitimate requests and reducing the risk of system failure. Quality of service is an aggregation of various performance metrics, such as availability, throughput, packet loss and latency [10]. Availability is a measure of how often a system works, throughput is the rate of delivery of messages, packet loss is the rate at which messages are lost (messages are carried in the form of packets), and latency is the average time necessary to completely deliver a message. Basically, a good response to a DDoS attack would ensure that application services under the effects of a DDoS attack (directly and indirectly) remain usable in adequate time.

Many DDoS attacks target network resources specifically, rather than application services, thereby denying availability of the services normally reachable through that network. This is done by consuming the available bandwidth (maximum throughput) of that network, so that authentic messages rarely get through. This condition is called network saturation. Under this form of attack, there is always a point at which the incoming traffic rate exceeds the filtering performance, so there can be no single point of implementation for a defense of a resource that would improve its availability in the general case. The limiting of traffic must be implemented in a distributed fashion, so that neither filtering performance nor network utilization can bottleneck authentic requests for a resource. A packet filter solution for DDoS traffic is a double-edged sword that will block some authentic requests as well as the malicious ones, because the packet formats are identical.

A request message is, of course, impossible to prove authentic. Measures can be taken to prove that a message is not authentic, such as sending additional verification messages back to the suspected originating system. Another technique is to maintain information about previous messages so that stateful (meaning retaining information, as opposed to stateless) inspections of messages can be performed to provide clues to their authenticity. However, the cost of using either of these techniques as a rule for all messages sent through a machine is prohibitive.

The packet source address of DDoS traffic is often forged to conceal the originating machine, so filtering based on source properties is also not effective without knowing what non-forged addresses can normally pass through a network in a particular direction, a feature that is not available in the existing Internet infrastructure. Therefore, a filter based on destination properties is the most sophisticated traffic limiter that is currently both useful and economical for defending against a DDoS attack. The difficulty in the implementation of such filters is that their correct operation requires them to be quickly configured to defend against multiple attacks on various resources, while cooperating for defense over vast stages of networks. The implementation must also be secure enough so as not to become the means of a new attack. The primary objective of the research presented here is to design a protocol infrastructure for correct cooperation of these packet filters as previously defined.

It is important to note that defending against a particular attack first requires the detection of that attack. However, detection schemes are beyond the scope of this thesis. It is assumed that a means of detecting the attacks to be defended against is already in place.


The Network

Before fully describing the problem at hand, it is best to first describe the medium on which the problem occurs: Internet style networks. The Internet is a multi-staged interconnection network. Although there are many different types and speeds of physical networks that make up each of the dynamic and practically uncountable stages, they all implement the Internet Protocol (IP) [25] to afford compatibility in communication and a consistent communication model between any two nodes. IP defines a network layer, which is responsible for routing of information between nodes of the network, and provides services to higher-layer protocols [33]. The topology of the Internet can be considered to be an arbitrarily connected graph, with the vertices being computer nodes and the edges being network channels. Generally, an edge with its adjoined vertices represents a network stage. An aggregate of stages is considered an internetwork. A vertex belonging to multiple stages is a router.

The atomic unit of information in an IP network is the IP packet. The IP packet is marked with an IP protocol header that identifies the source and destination nodes' uniquely identifying IP addresses (information necessary for routing), and also an identifier for the higher-layer protocol to which the IP packet's information payload conforms. As not all nodes in an internetwork are assumed to be directly connected, packets must be correctly relayed between a number of networks for any two nodes to exchange information. The intermediate nodes, the routers, belong to multiple stages and make local decisions about which path a packet takes between nodes. Routers generally operate on a store and forward basis, holding packets in a queue until transmitted.
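
To make the header fields concrete, the following is a minimal sketch (in Python, with illustrative names) of extracting the routing-relevant fields from the fixed 20-byte portion of an IPv4 header; options and payload handling are omitted:

    import struct

    def parse_ipv4_header(packet: bytes) -> dict:
        # Fixed 20-byte IPv4 header: version/IHL, type of service, total length,
        # identification, flags/fragment offset, TTL, protocol, header checksum,
        # source address, destination address.
        (ver_ihl, tos, total_len, ident, flags_frag,
         ttl, protocol, checksum, src, dst) = struct.unpack("!BBHHHBBH4s4s", packet[:20])
        return {
            "src": ".".join(str(b) for b in src),    # originating node's IP address
            "dst": ".".join(str(b) for b in dst),    # address the routers steer toward
            "protocol": protocol,                    # higher-layer protocol (6 = TCP, 17 = UDP, 1 = ICMP)
            "ttl": ttl,                              # hop count limit, decremented by each router
        }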

Because of finite buffer capacity for the queues on the routers, and of transmission hazards in some networks (for example, collision-based broadcast networks), the transmission of an IP packet between two nodes is considered unreliable. In the tradition of postal mail service, there is no guarantee that an IP packet reaches its destination, nor is there a receipt to indicate that it was or was not received. Furthermore, it is not guaranteed that IP packets will arrive in sent order, or that a successfully transmitted packet will be received only once. Reliability and sequencing must be accomplished by other means, usually by building a reliable transport layer or application layer on top of the IP network layer. The network layer is responsible for flow of data between networks, and the higher layers build on network layer functionality to provide higher-level services. Above the network layer is the transport layer that handles flow of data between hosts, and above it is the application layer that handles the details for a particular application. While reliability can be implemented in any of the layers, a common technique is to use the Transmission Control Protocol (TCP) [26], a transport layer protocol of IP that provides a three-way handshake scheme for synchronization, acknowledgements, and automatic resending of packets on the unreliable IP medium. TCP allows both parties to a transaction to be confident of each other's progress in the transaction, although duplicate packets are occasionally sent. Under extreme conditions, the resending of packets can lead to a “congestion collapse” condition in which network utilization is at or beyond the saturation point but throughput is merely a fraction of normal [22].

The basics of IP networking are as simple as this, and the simplicity is largely the reason that the Internet has scaled so gracefully to its current size. However, its simplicity can also be a source of trouble, as the closed environment for which the protocols were designed did not have the same security concerns as the open Internet. The IP protocols are the basis for an Internet that is far different from the network of users for which they were designed, and their inadequacies cannot be significantly addressed due to the well-established infrastructure.


The Attack

A Denial of Service (DoS) attack on a system is a malicious misuse of resources that often leaves the target system in a state where it is unable to perform adequately. A DoS attack also does not generally involve gaining unauthorized access to the target system, meaning that the security of the machine is not cracked. DoS attacks are more of a “stick in the spokes” style of sabotage than the “breaking and entering” techniques that are the traditional concerns of computer security. There are many ways to perform a DoS attack. A common method is to generate an excessive number of seemingly legitimate but confusing requests to a target system, consuming the computing resources of the system rather than compromising its contents. The misuse of resources is a difficult problem, as a myriad of possible misuses must be considered, and it is very difficult to differentiate legitimate requests from malicious ones due to their identical appearance.

Obviously, it is generally much easier to overload a given system with requests from multiple systems than from a single system. A Distributed Denial of Service (DDoS) attack is an attack in which processes on many machines work together to perform a DoS attack on a target machine. Examples of such attacks are the ones that rendered many popular commercial web sites inaccessible in early 2000 [12]. These attacks exploited a weakness in TCP, as its mechanism for imposing reliability on an unreliable medium can be easily misused. A simplification of the attack is that many machines send forged TCP “SYN” packets to a target machine in unison at a combined rate much higher than the target machine can service. This “SYN-flood” attack has many side effects that congest both the network and the protocol stack of the target machine.

The details of the attack are not very complex. A TCP session is normally established after the exchange of three packets between two hosts. The host intending to establish a session encodes and sends a SYN packet. After receiving the packet, the other host replies with a “SYN-ACK” packet encoded for that particular SYN packet. Reception of that packet triggers yet another reply from the initiator, called an “ACK” packet, that signifies that the session is initialized. The multiple acknowledgement packets ensure that each host is aware of the progress of its opposite. Sequence numbers are maintained between packets so that each host can identify the particular session (out of possibly many concurrent sessions) through the information in each packet. Failure to receive an expected acknowledgement packet before a deadline warrants the resending of the packet until the problem disappears or a limit on the number of resends has been reached (see Chapter 21 of [33]). The obligatory retransmissions that work to enforce reliability are the means of the SYN-flood attack. In the attack, many machines send forged SYN packets to the target host. The target host then replies with SYN-ACK packets to the forged addresses; receiving no valid ACK, it must either resend the SYN-ACK or handle an unexpected reply from the host that actually owns the forged address. Either outcome adds to the traffic in the network. A sort of localized congestion collapse in the host upon a bombardment of traffic from a SYN-flood can render the system unavailable for regular use through any protocol, while the traffic itself consumes network bandwidth that could otherwise be used for productive purposes.
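
The following sketch (illustrative only; sequence numbers, timers, and retransmission are omitted) models the server-side connection table during the handshake, and shows how forged SYN packets accumulate as half-open entries that are never completed:

    from collections import OrderedDict

    BACKLOG = 128  # illustrative limit on half-open connections per listening socket

    class HandshakeTable:
        def __init__(self):
            self.half_open = OrderedDict()   # SYN received, SYN-ACK sent, awaiting ACK
            self.established = set()

        def on_syn(self, client):
            if len(self.half_open) >= BACKLOG:
                return "dropped"             # backlog exhausted: further clients are denied
            self.half_open[client] = "SYN-ACK sent"
            return "SYN-ACK"                 # reply goes to the (possibly forged) source

        def on_ack(self, client):
            if client in self.half_open:     # a forged source never answers, so attack
                del self.half_open[client]   # traffic never reaches this branch
                self.established.add(client)
                return "established"
            return "ignored"

    table = HandshakeTable()
    for i in range(200):                     # a flood of SYNs from forged addresses
        table.on_syn(("198.51.100." + str(1 + i % 254), 1024 + i))
    print(len(table.half_open), len(table.established))   # 128 half-open, none established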

Another common attack is the “ICMP smurfing” attack, which is based on the standard ping mechanism of Internet hosts, a mechanism that is actually required to meet the definition of an Internet host [4]. ICMP stands for Internet Control Message Protocol [24], and is an IP-based suite of protocols used for network diagnostics and error message transmission. In this attack, a compromised host on a high-speed network sends “echo request” packets, with the source address forged to be that of the target, to the hosts of the network [33]. Those hosts reply with an “echo reply” to the forged address, which belongs to the target system or network.

Such coordinated attacks generally require intrusion into the many attacking machines so that attacker processes can be installed, thus providing automation of the attack and secrecy to the coordinators. The possibilities of similar DDoS attacks seem endless, but they have one important feature in common: extreme amounts of traffic directed at a particular target machine as a means of reducing availability. Fortunately, these sorts of DDoS attacks are survivable, as long as the rate of requests can be reduced to tolerable levels.

The extreme traffic directed toward a target machine under a DDoS attack creates network congestion around the target machine that has been called a “hot spot” [19]. Congestion means a saturation of the network channels with massive flows of packets, compounded by information loss due to buffer overflows on the routers. Naturally, the traffic destined for the hot spot is called “hot traffic,” while other traffic is called “cold traffic.” A hot spot can cause congestion of multiple stages of neighboring routers, so cold traffic can be greatly affected by attacks as well. The simplest explanation of why a hot spot forms is that there is a bandwidth mismatch between the distributed attackers and the target network. The hot traffic is bottlenecked as it converges on the limited network access points surrounding the target resource.

A useful metaphor for the situation would be that of many streams converging into a dammed river. A dam can handle a limited amount of stream input flow before the flow is greater than the maximum output flow of the dam gates, causing a flood and/or dam breakage. In the metaphor, the streams flowing to the dam are analogous to the network inputs to the hot spot. The breaking of the dam translates to the disabling of the system that the excessive hot traffic can cause. This metaphor is not entirely original, as many DDoS attacks are commonly referred to as floods. This metaphor will be used to derive further terminology; however, it should be noted that the metaphor does not describe the distinction between hot and cold traffic.

There are many well-recognized congestion control schemes that aim to prevent congestion altogether, or to manage it when it occurs (see Chapter 5.3 of [34] and also [22]). However, these solutions are usually limited to internal networks administered by the same organization and rely on the traffic-generating machines to cooperate in reducing traffic. Both assumptions are unrealistic with regard to preventing DDoS attacks, as no cooperation is provided in limiting traffic toward that internal network. Traffic may then only be limited at or within that network's border, making it possible for a hot spot to form at that border area.

The hot spot model of a DDoS attack is extremely useful in that hot traffic can be measured in a real network once it is defined. Strictly speaking, any traffic directed toward the DDoS attack target machine helps maintain the hot spot as far as network congestion is concerned, so the definition of hot traffic can consist simply of the traffic's destination machine (or network for broadcast packets). However, because attacks usually target a particular process on a machine or are somehow of similar format otherwise, extra identifying information maintained in layers above the IP network layer can be used to refine the definition of hot traffic. Such information is usually used by the destination machine to bind a particular message in packet form to an active process. Examples are the application/transport protocol number of the network layer or the process port number indicated in the transport layer. A more specific definition of hot traffic serves to reduce the amount of hot, but non-attacking (authentic), traffic affected by the defensive response. Again, details of detection schemes capable of defining and refining hot traffic definitions are beyond the scope of this thesis.

Monitoring an attack directly from within the hot spot can never be significantly useful. In the absence of defense, the target system is soon saturated by the attack as a hot spot develops, yielding no further useful information and often leaving the target system crippled for a significant amount of time. In the presence of defense, an ongoing attack should be undetectable at the hot spot for the defense to be a success. There can therefore be no effective drop-in solution between the “rock and the hard place.” With this in mind, the monitoring and defense must take place in a distributed fashion, distantly skirting the target network. The distribution of monitoring needs to be over the different paths that attacking traffic can take to arrive at the hot spot. A natural choice to house the monitoring and prevention components is in the routers, as they ordinarily control the flow of traffic between networks. All discussion of algorithm implementation will be limited to components running on the routers of a network in the vicinity of the hot spot. The terms capable router and participating router will be used to indicate routers that could be or are involved in the response to a particular attack, respectively.


Related Research

Although the matter of defense against DDoS attacks has had little treatment as of this writing, there is much supporting research that provides building blocks for defense. In this chapter we briefly review that research.

CERT Coordination Center

The CERT advisory board focuses more on DDoS prevention than defense, and proposes several solutions. The solutions are classified into three categories: detection, prevention and response [36]. Detection involves a variety of tools to detect the presence of DDoS tools used by hackers, e.g. trin00 and Tribal Flood Network (TFN), on the systems [41]. Prevention includes setting up firewalls and packet filters, and increasing the security of the server. Response to the attack includes how to protect data and how to recover the system after the attack.

All of the solutions proposed by CERT and some other organizations [36][37][44] lack one important feature of the response: the active defense of the system during the attack. If an attack cannot be prevented beforehand, a response should include the possibility of minimizing its effects without hazardous retaliation. The deployment of a firewall that scans every incoming packet can filter out the unwanted packets and even limit the rate of wanted packets. However, the incoming traffic still consumes network resources at the firewall boundary, and can cause excessive delays on the routers so that a hot spot can still form. This has been the case in many real attacks. CERT's recommendations are simply a collection of good practices for configuration of existing technology, and it is reasonable that they do not propose an active response system, as none is currently available. However, the holes that exist in these good practices are reason enough to support research of new DDoS response techniques.

Spillway Testbed

Rather than attempting to avoid the DDoS attacks, as in the solutions proposed by CERT, spillway algorithms aim to subdue the attack where it is most manageable. By counting the hot packets on many routers that forward packets to the target, the sources of unusually high traffic can be identified and prevented from maintaining a hot spot. The system works solely by the definition of hot traffic, so it easily handles DDoS attacks that fake the source IP address of the packets, as in SYN-floods. The extra computation in the participating routers will likely decrease their performance, as the computational complexity to manage each packet through a router grows linearly with the number of attacks. A fast implementation is critical.

The testbed of [1] is a proof-of-concept system for a simple spillway algorithm (not presented there under that name), so it must be configured manually. Furthermore, it can only defend a single protocol at a time, and can only defend an attack on a particular node. The testbed needs to be updated with a dynamic infrastructure and support for more complex spillway algorithms to be a more realistic demonstration system. This is the goal of the protocol infrastructure developed in this thesis. Spillway algorithms are treated in more depth in Chapter III.

Infrastructure Changes and Secure Protocol Design

Meadows' solution [21] deals with securing against DoS within the communication protocols themselves. This would require a retrofit of all existing IP protocols to be effective, and it still does not handle the strictly valid class of DDoS attacks. These attacks involve no forgery, and are absolutely indistinguishable from high levels of authentic traffic. The “Slashdot Effect” [20] involves only HTTP traffic from real or simulated web browsers, and appears to the web server like a massive inflow of valid requests. Such examples of nonuniform access behavior are typical of DDoS attacks, but are not easily addressed at the communication protocol level. Meadows does provide a secure design model that would greatly help proposed anti-DDoS protocols.

Prevention of SYN-Floods

One of the most comprehensive discussions of DDoS attacks and active response is the paper by Schuba [32], in which the author presents a thorough analysis of the SYN-flood attack and many means to its defense. Schuba discusses many good-practice configuration techniques that will reduce the susceptibility of a resource to DDoS attack, and a few new defense techniques that stop some attacks at the firewall boundary of a domain. The discussion of defense is limited to SYN-flood attacks and border firewall techniques, which are still susceptible to the hot spot problem. Schuba also presents a router infrastructure in which routers can recognize and discard forged packets, preventing DDoS attacks such as the SYN-flood from reaching the intended server altogether. Unfortunately, such an infrastructure would only be useful if instituted as a rule for all routers, which is unlikely. It also does not address attacks that do not forge sender IP addresses, as SYN-floods do.

Network Traceback

A major difficulty of common DDoS attacks lies in the fact that the source IP address on packets is forged, or "spoofed," so a packet's true origin is difficult to discern. The logical way to discover the source is to learn the route that the packet has taken, thereby narrowing the search space for the true source. The traditional method of recording routes is the "record route" option of the packet. Each router that sees the packet will append its address to the packet so that the route can be read by the destination or by routers in between. However, the record route option must first be turned on, and the originator is unlikely to do that. There is also a severe limitation on the number of router addresses, nine, that can be recorded [33]. Changing the packet size also increases the potential for the packet to be fragmented into multiple packets, which increases network load.

The other standard mechanism for discerning a packet's route is the standard traceroute type program. Traceroute sends a series of packets back toward the source with a short "time to live" (TTL) that causes an ICMP error message to be returned from the router that is TTL hops away. This way, the route can be discovered by growing it hop by hop with successive packets, ignoring possible redundant routes. Of course, this method is hopeless here because a proper traceroute requires the true source address, and the source address being traced is likely forged. It is like trying to trace prank callers on a phone line only to find someone who has been framed.

There are, however, two proposed solutions that tackle the need for a traceback mechanism for spoofed packets. The Internet Engineering Task Force (IETF) recently proposed a new ICMP message type to be implemented by routers [2]. Because originating addresses are often forged in DDoS packets, it is historically nearly impossible to determine their origin. The new ICMP message would be sent from intermediate routers to the destination to help the destination discover from where the DDoS packets are coming. The other proposed solution was developed at the University of Washington and is implemented as a stronger form of the IP record route option [30].


DDoS Defense with Spillway Algorithms

There is a distributed defense algorithm that can be implemented in the routers with minimal intercommunication. This algorithm has been shown to be effective in a testbed environment. The content of this chapter is based on material previously published in [1] (© 2000 IEEE. Material reprinted with permission. See Appendix).

Conceptually, the algorithm produces a distributed firewall with configurable packet filtering. The resulting system can be configured to reduce total incoming hot traffic below a desired rate (for the definitions of the hot spot and hot traffic, see Chapter I, Section C). In this chapter we present this class of algorithms, as it is the class of algorithms that the proposed DDoS defense protocol suite implements.

Spillway Algorithms

At the onset of a DDoS attack, the attack must somehow be detected. A detection scheme is assumed to exist. The detecting component then defines hot traffic, thus configuring packet filtering on the routers. Packet filtering is a process in which all IP packets are judged against a set of rules that decide whether the packet should be forwarded or discarded. Each participating router is configured with two parameters, a threshold rate and a blocking duration. If the threshold rate of hot traffic packets is exceeded on a router, the router will drop (that is, discard rather than forward) all hot traffic until the rate has remained below the threshold for the blocking duration. Thus, offending hot traffic will be choked off far away from the hot spot while cold traffic is unscathed, and even given higher quality of service (increased bandwidth and reduced latency) to the next network stage if the filtering performance is adequate. Although it is likely that some legitimate packets that are found to be “hot” will be discarded, the impact on communication is of the same or lesser degree than that found at the hot spot: the packet loss normally caused by saturation is now caused by filtering instead, and it falls mostly on the traffic responsible for the saturation. Communication reliability mechanisms on the originating host will simply initiate retries upon packet loss, increasing the probability that the message will get through. Of course, it is ideal to choke off the hot traffic as close to the source as possible to reduce resource consumption by offending traffic, but this depends on the proximity of the routers to these sources and on the breadth of the sources themselves in the network topology. The blocking duration is meant to allow the hot spot to clear, as the congestion will dissipate when no longer maintained by hot traffic. If the combined measure of hot traffic from the participating routers is below the threshold rate for sufficient time, active monitoring and defense of the attack can safely cease.
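
The following is a minimal sketch of the per-router logic just described (the class and parameter names are illustrative, the hot-traffic predicate is assumed to come from the detection component, rate measurement is simplified to one-second windows, and blocking is renewed for the blocking duration as long as the measured rate remains above the threshold; a real implementation would run in the router's forwarding path):

    import time

    class SpillwayFilter:
        def __init__(self, matches_hot, threshold_pps, blocking_duration):
            self.matches_hot = matches_hot               # hot traffic definition supplied at configuration
            self.threshold_pps = threshold_pps           # threshold rate, in packets per second
            self.blocking_duration = blocking_duration   # seconds to keep dropping hot traffic
            self.window_start = time.time()
            self.hot_count = 0
            self.blocked_until = 0.0

        def forward(self, packet, now=None):
            """Return True if the packet should be forwarded, False if dropped."""
            now = time.time() if now is None else now
            if not self.matches_hot(packet):
                return True                              # cold traffic is never touched
            elapsed = now - self.window_start
            if elapsed >= 1.0:                           # close the measurement window
                if self.hot_count / elapsed > self.threshold_pps:
                    self.blocked_until = now + self.blocking_duration
                self.window_start, self.hot_count = now, 0
            self.hot_count += 1
            return now >= self.blocked_until             # drop all hot traffic while blocking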

The dam metaphor presented earlier, explaining a DDoS attack as a flood of network traffic into a dam-like area of limited capacity, can be extended to this algorithm. A common way to manage excessive flow into a dam is to shunt the excess to a spillway, avoiding the dam altogether. The algorithm effectively implements shunts to infinite sinks at many control points, not just near the dam, digging long ditches, or perhaps trenches, around the hot spot to reduce the flow into the center. The downside is that, just as in a normal spillway, excess flow has no utility. The hot traffic is simply discarded. It could be argued that discarding all hot traffic is like “throwing out the baby with the bath water,” but that is not quite an accurate analogy. Although service may be disrupted for some legitimate clients of the service under DDoS attack, many other legitimate clients of that same service may actually receive higher quality of service. Either possibility is no worse than the normal effects of an undefended DDoS attack on that service, which would cause its complete unavailability. Furthermore, other services which are not labeled as hot will offer improved quality of service to all clients as compared to attack conditions without a defense. The term capable router introduced earlier can now be replaced with the more specific term spillway router.

This algorithm is extremely simple and effective, so long as the threshold and blocking duration for each spillway router are chosen wisely in advance. This limitation requires intelligent configuration of the threshold rates and blocking durations by hand. More intelligent algorithms that automatically tune those parameters are necessary for a practical system. These algorithms should actively tailor the prevention system to be as specific as possible in blocking attacks, improving the service to as many valid nodes as possible. Performance of these algorithms is of critical importance, as the routers will need to communicate with each other under high-load conditions. Specific guidelines for the design of the protocols are detailed in Chapter IV, Section A; however, only basic improvements to the simple algorithm are suggested in this thesis.

An undesirable effect of the simple algorithm occurs in configurations with multiple tiers of routers. If a packet successfully reaches the target, it is counted by every router along the path. If the routers are of similar threshold and much hot traffic travels on the same path, it is likely that multiple routers along the path will block concurrently. This could be less than ideal as only one router is needed to block that offending traffic. However, again the worst case is still better than the results of the DDoS attack even for the simple algorithm, as the hot resource is widely unavailable but its neighboring resources are now available. Extra communication between participating routers can be added to the algorithm to prevent this effect. There are many possible optimizations of the simple algorithm to improve the quality of defense; however, the basic principles of the defense remain the same. The simple algorithm defines a class of algorithms that can all effectively prevent a DDoS attack's intended effects. This class of algorithms will hereafter be referred to as spillway distributed rate reduction algorithms for DDoS defense.

Spillway algorithms for DDoS defense are a method to prevent DDoS attacks in the real world, meaning that they are applicable to the existing Internet infrastructure; however, they currently do not have a means of implementation. At this time, there is no realistic protocol infrastructure on which these algorithms may be implemented. The functional facilities necessary for such an infrastructure are discovery of capable routers around a hot spot and a means to coordinate a response. Of course, there are many nonfunctional issues to consider, such as security, and the infrastructure is only of use if paired with a DDoS attack detection scheme.

The basic spillway algorithm is logically a zero-order spillway algorithm, in that only local information is necessary for correct performance after hot traffic definition, as all other necessary parameters are statically configured. However, fixed threshold rate and blocking duration values for any given attack are a severe limitation. Higher-order spillway algorithms can address the deficiencies of the basic algorithm through inter-router information exchange and control. The development of a protocol infrastructure for support of higher-order spillway algorithms is the subject of this thesis. There is another advantage of such a higher-order spillway infrastructure that makes it even more useful: automatic discovery of adjacency.

It is important to note that the described defense mechanism is not an infrastructure change to the Internet. It is designed as an infrastructure addition that works with all old components, and can be incrementally instituted. The method of dropping packets with filters to reduce traffic is not in itself very novel, as packet dropping occurs during a DDoS attack anyway due to broadcast hazards and buffer overflows. However, with this defense approach, packets are dropped not because of buffer saturation, but selectively and far away from the destination, thus preventing the damage that the attack can do to other services.

Example Spillway Algorithm Usage

This section contains an example of how spillway algorithms work to reduce the effects of a DDoS attack, and a summary of the testbed results first presented in [1].

The experiment was constructed using a testbed of Linux-based machines, each configured with multiple IEEE 802.3 [40] network interfaces so as to act as routers. The Linux kernel's firewall facility was used for both retrieving statistics about hot traffic and implementing the packet filtering necessary for spillway algorithms.

A simple spillway algorithm was implemented under constraints that would not be realistic outside of a testbed. Defense was limited to a single target machine and a single TCP/IP protocol. Each of the routers was configured with information about its adjacent routers, forming a static tree routing structure rooted at the target machine and having client/attacker computers as leaves to generate traffic routed toward the root. The test network has the topology exhibited in Figure 1.


Figure 1: Example Network Topology

The TCP/IP protocol to be defended was HTTP [3], the protocol used to retrieve web pages. A detection component was created by modifying the open-source Apache web server [45] to measure the rate of access and trigger the routers to prevent an attack when a threshold rate is surpassed. Rate of access to HTTP is the sole measure of attack that was used. Other possible exploitations, such as SYN-floods, could serve as a means for an attack on the server's protocol stack rather than the web server itself. As the detection component can never be sure of an attack without knowing of specific attackers, nonuniform access is the trigger. Of course, nonuniform traffic from either unusually high legitimate loads or an attack could then trigger a defense.

The experiment was geared to test four qualities:

  1. Correct behavior in the absence of an attack
  2. Attack detection
  3. Blocking of attacking traffic
  4. Return to normal behavior when the attack subsides

Attack detection is not the focus of spillway algorithms, but is useful in discovering the characteristics of an effective defense.

The routers were configured and events were scheduled to meet the requirements. Three 30-second attack waves were directed at the server, superimposed on a baseline of normal traffic, all generated by the client machines. The experimental results are shown in Figures 2 and 3, with thick black lines denoting the attack waves. The experimental events and their effects are summarized as follows:

  1. The web server was given a baseline of traffic that it handled normally.
  2. A 30-second wave from seconds 36 to 66 was intentionally below the web server's detection threshold of 60 requests per second, and was served normally by the web server.
  3. Another 30-second wave from seconds 117 to 147 was above the detection threshold, and the web server triggered the routing system to begin monitoring HTTP traffic. Router 1 detected the wave as above the spillway threshold, resulting in a block of the last ten seconds of the wave through that router. The normal traffic from Client 1 was also blocked, as visible in Figures 2 and 3.
  4. A third 30-second wave from seconds 196 to 226 was also above the spillway threshold, and was detected early by Router 1. However, because the wave was 30 seconds long and the blocking duration was set at ten seconds, the wave was detected and blocked twice. This oscillation is due to the fact that the attackers used TCP, and slowed their attacks when being blocked so that they were no longer detected.


Figure 2: Example Web Server Statistics


Figure 3: Example Router Statistics

The example's constraints hide the simplicity of the spillway algorithm used. It is very much a static defense, with no real cooperation between the routers. If the example were extended to multiple levels of routers, the simple algorithm would show the disadvantage of such a scheme: all routers along major hot traffic paths would block simultaneously when only the outermost router is necessary. The adaptivity needed to avoid this is a requirement to be addressed by the infrastructure protocols.


Spillway Protocols

Analysis of what is needed to implement spillway algorithms against general DDoS attacks gives rise to a suite of network communication protocols. This infrastructure is crucial to the effective utilization of spillway algorithms.

A communication protocol has two qualities: the format of its messages and the goal-oriented algorithms, performed by multiple parties, that drive the exchange of those messages. The need for particular protocols and their high-level requirements (objectives, information payload) are developed in this chapter, while lower-level issues (security, message types and formats, and the algorithms themselves) are discussed in later chapters.

Before the prevention algorithm can actively limit the rate of hot packets toward the hot spot, spillway routers along the paths of hot traffic must be enlisted to help. This is necessary to defend any resource that is found to be a target of an attack. The variation in availability and relative topology of spillway routers creates many different scenarios in a defense. The worst case is when some hot traffic can arrive at the hot spot uncontrollably, limiting the possible effectiveness of the prevention algorithm. The best case is when all of the hot traffic routes are covered by stages with spillway routers, so that the hot traffic can be throttled down far away from the hot spot. A reliable Internet-style protocol (of the kind found in the IETF RFCs) for discovery of capable routers around a particular resource under DDoS attack is necessary. After the routers have been discovered, they must be able to collaborate appropriately for defense with the spillway algorithms. A coordination protocol is therefore also necessary.

Design Guidelines

The protocol suite must be shown efficient and effective by available measures if it is to provide a realistic solution. Unfortunately, it is difficult to evaluate a design with any objective measures other than asymptotic time and space complexity. As it is a protocol suite, a specification of interoperation capabilities rather than an implementation, it is difficult to separate the functional requirements (those defining correct operation) from the nonfunctional requirements (capabilities not dependent on operational features). Therefore, the protocol is designed to satisfy a balance of the following uncategorized guidelines:

  • Secure communication: the protocol is difficult to trick with malicious messages, and does not have any known flaws or vulnerabilities useful to a new type of attack. Weaknesses that are found are well-documented.
  • Minimal bandwidth requirements: the protocol requires minimal messages for operation, and is able to operate correctly when little bandwidth is available. It also imposes its own reliability scheme with respect to message retries and acknowledgements.
  • Quick configuration: the protocol allows nearly immediate propagation of defense commands for a particular recognized attack.
  • Topology discovery: the protocol supports the discovery of what is necessary of the defense network's topology to afford quick configuration when needed.
  • Interdomain cooperation: not all routers are to be administered by the same organization, so appropriate trust and security relationships are supported for coordination between each organization's domain.
  • Handling of multiple concurrent attacks: the protocol does not by itself limit the number of concurrent attacks that may be defended against by a particular router, although there will need to be a limit imposed by the router (due to finite resources) that should be handled appropriately within the protocol.
  • Accommodation of dynamic networks: the protocol correctly operates under changes in network topology and routing at any time.
  • Defense of arbitrary resources under attack: the protocol supports different types of definitions of hot traffic, from loose definitions with only target network information through more constrained definitions with particular machine and/or protocol information.
  • Cooperation for defense over vast stages of networks: the protocol is able to correctly manage large defense networks without imposing severe communication requirements (communication should not scale directly with network size).
  • Growth of trenches: the protocol allows for incremental growth and shrinkage of the perimeter of the defense network so that only the appropriate or necessary routers are involved in a defense.
  • Interoperation without significant manual configuration: manual local configuration of any system is required only where necessary for security concerns. Systems should otherwise configure defenses automatically.

The following are qualifying limitations to the scope of the protocol suite:

  • The protocols are merely an infrastructure to handle defense coordination for recognized attacks. Detection of these attacks is left as an issue to be addressed elsewhere.
  • Adjacency of capable routers in the defense network can be assumed by the protocol, so that extraordinary measures are not necessary in discovery of defense network topology.

These requirements are thought to be sufficient design guidelines and limitations to produce a useful protocol suite, and further requirements would likely require arbitrary design commitments.

Domains of Control

For an effective use of spillway algorithms, it may be necessary to enlist spillway routers that are administered by other organizations. Collaboration between participating spillway routers therefore poses many security issues, as it must be assumed that not all of the spillway routers are controlled by trusted parties. Blindly trusting requests from what seem to be other spillway routers could lead to the worst of denial of service attacks, disabling critical communications altogether. Security is the subject of Chapter V.

The spillway protocols will use domains of control to establish trust relationships between routers, after security measures are taken. A domain of control is a set of routers administered to trustingly interoperate. A router can belong to multiple domains of control. Logically, there are two categories of such domains of control, internal domains and border domains. An internal domain is a domain of control administered by one organization, while a border domain is a domain of control administered by cooperating organizations. All routers should belong to at least one internal domain of control. Border domains are useful in allowing interoperation for routers of different domains in a terminable trust relationship, without affecting the performance of connected internal domains. This loose concept of domains of control will be used to simplify protocols by allowing the basic assumption that routers of common domain can trust each other, while routers of different domains cannot. Establishing this trust is the topic of Chapter V. Furthermore, routers will need a mechanism to present other routers with a supported list of domains of control, so that a common domain can be found.

To altogether prevent the possibility of a domain identifier clash and eliminate the need for a distributor of domain identifiers, such as is necessary for domain names, a word-length IP version 4 (IPv4 [25]) address of the given domain of control should be used as the domain identifier. Although there are many choices for this address, a specific address should be used to identify the domain of control.
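
As a small illustration of that representation (the address shown is arbitrary documentation space, not a real domain of control):

    import socket
    import struct

    def domain_identifier(dotted_quad: str) -> int:
        # Pack the chosen IPv4 address of the domain of control into the
        # single 32-bit word used as the domain identifier.
        return struct.unpack("!I", socket.inet_aton(dotted_quad))[0]

    print(hex(domain_identifier("192.0.2.1")))   # 0xc0000201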

Hello Protocol

The first step in maintaining an infrastructure for an implementation of the spillway algorithm requires that spillway routers know of other spillway routers, so that they may configure each other for defense when needed. Even if addresses of other routers are known, they should periodically be verified. Much like with routing protocols, such as OSPF, this process will be called “hello” (see Chapter 6 of [16]). There are many possible approaches to this.

The fundamental issue is that each spillway router needs to maintain a list of other spillway routers that are known to be functional, and that can be enlisted for help when needed. Particularly, each router needs to know which spillway routers are logically adjacent to it, meaning that there are no hot traffic routing paths between it and the spillway router in question that pass through another spillway router. These logically adjacent spillway routers are the ones of particular utility, assuming that logically adjacent routers can be properly discovered and ignoring the issue of domain of control. This is because nonadjacent spillway routers are those that are separated from one spillway router by at least one other spillway router, and there is no reason that nonadjacent spillway routers should be enlisted when there is an appropriate adjacent router, as the messages must travel farther under attack conditions (less reliable), and extra work is done if the router is already enlisted for the same defense by an intermediate router.

The need to communicate across network stages implies the need for a protocol whose messages can be routed. Given that the target environment is the Internet (or Internet-style networks), the messages will need to be based on IP.

When considering the issue of domains of control, the only option might be to use a nonadjacent spillway router if the adjacent domain is not helpful. However, it would be polite if that unhelpful domain would return its nondomain adjacencies so that no extraordinary measure would be necessary to discover those possibly helpful spillway routers.

The following subsections outline a few strategies that can be used with the hello protocol to discover adjacency.

Discovering Adjacency

To maintain a list of adjacent routers, those routers must first be discovered. In an environment where logically adjacent spillway routers are fixed and known, it is useful if administrators can configure lists of these machines for each router, so that it automatically knows some routers that are available when needed. This would be particularly useful in situations when logically adjacent routers cannot be discovered easily by other means, such as in the case when two logically adjacent spillway routers are separated by a non-spillway router. Logical adjacency can be difficult to establish in the absence of physical adjacency.

IP offers a simple mechanism for sending messages to unknown physically adjacent recipients, called local broadcast addressing. A message sent to a broadcast address is received by all machines on the message's target network. On broadcast networks, there could be many adjacent machines, or there may be only one (ignoring the sender). Unfortunately, local broadcasts are the only efficient and reliable way to discover adjacency without further configuration.
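
A minimal sketch of such a local broadcast follows; the port number and message contents are purely illustrative, as no format is standardized here:

    import socket

    HELLO_PORT = 9199   # illustrative; no port is actually assigned to the hello protocol

    def send_local_hello(payload: bytes):
        # Send a hello to the local broadcast address so that every machine on
        # this network segment, including any spillway routers, receives it.
        sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
        sock.setsockopt(socket.SOL_SOCKET, socket.SO_BROADCAST, 1)
        sock.sendto(payload, ("255.255.255.255", HELLO_PORT))
        sock.close()

    send_local_hello(b"SPILLWAY-HELLO domains=192.0.2.1")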

Beyond Physical Adjacency

It is possible that islands of spillway routers exist, isolated from each other by paths through routers not capable of supporting the protocol. In this situation, broadcast messages would be useless along the border of each island with respect to establishing adjacency lists. There are many possible solutions to this problem.

One possibility useful in some situations would be to configure a forwarding port on the router, much like what is used for BOOTP [6]. This would work as a bridge between islands. However, like a real bridge, it would need to be enabled and would remain relatively fixed thereafter. It would simply need to recognize broadcast packets for the hello protocol, and forward them to another network.

Another solution would be a much more selective process of discovering spillway routers, but one used only on demand. The proposed network traceback messages (see Chapter II, Section E) could be detected by the routers in transit. If the description of the messages being traced matches that of a current definition of hot traffic, the originating router could be queried with the hello protocol.

An elegant solution would be for all routers of a particular domain of control to subscribe to an IP multicast address (see Chapter 12 of [16]). Multicast routing allows a distributed set of hosts to share, in effect, a single IP address for receiving messages. Hello request and response messages can be sent to this address, and all routers of that domain of control will receive the messages, barring transmission hazards. Furthermore, the same IP address that is used for the domain of control identifier could be this multicast address, easing its discovery. However, such ease of discovery and broadcast could leave the routers vulnerable to DDoS attack, as one message becomes many messages when multiplied by the routers for each of the destination networks. In addition, IP multicast is unreliable, so it is not guaranteed that all routers will receive the message, a particular concern under DDoS network conditions.
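
A sketch of the group subscription is shown below (the group address and port are illustrative; in the scheme described, the group address would double as the domain of control identifier):

    import socket
    import struct

    DOMAIN_GROUP = "239.192.0.1"   # illustrative administratively scoped multicast group
    HELLO_PORT = 9199              # illustrative port, as above

    def join_domain_group():
        # Subscribe to the domain-of-control multicast group so that hello
        # requests and responses sent to the group are delivered to this router.
        sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
        sock.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
        sock.bind(("", HELLO_PORT))
        membership = struct.pack("4s4s", socket.inet_aton(DOMAIN_GROUP),
                                 socket.inet_aton("0.0.0.0"))
        sock.setsockopt(socket.IPPROTO_IP, socket.IP_ADD_MEMBERSHIP, membership)
        return sock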

After Discovery

Discovering adjacent spillway routers is only one facet of the protocol. As mentioned before, routers should be polled periodically to ensure availability. However, even if an adjacent spillway router is available, it is not necessarily configured to be helpful if it is under another domain of control. It is apparent that other information about adjacent routers needs to be maintained, such as the router's domain and its potential willingness to help when needed. These characteristics should affect both the priority with which a router is sought for help and the expected trust (and therefore reliability) with which it can comply.
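
A sketch of the state that might be kept per adjacent router follows (field names and the staleness interval are purely illustrative):

    import time
    from dataclasses import dataclass, field

    @dataclass
    class AdjacentRouter:
        address: str                  # IP address of the adjacent spillway router
        domains: tuple                # domains of control it claims to support
        willing: bool = True          # whether it has indicated it will accept help requests
        last_hello: float = field(default_factory=time.time)   # time of last successful hello

        def is_stale(self, max_age: float = 60.0) -> bool:
            # Entries that have not answered a hello recently should be re-verified.
            return time.time() - self.last_hello > max_age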

Help Protocol

Beyond discovery of spillway routers prior to need, the second half of the solution is to actually ask for and receive help from those routers. Asking for “help” is basically providing a definition of hot traffic to a spillway router, or asking for the modification of behavior for a currently participating router. There are many ways that this could alter the behavior of that router.

The spillway router is faced with many decisions upon receiving a request for help. Requests from an untrusted router could be discarded as superfluous, and the router could go about its business as usual. It is very possible that granting such requests could simply be a waste of resources, or could possibly be an attempt to maliciously curb traffic. However, if denial of help for certain classes of requests is the standard policy, it is probably better to discard hello messages from those classes, and never service help requests for routers not on the adjacency table. This will save a lot of work on both sides. Such a policy taken as a rule for many domains of control would severely compromise the effectiveness of a defense.

A request may be denied for other reasons, such as finite defense resources on the router. If resources are available, the request for help could be granted.

What the router does after a request is granted should be dependent on measures of hot traffic through that router, local policy, and the spillway configuration parameters provided to it (threshold, blocking interval). As that information varies by scheme, provisions for standard scheme labeling in the help messages should be available, as well as the ability to request statistics (rates) and modification of spillway parameters in participating spillway routers.
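
As an illustration of what a help request might carry (the field names are not a proposed format, merely a sketch of the information identified above):

    from dataclasses import dataclass
    from typing import Optional, Tuple

    @dataclass
    class HelpRequest:
        hot_destination: str                  # target address named by the hot traffic definition
        hot_netmask: str                      # granularity of the destination match
        transport_protocol: int               # e.g. 6 for TCP, 17 for UDP, 1 for ICMP
        ports: Tuple[int, ...] = ()           # optional port list for finer matching
        scheme: str = "zero-order"            # spillway scheme label
        threshold_pps: Optional[float] = None       # suggested threshold rate
        blocking_duration: Optional[float] = None   # suggested blocking duration, in seconds

    # Ask for help against an HTTP flood aimed at a single host.
    request = HelpRequest("192.0.2.10", "255.255.255.255", 6, (80,),
                          threshold_pps=60.0, blocking_duration=10.0)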

Defining Hot Traffic

Hot traffic is the type of traffic that is defined to be part of a DDoS attack. For the help protocol to be successful, it must be able to correctly and quickly convey a single definition of hot traffic.

Known DDoS attacks have varying types of hot traffic. A smurf attack uses ICMP [24] traffic directed at a particular network address. As the attack traffic is ICMP traffic with a particular protocol number and target IP address, that information is sufficient to identify it. SYN-floods are an attack on the network protocol stack, as that traffic is never seen by application software. SYN-floods can be identified by the destination IP address, the IP protocol number for TCP, and possibly also a specific TCP port number, although the port can be made random in an effective attack. It is obvious from just these two examples that the identifiers of hot traffic vary. There needs to be a hot traffic definition scheme general enough to handle such different types of hot traffic, but with enough granularity to specify the type of traffic well enough that cold traffic is less affected.

It is helpful to consider what the requirements are to decide if a received packet is hot or cold traffic. This is accomplished through comparisons to a hot traffic definition. A match to a definition implies that a packet should be handled as a hot packet.

The first test that should be performed, as it is the most decisive of the tests, is a comparison of the destination IP address. It should be possible to allow matching by network address in the definition. IP networks are designed to consist of nodes with IP addresses in a certain range. Routers between these networks usually have an address in each of the adjoining networks. Usually, routers examine IP packets transmitted on the network, and forward packets whose destination is out of range to other networks. The test used to determine whether an address is out of range is a bitwise Boolean operation with a “netmask” of the same size as an IP address (32 bits). The netmask is precisely the granularity definition mechanism needed in the definition of hot traffic. The hot traffic destination address definition should use a similar technique, using both an IP address and a netmask to determine whether the destination of the packet makes it a hot packet.

All IP packets must contain an IP header (as they would otherwise not be IP packets and not be routed), which identifies the particular protocol format to which the packet's payload conforms. There are transport layer protocols that build on the IP network layer, such as UDP, TCP, and ICMP. Because no known attack varies transport protocols within its hot traffic, there is really no use in indicating multiple transport protocols in the definition of hot traffic.

It becomes more difficult to have hot traffic definitions that use application protocol numbers defined for a given transport protocol, as the transport protocols vary themselves. As the transport protocol varies, the packet format changes. Fortunately, the 16-bit port number that identifies the application is in the exact same location in the TCP header as it is in the UDP header. Because these protocols provide the bulk of all Internet traffic, allowing the flexibility to also match to a particular port number or list of port numbers would be very useful granularity in defining hot traffic.
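
To illustrate why this shared header position is convenient, the sketch below extracts the 16-bit destination port from a transport header without knowing whether the payload is TCP or UDP. The sample header bytes are fabricated for the example.

    import struct

    # In both the TCP and UDP headers, the destination port occupies
    # octets 2-3 (the source port occupies octets 0-1), so one extraction
    # routine serves either transport protocol.
    def destination_port(transport_header):
        (dst_port,) = struct.unpack_from("!H", transport_header, 2)
        return dst_port

    # Fabricated 8-octet UDP header: source port 53124, destination port 80.
    udp_header = struct.pack("!HHHH", 53124, 80, 8, 0)
    print(destination_port(udp_header))   # 80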

It is thought that being more specific than destination, protocol, and port would be of little use, particularly because the packet formats vary widely beyond that information.

Modification of Behavior

After a router is already assisting in a defense, higher-order spillway algorithm implementations may wish to alter the behavior of that router. It may be useful to refine the definition of hot traffic, increase or decrease the flow of hot traffic, or abandon the defense altogether.

Query Protocol

Statistics are the driving force for advanced spillway algorithms. If routers cannot communicate their defense statistics effectively, the defense will not be effectively tailored to the attack, the attack target will gain little information about it, and therefore the given network or others may be doomed to suffer similar attacks.

The ideal statistics would yield a global table of packet rates for all participating routers over the course of the defense; however, synchronization of such a global table would require far more communication than is necessary for the immediate goal of routers making decisions based on the statistics offered by their adjacent routers.

Polling

Polling should be a fast process of determining the defense statistics for an adjacent router. To be specific, intelligent spillway algorithms would require that a router know the average hot traffic rate through each of the participating routers under its control, and logically upstream to that router with respect to hot traffic. This statistic should be well-defined and supported by all spillway routers. There are possibly other statistics that could be necessary for particular spillway algorithms, so the best practice would be to leave statistics to be later defined after the message format is set. Therefore, some mechanism is necessary to retrieve a list of particular fields in a polling message. Standard statistics are defined in Chapter VI, Section E.

Reconstruction

Reconstruction is the process of retrieving a global table of statistics for participating routers, when network conditions permit. Although the computation of this statistic may be much more complex, its delivery can be accomplished through the same means as the polling message by using a different format for statistical output. Therefore, it will simply be considered a special case of the polling message.

Security

In this chapter we present security issues for the spillway protocol suite, and means to their solution.

Security Issues

There are a few fundamental issues to security that must be addressed for network communication protocols. The basic issues of security to be addressed are authentication, integrity, and confidentiality (see pages 9-10 and 202-203 of [29]). The application at hand must be fully analyzed with respect to these issues so that a system cannot be easily compromised, and so that a compromised system may be detected. Fundamentally, security in communication protocols is all about establishing trust (see pages 218-219 of [29]). Without trust that a message was sent or received intact by the correct party, communication is insecure.

Authentication

Authentication is the process of ensuring that the other party to a communication transaction is the party that it is believed to be (see page 10 of [29] and page 2 of [31]). After all, communication with a particular party cannot be trusted if the originator's or recipient's identity is not certain. Authentication can be thought of as establishing a trust relationship between parties based on identity.

Integrity

Integrity is the property that a sent message cannot be altered undetectably in transit (see page 202 of [29] and page 2 of [31]). Although authentication may be successful in that at least one of the two parties to a communication transaction trusts the identity of its opposite, authentication provides no assurance that a hidden intermediary cannot intercept and alter a message. Also, errors in transmission could garble parts of the message. Integrity can be considered as establishing trust that a message's content is that intended by its author.

Confidentiality

Confidentiality is the process of ensuring that a message between parties is only understood by intended recipients (see page 202 of [29]). If a message's author is authenticated and the message has integrity upon reception, there is no guarantee that a hidden intermediary cannot read and understand the message. Confidentiality can be considered the process of establishing trust in the secrecy of information in a message.

Security Solutions

Unfortunately, there are no known perfect solutions to any of the security issues. The security solutions presented here simply add greatly to the complexity of defeating established trust relationships.

Hash Functions: Checksums and Digests

A checksum, usually meaning cyclic redundancy check (CRC) [14], is a simple integrity mechanism useful to ensure that a message is not altered through a communication error (see [13] and page 149 of [29]). Basically, an algorithm is applied to a message to generate a relatively small code that is appended to the message. Due to properties of the algorithm, it is unlikely that a small alteration in the message would generate a message with the same checksum, so the errors can be detected. A common use of checksums is in the header of all IP packets, based on 16-bit one's complement addition [25].
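
For concreteness, a minimal sketch of the 16-bit one's complement checksum used in the IP header follows; it is written for clarity rather than speed.

    def internet_checksum(data):
        """16-bit one's complement sum of 16-bit words, then complemented."""
        if len(data) % 2:
            data += b"\x00"                 # pad odd-length input with a zero octet
        total = 0
        for i in range(0, len(data), 2):
            total += (data[i] << 8) | data[i + 1]
            total = (total & 0xFFFF) + (total >> 16)   # fold the carry back in
        return (~total) & 0xFFFF

    # A receiver that sums the data words together with the transmitted checksum
    # obtains an all-ones result when no error is detected.
    msg = b"spillway"
    print(hex(internet_checksum(msg)))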

A checksum is an application of a hash function, a function that takes a variable-length sequence and generates a code of finite maximum length (see pages 161-164 of [35]). For a hash function to be of much use, it should have widely varied values for a set of given inputs. If the hash function generates different values for every member of a set of inputs, it is “perfect” for that set. Unfortunately, it is impossible to have a perfect hash function if there are no constraints on contents of the input, as there cannot be a one-to-one mapping from an infinite input set to a finite output set. When a hash function is known to be imperfect, there is a set of known inputs that generate the same value. Hashing multiple inputs to the same value is known as a hash collision. If a hash function's collision set is well-known or well-understood, it might be possible to take a given input and produce another that has the same hash value.

A digest is a cryptographic hash function, meaning that it has certain properties that make it suitable for security applications. It should be computationally intractable to find another input that produces the same digest as a given input, which implies that hash collisions should be extremely rare. A digest should also be strictly one-way (see pages 30-31 of [31]). If a digest were even partially reversible, then information about the input and possibly its content could be found. Digest functions are critical primitives of the popular cryptography schemes (see page 29 of [31]).

Hash functions in general are not sufficient for ensuring integrity. It may be possible to modify a message in transit without changing its hash value, and if the hash function is well-known, the message can simply be changed and the hash recomputed, all without detection. However, hash functions provide the basis for more useful security mechanisms.

The three digest algorithms that are popular as of this writing are MD5 [28], SHA-1 [43], and RIPEMD-160 [9]. MD5 outputs a 128-bit message digest, and is well-supported by industry. However, hash collisions have already been found, so it is not regarded as completely secure. SHA-1 and RIPEMD-160 produce 160-bit message digests, and do not currently have any known security issues.
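
As a point of reference, the digest lengths quoted above can be confirmed with a standard library such as Python's hashlib; a minimal sketch follows. RIPEMD-160 availability depends on the underlying library build, so it is wrapped in a try/except.

    import hashlib

    message = b"spillway help request"

    # MD5: 128-bit (16-octet) digest.
    print("MD5:        ", hashlib.md5(message).hexdigest())
    # SHA-1: 160-bit (20-octet) digest.
    print("SHA-1:      ", hashlib.sha1(message).hexdigest())

    # RIPEMD-160: 160-bit digest; not guaranteed to be present in every build.
    try:
        print("RIPEMD-160: ", hashlib.new("ripemd160", message).hexdigest())
    except ValueError:
        print("RIPEMD-160 is not provided by this hashlib build")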

Encryption

Encryption is a confidentiality process in which an input is encoded so that it cannot be directly understood, and later decoded. There are two models of encryption commonly used for security purposes, private key cryptography and public key cryptography. Regardless, the goal of perfect secrecy is impossible, because any reversible encryption algorithm can eventually be broken by brute-force. Therefore, the goal of encryption is to make understanding the content of a message much more expensive than the value of the message.

Private key cryptography, also known as symmetric key cryptography, is a process of encoding a message with a secret piece of information, so that it can only be decoded by using that same secret key. Private key cryptography provides both authentication and confidentiality, trusting that the private key remains only in the hands of those authorized.

Public key cryptography, also known as asymmetric key cryptography, is different from private key cryptography in that there are two keys. A public key is freely distributed, and its paired private key is kept secret. A message encoded with a public key can only be decoded with the corresponding private key, and vice versa. Likewise, messages encoded with a private key can be decoded by anyone holding the corresponding public key. These two properties make public key cryptography very applicable to authentication, as well as confidentiality. Public keys are normally distributed by a trusted third party organization. If a public key is certified by this third party, and a message is successfully decrypted with that public key, then the sender is authenticated.

For either cryptography scheme to be of any use, private keys need to be trusted to remain private, although this seems much simpler with public key cryptography.

Private key cryptography is the classic form of cryptography, although it is not as commonly used today. Examples of such cryptography are DES [38] and the World War II era “enigma machine” (see page 13 of [31]). The current popular public key cryptography schemes are RSA [17] and the ElGamal [11] public key variant of the Diffie-Hellman [8] problem.

Digital Signatures

In public key cryptography, successfully decrypting a message with a public key is sufficient proof that the originator holds the private key paired with that public key. However, encrypting with a private key does not provide any confidentiality, and there is no point in wasting the space of a full ciphertext (encrypted message) to yield the same result. Instead, a message digest is computed, encrypted with a private key, and appended to the message. This encrypted digest is called a digital signature. A digital signature has the properties that it can only be generated by the holder of the private key, and that the message integrity can be verified with high confidence. Therefore, it is a means of both authentication and integrity.

Popular digital signature schemes are DSS [39] and RSA [17]. DSS produces a fixed-width signature while RSA signature length is dependent on the key length used.

Challenge/Response

Third party certifiers are not always available as public key distributors, so sometimes it is necessary to use a challenge/response scheme for authentication (see pages 241-252 of [29]). The most basic of challenge/response schemes is the familiar password prompt, but that relies on one party to trust the other before giving a password that could likely be reused if intercepted. More secure challenge/response schemes rely on demonstrating knowledge of a password in a way that cannot be easily reused, and never really transmitting the password.

A popular challenge/response scheme is Kerberos [18]. Kerberos relies on a central authenticating service that then provides transient tickets useful for access to other services.

If only one-to-one communication is possible, digest functions are applicable. Both parties need to share a secret password. One party challenges the other by providing it with a random sequence. The recipient then appends that password to that sequence, applies the digest function, and returns the result. If the initiator computes the same digest and gets the same result, then the recipient either knows the secret or has intercepted a similar challenge before. It is therefore critical that the challenges are unique (see pages 44-46 of [31]). If this is the case, then the initiator can trust the recipient. If the result is different, it cannot trust the other party. If the initiator does not know the password and is trying to steal it, it is left with a digest instead. The only way to then steal the password would be to brute-force the digest for all passwords. The time necessary to do that is exponential with respect to the maximum length of the password, and therefore intractable with a moderately sized password.
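
A minimal sketch of this one-to-one scheme, assuming a shared secret already configured on both parties and using SHA-1 as the digest, is given below; the names and field sizes are illustrative only.

    import hashlib
    import os

    SHARED_SECRET = b"example-domain-password"   # assumed pre-configured on both routers

    def make_challenge(length=16):
        # The challenger sends a fresh random sequence; uniqueness is critical.
        return os.urandom(length)

    def compute_response(challenge, secret):
        # The responder appends the secret to the challenge and digests the result.
        return hashlib.sha1(challenge + secret).digest()

    # Initiator side: generate and transmit the challenge.
    challenge = make_challenge()

    # Responder side: compute and transmit the response.
    response = compute_response(challenge, SHARED_SECRET)

    # Initiator verifies by recomputing the digest with its own copy of the secret.
    trusted = (response == compute_response(challenge, SHARED_SECRET))
    print("responder authenticated:", trusted)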

Security in the Spillway Protocol Suite

The spillway protocols offer many security challenges to overcome. The standard protocol operating environment is hazardous at best, as the protocols must perform well on networks under DDoS attacks. Reliability of packet transmission will be greatly reduced, particularly on broadcast networks, but other networks such as ATM should be able to provide the protocol traffic with higher quality of service (see pages 61-66 of [34]). Furthermore, spillway routers should also be able to provide a higher quality of service to the spillway protocol packets. However, it is assumed that the protocol should also work on machines without rewriting the protocol stacks, and still be able to perform under saturated broadcast network conditions. With this in mind, messages should be minimal in number, size, and required computation. Also, communication should be limited to just the parties dictated by the requirements of the spillway protocols; distant third parties are likely to be much less reliable, because multiple message transmissions are necessary to cross multiple network stages. Therefore, security concerns may only be addressed in messages between these machines.

Protocol Requirements

The primary security concern with respect to the spillway protocols is that they could become a means of a new attack, disabling critical network traffic with simple commands. As speed is a concern, minimal cryptography should be employed. The issue of confidentiality will be avoided altogether, so only the issues of authentication and message integrity will be addressed. As spillway protocol messages will not be encrypted, spillway activity can be monitored through the message transmissions. However, there is no failure in security if that information cannot be used to access and control spillway routers.

A Secure Handshake

The constraint made in Section 1 above allows for a security scheme to address both authentication and message integrity issues with a single mechanism. Furthermore, this mechanism can be completely separate in content from protocol messages, but provide the secure foundation for them. This mechanism to build trust relationships between two parties will be called a handshake, based on the similar but insecure process of TCP/IP (see page 211 of [33]).

The first issue is that two routers will need to establish a two-way trust relationship before any communication can be trusted. The challenge/response scheme of Section 4 (page [*]) can provide two-way authentication (see page 52 of [31] and page 244 of [29]). However, the two-message scheme must be augmented with two messages in the opposite direction. Fortunately, the second and third messages travel in the same direction, so they can be overlapped. The two routers will be labeled the client and server, with the client being the initiating router. The following messages are required: a challenge from client to server, a response and challenge from the server, and a response from the client. Allowing overlap of smaller messages in the same direction, three messages is the minimum, because the required sequence permits no smaller combination. Each message acts as an acknowledgement of the previous message in the sequence (causality), so a failure message can be sent to the opposite party upon reception of an invalid response to a challenge. However, there is no provision for a positive acknowledgement to the client after its response to the server's challenge. This poses a problem on unreliable networks such as IP, because the client cannot be sure that its response was received and honored. A fourth message is required to provide this acknowledgement, yielding a lower bound of four messages for two-way authentication on an unreliable medium. If an acknowledgement is not received for a given message, the message should be resent until either an acknowledgement is received or a limited number of retries have occurred. Of course, no acknowledgement to the fourth message is necessary under this scheme, so there is a terminal condition for the handshake transaction.

The four messages can be summarized as:

  1. The client asks for attention, challenging the server.
  2. The server responds to the challenge, challenging back.
  3. The client responds to the challenge.
  4. The server acknowledges reception of the challenge response.

Of course, for the handshake to allow authentication, a secret password must be shared between the two routers. This secret password should be an attribute for each router that belongs to a particular domain of control. It is recommended that the secret password be an octet stream of arbitrary length, such as a text string. The arbitrary length requirement will be limited by memory constraints on the router, but longer allowable lengths drastically decrease the probability that a brute-force attack will be successful. Although a password rotation and retirement scheme is recommended, it will not be further addressed by the protocols.
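
The four phases can be sketched as a pair of cooperating routines, again using SHA-1 over the challenge concatenated with the shared password. Message framing, retries, and the digest field ordering defined later are omitted; all names and values are illustrative.

    import hashlib
    import os

    SECRET = b"example-domain-password"     # shared secret for the domain of control

    def respond(challenge):
        return hashlib.sha1(challenge + SECRET).digest()

    # Phase 1: the client asks for attention and challenges the server.
    client_challenge = os.urandom(16)

    # Phase 2: the server answers the client's challenge and challenges back.
    server_response = respond(client_challenge)
    server_challenge = os.urandom(16)

    # Phase 3: the client verifies the server, then answers the server's challenge
    # (a spillway protocol request may be bundled here, as discussed later).
    assert server_response == respond(client_challenge), "server failed challenge"
    client_response = respond(server_challenge)

    # Phase 4: the server verifies the client and acknowledges
    # (a spillway protocol response may be bundled here, as discussed later).
    assert client_response == respond(server_challenge), "client failed challenge"
    print("two-way trust established")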

This scheme effectively provides authentication. We now discuss how it can also be used to provide integrity. A failure in message integrity should generate either an acknowledgement of failure to a challenge, or no acknowledgement at all if the message is altered so much that it is never received by the designated machine due to critical modification of network or transport layer fields. An explicit integrity mechanism is therefore necessary. As one message digest is already being computed, it seems useful to simply include all message fields that need protection within the same digest. However, a true digital signature would still be required to ensure that a message is never rejected for authenticity when only its integrity has been violated. This presents a difficult situation, in that public keys would need to be exchanged. Public keys could be provided in challenge messages, but this is of little merit. Sending the key with the message leaves it just as susceptible to modification as before, as an intercepted message could be resent with a modified key so that the challenge still fails. Using a third party key distributor would add an extra party to the transaction, which is a violation of the requirements. The only solution would be to store public keys on each router, and the only really secure and scalable way to do that would be to have each router of a domain of control hold the same public and private key pair, in which case symmetric key cryptography seems to be a better fit, as it can provide confidentiality as well.

Although having each router of a domain of control hold the same public and private key pair would be a workable solution, as this information could simply be configured simultaneously with the domain secret password string using whatever mechanism is used for that, it seems simpler to integrate a failure/retry scheme to compensate for integrity problems rather than be bogged down with extra cryptographic calculations. Under this scheme, challenge failures due to integrity cannot be distinguished from those due to an incorrect password.

The case could occur that a hidden intermediary is situated between the two communicating routers, and gains control of the progress of the messages. It is not possible for it to successfully alter the two messages protected by a challenge response in the digest, but this is not true for the initial challenge by the client and the acknowledgement from the server. The only way to protect the first message would be to provide a unique challenging condition that would be agreeable to both parties. The only practical method for doing that would be to include a date and time field within the digest; however, the server would have to agree to that time, implying a degree of time synchronization between the two, which would likely require a new protocol and constant synchronization messages. A looser form of time synchronization could be to use a sequence of numbers. Every handshake message could include an incremented index in this sequence, and only messages with increasing indexes would be honored. This would require each router to maintain the last known index for each router with which it communicates, and the finite sequence could lead to message replays upon wrapping around. Rather than adding these complications to the handshake, the same failure/retry scheme used for challenges will be employed. If a response is not received, or a negative response is received, the process will be retried up to a certain limit, and can be diagnosed by administrators if a major disruption is detected in the logs, indicating such an intrusion. As a cryptographic checksum is not necessary for the first message, a less computationally intensive checksum should be used to allow for integrity under the aforementioned stressed network conditions.

The intermediary can also alter the unprotected acknowledgement message (the fourth message). The failure/retry scheme will be employed for this case as well, but protection will be necessary because the message has another purpose that will be discussed at the end of this section.

All of the identified spillway protocols operate as single-transaction primitives: a request message causes a response message. The point of the handshake is ultimately to allow for security in these protocol transactions. If the information is assumed not to be confidential, then it can be passed in any of the available messages, as long as the request from the client is received before the response can be sent. There are three possible combinations, each with its advantages and disadvantages. Two combinations have the client sending its request in the first message, which is unacceptable because message integrity cannot be assured for that message without the unique challenging conditions discussed previously. The only acceptable combination seems to be to include the request with the third message. The advantages are that the client has already authenticated the server at that time, and the client is authenticating itself with the response that is sent in the same message as the request. Because the request is bundled in the same message as the response, it can be well protected for integrity by the digest scheme. Without the challenge response tied to the digest, the message could possibly be reused later if intercepted. Unfortunately, this combination leaves the protocol response in the fourth message, which is not protected by the digest scheme. The same challenge can be reused for this last message, but a digest containing the protocol response must be recomputed.

The handshake mechanism should also handle the case when there is assumed to be no trust relationship between routers, meaning they believe each other to be of different domains of control. If this is the case, the only critical messages are three and four. It should therefore be possible to initiate a transaction with message three, ignoring the challenge/response scheme, and using faster or smaller digest schemes.

In summary, the handshake mechanism can be implemented as a four-message scheme underlying any spillway protocol traffic. It will require a message component in the message format, containing the handshake phase (numbered 1 through 4), a digest (variable scheme), a challenge (messages 1 and 2), and a challenge acknowledgement status (messages 3 and 4).


Message Format and Delivery

All of the previous chapters have built up to this. In this chapter, the spillway protocols are defined to meet the requirements previously presented. The basic algorithm for each of the previously defined protocols (Chapter IV, Sections C-E) is the handshake mechanism (Chapter V, Section C 2).

Field lengths will be described in bits, octets (8-bit bytes), and words (4 octets). Message format diagrams will be drawn with a width of one word, with some bit positions labeled from above to indicate field positions.

The requirement of using routable messages necessitates a basis in IP packets for the messages. The requirements for the protocol messages and enforcing reliability in their delivery imply the need for a connectionless protocol, narrowing the choices to IP [25], UDP/IP [27], and even ICMP/IP [24]. IP can be used alone as a lightweight datagram protocol, and UDP packets are fundamentally IP packets with UDP header information (assuming no IP fragmentation). This header information contains fields useful to UDP as a connectionless transport protocol. The spillway protocols will be implemented atop UDP packets to ease implementation, as tools more readily support using UDP because transport layer protocols were meant to carry application traffic, while IP is intended as a network layer protocol. The expected transition from the current version of IP, IPv4 [25], to IPv6 [7] is another good reason to use UDP. ICMP is a well-defined Internet infrastructure protocol that is not commonly extended, although it is the ideal choice of packet if the spillway protocols are to be widely accepted.

To avoid confusion and crowding of the UDP application protocol number space, all of the spillway protocols will be carried under one UDP application protocol code, and thus will use another means of differentiation. It is not necessary to commit to a particular UDP protocol number at this time, but it should be standardized should the proposed spillway protocols be accepted.

There are some fields that need to be of variable length. The format of any given section will be broken into fixed-length and variable-length components, with fixed components coming first, so that their locations with respect to the start of the section are constant. Lengths of the variable-length fields will be specified in the fixed-length section, so maximum lengths will be imposed. The situation in which it is absolutely unacceptable to have an upper limit to the number of elements in a list would warrant the use of a termination sequence, and that sequence would likely cause internal fragmentation, or wasted space, between list ends and word boundaries. The format layouts will be constrained so that sections begin at word boundaries.

External fragmentation could also occur if the packet is larger than the Maximum Transmission Unit (MTU) of a particular network stage that it must cross (see Sections 3.2 and 11.5 of [33]). If this were to happen, a router would break up the single IP packet containing the UDP datagram into multiple packets, each containing a portion of the UDP datagram. This does not affect message integrity, but it does adversely affect reliability, because all fragments of the original IP datagram must be received for the UDP datagram to be passed up to the transport layer. The goal is to avoid this by using small UDP datagrams that will fit under the MTU ceiling, but fragmentation is permissible. It is not necessary to enable the Don't Fragment (DF) bit of the IP header.

Handshake Header

The handshake mechanism provides a support layer for delivery of request/response transactions for the other protocols, and will therefore be treated as a security layer on top of which the spillway protocols operate. It will be designed so that the spillway protocol messages could operate correctly without it in the messages, should another solution be preferred.


Figure 4: Handshake Header Format

The previously identified fields required in a handshake message (see Chapter V, Section C 2 on page [*]) are the following, presented here in detail. Their format in the header is diagrammed in Figure 4.

challenge (C)
A challenge field is necessary in messages 1 and 2 of the handshake header. It is a variable length field.

challenge length (CL)
The challenge length field allows for challenge strings of variable size. A length in words can provide this variability, and can be zeroed for messages in which a challenge is not used, to keep a consistent header format across handshake phases. An octet field provides sizes in the range 0-8160 bits, in increments of words.

challenge acknowledgement status (CA)
Messages 3 and 4 require the ability to deny service based on outcome of a challenge response. This can be accomplished easily with one bit of information in the handshake header, with the zero value being negative.

digest (D)
This field contains the message digest. Upon each inclusion, it is computed over all contents of the handshake header, the message content (if any), and the fields to protect from the IP header: the source and destination IP addresses. Strictly speaking, it is a violation of protocol stack layering to include information from a lower level protocol, but the IP address is the only such field that is truly necessary, and that information is generally needed by all layers above the network layer anyway; it is simply kept at the network layer, the more general location, for consistency.

In messages when a challenge is provided for a particular domain of control, the challenge string and secret password should also be included. The order of field inclusion is important for the digest algorithm, so it must be preserved.

The ordering is as follows, with a sketch after the list:

  1. Source IP address (1 word for IPv4)
  2. Destination IP address (1 word for IPv4)
  3. Complete spillway message, from handshake header down through specific protocol message (variable in length, with all undefined message bits set as zeros)
  4. Secret password (variable in length)
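
A minimal sketch of assembling this input and computing the digest (here with MD5) follows. The addresses and password are placeholders, and the placeholder message stands for the complete spillway message with the digest field not yet filled in.

    import hashlib
    import struct

    def handshake_digest(src_ip, dst_ip, message, password):
        """Digest over the ordered fields listed above (IPv4 addresses, MD5 scheme)."""
        data = struct.pack("!II", src_ip, dst_ip)   # 1. source and 2. destination (one word each)
        data += message                             # 3. complete spillway message (digest field zeroed)
        data += password                            # 4. domain secret password
        return hashlib.md5(data).digest()

    # Placeholder values for illustration only.
    digest = handshake_digest(0x0A000001, 0x0A000002, b"\x00" * 32, b"example-domain-password")
    print(digest.hex())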

digest length (DL)
The message digest needs to be allowed to vary in length by number of words to allow for implementation of multiple digest schemes. This allows the protocol to support different strengths of digest algorithms based on security demands, or to change algorithms if a weakness is found.

The digest length field is one octet in length, providing the same range as challenge length.

A digest length of zero indicates that no digest was computed, as a digest algorithm with null output is trivial.

digest scheme (DS)
A number space for definition of different digest schemes is necessary to support use of those different digest schemes. The following schemes are immediately useful and are therefore defined here (may be rejected with negative handshake acknowledgement status if not supported):

checksum (0)
The standard IP style 16-bit checksum is located in the first two octets of the digest word. This is not a secure hash function, so its use should be confined to cases in which a challenge is not used. Furthermore, an attempt to use this scheme together with a challenge should be rejected with a negative handshake acknowledgement status.

MD5 (1)
The 4-word output of the MD5 [28] digest algorithm is well-accepted in industry, particularly in the IETF protocol drafts [15][42].

SHA (2)
The 5-word output of the government standard SHA [43] digest algorithm has no known weaknesses.

RIPEMD-160 (3)
The 5-word output of the RIPEMD-160 [9] digest algorithm is based on MD5's predecessor, MD4.

This field is one octet in length, leaving 256 total possible schemes for use. Numbers beyond 127 can be used for proprietary schemes, as it is unlikely that there would ever be a need for more than 128 well-known schemes.

handshake phase (HP)
There are four identified handshake phases, requiring two bits to represent.

Spillway Header

It is necessary to have a format for spillway protocol information that is common to all protocols and messages. The messages themselves will have different formats between protocols, and will even have different formats between the request and response of the same message. The common information will be kept in another header, called the spillway header. The header contains the following fields, and their layout in the header is diagrammed in Figure 5.


Figure 5: Spillway Header Format

acknowledgement status (AK)
The acknowledgement is either positive or negative, and represented with a binary zero or one, respectively. Of course, this field is only useful in acknowledgement messages, so its default value when not in use should be zero. Deeper granularity in acknowledgement status, meaning further explanation of the response, could lead to security problems, as such granularity could be used to gain information about the router's configuration and lessen the difficulty of brute-force attacks.

domains of control (DC)
A list of domains of control supported by a given router should be presented to another router with the first message of the handshake mechanism in the spillway header. The second message will return a list containing either one domain or no domain identifiers, signifying a potential trust relationship or no trust relationship, respectively. Router policy can govern whether the return list is accepted or denied, but that would be handled appropriately by the handshake mechanism.

If there are multiple matching domains of control, the first match should be used so that further comparisons are not necessary. Therefore, there should be a designed order in the listing, with the most internal, or trusted, of domains listed first.

As previously stated in Chapter IV, Section B (page [*]), a domain identifier is an IP address. IPv4 [25] IP addresses are one word in length, but the next generation IP protocol, IPv6 [7], uses addresses that are four words in length. For IPv6 spillway-capable routers, this will need to be detected and the longer address used in the format.

domains of control list length (DCL)
This is the number of domains of control in the list. Considering word-length IPv4 addresses, this field provides for a maximum of 255 domains.

request/response identifier (RR)
In addition to the spillway protocol used for the message, a further bit of information is necessary to mark the message as the request or response. Although this could be identified using the handshake phase, this higher-level protocol should be able to operate in its absence should a different security/reliability mechanism be used.

session identifier (SI)
A simple means to identify to a router which persistent data a message is coupled with is to have an identifier bound to that data. For example, after hot traffic is defined, it is useful to alter the behavior of a router with respect to that traffic. If a new message is sent, it would either need to provide the hot traffic definition again or provide such a session identifier. It is likely that a session identifier can perform the same matching task much more efficiently, considering both matching speed and message size.

A double-octet (16-bit) field allows for $2^{16}$ concurrent sessions. A large field such as that also allows for the use of the field as a mask instead of a binary number, allowing for a simpler hardware implementation with the capability to defend against fewer concurrent attacks.

spillway protocol identifier (PI)
This field denotes the format of the spillway protocol message following this header. It needs to be at least two bits in length to distinguish the given protocol number, but is lengthened to seven bits to provide for future expansion and consume some unused header space. This extension also allows the router to consider it concatenated with the closely related request/response identifier field.

Hello Message

A hello message serves two functions: discovering other capable routers and refreshing a capable router listing in both routers. The initial handshake message may be a broadcast message, but subsequent messages will need to be unicast due to the challenge/response nature of the handshake.

The motivation for the hello protocol is presented in Chapter IV, Section C (page [*]).

Request

The request component of the hello message is trivially small, as the handshake mechanism is responsible for negotiating common domain of control (from information in the spillway header), and the purpose of the hello message is completely assumed.

Response

The response to a hello message can be negative in the absence of a common domain of control or for whatever reason router policy may decide. The acknowledgement bit is present in the spillway header, so that information is not part of the response format.

While a positive acknowledgement needs to say nothing further, a negative acknowledgement could opt to include a list of external domain participating routers that may be helpful. However, support for that data is already present in the spillway header.

Help Message

The help message is used for defining hot traffic and otherwise altering router behavior. It is the only command-oriented spillway protocol.

The motivation for the help protocol is presented in Chapter IV, Section D (page [*]).

Request

As this is a command protocol, the majority of the protocol information is buried in the request. A request will be constrained to making either one hot traffic definition, or to making one behavioral modification to ease implementation and simplify the acknowledgement message. Given this constraint, multiple definitions and modifications will each require separate messages, greatly increasing the total number of packets when considering the handshake mechanism (at least four packets per handshake).

The request has a variable-length component containing the given request, identified in type in the fixed-length component. The fields of the message are diagrammed in Figure 6 and described as follows:


Figure 6: Help Request Format

definition identifier (DI)
This field identifies an existing hot traffic definition on the server router for the particular help transaction. When the definition is not yet made, the value is not known. The bit field should be initialized to all zeros in this case, so the zero value is reserved. A full word of field width supplies $2^{32}$ possible identifiers, a figure realistically much greater than the expected need for identifiers, but not really a waste of space when considering the effect of internal packet fragmentation on the packet format. Also, such a large bit field could allow router implementations to choose practical identifier allocation schemes that are simple to implement, such as a unary scheme where the position of one active bit decides which definition is referenced, or the requesting router's IP address could actually be used, as it would be necessary persistent information in the router anyway (routers other than the creator may not access a given rule).

request (RT)
This variable length field is defined in format by the listing of known request types. There are a few well-known request types defined for the protocol, but the size of the identifier field allows for vendors to define custom operations useful for their system and spillway algorithm implementation.

request type identifier (RTT)
A 32-bit request type identifier is necessary to indicate which of the defined request formats the rest of the packet conforms to.

Request Type Formats

This section presents candidate request types for the help protocol. Of particular utility is the definition of hot traffic, on which everything in the spillway algorithms is based. The request type identifier (RTT) field is one word in length, so each value will be considered a string of 8-bit ASCII characters [5] to aid in recognizability.
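
To make the mnemonic concrete, the sketch below packs a four-character identifier such as "hott" into its word-length value. It assumes network byte order, which the protocol text does not fix explicitly.

    import struct

    def request_type_id(mnemonic):
        """Pack a four-character request type mnemonic into a 32-bit identifier."""
        assert len(mnemonic) == 4
        return struct.unpack("!I", mnemonic.encode("ascii"))[0]

    for name in ("hott", "thrs", "blck", "nblk", "stop"):
        print(name, hex(request_type_id(name)))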

hot traffic definition (hott)
The first possible rule indicates a definition of hot traffic. A hot traffic definition requires the following fields, formatted in Figure 7:


Figure 7: Hot Traffic Definition Format

destination IP address (IP)
The word-length IP address (IPv4) identifies the destination host or network of hot traffic packets.

netmask (NM)
Although a netmask is normally a full word in length, it is not often transmitted, and it has only two sections: a left section containing all ones followed by a right section containing all zeros. It can therefore be represented here in five bits that identify which bit position contains the first zero, with the five bit binary representation for zero meaning a mask of all zeros, and the five bit binary representation of 31 indicating a mask of all ones. The ones section of the expanded mask indicates the corresponding bit positions of a packet's destination IP address that must match those of the hot traffic definition for the packet to be labeled hot.
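
A minimal sketch of expanding this five-bit code into a full word-length mask, under the convention stated above (0 expands to all zeros, 31 to all ones, and otherwise n leading one bits), is given below.

    def expand_netmask(code):
        """Expand the 5-bit netmask code into a 32-bit mask."""
        assert 0 <= code <= 31
        if code == 0:
            return 0x00000000                 # all zeros
        if code == 31:
            return 0xFFFFFFFF                 # all ones, by the stated convention
        # Otherwise the first zero is at bit position `code` (counting from the
        # left of the mask), so the mask has `code` leading one bits.
        return (0xFFFFFFFF << (32 - code)) & 0xFFFFFFFF

    print(hex(expand_netmask(24)))   # 0xffffff00, a conventional 24-bit prefix mask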

port (PT)
The two octet field from UDP or TCP headers (same header position) identifies the target application. To indicate that the field is not used, each bit of the field should be set to one.

spillway scheme (SS)
Three bits can be used to identify any of eight possible defense schemes based on this type of hot traffic definition. These schemes would basically identify how the hot traffic definition is used in defense. Two are identified at this time:

basic (0)
The standard threshold/blocking scheme tested in the testbed (see Chapter III, Section B).

rate limiting (1)
Rate limiting provides a scheme in which traffic is never allowed above threshold, without ever blocking for a duration as is normal in the spillway algorithm. Without blocking, it may be difficult to reduce hot spot effects that arose before the defense was configured, but rate limiting does address the bandwidth mismatch problem.

transport protocol (TP)
The octet field from the IP header identifies the higher level protocol, commonly 6 for TCP or 17 for UDP [23]. To indicate that the field is not used, each bit of the field should be set to one.

The significance of these fields is explained in Chapter IV, Section D 1 (see page [*]).

threshold rate (thrs)
This directive is followed by a one word long value in the request (RT) field. Its purpose is to modify the current spillway threshold parameter (see Chapter III). Its value is meant to hold an unsigned integer number of hot packets per second. The 32-bit field is wider than necessary by roughly an octet, but that addition to the packet length is trivial.

blocking duration (blck)
This directive is also followed by a one word long value in the request (RT) field. Its purpose is to modify the current spillway blocking duration (see Chapter III), and its value is meant to hold an unsigned integer value indicating the duration of time in milliseconds to discard hot traffic upon it surpassing the threshold rate.

avoid blocking (nblk)
This directive indicates that a router in the same hot traffic path is already blocking. Its value is meant to hold an unsigned integer value indicating the remaining duration in milliseconds of the blocking. This message could be propagated, adjusting the remaining duration at each hop, so that adjacent spillway routers can appropriately modify their behavior to avoid excessive simultaneous blocking of hot traffic. This is likely most useful for an adjacent router downstream with respect to hot traffic flow to reduce its blocking interval if it is unnecessarily large, given that an upstream router is blocking.

stop (stop)
This directive causes a router to effectively end a defense against the identified attack (identified by hot traffic definition identifier). The receiving router should also attempt to notify routers it enlisted for help as well. Statistics about the attack should be kept as long as resources or policy allow so that the attack can be analyzed when the hazardous conditions have ceased.

Many other directives are possible within the allocated space, but none are currently apparent. As more intelligent spillway algorithms are designed, new directives can be defined to support them without changing the protocol. Unsupported directives can be rejected early with a negative acknowledgement in the spillway header.

Response

The response needs to contain acknowledgement status of the request, as well as the identifier of the hot traffic definition on the serving router. The response message is identified as the response to a particular request by the session identifier in the spillway header. Because the acknowledgement status is held in the spillway header, there is only one necessary field in the response, diagrammed in Figure 8 and described as:


Figure 8: Help Response Format

definition identifier (RI)
This 32-bit field returns the identifier binding of the supplied definition or behavioral modification, so that the requesting router will later know how to identify it to the serving router.

The definition identifier (DI) field is common to both request and response, so it may be considered as yet another header.

Query Message

The query protocol is used for gathering information, particularly useful for performance statistics. The query protocol is much like the help protocol in that there is a user definable message, but the query message has that as the response rather than the request. It is necessary to have the definable response message to handle the statistical output; however, this definition must be set and agreed upon elsewhere, as format information is not available in the message itself.

The motivation for the query message is presented in Chapter IV, Section E (page [*]).

Request

The request message for this protocol is extremely simple as it contains only one field of fixed size and semantics. The format of the field in the message is diagrammed in Figure 9 and described as follows:


Figure 9: Query Request Format

response type (RST)
The response type is one word in length. It designates what values are to be returned in the response.

Response

The format of the response message is completed with the format given by the response type. For completeness, Figure 10 describes the format of the message's one field of variable length, described as follows:


Figure 10: Query Response Format

response (RS)
This field is variable in size, and depends on the chosen response type format indicated in the request message.

Response Type Formats

As a convenient mnemonic, the word-length response type format will be considered as four eight-bit ASCII characters [5]. The response type format could just as easily use another representation, as long as the resulting codes are standardized for implementation.

hotp
This response type represents a statistic useful for polling. The return value is one word in length and represents an unsigned integer, containing the number of hot packets through the router since the last polling. Packets that are dropped should be counted.

levl
The defense of a particular resource generates a control tree of spillway routers around that resource. This response type is one word in length, and represents the unsigned integer number of levels of participating routers rooted at the subtree of the responding router. This can be implemented as a recursive call. This measure is most useful in determining the paths of greatest flow of hot traffic.

thrs
This response type corresponds with the request type of the same name, and returns a field one word in length identifying the number of hot packets per second that are allowed through under the indicated hot traffic definition.

blck
This response type corresponds with the request type of the same name, and returns a field one word in length indicating the number of milliseconds that the router will block hot traffic packets under the specified hot traffic definition.

rout
This response type returns a list of routers that aided the responding router in a defense. This is most useful when the attack has ceased, and a single machine needs to consolidate information about the attack for analysis. In this scheme, it can query the helping routers directly so that recursive messages are not needed. The format of the response type is one initial word to indicate the length of the list (only the last octet should be necessary), and then the variable length list of supporting routers.

Other response types can be later defined to support more complex spillway algorithms or more detailed reconstructions of the events in the attack, and unsupported response types can always be refused early with a negative acknowledgement in the spillway header.

Retransmission Schemes

Reliability is the remaining algorithmic ambiguity to address in the spillway protocols. Reliability must be enforced for all packets sent, but UDP is an unreliable transport layer that has no standard reliability scheme. In this section we present the common techniques used for implementing retry schemes for reliability, and offer suggestions for reliability schemes for use in spillway routers. Of course, it does not matter what scheme a router uses, so long as reliability is enforced.

TCP, a reliable transport layer, has complex mechanisms for enforcing reliability that are in widespread use (see Chapter 21 of [33]). The basic relevant concepts from its scheme are the retransmission timer, round-trip time (RTT) measurement, and the backoff algorithm. There are other concepts used in TCP that are more relevant for long-term connections, and they are not presented here. The retransmission timer is set upon sending each packet that needs an acknowledgement. If a timeout occurs, meaning that the set time has elapsed before the acknowledgement is received, then the packet is retransmitted. The retransmission timer is set based on the expected time to receive the acknowledgement, the RTT. The RTT can be measured for each request/response transaction and used to update the appropriate record of that host. If a timeout occurs, the packet must be resent. It is usually not appropriate to immediately resend a packet when a timeout occurs, so a backoff algorithm is used to decide how long to wait. TCP typically uses an exponential backoff algorithm in which the time to wait for retransmission starts at one second and is doubled for each retry.

Spillway protocols need to operate under hazardous network conditions, with the network assumed to be saturated with traffic. Some spillway routers will be physically adjacent, while others will be separated by multiple network stages, so the expected RTT will vary. Also, help messages are much more likely to be transmitted under hazardous conditions than hello messages, so it does not make sense to expect the same performance.

It is recommended that hello transactions use the same smoothed RTT and exponential backoff of TCP. The client of a hello transaction can sample the RTT twice, while the server can do so only once (because of the handshake phases presented in Chapter V, Section C 2 on page [*]). This scheme should provide good performance for the hello process, and for the reconstruction process in some query messages.

However, the help messages and polling query messages must be made reliable under saturated network conditions. It can be safely assumed that there are few network stage hops between two spillway routers in these conditions, because a defense that is too distributed is not likely to be useful. Therefore, it is recommended that help and polling query messages be sent with a fixed timeout and backoff interval proportional to the RTT measured in low network load conditions. This will increase the probability of successful transmission at the expense of more traffic. The constant factors multiplying the RTT use in generating backoff and timeout durations should be determined by experimentation.
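
A sketch of the recommended fixed timeout and backoff, built on a UDP socket, appears below. The baseline RTT and the multiplying factors are placeholders to be set by experimentation, as noted above.

    import socket
    import time

    BASELINE_RTT = 0.05          # seconds, measured under low load (placeholder value)
    TIMEOUT_FACTOR = 4           # placeholder multipliers, to be tuned experimentally
    BACKOFF_FACTOR = 2
    MAX_RETRIES = 5

    def send_with_retries(payload, addr):
        """Send a datagram and wait for a reply, retrying with a fixed timeout and backoff."""
        sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
        sock.settimeout(TIMEOUT_FACTOR * BASELINE_RTT)
        try:
            for _ in range(MAX_RETRIES):
                sock.sendto(payload, addr)
                try:
                    reply, _ = sock.recvfrom(2048)
                    return reply                                # acknowledgement received
                except socket.timeout:
                    time.sleep(BACKOFF_FACTOR * BASELINE_RTT)   # fixed backoff, then retry
            return None                                         # give up after the retry limit
        finally:
            sock.close()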


Summary

The spillway protocol suite is proposed as a realistic means for the implementation of spillway algorithms on routers. Due to security and reliability concerns, the protocols are complex to understand and more complex to implement, despite their minimalist design goals. Unfortunately, the proposed spillway protocols are simply a first draft, and likely require rigorous testing and refinement so that any possible weakness not considered can be found and corrected. Substantial development of possible spillway algorithms to intelligently and efficiently tailor a defense to an attack is also necessary, but this first attempt at creating useful spillway protocols addresses some of the complexities of a spillway router implementation.

As primitive as they are, the spillway protocols manage to present a new angle on quality of service. The traditional approaches to load balancing and congestion control can be augmented with a new form of filtering to reduce the excess and sometimes crippling noise that some malicious Internet abusers take pleasure in generating. This new approach to quality of service addresses a major problem of Internet sites. Due to the growing commercial nature of the Internet, the acceptance of spillway protocols likely depends on whether their implementation provides a marketable piece of hardware that is critical for maintaining commerce. If a faithful spillway router implementation can be shown to be fast and otherwise painless, the demand would be expected to be high.

It is hoped that this optimistic outlook will drive the continued development of spillway algorithms and protocols.

Bibliography

1
A. Barkley, J. Liu, Q. LeGia, M. Dingfield, and Y. Gokhale, “A Testbed for Study of Distributed Denial of Service Attacks (WA 2.4),” in Proc. IEEE Systems, Man., and Cybernetics Information Assurance and Security Workshop '00, West Point, New York, June 2000.

2
S. Bellovin, “ICMP Traceback Messages,” Internet Engineering Task Force, http://search.ietf.org/internet-drafts/draft-bellovin-itrace-00.txt, July 2000. [Accessed Sept. 2000]

3
T. Berners-Lee, R. Fielding, and H. Frystyk, “Hypertext Transfer Protocol: HTTP/1.0,” Internet Engineering Task Force, http://www.ietf.org/rfc/rfc1945.txt, May 1996. [Accessed Aug. 2000]

4
R. Braden, “Requirements for Internet Hosts: Communication Layers,” Internet Engineering Task Force, http://www.ietf.org/rfc/rfc1122.txt, Oct. 1989. [Accessed Oct. 2000]

5
V. Cerf, “ASCII Format for Network Interchange,” University of California at Los Angeles, http://www.faqs.org/rfcs/rfc20.html, Oct. 1969. [Accessed Sept. 2000]

6
B. Croft and J. Gilmore, “Bootstrap Protocol (BOOTP),” Internet Engineering Task Force, http://www.ietf.org/rfc/rfc0951.txt, Sept. 1985. [Accessed Sept. 2000]

7
S. Deering and R. Hinden, “Internet Protocol, Version 6 (IPv6),” Internet Engineering Task Force, http://www.ietf.org/rfc/rfc2460.txt, Dec. 1998. [Accessed Oct. 2000]

8
W. Diffie and M. Hellman, “New Directions in Cryptography,” IEEE Trans. on Information Theory, vol. IT-22, pp. 644-654, 1976.

9
H. Dobbertin, A. Bosselaers, and B. Preneel, “RIPEMD-160: A Strengthened Version of RIPEMD,” in Proc. Fast Software Encryption: Third International Workshop, pp. 71-82, Cambridge, United Kingdom, Feb. 1996.

10
A. Dutta-Roy, “The Cost of Quality in Internet-style Networks,” IEEE Spectrum, vol. 37, no. 9, pp. 57-62, Sept. 2000.

11
T. ElGamal, “A Public-Key Cryptosystem and a Signature Scheme Based on Discrete Logarithms,” IEEE Trans. on Information Theory, vol. IT-31, pp. 469-472, 1985.

12
L. Garber, “Denial-of-Service Attacks Rip the Internet,” IEEE Computer, http://www.computer.org/computer/articles/April/TechNews_400.htm, Apr. 2000. [Accessed Sept. 2000]

13
D. Howe, “Checksum,” The Free Online Dictionary of Computing, http://burks.bton.ac.uk/burks/foldoc/81/18.htm, Mar. 1996. [Accessed Oct. 2000]

14
D. Howe, “Cyclic Redundancy Check,” The Free Online Dictionary of Computing, http://burks.bton.ac.uk/burks/foldoc/35/27.htm, Aug. 1997. [Accessed Oct. 2000]

15
C. Huitema, J. Postel, and S. Crocker, “Not All RFCs are Standards,” Internet Engineering Task Force, http://www.ietf.org/rfc/rfc1796.txt, Apr. 1995. [Accessed Oct. 2000]

16
C. Huitema, Routing in the Internet, Upper Saddle River, New Jersey: Prentice Hall, 2000.

17
B. Kaliski and J. Staddon, “PKCS #1: RSA Cryptography Specifications, Version 2.0,” ftp://ftp.rsasecurity.com/pub/pkcs/ascii/pkcs-1v2.asc, Sept. 1998. [Accessed Sept. 2000]

[18] J. Kohl, “The Kerberos Network Authentication Service (V5),” Internet Engineering Task Force, http://www.ietf.org/rfc/rfc1510.txt, Sept. 1993. [Accessed Sept. 2000]

[19] J. Liu, K. Shin, and C. Chang, “Prevention of Congestion in Packet-Switched Multistage Interconnection Networks,” IEEE Trans. on Parallel & Dist. Systems, vol. 6, no. 5, pp. 531-541, May 1995.

[20] C. Mann, “Net Versus Norm: The Slashdot Effect,” Forbes, http://www.forbes.com/asap/00/0221/043.htm, Sept. 2000. [Accessed Sept. 2000]

[21] C. Meadows, “A Formal Framework and Evaluation Method for Network Denial of Service,” in Proc. IEEE Computer Security Foundations Workshop '99, Mordano, Italy, June 1999.

[22] J. Nagle, “Congestion Control in IP/TCP Internetworks,” Ford Aerospace and Communications Corporation, http://www.ietf.org/rfc/rfc0896.txt, Jan. 1984. [Accessed Sept. 2000]

[23] J. Postel, “Assigned Numbers,” Internet Engineering Task Force, http://www.ietf.org/rfc/rfc0790.txt, Sept. 1981. [Accessed Sept. 2000]

[24] J. Postel, “Internet Control Message Protocol,” Internet Engineering Task Force, http://www.ietf.org/rfc/rfc0792.txt, Sept. 1981. [Accessed Sept. 2000]

[25] J. Postel, “Internet Protocol,” Information Sciences Institute, University of Southern California, http://www.ietf.org/rfc/rfc0791.txt, Sept. 1981. [Accessed Sept. 2000]

[26] J. Postel, “Transmission Control Protocol,” Information Sciences Institute, University of Southern California, http://www.ietf.org/rfc/rfc0793.txt, Sept. 1981. [Accessed Sept. 2000]

[27] J. Postel, “User Datagram Protocol,” Information Sciences Institute, University of Southern California, http://www.ietf.org/rfc/rfc0768.txt, Aug. 1980. [Accessed Sept. 2000]

[28] R. Rivest, “The MD5 Message-Digest Algorithm,” Internet Engineering Task Force, http://www.ietf.org/rfc/rfc1321.txt, Apr. 1992. [Accessed Sept. 2000]

[29] D. Russell and G. Gangemi, Computer Security Basics, Cambridge, Massachusetts: O'Reilly & Associates, 1991.

[30] S. Savage, D. Wetherall, A. Karlin, and T. Anderson, “Practical Network Support for IP Traceback,” http://www.cs.washington.edu/homes/savage/traceback.html, Aug. 2000. [Accessed Sept. 2000]

[31] B. Schneier, Applied Cryptography, Second Edition, New York: Wiley, 1996.

[32] C. Schuba, I. Krsul, M. Kuhn, E. Spafford, A. Sundaram, and D. Zamboni, “Analysis of a Denial of Service Attack on TCP,” in Proc. IEEE Symposium on Security and Privacy '97, Oakland, California, May 1997.

[33] W. Stevens, TCP/IP Illustrated, Volume 1: The Protocols, Reading, Massachusetts: Addison Wesley Longman, 1994.

[34] A. Tanenbaum, Computer Networks, Third Edition, Upper Saddle River, New Jersey: Prentice Hall, 1996.

[35] I. Witten, Managing Gigabytes, San Diego: Academic Press, 1999.

[36] “CERT® Advisory CA-2000-01: Denial-of-Service Developments,” Carnegie Mellon Software Engineering Institute, http://www.cert.org/advisories/CA-2000-01.html, Jan. 2000. [Accessed Sept. 2000]

[37] “CERT® Security Improvement Modules,” Carnegie Mellon Software Engineering Institute, http://www.cert.org/security-improvement/, July 2000. [Accessed Sept. 2000]

[38] “Data Encryption Standard (DES),” National Institute of Standards and Technology (United States of America), http://csrc.nist.gov/fips/fips46-3.pdf, Oct. 1999. [Accessed Sept. 2000]

[39] “Digital Signature Standard (DSS),” National Institute of Standards and Technology (United States of America), http://csrc.nist.gov/fips/fips186-2.pdf, Jan. 2000. [Accessed Sept. 2000]

[40] “IEEE Standards for Local Area Networks: Carrier Sense Multiple Access with Collision Detection (CSMA/CD) Access Method and Physical Layer Specifications,” The Institute of Electrical and Electronics Engineers, New York, New York, 1985.

[41] “ISS Security Alert: Denial of Service Attack Using the Trin00 and Tribe Flood Network Programs,” Internet Security Systems, http://xforce.iss.net/alerts/advise40.php, Dec. 1999. [Accessed Sept. 2000]

[42] “Request for Comments (RFCs): Overview,” Internet Society (ISOC), http://www.rfc-editor.org/overview.html, Jan. 2000. [Accessed Oct. 2000]

[43] “Secure Hash Standard,” National Institute of Standards and Technology (United States of America), http://www.itl.nist.gov/fipspubs/fip180-1.htm, Apr. 1995. [Accessed Sept. 2000]

[44] “Strategies to Protect Against Distributed Denial of Service (DDoS) Attacks,” Cisco Newsflash, http://www.cisco.com/warp/public/707/newsflash.html, Feb. 2000. [Accessed Sept. 2000]

[45] “The Apache HTTP Server Project,” Apache Software Foundation, http://www.apache.org/httpd.html, Sept. 2000. [Accessed Sept. 2000]