Clusters of PCs are currently considered a cost-effective alternative to large parallel computers. In these systems, thousands of components are connected through high-performance interconnection networks. Among the high-performance network technologies available for building clusters, InfiniBand (IBA) has emerged as a new standard interconnect. Indeed, it has been adopted by many of the most powerful systems currently built (see the TOP500 list). As the number of nodes in these systems increases, the interconnection network grows accordingly. With more components, the probability of faults increases dramatically, and thus fault tolerance in the system in general, and in the interconnection network in particular, becomes a necessity. Unfortunately, most of the fault-tolerant routing strategies proposed for massively parallel computers cannot be applied, because routing and virtual channel transitions are deterministic in IBA, which prevents packets from avoiding faults. This book focuses on methodologies for providing adequate levels of fault tolerance to PC clusters, specifically tailored to IBA networks.