A wide range of applications in the Machine Learning and Data Mining (MLDM) area increasingly rely on distributed environments to solve certain problems. This naturally creates an urgent need to ensure the reliability of large-scale graph processing systems, because in such settings machine failures are no longer uncommon incidents. Rollback recovery in distributed systems has been studied in various forms by many researchers and engineers. The research community has proposed plenty of algorithms, but few of them have actually been applied in real systems. In this book, we propose two failure recovery mechanisms designed specifically for large-scale graph processing systems. To facilitate the recovery process without adding much overhead to normal execution, our mechanisms are based on an in-depth investigation of the characteristics of large-scale graph processing systems and their applications.