Continuous advances in silicon technology have enabled a new generation of chip multiprocessors, called manycores. Current designs of these systems comprise many simple, energy-efficient processor cores that together deliver unprecedented computational power and energy efficiency. Nonetheless, this book identifies three fundamental performance bottlenecks that prevent future manycores from scaling to larger core counts: highly contended synchronization in barriers, highly contended synchronization in locks, and the overhead caused by the coherence activity of hardware coherence protocols. To overcome these bottlenecks, three distinct and complementary hardware-based solutions, each simple and power-efficient, have been proposed: GBarrier, GLock, and ECONO, respectively. A comprehensive evaluation of these new architectures, using a current industrial tool flow, full-custom state-of-the-art technology, full-system simulation, and a representative set of current benchmarks, reveals that integrating these three hardware solutions constitutes a step forward in both power-performance efficiency and scalability for future manycore systems.