With hundreds of cores per processor available in current and near-future technology, mapping application software efficiently onto hardware platforms is becoming extremely difficult. Software must be tuned manually to achieve good performance, a non-trivial task that burdens the programmer. To avoid this problem, high-level programming models such as OpenMP and MPI are used. They abstract the hardware details and provide a clean interface to the programmer. However, mapping the primitives defined by these models efficiently onto the hardware is itself a challenging task. We use the hardware communication and synchronization primitives provided by the SARC multi-processor architecture to accelerate these primitives. We evaluate our approach on various OpenMP benchmarks and observe large performance gains compared to state-of-the-art processors.