This work presents latency optimizations for a specific hardware architecture that combines several design paradigms and therefore requires sophisticated design optimizations. The architecture uses synchronous, systematic bit-serial processing without a central controlling instance. It was patented in 2004 and targets future high-speed applications because it avoids long wires. The basic version of the architecture is application-specific; so-called routers overcome this limitation and yield a reconfigurable system. This work focuses on latency optimization, including the data synchronization problems that arise when implementing the architecture. We propose and evaluate several implementation variants. For an evaluated IDCT implementation, latency was reduced from 167 to 67 clock cycles and throughput was improved by about 17%, while, as a side effect, area consumption was also reduced.