This book is a first effort to take a step forward in understanding the mechanisms of speech recognition in humans. The ideas presented here trace back to early scientific theories of language, which were followed over time by a series of psychoacoustic experiments, models, and attempts at technical realization. A hypothesis, here called 'multi-granular', is assumed: to capture all the information carried by the signal, the human auditory system requires several parallel cognitive functions that chunk the information as it unfolds over time. The left-to-right speech stream is captured in a multilevel grid in which several linguistic analyses take place simultaneously. Here, I present an example implementation of a multi-granular automatic speech recognizer. Dynamics extracted from the signal, whether segmental or suprasegmental in nature, are combined in a single model that seeks to exploit the best of each in order to improve system performance.