Synthesis methods that do not require symbolic input, such as articulatory synthesis, are useful for continuous speech synthesis and for pitch control driven by dynamic body motion, where no inherent symbols exist. Conventional applications based on these methods, however, are strongly dependent on their input media, because each application is designed around the specific characteristics of its input medium. Once an application has been built for one medium, therefore, its methodology is difficult to transfer to another. In light of this, we treat speech generation from body motion as a mapping problem between different media, from a non-acoustic medium to speech, and propose a media-independent methodology. As one example of this methodology, we discuss media conversion from hand motion to speech. In recent years, GMM-based statistical mapping techniques have become widely used for voice conversion. Using similar techniques, we have developed a speech generation system that maps gesture space to vowel space and converts hand motions into vowel transitions. In this study, we focus on the design of this system.
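The abstract describes mapping gesture space to vowel space with the joint-density GMM regression commonly used in voice conversion. The sketch below illustrates that general technique only; it is not the authors' implementation. All names, the toy gesture-to-vowel function, the feature dimensions, and the component count are illustrative assumptions: a joint GMM is fitted on concatenated (gesture, vowel) feature vectors, and new gestures are mapped through the conditional mean E[y | x].

```python
import numpy as np
from sklearn.mixture import GaussianMixture

# Hypothetical paired training data: hand-motion features x (e.g. 2-D
# joint angles) and vowel-space features y (e.g. two formant values).
rng = np.random.default_rng(0)
x = rng.uniform(-1.0, 1.0, size=(500, 2))              # gesture space
y = np.tanh(x) + 0.05 * rng.standard_normal((500, 2))  # toy gesture-to-vowel map

# Fit a joint GMM on concatenated (x, y) vectors, as in joint-density
# GMM-based voice-conversion mapping.
joint = np.hstack([x, y])
gmm = GaussianMixture(n_components=8, covariance_type="full",
                      random_state=0).fit(joint)

def map_gesture_to_vowel(x_new, gmm, dx=2):
    """Conditional mean E[y | x] under the joint GMM (minimum-MSE mapping)."""
    x_new = np.atleast_2d(x_new)
    mu_x, mu_y = gmm.means_[:, :dx], gmm.means_[:, dx:]
    S_xx = gmm.covariances_[:, :dx, :dx]
    S_yx = gmm.covariances_[:, dx:, :dx]
    out = np.zeros((x_new.shape[0], mu_y.shape[1]))
    for i, xi in enumerate(x_new):
        diff = xi - mu_x                       # (components, dx)
        # Log responsibilities p(m | x); the constant term cancels below.
        log_p = np.array([
            -0.5 * d @ np.linalg.solve(S, d) - 0.5 * np.linalg.slogdet(S)[1]
            for d, S in zip(diff, S_xx)
        ]) + np.log(gmm.weights_)
        p = np.exp(log_p - log_p.max())
        p /= p.sum()
        # Per-component conditional means, weighted by responsibilities.
        cond = mu_y + np.einsum("mij,mj->mi",
                                S_yx @ np.linalg.inv(S_xx), diff)
        out[i] = p @ cond
    return out
```

In this formulation a single model handles any continuous input medium: only the front-end features concatenated into the joint vector change, which is what makes the mapping methodology media-independent.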