Deployable speech translation (ST) systems typically need to be trained on (1) hundreds of hours of manually transcribed speech audio; (2) bilingual text corpora of manual translations, often comprising tens of millions of words; and (3) monolingual text corpora, often comprising hundreds of millions of words. ST system development is therefore very costly and requires months or even years of effort. Such a delay is unacceptable in the many situations that call for rapid development of automatic ST solutions, such as disaster relief or military operations. In these situations, urgency combined with the absence of automatic ST solutions necessitates the deployment of interpreters. In this work, we develop methods to train ST systems directly on audio recordings of interpreter-mediated communication. By employing unsupervised and lightly supervised training techniques, the proposed methods allow us to omit most of the manual transcription effort and all of the manual translation effort that has typically characterized ST system development. Thus, the amount of costly and time-consuming human supervision is substantially reduced.