The increased use of video documents in multimedia-based applications has created a demand for strong video database support, including efficient methods for browsing, retrieving, and classifying video data. Most solutions rely on visual information only, ignoring the rich information carried by the accompanying audio signal and text. The semantic gap between the human and computational worlds has not yet been bridged by current algorithms and systems, so considerable research effort is still needed. Speech carries significant information that is closely connected to video content, and closed caption text facilitates acquisition of the video transcript: it provides visual text that describes dialogue and sound effects in video documents. In this study, two approaches are proposed, one for rating or classifying video scenes and the other for retrieving video scenes. Both approaches are based on the Arabic closed caption text, and techniques such as Arabic light stemming and the Rocchio classifier are employed.
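To make the classification idea concrete, the following is a minimal sketch of a Rocchio (nearest-centroid) classifier over caption text, not the authors' actual implementation. The tokenizer, class names, and toy caption data are all hypothetical; a real pipeline would apply Arabic light stemming in place of the naive tokenizer shown here.

```python
import math
from collections import Counter, defaultdict

def tokenize(text):
    # Naive whitespace tokenizer; a real system would apply Arabic
    # light stemming (stripping common prefixes/suffixes) here.
    return text.lower().split()

def tf_vector(text):
    # Term-frequency vector as a sparse dict (term -> count).
    return Counter(tokenize(text))

def cosine(a, b):
    dot = sum(a[t] * b[t] for t in a if t in b)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

class RocchioClassifier:
    """Nearest-centroid (Rocchio) classifier over term-frequency vectors."""

    def fit(self, docs, labels):
        sums, counts = defaultdict(Counter), Counter()
        for doc, label in zip(docs, labels):
            sums[label].update(tf_vector(doc))
            counts[label] += 1
        # Each class centroid is the average term-frequency vector
        # of the training captions labeled with that class.
        self.centroids = {
            label: {t: v / counts[label] for t, v in vec.items()}
            for label, vec in sums.items()
        }
        return self

    def predict(self, doc):
        vec = tf_vector(doc)
        # Assign the class whose centroid is most similar (cosine).
        return max(self.centroids,
                   key=lambda c: cosine(vec, self.centroids[c]))

# Toy captions standing in for closed-caption transcripts (hypothetical data).
train_docs = ["goal scored by the striker",
              "the referee shows a card",
              "breaking news report tonight",
              "the anchor reads the news"]
train_labels = ["sports", "sports", "news", "news"]

clf = RocchioClassifier().fit(train_docs, train_labels)
print(clf.predict("the striker scored a late goal"))  # → sports
```

The same centroid representation can serve scene retrieval: ranking scenes by the cosine similarity between a query vector and each scene's caption vector, rather than comparing against class centroids.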