A vision-based 3D scene analysis system is described that is capable of automatically modeling complex real-world scenes, such as old buildings, bridges, and vestiges, from a sequence of calibrated images. The input to the system is a sequence of calibrated stereoscopic images, which can be taken with a hand-held camera. The camera is moved throughout the scene, and a long sequence of closely spaced views is recorded. A multi-view algorithm links corresponding points along the image sequence. A 3D model is reconstructed by triangulation directly from the image sequence, and a Kalman filter fuses the 3D surface measurements from different viewpoints into a consistent 3D scene model. The surface geometry of each scene object is approximated by a triangular surface mesh, and the surface texture is stored in a texture map. From the textured 3D models, realistic-looking image sequences can be rendered from arbitrary viewpoints for use in many applications. We demonstrate the successful application of the approach to several outdoor image sequences of famous Egyptian vestiges, within a framework that aims to electronically document Egypt's cultural and natural heritage.
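
As an illustration of the fusion step, the following minimal sketch shows how a Kalman filter might combine depth measurements of one static surface point taken from several viewpoints. It is not the system's implementation: the scalar depth state, the absence of process noise, and the names `DepthEstimate`, `kalman_update`, and `fuse_depths` are assumptions made for this example only.

```python
from dataclasses import dataclass


@dataclass
class DepthEstimate:
    """Hypothetical state for one surface point: depth and its variance."""
    depth: float
    variance: float


def kalman_update(est, measured_depth, measurement_variance):
    """One Kalman measurement update for a static scalar state.

    Assumes the surface point does not move between views, so there is
    no prediction step and no process noise.
    """
    gain = est.variance / (est.variance + measurement_variance)
    depth = est.depth + gain * (measured_depth - est.depth)
    variance = (1.0 - gain) * est.variance
    return DepthEstimate(depth, variance)


def fuse_depths(measurements):
    """Fuse (depth, variance) pairs from successive viewpoints.

    The first measurement initialises the state; each later one
    refines it and shrinks the variance.
    """
    (first_depth, first_variance), *rest = measurements
    est = DepthEstimate(first_depth, first_variance)
    for measured_depth, measurement_variance in rest:
        est = kalman_update(est, measured_depth, measurement_variance)
    return est
```

Under these assumptions the update reduces to an inverse-variance weighted average, so each additional view lowers the uncertainty of the fused depth, and a precise measurement pulls the estimate more strongly than a noisy one.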