Networked 3D virtual environments allow multiple users to interact with each other over the Internet. Users can share a sense of telepresence by remotely animating an avatar that represents them. However, controlling an avatar can be tedious and may still render the user’s gestures poorly. This work aims to animate a user’s avatar from real-time 3D motion capture by monocular computer vision, thus making virtual telepresence available to anyone using a personal computer with a webcam. The approach consists of registering a 3D articulated upper-body model to a video sequence. The first contribution of this work is a method of allocating computing iterations under a real-time constraint that achieves optimal robustness and accuracy. The major issue for robust 3D tracking from monocular images is the 3D/2D ambiguity that results from the lack of depth information. As a second contribution, this work enhances particle filtering for 3D/2D registration under limited computation constraints with a number of heuristics, whose contribution is demonstrated experimentally. A parameterization of the arm pose based on the arms’ end-effectors is proposed to better model uncertainty in the depth direction.
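
To make the depth ambiguity concrete, the following is a minimal sketch of a generic bootstrap particle filter with a fixed particle budget. It is not the method proposed in this work. The toy state (a single point with lateral offset `x` and depth `z`), the pinhole `project` function, the noise levels, and all function names are assumptions chosen for illustration only.

```python
import math
import random


def project(x, z, focal=1.0):
    """Pinhole projection of a point at lateral offset x and depth z (z > 0)."""
    return focal * x / z


def likelihood(observed, predicted, sigma):
    """Unnormalized Gaussian likelihood of an image observation."""
    return math.exp(-0.5 * ((observed - predicted) / sigma) ** 2)


def particle_filter_step(particles, observation, rng,
                         sigma_x=0.02, sigma_z=0.10, sigma_obs=0.01):
    """One predict / weight / resample cycle; the particle count is unchanged."""
    # Predict: diffuse each hypothesis. The larger depth noise (an assumption)
    # reflects that depth is the poorly observed direction.
    predicted = [(x + rng.gauss(0.0, sigma_x),
                  max(0.1, z + rng.gauss(0.0, sigma_z)))
                 for x, z in particles]

    # Weight: compare each hypothesis' projection with the 2D observation.
    weights = [likelihood(observation, project(x, z), sigma_obs)
               for x, z in predicted]
    if sum(weights) == 0.0:
        # No hypothesis explains the observation: fall back to uniform weights.
        weights = [1.0] * len(predicted)

    # Resample: draw the same number of particles in proportion to the weights.
    return rng.choices(predicted, weights=weights, k=len(predicted))


def mean_projection(particles):
    """Average image position of the particle set."""
    return sum(project(x, z) for x, z in particles) / len(particles)


if __name__ == "__main__":
    rng = random.Random(0)
    particles = [(rng.uniform(-0.5, 0.5), rng.uniform(0.5, 2.0))
                 for _ in range(500)]
    observation = project(0.2, 1.0)  # hypothetical true point at x=0.2, z=1.0
    for _ in range(10):
        particles = particle_filter_step(particles, observation, rng)
    depths = [z for _, z in particles]
    print("mean projection:", mean_projection(particles))
    print("depth range:", min(depths), max(depths))
```

In this toy setting the observation constrains only the ratio `x / z`, so the particle set can agree with the image while still spanning a range of depths. This is the kind of uncertainty in the depth direction that a suitable parameterization would need to represent.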