This book addresses the problem of video object detection. The object is detected by integrating spatial segmentation with temporal segmentation. The spatial segmentation of frames is formulated in a spatio-temporal framework. A compound Markov random field (MRF) model is proposed to model the video sequence. This model accounts for both the spatial and the temporal distributions of pixels. Besides taking into account the pixel distributions in the temporal direction, it also models the edges in the temporal direction, and is therefore named the edge-based model. The maximum a posteriori (MAP) estimates of the labels are obtained by a hybrid algorithm devised by integrating globally and locally convergent criteria. The temporal segmentation is obtained by a proposed entropy-based window-growing scheme. The spatial and temporal segmentations are then integrated to obtain the Video Object Plane (VOP) and hence to detect the object.
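The final integration step can be illustrated with a minimal sketch. It is not the book's algorithm, only one common way to fuse the two results, under stated assumptions: the spatial segmentation is taken to be a per-pixel region label map, the temporal segmentation a binary change mask, and a region is kept as foreground when a sufficient fraction of its pixels lie in the mask. The function name `extract_vop` and the parameter `min_overlap` are hypothetical and chosen only for this example.

```python
from collections import defaultdict


def extract_vop(frame, region_labels, change_mask, min_overlap=0.5):
    """Fuse a spatial label map with a temporal change mask (illustrative only).

    frame, region_labels and change_mask are equally sized 2-D lists.
    A region is treated as part of the moving object when at least
    `min_overlap` of its pixels are flagged in `change_mask` (an assumed rule).
    Returns the masked frame (background set to 0) and the foreground labels.
    """
    total = defaultdict(int)    # pixels per spatial region
    changed = defaultdict(int)  # pixels per region flagged as changed

    for label_row, mask_row in zip(region_labels, change_mask):
        for label, moved in zip(label_row, mask_row):
            total[label] += 1
            if moved:
                changed[label] += 1

    # Regions whose changed fraction reaches the threshold are foreground.
    foreground = {lab for lab in total if changed[lab] / total[lab] >= min_overlap}

    # Keep original intensities inside foreground regions, zero elsewhere.
    vop = [
        [pix if lab in foreground else 0 for pix, lab in zip(frame_row, label_row)]
        for frame_row, label_row in zip(frame, region_labels)
    ]
    return vop, foreground


if __name__ == "__main__":
    frame = [[10, 20, 30], [40, 50, 60]]
    region_labels = [[0, 0, 1], [0, 1, 1]]
    change_mask = [[0, 0, 1], [0, 1, 0]]
    vop, foreground = extract_vop(frame, region_labels, change_mask)
    print(vop, foreground)
```

Voting per region rather than per pixel lets the spatial segmentation restore object pixels that the change mask missed, which is the usual motivation for combining the two segmentations.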