The past decade will be remembered as one of maturity for Artificial Intelligence (AI). Successful applications such as Google Goggles, Siri and IBM Watson have had a positive impact on people’s everyday lives. These systems can interpret highly complex natural signals in real time, in the form of text, audio or video data: a task that, before the 2000s, was thought to be the exclusive domain of human intelligence. This book discusses methods of computer vision, a branch of AI in which the input to the system consists of images and videos depicting visual scenes. Most computer vision tasks aim to recognize visual concepts in the input data, such as the presence of a particular object or the occurrence of a specific event. These systems learn visual concepts from examples (i.e. images) that have been manually annotated by humans. While this paradigm has allowed the field to progress tremendously over the last decade, it has now become one of its major bottlenecks. This work taps into the wealth of visual data available on the Web and presents methods that exploit this information to learn visual concepts without requiring a major human annotation effort.