Computer vision: Beyond the images


Posted on: Apr 25, 2012 by Francisco José Ruiz Jiménez

What distinguishes one object, person, or animal from another? What is the essence that gives each its nature, and how are we able to perceive it? These questions have been asked for centuries in philosophy, and perhaps the story that best illustrates them took place in ancient Greece. In Plato's dialectic and dialogues, it is argued that any species can be identified by its genus and specific difference, so man was described as a "featherless biped animal". Plato explained this definition to Diogenes the Cynic, who refuted it by plucking a chicken and dropping it into Plato's Academy while saying, "Behold! I've brought you a man."

Obviously, Plato's definition was neither formally nor ontologically right: it did not define the nature of man, so it did not capture his essence as a human being. In image recognition technology, the ability to identify objects is likewise a key point and a really complex issue. One example illustrates how difficult it is to give artificial vision systems this capacity. Humans can identify an eagle in many different situations: on the ground, flying, or with its wings spread, and we can distinguish it from a hawk, a vulture or an owl. This is still an unresolved problem in computer vision and image recognition technology, and one that becomes more important day by day.

Image recognition has developed very significantly since the 1980s. Its application has been crucial in industrial processes: recognition of numerical data and bar codes, and recognition of shapes and patterns in artificial vision. It has been equally important in digitization and optical character recognition (OCR), which more recently has led to automatic translation. Word Lens for iPhone is a perfect example of what can be achieved by combining this with augmented reality.

An interesting area of image processing is facial recognition. Programs like Picasa, cameras, and social networks like Facebook have developed this feature: they can identify people and classify photos by learning patterns from people's facial features. Developments of this kind are also important in video surveillance at airports and in security systems, and they have also been used to unlock mobile devices, as Android does in its recent Ice Cream Sandwich (ICS) version.

Computer vision also has applications in the home and in leisure, the best example being Microsoft Kinect. The same is true in the automotive field, where more and more systems are based on computer vision principles: lane detection, traffic sign recognition, parking assistance and, more recently, the autonomous driving systems developed by Google and traditional manufacturers, which have recently been allowed in US states such as Nevada.

Image recognition technology is a promising area for the coming years: Kinect 2 is expected to add new features such as facial recognition and mood analysis, and there are many other areas of application, such as robotics and surveillance systems. However, the most complex problem remains: identifying the nature of objects, that is, the content inherent in the images. Recognizing it would make it possible to search and classify content beyond its associated metadata: knowing whether it is adult content or contains violent scenes, or classifying images by theme, such as sports, documentary or action. Google has been working in this area for a long time, and its most promising development so far has been Google Goggles, which is nothing more than a pattern-based image recognition system, far from recognizing the nature of forms and the essence that allows us to identify objects and images. However, the recent announcement of Project Glass is really promising: it combines the most advanced augmented reality techniques with other technologies such as context-aware computing and image recognition, and it will surely integrate them fully. Could it be the beginning of full image recognition, one that brings the real nature of images into image search? Only time will tell.

About Francisco José Ruiz Jiménez

Technical Architect Manager and member of the Scientific Community
Francisco José Ruiz Jiménez is Technical Architect Manager at Atos Spain. He holds a Computer Science degree from Universidad Politécnica de Madrid. He is CTO of BPS Iberia and has been a member of the Atos Scientific Community since 2010. He began his professional career in a research project for ONCE (the Spanish organization for blind people), working on ways to improve computer use for people with visual impairments. He joined Atos in 1998 and spent 10 years mainly in the telecom market, working for the main telco clients (Orange, Vodafone, Telefónica, etc.). As Senior Architect he provided technical solutions in BSS, from provisioning to billing systems. He joined the Atos BPS Iberia organization in 2008, leading the Technical Architecture Team and providing global solutions in different markets (telco, finance and manufacturing). His expertise covers telecom, auditing, technical and enterprise architecture, and research and innovation in new technologies.
