In the past 20 years, much research has been devoted to visual information retrieval from pictures and video files. Not all of it has been successful, but in recent years the quality of these visual search engines has reached levels that are beginning to be acceptable for eDiscovery, compliance, law enforcement and intelligence applications.
More and more electronically stored information (ESI) is not text based or does not contain any searchable text components: sound recordings, videos and pictures are growing exponentially in volume, and more and more collaborative and social network applications support only these formats. In addition, a whole generation is growing up that no longer uses written communication forms such as letters or email: they use only social networks and other new media for communication and collaboration.
Search Challenges in Visual Information Retrieval
This transformation creates a huge future search problem, because content that cannot be reduced to text can currently only be searched through workarounds:
- Electronic files containing one or more text components or embedded objects with text components can be searched using text-based queries.
- Document scans (images) and even photographs can be enriched with the text of the original document or with recognizable logos in the pictures. The same technology can also be applied to video shots.
- Audio, and the audio component of a video file, can be processed by a phonetic search engine, after which users can search the content for specific words or phoneme sequences.
- In addition, audio, picture and video files can be searched on contextual information such as the file name, added meta-information, or the text that surrounds the picture or video on a web page.
Beyond these workarounds, however, it is in general not possible to search a picture or a video on its actual visual content.
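The last workaround above, indexing contextual information, can be sketched as follows. This is a minimal, hypothetical example (the function and its fields are illustrative, not any vendor's API): it builds a searchable token list for a media file purely from its file name, folder path, surrounding text and user tags, without analyzing the visual content at all.

```python
import os
import re

def contextual_index_record(file_path, surrounding_text="", user_tags=None):
    """Build a searchable text record for a non-text media file using
    only contextual information (no visual content analysis)."""
    tokens = []
    # File names and parent folders often carry descriptive words.
    base = os.path.splitext(os.path.basename(file_path))[0]
    tokens += re.split(r"[\W_]+", base)
    tokens += re.split(r"[\W_]+", os.path.dirname(file_path))
    # Text surrounding the object (e.g. on a web page) and user tags.
    tokens += re.split(r"[\W_]+", surrounding_text)
    tokens += list(user_tags or [])
    return [t.lower() for t in tokens if t]

record = contextual_index_record(
    "/evidence/case42/helicopter_landing.mp4",
    surrounding_text="Footage of the rooftop landing",
    user_tags=["aerial"],
)
```

The obvious limitation is also visible here: if nobody named, tagged or described the file, the record is empty and the file is effectively unfindable.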
Web search engines such as Google, Bing and Yahoo primarily use contextual text information from pictures and videos to search these objects. This text can be tagged by users or found in the file name, the file location, the surrounding text on the web page, and so on. In some cases, words recognized in images and videos with Optical Character Recognition (OCR) technology are used, or nudity is recognized and filtered, but that is about it. There is little or no use of pure visual information retrieval technology, such as: give me all outdoor pictures, or all images with a helicopter in them.
Additional Challenges in Visual Information Retrieval
There are a number of additional challenges in visual information retrieval related to the various input formats of files, the internal encoding and compression (the codec, for video), the query format (query by example or query by text), the result list format (text-based or visual result navigation with thumbnails and video summaries) and the viewer for the image and video files.
State-of-the-art visual search technology should address all of these aspects and support both text-based and image- or video-example-based querying, result navigation and viewing.
Various image input formats are supported to a greater or lesser extent by the ZyLAB Platform, but for proper video support one needs one of many codec engines in order to view the video. Some implementations treat video as a "set of images" without taking the temporal relationships between frames into account. Others have a more thorough and complete internal representation, allowing for faster and more accurate viewing and navigation.
The best approach is to convert all videos and images to one common format with the same dimensions, codec and compression. Only then can extracted image features be compared properly. There are a number of open source tools to realize this; most vendors use the same open source LIBAVCODEC libraries from the FFMPEG project.
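Such a normalization step can be sketched with the FFmpeg command-line tool (which is built on LIBAVCODEC). The resolution, frame rate and codec choices below are illustrative assumptions, not a recommendation; the point is that every input is re-encoded to identical dimensions, codec and pixel format before feature extraction.

```python
import subprocess

def normalize_command(src, dst, width=640, height=360, fps=25):
    """Build an ffmpeg command that re-encodes any input video to one
    common codec, resolution and frame rate, so that image features
    extracted later are directly comparable across files."""
    return [
        "ffmpeg", "-y", "-i", src,
        # Scale to fixed dimensions, padding to preserve aspect ratio.
        "-vf", f"scale={width}:{height}:force_original_aspect_ratio=decrease,"
               f"pad={width}:{height}:(ow-iw)/2:(oh-ih)/2",
        "-r", str(fps),          # one common frame rate
        "-c:v", "libx264",       # one common codec (via libavcodec)
        "-pix_fmt", "yuv420p",   # one common pixel format
        dst,
    ]

cmd = normalize_command("input.avi", "normalized.mp4")
# subprocess.run(cmd, check=True)  # requires an ffmpeg binary on PATH
```

The actual encode is left commented out because it needs FFmpeg installed; in a production pipeline this command would run once per ingested file.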
Enormous File Sizes
Images and, in particular, videos can be of enormous file size: 20 GB for a single video file is the rule rather than the exception. As a result, processing the data often requires specialized hardware with very fast, large hard disks and special graphical processing power. Viewing such files requires smart streaming techniques to prevent bandwidth overload.
There are many open source solutions available to solve these problems and many vendors use the same open source libraries.
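The basic idea behind such streaming techniques can be sketched as follows: the file is read and handed downstream in fixed-size chunks, so a multi-gigabyte video never has to fit in memory or cross the network in one piece. The chunk size here is an illustrative assumption; the demo uses a small synthetic file in place of a real video.

```python
import os
import tempfile

def stream_chunks(path, chunk_size=4 * 1024 * 1024):
    """Yield a large media file in fixed-size chunks so that a 20 GB
    video never has to be loaded into memory at once."""
    with open(path, "rb") as f:
        while True:
            chunk = f.read(chunk_size)
            if not chunk:
                return
            yield chunk

# Demo on a small synthetic file (a real case would be a huge video).
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(b"\x00" * (9 * 1024 * 1024))
sizes = [len(c) for c in stream_chunks(tmp.name)]
os.unlink(tmp.name)
```

Real streaming servers add adaptive bitrates and range requests on top of this, but the memory-bounding principle is the same.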
Browsing Video and Images
When searching images and videos, the best result is almost never in the #1 position; it may not even be among the first 10. Ranking images is based on complex statistics and other mathematical properties that are not always intuitive to humans. Users therefore need a much more exploratory and visual result list that uses all available dimensions when searching images and videos.
A result list as it is used in text-based information retrieval does not work for searching images and video.
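The ranking step behind such result lists can be sketched with query by example: each image is reduced to a numeric feature vector, and results are ordered by similarity to the query image's vector. The three-dimensional vectors and file names below are toy assumptions (real systems use feature vectors with hundreds of dimensions), which also illustrates why the ordering is not always intuitive to a human viewer.

```python
import math

def cosine(a, b):
    """Cosine similarity between two feature vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

def rank_by_example(query_vec, collection):
    """Rank images by feature similarity to a query image."""
    scored = [(name, cosine(query_vec, vec)) for name, vec in collection.items()]
    return sorted(scored, key=lambda item: item[1], reverse=True)

# Toy feature vectors standing in for extracted image features.
collection = {
    "beach.jpg":      [0.9, 0.1, 0.0],
    "helicopter.jpg": [0.1, 0.8, 0.3],
    "harbor.jpg":     [0.7, 0.3, 0.1],
}
ranking = rank_by_example([0.8, 0.2, 0.1], collection)
```

Because several candidates score nearly the same, small feature differences can reshuffle the top positions, which is exactly why exploratory browsing beats a flat top-10 list for images.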
A good example of a well-designed video and image result list is the University of Amsterdam ForkBrowser (http://www.science.uva.nl/research/mediamill/demo/forkbrowser.php).
Use cases for Visual Information Retrieval
There are many use cases in the field of visual information retrieval, varying from searching for pictures on the internet to recognizing the faces of hooligans at the entrance of a high-risk football match, monitoring airports with surveillance cameras, and investigating child abuse.
Many of these applications are highly specialized applications requiring a lot of specialized knowledge and experience to work effectively.
With all of the recent developments in deep learning, real visual information retrieval is now a reality. As more and more annotated training data becomes available, and as techniques such as training-data augmentation and transfer learning improve every day, we can expect visual information retrieval to become common functionality in our everyday search tools soon.
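Training-data augmentation, mentioned above, can be sketched very simply: each labeled image is expanded into several label-preserving variants, multiplying the effective training set. The nested-list "image" and the flip-only transform below are deliberate simplifications (real pipelines also apply crops, rotations and color shifts).

```python
def horizontal_flip(image):
    """Mirror an image (given as rows of pixel values) left to right."""
    return [row[::-1] for row in image]

def augment(labeled_images):
    """Grow a labeled training set with label-preserving transforms;
    a flipped helicopter is still a helicopter."""
    out = []
    for img, label in labeled_images:
        out.append((img, label))
        out.append((horizontal_flip(img), label))
    return out

dataset = [([[1, 2], [3, 4]], "helicopter")]
augmented = augment(dataset)
```

Doubling or quadrupling the training set this way is one of the cheap tricks that, together with transfer learning from large pretrained models, has made deep-learning-based visual retrieval practical.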