
When automatic video indexation rocks!
More than 80% of all data in a company is unstructured information. This includes emails, Word documents, paper documents, images, web pages, videos and hundreds of additional formats. Unfortunately, attempts to use this immense and strategic resource often fail because many businesses lack the technology needed to understand and effectively use content that resides outside traditional structured databases.
So how can a company make sense of this unstructured information? One answer is automatic indexation: in other words, a way to structure unstructured data.
FRANCE24 uses some of the audiovisual indexation tools of Autonomy, a leading company in automatic indexation, for its archive system. Thanks to these tools, documentary files are filled out automatically, through video and audio analysis, as soon as they are created. FRANCE24 archivists then complete the files. The goal is to provide users with as much information as possible before any human intervention!
The automatic indexation used in FRANCE24 consists of several technological solutions:
- Speech recognition: each video’s audio track is extracted, analyzed, and indexed against the time code. Even though this “speech to text” functionality is not perfect, it extracts some of the video’s concepts without any human intervention.

- Automatic scene detection: the system automatically generates a storyboard, that is to say a succession of shots that represent the video. Each time a transition is recognized, a shot is captured. At a glance, users know what the video is about.

- Letter recognition: the system analyzes each image to detect letters or words inside the video. This way, you can automatically capture the name of a program guest when it appears on screen.
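
To make the scene detection step more concrete, here is a minimal sketch. It is not Autonomy's algorithm, which is not described here; it assumes a common textbook approach in which each frame is reduced to a gray-level histogram and a new shot is declared when two consecutive histograms differ by more than a threshold. The function names, the `threshold` value and the toy frames are all hypothetical.

```python
from typing import List, Sequence

Frame = Sequence[int]  # one frame as a flat list of gray-level pixels (0-255)


def histogram(frame: Frame, bins: int = 4) -> List[float]:
    """Normalized gray-level histogram of one (non-empty) frame."""
    counts = [0] * bins
    for pixel in frame:
        counts[min(pixel * bins // 256, bins - 1)] += 1
    total = len(frame)
    return [count / total for count in counts]


def detect_shots(frames: Sequence[Frame], threshold: float = 0.5) -> List[int]:
    """Return the indices of the frames that start a new shot.

    Frame 0 always starts a shot. A later frame starts a new shot when the
    L1 distance between its histogram and the previous frame's histogram
    exceeds `threshold` (an assumed, hand-picked value).
    """
    if not frames:
        return []
    keyframes = [0]
    previous = histogram(frames[0])
    for index in range(1, len(frames)):
        current = histogram(frames[index])
        distance = sum(abs(a - b) for a, b in zip(previous, current))
        if distance > threshold:
            keyframes.append(index)
        previous = current
    return keyframes


# Toy video: three dark frames followed by three bright frames.
dark = [10, 20, 30, 40]
bright = [200, 210, 220, 230]
storyboard = detect_shots([dark, dark, dark, bright, bright, bright])
print(storyboard)  # [0, 3]
```

The returned indices are the frames one would keep as storyboard thumbnails. A real system would work on decoded video frames and would need to handle gradual transitions such as fades, which this sketch ignores.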
That is what FRANCE24 uses, but many other functionalities can be added:
- Speaker tone recognition: first we train the system with voice samples of the speakers we want it to recognize. The system can then analyze the videos and tell which speaker is talking, and when! This is a great functionality when you are searching a debate for specific sentences from one of the panelists.
- Face recognition: using the same method, we train the system with the faces of the people we want it to recognize. The system can then analyze the videos and tell who appears in them, and when!
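
Both of these functionalities follow the same train-then-identify pattern. Here is a minimal sketch of that pattern, assuming each voice or face sample has already been converted into a numeric feature vector (how Autonomy does this is not covered here). The system averages the training vectors of each person, then labels a new sample with the closest average. The names, the two-dimensional vectors and the nearest-centroid method are illustrative assumptions, not the product's actual technique.

```python
import math
from typing import Dict, List, Sequence, Tuple

Vector = Sequence[float]


def train(samples: Dict[str, List[Vector]]) -> Dict[str, List[float]]:
    """Average each person's training vectors into one reference vector."""
    centroids = {}
    for name, vectors in samples.items():
        dims = len(vectors[0])
        centroids[name] = [
            sum(vector[d] for vector in vectors) / len(vectors) for d in range(dims)
        ]
    return centroids


def identify(centroids: Dict[str, List[float]], sample: Vector) -> str:
    """Return the name whose reference vector is closest to the sample."""
    return min(centroids, key=lambda name: math.dist(centroids[name], sample))


def label_segments(
    centroids: Dict[str, List[float]], segments: List[Tuple[float, Vector]]
) -> List[Tuple[float, str]]:
    """Attach a name to each (start_time_in_seconds, feature_vector) segment."""
    return [(start, identify(centroids, vector)) for start, vector in segments]


# Hypothetical training data: two samples per person.
training = {
    "anchor": [[1.0, 0.0], [0.8, 0.2]],
    "guest": [[0.0, 1.0], [0.2, 0.8]],
}
centroids = train(training)

# Hypothetical time-coded segments extracted from one video.
segments = [(0.0, [0.95, 0.05]), (12.5, [0.1, 0.7]), (30.0, [0.7, 0.3])]
timeline = label_segments(centroids, segments)
print(timeline)  # [(0.0, 'anchor'), (12.5, 'guest'), (30.0, 'anchor')]
```

The output is a time-coded list of who is speaking (or visible), which is what makes a query such as "sentences from this panelist" possible.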
Impressive
You could create a great buzz about that if you had a sample video, something like the latest embedded video with:
- The unedited speech to text result,
- Automatic Scene Detection,
- Letter recognition.
Random thoughts:
+ Do you use it for SEO purposes?
+ What is the video editing workflow?
+ Did you try to fuzzy-match the news presenter's teleprompter text with the video to get clean text?
Could these technologies also be used to manage UGC such as video comments?
Sorry for not answering
Sorry for not answering sooner but, like most people in summer, I was on holiday! :)
In addition to what I wrote in the article, and to answer your questions:
Unfortunately, I have not managed to capture a video showing how it works in our archive system: I am trying to find a tool to record video from my screen, so far without success.
Here is our workflow:
FRANCE24 archivists select the videos they want to archive and send them through editing software (Avid Newscutter). Autonomy servers then scan them automatically to generate some primary metadata: the speech-to-text, the storyboard and the letter recognition (all time-coded).
Once this automatic indexation is done, a file is created that the archivists can complete: for now, nothing describes a video better than the human eye!
All the files can then be consulted through video search software based on an Autonomy search engine (IDOL).
So, for the moment, we only use it to find our archive videos more easily.
Your idea of fuzzy-matching the news presenter's teleprompter text with the video to get clean text is a really good one, but a little tricky to implement!
As a first step, the archivists copy and paste the scripts of the videos during the manual indexation step. So we have both the speech-to-text, which is approximate but linked to the time code, and the script, which matches perfectly but is not linked to the time code.
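
One way to combine the two texts, sketched here under stated assumptions rather than as something we have implemented: align the words of the clean script with the words of the time-coded speech-to-text output, and copy the time code across wherever the two agree. The sketch uses `difflib.SequenceMatcher` from the Python standard library; the function name `transfer_timecodes`, the `(word, seconds)` data layout and the sample sentence are hypothetical.

```python
from difflib import SequenceMatcher
from typing import List, Optional, Tuple


def transfer_timecodes(
    stt: List[Tuple[str, float]], script: List[str]
) -> List[Tuple[str, Optional[float]]]:
    """Give each script word the time code of the STT word it aligns with.

    Script words with no matching STT word get None; a later step could
    interpolate between the neighbouring time codes.
    """
    stt_words = [word.lower() for word, _ in stt]
    script_words = [word.lower() for word in script]
    matcher = SequenceMatcher(None, stt_words, script_words, autojunk=False)

    timecodes: List[Optional[float]] = [None] * len(script)
    for block in matcher.get_matching_blocks():
        # Each block is a run of identical words in both sequences.
        for offset in range(block.size):
            timecodes[block.b + offset] = stt[block.a + offset][1]
    return list(zip(script, timecodes))


# Recognizer output with one misheard word ("frank" instead of "France").
stt = [("welcome", 0.0), ("to", 0.4), ("frank", 0.6), ("24", 1.1), ("news", 1.5)]
script = ["Welcome", "to", "France", "24", "news"]
aligned = transfer_timecodes(stt, script)
print(aligned)
# [('Welcome', 0.0), ('to', 0.4), ('France', None), ('24', 1.1), ('news', 1.5)]
```

The result keeps the script's clean wording and inherits time codes from the recognizer. The tricky part in practice is that real transcripts diverge from the script far more than this example does (ad-libs, interviews, cut passages), so long unmatched stretches would need separate handling.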
Speech and ROI
Nice to see a French TV channel investing in speech recognition technologies to index its videos, make them searchable and, above all, monetize them.
My (small) company has been working in this field since 2006, and we provide consulting and speech recognition solutions for Video SEO (VSEO).
I have a question regarding your projects with MS and Autonomy: how did you calculate your ROI?
Thanks for your response,
Regards
Thierry MICHEL
Senior IT Consultant