Sunday, February 25, 2007

Blinkx and searching videos by transcribing the audio of the video into text and tagging the video with it.

Here is an interesting article from the NYTimes about how Blinkx.com does video search. (Just as with image search, effective video search needs to be done in terms of textual descriptions. Unlike images, however, videos do have a sound track that sometimes provides auditory annotations. Blinkx claims that it does speech recognition on these sound tracks, transcribes the recognized words as tags on the video--and uses them to retrieve the video.)

Rao

February 25, 2007

Millions of Videos, and Now a Way to Search Inside Them

THE World Wide Web is awash in digital video, but too often we can't find the videos we want or browse for what we might like.

That's a loss, because if we could search for Internet videos, they might become the content of a global television station, just as the Web's hypertext, once it was organized and tamed by search, became the stuff of a universal library.

What we need, says Suranga Chandratillake, a co-founder of Blinkx, a start-up in San Francisco, is a remote control for the Web's videos, a kind of electronic TV Guide. He's got just the thing.

Videos have multiplied on social networks like YouTube and MySpace as well as on news and entertainment sites because of the emergence of video-sharing, user-generated video, free digital storage and broadband and Wi-Fi networks.

Today, owing to the proliferation of large video files, video accounts for more than 60 percent of the traffic on the Internet, according to CacheLogic, a company in Cambridge, England, that sells "media delivery systems" to Internet service providers. "I imagine that within two years it will be 98 percent," says Hui Zhang, a computer scientist at Carnegie Mellon University in Pittsburgh.

But search engines — like Google — that were developed during the first, text-based era of the Web do a poor job of searching through this rising sea of video. That's because they don't search the videos themselves, but rather things associated with them, including the text of a Web page, the "metadata" that computers use to display or understand pages (like keywords or the semantic tags that describe different content), video-file suffixes (like .mpeg or .avi), or captions or subtitles.
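To make that limitation concrete, here is a minimal sketch (with invented clip names and metadata, not anything from Google or Blinkx) of the kind of metadata-only index a text-era engine relies on; a clip surrounded by no text or tags simply cannot be found, whatever its frames or sound track contain.

```python
# Toy inverted index over the text *around* a video clip (page text, tags,
# file name) -- the only signals a text-era engine sees. Clip names and
# metadata here are invented for illustration.
from collections import defaultdict

clips = {
    "clip1.mpeg": {"page_text": "funny cat video", "tags": ["cat", "humor"]},
    "clip2.avi":  {"page_text": "", "tags": []},   # no metadata at all
}

index = defaultdict(set)
for clip_id, meta in clips.items():
    for word in meta["page_text"].split() + meta["tags"]:
        index[word.lower()].add(clip_id)

def search(query):
    # Only clips whose surrounding text mentions every query word can match;
    # clip2.avi is invisible no matter what it actually shows or says.
    return set.intersection(*(index.get(w.lower(), set()) for w in query.split()))

print(search("cat video"))    # {'clip1.mpeg'}
print(search("skateboard"))   # set()
```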

None of these methods are very satisfactory. Many Internet videos are accompanied by little or no descriptive text, and clips often carry missing or misleading metadata. Modern video players do not reveal video-file suffixes, and captions and subtitles imperfectly capture the spoken words in a video.

The difficulties of knowing which videos are where challenge the growth of Internet video. "If there are going to be hundreds of millions of hours of video content online," Mr. Chandratillake said, "we need to have an efficient, scalable way to search through it."

Mr. Chandratillake's history is unusual for Silicon Valley. He was born in Sri Lanka in 1977 and divided his childhood between England and various countries in South Asia where his father, a professor of nuclear chemistry, worked. He then studied distributed processing at King's College, Cambridge, before becoming the chief technology officer of Autonomy, a company that specializes in something called "meaning-based computing." This background perhaps suggested an original approach to search when he founded Blinkx in 2004.

Mr. Chandratillake's solution does not reject any existing video search methods, but supplements them by transcribing the words uttered in a video, and searching them. This is an achievement: effective speech recognition is a "nontrivial problem," in the language of computer scientists.
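A rough sketch of that transcribe-then-index idea, with a hypothetical transcribe() standing in for Blinkx's proprietary recognizer: once the spoken words are attached to a clip as searchable text, the clip can be retrieved even when it carries no useful metadata.

```python
# Sketch of a transcribe-then-index pipeline. transcribe() is a hypothetical
# stand-in for a real speech recognizer; Blinkx's is proprietary.
from collections import defaultdict

def transcribe(video_path):
    """Pretend speech recognizer: returns the words spoken in the clip."""
    fake_transcripts = {
        "lazy_sunday.mpeg": "lazy sunday the chronicles of narnia",
    }
    return fake_transcripts.get(video_path, "")

index = defaultdict(set)

def add_video(video_path, metadata_text=""):
    # Index both the surrounding metadata and the transcribed sound track,
    # so a clip with no useful metadata is still searchable by what is said in it.
    for word in (metadata_text + " " + transcribe(video_path)).lower().split():
        index[word].add(video_path)

def search(query):
    return set.intersection(*(index.get(w, set()) for w in query.lower().split()))

add_video("lazy_sunday.mpeg")            # no metadata supplied at all
print(search("chronicles of narnia"))    # {'lazy_sunday.mpeg'}
```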

Blinkx's speech-recognition technology employs neural networks and machine learning using "hidden Markov models," a method of statistical analysis in which the hidden characteristics of a thing are guessed from what is known.
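The article does not spell out Blinkx's models, but the hidden-Markov idea itself is textbook material; the sketch below uses the standard Viterbi algorithm with invented probabilities to show how the most likely sequence of hidden states is recovered from a series of noisy observations.

```python
# Textbook Viterbi decoding over a tiny hidden Markov model. All numbers are
# invented for illustration; they are not Blinkx's models. Given a sequence of
# observed acoustic symbols, recover the most probable sequence of hidden states.
states = ["speech", "beach"]
observations = ["s-like", "ee", "ch"]   # toy "acoustic" observations

start_p = {"speech": 0.6, "beach": 0.4}
trans_p = {"speech": {"speech": 0.7, "beach": 0.3},
           "beach":  {"speech": 0.4, "beach": 0.6}}
emit_p  = {"speech": {"s-like": 0.5, "ee": 0.3, "ch": 0.2},
           "beach":  {"s-like": 0.1, "ee": 0.4, "ch": 0.5}}

def viterbi(obs):
    # V[t][s] = probability of the best path that ends in state s at time t
    V = [{s: start_p[s] * emit_p[s][obs[0]] for s in states}]
    path = {s: [s] for s in states}
    for t in range(1, len(obs)):
        V.append({})
        new_path = {}
        for s in states:
            prob, prev = max((V[t - 1][p] * trans_p[p][s] * emit_p[s][obs[t]], p)
                             for p in states)
            V[t][s] = prob
            new_path[s] = path[prev] + [s]
        path = new_path
    best = max(states, key=lambda s: V[-1][s])
    return V[-1][best], path[best]

print(viterbi(observations))   # probability and most likely hidden sequence
```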

Mr. Chandratillake calls this method "contextual search," and he says it works so well because the meanings of the sounds of speech are unclear when considered by themselves. "Consider the phrase 'recognize speech,' " he wrote in an e-mail message. "Its phonemes ('rek-un-nise-peach') are incredibly similar to those contained in the phrase 'wreck a nice beach.' Our systems use our knowledge of which words typically appear in which contexts and everything we know about a given clip to improve our ability to guess what each phoneme actually means."
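That contextual disambiguation can be sketched with a toy word-bigram language model (the counts below are invented, not Blinkx's): among candidate transcriptions that fit the same phonemes, prefer the one whose word sequence is most typical of the surrounding context.

```python
# Toy contextual re-ranking with a word-bigram language model. The counts are
# invented (imagine a technology-news context); a real system learns far richer
# models from data. Both candidates fit the same phonemes, so context decides.
import math

bigram_counts = {("recognize", "speech"): 50, ("wreck", "a"): 2,
                 ("a", "nice"): 30, ("nice", "beach"): 5}
unigram_counts = {"recognize": 60, "wreck": 4, "a": 500, "nice": 40}

def avg_log_score(words, smoothing=1e-3):
    """Average smoothed log bigram probability of a candidate transcription."""
    total = 0.0
    pairs = list(zip(words, words[1:]))
    for w1, w2 in pairs:
        p = ((bigram_counts.get((w1, w2), 0) + smoothing)
             / (unigram_counts.get(w1, 0) + smoothing))
        total += math.log(p)
    return total / max(1, len(pairs))

candidates = [["recognize", "speech"], ["wreck", "a", "nice", "beach"]]
print(max(candidates, key=avg_log_score))   # ['recognize', 'speech'] with these counts
```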

While neural networks and machine learning are not new, their application to video search is unique to Blinkx, and very clever.

How good is Blinkx search? When you visit blinkx.com, the first thing you see is the "video wall," 25 small, shimmering tiles, each displaying a popular video clip, indexed that hour. (The wall provides a powerful sense of the collective mind of our popular culture.)

To experiment, I typed in the phrase "Chronic — WHAT — cles of Narnia," the shout-out in the "Saturday Night Live" digital short called "Lazy Sunday," a rap parody of two New York slackers. I wanted a phrase that a Web surfer would know more readily than the real title of a video. I also knew that "Lazy Sunday," for all its cultish fame, would be hard to find: NBC Universal had freely released the rap parody on the Internet after broadcasting it in December 2005, but last month the company insisted that YouTube pull it.

Nonetheless, Blinkx found eight instances of "Lazy Sunday" when I tried it last week. By contrast, Google Video found none. Typing "Lazy Sunday" into the keyword search box on Google's home page produced hundreds of results — but many were commentaries about the video, and many had nothing to do with "Saturday Night Live."

Blinkx, which has raised more than $12.5 million from angel investors, earns money by licensing its technology to other sites. Although Blinkx has more than 80 such partners, including Microsoft, Playboy, Reuters and MTV, it rarely discloses the terms of its deals. Mr. Chandratillake said some licensees pay Blinkx directly, while others share revenue and some do both. Blinkx has revealed the details of one deal: ITN, a British news broadcaster, will share the revenue generated by advertising inserted in its videos.

For all of Blinkx's coolness, there are at least three obvious obstacles to the company's success.

First, just because Google Video is not much good now doesn't mean it won't get better: after all, when Blinkx was founded, it first applied machine learning to searching the desktops of personal computers, a project that was abandoned when Google and Microsoft released their own desktop search bars.

Second, even if Google improbably fails to develop effective video search, the field will still be crowded: TruVeo, Flurl, ClipBlast and other start-ups are all at work on different subsets of the market.

Finally, Blinkx might not go far enough in searching the content of videos: the company searches their sounds, but not their images.

THIS last objection is the most serious.

"Because Blinkx emphasizes speech recognition, there is a great amount of multimedia content that they cannot address, like photographs," said John R. Smith, a senior manager in the intelligent information management department of I.B.M.'s T. J. Watson Research Center in Hawthorne, N.Y. "But what's worse, speech is not a very good indicator of what's being shown in a video."

Mr. Smith says he has been working on an experimental video search engine called Marvel, which also uses machine learning but organizes visual information as well as speech.

Still, at least for now, Blinkx leads video search: it searches more than seven million hours of video and is the largest repository of digital video on the Web.

"Search is our navigation, our interface to the Internet," said John Battelle, chief of Federated Media Publishing and author of "The Search," an account of the rise of Google. With Blinkx, we may have such an interface for digital video, and be a little closer to Mr. Chandratillake's vision of a universal remote control.

