Writing for the New York Times Science Desk, John Markoff reports on how computer vision and machine learning will create the next-generation Internet, where search engines find images and videos with the same relevance they now achieve with text.
And the need is crushing… in the next 60 seconds, users will upload 72 hours of video to YouTube.
Today, unless images and videos are labeled, search engines have no way to match them against your query. Even then, labels can be unreliable (e.g., “junk” versus the objects that comprise it).
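To see why labels matter, here is a minimal sketch of label-based image search: an inverted index maps label words to images, and a query matches only images carrying every query word. All image names and labels below are invented for illustration.

```python
# Sketch of label-based image search. Without labels, the index is
# empty and no query can match an image, no matter what it depicts.
from collections import defaultdict

def build_index(labeled_images):
    """Map each label word to the set of images carrying it."""
    index = defaultdict(set)
    for image_id, labels in labeled_images.items():
        for label in labels:
            index[label.lower()].add(image_id)
    return index

def search(index, query):
    """Return images whose labels contain every query word."""
    words = query.lower().split()
    if not words:
        return set()
    results = set(index.get(words[0], set()))
    for word in words[1:]:
        results &= index.get(word, set())
    return results

labeled = {
    "img001.jpg": ["cat", "grass"],
    "img002.jpg": ["cat", "sofa"],
    "img003.jpg": ["dog", "grass"],
}
idx = build_index(labeled)
print(sorted(search(idx, "cat")))        # ['img001.jpg', 'img002.jpg']
print(sorted(search(idx, "cat grass")))  # ['img001.jpg']
```

This also illustrates the unreliability the article mentions: a vague label like "junk" would match a query for "junk" but never for the individual objects in the pile.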
To give search engines something akin to human sight, Stanford’s Dr. Fei-Fei Li has teamed up with fellow computer scientists at Princeton to develop ImageNet, the world’s largest image database.
Given the enormous scale of the task and a limited budget, Dr. Li turned to Mechanical Turk, Amazon.com’s crowdsourcing system, where workers label photos for a small payment per task. Thanks to the efforts of nearly 30,000 participants a year, the database now holds over 14 million images in over 21,000 categories.
As the database of labeled images grows, machine learning algorithms enable software to recognize similar, unlabeled images. Over time, accuracy rates improve dramatically.
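The learning step can be illustrated with a toy nearest-centroid classifier: it averages the feature vectors of labeled examples per category, then assigns each unlabeled image to the nearest average. This is a deliberately simplified stand-in for the far more complex models trained on ImageNet, and the 2-D feature vectors below are invented; real systems derive features from the pixels themselves.

```python
# Toy nearest-centroid classifier: labeled examples define one
# "average" vector per category; unlabeled vectors get the label
# of the closest average.
def centroid(vectors):
    """Component-wise mean of a list of equal-length vectors."""
    n = len(vectors)
    return tuple(sum(v[i] for v in vectors) / n
                 for i in range(len(vectors[0])))

def train(labeled_examples):
    """labeled_examples: {label: [feature_vector, ...]}"""
    return {label: centroid(vecs)
            for label, vecs in labeled_examples.items()}

def predict(model, vector):
    """Assign the label whose centroid is nearest (squared distance)."""
    def dist2(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))
    return min(model, key=lambda label: dist2(model[label], vector))

training = {
    "cat": [(0.9, 0.1), (0.8, 0.2)],
    "dog": [(0.1, 0.9), (0.2, 0.8)],
}
model = train(training)
print(predict(model, (0.85, 0.15)))  # cat
print(predict(model, (0.15, 0.85)))  # dog
```

The sketch also shows why accuracy improves as the database grows: more labeled examples per category pull each centroid toward a more representative position.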
Surprisingly, when Google computer scientists Andrew Ng and Jeff Dean tested the system on a large collection of labeled images, it nearly doubled the accuracy of previous neural-network algorithms designed to model human thought processes.
To further improve speed and accuracy, images are classified against WordNet, a hierarchical database of English words, with careful programming making educated choices about which branches of the hierarchy to search.
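The value of a hierarchy like WordNet can be sketched with a tiny invented taxonomy (not WordNet itself): because broad categories contain narrower ones, a search for "animal" should also surface images labeled with any descendant category, and a search for "dog" can skip every unrelated branch.

```python
# Sketch of searching a WordNet-style hierarchy. The taxonomy is a
# tiny invented fragment; each category lists its direct children.
taxonomy = {
    "animal": ["dog", "cat"],
    "dog": ["beagle", "poodle"],
    "cat": [],
    "beagle": [],
    "poodle": [],
}

def descendants(category):
    """All categories at or below `category` in the hierarchy."""
    found = {category}
    for child in taxonomy.get(category, []):
        found |= descendants(child)
    return found

# Illustrative image labels drawn from the taxonomy's leaf categories.
images = {"img1": "beagle", "img2": "cat", "img3": "poodle"}

def search(category):
    """Images labeled with `category` or any narrower category."""
    wanted = descendants(category)
    return sorted(img for img, label in images.items() if label in wanted)

print(search("dog"))     # ['img1', 'img3']
print(search("animal"))  # ['img1', 'img2', 'img3']
```

Restricting a classifier or a search to one subtree at a time is one way "educated choices" about the hierarchy translate into both speed (fewer categories examined) and accuracy (fewer confusable alternatives).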
See the full article here.