BYU Engineer Creates Unsupervised Machine Learning Algorithm

Machine learning is the newest thing at BYU, thanks to the work of engineer Dah-Jye Lee, who has created an algorithm that allows computers to learn without human help. According to Lee, his algorithm differs from others in that it doesn’t specify for the computer what it should or shouldn’t look for. Instead, his program simply feeds images to the computer, letting it decide on its own what is what.

Photo courtesy of BYU Photo.

Much as children intuitively learn to tell apart the objects in the world around them, Lee shows the computer a variety of images without labeling or distinguishing between them. Instead, the computer is tasked with working out the differences on its own. According to Lee:

“It’s very comparable to other object recognition algorithms for accuracy, but, we don’t need humans to be involved. You don’t have to reinvent the wheel each time. You just run it.”

Machine Learning Exemplified By The Never Ending Image Learner

Photo courtesy of Coursera

Meet NEIL, the Never Ending Image Learner. Its mission? To scour the Internet, 24 hours a day, seven days a week, building and strengthening its database. Its goal? To teach itself common sense.

Photo courtesy of Coursera

Of course, computers can’t think, reason, or rationalize in quite the same way as humans, but researchers at Carnegie Mellon University are using Computer Vision and Machine Learning to extend what computers can do.

NEIL’s task isn’t so much to deal with hard data like numbers, which is what computers have been doing since they were first created. Instead, NEIL goes a step further, translating the visual world into useful information by identifying colors and lighting, classifying materials, recognizing distinct objects, and more. This information is then used to make general observations, associations, and connections, much as the human mind does at an early age.

While computers can’t respond to this information emotionally (a critical difference between them and humans), there are countless tasks NEIL can accomplish today, or may in the near future, that could help transform the way we live. Think about it: how might Computer Vision and Machine Learning change the way you live, work, and interact with your environment?

Google, NASA Launch Quantum Artificial Intelligence Lab

Robin Wauters at TNW reports on Google’s move to establish a Quantum Artificial Intelligence Lab at NASA Ames Research Center.

Quantum computing holds out the promise of actual parallel processing.

Google Logo Sign (c) TheNextWeb.com

While today’s smart devices may appear to multitask, running GPS, text messaging and music streaming all at once, in reality each processor core is cycling between these tasks, one at a time.

Computers have been operating this way since the computer age began.
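This serial cycling is usually called time slicing. As a minimal illustration, the Python sketch below simulates round-robin scheduling on a single core; the task names, work units and one-unit quantum are made-up assumptions, and real mobile operating systems use far more elaborate, priority-aware schedulers.

```python
from collections import deque


def round_robin(tasks, quantum):
    """Simulate one core time-slicing between tasks.

    tasks: dict mapping a task name to the units of work it still needs.
    quantum: the most units a task may run before the core switches away.
    Returns the (task, units) slices in the order the core ran them.
    """
    queue = deque(tasks.items())
    schedule = []
    while queue:
        name, remaining = queue.popleft()
        ran = min(quantum, remaining)
        schedule.append((name, ran))
        remaining -= ran
        if remaining > 0:
            # Unfinished tasks go to the back of the line until their next turn.
            queue.append((name, remaining))
    return schedule


# Hypothetical workload: three apps that only *appear* to run at once.
timeline = round_robin({"gps": 3, "texting": 2, "music": 4}, quantum=1)
print(timeline)
```

Each app gets a short turn in rotation, so on a fast enough core they all seem to run simultaneously even though only one is executing at any instant.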

Quantum computers, on the other hand, would address simultaneity from the ground up. They would perform many operations in parallel and be well suited to machine learning, where there’s a need to search through a vast number of possibilities and choose the best solution.

One of the more controversial aspects of quantum computing’s massive potential is that it could render many of today’s data encryption technologies obsolete.

(For a surprisingly easy-to-follow explanation of the difference between classical computing and quantum computing, see this 1999 article by Lov K. Grover, inventor of what may be the fastest possible search algorithm that could run on a quantum computer.)
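Grover’s algorithm gives a concrete taste of that difference: it finds one marked item among N unsorted items in roughly √N steps, where a classical search needs on the order of N. The sketch below simulates its amplitude amplification on an ordinary computer (so it offers no speedup itself), assuming a toy search space of 8 items with item 5 marked. Note that the D-Wave machine the lab will use is a quantum annealer, a different approach from the gate-based model Grover’s algorithm assumes.

```python
import math


def grover_probability(n_items, marked, iterations):
    """Classically simulate Grover's search over n_items basis states.

    Returns the probability of measuring the marked item after the given
    number of Grover iterations (oracle step followed by diffusion step).
    """
    # Start in a uniform superposition: every amplitude is 1/sqrt(N).
    amps = [1 / math.sqrt(n_items)] * n_items
    for _ in range(iterations):
        # Oracle: flip the sign of the marked item's amplitude.
        amps[marked] = -amps[marked]
        # Diffusion: reflect every amplitude about the mean amplitude.
        mean = sum(amps) / n_items
        amps = [2 * mean - a for a in amps]
    return amps[marked] ** 2


N = 8
k = math.floor(math.pi / 4 * math.sqrt(N))  # roughly optimal number of iterations
print(k, grover_probability(N, marked=5, iterations=k))
```

A random classical guess would succeed with probability 1/8; after just two iterations the simulated quantum state puts about 94.5% of the probability on the marked item.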

One focus of the lab will be to advance machine learning. Hartmut Neven, Google’s Director of Engineering, blogs:

Machine learning is all about building better models of the world to make more accurate predictions.

And if we want to build a more useful search engine, we need to better understand spoken questions and what’s on the web so you get the best answer.

The new lab will be outfitted with a D-Wave Systems quantum computer. NASA, Google, and Universities Space Research Association (USRA) plan to invite researchers worldwide to share time on the quantum computer starting in Q3 2013.

The lab will serve as an incubator of practical solutions that require quantum computing. Neven goes so far as to write:

We actually think quantum machine learning may provide the most creative problem-solving process under the known laws of physics.

Machine Learning Predicts Students’ Final Grades As the Course Unfolds

“Can we predict a student’s final grade based on his or her behavior in the course so far?”

Writing for the Wall Street Journal, Don Clark showcases Canadian company Desire2Learn, a provider of cloud-based learning systems to enterprises and academia that recently announced this very capability.

Having served 10 million learners over 14 years, the company has collected detailed records on how students engage with instructional materials and how they subsequently perform on tests.

Desire2Learn has developed machine learning algorithms that, applied to this historical data, predict how students will fare as a course unfolds.

Such predictive analysis serves as an early warning signal so instructors can give at-risk learners the additional, personalized attention they need, when they need it most.
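To make the idea concrete, here is a deliberately simple sketch of predicting a final score from engagement and raising an early warning. It is not Desire2Learn’s method: the single feature (share of course materials viewed), the straight-line least-squares fit, the toy history and the cutoff of 65 are all hypothetical.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var = sum((x - mean_x) ** 2 for x in xs)
    slope = cov / var
    return slope, mean_y - slope * mean_x


# Hypothetical past students: share of course materials viewed -> final score (0-100).
past_engagement = [0.2, 0.4, 0.5, 0.7, 0.9]
past_scores = [52, 61, 68, 79, 90]
slope, intercept = fit_line(past_engagement, past_scores)

# Hypothetical current students, partway through a course.
current = {"student_a": 0.85, "student_b": 0.30}
AT_RISK_BELOW = 65  # hypothetical cutoff for raising an early-warning flag

predictions = {name: slope * share + intercept for name, share in current.items()}
at_risk = sorted(name for name, score in predictions.items() if score < AT_RISK_BELOW)
print(predictions, at_risk)
```

Here student_b, with low engagement, is predicted to finish in the mid-50s and gets flagged while there is still time for an instructor to step in.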

The company’s CEO, John Baker, claims Desire2Learn’s algorithms yield greater than 90% accuracy at predicting letter grades.

John Baker, CEO Desire2Learn (c) Desire2Learn

Even so, privacy issues do crop up. For example, giving instructors each student’s individual engagement statistics can reveal that student’s general level of effort.

As a safeguard, Desire2Learn anonymizes personally identifiable information about student activities by stripping off such data when the student finishes the course.
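Stripping identifying fields can be sketched in a few lines. The field names below are hypothetical, and dropping direct identifiers is only the simplest form of anonymization; combinations of the remaining fields can sometimes still point back to an individual.

```python
# Hypothetical set of fields that identify a student directly.
PII_FIELDS = {"name", "email", "student_id"}


def strip_pii(record):
    """Return a copy of an activity record with identifying fields removed."""
    return {key: value for key, value in record.items() if key not in PII_FIELDS}


def close_out_course(records):
    """Anonymize every activity record once a student has finished the course."""
    return [strip_pii(r) for r in records]


records = [
    {"name": "Ada", "email": "ada@example.edu", "student_id": 17,
     "course": "BIO101", "pages_viewed": 42, "final_grade": "B+"},
]
print(close_out_course(records))
```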

The company makes predictive findings available to instructors through its Students Success System and plans to do likewise for students through Degree Compass, a product currently in beta.

So, how would your business change if you had timely predictions about future outcomes?

Machine Learning Startup Skytree Lands $18 Million

In venture capital circles, machine learning startups are about to catch fire. This makes sense, as the data sets that companies and organizations need to use grow beyond what the human brain can fathom.

As Derrick Harris at Gigaom reports, Skytree landed $18 million in Series A funding from US Venture Partners, United Parcel Service and Scott McNealy, the Sun Microsystems co-founder and former CEO. The company began just over a year earlier with $1.5 million in seed funding.

Skytree co-founder Alexander Gray (second from left) at Structure: Data 2012. (c) Pinar Ozger

As big data gets bigger ever more quickly, machine learning makes it possible to identify meaningful patterns in real time that would elude sharp humans even with the best of query tools.

Still, there’s often a place for human judgment to flesh out the findings of machine learning algorithms.

Examples include Netflix recommendations, the ZestFinance credit risk analysis platform and ProPublica’s Message Machine project, which combs through vast volumes of crowd-sourced emails to find important news stories on a given topic.

The flagship Skytree product, Skytree Server, lets users run advanced machine learning algorithms against their own data sources at speeds much faster than current alternatives. The company claims such rapid and complete processing of large datasets yields extraordinary boosts in accuracy.

Skytree’s new beta product, Adviser, allows novice users to perform machine learning analysis of their data on a laptop and receive guidance about methods and findings.

As the machine learning space becomes more accessible to a wider audience, expect to see more startups get venture funding.

And with DARPA striving to make it easier for machine learning developers to focus more on application design and less on the complexities of statistical inference, this trend could have momentum for some time to come.

Machine Learning Touches All Aspects of Medical Care

Jennifer Barrett
Courtesy of George Mason University

Writing for Mason Research at George Mason University, Michele McDonald reports on how machine learning is helping doctors determine the best course of treatment for their patients. What’s more, machine learning is improving efficiency in medical billing and even predicting patients’ future medical conditions.

According to Janusz Wojtusiak, director of the Machine Learning and Inference Laboratory and the Center for Discovery Science and Health Informatics at Mason’s College of Health and Human Services, using complex algorithms to mine the data makes individualized medicine possible.

Wojtusiak points out that current research focuses on the average patient, whereas those being treated want personalized care at the lowest risk for the best outcome.

Machine learning can identify patterns in reams of data and place the patient’s conditions and symptoms in context to build an individualized treatment model.

As such, machine learning seeks to support the physician based on the history of the condition as well as the history of the patient.

The data to be mined is vast and detailed. It includes the lab tests, diagnoses, treatments, and qualitative notes of individual patients who, taken together, form large populations.

Machine learning uses algorithms that recognize the data, identify patterns in it and derive meaningful analyses.

For example, researchers at the Machine Learning and Inference Lab are comparing five different treatment options for patients with prostate cancer.

To determine the best treatment option, machine learning must first categorize prostate cancer patients on the basis of certain commonalities. When a new patient comes in, algorithms can figure out which group he is most similar to. In turn, this guides the direction of treatment for that patient.
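This group-then-match procedure resembles clustering followed by nearest-centroid assignment. The article doesn’t say which algorithm the lab uses, so the sketch below swaps in plain k-means with two hypothetical numeric features per patient (think of them as scaled measurements); it illustrates the idea only and is in no way a clinical model.

```python
import math


def nearest(point, centroids):
    """Index of the centroid closest to point (Euclidean distance)."""
    return min(range(len(centroids)), key=lambda i: math.dist(point, centroids[i]))


def kmeans(points, k, iterations=20):
    """Plain k-means: repeatedly assign points to the nearest centroid, then re-average."""
    centroids = [list(p) for p in points[:k]]  # simple deterministic starting guess
    for _ in range(iterations):
        clusters = [[] for _ in range(k)]
        for p in points:
            clusters[nearest(p, centroids)].append(p)
        for i, members in enumerate(clusters):
            if members:  # keep the old centroid if a cluster empties out
                centroids[i] = [sum(coord) / len(members) for coord in zip(*members)]
    return centroids


# Hypothetical past patients, each described by two scaled features.
patients = [(0.1, 0.2), (0.9, 0.8), (0.15, 0.1), (0.2, 0.25), (0.85, 0.9), (0.95, 0.75)]
centroids = kmeans(patients, k=2)

new_patient = (0.8, 0.7)
print("new patient resembles group", nearest(new_patient, centroids))
```

In a real study, each group would then be linked to how its members fared under each treatment option, which is what lets the match guide the choice for a new patient.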

Given the high stakes of patient care, the complexity that must be sorted out when making diagnoses and the need for ongoing monitoring of interventions against outcomes, machine learning development in health care promises to mitigate risk and be cost-effective.

For more about The Machine Learning and Inference Lab and the health care pilot projects they are working on, see the original article here.

DARPA Sets Stage for Giant Leap Forward in Machine Learning

Probabilistic Programming for Advanced Machine Learning
Courtesy of DARPA.mil

As the new frontier in computing, machine learning brings us software that can make sense of big data, act on its findings and draw insights from ambiguous information.

Spam filters, recommendation systems and driver assistance technology are some of today’s more mainstream uses of machine learning.

As on any frontier, creating new machine learning applications can be difficult and slow, even for the most talented teams, for lack of tools and infrastructure.

DARPA (The Defense Advanced Research Projects Agency) is tackling this problem head on by launching the Probabilistic Programming for Advanced Machine Learning Program (PPAML).

Probabilistic programming is a programming paradigm for dealing with uncertain information.

In much the same way that high level programming languages spared developers the need to deal with machine level issues, DARPA’s focus on probabilistic programming sets the stage for a quantum leap forward in machine learning.

More specifically, machine learning developers using new programming languages geared for probabilistic inference will be freed up to deliver more innovative, effective and efficient applications faster, while relying less on big data than is common today.
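For a flavor of what “dealing with uncertain information” means, here is a hand-rolled Python sketch of the pattern that probabilistic programming languages automate: write a generative model, condition it on observed data, and infer the unknowns. It is not a PPAML language or any DARPA tool; the coin-flip model, the uniform prior and rejection sampling are illustrative assumptions, and practical systems use far more efficient inference. With a uniform prior and 8 heads in 10 flips, the exact posterior mean of the bias is (8 + 1) / (10 + 2) = 0.75, which the sampler should approximate.

```python
import random


def model(rng, flips=10):
    """Generative model: draw an unknown coin bias, then simulate flips with it."""
    bias = rng.random()  # prior: any bias in [0, 1) is equally likely
    heads = sum(rng.random() < bias for _ in range(flips))
    return bias, heads


def infer_bias(observed_heads, flips=10, draws=50_000, seed=0):
    """Rejection sampling: keep only the simulated runs that match the observation."""
    rng = random.Random(seed)
    accepted = []
    for _ in range(draws):
        bias, heads = model(rng, flips)
        if heads == observed_heads:
            accepted.append(bias)
    if not accepted:
        raise ValueError("no simulated run matched the observation; increase draws")
    return sum(accepted) / len(accepted)  # estimated posterior mean of the bias


print(infer_bias(observed_heads=8))  # should land close to 0.75
```

The appeal of a probabilistic programming language is that the developer writes only something like `model` and the observation; the system supplies the inference machinery.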

For details, see the DARPA Special Notice document describing the specific capabilities sought at http://go.usa.gov/2PhW.

Machine Learning Software Grades Essays and Gives Students Feedback—Instantly

EdX, a nonprofit enterprise founded by Harvard and the Massachusetts Institute of Technology, will release automated software that uses artificial intelligence to grade student essays and short written answers.
Courtesy of Gretchen Ertl for The New York Times

John Markoff at the New York Times reports on a fast-moving, back-and-forth exchange where students submit their essays online, receive a grade almost immediately, and improve their grades based on system-generated feedback.

EdX, a nonprofit consortium of Harvard and the Massachusetts Institute of Technology that offers courses on the Internet, has developed the automated essay-scoring software powering this new reality.

While controversy rages over the reliability of artificial intelligence to grade essays, EdX software is free to any institution that wants to offer its courses online. So far, the program has been adopted by 12 prestigious universities and it is spreading rapidly worldwide.

Proponents of the software argue that instant feedback is an invaluable learning aid to students versus waiting weeks for professor-graded feedback. Moreover, students find it engaging in much the same way as video games and claim they learn better from the process.

Critics counter that even with the best machine learning algorithms in place, computers cannot perform the essentials of assessing written communication. Les Perelman, a researcher at MIT, has tricked such grading systems into awarding high grades to nonsensical submissions.

A group of educators he belongs to, Professionals Against Machine Scoring of Student Essays in High-Stakes Assessment, has collected nearly 2,000 signatures and makes the case that “Computers cannot ‘read.’ They cannot measure the essentials of effective written communication: accuracy, reasoning, adequacy of evidence, good sense, ethical stance, convincing argument, meaningful organization, clarity, and veracity, among others.”

The EdX program has human graders assess the first 100 essays or essay questions. From then on, the system uses various machine-learning algorithms to train itself automatically. Once trained, it can grade any number of essays or answers in near real time. The software lets the teacher create the scoring system based on letter grades or numerical rankings.
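Very loosely, the “train on human grades, then score new work” loop can be sketched as nearest-neighbor grading: a new essay receives the average human score of the most similar already-graded essays. This is a toy assumption, not EdX’s algorithm, and a similarity measure as shallow as word overlap would be easy to game, which is exactly the critics’ concern.

```python
import re


def words(text):
    """Lower-case set of the words in a piece of text."""
    return set(re.findall(r"[a-z']+", text.lower()))


def jaccard(a, b):
    """Word-set overlap, from 0.0 (nothing shared) to 1.0 (identical vocabularies)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def grade(essay, graded_examples, k=3):
    """Average the human scores of the k graded essays most similar to this one."""
    target = words(essay)
    ranked = sorted(graded_examples,
                    key=lambda example: jaccard(target, words(example[0])),
                    reverse=True)
    top = ranked[:k]
    return sum(score for _, score in top) / len(top)


# Hypothetical human-graded answers: (text, score out of 5).
graded = [
    ("photosynthesis converts light energy into chemical energy in plants", 5),
    ("plants use light to make chemical energy", 4),
    ("plants are green", 2),
    ("i like turtles", 0),
]
print(grade("plants convert light energy into chemical energy", graded, k=2))
```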

Dr. Anant Agarwal, president of EdX, believes the program is approaching the capability of human graders. Skeptics point out how formal studies comparing the system against qualified human graders have not been done. Nevertheless, Dr. Agarwal claims the quality of EdX grading is as consistent as that found from one instructor to another.

Instant, automated feedback has its adherents elsewhere as well, including start-ups Coursera and Udacity. Both were founded by Stanford faculty members and share a mission to create “massive open online courses,” or MOOCs.

Coursera co-founder Daphne Koller believes instant feedback turns learning into a game students feel compelled to master, resubmitting their work until they achieve a certain level of proficiency.

So, if automated grading is possible in academic settings, the general idea of assessing new written content based on previous human assessments of existing content is sure to explode over the next few years.

Applications that mine blogs, social media and forum postings to understand markets and communities come to mind.

What do you see happening in your field once automated interpretation of extended passages of text goes mainstream?

Can Machine Learning Give Investigative Journalism the Scoop?

While attending the recent NICAR 2013 conference in Louisville, Kentucky, Andrew Trench, Media24 Investigations editor and technology blogger, reported on a fascinating demonstration of machine learning for finding news stories and insights humans would typically overlook.

Machine Learning at a Glance
Courtesy GrubStreet.co.za & Andrew Trench

He then shares his vision for how machine learning will impact news as we know it in terms of gathering and shaping stories as well as the news business itself.

ProPublica’s Jeff Larson was the presenter at NICAR. Through the Message Machine project, his non-profit investigative reporting group used machine learning to uncover a number of major stories.

The project clustered documents and applied decision trees to comb through vast volumes of crowd-sourced emails from their readers on a given topic.

In this case, the topic was how US political parties raised money by tailoring their pitches to suit the demographics of the email recipients.

Under the hood, algorithms convert every word in every email to a number. Documents then have mathematical properties that allow them to be clustered as similar or different.
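Here is a minimal sketch of that words-to-numbers step and a simple grouping pass, assuming plain word counts, cosine similarity and a greedy similarity threshold of 0.5. ProPublica’s actual pipeline (including its decision trees) isn’t described in detail here, and the sample emails are invented.

```python
import math
import re
from collections import Counter


def vectorize(text):
    """Turn a document into word counts: each distinct word becomes a dimension."""
    return Counter(re.findall(r"[a-z']+", text.lower()))


def cosine(a, b):
    """Cosine similarity of two count vectors (1.0 means the same word mix)."""
    dot = sum(count * b[word] for word, count in a.items())
    norm = (math.sqrt(sum(v * v for v in a.values()))
            * math.sqrt(sum(v * v for v in b.values())))
    return dot / norm if norm else 0.0


def group_documents(docs, threshold=0.5):
    """Greedy clustering: join the first group whose seed document is similar enough."""
    vectors = [vectorize(d) for d in docs]
    groups = []  # each group is a list of document indices; index 0 is the seed
    for i, vec in enumerate(vectors):
        for group in groups:
            if cosine(vec, vectors[group[0]]) >= threshold:
                group.append(i)
                break
        else:
            groups.append([i])  # nothing similar yet, so start a new group
    return groups


emails = [
    "Donate today to protect our schools",
    "Donate today to protect our veterans",
    "Your grandchildren need you to vote",
]
print(group_documents(emails))
```

The two fundraising pitches that differ by a single tailored word land in the same group, which is the kind of variation the Message Machine set out to surface.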

Apart from the tedium such clustering and conversion tasks would impose on the human mind, scouring the sheer volume of content collected would be too time-consuming and expensive. All the more so given the ever shorter news cycles we’ve come to accept nowadays.

Trench envisions using machine learning to more accurately predict which stories will yield the most likes, click-throughs, and commentary.

He then ruminates about what it would be like for editors concerned about daily sales to have hard data to go on instead of gut instinct.

Upon reading this, I recalled the timeless bible for direct response marketers known as Scientific Advertising by Claude Hopkins.

In this 1920s classic, Hopkins takes advertisers to task for going with their gut when they could craft ads and calls to action that give them the data they need to keep improving their response from readers.

In effect, Google AdWords is Scientific Advertising on steroids, because it forces all businesses to be better direct marketers in real time.

Meanwhile, chances are, machine learning is already quietly building out Trench’s vision of newspapers organized using prediction engines.

After all, if sites and apps like Flipboard let readers pull in their own personalized magazines, I suppose the big challenge for traditional media online is to push out an engaging product that differentiates itself by expanding readers’ horizons in ways they would not manage on their own.

Which brings us back to Larson and machine learning as a way to make investigative reporting economically viable again.

Starting in the 1980s, amid a flurry of mergers and acquisitions, the media business converted news from a loss leader to a profit center. Along the way, investigative reporting gave way to infotainment because such reporting was seen as anathema to making profits.

Today, many complain that the mainstream media in the US offers too much commentary and too little “hard news.” In turn, news networks from overseas are gaining American viewers by filling this void.

And so perhaps, we’ve come full circle. With the help of machine learning, traditional media can strike the right balance between catering to their audience’s known preferences and wowing them with authentic, hard-hitting stories as a counterweight to ubiquitous fluff.

How do you see machine learning transforming the way you communicate and publish?

Beyond Spam Filters: Machine Learning to Keep Your Inbox Manageable

SaneBox uses machine learning to manage your inbox
Courtesy of Inc.com and SaneBox

Even with the best spam filter, managing an inbox overflowing with legitimate business emails can still gobble up precious time.

Many of us have different ways of coping with this daily onslaught.

Some of us slog through every email and do our best to reply to all of them. Others scan subject lines and senders to prioritize which ones are worth opening.

And still others create new email accounts for specific purposes to keep business, personal and commercial messages separated.

Christina DesMarais, an Inc.com contributor, wrote the article Email Doesn’t Have to Suck about her experience with a new service designed to tame the overload of legitimate email, aptly named SaneBox.

Based on her description, it’s clear SaneBox is using machine learning to help categorize and prioritize messages.

The service watches how you engage with senders over time to predict which new messages you’ll consider important.

Messages it considers less important are moved out of your inbox and into an @SaneLater folder you can look at whenever you like.

If you notice an important message in your @SaneLater folder, you can move it back to your inbox, and SaneBox will remember: the next time you receive a message from that sender, it will stay in your inbox.
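That learn-from-engagement loop, plus the manual override, can be sketched as a per-sender open rate. The class name, the 50% threshold and the “no history means inbox” rule are hypothetical; this is not SaneBox’s actual model.

```python
from collections import defaultdict


class InboxSorter:
    """Hypothetical sender-importance model in the spirit of the behavior described."""

    def __init__(self, threshold=0.5):
        self.opened = defaultdict(int)    # messages from each sender the user opened
        self.received = defaultdict(int)  # messages received from each sender
        self.pinned = set()               # senders the user explicitly rescued
        self.threshold = threshold

    def record(self, sender, was_opened):
        """Learn from how the user engaged with one message."""
        self.received[sender] += 1
        if was_opened:
            self.opened[sender] += 1

    def rescue(self, sender):
        """User moved a message back to the inbox: keep this sender there from now on."""
        self.pinned.add(sender)

    def folder_for(self, sender):
        """Route a new message to the inbox or to the 'later' folder."""
        if sender in self.pinned or self.received[sender] == 0:
            return "inbox"  # rescued senders and unknown senders stay visible
        open_rate = self.opened[sender] / self.received[sender]
        return "inbox" if open_rate >= self.threshold else "@SaneLater"


sorter = InboxSorter()
for _ in range(3):
    sorter.record("boss@example.com", was_opened=True)
for _ in range(4):
    sorter.record("deals@example.com", was_opened=False)
print(sorter.folder_for("boss@example.com"), sorter.folder_for("deals@example.com"))
```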

The service also equips you with a dashboard so you can track your volume of important versus non-important messages. DesMarais gained insight into just how much time email was sucking out of her workday.

Additional folders include @SaneNews (so all your newsletter subscriptions are in one place) and @SaneBlackHole (for those messages you want sent straight to trash).

The SaneBox reminder feature lets you specify which messages you want replies to and by when. Simply add an address such as oneday@SaneBox.com or April12@SaneBox.com to the CC or BCC field, and SaneBox keeps an @SaneRemindMe folder with these messages ordered accordingly.

Now machine learning not only keeps spam out of view, it rescues your relationship with your inbox.