Never-Ending Language Learning – robot curiosity

A research team at Carnegie Mellon University has created a self-educating computer system called Never-Ending Language Learning (or NELL). It was designed to work out the connections between words based on how they are used together. The system was given predefined relationships between words and categories, and using this information, it can create its own interpretations of other words and phrases.

NELL was designed to learn as humans do. It draws on information from websites all over the internet, and it can update its knowledge when it finds conflicting information (unlike humans sometimes!). In the future, researchers hope that NELL will be able to work out context in everyday human speech and answer questions without a real person to moderate.

It is still a new technology, so it does have its limitations, even though it has already learned over 440,000 facts with 87% accuracy (at time of writing).

From the New York Times article, Smarter Than You Think – Aiming to Learn as We Do, a Machine Teaches Itself:

Take two similar sentences, he said. “The girl caught the butterfly with the spots.” And, “The girl caught the butterfly with the net.”

A human reader, he noted, inherently understands that girls hold nets, and girls are not usually spotted. So, in the first sentence, “spots” is associated with “butterfly,” and in the second, “net” with “girl.”

“That’s obvious to a person, but it’s not obvious to a computer,” Dr. Mitchell said. “So much of human language is background knowledge, knowledge accumulated over time. That’s where NELL is headed, and the challenge is how to get that knowledge.”
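To make the butterfly example concrete, here is a toy Python sketch of how a small store of background knowledge could settle which noun a phrase like "with the net" belongs to. The `BACKGROUND` facts and the `attach` function are invented for illustration; this is not how NELL itself represents or uses its knowledge.

```python
# Hypothetical background knowledge: (noun, thing it plausibly has or holds).
# These pairs are made up for this example; they are not NELL's data.
BACKGROUND = {
    ("girl", "net"),         # girls can hold nets
    ("butterfly", "spots"),  # butterflies can have spots
}

def attach(subject, obj, modifier, knowledge=BACKGROUND):
    """Pick the noun that 'with the <modifier>' most plausibly attaches to.

    Returns None when the knowledge base cannot decide.
    """
    candidates = [noun for noun in (subject, obj) if (noun, modifier) in knowledge]
    if len(candidates) == 1:
        return candidates[0]
    return None  # unknown modifier, or both nouns are plausible

print(attach("girl", "butterfly", "spots"))
print(attach("girl", "butterfly", "net"))
```

Without the two background facts, both sentences look identical to the program, which is exactly the problem Dr. Mitchell describes.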

Initially, NELL ran by itself, but researchers decided it would be best to begin correcting significant mistakes as they went. One amusing mistake is described in the same article:

When Dr. Mitchell scanned the “baked goods” category recently, he noticed a clear pattern. NELL was at first quite accurate, easily identifying all kinds of pies, breads, cakes and cookies as baked goods. But things went awry after NELL’s noun-phrase classifier decided “Internet cookies” was a baked good. (Its database related to baked goods or the Internet apparently lacked the knowledge to correct the mistake.)

NELL had read the sentence “I deleted my Internet cookies.” So when it read “I deleted my files,” it decided “files” was probably a baked good, too. “It started this whole avalanche of mistakes,” Dr. Mitchell said. He corrected the Internet cookies error and restarted NELL’s bakery education.
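The avalanche in that quote can be imitated with a very small pattern-learning loop. The Python sketch below is a deliberately simplified stand-in, not NELL's actual algorithm: the sentences, the `bootstrap` function, and the rule "whatever text precedes a known item becomes a pattern" are all assumptions made for the example.

```python
# Toy corpus, invented for this example.
SENTENCES = [
    "I baked an apple pie",
    "I baked an almond cake",
    "I deleted my Internet cookies",
    "I deleted my files",
]

def bootstrap(sentences, seeds):
    """Run one round of pattern learning for a single category.

    Step 1: every sentence that ends with a known item contributes the
            text before that item as a pattern (e.g. "I deleted my ").
    Step 2: anything that follows a learned pattern is added to the category.
    """
    patterns = {
        s[: -len(item)]
        for s in sentences
        for item in seeds
        if s.endswith(item)
    }
    learned = set(seeds)
    for s in sentences:
        for p in patterns:
            if p and s.startswith(p):
                learned.add(s[len(p):])
    return learned

# With the mistaken entry in the category, "files" becomes a baked good.
with_mistake = bootstrap(SENTENCES, {"apple pie", "Internet cookies"})
print("files" in with_mistake)

# After the mistaken entry is removed, the same loop stays on topic.
corrected = bootstrap(SENTENCES, {"apple pie"})
print("files" in corrected)
```

One wrong entry is enough to teach the loop a misleading pattern, and removing that entry by hand plays the role of Dr. Mitchell's correction.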

So, the technology isn’t perfect yet, but these corrections can be viewed in the same way as a language teacher correcting your usage of a particular word. We all need a helping hand sometimes!

Update: NELL is now on Twitter (@cmunell)! Updates consist of a word or phrase and a category she thinks it belongs to. Followers are asked to send in corrections to improve the process. Sometimes she’s totally correct (I think “John MCain” is a ()), and sometimes not so much (I think “US President-elect Barack Obama” is a ()). What is a politicianus? Oh, I think she meant US politician. Right.