AI learns language with the help of a child

Today I want to talk to you about an experiment conducted by AI researchers to see whether AI can learn language the way human babies do. To find out, they put a head-mounted camera on an infant, aiming to understand how babies comprehend language with the help of the visual and auditory cues they get from their environment, and whether an AI model could learn along the same lines. We'll discuss that experiment in a bit.

If you think about it, learning a language is such a complex task, and yet human babies do it pretty easily! Contrary to the belief that parents or caretakers explicitly teach a child a language, children acquire language through interaction, not only with their parents and other adults but also with other children. Children who grow up surrounded by conversation will acquire the language being used around them. And it is just as easy for a child to acquire two or more languages at the same time, as long as they regularly interact with speakers of those languages. As a child listens to the language spoken around them and interacts with the world through touch and sight, they naturally learn the names of objects by associating them with the sounds they hear. When a parent says something, some of the words used are likely referring to what the child can see, so the infant's comprehension is built by linking visual and language cues.

With this hypothesis in mind, Wai Keen Vong and other researchers from New York University conducted an experiment. In the study, Sam, a baby boy living near Adelaide in Australia, wore a head-mounted camera for around one hour twice each week from the age of six months to around two years, gathering experiences from the infant's perspective.

The researchers then trained a multimodal AI model on frames from the video paired with the words spoken to Sam, transcribed from the recordings. The footage contained about 250,000 word instances (including repetitions), each linked with video frames of what the child saw when the words were spoken.
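To get a feel for how this kind of training works, here is a minimal sketch of CLIP-style contrastive learning on (frame, word) pairs. The tiny encoders, dimensions, and random stand-in data are illustrative assumptions for the sketch, not the exact model or data used in the study.

```python
# Minimal sketch of contrastive training on (video frame, spoken word) pairs.
# Encoders, sizes, and data here are illustrative placeholders.
import torch
import torch.nn as nn
import torch.nn.functional as F

class TinyImageEncoder(nn.Module):
    def __init__(self, embed_dim=64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(3, 16, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(16, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
            nn.Linear(32, embed_dim),
        )

    def forward(self, images):
        # Normalize so similarity is a simple dot product (cosine similarity).
        return F.normalize(self.net(images), dim=-1)

class TinyWordEncoder(nn.Module):
    def __init__(self, vocab_size=1000, embed_dim=64):
        super().__init__()
        self.embedding = nn.Embedding(vocab_size, embed_dim)

    def forward(self, word_ids):
        return F.normalize(self.embedding(word_ids), dim=-1)

def contrastive_loss(image_emb, word_emb, temperature=0.07):
    # Similarity of every frame in the batch against every word in the batch.
    logits = image_emb @ word_emb.t() / temperature
    targets = torch.arange(len(image_emb))
    # Matched (frame, word) pairs sit on the diagonal; the rest act as negatives.
    return (F.cross_entropy(logits, targets) +
            F.cross_entropy(logits.t(), targets)) / 2

# One illustrative training step on a random batch.
image_encoder, word_encoder = TinyImageEncoder(), TinyWordEncoder()
optimizer = torch.optim.Adam(
    list(image_encoder.parameters()) + list(word_encoder.parameters()), lr=1e-3)

frames = torch.randn(8, 3, 64, 64)        # stand-in for video frames
word_ids = torch.randint(0, 1000, (8,))   # stand-in for transcribed word tokens
loss = contrastive_loss(image_encoder(frames), word_encoder(word_ids))
optimizer.zero_grad()
loss.backward()
optimizer.step()
print(f"contrastive loss: {loss.item():.3f}")
```

Over many such batches, words that keep co-occurring with similar scenes end up close to those scenes in the shared embedding space, which is the intuition behind linking words to what the child was looking at.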

After training the AI, they evaluated the system by presenting it with a target word and an array of four different image options and asking it to select the image matching the word. With input from just a single child's experience, the algorithm was able to capture how words relate to each other and link words to images and concepts. This suggests that, for toddlers, hearing words and matching them to what they are seeing helps build their vocabulary.
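Below is a rough sketch of how that four-way test could be scored: embed the target word and the four candidate images, then pick the image most similar to the word. The random embeddings here are placeholders; in the actual evaluation they would come from trained encoders like the ones sketched above.

```python
# Sketch of the four-alternative evaluation: score one target word against
# four candidate images and pick the best match. Embeddings are stand-ins.
import torch
import torch.nn.functional as F

def pick_image(word_emb, candidate_embs):
    # With normalized embeddings, cosine similarity is just a dot product.
    scores = candidate_embs @ word_emb
    return scores.argmax().item(), scores

word_emb = F.normalize(torch.randn(64), dim=-1)        # target word embedding
candidate_embs = F.normalize(torch.randn(4, 64), dim=-1)  # four image embeddings
choice, scores = pick_image(word_emb, candidate_embs)
print(f"model picks image {choice}, scores: {scores.tolist()}")
```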

“Our results demonstrate how recent algorithmic advances paired with one child’s naturalistic experience has the potential to reshape our understanding of early language and concept acquisition,” Dr Vong said. “Combining these cues is what enables contrastive learning to gradually determine which words belong with which visuals and to capture the learning of a child’s first words,” he explained.

Linguists around the world are still studying how infants master such a complex task so easily, and there is no definitive answer yet. But overall, combining AI with real-life experiences is a powerful new method for studying both machine and human brains, and it could reshape our understanding of how our brains learn language and concepts.
