Aided by A.I. Language Models, Google’s Robots Are Getting Smart

A one-armed robot stood in front of a table. On the table sat three plastic figurines: a lion, a whale and a dinosaur.


An engineer handed the robot an instruction: “Pick up the extinct animal.”

The robot whirred for a minute, then its arm stretched and its claw expanded and descended. It grabbed the dinosaur.

Until very recently, this demonstration, which I witnessed during a podcast interview at Google’s robotics group in Mountain View, Calif., last week, would have been unthinkable. Robots couldn’t reliably manipulate objects they had never seen before, and they certainly weren’t capable of making the logical leap from “extinct animal” to “plastic dinosaur.”

But a silent revolution is underway in robotics, one that piggybacks on recent improvements in so-called large language models — the same type of artificial intelligence system that powers ChatGPT, Bard and other chatbots.

Google has recently begun integrating state-of-the-art language models into its robots, giving them the equivalent of artificial brains. The once-secret project has made the robots far smarter and given them new powers of comprehension and problem-solving.

I got a taste of that advancement during a private display of Google’s latest robotics model, named RT-2. The model, which is being introduced on Friday, amounts to a first step toward what Google executives described as a big shift in the way robots are designed and programmed.

“We’ve had to reconsider our entire research program as a result of this change,” said Vincent Vanhoucke, Google DeepMind’s head of robotics. “A lot of the things that we were working on before have been entirely invalidated.”

Robots still fall short of human-level dexterity and fail at some basic tasks, but Google’s use of A.I. language models to give robots new skills of reasoning and improvisation indicates a potential breakthrough, said Ken Goldberg, a robotics professor at the University of California, Berkeley.


A New Generation of Chatbots

A brave new world. A new crop of chatbots powered by artificial intelligence has spurred a frenzy to assess whether the technology could upend the economics of the internet, turning today’s powerhouses into has-beens and generating the industry’s next giants. Here are the bots to know:



ChatGPT, the artificial intelligence language model from the research lab OpenAI, has been making headlines since November for its ability to respond to complex questions, write poetry, generate code, plan vacations and translate languages. GPT-4, the latest version introduced in mid-March, can even respond to images (and ace the Uniform Bar Exam).


Two months after ChatGPT’s debut, Microsoft, OpenAI’s principal investor and partner, added a similar chatbot, capable of having open-ended text conversations on nearly any topic, to its Bing internet search engine. But it was the bot’s occasionally inaccurate and erratic responses that drew much of the attention after its release.


Google’s chatbot, named Bard, was released in March to a limited number of users in the United States and Britain. Originally conceived as a creative tool designed to draft emails and poems, it can generate ideas, write blog posts and answer questions with facts or opinions.



The search giant Baidu announced China’s first major challenger to ChatGPT in March. The debut of Ernie, short for Enhanced Representation through Knowledge Integration, turned out to be a flop after a promised “live” demonstration of the bot was revealed to have been prerecorded.

“What’s very impressive is how it links semantics with robots,” he said. “That’s very exciting for robotics.”

To comprehend the significance of this, it helps to know a little about how robots have customarily been made.

For years, the way engineers at Google and other firms trained robots to accomplish a mechanical operation — flipping a burger, for example — was by programming them with a specific series of instructions. (Lower the spatula 6.5 inches, slide it ahead until it hits resistance, raise it 4.2 inches, spin it 180 degrees, and so on.) Robots would then practice the assignment again and again, with engineers modifying the instructions each time until they got it right.
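That kind of hard-coded choreography can be sketched in a few lines. The RobotArm class and its method names below are hypothetical, invented purely to illustrate the approach; they don’t correspond to any real robot-control API:

```python
# A minimal sketch of traditional, hard-coded robot programming.
# The RobotArm class and its units are hypothetical, for illustration only.

class RobotArm:
    """Toy stand-in for a real robot-control interface."""

    def __init__(self):
        self.height = 0.0   # inches above the griddle surface
        self.rotation = 0   # degrees
        self.log = []       # record of every commanded motion

    def lower(self, inches):
        self.height -= inches
        self.log.append(f"lower {inches}")

    def raise_arm(self, inches):
        self.height += inches
        self.log.append(f"raise {inches}")

    def slide_until_resistance(self):
        # A real controller would read force sensors here.
        self.log.append("slide forward")

    def rotate(self, degrees):
        self.rotation = (self.rotation + degrees) % 360
        self.log.append(f"rotate {degrees}")


def flip_burger(arm):
    # Every motion is spelled out by an engineer in advance.
    arm.lower(6.5)
    arm.slide_until_resistance()
    arm.raise_arm(4.2)
    arm.rotate(180)


arm = RobotArm()
flip_burger(arm)
print(arm.log)  # the fixed instruction sequence, step by step
```

Every number here is an engineer’s decision. Switching from burgers to pancakes means rewriting the sequence by hand, which is why this approach scales so poorly.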


This strategy worked for specific, limited uses. But training robots this way is slow and labor-intensive. It requires collecting masses of data from real-world experiments. And if you wanted to teach a robot to do something new — to flip a pancake instead of a burger, say — you normally had to reprogram it from scratch.

Partly because of these restrictions, hardware robots have improved less swiftly than their software-based cousins. OpenAI, the maker of ChatGPT, disbanded its robotics team in 2021, citing slow progress and a lack of high-quality training data. In 2017, Google’s parent company, Alphabet, sold Boston Dynamics, a robotics business it had bought, to the Japanese tech conglomerate SoftBank. (Boston Dynamics is now owned by Hyundai and seems to exist largely to make viral videos of humanoid robots performing unnerving feats of agility.)

In recent years, researchers at Google had an idea. What if, instead of being programmed for specific tasks one by one, robots could use an A.I. language model — one that has been trained on enormous swaths of online content — to learn new abilities for themselves?

“We started playing with these language models around two years ago, and then we realized that they have a lot of knowledge in them,” said Karol Hausman, a Google research scientist. “So we started connecting them to robots.”

Google’s initial attempt to combine language models and physical robots was a research project called PaLM-SayCan, which was revealed last year. It drew some attention, but its usefulness was limited. The robots lacked the ability to interpret images — an essential skill if you want them to be able to navigate the world.

They could write out step-by-step instructions for various tasks, but they couldn’t turn those steps into actions. Google’s new robotics model, RT-2, can do precisely that. It’s what the company calls a “vision-language-action” model — an A.I. system that can not only see and understand the world around it but also tell a robot how to move.

It does so by turning the robot’s actions into a series of numbers — a process called tokenizing — and putting those tokens into the same training data as the language model. Eventually, much as ChatGPT or Bard learns to guess what words should come next in a poem or a history essay, RT-2 can learn to estimate how a robot’s arm should move to pick up a ball or dump an empty drink can into the recycling bin.
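The tokenizing step can be sketched roughly as follows: each continuous action value (say, a small change in the arm’s position) is mapped to one of a fixed number of discrete bins, and the bin index becomes a token the model predicts like any other word. The bin count, value ranges and action fields below are illustrative assumptions, not Google’s published specification:

```python
# Sketch: turning a continuous robot action into discrete tokens so it
# can sit in the same training stream as text tokens.
# Field names, ranges and the bin count are illustrative, not Google's spec.

NUM_BINS = 256  # discretize each action dimension into 256 bins


def to_token(value, low, high, num_bins=NUM_BINS):
    """Map a continuous value in [low, high] to an integer bin index."""
    value = max(low, min(high, value))    # clamp out-of-range values
    frac = (value - low) / (high - low)   # normalize to [0, 1]
    return min(int(frac * num_bins), num_bins - 1)


def tokenize_action(dx, dy, dz, d_rot, gripper):
    """One action step -> a short sequence of integer tokens."""
    return [
        to_token(dx, -0.1, 0.1),         # arm displacement, meters
        to_token(dy, -0.1, 0.1),
        to_token(dz, -0.1, 0.1),
        to_token(d_rot, -180.0, 180.0),  # rotation, degrees
        to_token(gripper, 0.0, 1.0),     # 0 = open, 1 = closed
    ]


tokens = tokenize_action(0.0, 0.05, -0.02, 90.0, 1.0)
print(tokens)  # [128, 192, 102, 192, 255]
```

During training, these integer tokens are appended to the text and image tokens, so predicting the next “word” and predicting the next arm movement become the same problem.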

“In other words, this model can learn to speak robot,” Mr. Hausman added.

In an hourlong demonstration, which took place in a Google office kitchen cluttered with items from a dollar store, my podcast co-host and I watched RT-2 accomplish a number of impressive tasks. One was successfully following complex instructions like “move the Volkswagen to the German flag,” which RT-2 did by finding and picking up a model VW Bus and setting it down on a miniature German flag several feet away.

It also proved capable of following instructions in languages other than English, and even making abstract connections between related concepts. Once, when I wanted RT-2 to pick up a soccer ball, I directed it to “pick up Lionel Messi.” RT-2 got it right on the first try.


The robot wasn’t perfect. It wrongly identified the flavor of a can of LaCroix placed on the table in front of it. (The can was lemon; RT-2 guessed orange.) On another occasion, when asked what kind of fruit was on a table, the robot simply answered, “White.” (It was a banana.) A Google spokeswoman said the robot had used a cached answer to a previous tester’s question because its Wi-Fi had briefly gone out.

Google has no immediate plans to sell RT-2 robots or release them more widely, but its researchers hope these new language-equipped machines will eventually be useful for more than parlor tricks. Robots with built-in language models could be put to work in warehouses, used in medicine or even deployed as household assistants — folding laundry, unloading the dishwasher, tidying up around the house, they said.

“This really opens up using robots in environments where people are,” Mr. Vanhoucke added. “In office environments, in home environments, in all the places where there are a lot of physical tasks that need to be done.”

Of course, moving stuff around in the messy, chaotic physical world is tougher than doing so in a controlled lab. And considering that A.I. language models regularly make mistakes or produce nonsensical answers — which academics call hallucination or confabulation — employing them as the brains of robots could present new concerns.

But Mr. Goldberg, the Berkeley robotics professor, said those risks were still remote.

“We’re not talking about letting these things run loose,” he continued. “In these lab environments, they’re just trying to push some objects around on a table.”

Google, for its part, said RT-2 was equipped with a number of safety features. In addition to a big red button on the back of every robot — which stops the robot in its tracks when pressed — the system uses sensors to avoid bumping into people or objects.

The A.I. software built into RT-2 has its own safeguards, which it can use to prevent the robot from doing anything harmful. One benign example: Google’s robots can be trained not to pick up containers with water in them, because water can damage their circuitry if it spills.

If you’re the kind of person who worries about A.I. going rogue — and Hollywood has given us plenty of reasons to fear that scenario, from the original “Terminator” to last year’s “M3gan” — the idea of making robots that can reason, plan and improvise on the fly probably strikes you as a terrible idea.

But at Google, it’s the kind of idea researchers are praising. After years in the wilderness, hardware robots are back — and they have their chatbot brains to thank.

Must read:AWS launched a new healthcare focused services powered by generative AI
