Do Embodied Agents Dream of Pixelated Sheep?

Embodied Decision Making using Language Guided World Modelling

Kolby Nottingham Prithviraj Ammanabrolu Alane Suhr

Yejin Choi Hannaneh Hajishirzi Sameer Singh Roy Fox

Reinforcement learning (RL) agents typically learn tabula rasa, without prior knowledge of the world. However, if initialized with knowledge of high-level subgoals and transitions between subgoals, RL agents could use this Abstract World Model (AWM) for planning and exploration. We propose using few-shot large language models (LLMs) to hypothesize an AWM, which is then verified through world experience, to improve the sample efficiency of RL agents. Our DECKARD agent applies LLM-guided exploration to item crafting in Minecraft in two phases: (1) the Dream phase, where the agent uses an LLM to decompose a task into a sequence of subgoals, the hypothesized AWM; and (2) the Wake phase, where the agent learns a modular policy for each subgoal and verifies or corrects the hypothesized AWM. Our method of hypothesizing an AWM with LLMs and then verifying the AWM based on agent experience not only increases sample efficiency over contemporary methods by an order of magnitude but is also robust to, and corrects, errors in the LLM, successfully blending noisy internet-scale information from LLMs with knowledge grounded in environment dynamics.


DECKARD

We present DECKARD (DECision-making for Knowledgable Autonomous Reinforcement-learning Dreamers), an agent that hypothesizes an Abstract World Model (AWM) over subgoals by few-shot prompting an LLM, then exploits the AWM for exploration and verifies the AWM with grounded experience.

Dream and Wake Phases

Each iteration, DECKARD alternates between Dream and Wake phases. During the Dream phase, DECKARD samples the next goal predicted by the LLM. Then, during the Wake phase, it attempts to reach the new node and verify the accuracy of the predicted transition.
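The alternation described above can be illustrated with a small toy loop. This is only a sketch under strong simplifying assumptions, not the authors' implementation: the LLM is replaced by a hypothetical lookup table (`LLM_GUESSES`), the environment by a hypothetical set of true transitions (`TRUE_EDGES`), and "reaching a node" is a single boolean check rather than training an RL policy. All function and variable names here are invented for illustration.

```python
# Toy Dream/Wake loop. Everything below is a hypothetical stand-in:
# a dict plays the role of the LLM, a set plays the role of the environment.

LLM_GUESSES = {  # what the stand-in "LLM" predicts can be reached from each item
    "log": ["planks", "diamond"],  # "diamond" is a deliberately wrong guess
    "planks": ["stick", "crafting_table"],
}
TRUE_EDGES = {  # transitions that actually work in the toy environment
    ("log", "planks"),
    ("planks", "stick"),
    ("planks", "crafting_table"),
}


def llm_predict(node):
    """Stand-in for few-shot prompting an LLM about what follows `node`."""
    return LLM_GUESSES.get(node, [])


def try_transition(src, dst):
    """Stand-in for the Wake phase attempt to reach `dst` from `src`."""
    return (src, dst) in TRUE_EDGES


def dream(reached, awm):
    """Dream phase: collect untested hypothesized edges and pick one."""
    candidates = []
    for node in sorted(reached):
        for nxt in llm_predict(node):
            edge = (node, nxt)
            if edge not in awm["verified"] and edge not in awm["rejected"]:
                awm["hypothesized"].add(edge)
                candidates.append(edge)
    # Deterministic choice for the sketch; a real agent would sample.
    return min(candidates) if candidates else None


def run_sketch(start, max_iters=20):
    awm = {"hypothesized": set(), "verified": set(), "rejected": set()}
    reached = {start}
    for _ in range(max_iters):
        edge = dream(reached, awm)
        if edge is None:
            break  # nothing left to test
        src, dst = edge
        # Wake phase: test the hypothesized transition against experience.
        if try_transition(src, dst):
            awm["verified"].add(edge)
            reached.add(dst)
        else:
            awm["rejected"].add(edge)  # correct the LLM's mistake
    return awm, reached


awm, reached = run_sketch("log")
```

In this toy run the incorrect `log -> diamond` guess ends up in `awm["rejected"]`, while the three correct transitions are verified, mirroring the idea that grounded experience corrects errors in the hypothesized AWM.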

Demo

Our final agent learns modular policies for each node verified during the Dream/Wake phases. DECKARD's modularity makes it possible to collect and craft arbitrary Minecraft items at inference time.
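One way to picture how modular policies might be composed at inference time is to search the verified AWM for a chain of subgoals and then run one policy per subgoal. The sketch below assumes a hypothetical representation (verified edges as `(source, destination)` pairs, and one callable policy per destination node) and uses plain breadth-first search; the actual agent's planning and policy interfaces may differ.

```python
from collections import deque


def plan_subgoals(verified_edges, start, target):
    """Breadth-first search over verified edges; returns a list of nodes or None."""
    graph = {}
    for src, dst in verified_edges:
        graph.setdefault(src, []).append(dst)
    queue = deque([[start]])
    seen = {start}
    while queue:
        path = queue.popleft()
        if path[-1] == target:
            return path
        for nxt in sorted(graph.get(path[-1], [])):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(path + [nxt])
    return None  # target not reachable through verified transitions


def execute(path, policies):
    """Run the (hypothetical) per-node policy for every subgoal after the start."""
    return [policies[node]() for node in path[1:]]


# Hypothetical verified AWM and placeholder policies that just report themselves.
edges = {("log", "planks"), ("planks", "stick"), ("planks", "crafting_table")}
policies = {
    dst: (lambda name=dst: f"ran policy for {name}") for _, dst in edges
}

path = plan_subgoals(edges, "log", "stick")
trace = execute(path, policies)
```

Because each policy is tied to a single node, any item reachable through verified transitions can be targeted by changing only the `target` argument.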
