In a preprint paper published this week, researchers from Facebook, the Georgia Institute of Technology, and Oregon State University describe a new artificial intelligence task: navigating a 3D environment by following natural language commands.
The task, which the researchers call Vision-and-Language Navigation in Continuous Environments (VLN-CE), is carried out in Facebook's Habitat simulator, which can train robot assistants to operate in settings that model real-world environments.
An assistant 0.2 meters in diameter and 1.5 meters tall is placed inside environments from the Matterport3D dataset, a collection of 90 environments captured in more than 10,800 panoramas with corresponding 3D meshes.
At each step along a path, the robot assistant must take one of four actions (move forward 0.25 meters, turn left or right 15 degrees, or stop at the goal position) and learn to avoid getting stuck on obstacles such as chairs and tables.
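For illustration, here is a minimal Python sketch of how this discrete action space and the agent's embodiment parameters might be represented; the class and field names are hypothetical and are not drawn from the paper's code or from Habitat's API.

```python
from dataclasses import dataclass
from enum import Enum


class Action(Enum):
    """The four discrete actions available to the agent at each step."""
    MOVE_FORWARD = 0   # advance 0.25 m
    TURN_LEFT = 1      # rotate 15 degrees to the left
    TURN_RIGHT = 2     # rotate 15 degrees to the right
    STOP = 3           # declare that the goal position has been reached


@dataclass
class AgentConfig:
    """Embodiment parameters of the simulated assistant (values as stated in the article)."""
    diameter_m: float = 0.2
    height_m: float = 1.5
    forward_step_m: float = 0.25
    turn_angle_deg: float = 15.0
```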
The research team distilled these environments into 4,475 trajectories of 4 to 6 nodes each, where the nodes correspond to 360-degree panoramic images captured at locations that indicate navigability.
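A rough sketch of how one such trajectory could be structured as data follows; the names and fields are hypothetical and chosen for clarity, not taken from the paper's actual data format.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Node:
    """A point along a trajectory (illustrative fields only)."""
    position: Tuple[float, float, float]   # (x, y, z) location in the 3D mesh
    panorama_id: str                       # 360-degree panorama captured at this location


@dataclass
class Trajectory:
    """One of the 4,475 trajectories, each made up of 4 to 6 nodes."""
    instruction: str       # natural language command the agent must follow
    nodes: List[Node]


def is_valid(traj: Trajectory) -> bool:
    """Check the node-count range described in the article."""
    return 4 <= len(traj.nodes) <= 6
```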
They used this data to train two artificial intelligence models: a sequence-to-sequence model, consisting of a policy that takes a visual observation and an instruction representation and uses them to predict an action, and a second, cross-modal attention model that tracks observations and makes decisions based on the instructions and visual features.
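As a rough illustration of the first of these, here is a minimal sequence-to-sequence policy sketch in PyTorch; the architecture, layer sizes, and names are assumptions made for clarity, not the paper's implementation.

```python
import torch
import torch.nn as nn


class Seq2SeqPolicy(nn.Module):
    """Illustrative sequence-to-sequence navigation policy (not the paper's code).

    Encodes the instruction with a recurrent encoder, fuses it with an embedding
    of the current visual observation, and predicts scores over the four actions.
    """

    def __init__(self, vocab_size=1000, embed_dim=64, hidden_dim=128,
                 visual_dim=256, num_actions=4):
        super().__init__()
        self.word_embed = nn.Embedding(vocab_size, embed_dim)
        self.instr_encoder = nn.GRU(embed_dim, hidden_dim, batch_first=True)
        self.visual_fc = nn.Linear(visual_dim, hidden_dim)
        self.policy_rnn = nn.GRUCell(2 * hidden_dim, hidden_dim)
        self.action_head = nn.Linear(hidden_dim, num_actions)

    def forward(self, instruction_tokens, visual_features, prev_state=None):
        # instruction_tokens: (batch, seq_len) token ids of the command
        # visual_features:    (batch, visual_dim) features of the current view
        _, instr_state = self.instr_encoder(self.word_embed(instruction_tokens))
        instr_repr = instr_state.squeeze(0)                 # (batch, hidden_dim)
        visual_repr = torch.relu(self.visual_fc(visual_features))
        fused = torch.cat([instr_repr, visual_repr], dim=-1)
        state = self.policy_rnn(fused, prev_state)          # recurrent agent state
        logits = self.action_head(state)                    # scores for the 4 actions
        return logits, state
```

In use, the logits would be converted into one of the four actions at every time step, with the recurrent state carried forward so the policy can remember what it has already seen along the path.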