ThreeDWorld (TDW)

A High-Fidelity, Multi-Modal Platform for Interactive Physical Simulation


TDW is a 3D virtual world simulation platform that utilizes state-of-the-art video game engine technology.

A TDW simulation consists of two components: a) the Build, a compiled executable running on the Unity3D Engine, which is responsible for image rendering, audio synthesis and physics simulations; and b) the Controller, an external Python interface that communicates with the Build.

Researchers write Controllers that send commands to the Build, which executes those commands and returns a broad range of data types representing the state of the virtual world.

TDW provides researchers with:

  • A general, flexible design that does not impose constraints on the types of use-cases it can support, nor force any particular metaphor on the user.
  • Support for multiple modalities -- visual rendering with near-photoreal image quality, coupled with superior audio rendering fidelity.
  • A comprehensive, highly extensible and thoroughly documented command and control Python API.
  • Multiple paradigms for object interaction, capable of generating physically-realistic behavior.

TDW is being used on a daily basis in multiple labs, supporting research that sits at the nexus of neuroscience, cognitive science and artificial intelligence.

Paper "ThreeDWorld: A Platform for Interactive Multi-Modal Physical Simulation" [ArXiv]

The TDW platform is publicly available. GitHub

Latest News:

AGENT: A Benchmark for Core Psychological Reasoning

A combined team from MIT, the MIT-IBM Watson AI Lab and Harvard University recently released AGENT: A Benchmark for Core Psychological Reasoning. The benchmark consists of a large dataset of procedurally generated 3D animations, synthesized with TDW, that probes key concepts of core intuitive psychology.

For further details, please visit the AGENT website.

ThreeDWorld Transport Challenge

We introduce a visually-guided and physics-driven task-and-motion planning benchmark, which we call the ThreeDWorld Transport Challenge. In this challenge, the Magnebot acts as an embodied agent and is spawned randomly in a simulated physical home environment. The agent must find a small set of objects scattered around the house, pick them up, and transport them to a desired final location.

For further details, please visit this website.

New Robotics-like API

With version 1.8 of TDW, we introduce a new high-level robotics-like API: Magnebot. The Magnebot can move around the scene and manipulate objects by picking them up with its "magnet" end-effectors. Magnebot's arms have 7 degrees of freedom, with 2 additional DOF coming from its torso, which can slide up and down and rotate around its central column. The simulation is entirely driven by physics.

At a low level, the Magnebot is driven by robotics commands such as set_revolute_target(), which will turn a revolute drive. The high-level API combines the low-level commands into "actions", such as grasp(target_object) or move_by(distance). Arm articulation is driven by an inverse kinematics (IK) system, where the arm will calculate a solution to reach a specified target position or object.
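
As an illustration of how a high-level action can be composed from low-level commands, here is a minimal sketch. It is not the Magnebot implementation: the wheel radius, the joint names, and the dictionary layout of each command are assumptions made for this example, and only the command name set_revolute_target comes from the text above.

```python
import math

# Hypothetical values for illustration only; not taken from the Magnebot API.
WHEEL_RADIUS = 0.1  # meters (assumed)
WHEEL_JOINTS = ["wheel_left_front", "wheel_left_back",
                "wheel_right_front", "wheel_right_back"]  # assumed joint names


def set_revolute_target(joint, target_degrees):
    """Build one low-level command as a plain dictionary (assumed layout)."""
    return {"$type": "set_revolute_target", "joint": joint, "target": target_degrees}


def move_by(distance, current_angles):
    """Sketch of a high-level action: turn every wheel far enough to cover `distance` meters."""
    # A wheel rolling without slipping covers 2 * pi * r meters per full turn.
    delta_degrees = distance / (2 * math.pi * WHEEL_RADIUS) * 360.0
    return [set_revolute_target(joint, current_angles[joint] + delta_degrees)
            for joint in WHEEL_JOINTS]


commands = move_by(0.5, {joint: 0.0 for joint in WHEEL_JOINTS})
for command in commands:
    print(command)
```

In this sketch one call to move_by() expands into four low-level drive targets, one per wheel, which is the general shape of an "action" as described above.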

The API also includes a wide variety of new interior scenes, populated by interactable objects and optimized for navigation by Magnebot. In addition, users can now use their own robot models in TDW, by importing standard URDF robot model descriptor files.

To see the Magnebot in action, watch this video.

Compare Features

Simulation Platform | Photorealism: Indoor Environments | Photorealism: Outdoor Environments | Physics: Rigidbody | Physics: Fast/Accurate Collisions | Physics: Softbody | Physics: Cloth | Physics: Fluid | Audio: Environmental | Audio: Physics-driven | Interaction: Non-Agent | Interaction: Agent-driven | Interaction: Human VR
DeepMind Lab

TDW Core Features

Near-photoreal Image Rendering

High-resolution 3D models, physically-based rendering materials and a sophisticated lighting model combine to create highly photorealistic rendered images.

Real-time Impact Sound Synthesis

Uniquely, TDW can synthesize and play collision impact sounds at runtime based on physics metadata such as object masses, materials and relative velocities.

Rich Set of API Commands

The TDW command API provides over 200 "building block" commands, allowing researchers to write controller programs for a wide range of use-cases.

Advanced Physics Behaviors

TDW is capable of simulating rigid bodies, soft bodies, cloth and fluids to provide complex physical interactions between scene objects.

Indirect Object Interaction Through Avatars

Avatars act as the embodiment of an AI agent; for example, one avatar type uses articulated arms to transport objects around the environment. TDW supports multiple avatars within a scene that can interact with each other.

Near-Photoreal Image Rendering

Our high-resolution 3D models are very detailed, which is important for photorealism, but at the same time are highly optimized for real-time simulation purposes.

TDW comes with a "core" library of 200+ models. In addition, our "full" photorealistic model library contains over 2000 models across 200 object categories. We are exploring making this library available for licensing; for details please go to this link. Users can also convert their own models for use inside TDW using our model conversion tools.

Many of our exterior environments are built using 3D model assets scanned from the real world (rock outcrops, ground surfaces).

TDW's lighting model uses a single light source to simulate the sun, for direct lighting. Indirect or environment lighting comes from HDRI (High Dynamic Range Image) 'skyboxes'.

TDW's 3D models use Physically-Based Rendering (PBR) materials that respond to light in a physically-correct manner. The realism of many of TDW's materials is further enhanced by the use of texture images scanned from actual physical materials.

Real-Time Impact Sound Synthesis

TDW can generate audio from information about physical events, such as the material types and impact parameters of colliding objects (velocities, normal vectors and masses).

Our PyImpact Python library generates these sounds via modal synthesis, with mode properties sampled from distributions conditioned upon properties of the sounding object. The mode distributions were measured from recordings of actual impacts. Further details
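
Modal synthesis models an impact sound as a sum of exponentially decaying sinusoids, one per vibration mode. The sketch below shows that idea only. It is not PyImpact's code, and the mode frequencies, decay rates and amplitudes are made-up values rather than samples from the measured distributions described above.

```python
import math

SAMPLE_RATE = 44100  # samples per second


def synthesize_impact(modes, duration=0.5, sample_rate=SAMPLE_RATE):
    """Sum decaying sinusoids; each mode is (frequency_hz, decay_per_second, amplitude)."""
    n_samples = int(duration * sample_rate)
    samples = []
    for i in range(n_samples):
        t = i / sample_rate
        value = sum(amp * math.exp(-decay * t) * math.sin(2 * math.pi * freq * t)
                    for freq, decay, amp in modes)
        samples.append(value)
    # Normalize to the range [-1, 1] so the result could be written to an audio file.
    peak = max(abs(s) for s in samples)
    return [s / peak for s in samples] if peak > 0 else samples


# Illustrative modes (assumed values, not measured data).
modes = [(440.0, 8.0, 1.0), (1130.0, 14.0, 0.6), (2890.0, 25.0, 0.3)]
sound = synthesize_impact(modes)
print(len(sound), max(abs(s) for s in sound))
```

Changing the decay rates in this toy version changes how quickly the sound rings out, which is one of the cues a listener could use to judge material.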

In human perceptual experiments, listeners could not distinguish our synthetic impact sounds from real impact sounds, and could accurately judge physical properties from the synthetic audio.

J Traer, M Cusimano, JH McDermott, A Perceptually Inspired Generative Model of Rigid-Body Contact Sounds, Digital Audio Effects (DAFx), 2019

Rich Set of API Commands

Users can interact directly with objects in the scene using our Python command API.

Controller programs send commands over TCP/IP to the TDW runtime executable, or "build". The build executes those commands and returns data back to the controller representing the state of the virtual world. TDW commands can be sent in a list per simulation step rather than one at a time, enabling arbitrarily complex behavior.
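
The list-per-step pattern can be sketched without a running build. In this illustration the FakeBuild class stands in for the real executable, and the command names ("place", "nudge") and their fields are placeholders invented for the example. The real command set is defined by the TDW command API, and the real transport is a network socket rather than a function call.

```python
import json


class FakeBuild:
    """Stand-in for the build: applies every command in a step, then reports state."""

    def __init__(self):
        self.positions = {}
        self.frame = 0

    def step(self, message):
        # One message carries the whole list of commands for this simulation step.
        for command in json.loads(message):
            if command["$type"] == "place":    # hypothetical command
                self.positions[command["id"]] = command["x"]
            elif command["$type"] == "nudge":  # hypothetical command
                self.positions[command["id"]] += command["dx"]
        self.frame += 1
        return json.dumps({"frame": self.frame, "positions": self.positions})


def communicate(build, commands):
    """Serialize a list of commands, send it as one message, and decode the reply."""
    return json.loads(build.step(json.dumps(commands)))


build = FakeBuild()
state = communicate(build, [
    {"$type": "place", "id": "box", "x": 0.0},
    {"$type": "place", "id": "ball", "x": 3.0},
    {"$type": "nudge", "id": "box", "dx": 0.5},
])
print(state)
```

All three commands take effect within a single step, so the controller pays for one round trip instead of three.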

Here a force is being applied to a chair object, causing it to collide with a fridge object. The code generating the behavior is shown on the left, and the result is shown on the right.
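
A minimal sketch of the kind of command that drives such an interaction, written as a plain dictionary so that it runs without a build. The command name apply_force_to_object and its field names are assumptions to verify against the command API documentation, and the object ID, positions and force magnitude are arbitrary values chosen for the example.

```python
import json
import math

CHAIR_ID = 1  # arbitrary ID for this example


def force_toward(object_id, origin, target, magnitude):
    """Build a force command pushing `object_id` from `origin` toward `target`."""
    direction = [t - o for o, t in zip(origin, target)]
    length = math.sqrt(sum(d * d for d in direction))
    unit = [d / length for d in direction]
    return {"$type": "apply_force_to_object",  # assumed command name
            "id": object_id,
            "force": {"x": unit[0] * magnitude,
                      "y": unit[1] * magnitude,
                      "z": unit[2] * magnitude}}


# Chair at the origin, fridge 3 meters away along +x and 4 along +z (made-up positions).
command = force_toward(CHAIR_ID, origin=(0, 0, 0), target=(3, 0, 4), magnitude=10.0)
print(json.dumps(command))
```

The helper normalizes the chair-to-fridge direction before scaling it, so the magnitude argument sets the strength of the push regardless of how far apart the two objects are.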

Rigid Body Physics and Collisions

Unity's built-in physics engine (PhysX) handles rigid body physics including the collisions between rigid bodies.

API commands can alter the physics time step to balance the accuracy of physics behavior against real-time performance, or modify behavior by adjusting mass, friction, etc. per-object at runtime.
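
The accuracy side of this trade-off can be seen in a toy integrator. The sketch below is not PhysX and uses no TDW commands. It simply drops a point mass under gravity with a semi-implicit Euler step and compares the result with the closed-form answer at two step sizes.

```python
G = 9.81  # gravitational acceleration, m/s^2


def fall_distance(time_step, total_time=1.0):
    """Integrate free fall with semi-implicit Euler; return (distance fallen, number of steps)."""
    steps = round(total_time / time_step)
    velocity = 0.0
    distance = 0.0
    for _ in range(steps):
        velocity += G * time_step         # update velocity first
        distance += velocity * time_step  # then advance position with the new velocity
    return distance, steps


exact = 0.5 * G * 1.0 ** 2  # closed-form distance after one second
for dt in (0.04, 0.01):
    distance, steps = fall_distance(dt)
    print(f"dt={dt}: {steps} steps, error={abs(distance - exact):.4f} m")
```

In this toy case the smaller step needs four times as many updates and yields about a quarter of the error, which is the kind of cost a researcher weighs when choosing a time step.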

NVidia Flex Uniform Particle Representation

Flex uses a uniform particle-based object representation that allows rigid bodies, soft bodies, cloth objects and fluids to interact.

On the left, we use the cloth simulation to drop a rubbery sheet which collides with a rigid body object. On the right, balls of increasing mass are dropped into a pool of water, causing greater and greater displacement and splashing.

This type of unified representation can help machine learning models use both the underlying physics and rendered images to learn a physical and visual representation of the world through interactions with objects in the world.

Advanced Physics Benchmark Dataset

Using the TDW platform, we have created a comprehensive benchmark for training and evaluation of physically-realistic forward prediction algorithms, which will be released as part of the TDW package.

Once completed, this dataset will contain a large and varied collection of physical scene trajectories, including all data from visual, depth, and force sensors, high-level semantic label information for each frame, as well as latent generative parameters and code controllers for all situations.

This dataset goes well beyond existing related benchmarks, providing scenarios with large numbers of complex real-world object geometries, photo-realistic textures, as well as a variety of rigid, soft-body, cloth, and fluid materials.

The codebase for generating the dataset will be made publicly available in conjunction with the TDW platform.

Indirect Object Interaction Through Avatars

In TDW, avatars are the embodiment of AI agents within a scene.

Avatars can take the form of simple disembodied cameras for generating egocentric-view rendered images, segmentation maps, depth maps, etc.

Avatars using simple geometric primitives such as cubes or spheres can move around the environment, acting as basic embodied agents. These avatars are well-suited to basic algorithm prototyping.

More complex embodied avatars are possible with user-defined physical structures and physically-mapped action spaces.

The Magnebot robot's mobility and arm articulation actions are driven by physics, as opposed to any form of pre-scripted animation, and controlled using high-level API commands. Here Magnebot uses its "magnet" end-effector to remove an object from a table. It also picks up a series of objects and places them into a container held by its other magnet; it then carries them to a different room and pours them out again.

Research Use Cases

TDW has been used in a number of labs within MIT and Stanford, as well as IBM.

Visual Recognition Transfer

A learned visual feature representation, trained on a TDW image classification dataset comparable to ImageNet, was transferred to fine-grained image classification and object detection tasks.

Multi-modal Physical Scene Understanding

TDW's audio impact synthesis generated a synthetic dataset of impact sounds used to test material and mass classification.

Learnable Physics Models

Using TDW's ability to handle complex physical collisions and non-rigid deformations, agents learn to predict physical dynamics in novel settings.

Visual Learning in Curious Agents

Intrinsically-motivated agents based on TDW's high-quality rendering and flexible avatar models exhibit rudimentary self-awareness and curiosity.

Social Agents and Virtual Reality

In experiments on animate attention, both human observers in VR and a neural network agent embodying concepts of intrinsic curiosity found animacy to be more "interesting".

Frequently Asked Questions

Find answers to frequently asked questions about TDW.

  • Fast! Here are some basic benchmarks:

    Benchmark | Quality | Image Size | FPS
    Object transform data, 100 objects | N/A | N/A | 761
    Image capture | Low | 256x256 | 380
    Image capture | High | 1024x1024 | 41
    Move avatar per frame | Low | 256x256 | 160
    Flex Benchmark (Windows): FlexParticles, Transform, CameraMatrices, and Collisions | N/A | N/A | 204

    Full benchmark details

  • If you want to contribute code, you can create a new branch and then open a PR from your fork of the TDW repo. Please note, however, that the code for the simulation binary (the "build") is still closed-source, meaning that you won't be able to directly modify the API, fix bugs in the build, etc. If you have suggestions, feature requests, bug reports, etc., please add them as GitHub Issues.

    However, if you believe that your particular use case absolutely requires access to the backend source code, please refer to the discussion on our repo regarding this: Requesting access to TDW C# source code

  • Maybe! See our README: ThreeDWorld (TDW)

    • Windows, OS X, or Linux.
    • For high-fidelity rendering and particle-based physics simulations, an NVIDIA GPU.
    • Python 3.6+
  • TDW's team is working full-time on the project, so expect feature updates every few weeks or so.

  • Yes. You can optionally run your Python code on a different machine. Additionally, the repo contains a Docker file for TDW. Further details on Docker container.

Our Team

Development Team

Jeremy Schwartz

Project Lead, MIT BCS

Seth Alter

Lead Developer, MIT BCS

Principal Investigators

Jim DiCarlo


Josh McDermott


Josh Tenenbaum


Dan Yamins

Stanford NeuroAILab

Dan Gutfreund

MIT-IBM Watson AI Lab

Chuang Gan

MIT-IBM Watson AI Lab


James Traer


Jonas Kubilius


Martin Schrimpf


Abhishek Bhandwaldar

MIT-IBM Watson AI Lab

Julian DeFreitas

Vision Sciences Lab, Harvard

Damian Mrowca

Stanford NeuroAILab

Michael Lingelbach

Stanford NeuroAILab

Megumi Sano

Stanford NeuroAILab

Dan Bear

Stanford NeuroAILab

Kuno Kim

Stanford NeuroAILab

Nick Haber

Stanford NeuroAILab

Chaofei Fan

Stanford NeuroAILab

Brain and Cognitive Sciences, MIT

If you are interested in using TDW in your research, please contact:

Jeremy Schwartz,
TDW Project Lead

43 Vassar St
Cambridge, MA 02139