TDW is a 3D virtual world simulation platform that uses state-of-the-art video game engine technology.
A TDW simulation consists of two components: a) the Build, a compiled executable running on the Unity3D Engine, which is responsible for image rendering, audio synthesis and physics simulation; and b) the Controller, an external Python interface used to communicate with the Build.
Researchers write Controllers that send commands to the Build, which executes those commands and returns a broad range of data types representing the state of the virtual world.
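A minimal controller is only a few lines of Python. The sketch below uses the `tdw` package's documented `Controller` class and the `TDWUtils.create_empty_room` convenience wrapper; it is illustrative rather than prescriptive:

```python
from tdw.controller import Controller
from tdw.tdw_utils import TDWUtils

# Connect to the build (by default, launching it locally).
c = Controller()

# Commands are dictionaries; a list of them is executed in one simulation step.
c.communicate([TDWUtils.create_empty_room(12, 12)])

# Shut down the build.
c.communicate({"$type": "terminate"})
```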
TDW provides researchers with:
- A general, flexible design that does not impose constraints on the types of use-cases it can support, nor force any particular metaphor on the user.
- Support for multiple modalities -- visual rendering with near-photoreal image quality, coupled with high-fidelity audio rendering.
- A comprehensive, highly extensible and thoroughly documented command and control Python API.
- Multiple paradigms for object interaction, capable of generating physically-realistic behavior.
TDW is being used on a daily basis in multiple labs, supporting research that sits at the nexus of neuroscience, cognitive science and artificial intelligence.
Paper "ThreeDWorld: A Platform for Interactive Multi-Modal Physical Simulation" [ArXiv]
The TDW platform is publicly available. GitHub
Latest News:
AGENT: A Benchmark for Core Psychological Reasoning
A combined team from MIT, the MIT-IBM Watson AI Lab and Harvard University recently released AGENT: A Benchmark for Core Psychological Reasoning. The benchmark consists of a large dataset of procedurally generated 3D animations, synthesized with TDW, that probes key concepts of core intuitive psychology.
For further details, please visit the AGENT website.
ThreeDWorld Transport Challenge
We introduce a visually-guided and physics-driven task-and-motion planning benchmark, which we call the ThreeDWorld Transport Challenge. In this challenge, the Magnebot acts as an embodied agent and is spawned randomly in a simulated physical home environment. The agent must find a small set of objects scattered around the house, pick them up, and transport them to a desired final location.
For further details, please visit this website.
New Robotics-like API
With version 1.8 of TDW, we introduce a new high-level robotics-like API - Magnebot. The Magnebot can move around the scene and manipulate objects by picking them up with its "magnet" end-effectors. Magnebot's arms have 7 degrees of freedom, with 2 additional DOF coming from its torso that can slide up and down and rotate around its central column. The simulation is entirely driven by physics.
At a low level, the Magnebot is driven by robotics commands such as set_revolute_target(), which will turn a revolute drive. The high-level API combines the low-level commands into "actions", such as grasp(target_object) or move_by(distance). Arm articulation is driven by an inverse kinematics (IK) system, where the arm will calculate a solution to reach a specified target position or object.
The API also includes a wide variety of new interior scenes, populated by interactable objects and optimized for navigation by Magnebot. In addition, users can now use their own robot models in TDW, by importing standard URDF robot model descriptor files.
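In sketch form, the high-level Magnebot API looks like the following. This assumes the separate `magnebot` Python package; class, method, and argument names reflect our reading of its documentation and may differ between versions, and the object ID is hypothetical:

```python
from magnebot import Magnebot, Arm

m = Magnebot()
# Load a furnished interior scene with the Magnebot spawned in it.
m.init_scene(scene="1a", layout=0)

# High-level "actions" wrap sequences of low-level physics commands.
m.move_by(2)    # Drive forward 2 meters.
m.turn_by(45)   # Rotate 45 degrees.

# Grasp a target object; the IK system solves the arm articulation.
target_id = 12345  # Hypothetical ID of an object in the scene.
m.grasp(target=target_id, arm=Arm.left)
m.move_by(-1)
m.end()
```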
Compare Features
TDW is compared against AI2-THOR, Deepmind Lab, Gibson, Habitat, HoME, MuJoCo, PyBullet, VirtualHome, iGibson, and Sapien across the following feature dimensions: photorealism (indoor and outdoor environments); physics (rigid body, fast/accurate collisions, soft body, cloth, and fluid); audio (environmental and physics-driven); and interaction (non-agent, agent-driven, and human VR).
TDW Core Features
Near-photoreal Image Rendering
High-resolution 3D models, physically-based rendering materials and a sophisticated lighting model combine to create highly photorealistic rendered images.
Real-time Impact Sound Synthesis
Uniquely, TDW can synthesize and play collision impact sounds at runtime based on physics metadata such as object masses, materials and relative velocities.
Rich Set of API Commands
The TDW command API provides over 200 "building block" commands, allowing researchers to write controller programs for a wide range of use-cases.
Advanced Physics Behaviors
TDW is capable of simulating rigid bodies, soft bodies, cloth and fluids to provide complex physical interactions between scene objects.
Indirect Object Interaction Through Avatars
Avatars act as the embodiment of an AI agent; for example, one avatar type uses articulated arms to transport objects around the environment. TDW supports multiple avatars within a scene that can interact with each other.
Near-Photoreal Image Rendering
Our high-resolution 3D models are highly detailed, which is important for photorealism, yet are optimized for real-time simulation.
TDW comes with a "core" library of 200+ models. In addition, our "full" photorealistic model library contains over 2000 models across 200 object categories. We are exploring making this library available for licensing; for details please go to this link. Users can also convert their own models for use inside TDW using our model conversion tools.
Many of our exterior environments are built using 3D model assets scanned from the real world (rock outcrops, ground surfaces).
TDW's lighting model uses a single light source to simulate the sun, providing direct lighting. Indirect or environment lighting comes from HDRI (High Dynamic Range Image) 'skyboxes'.
TDW's 3D models use Physically-Based Rendering (PBR) materials that respond to light in a physically correct manner. The realism of many of TDW's materials is further enhanced by texture images scanned from actual physical materials.
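For example, a controller can set the environment lighting by adding an HDRI skybox. This is a sketch: `get_add_hdri_skybox` is the convenience wrapper we understand the API to provide, and the skybox name is an illustrative entry from the HDRI library:

```python
from tdw.controller import Controller
from tdw.tdw_utils import TDWUtils

c = Controller()
c.communicate([TDWUtils.create_empty_room(12, 12),
               # Swap in an HDRI skybox for indirect/environment lighting.
               # The skybox name here is illustrative.
               c.get_add_hdri_skybox("bergen_4k")])
```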
Real-Time Impact Sound Synthesis
TDW can generate audio from information about physical events such as material types and impact parameters of colliding objects (velocities, normal vectors and masses).
Our PyImpact Python library generates these sounds via modal synthesis, with mode properties sampled from distributions conditioned upon properties of the sounding object. The mode distributions were measured from recordings of actual impacts. Further details are given in the reference below.
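In current TDW releases, PyImpact is exposed as an add-on that listens for collision events and plays the synthesized audio automatically. A minimal sketch follows; the module and class names reflect our reading of the docs and may differ in older versions:

```python
from tdw.controller import Controller
from tdw.tdw_utils import TDWUtils
from tdw.add_ons.audio_initializer import AudioInitializer
from tdw.add_ons.py_impact import PyImpact

c = Controller()
# The avatar with id "a" acts as the audio listener in the scene.
c.add_ons.extend([AudioInitializer(avatar_id="a"), PyImpact()])
c.communicate([TDWUtils.create_empty_room(12, 12)] +
              TDWUtils.create_avatar(avatar_id="a"))
# From here on, collisions between objects trigger synthesized impact sounds.
```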
In human perceptual experiments, listeners could not distinguish our synthetic impact sounds from real impact sounds, and could accurately judge physical properties from the synthetic audio.
J. Traer, M. Cusimano and J. H. McDermott, "A Perceptually Inspired Generative Model of Rigid-Body Contact Sounds," Digital Audio Effects (DAFx), 2019.

Rich Set of API Commands
Users can interact directly with objects in the scene using our Python command API.
Controller programs send commands over TCP/IP to the TDW runtime executable, or "build". The build executes those commands and returns data back to the controller representing the state of the virtual world. TDW commands can be sent in a list per simulation step rather than one at a time, enabling arbitrarily complex behavior.
For example, a force can be applied to a chair object, causing it to collide with a fridge object.
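A minimal controller along those lines might look like the following sketch; the model names and force values are illustrative, not prescriptive:

```python
from tdw.controller import Controller
from tdw.tdw_utils import TDWUtils

c = Controller()
chair_id = c.get_unique_id()
fridge_id = c.get_unique_id()
c.communicate([TDWUtils.create_empty_room(12, 12),
               # Model names are illustrative examples from the model library.
               c.get_add_object("chair_billiani_doll", object_id=chair_id,
                                position={"x": 0, "y": 0, "z": 0}),
               c.get_add_object("fridge_large", object_id=fridge_id,
                                position={"x": 0, "y": 0, "z": 2}),
               # Apply a directional force (Newtons) to push the chair
               # into the fridge.
               {"$type": "apply_force_to_object",
                "id": chair_id,
                "force": {"x": 0, "y": 2, "z": 400}}])
# Advance the simulation so the physics plays out.
for _ in range(100):
    c.communicate([])
c.communicate({"$type": "terminate"})
```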
Rigid Body Physics and Collisions
Unity's built-in physics engine (PhysX) handles rigid body physics including the collisions between rigid bodies.
API commands can alter the physics time step to balance the accuracy of physics behavior against real-time performance, or modify behavior by adjusting mass, friction, etc. per-object at runtime.
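For example (a sketch; the command names are from the TDW command API and the values are purely illustrative):

```python
from tdw.controller import Controller
from tdw.tdw_utils import TDWUtils

c = Controller()
box_id = c.get_unique_id()
c.communicate([TDWUtils.create_empty_room(12, 12),
               c.get_add_object("iron_box", object_id=box_id),
               # A smaller time step gives more accurate collisions at the
               # cost of wall-clock speed.
               {"$type": "set_time_step", "time_step": 0.01},
               # Physics properties can be adjusted per object at runtime.
               {"$type": "set_mass", "id": box_id, "mass": 4},
               {"$type": "set_physic_material", "id": box_id,
                "dynamic_friction": 0.4, "static_friction": 0.5,
                "bounciness": 0.2}])
```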
NVidia Flex Uniform Particle Representation
Flex uses a uniform particle-based object representation that allows rigid bodies, soft bodies, cloth objects and fluids to interact.
In one example, the cloth simulation drops a rubbery sheet that collides with a rigid body object; in another, balls of increasing mass are dropped into a pool of water, causing progressively greater displacement and splashing.
This type of unified representation can help machine learning models use both the underlying physics and rendered images to learn a physical and visual representation of the world through interactions with its objects.
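A sketch of the Flex workflow is shown below. The command and parameter names follow our reading of the TDW Flex documentation, the values are illustrative, and Flex requires a supported NVIDIA GPU:

```python
from tdw.controller import Controller
from tdw.tdw_utils import TDWUtils

c = Controller()
object_id = c.get_unique_id()
c.communicate([TDWUtils.create_empty_room(12, 12),
               # The Flex container defines the particle simulation space.
               {"$type": "create_flex_container"},
               c.get_add_object("cube", object_id=object_id,
                                library="models_flex.json"),
               # Represent the object as a particle-based Flex soft body...
               {"$type": "set_flex_soft_actor", "id": object_id,
                "particle_spacing": 0.05, "cluster_stiffness": 0.5},
               # ...and add it to the container so it can interact with
               # other Flex objects.
               {"$type": "assign_flex_container", "id": object_id,
                "container_id": 0}])
```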
Advanced Physics Benchmark Dataset
Using the TDW platform, we have created a comprehensive benchmark for training and evaluation of physically-realistic forward prediction algorithms, which will be released as part of the TDW package.
Once completed, this dataset will contain a large and varied collection of physical scene trajectories, including all data from visual, depth, and force sensors, high-level semantic label information for each frame, as well as latent generative parameters and code controllers for all situations.
This dataset goes well beyond existing related benchmarks, providing scenarios with large numbers of complex real-world object geometries, photo-realistic textures, as well as a variety of rigid, soft-body, cloth, and fluid materials.
The codebase for generating the dataset will be made publicly available in conjunction with the TDW platform.
Indirect Object Interaction Through Avatars
In TDW, avatars are the embodiment of AI agents within a scene.
Avatars can take the form of simple disembodied cameras for generating egocentric-view rendered images, segmentation maps, depth maps, etc.
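For example (a sketch using documented command types; the avatar type string names TDW's standard image-capture camera):

```python
from tdw.controller import Controller
from tdw.tdw_utils import TDWUtils
from tdw.output_data import OutputData, Images

c = Controller()
resp = c.communicate([TDWUtils.create_empty_room(12, 12),
                      # A disembodied camera avatar.
                      {"$type": "create_avatar",
                       "type": "A_Img_Caps_Kinematic", "id": "a"},
                      # Request rendered, segmentation, and depth passes.
                      {"$type": "set_pass_masks", "avatar_id": "a",
                       "pass_masks": ["_img", "_id", "_depth"]},
                      {"$type": "send_images", "frequency": "always"}])
# Parse the image passes returned by the build.
for r in resp[:-1]:
    if OutputData.get_data_type_id(r) == "imag":
        images = Images(r)
        print(images.get_num_passes())
```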
Avatars using simple geometric primitives such as cubes or spheres can move around the environment, acting as basic embodied agents. These avatars are well-suited to basic algorithm prototyping.
More complex embodied avatars are possible with user-defined physical structures and physically-mapped action spaces.
The Magnebot robot's mobility and arm articulation actions are driven by physics, as opposed to any form of pre-scripted animation, and controlled using high-level API commands. Here Magnebot uses its "magnet" end-effector to remove an object from a table. It also picks up a series of objects and places them into a container held by its other magnet; it then carries them to a different room and pours them out again.
Research Use Cases
TDW has been used in a number of labs within MIT and Stanford, as well as at IBM.
Visual Recognition Transfer
A learned visual feature representation, trained on a TDW image classification dataset comparable to ImageNet, was transferred to fine-grained image classification and object detection tasks.
Multi-modal Physical Scene Understanding
TDW's impact sound synthesis was used to generate a synthetic dataset of impact sounds for testing material and mass classification.
Learnable Physics Models
Using TDW's ability to handle complex physical collisions and non-rigid deformations, agents learn to predict physical dynamics in novel settings.
Visual Learning in Curious Agents
Intrinsically motivated agents, built using TDW's high-quality rendering and flexible avatar models, exhibit rudimentary self-awareness and curiosity.
Social Agents and Virtual Reality
In experiments on animate attention, both human observers in VR and a neural network agent embodying concepts of intrinsic curiosity found animacy to be more "interesting".
Frequently Asked Questions
Find answers to frequently asked questions about TDW.
How fast is TDW?
Fast! Here are some basic benchmarks:
| Benchmark | Quality | Image Size | FPS |
| --- | --- | --- | --- |
| Object transform data, 100 objects | N/A | N/A | 761 |
| Image capture | Low | 256x256 | 380 |
| Image capture | High | 1024x1024 | 41 |
| Move avatar per frame | Low | 256x256 | 160 |
| Flex benchmark (Windows): FlexParticles, Transform, CameraMatrices, and Collisions | N/A | N/A | 204 |
Can I contribute to TDW?
If you want to contribute code, you can create a new branch and open a PR from your fork of the TDW repo. Please note, however, that the code for the simulation binary (the "build") is closed-source, so you won't be able to directly modify the API, fix bugs in the build, etc. If you have suggestions, feature requests, bug reports, etc., please add them as GitHub Issues.
If you believe that your particular use case absolutely requires access to the backend source code, please refer to the discussion on our repo: Requesting access to TDW C# source code
Can TDW do...?
Maybe! See our README: ThreeDWorld (TDW)
What are the system requirements?
- Windows, OS X, or Linux.
- For high-fidelity rendering and particle-based physics simulations, an NVIDIA GPU.
- Python 3.6+
How often is TDW updated?
TDW's team is working full-time on the project, so expect feature updates every few weeks or so.
Can I run TDW on a Linux server?
Yes. You can optionally run your Python controller code on a different machine from the build. Additionally, the repo contains a Dockerfile for TDW; further details are in the Docker container documentation.
Our Team

Development Team

- Jeremy Schwartz, Project Lead, MIT BCS
- Seth Alter, Lead Developer, MIT BCS

Principal Investigators

Contributors

- James Traer, MIT BCS
- Jonas Kubilius, MIT BCS
- Martin Schrimpf, MIT BCS
- Abhishek Bhandwaldar, MIT-IBM Watson AI Lab
- Julian DeFreitas, Vision Sciences Lab, Harvard
- Damian Mrowca, Stanford NeuroAILab
- Michael Lingelbach, Stanford NeuroAILab
- Megumi Sano, Stanford NeuroAILab
- Dan Bear, Stanford NeuroAILab
- Kuno Kim, Stanford NeuroAILab
- Nick Haber, Stanford NeuroAILab
- Chaofei Fan, Stanford NeuroAILab

If you are interested in using TDW in your research, please contact:

Jeremy Schwartz, TDW Project Lead
Brain and Cognitive Sciences, MIT
43 Vassar St
Cambridge, MA 02139