Tesla’s humanoid robot Optimus is making significant progress toward Elon Musk’s vision, with major improvements announced by the company. The robot, first unveiled as a prototype last year, can now pick up and sort objects, hold yoga poses, and maneuver through its surroundings. Compared with other humanoid robots, such as those from Boston Dynamics, Optimus stands out for its functional fingers and neural-network-driven control, which allow more natural and versatile movements.
Jim Fan, a senior AI scientist at NVIDIA, has published an in-depth analysis of how Optimus likely works. Optimus appears to achieve its impressive dexterity through imitation learning, also known as behavior cloning: the robot learns by mimicking human operators, much as game studios record motion capture from actors to drive character animation. This contrasts with reinforcement learning in a simulated environment, which often produces jerky movements and unnatural poses.
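At its core, behavior cloning treats control as supervised learning over recorded (observation, action) pairs. The sketch below illustrates the idea on synthetic data with a linear policy; the dataset, dimensions, and "expert" are all invented for illustration, and a real system like Optimus would map camera images to motor commands with a deep network rather than a least-squares fit.

```python
import numpy as np

# Behavior cloning reduces control to supervised learning: collect
# (observation, action) pairs from a human operator, then fit a policy
# that maps observations to actions.

rng = np.random.default_rng(0)

# Synthetic demonstrations: a hypothetical expert applies
# action = W_true @ obs, recorded with a little sensor noise.
W_true = np.array([[0.5, -1.0],
                   [2.0,  0.3]])
obs = rng.normal(size=(500, 2))                       # 500 recorded observations
actions = obs @ W_true.T + 0.01 * rng.normal(size=(500, 2))

# "Training" = least-squares fit of the policy to the demonstrations.
W_fit, *_ = np.linalg.lstsq(obs, actions, rcond=None)

def policy(o):
    """The cloned policy: imitate whatever the expert did."""
    return o @ W_fit

# The cloned policy now generalizes to unseen observations.
test_obs = np.array([1.0, -1.0])
print(policy(test_obs))
```

The key property is that no reward function or simulator is involved; the policy is only as good as the demonstrations it copies.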
Optimus’s smooth hand movements are believed to come from motion-capture-driven demonstrations. Operators may wear devices such as the CyberGlove, which streams real-time hand-motion signals and provides haptic feedback, with the captured motion then retargeted onto the robot’s hands. Alternatively, Optimus could use a purpose-built teleoperation rig that lets human operators control its movements directly; Stanford’s ALOHA system, for example, enables intricate, precise manipulation through low-cost teleoperation.
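One concrete piece of any glove-based pipeline is retargeting: a human's joint angles rarely match the robot's joint limits, so each captured angle must be rescaled into the robot's range and clipped at its hard limits. The ranges below are made up for illustration and are not Tesla's or CyberGlove's actual specifications.

```python
import numpy as np

# Assumed (illustrative) flexion ranges in radians.
HUMAN_RANGE = (0.0, 1.6)   # hypothetical human finger flexion range
ROBOT_RANGE = (0.0, 1.2)   # hypothetical robot joint limit

def retarget(human_angles):
    """Linearly rescale captured human joint angles into the robot's
    range, then clip to respect the robot's hard joint limits."""
    lo_h, hi_h = HUMAN_RANGE
    lo_r, hi_r = ROBOT_RANGE
    scaled = lo_r + (np.asarray(human_angles) - lo_h) * (hi_r - lo_r) / (hi_h - lo_h)
    return np.clip(scaled, lo_r, hi_r)

# Half flexion (0.8 rad) maps to 0.6 rad; full human flexion (1.6 rad)
# lands exactly on the robot's 1.2 rad limit.
print(retarget([0.0, 0.8, 1.6]))
```

Real retargeting must also handle differing finger lengths and kinematics, but the rescale-and-clip step above is the simplest useful version.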
Another possibility is vision-based motion capture, in which cameras and GPUs track hand motion without markers or gloves; NVIDIA’s DexPilot demonstrates this marker-less, glove-free data collection. The collection process can even be turned into a virtual-reality game, with operators using VR controllers or CyberGloves to role-play as Optimus. This approach scales well, since annotators around the world can contribute demonstrations remotely.
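For remote, crowd-sourced collection to work, each operator session needs to be captured in a uniform episode format that the training pipeline can aggregate. The minimal logger below is purely illustrative; the field names and JSON encoding are assumptions, and a production pipeline would store compressed video and high-rate joint streams rather than plain JSON.

```python
import json

class EpisodeLogger:
    """Hypothetical per-session recorder for teleoperation data."""

    def __init__(self, operator_id):
        self.operator_id = operator_id
        self.steps = []

    def record(self, observation, action):
        # Append one timestep of what the operator saw and did.
        self.steps.append({"t": len(self.steps),
                           "obs": observation,
                           "action": action})

    def export(self):
        # Serialize the whole episode for upload to a central dataset.
        return json.dumps({"operator": self.operator_id,
                           "steps": self.steps})

log = EpisodeLogger("annotator-042")
log.record([0.1, 0.2], [0.0, 0.5])
log.record([0.3, 0.1], [0.2, 0.4])
print(log.export())
```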
Optimus’s ability to learn from human demonstrations and perform precise hand movements rests on a neural architecture trained end to end: video goes in, motor actions come out. The network analyzes images to understand its environment, processes video either frame by frame or as a whole clip, and may condition on language prompts to steer its perception. To feed continuous motion signals into such a model, the actions are likely discretized, for example by binning each control dimension into a fixed vocabulary, much as language models tokenize text, or by compressing action sequences with a learned encoder.
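The binning idea fits in a few lines: each continuous action value is mapped to one of a fixed number of integer bins, and decoding snaps it back to the nearest grid point, so the round-trip error is bounded by half a bin width. The bin count and the normalized action range below are assumptions for illustration, not known Optimus parameters.

```python
import numpy as np

N_BINS = 256            # assumed vocabulary size per action dimension
LOW, HIGH = -1.0, 1.0   # assumed normalized action range

def tokenize(actions):
    """Map continuous actions in [LOW, HIGH] to integer bin ids."""
    clipped = np.clip(actions, LOW, HIGH)
    return ((clipped - LOW) / (HIGH - LOW) * (N_BINS - 1)).round().astype(int)

def detokenize(ids):
    """Recover approximate continuous actions from bin ids."""
    return LOW + ids / (N_BINS - 1) * (HIGH - LOW)

a = np.array([-1.0, 0.0, 0.37, 1.0])
round_trip = detokenize(tokenize(a))
# Quantization error is at most half a bin width: (HIGH - LOW) / (2 * (N_BINS - 1)).
print(round_trip)
```

With actions expressed as integer tokens, the same sequence-modeling machinery used for text can predict them autoregressively.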
Elon Musk has expressed confidence in Optimus, predicting that in the future everyone will want one, or even several, of these humanoid robots. Tesla’s transition from a car company to an AI company shows in its recruiting of AI experts specifically to build Optimus. Nor is Tesla alone: Agility Robotics is building its RoboFab humanoid factory, while Figure.ai, Boston Dynamics, China’s Fourier Intelligence, and 1X, backed by an investment from OpenAI, are all pushing the field forward. The future looks promising for humanoid robots, and the coming year holds many possibilities.