Computer Vision News
ICLR Outstanding Paper

Another significant application of this technology is in reinforcement learning, drawing parallels with the success of AlphaGo, Google DeepMind’s program that mastered the game of Go. AlphaGo defeated professional player Lee Sedol in 2016 by learning optimal strategies through self-play simulations, ultimately creating a player superior to humans. It has a perfect simulator, with the game’s rules hardcoded for training and a reinforcement learning algorithm that encourages winning behaviors and discourages losing ones. However, replicating this success in real-world scenarios is challenging due to a lack of perfect simulators.

“There’s no simulator for the real world,” Sherry tells us. “There’s no way we can have an agent or computer automatically interacting with a simulator and learning from making mistakes because the simulators people use are often some kind of toy simulator that looks very different from the real world. People could directly have real-world interactions and learn from making real mistakes, but it’s expensive and unsafe. We can’t just have robots breaking things to learn!”

This is where the paper’s idea of UniSim comes in: using diverse real-world data to create realistic simulations. It can visualize the effects of executing various language instructions, even unsafe ones, without real-world risk. By combining this advanced simulation with reinforcement learning algorithms, the model can train agents to achieve superhuman performance in a range of tasks, not just games like Go.

Data curation was one of the project’s biggest challenges. Most of the videos on platforms like YouTube have speech transcripts but no detailed action annotations. Robotics datasets inherently document low-level control actions but use formats like ∆x, ∆y, or endpoint movements and forces. “We have to convert those continuous