Computer Vision News - June 2022
ICLR 2022

Researchers usually set these parameters by hand, using large hyperparameter sweeps or costly trial-and-error procedures. In this work, the hyperparameters are tuned automatically by running an extra optimization process: the meta-learning part. A meta-gradient algorithm predicts the next best set of hyperparameters and then refines that prediction using gradient descent.

Sebastian tells us he found this an atypical project: it took him a couple of months of theoretical reading and toying around with ideas before he proposed a much simpler algorithm. “I said there aren’t any theoretical…”

The experiment people tend to default to in this space is the Atari one, where a vast neural network is trained with reinforcement learning to master a game like Pong or Breakout. There are hundreds of thousands of parameters, and gradient descent optimizes the neural network against a reinforcement learning objective. These are typically called actor-critic objectives because two neural networks interact, with one trying to estimate how good the other is. The mass of hyperparameters in these algorithms is critical to keeping them stable: if one of the networks is overestimating or underestimating, the agent can learn bad behavior.
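To make the idea concrete, here is a minimal sketch of meta-gradient hyperparameter tuning on a toy problem, not the authors’ actual algorithm: a learning rate is itself updated by gradient descent on the loss measured after the inner update. The quadratic loss, the step sizes, and all variable names are illustrative assumptions.

```python
# Toy sketch of meta-gradient hyperparameter tuning (illustrative only).
# Inner loop: update a parameter theta with learning rate eta.
# Outer loop: update eta by descending the gradient of the post-update loss.

def inner_loss(theta, target=3.0):
    # Stand-in for the real objective (e.g. an actor-critic RL loss).
    return 0.5 * (theta - target) ** 2

def inner_grad(theta, target=3.0):
    return theta - target

theta = 0.0      # "network" parameter (a single scalar for clarity)
eta = 0.05       # hyperparameter being meta-learned (the learning rate)
meta_lr = 0.01   # step size of the outer, meta-gradient update

for step in range(200):
    g = inner_grad(theta)
    theta_new = theta - eta * g          # inner update with the current eta

    # Meta-objective J(eta) = loss after the inner update.
    # Since theta_new = theta - eta * g, d(theta_new)/d(eta) = -g,
    # so dJ/d(eta) = inner_grad(theta_new) * (-g).
    meta_grad = inner_grad(theta_new) * (-g)

    eta = max(1e-4, eta - meta_lr * meta_grad)   # outer update of the hyperparameter
    theta = theta_new

print(f"learned learning rate: {eta:.3f}, final loss: {inner_loss(theta):.6f}")
```

In the full-scale setting described above, the inner loss would be the actor-critic objective over hundreds of thousands of network parameters, and the meta-gradient through the update would be computed by automatic differentiation rather than by hand, but the structure of the two nested gradient-descent loops is the same.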