Is there some bottleneck that prevents RL from scaling up performance to larger non-MoE models?