Our code is based on open-r1, with a customized Trainer for mixed SFT+GRPO training. Other updates focus on white-box RL (reward function design) and post-completion training (replacement ...
Beyond does contain multiple endings or, at least, different variations of the ending depending on your completion rate.