Request: please add reinforcement learning code for PPO and GRPO. Thank you so much, you're my mentor for getting started with GPT! #239
Unanswered
2547454881 asked this question in Q&A
Replies: 1 comment
PPO is probably not as good as DPO/KTO, and GRPO won't work at this parameter count; training would collapse.
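For reference, here is a minimal sketch of the two objectives mentioned above, in plain Python. It assumes the summed per-response log-probabilities and the scalar rewards have already been computed elsewhere. The function names `dpo_loss` and `grpo_advantages` are hypothetical and do not come from this repository. This is not a training loop: it has no sampling, and it omits GRPO's clipped policy-ratio and KL terms.

```python
import math
import statistics


def dpo_loss(policy_chosen_logp, policy_rejected_logp,
             ref_chosen_logp, ref_rejected_logp, beta=0.1):
    """DPO loss for a single preference pair (hypothetical helper).

    Each argument is the summed log-probability of a whole response under
    either the trainable policy or the frozen reference model.
    """
    # Implicit reward margin: how much more the policy (relative to the
    # reference) prefers the chosen response over the rejected one.
    margin = (policy_chosen_logp - ref_chosen_logp) - (
        policy_rejected_logp - ref_rejected_logp)
    # Loss is -log(sigmoid(beta * margin)) = softplus(-beta * margin),
    # computed in a numerically stable form.
    z = -beta * margin
    return max(z, 0.0) + math.log1p(math.exp(-abs(z)))


def grpo_advantages(rewards, eps=1e-8):
    """Group-relative advantages for G completions of the same prompt.

    Each reward is normalized by the group mean and standard deviation.
    Using the population std (pstdev) here is an assumption; some
    implementations use the sample std instead.
    """
    mean = statistics.fmean(rewards)
    std = statistics.pstdev(rewards)
    return [(r - mean) / (std + eps) for r in rewards]


if __name__ == "__main__":
    # Identical log-probs give a zero margin, so the loss equals log(2).
    print(dpo_loss(-10.0, -10.0, -10.0, -10.0))
    # Two correct and two wrong completions in one group.
    print(grpo_advantages([1.0, 0.0, 1.0, 0.0]))
    # Equal rewards across the group give all-zero advantages.
    print(grpo_advantages([0.0, 0.0, 0.0, 0.0]))
```

Note that when every completion in a group receives the same reward (for example, all wrong), every GRPO advantage is zero and that group contributes no learning signal. DPO does not sample at all. It only needs pairs of preferred and rejected responses plus a frozen reference model.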