{"id":2461,"date":"2023-11-29T15:45:37","date_gmt":"2023-11-29T07:45:37","guid":{"rendered":"https:\/\/4399pay.com\/blog\/?p=2461"},"modified":"2023-11-29T15:45:38","modified_gmt":"2023-11-29T07:45:38","slug":"chatgpt%e8%83%8c%e5%90%8e%e7%9a%84%e5%8a%9f%e8%87%a3-rlhf%e6%8a%80%e6%9c%af%e8%af%a6%e8%a7%a3","status":"publish","type":"post","link":"\/blog\/archives\/2461","title":{"rendered":"The \u201cUnsung Hero\u201d Behind ChatGPT: RLHF Explained"},"content":{"rendered":"<div id=\"readability-page-1\" class=\"page\">\n<div>\n<p>OpenAI's ChatGPT dialogue model has set off a new wave of AI excitement: it answers a wide range of questions fluently and seems to have blurred the boundary between machine and human. Behind this work is a new training paradigm for generation with large language models (LLMs): RLHF (Reinforcement Learning from Human Feedback), which uses reinforcement learning to optimize a language model with human feedback.<\/p>\n<p>Over the past few years, LLMs have shown an impressive ability to generate diverse text from input prompts. However, evaluating the generated results is subjective and context-dependent. For example, we may want a model to write a creative story, a truthful piece of informative text, or an executable code snippet, and such results are hard to measure with existing rule-based text-generation metrics such as BLEU and ROUGE. Beyond evaluation metrics, existing models are usually trained with next-token prediction and a simple loss function (such as cross-entropy), which offers no obvious way to incorporate human preferences.<\/p>\n<p>Wouldn't it be better if we <strong>used human feedback on generated text as a measure of performance, or went one step further and used that feedback as a loss to optimize the model<\/strong>? That is the idea behind RLHF: use reinforcement learning to directly optimize a language model with human feedback. RLHF makes it possible to begin aligning a language model trained on a general corpus of text data with complex human values.<\/p>\n<p>Here is how ChatGPT explains RLHF:<\/p>\n<p><img decoding=\"async\" src=\"https:\/\/4399pay.com\/blog\/wp-content\/uploads\/2023\/11\/24610.png\" width=\"500\" \/><\/p>\n<p>ChatGPT's explanation is good, but it doesn't tell the whole story; let's get more specific!<\/p>\n<h2><a id=\"rlhf-\u6280\u672f\u5206\u89e3\" href=\"#rlhf-\u6280\u672f\u5206\u89e3\"><\/a>Breaking Down RLHF<\/h2>\n<p>RLHF is a complex concept involving multiple models and different training stages. Here we break it into three steps:<\/p>\n<ol>\n<li>Pretrain a language model (LM);<\/li>\n<li>Gather question-answer data and train a reward model (RM);<\/li>\n<li>Fine-tune the LM with reinforcement learning (RL).<\/li>\n<\/ol>\n<h3><a id=\"step-1-\u9884\u8bad\u7ec3\u8bed\u8a00\u6a21\u578b\" href=\"#step-1-\u9884\u8bad\u7ec3\u8bed\u8a00\u6a21\u578b\"><\/a>Step 1. Pretraining a Language Model<\/h3>\n<p>First, we train a language model with the classic pretraining objectives. For this step, OpenAI used a smaller version of GPT-3 for InstructGPT, its first popular RLHF model; Anthropic used Transformer models ranging from 10 million to 52 billion parameters; and DeepMind used Gopher, its own 280-billion-parameter model.<\/p>\n<p>This LM can optionally be fine-tuned on additional text or conditions. For example, OpenAI fine-tuned on human-written text that was \u201cpreferable\u201d, and Anthropic distilled the original LM on context clues for their \u201chelpful, honest, and harmless\u201d criteria. Both approaches rely on expensive augmented data, but this step is not required for RLHF. Because RLHF is still a largely unexplored field, there is no clear answer as to \u201cwhich model\u201d is the best starting point for RLHF.<\/p>\n<p><img decoding=\"async\" class=\"aligncenter\" src=\"https:\/\/4399pay.com\/blog\/wp-content\/uploads\/2023\/11\/24611.png\" width=\"500\" \/><\/p>\n<p>Next, we use the LM to generate the data for training a <strong>reward model<\/strong> (RM, also called a preference model); this is the step where human preference information is introduced.<\/p>\n<h3><a id=\"step-2-\u8bad\u7ec3\u5956\u52b1\u6a21\u578b\" href=\"#step-2-\u8bad\u7ec3\u5956\u52b1\u6a21\u578b\"><\/a>Step 2. Training a Reward Model<\/h3>\n<p>Training the RM is where RLHF departs from the older paradigm. The model takes in a sequence of text and returns a scalar reward that numerically represents human preference. It can be modeled end-to-end with an LM, or with a modular system (for example, ranking the outputs and then converting the ranking into a reward). Having the output be a single scalar is what lets existing RL algorithms be plugged in seamlessly later on.<\/p>\n<p>In terms of model choice, the RM can be another fine-tuned LM or an LM trained from scratch on preference data. For example, Anthropic proposed a specialized pretraining method, preference model pretraining (PMP), to replace the usual fine-tuning step after general pretraining, because they found it to be more sample-efficient. Still, there is no consensus on which kind of RM is best.<\/p>\n<p>In terms of training text, the prompt-generation pairs for the RM are built by sampling prompts from a predefined dataset and using the initial LM to generate text for them. Anthropic's data was generated mainly with a chat tool on Amazon Mechanical Turk and is available on the Hugging Face Hub, while OpenAI used prompts that users submitted to the GPT API.<\/p>\n<p>In terms of reward scores, human annotators rank the responses generated by the LM. One might expect to train the RM by having annotators assign a score directly to each text, but annotators' differing values make such scores uncalibrated and noisy. Rankings instead compare the outputs of multiple models and yield a much better-regularized dataset.<\/p>\n<p>As for the ranking method itself, one successful approach is to compare the outputs of different LMs for the same prompt in head-to-head matchups and then use an Elo rating system to build a complete ranking. These different rankings are then normalized into a scalar reward signal for training.<\/p>\n<p>An interesting artifact of this process is that successful RLHF systems to date have used reward models whose sizes vary relative to the generating LM (for example, OpenAI used a 175B LM with a 6B RM; Anthropic used LMs and RMs ranging from 10B to 52B; DeepMind used 70B Chinchilla models for both the LM and the RM). One intuition is that a preference model needs roughly the same capacity to understand a text as a generative model needs to produce it.<\/p>\n<p><img decoding=\"async\" src=\"https:\/\/4399pay.com\/blog\/wp-content\/uploads\/2023\/11\/24612.png\" width=\"600\" \/><\/p>\n<p>Next comes the final step: using the rewards output by the RM to optimize the LM with reinforcement learning.<\/p>\n<h3><a id=\"step-3-\u7528\u5f3a\u5316\u5b66\u4e60\u5fae\u8c03\" href=\"#step-3-\u7528\u5f3a\u5316\u5b66\u4e60\u5fae\u8c03\"><\/a>Step 3. Fine-Tuning with Reinforcement Learning<\/h3>\n<p>For a long time, training an LM with reinforcement learning was considered impossible for both engineering and algorithmic reasons. What many organizations have now made work is using a policy-gradient RL algorithm, Proximal Policy Optimization (PPO), to fine-tune some or all of the parameters of a copy of the initial LM. Some parameters are often frozen, because fine-tuning an entire 10B to 100B+ parameter model is prohibitively expensive (for related work, see Low-Rank Adaptation (<a href=\"https:\/\/arxiv.org\/abs\/2106.09685\">LoRA<\/a>) and DeepMind's <a href=\"https:\/\/arxiv.org\/abs\/2209.14375\">Sparrow<\/a> LM). PPO has been around for a relatively long time and there are plenty of guides on how it works, which made it a favorable choice for RLHF.<\/p>\n<p>It turns out that many of the core RL advances behind RLHF have been about figuring out how to apply a familiar RL algorithm to updating such large models.<\/p>\n<p>Let's start by formulating this fine-tuning task as an RL problem. The <strong>policy<\/strong> is an LM that takes in a prompt and returns a sequence of text (or probability distributions over text). The <strong>action space<\/strong> of this policy is every token in the LM's vocabulary (typically on the order of 50k), and the <strong>observation space<\/strong> is the set of possible input token sequences, which is also very large (roughly the vocabulary size raised to the power of the number of input tokens). The <strong>reward function<\/strong> combines the preference model with a constraint on policy shift.<\/p>\n<p>The reward passed to PPO is computed as follows. A prompt <em>x<\/em> is fed into both the initial LM and the current fine-tuned LM, producing output texts <em>y<\/em><sub>1<\/sub> and <em>y<\/em><sub>2<\/sub> respectively, and the text from the current policy is passed to the RM to obtain a scalar reward \\(r_\\theta\\). The outputs of the two models are also compared to compute a penalty on how much they differ. In several papers from OpenAI, Anthropic, and DeepMind, this penalty is designed as a scaled <a href=\"https:\/\/en.wikipedia.org\/wiki\/Kullback%E2%80%93Leibler_divergence\">Kullback\u2013Leibler (KL) divergence<\/a> between the two sequences of output token distributions, \\(r_\\text{KL}\\), so the final reward is:<\/p>\n<p>\\[ r = r_\\theta - \\lambda r_\\text{KL} \\]<\/p>\n<p>where \u03bb controls the strength of the penalty. The KL term penalizes the RL policy for drifting far from the initial model in each training batch, which helps ensure the model keeps producing reasonably coherent text. Without it, the optimization can start generating gibberish that fools the reward model into giving a high reward. In addition, OpenAI's InstructGPT experiments mixed additional pretraining gradients into the PPO update, and the formulation of the reward function is likely to keep evolving as RLHF research progresses.<\/p>\n<p>Finally, PPO updates the parameters to maximize the reward metric on the current batch of data (PPO is on-policy, so the parameters are updated only with the current batch of prompt-generation pairs). PPO is a trust region optimization (TRO) algorithm: it constrains the gradient so that an update step does not destabilize the learning process. DeepMind used a similar reward setup for Gopher but optimized the gradients with A2C (synchronous advantage actor-critic).<\/p>\n<p><img decoding=\"async\" src=\"https:\/\/4399pay.com\/blog\/wp-content\/uploads\/2023\/11\/24613.png\" width=\"650\" \/><\/p>\n<p>Optionally, RLHF can continue by iteratively updating the RM and the policy together. As the policy model is updated, users can keep ranking its new outputs alongside the model's earlier outputs. Anthropic discusses this option as Iterated Online RLHF in <a href=\"https:\/\/arxiv.org\/abs\/2204.05862\">their paper<\/a>, where iterations of the policy are included in an Elo ranking system across models. This introduces complex dynamics as the policy and the RM evolve together, and it remains a complex, open research question.<\/p>\n<h2><a id=\"open-source-tools-for-rlhf\" href=\"#open-source-tools-for-rlhf\"><\/a>Open-Source Tools for RLHF<\/h2>\n<p>Today there are already several active RLHF repositories in PyTorch. The main ones are Transformer Reinforcement Learning (TRL), TRLX (originally a fork of TRL), and Reinforcement Learning for Language models (RL4LMs).<\/p>\n<p>TRL is designed to fine-tune pretrained LMs in the Hugging Face ecosystem with PPO. TRLX is an expanded fork of TRL built by CarperAI to handle larger models for both online and offline training. At present, TRLX has an API capable of production-ready RLHF with PPO and Implicit Language Q-Learning (ILQL) at the scale required for LLM deployment (e.g., 33 billion parameters). Future versions of TRLX are planned to support language models of up to 200B parameters. As a result, TRLX's interface is optimized for machine learning engineers with experience at this scale.<\/p>\n<p>RL4LMs provides building blocks for fine-tuning and evaluating LLMs, including a variety of RL algorithms (PPO, NLPO, A2C, and TRPO), reward functions, and metrics. The library is also easy to customize, allowing any encoder-decoder or encoder-based Transformer LM to be trained on an arbitrary user-specified reward function. Notably, it has been extensively tested and benchmarked on a broad range of tasks in recent work comprising up to 2,000 experiments, which surfaced practical insights on comparing data budgets (expert demonstrations versus reward modeling), handling reward hacking, and training instability. Current plans include distributed training of large models and new RL algorithms.<\/p>\n<p>Both TRLX and RL4LMs are under active development, so more features can be expected soon.<\/p>\n<p>A large dataset created by Anthropic is available on the Hugging Face Hub.<\/p>\n<h2><a id=\"rlhf-\u7684\u672a\u6765\" href=\"#rlhf-\u7684\u672a\u6765\"><\/a>The Future of RLHF<\/h2>\n<p>Although RLHF has achieved some success and attracted wide attention, it still has limitations. These models can still confidently output harmful or untruthful text. This imperfection is both a long-term challenge and a motivation for RLHF: operating in an inherently human domain means there will never be a perfect standard to reach.<\/p>\n<p>The quality and quantity of the human preference data collected determine the upper bound on an RLHF system's performance. RLHF systems need two kinds of human preference data: human-generated text and preference labels on model outputs. Generating high-quality answers requires hiring dedicated staff (rather than relying on product users). By contrast, the reward labels needed to train the RM number around 50k, which is not as expensive, although it still exceeds what most academic labs can afford. At present there is only one RLHF dataset for general-purpose LMs (from Anthropic), along with a few smaller task-specific datasets (such as OpenAI's summarization dataset). Another challenge comes from annotator bias: human annotators may disagree with one another, which introduces potential variance into the training data.<\/p>\n<p>Beyond data limitations, there are several open design options that could help RLHF make substantial progress. On the RL optimizer side, for example, PPO is a relatively old algorithm, but there is no structural reason why other algorithms could not offer advantages within the existing RLHF workflow. In addition, a major cost of training the LM policy is that every text the policy generates must be evaluated by the RM; optimizing the policy with offline RL could save the inference cost of these large RMs. Recently, new RL algorithms such as Implicit Language Q-Learning (ILQL) have emerged that suit this kind of optimization. Other core trade-offs in the RL training process, such as the balance between exploration and exploitation, also remain to be tried and explored. Pursuing these directions should at least deepen our understanding of RLHF and may further improve system performance.<\/p>\n<h3><a 
id=\"\u53c2\u8003\u8d44\u6599\" href=\"#\u53c2\u8003\u8d44\u6599\"><\/a><\/h3>\n<\/div>\n<\/div>\n","protected":false},"excerpt":{"rendered":"<p>OpenAI's ChatGPT dialogue model has set off a new wave of AI excitement: it answers a wide range of questions fluently and seems to have blurred [&hellip;]<\/p>\n","protected":false},"author":1,"featured_media":2056,"comment_status":"closed","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[4],"tags":[11,57,9,8],"class_list":["post-2461","post","type-post","status-publish","format-standard","has-post-thumbnail","hentry","category-chatgpt","tag-chatgpt","tag-openai","tag-visa","tag-8"],"yoast_head":"<!-- This site is optimized with the Yoast SEO plugin v24.7 - https:\/\/yoast.com\/wordpress\/plugins\/seo\/ -->\n<title>ChatGPT\u80cc\u540e\u7684\u201c\u529f\u81e3\u201d\u2014\u2014RLHF\u6280\u672f\u8be6\u89e3 - 4399 PAY<\/title>\n<meta name=\"robots\" content=\"index, follow, max-snippet:-1, max-image-preview:large, max-video-preview:-1\" \/>\n<link rel=\"canonical\" href=\"https:\/\/4399pay.com\/blog\/archives\/2461\" \/>\n<meta property=\"og:locale\" content=\"zh_CN\" \/>\n<meta property=\"og:type\" content=\"article\" \/>\n<meta property=\"og:title\" content=\"ChatGPT\u80cc\u540e\u7684\u201c\u529f\u81e3\u201d\u2014\u2014RLHF\u6280\u672f\u8be6\u89e3 - 4399 PAY\" \/>\n<meta property=\"og:description\" content=\"OpenAI \u63a8\u51fa\u7684 ChatGPT \u5bf9\u8bdd\u6a21\u578b\u6380\u8d77\u4e86\u65b0\u7684 AI \u70ed\u6f6e\uff0c\u5b83\u9762\u5bf9\u4e0d\u540c\u7684\u95ee\u9898\u5bf9\u7b54\u5982\u6d41\uff0c\u4f3c\u4e4e\u5df2\u7ecf\u6253\u7834\u4e86 [&hellip;]\" \/>\n<meta property=\"og:url\" content=\"https:\/\/4399pay.com\/blog\/archives\/2461\" \/>\n<meta property=\"og:site_name\" content=\"4399 PAY\" \/>\n<meta property=\"article:published_time\" 
content=\"2023-11-29T07:45:37+00:00\" \/>\n<meta property=\"article:modified_time\" content=\"2023-11-29T07:45:38+00:00\" \/>\n<meta property=\"og:image\" content=\"https:\/\/4399pay.com\/blog\/wp-content\/uploads\/2023\/11\/1680917624-24c0604e17e910de0f206348c1e99e38.jpeg\" \/>\n\t<meta property=\"og:image:width\" content=\"1200\" \/>\n\t<meta property=\"og:image:height\" content=\"675\" \/>\n\t<meta property=\"og:image:type\" content=\"image\/jpeg\" \/>\n<meta name=\"author\" content=\"admin\" \/>\n<meta name=\"twitter:card\" content=\"summary_large_image\" \/>\n<meta name=\"twitter:label1\" content=\"\u4f5c\u8005\" \/>\n\t<meta name=\"twitter:data1\" content=\"admin\" \/>\n\t<meta name=\"twitter:label2\" content=\"\u9884\u8ba1\u9605\u8bfb\u65f6\u95f4\" \/>\n\t<meta name=\"twitter:data2\" content=\"1 \u5206\" \/>\n<script type=\"application\/ld+json\" class=\"yoast-schema-graph\">{\n\t    \"@context\": \"https:\/\/schema.org\",\n\t    \"@graph\": [\n\t        {\n\t            \"@type\": \"Article\",\n\t            \"@id\": \"https:\/\/4399pay.com\/blog\/archives\/2461#article\",\n\t            \"isPartOf\": {\n\t                \"@id\": \"https:\/\/4399pay.com\/blog\/archives\/2461\"\n\t            },\n\t            \"author\": {\n\t                \"name\": \"admin\",\n\t                \"@id\": \"https:\/\/4399pay.com\/blog\/#\/schema\/person\/d2ca2d47f2751f1fcc8c4f143823629a\"\n\t            },\n\t            \"headline\": \"ChatGPT\u80cc\u540e\u7684\u201c\u529f\u81e3\u201d\u2014\u2014RLHF\u6280\u672f\u8be6\u89e3\",\n\t            \"datePublished\": \"2023-11-29T07:45:37+00:00\",\n\t            \"dateModified\": \"2023-11-29T07:45:38+00:00\",\n\t            \"mainEntityOfPage\": {\n\t                \"@id\": \"https:\/\/4399pay.com\/blog\/archives\/2461\"\n\t            },\n\t            \"wordCount\": 242,\n\t            \"publisher\": {\n\t                \"@id\": \"https:\/\/4399pay.com\/blog\/#organization\"\n\t            },\n\t            \"image\": 
{\n\t                \"@id\": \"https:\/\/4399pay.com\/blog\/archives\/2461#primaryimage\"\n\t            },\n\t            \"thumbnailUrl\": \"\/blog\/wp-content\/uploads\/2023\/11\/1680917624-24c0604e17e910de0f206348c1e99e38.jpeg\",\n\t            \"keywords\": [\n\t                \"ChatGPT\",\n\t                \"OpenAI\",\n\t                \"Visa\",\n\t                \"\u865a\u62df\u4fe1\u7528\u5361\"\n\t            ],\n\t            \"articleSection\": [\n\t                \"ChatGPT\"\n\t            ],\n\t            \"inLanguage\": \"zh-Hans\"\n\t        },\n\t        {\n\t            \"@type\": \"WebPage\",\n\t            \"@id\": \"https:\/\/4399pay.com\/blog\/archives\/2461\",\n\t            \"url\": \"https:\/\/4399pay.com\/blog\/archives\/2461\",\n\t            \"name\": \"ChatGPT\u80cc\u540e\u7684\u201c\u529f\u81e3\u201d\u2014\u2014RLHF\u6280\u672f\u8be6\u89e3 - 4399 PAY\",\n\t            \"isPartOf\": {\n\t                \"@id\": \"https:\/\/4399pay.com\/blog\/#website\"\n\t            },\n\t            \"primaryImageOfPage\": {\n\t                \"@id\": \"https:\/\/4399pay.com\/blog\/archives\/2461#primaryimage\"\n\t            },\n\t            \"image\": {\n\t                \"@id\": \"https:\/\/4399pay.com\/blog\/archives\/2461#primaryimage\"\n\t            },\n\t            \"thumbnailUrl\": \"\/blog\/wp-content\/uploads\/2023\/11\/1680917624-24c0604e17e910de0f206348c1e99e38.jpeg\",\n\t            \"datePublished\": \"2023-11-29T07:45:37+00:00\",\n\t            \"dateModified\": \"2023-11-29T07:45:38+00:00\",\n\t            \"breadcrumb\": {\n\t                \"@id\": \"https:\/\/4399pay.com\/blog\/archives\/2461#breadcrumb\"\n\t            },\n\t            \"inLanguage\": \"zh-Hans\",\n\t            \"potentialAction\": [\n\t                {\n\t                    \"@type\": \"ReadAction\",\n\t                    \"target\": [\n\t                        \"https:\/\/4399pay.com\/blog\/archives\/2461\"\n\t                    ]\n\t          
      }\n\t            ]\n\t        },\n\t        {\n\t            \"@type\": \"ImageObject\",\n\t            \"inLanguage\": \"zh-Hans\",\n\t            \"@id\": \"https:\/\/4399pay.com\/blog\/archives\/2461#primaryimage\",\n\t            \"url\": \"\/blog\/wp-content\/uploads\/2023\/11\/1680917624-24c0604e17e910de0f206348c1e99e38.jpeg\",\n\t            \"contentUrl\": \"\/blog\/wp-content\/uploads\/2023\/11\/1680917624-24c0604e17e910de0f206348c1e99e38.jpeg\",\n\t            \"width\": 1200,\n\t            \"height\": 675\n\t        },\n\t        {\n\t            \"@type\": \"BreadcrumbList\",\n\t            \"@id\": \"https:\/\/4399pay.com\/blog\/archives\/2461#breadcrumb\",\n\t            \"itemListElement\": [\n\t                {\n\t                    \"@type\": \"ListItem\",\n\t                    \"position\": 1,\n\t                    \"name\": \"\u9996\u9875\",\n\t                    \"item\": \"https:\/\/4399pay.com\/blog\"\n\t                },\n\t                {\n\t                    \"@type\": \"ListItem\",\n\t                    \"position\": 2,\n\t                    \"name\": \"ChatGPT\u80cc\u540e\u7684\u201c\u529f\u81e3\u201d\u2014\u2014RLHF\u6280\u672f\u8be6\u89e3\"\n\t                }\n\t            ]\n\t        },\n\t        {\n\t            \"@type\": \"WebSite\",\n\t            \"@id\": \"https:\/\/4399pay.com\/blog\/#website\",\n\t            \"url\": \"https:\/\/4399pay.com\/blog\/\",\n\t            \"name\": \"4399 PAY\",\n\t            \"description\": \"\",\n\t            \"publisher\": {\n\t                \"@id\": \"https:\/\/4399pay.com\/blog\/#organization\"\n\t            },\n\t            \"potentialAction\": [\n\t                {\n\t                    \"@type\": \"SearchAction\",\n\t                    \"target\": {\n\t                        \"@type\": \"EntryPoint\",\n\t                        \"urlTemplate\": \"https:\/\/4399pay.com\/blog\/?s={search_term_string}\"\n\t                    },\n\t                    
\"query-input\": {\n\t                        \"@type\": \"PropertyValueSpecification\",\n\t                        \"valueRequired\": true,\n\t                        \"valueName\": \"search_term_string\"\n\t                    }\n\t                }\n\t            ],\n\t            \"inLanguage\": \"zh-Hans\"\n\t        },\n\t        {\n\t            \"@type\": \"Organization\",\n\t            \"@id\": \"https:\/\/4399pay.com\/blog\/#organization\",\n\t            \"name\": \"4399 PAY\",\n\t            \"url\": \"https:\/\/4399pay.com\/blog\/\",\n\t            \"logo\": {\n\t                \"@type\": \"ImageObject\",\n\t                \"inLanguage\": \"zh-Hans\",\n\t                \"@id\": \"https:\/\/4399pay.com\/blog\/#\/schema\/logo\/image\/\",\n\t                \"url\": \"https:\/\/4399pay.com\/blog\/wp-content\/uploads\/2025\/03\/99Pay.jpg\",\n\t                \"contentUrl\": \"https:\/\/4399pay.com\/blog\/wp-content\/uploads\/2025\/03\/99Pay.jpg\",\n\t                \"width\": 300,\n\t                \"height\": 307,\n\t                \"caption\": \"4399 PAY\"\n\t            },\n\t            \"image\": {\n\t                \"@id\": \"https:\/\/4399pay.com\/blog\/#\/schema\/logo\/image\/\"\n\t            }\n\t        },\n\t        {\n\t            \"@type\": \"Person\",\n\t            \"@id\": \"https:\/\/4399pay.com\/blog\/#\/schema\/person\/d2ca2d47f2751f1fcc8c4f143823629a\",\n\t            \"name\": \"admin\",\n\t            \"image\": {\n\t                \"@type\": \"ImageObject\",\n\t                \"inLanguage\": \"zh-Hans\",\n\t                \"@id\": \"https:\/\/4399pay.com\/blog\/#\/schema\/person\/image\/\",\n\t                \"url\": \"https:\/\/secure.gravatar.com\/avatar\/814700148bf50464026547f393f2fa7020c227b11212fc4dd648a4b277ff44f3?s=96&d=mm&r=g\",\n\t                \"contentUrl\": \"https:\/\/secure.gravatar.com\/avatar\/814700148bf50464026547f393f2fa7020c227b11212fc4dd648a4b277ff44f3?s=96&d=mm&r=g\",\n\t                
\"caption\": \"admin\"\n\t            },\n\t            \"sameAs\": [\n\t                \"https:\/\/4399pay.com\/blog\"\n\t            ],\n\t            \"url\": \"\/blog\/archives\/author\/admin\"\n\t        }\n\t    ]\n\t}<\/script>\n<!-- \/ Yoast SEO plugin. -->","yoast_head_json":{"title":"ChatGPT\u80cc\u540e\u7684\u201c\u529f\u81e3\u201d\u2014\u2014RLHF\u6280\u672f\u8be6\u89e3 - 4399 PAY","robots":{"index":"index","follow":"follow","max-snippet":"max-snippet:-1","max-image-preview":"max-image-preview:large","max-video-preview":"max-video-preview:-1"},"canonical":"https:\/\/4399pay.com\/blog\/archives\/2461","og_locale":"zh_CN","og_type":"article","og_title":"ChatGPT\u80cc\u540e\u7684\u201c\u529f\u81e3\u201d\u2014\u2014RLHF\u6280\u672f\u8be6\u89e3 - 4399 PAY","og_description":"OpenAI \u63a8\u51fa\u7684 ChatGPT \u5bf9\u8bdd\u6a21\u578b\u6380\u8d77\u4e86\u65b0\u7684 AI \u70ed\u6f6e\uff0c\u5b83\u9762\u5bf9\u4e0d\u540c\u7684\u95ee\u9898\u5bf9\u7b54\u5982\u6d41\uff0c\u4f3c\u4e4e\u5df2\u7ecf\u6253\u7834\u4e86 [&hellip;]","og_url":"https:\/\/4399pay.com\/blog\/archives\/2461","og_site_name":"4399 PAY","article_published_time":"2023-11-29T07:45:37+00:00","article_modified_time":"2023-11-29T07:45:38+00:00","og_image":[{"width":1200,"height":675,"url":"https:\/\/4399pay.com\/blog\/wp-content\/uploads\/2023\/11\/1680917624-24c0604e17e910de0f206348c1e99e38.jpeg","type":"image\/jpeg"}],"author":"admin","twitter_card":"summary_large_image","twitter_misc":{"\u4f5c\u8005":"admin","\u9884\u8ba1\u9605\u8bfb\u65f6\u95f4":"1 
\u5206"},"schema":{"@context":"https:\/\/schema.org","@graph":[{"@type":"Article","@id":"https:\/\/4399pay.com\/blog\/archives\/2461#article","isPartOf":{"@id":"https:\/\/4399pay.com\/blog\/archives\/2461"},"author":{"name":"admin","@id":"https:\/\/4399pay.com\/blog\/#\/schema\/person\/d2ca2d47f2751f1fcc8c4f143823629a"},"headline":"ChatGPT\u80cc\u540e\u7684\u201c\u529f\u81e3\u201d\u2014\u2014RLHF\u6280\u672f\u8be6\u89e3","datePublished":"2023-11-29T07:45:37+00:00","dateModified":"2023-11-29T07:45:38+00:00","mainEntityOfPage":{"@id":"https:\/\/4399pay.com\/blog\/archives\/2461"},"wordCount":242,"publisher":{"@id":"https:\/\/4399pay.com\/blog\/#organization"},"image":{"@id":"https:\/\/4399pay.com\/blog\/archives\/2461#primaryimage"},"thumbnailUrl":"\/blog\/wp-content\/uploads\/2023\/11\/1680917624-24c0604e17e910de0f206348c1e99e38.jpeg","keywords":["ChatGPT","OpenAI","Visa","\u865a\u62df\u4fe1\u7528\u5361"],"articleSection":["ChatGPT"],"inLanguage":"zh-Hans"},{"@type":"WebPage","@id":"https:\/\/4399pay.com\/blog\/archives\/2461","url":"https:\/\/4399pay.com\/blog\/archives\/2461","name":"ChatGPT\u80cc\u540e\u7684\u201c\u529f\u81e3\u201d\u2014\u2014RLHF\u6280\u672f\u8be6\u89e3 - 4399 
PAY","isPartOf":{"@id":"https:\/\/4399pay.com\/blog\/#website"},"primaryImageOfPage":{"@id":"https:\/\/4399pay.com\/blog\/archives\/2461#primaryimage"},"image":{"@id":"https:\/\/4399pay.com\/blog\/archives\/2461#primaryimage"},"thumbnailUrl":"\/blog\/wp-content\/uploads\/2023\/11\/1680917624-24c0604e17e910de0f206348c1e99e38.jpeg","datePublished":"2023-11-29T07:45:37+00:00","dateModified":"2023-11-29T07:45:38+00:00","breadcrumb":{"@id":"https:\/\/4399pay.com\/blog\/archives\/2461#breadcrumb"},"inLanguage":"zh-Hans","potentialAction":[{"@type":"ReadAction","target":["https:\/\/4399pay.com\/blog\/archives\/2461"]}]},{"@type":"ImageObject","inLanguage":"zh-Hans","@id":"https:\/\/4399pay.com\/blog\/archives\/2461#primaryimage","url":"\/blog\/wp-content\/uploads\/2023\/11\/1680917624-24c0604e17e910de0f206348c1e99e38.jpeg","contentUrl":"\/blog\/wp-content\/uploads\/2023\/11\/1680917624-24c0604e17e910de0f206348c1e99e38.jpeg","width":1200,"height":675},{"@type":"BreadcrumbList","@id":"https:\/\/4399pay.com\/blog\/archives\/2461#breadcrumb","itemListElement":[{"@type":"ListItem","position":1,"name":"\u9996\u9875","item":"https:\/\/4399pay.com\/blog"},{"@type":"ListItem","position":2,"name":"ChatGPT\u80cc\u540e\u7684\u201c\u529f\u81e3\u201d\u2014\u2014RLHF\u6280\u672f\u8be6\u89e3"}]},{"@type":"WebSite","@id":"https:\/\/4399pay.com\/blog\/#website","url":"https:\/\/4399pay.com\/blog\/","name":"4399 PAY","description":"","publisher":{"@id":"https:\/\/4399pay.com\/blog\/#organization"},"potentialAction":[{"@type":"SearchAction","target":{"@type":"EntryPoint","urlTemplate":"https:\/\/4399pay.com\/blog\/?s={search_term_string}"},"query-input":{"@type":"PropertyValueSpecification","valueRequired":true,"valueName":"search_term_string"}}],"inLanguage":"zh-Hans"},{"@type":"Organization","@id":"https:\/\/4399pay.com\/blog\/#organization","name":"4399 
PAY","url":"https:\/\/4399pay.com\/blog\/","logo":{"@type":"ImageObject","inLanguage":"zh-Hans","@id":"https:\/\/4399pay.com\/blog\/#\/schema\/logo\/image\/","url":"https:\/\/4399pay.com\/blog\/wp-content\/uploads\/2025\/03\/99Pay.jpg","contentUrl":"https:\/\/4399pay.com\/blog\/wp-content\/uploads\/2025\/03\/99Pay.jpg","width":300,"height":307,"caption":"4399 PAY"},"image":{"@id":"https:\/\/4399pay.com\/blog\/#\/schema\/logo\/image\/"}},{"@type":"Person","@id":"https:\/\/4399pay.com\/blog\/#\/schema\/person\/d2ca2d47f2751f1fcc8c4f143823629a","name":"admin","image":{"@type":"ImageObject","inLanguage":"zh-Hans","@id":"https:\/\/4399pay.com\/blog\/#\/schema\/person\/image\/","url":"https:\/\/secure.gravatar.com\/avatar\/814700148bf50464026547f393f2fa7020c227b11212fc4dd648a4b277ff44f3?s=96&d=mm&r=g","contentUrl":"https:\/\/secure.gravatar.com\/avatar\/814700148bf50464026547f393f2fa7020c227b11212fc4dd648a4b277ff44f3?s=96&d=mm&r=g","caption":"admin"},"sameAs":["https:\/\/4399pay.com\/blog"],"url":"\/blog\/archives\/author\/admin"}]}},"_links":{"self":[{"href":"\/blog\/wp-json\/wp\/v2\/posts\/2461","targetHints":{"allow":["GET"]}}],"collection":[{"href":"\/blog\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"\/blog\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"\/blog\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"\/blog\/wp-json\/wp\/v2\/comments?post=2461"}],"version-history":[{"count":3,"href":"\/blog\/wp-json\/wp\/v2\/posts\/2461\/revisions"}],"predecessor-version":[{"id":2469,"href":"\/blog\/wp-json\/wp\/v2\/posts\/2461\/revisions\/2469"}],"wp:featuredmedia":[{"embeddable":true,"href":"\/blog\/wp-json\/wp\/v2\/media\/2056"}],"wp:attachment":[{"href":"\/blog\/wp-json\/wp\/v2\/media?parent=2461"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"\/blog\/wp-json\/wp\/v2\/categories?post=2461"},{"taxonomy":"post_tag","embeddable":true,"href":"\/blog\/wp-json\/wp\/v2\/tags?post=2461"}],"curies":[{"name":"wp","href":"https
:\/\/api.w.org\/{rel}","templated":true}]}}