ChatGPT Is Mind-Blowing — Everything You Need To Know
https://levelup.gitconnected.
Extract:
While games have predefined rules and rewards, a conversation does not, so human feedback becomes essential. The feedback was collected by prompting a model, sampling several responses, and then having a human manually rank them. These rankings then become training data for a reward model.
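To make the ranking step concrete, here is a minimal sketch of how a human ranking could be turned into a training signal. It rests on assumptions that are not in the extract: the ranking is split into (preferred, rejected) pairs, and the pairs are scored with a pairwise logistic loss, which is one common formulation. A real reward model is a neural network that scores text; this toy version learns a single number per response. The names `ranking_to_pairs`, `pairwise_loss`, and `fit_rewards` are hypothetical.

```python
import math
from itertools import combinations


def ranking_to_pairs(ranked):
    """Turn a best-to-worst ranking into (preferred, rejected) pairs."""
    # combinations() keeps input order, so the first item of each
    # pair is always the one the human ranked higher.
    return list(combinations(ranked, 2))


def pairwise_loss(reward_preferred, reward_rejected):
    """Pairwise logistic loss: -log(sigmoid(r_preferred - r_rejected))."""
    diff = reward_preferred - reward_rejected
    return math.log1p(math.exp(-diff))


def fit_rewards(ranked, steps=200, lr=0.5):
    """Toy stand-in for a reward model: one learned scalar per response."""
    rewards = {response: 0.0 for response in ranked}
    pairs = ranking_to_pairs(ranked)
    for _ in range(steps):
        for win, lose in pairs:
            # Gradient of the loss with respect to (rewards[win] - rewards[lose]).
            grad = -1.0 / (1.0 + math.exp(rewards[win] - rewards[lose]))
            rewards[win] -= lr * grad   # pushes the preferred score up
            rewards[lose] += lr * grad  # pushes the rejected score down
    return rewards


# Hypothetical ranking of three sampled responses, best first.
ranking = ["response A", "response B", "response C"]
rewards = fit_rewards(ranking)
for response in ranking:
    print(response, round(rewards[response], 2))
```

After fitting, the learned scores follow the human ranking, which is the property the reward model needs before it can be used as an optimization target.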
Finally, a fine-tuned language model is further trained with reinforcement learning to respond to questions in a way that maximizes the output of the reward model. For more information, check out OpenAI’s blog post: