Emails (Others): LIKE PROPOSED "ANSWER RATING SYSTEM"?

Thursday, 28 September 2023

ChatGPT Is Mind-Blowing — Everything You Need To Know

Extract:

While games have predefined rules and rewards, a conversation does not; human feedback therefore becomes essential.

This was done by prompting a model, sampling several responses, and then letting a human manually rank the responses. These rankings then become training data for a reward model.
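The ranking step above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not OpenAI's actual code: the extract does not say how rankings are turned into a training signal, so the sketch assumes the common approach of splitting each ranking into (preferred, rejected) pairs and scoring them with a pairwise logistic loss. The names `ranking_to_pairs`, `pairwise_loss`, `ranking_loss` and `toy_reward` are hypothetical.

```python
import math
from itertools import combinations


def ranking_to_pairs(ranked_responses):
    """Turn a best-to-worst ranking into (preferred, rejected) pairs."""
    # combinations() keeps input order, so the first item of every pair
    # is always the response the human ranked higher.
    return list(combinations(ranked_responses, 2))


def pairwise_loss(reward_preferred, reward_rejected):
    """Assumed loss: -log(sigmoid(reward_preferred - reward_rejected))."""
    margin = reward_preferred - reward_rejected
    # Numerically stable form of log(1 + exp(-margin)).
    if margin >= 0:
        return math.log1p(math.exp(-margin))
    return -margin + math.log1p(math.exp(margin))


def ranking_loss(ranked_responses, reward_fn):
    """Average pairwise loss of a reward function over one human ranking."""
    pairs = ranking_to_pairs(ranked_responses)
    if not pairs:
        raise ValueError("need at least two ranked responses")
    total = sum(pairwise_loss(reward_fn(w), reward_fn(l)) for w, l in pairs)
    return total / len(pairs)


# Hypothetical stand-in for a learned reward model: longer text scores higher.
def toy_reward(response):
    return len(response) / 10.0


# One human ranking, best response first (made-up example data).
ranked = ["a detailed, helpful answer", "a short answer", "??"]

agrees = ranking_loss(ranked, toy_reward)           # reward order matches the human
disagrees = ranking_loss(ranked[::-1], toy_reward)  # reward order contradicts it
print(agrees < disagrees)
```

The loss is small when the reward function orders responses the same way the human did and large when it disagrees. A real reward model would be a neural network whose parameters are adjusted to reduce this loss over many rankings.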

Finally, a fine-tuned language model is further trained using reinforcement learning to respond to questions in a way that maximizes the output of the reward model. For more information, check out OpenAI’s blog post:
