Study Note 1: Synthetic Data
What is synthetic data used for?
Real-world data is often limited, so synthetic data is often used to train models.
How is synthetic data built?
It requires building a reward model.
A reward model involves 3 parts. The first part is to have humans label the data. For example, xxx. The second part is to train the reward model on this labelled data so that its output stays close to the human-labelled training set.
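To make the "label, then train" idea concrete, here is a minimal sketch of a reward model. It rests on assumptions that are not in the note: the human labels are pairwise preferences (labeller picked response A over response B), each response is reduced to a small hand-made feature vector, and the reward is a plain linear function trained with a pairwise logistic loss. The feature names and the toy data are hypothetical.

```python
import math

def reward(weights, features):
    """Linear reward: dot product of the weights and a response's features."""
    return sum(w * x for w, x in zip(weights, features))

def train_reward_model(pairs, n_features, lr=0.1, epochs=200):
    """Fit weights so the human-preferred response gets the higher reward.

    pairs: list of (preferred_features, rejected_features) tuples.
    Minimises the pairwise logistic loss -log(sigmoid(r_preferred - r_rejected)).
    """
    weights = [0.0] * n_features
    for _ in range(epochs):
        for preferred, rejected in pairs:
            margin = reward(weights, preferred) - reward(weights, rejected)
            # sigmoid(-margin): large when the model still ranks the pair badly.
            step = 1.0 - 1.0 / (1.0 + math.exp(-margin))
            for i in range(n_features):
                weights[i] += lr * step * (preferred[i] - rejected[i])
    return weights

# Hypothetical toy features per response: [is_factually_correct, is_polite]
human_pairs = [
    ([1.0, 0.0], [0.0, 1.0]),  # correct but blunt beats polite but wrong
    ([1.0, 1.0], [0.0, 1.0]),  # correct and polite beats polite but wrong
    ([1.0, 1.0], [1.0, 0.0]),  # among correct answers, polite wins
]
weights = train_reward_model(human_pairs, n_features=2)

for preferred, rejected in human_pairs:
    print(reward(weights, preferred) > reward(weights, rejected))
```

After training, the model should rank every labelled pair the same way the human did, which is one way to read "output that stays close to the training set".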
But the challenge with collecting human-labelled data is that the shortage of experts often becomes a bottleneck. One way around this is to have an AI simulate the expert and label the data. This process is called RLAIF (Reinforcement Learning from AI Feedback). It means using a very advanced model to stand in for the experts and train a more basic model, guided by very clear prompts (a rubric). For example: based on the following xxx dimensions, rate the answer from 0-10, etc.
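A rubric prompt of this kind could be sketched as below. Everything here is illustrative: the dimension names ("accuracy", "clarity"), the prompt wording, and the reply format are assumptions, and fake_judge is a stand-in for a call to a stronger model, not a real API.

```python
import re

RUBRIC_TEMPLATE = (
    "Rate the answer below on each dimension from 0 to 10.\n"
    "Dimensions: {dimensions}\n"
    "Reply with one line per dimension in the form 'dimension: score'.\n\n"
    "Question: {question}\n"
    "Answer: {answer}\n"
)

def build_rubric_prompt(question, answer, dimensions):
    """Fill the rubric template for one question/answer pair."""
    return RUBRIC_TEMPLATE.format(
        dimensions=", ".join(dimensions), question=question, answer=answer
    )

def parse_scores(reply, dimensions):
    """Extract 'dimension: score' lines, keeping known dimensions with scores in 0-10."""
    scores = {}
    for line in reply.splitlines():
        match = re.match(r"\s*([\w ]+?)\s*:\s*(\d+(?:\.\d+)?)\s*$", line)
        if not match:
            continue
        name, score = match.group(1).lower(), float(match.group(2))
        if name in dimensions and 0 <= score <= 10:
            scores[name] = score
    return scores

def fake_judge(prompt):
    """Hypothetical stand-in for the advanced model; returns a canned reply."""
    return "accuracy: 8\nclarity: 6"

dimensions = ["accuracy", "clarity"]  # hypothetical rubric dimensions
prompt = build_rubric_prompt("What is RLAIF?", "Feedback from an AI labeller.", dimensions)
scores = parse_scores(fake_judge(prompt), dimensions)
print(scores)
```

Parsing the reply strictly (known dimensions only, scores inside the range) keeps malformed judge output from silently entering the training labels.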
But using AI to rate AI performance can lead to model collapse or homogenization. The real world is messy and people have biases, whereas AI models tend to prefer structured answers and to converge toward the mathematical average.
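The drift toward the average can be illustrated with a toy simulation. This is a simplified sketch, not a model of real training: answers are reduced to single numbers drawn from a normal distribution, the "AI judge" is assumed to keep only the half of the samples closest to the current mean, and the next generation is refit on what was kept.

```python
import random
import statistics

def simulate_collapse(generations=5, n_samples=1000, seed=0):
    """Return the spread (std dev) of the data after each generation."""
    rng = random.Random(seed)
    mean, std = 0.0, 1.0
    spreads = [std]
    for _ in range(generations):
        samples = [rng.gauss(mean, std) for _ in range(n_samples)]
        # Assumed judge behaviour: prefer "average" answers,
        # so keep only the half closest to the current mean.
        samples.sort(key=lambda x: abs(x - mean))
        kept = samples[: n_samples // 2]
        # Refit the generator on the judge-approved samples only.
        mean = statistics.mean(kept)
        std = statistics.pstdev(kept)
        spreads.append(std)
    return spreads

spreads = simulate_collapse()
for generation, spread in enumerate(spreads):
    print(generation, round(spread, 4))
```

The spread shrinks every generation, so the messy tails of the original distribution disappear even though each individual filtering step looks reasonable.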
So the question is how to strike a balance between AI labellers and human labellers, achieving both scale and accuracy.
The answer is to build agentic simulators (or to use Persona-Conditioned RLAIF).
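One possible reading of persona conditioning is sketched below: the same answer is rated several times, each time with the judge asked to take a different point of view, and the disagreement between personas is kept rather than averaged away. The personas, the prompt wording, and stub_judge (a placeholder for a real model call) are all hypothetical.

```python
import statistics

# Hypothetical personas; a real set would be designed to mirror actual users.
PERSONAS = [
    "a beginner who wants simple explanations",
    "a domain expert who checks technical detail",
    "a busy manager who wants a short summary",
]

def persona_prompt(persona, question, answer):
    """Build a rating prompt conditioned on one persona."""
    return (
        f"Rate the answer from 0 to 10 from the point of view of {persona}.\n"
        f"Question: {question}\nAnswer: {answer}\n"
    )

def collect_persona_scores(judge, personas, question, answer):
    """Ask the judge once per persona and keep every individual score."""
    return {p: judge(persona_prompt(p, question, answer)) for p in personas}

def summarize(scores):
    """Report the mean and the spread, so disagreement stays visible."""
    values = list(scores.values())
    return {"mean": statistics.mean(values), "spread": statistics.pstdev(values)}

def stub_judge(prompt):
    """Placeholder for a model call; returns a fixed score per persona."""
    if "beginner" in prompt:
        return 4.0
    if "domain expert" in prompt:
        return 8.0
    return 6.0

scores = collect_persona_scores(
    stub_judge, PERSONAS, "What is RLAIF?", "A dense, technical definition."
)
summary = summarize(scores)
print(scores)
print(summary)
```

Keeping the per-persona scores and their spread is the design choice that matters here: a single averaged score would hide exactly the human-like disagreement this approach is meant to preserve.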