A fascinating deep-dive from New York Magazine and The Verge into how AI models – including ChatGPT, Bard, and other LLMs (Large Language Models) – are trained, and the industry that provides the data, control, and feedback:
Inside the AI Factory
[…] ChatGPT seems so human because it was trained by an AI that was mimicking humans who were rating an AI that was mimicking humans who were pretending to be a better version of an AI that was trained on human writing.
This circuitous technique is called “reinforcement learning from human feedback,” or RLHF, and it’s so effective that it’s worth pausing to fully register what it doesn’t do. When annotators teach a model to be accurate, for example, the model isn’t learning to check answers against logic or external sources or about what accuracy as a concept even is. The model is still a text-prediction machine mimicking patterns in human writing, but now its training corpus has been supplemented with bespoke examples, and the model has been weighted to favor them. Maybe this results in the model extracting patterns from the part of its linguistic map labeled as accurate and producing text that happens to align with the truth, but it can also result in it mimicking the confident style and expert jargon of the accurate text while writing things that are totally wrong. There is no guarantee that the text the labelers marked as accurate is in fact accurate, and when it is, there is no guarantee that the model learns the right patterns from it.
Makes sense, and it’s so very important to recognize this! We humans like to anthropomorphize things, and doubly so with LLMs, since they can come across as so… genuinely human.
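To make the mechanism a bit more concrete, here’s a deliberately tiny sketch of the RLHF idea. Everything in it is made up for illustration (the preference pairs, the bag-of-words “reward model”, the re-ranking stand-in for fine-tuning); real systems train large neural reward models and fine-tune with reinforcement learning, but the point survives: the model chases a learned proxy for “what annotators preferred”, not truth.

```python
from collections import Counter

# 1. Annotators compare pairs of model outputs: (preferred, rejected).
#    These examples are invented for the sketch.
preference_pairs = [
    ("Paris is the capital of France.", "Lyon is the capital of France."),
    ("Water boils at 100 C at sea level.", "Water boils at 50 C at sea level."),
]

# 2. A crude stand-in for a reward model: words seen in preferred answers
#    gain weight, words seen in rejected answers lose weight. It learns
#    surface patterns in the text, not facts about the world.
def fit_reward_model(pairs):
    weights = Counter()
    for preferred, rejected in pairs:
        for word in preferred.lower().split():
            weights[word] += 1
        for word in rejected.lower().split():
            weights[word] -= 1
    return lambda text: sum(weights[word] for word in text.lower().split())

# 3. A stand-in for fine-tuning: candidate completions from the base model
#    are re-ranked by reward, so output drifts toward whatever the
#    annotators happened to prefer.
def pick_completion(candidates, reward):
    return max(candidates, key=reward)

reward = fit_reward_model(preference_pairs)
candidates = [
    "Paris is the capital of Spain.",   # confident, preferred-sounding, wrong
    "Lyon is the capital of France.",   # also wrong, but penalized wording
]
print(pick_completion(candidates, reward))
# Prints "Paris is the capital of Spain." -- the reward model only knows
# that "Paris" showed up in answers the annotators liked.
```

Run it and it happily picks the confidently wrong answer, because “Paris” picked up positive weight and “Lyon” negative weight: exactly the style-over-substance failure the excerpt warns about.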
Generating training data for AIs, though, appears… distinctly not-so-fun, mainly because the work is so alien. Indeed, you are training an excessively dumb machine:
It is in part a product of the way machine-learning systems learn. Where a human would get the concept of “shirt” with a few examples, machine-learning programs need thousands, and they need to be categorized with perfect consistency yet varied enough (polo shirts, shirts being worn outdoors, shirts hanging on a rack) that the very literal system can handle the diversity of the real world. […] Instruction writers must come up with rules that will get humans to categorize the world with perfect consistency. To do so, they often create categories no human would use. A human asked to tag all the shirts in a photo probably wouldn’t tag the reflection of a shirt in a mirror because they would know it is a reflection and not real. But to the AI, which has no understanding of the world, it’s all just pixels and the two are perfectly identical. Fed a dataset with some shirts labeled and other (reflected) shirts unlabeled, the model won’t work. So the engineer goes back to the vendor with an update: DO label reflections of shirts. Soon, you have a 43-page guide descending into red all-caps.
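The 43-page guide makes more sense if you look at what a purely literal learner actually “sees”. Here’s a toy sketch, with made-up pixel patterns and labels, of what happens when reflections are tagged inconsistently:

```python
from collections import defaultdict

# Each example: (pixel pattern, label). A shirt and its reflection in a
# mirror produce the same pixels, so they are indistinguishable here.
dataset = [
    ("striped-shirt-pixels", "shirt"),
    ("striped-shirt-pixels", "background"),  # the reflection, left untagged
    ("polo-shirt-pixels", "shirt"),
    ("rack-shirt-pixels", "shirt"),
]

# A maximally literal "model": it just memorizes how often each pixel
# pattern received each label.
def train(data):
    counts = defaultdict(lambda: defaultdict(int))
    for pixels, label in data:
        counts[pixels][label] += 1
    return counts

model = train(dataset)
print(dict(model["striped-shirt-pixels"]))
# {'shirt': 1, 'background': 1} -- identical input, contradictory labels,
# so the best the model can do on this pattern is a coin flip. Hence the
# guideline update: DO label reflections of shirts.
```

The only way out is to make the guideline so exhaustive that identical pixels always carry identical labels, which is exactly how you end up with red all-caps.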
I fear, though, that before this era is over, most of us will have a "job" like this one, if indeed we have any job at all.