What does it mean to "train a model" when talking about generative AI like ChatGPT?
Answer
An AI model is trained by giving it lots of data: terabytes of text, the equivalent of millions of books. It starts playing fill-in-the-blank with sentences from this dataset, developing probabilities about which words are most likely to go together. In math-ier terms, the training assigns coordinates (think x,y axes from geometry) to words and plots them on a graph, and then tries to make sentences out of them like you'd try to draw a line between two points. Another part of its programming looks at the line created and then goes back and readjusts the coordinates to account for feedback from other lines and from human reviewers who give it a thumbs up or thumbs down. Except it doesn't just use x and y; it develops and then employs thousands of axes (don't think about this too hard, human brains usually tap out around 5). When the results are as consistent as its developers want, the coordinates are set and the model is released. Read more about training with the following articles, ordered from least to most technical.
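Before turning to those articles, the fill-in-the-blank idea can be sketched in a few lines of Python. This is only a toy illustration, not how ChatGPT is actually built: it simply counts which word follows which in a tiny made-up corpus and turns the counts into probabilities. Real models instead adjust thousands of coordinates per word, as described above. The corpus and the function name `next_word_probabilities` are invented for this example.

```python
from collections import Counter, defaultdict

# Tiny made-up corpus (an assumption for illustration);
# a real model sees terabytes of text.
corpus = [
    "the cat sat on the mat",
    "the cat chased the mouse",
    "the dog sat on the rug",
]

# Count how often each word is immediately followed by each other word.
follow_counts = defaultdict(Counter)
for sentence in corpus:
    words = sentence.split()
    for current, following in zip(words, words[1:]):
        follow_counts[current][following] += 1


def next_word_probabilities(word):
    """Turn raw counts into probabilities for the blank after `word`."""
    counts = follow_counts[word]
    total = sum(counts.values())
    return {w: c / total for w, c in counts.items()}


# "the ____": which words are most likely to fill the blank?
probs = next_word_probabilities("the")
for candidate, p in sorted(probs.items(), key=lambda item: -item[1]):
    print(f"the {candidate}: {p:.2f}")
```

In this corpus "cat" follows "the" more often than any other word, so it gets the highest probability. Training a real model follows the same spirit at a vastly larger scale, with the probabilities coming from learned coordinates rather than a lookup table of counts.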