343rwerfd 3 hours ago

Regarding the hidden chain-of-thought inside the process: from the official statement about it, I infer / suspect that it runs the model in an unhobbled mode, a special mode where it can draw on the whole of its training, avoiding the intrinsic bias towards aligned outcomes.

To put it in simple terms, I think "the sum of the good and the bad" is the secret sauce here, boosting the "IQ" of the model (every output in the hidden chain) to levels apparently well beyond what it could reach with only aligned hidden internal outputs.

Another way of looking at the "sum of good and bad" idea is that the model would have a potentially much bigger set of choices (a larger probability space?) to explore for any given prompt.
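To make that intuition concrete, here's a toy sketch (entirely my own illustration, not anything from the official statement, and not how any real model works internally): if you model alignment as heavily down-weighting some "flagged" tokens at each step and renormalizing, the resulting next-token distribution is more peaked, i.e. lower entropy, so fewer effective branches for the hidden chain to explore per step.

  import math

  def entropy(p):
      """Shannon entropy in bits of a discrete distribution."""
      return -sum(q * math.log2(q) for q in p if q > 0)

  # Hypothetical next-token distribution of an unconstrained ("unhobbled") model.
  raw = {"a": 0.30, "b": 0.25, "c": 0.20, "d": 0.15, "e": 0.10}

  # Model alignment as down-weighting tokens flagged as undesirable, then
  # renormalizing. (Purely illustrative; the flagged set and penalty are made up.)
  flagged = {"c", "d"}
  penalty = 0.05
  aligned = {t: (p * penalty if t in flagged else p) for t, p in raw.items()}
  z = sum(aligned.values())
  aligned = {t: p / z for t, p in aligned.items()}

  print(f"unconstrained entropy: {entropy(raw.values()):.3f} bits")
  print(f"aligned entropy:       {entropy(aligned.values()):.3f} bits")
  # The aligned distribution is more peaked: fewer effective choices at each
  # step, which is the "smaller search space" I'm gesturing at above.

Running it, the unconstrained distribution has visibly higher entropy, which is all the toy is meant to show: remove the bias towards aligned outcomes and each step of the hidden chain has more live options to consider.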