For the same reason that nobody uses someone's reasoning and answers to random exam questions as a textbook: if the reasoning is not guaranteed to be right, why would you want to make it training material?
We can measure empirically how often the reasoning model is correct. At 95% empirical accuracy, its output should still help the model directionally. No training dataset needs to be 100% accurate, no?
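To make "empirically figure out how often it is correct" concrete, here is a minimal sketch, assuming you hand-grade a random sample of the model's reasoning traces and count how many are right. The function name `wilson_interval` and the 190-of-200 figures are hypothetical, chosen only for illustration; the Wilson score interval is one common way to put error bars on a measured accuracy, not the only one.

```python
import math


def wilson_interval(correct, total, z=1.96):
    """Wilson score interval for a binomial proportion.

    z=1.96 corresponds to roughly 95% confidence.
    """
    if total <= 0:
        raise ValueError("total must be positive")
    p_hat = correct / total
    denom = 1 + z**2 / total
    center = (p_hat + z**2 / (2 * total)) / denom
    half_width = (
        z * math.sqrt(p_hat * (1 - p_hat) / total + z**2 / (4 * total**2)) / denom
    )
    return center - half_width, center + half_width


# Hypothetical spot check: 190 of 200 sampled traces graded correct.
low, high = wilson_interval(190, 200)
print(f"observed accuracy: {190 / 200:.1%}, interval: [{low:.3f}, {high:.3f}]")
```

With a sample this small the interval is several points wide, so a measured 95% only says the true accuracy is probably somewhere in that range.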