As far as I know, that is mainly used when a bigger, better model generates training data for a smaller, more efficient model, to bring the smaller one a bit closer to the bigger one's level.
Have there been any cases of an already state-of-the-art model using this method to improve itself?
I will search for the paper.
EDIT: can’t find it, dang.