Yeah.
"Why Can GPT Learn In-Context? Language Models Secretly Perform Gradient Descent as Meta-Optimizers"
https://arxiv.org/abs/2212.10559
@dang there's something weird about this URL on HN. It has 35 points but no discussion (I guess because the original submission is too old and never got any traction).