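Reply to @思山睿雨: One answer for everyone: Google's paper has already proven that Transformers (GPT and the like) have no ability to generalize: $比亚迪(SZ002594)$ $特斯拉(TSLA)$ $小鹏汽车(XPEV)$
Headline of the QbitAI (量子位) report: "Google's large-model research lands in major controversy: no generalization at all beyond the training data?" Link: [web link]. Paper: [web link]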
Abstract: Transformer models, notably large language models (LLMs), have the remarkable ability to perform in-context learning (ICL) -- to perform new tasks when prompted with unseen input-output examples without any explicit model training. In this work, we study how effectively transformers can bridge between their pretraining data mixture, comprised of multiple distinct task families, to identify and learn new tasks in-context which are both inside and outside the pretraining distribution. Building on previous work, we investigate this question in a controlled setting, where we study transformer models trained on sequences of (x, f(x)) pairs rather than natural language. Our empirical results show transformers demonstrate near-optimal unsupervised model selection capabilities, in their ability to first in-context identify different task families and in-context learn within them when the task families are well-represented in their pretraining data. However when presented with tasks or functions which are out-of-domain of their pretraining data, we demonstrate various failure modes of transformers and degradation of their generalization for even simple extrapolation tasks. Together our results highlight that the impressive ICL abilities of high-capacity sequence models may be more closely tied to the coverage of their pretraining data mixtures than inductive biases that create fundamental generalization capabilities.
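//@思山睿雨: Reply to @andyding: If driving decisions are handed over to a neural network system, I think the crux is the system's ability to generalize. If that cannot be achieved and the only option is to pile up incident data, it will ultimately be hard to make the system reliable. Two or three incidents could be enough to ruin a brand.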

Author: andyding
