Can LLMs trained on A is B infer automatically that B is A?

 

No. LLMs (large language models) trained on the statement "A is B" cannot automatically infer that B is A. The training data provides information about the relationship from A to B, but it does not by itself establish the reverse relationship. Because LLMs rely on statistical patterns and correlations in their training data, their predictions are limited to the information they have been trained on.


Elegant and powerful new result that seriously undermines large language models

 

 

Wowed by a new paper I just read and wish I had thought to write myself.

Lukas Berglund and others, led by Owain Evans, asked a simple, powerful, elegant question: can LLMs trained on A is B infer automatically that B is A?

The shocking (yet, in historical context, see below, unsurprising) answer is no:

On made-up facts, in a first experiment, the model was at zero percent correct; on celebrities, in a second experiment, performance was still dismal.
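The setup lends itself to a tiny harness. Here is a minimal sketch, assuming the caller supplies an `answer(prompt)` function wrapping whatever model is under test; the fictitious facts and the crude substring scoring are my own illustrative stand-ins, not the paper's exact data or metric.

```python
# Minimal sketch of a reversal-style check (illustrative, not the paper's protocol).
from typing import Callable

# Made-up (name, description) facts in the spirit of the paper's synthetic data.
FACTS = [
    ("Daphne Barrington", "the director of the film 'A Journey Through Time'"),
    ("Uriah Hawthorne", "the composer of 'Abyssal Melodies'"),
]

def reversal_check(answer: Callable[[str], str]) -> None:
    for name, description in FACTS:
        forward = answer(f"Who is {name}?")            # A -> B direction
        reverse = answer(f"Who is {description}?")     # B -> A direction
        # Crude substring scoring, just to make any asymmetry visible.
        print(f"{name}: forward hit = {description.lower() in forward.lower()}, "
              f"reverse hit = {name.lower() in reverse.lower()}")

if __name__ == "__main__":
    # Trivial stub so the script runs; replace with a real model call.
    reversal_check(lambda prompt: "I don't know.")
```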

Can we really say we are close to AGI, when the training set must contain billions of examples of symmetrical relationships, many closely related to these, and the system still stumbles on such an elementary relationship?

Here’s the paper; well worth reading:

 

§

What the otherwise fabulous paper failed to note in its initial version is that the history on this one is really, really deep.

To begin with, this kind of failure actually goes back to my own 2001 book The Algebraic Mind, which focused extensively on the failure of earlier multilayer neural networks to freely generalize universal relationships, and which gave principled reasons to anticipate such failures from these architectures. None of what I raised then has really been adequately addressed in the intervening decades. The core problem, as I pointed out then, is that in many real-world problems you can never fully cover the space of possible examples, and in a broad class of heavily data-driven systems like LLMs that lack explicit variables and operations over variables, you are out of luck when you try to extrapolate beyond that space of training examples. Was true then, still true now.

But what’s really mind-blowing here is not just that the paper vindicates a lot of what I have been saying, but the specific example was literally at the center of one of the first modern critiques of neural networks, even earlier: Fodor and Pylyshyn, 1988, published in Cognition. Much of Fodor and Pylyshyn’s critique hovers around the systematicity of thought, with this passage I paste in below (and several others) directly anticipating the new paper. If you really understand the world, you should be able to understand a in relation to b, and b in relation to a; we expect even nonverbal cognitive creatures to be able to do that:

Thirty-five years later, neural networks (at least of the popular variety) still struggle with this. They remain pointillistic masses of blurry memory, never as systematic as reasoning machines ought to be.

§

What I mean by pointillistic is that what they answer very much depends on the precise details of what is asked and on what happens to be in the training set. In a DM, Evans gave me this illuminating comparison. GPT-4 tends to get questions like this correct, as noted in the paper.

As Evans summarized, models that have memorized "Tom Cruise's parent is Mary Lee Pfeiffer" in training fail to generalize to the question "Who is Mary Lee Pfeiffer the parent of?" But if the memorized fact is included in the prompt, models succeed.

It’s nice that models can get the latter, matching a template, but problematic that they can’t take an abstraction they superficially grasp in one context and generalize it to another; you shouldn’t have to ask the question that way to get the answer you need.
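For anyone who wants to try the comparison themselves, here is a rough sketch, assuming the OpenAI Python client (openai>=1.0) with an API key set and a GPT-4-class model; the prompt wording is my own approximation of the questions quoted above, not the paper's exact phrasing.

```python
# Rough sketch of the three-way comparison described above.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

def ask(prompt: str) -> str:
    response = client.chat.completions.create(
        model="gpt-4",
        messages=[{"role": "user", "content": prompt}],
    )
    return response.choices[0].message.content

# Forward direction: typically answered correctly from memorized training data.
print(ask("Who is Tom Cruise's mother?"))

# Reverse direction: the question the paper reports models failing on.
print(ask("Who is Mary Lee Pfeiffer the parent of?"))

# Reverse direction with the memorized fact restated in the prompt;
# per Evans, models succeed once the fact is in context.
print(ask("Tom Cruise's parent is Mary Lee Pfeiffer. "
          "Who is Mary Lee Pfeiffer the parent of?"))
```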


§

My sense of déjà vu shot through the roof when I wrote to Evans to congratulate him on the result, saying I would write it up here in this Substack. Evans wrote, “Great. I'm excited to get more eyes on this result. Some people were very surprised and thought that models couldn't have such a basic limitation.”

What struck me about people’s refusal to believe his result is that in 1998 I had a very closely related result and a very similar reaction. Neural networks of the day had a great deal of trouble generalizing identity. But I couldn’t get anyone to listen. Most people simply didn’t believe me; almost none appreciated the significance of the result. One researcher (in a peer review) accused me of a “terrorist attack on connectionism [neural networks]”; it was two decades before the central point of my result – distribution shift – became widely recognized as a central problem.

The will to believe in neural networks is frequently so strong that counterevidence is often dismissed or ignored, for much too long. I hope that won’t happen on this one.

§

In math, when one makes a conjecture, a simple counterexample suffices. If I say all odd numbers are prime, 3, 5, and 7 may count in my favor, but at 9 the game is over.

In neural network discussion, people are often impressed by successes, and pay far too little regard to what failures are trying to tell them. This symmetry fail is mighty big, a mighty persistent error that has endured for decades. It’s such a clear, sharp failure in reasoning that it tempts me to simply stop thinking and writing about large language models altogether. If, after training on virtually the entire internet, you know Tom is Mary Lee’s son, but can’t figure out without special prompting that Mary Lee therefore is Tom’s mother, you have no business running all the world’s software.

It’s just a matter of time before people start to realize that we need some genuinely new ideas in the field, either new mechanisms (perhaps neurosymbolic), or different approaches altogether.


Gary Marcus’s most important work remains his 2001 book, The Algebraic Mind, which anticipates current issues with hallucination, distribution shift, generalization, factuality and compositionality, all still central to the field.

 

 

 

 
