ChatGPT is a marvel of multilingualism | The Economist, Culture
Culture | Johnson
ChatGPT is a marvel of multilingualism
It may make things up, but it does so fluently in more than 50 languages
The hype that followed ChatGPT’s public launch last year was, even by the standards of tech innovations, extreme. OpenAI’s natural-language system creates recipes, writes computer code and parodies literary styles. Its latest iteration can even describe photographs. It has been hailed as a technological breakthrough on a par with the printing press. But it has not taken long for huge flaws to emerge, too. It sometimes “hallucinates” non-facts that it pronounces with perfect confidence, insisting on those falsehoods when queried. It also fails basic logic tests.
In other words, ChatGPT is not a general artificial intelligence, an independent thinking machine. It is, in the jargon, a large language model. That means it is very good at predicting what kinds of words tend to follow which others, after being trained on a huge body of text—its developer, OpenAI, does not say exactly from where—and spotting patterns.
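The idea of predicting which words tend to follow which others can be illustrated with a toy word-pair counter. This is only a minimal sketch under loud assumptions: the function names `train_bigrams` and `predict_next` and the sample sentence are invented for illustration, and a real large language model uses a neural network trained on vastly more text rather than a simple lookup table.

```python
from collections import Counter, defaultdict


def train_bigrams(text):
    """Count, for each word, which words follow it in the training text."""
    follows = defaultdict(Counter)
    words = text.lower().split()
    for current, nxt in zip(words, words[1:]):
        follows[current][nxt] += 1
    return follows


def predict_next(follows, word):
    """Return the most frequent follower of `word`, or None if it was never seen."""
    counts = follows.get(word.lower())
    if not counts:
        return None
    return counts.most_common(1)[0][0]


# Hypothetical training text, standing in for the "huge body of text".
corpus = "the cat sat on the mat and the cat slept"
model = train_bigrams(corpus)

# "cat" follows "the" twice and "mat" follows it once, so "cat" wins.
print(predict_next(model, "the"))
```

Even this crude counter shows why fluency and accuracy are separate things: it picks the likeliest continuation without any notion of whether the result is true.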
Amid the hype, it is easy to forget a minor miracle. ChatGPT has aced a problem that long served as a far-off dream for engineers: generating human-like language. Unlike earlier versions of the system, it can go on doing so for paragraphs on end without descending into incoherence. And this achievement’s dimensions are even greater than they seem at first glance. ChatGPT is not only able to generate remarkably realistic English. It is also able to instantly blurt out text in more than 50 languages—the precise number is apparently unknown to the system itself.
Asked (in Spanish) how many languages it can speak, ChatGPT replies, vaguely, “more than 50”, explaining that its ability to produce text will depend on how much training data is available for any given language. Then, asked a question in an unannounced switch to Portuguese, it offers up a sketch of your columnist’s biography in that language. Most of it was correct, but it had him studying the wrong subject at the wrong university. The language itself was impeccable.
Portuguese is one of the world’s biggest languages. Trying out a smaller language, your columnist probed ChatGPT in Danish, spoken by only about 5.5m people. Danes do much of their online writing in English, so the training data for Danish must be orders of magnitude scarcer than what is available for English, Spanish or Portuguese. ChatGPT’s answers were factually askew but expressed in almost perfect Danish. (A tiny gender-agreement error was the only mistake caught in any of the languages tested.)
Indeed, ChatGPT is too modest about its own abilities. On request, it furnishes a list of 51 languages it can work in, including Esperanto, Kannada and Zulu. It declines to say that it can “speak” these languages, but rather “generates text” in them. This is too humble an answer. Addressed in Catalan—a language not on the list—it replies in that language with a cheerful “Yes, I do speak Catalan—what can I help you with?” A few follow-up questions do not trip it up in the slightest, including a query about whether it is merely translating answers first generated in another language into Catalan. This, ChatGPT denies: “I don’t translate from any other language; I look in my database for the best words and phrases to answer your questions.”
Who knows if this is true? ChatGPT not only makes things up, but incorrectly answers questions about the very conversation it is having. (It has no “memory”, but rather feeds the last few thousand words of each conversation back into itself as a new prompt. If you have been speaking English for a while it will “forget” that you asked a question in Danish earlier and say that the question was asked in English.) ChatGPT is untrustworthy not just about the world, but even about itself.
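The "forgetting" described in the parenthesis above can be sketched as a rolling window over the conversation. This is a hedged illustration, not OpenAI's actual mechanism: the function `build_prompt`, the word budget, and the sample turns are all assumptions made up for the example, and real systems count tokens rather than whitespace-separated words.

```python
def build_prompt(turns, max_words=3000):
    """Keep only the most recent turns whose combined length fits the word budget."""
    kept = []
    used = 0
    # Walk backwards from the newest turn and stop once the budget is exceeded.
    for turn in reversed(turns):
        n = len(turn.split())
        if used + n > max_words:
            break
        kept.append(turn)
        used += n
    # Restore chronological order for the prompt.
    return list(reversed(kept))


# Hypothetical conversation: an early Danish question, then a switch to English.
turns = [
    "user: hvor mange sprog taler du",
    "bot: mere end halvtreds",
    "user: which language did I ask in first",
]

# With a tiny budget of 12 words, the Danish question no longer fits in the prompt.
print(build_prompt(turns, max_words=12))
```

Once the earliest turn falls outside the window, nothing in the prompt records that it was ever asked in Danish, which is consistent with the confident but wrong answer the column describes.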
This should not overshadow the achievement of a model that can effortlessly mimic so many languages, including those with limited training data. Speakers of smaller languages have worried for years about language technologies passing them by. Their justifiable concern had two causes: the lesser incentive for companies to develop products in Icelandic or Maltese, and the relative lack of data to train them.
Somehow the developers of ChatGPT seem to have overcome such problems. It is too early to say what good the technology will do, but this alone gives one reason to be optimistic. As machine-learning techniques improve, they may not require the vast resources, in programming time or data, traditionally thought necessary to make sure smaller languages are not overlooked online.
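Jack Jan: true knowledge comes from practice
Trista (female): I may be unemployed, but I am not smug about it
琚儿 (female): working in QE, dreaming of roaming the world on translation, music and good health
Francis: oranges are not the only fruit
YY: may the north wind, if it understands, not so lightly ravage the blossoms
Dossver (male): what is wrong is never right, and what is true is always true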