- NEWS FEATURE

GPT-4 passed the Turing test
For decades, the most famous test of machine intelligence has been the Turing test, proposed by Alan Turing in 1950 as the ‘imitation game’ to address the question ‘Can machines think?’. It relies on human judges who converse with unseen interlocutors and try to tell machine from person. Today, researchers largely dispense with judges and instead evaluate performance on specific capabilities, such as language ability, common-sense reasoning and mathematical capacity. Increasingly, teams are also turning to academic and professional examinations designed for people.

When OpenAI released GPT-4 in March 2023, the firm tested it on a series of benchmarks designed for machines. GPT-4 aced most of them, including reading comprehension, mathematics and coding, OpenAI reported4.
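To make the contrast with the judged imitation game concrete, here is a minimal sketch of how such judge-free benchmarks are typically scored: fixed question–answer pairs and an automatic exact-match check. The two items and the stand-in model below are invented for illustration; real suites contain thousands of items.

```python
# Minimal, hypothetical benchmark harness: score a "model" by exact
# match against a fixed answer key, with no human judge involved.
# The items and the toy model are placeholders, not from any real suite.

benchmark = [
    {"question": "What is 17 * 24?", "answer": "408"},
    {"question": "What is the antonym of 'scarce'?", "answer": "plentiful"},
]

def toy_model(question: str) -> str:
    """Stand-in for a real system call (e.g. an LLM API)."""
    return "408" if "17 * 24" in question else "rare"

correct = sum(
    toy_model(item["question"]).strip().lower() == item["answer"]
    for item in benchmark
)
print(f"accuracy: {correct}/{len(benchmark)}")  # prints: accuracy: 1/2
```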
It’s the kind of game that researchers familiar with LLMs could probably still win, however. François Chollet, a software engineer at Google, says he would find it easy to detect an LLM by taking advantage of the systems’ known weaknesses. “If you put me in a situation where you asked me, ‘Am I chatting to an LLM right now?’ I would definitely be able to tell you,” he says.
The key, he says, is to take the LLM outside of its comfort zone. He suggests presenting it with scenarios that are variations on ones the LLM will have seen a lot in its training data. In many cases, the LLM answers by spitting out words that are most likely to be associated with the original question in its training data, rather than by giving the correct answer to the new scenario.
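As a concrete version of the probe Chollet describes, the sketch below sends a model a classic riddle and a slightly perturbed variant whose correct answer differs. The prompts and the use of the OpenAI Python client are illustrative assumptions, not taken from the article; any chat API would serve.

```python
# Sketch of a comfort-zone probe: pose a familiar riddle and a perturbed
# variant. A model that pattern-matches its training data will often give
# the classic trick answer ($0.05) to both, even though the perturbed
# version's correct answer is $0.10. Assumes the OpenAI Python client
# (pip install openai) and an OPENAI_API_KEY environment variable.
from openai import OpenAI

client = OpenAI()

CANONICAL = ("A bat and a ball cost $1.10 in total. The bat costs $1.00 "
             "more than the ball. How much does the ball cost?")
# Perturbation: drop the "more than" trick, so the obvious reading is correct.
PERTURBED = ("A bat and a ball cost $1.10 in total. The bat costs $1.00. "
             "How much does the ball cost?")

def ask(question: str) -> str:
    """Send one question to the model and return the reply text."""
    response = client.chat.completions.create(
        model="gpt-4",
        messages=[{"role": "user", "content": question}],
        temperature=0,  # keep outputs comparable across runs
    )
    return response.choices[0].message.content

if __name__ == "__main__":
    print("canonical:", ask(CANONICAL))  # trick answer: the ball costs $0.05
    print("perturbed:", ask(PERTURBED))  # correct answer here: $0.10
```

A reviewer comparing the two replies can see at once whether the model tracked the changed wording or merely reproduced the familiar pattern.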
OpenAI also set GPT-4 around 30 exams, including: various subject-specific tests designed for US high-school students, known as Advanced Placement; an exam to assess the current state of US physicians’ clinical knowledge; and a standard test used in the selection process for US graduate studies, called the GRE. On the Uniform Bar Exam, which forms part of the qualification process for lawyers in many US states, GPT-4 attained a score that would place it in the top 10% of people, OpenAI reported.
The world’s best artificial intelligence (AI) systems can pass tough exams, write convincingly human essays and chat so fluently that many find their output indistinguishable from people’s. What can’t they do? Solve simple visual logic puzzles.
In a test consisting of a series of brightly coloured blocks arranged on a screen, most people can spot the connecting patterns. But GPT-4, the most advanced version of the AI system behind the chatbot ChatGPT and the search engine Bing, gets barely one-third of the puzzles right in one category of patterns and as little as 3% correct in another, according to a report by researchers this May1.
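The puzzles behind those numbers are tasks in the style of the Abstraction and Reasoning Corpus (ARC): small grids of coloured cells, with each colour stored as an integer and a hidden rule mapping input grids to output grids. The toy task below is invented for illustration (it is not from the cited report1), but it shows the representation and the exact-match scoring such tests use.

```python
# Toy ARC-style puzzle. Grids are lists of lists of ints (one int per
# colour); a solver is credited only if it reproduces every training
# output cell-for-cell. The hidden rule here is "mirror left-to-right".
# This task is invented for illustration, not taken from reference 1.

Grid = list[list[int]]  # requires Python 3.9+

train_pairs: list[tuple[Grid, Grid]] = [
    ([[1, 0, 0],
      [1, 2, 0]],
     [[0, 0, 1],
      [0, 2, 1]]),
    ([[3, 3, 0],
      [0, 0, 4]],
     [[0, 3, 3],
      [4, 0, 0]]),
]

def mirror_lr(grid: Grid) -> Grid:
    """Candidate rule: flip each row left-to-right."""
    return [row[::-1] for row in grid]

def solves(rule, pairs: list[tuple[Grid, Grid]]) -> bool:
    """ARC-style exact-match scoring: every cell of every pair must agree."""
    return all(rule(inp) == out for inp, out in pairs)

print(solves(mirror_lr, train_pairs))  # True: the rule fits both examples
```

Most people infer a rule like this from two or three examples; the report1 found that GPT-4 succeeds far less often.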