
100 Papers at a Glance: Recent Research Progress on Large Language Models


© Author | 王晓磊

Affiliation | Renmin University of China

Research Direction | Conversational Information Seeking

Source | RUC AI Box




This article compiles papers on large language models published at top-tier conferences since 2022.

Overview

At the end of last year, OpenAI's ChatGPT swept the globe within just a few months. This large language model, built on GPT-3.5, shows remarkable natural language generation and understanding abilities and can carry out tasks such as dialogue, translation, and summarization much like a human. Thanks to this impressive performance, ChatGPT and the large language models behind it quickly became one of the hottest topics in artificial intelligence, drawing wide attention and participation from researchers and developers.

This article collects 100 papers related to large language models published in 2022 at major top-tier conferences (ACL, EMNLP, ICLR, ICML, NeurIPS, etc.). The paper list is also synced to the GitHub repository (https://github.com/RUCAIBox/Top-conference-paper-list); you are welcome to follow and star it.

Catalog

  • Training
    • Pre-Training
    • Instruction Tuning
  • Utilization
    • In-Context Learning
    • Chain-of-Thought Prompting
    • Compression
    • Others
  • Application
    • Multi-Modal
    • Code
    • Retrieval
    • Text Generation
    • Others
  • Analysis & Evaluation

Training

Pre-Training

  • UL2: Unifying Language Learning Paradigms
  • Learning to Grow Pretrained Models for Efficient Transformer Training
  • Efficient Large Scale Language Modeling with Mixtures of Experts
  • Knowledge-in-Context: Towards Knowledgeable Semi-Parametric Language Models
  • CodeGen: An Open Large Language Model for Code with Multi-Turn Program Synthesis
  • InCoder: A Generative Model for Code Infilling and Synthesis
  • CodeBPE: Investigating Subtokenization Options for Large Language Model Pretraining on Source Code
  • CodeRetriever: A Large Scale Contrastive Pre-Training Method for Code Search
  • UniMax: Fairer and More Effective Language Sampling for Large-Scale Multilingual Pretraining
  • GLM-130B: An Open Bilingual Pre-trained Model
  • When FLUE Meets FLANG: Benchmarks and Large Pretrained Language Model for Financial Domain

Instruction Tuning

  • What Makes Instruction Learning Hard? An Investigation and a New Challenge in a Synthetic Environment
  • InstructDial: Improving Zero and Few-shot Generalization in Dialogue through Instruction Tuning
  • Learning Instructions with Unlabeled Data for Zero-Shot Cross-Task Generalization
  • Super-NaturalInstructions: Generalization via Declarative Instructions on 1600+ NLP Tasks
  • Boosting Natural Language Generation from Instructions with Meta-Learning
  • Help me write a Poem - Instruction Tuning as a Vehicle for Collaborative Poetry Writing
  • Multitask Instruction-based Prompting for Fallacy Recognition
  • Not All Tasks Are Born Equal: Understanding Zero-Shot Generalization
  • HypeR: Multitask Hyper-Prompted Training Enables Large-Scale Retrieval Generalization

Utilization

In-Context Learning

  • What learning algorithm is in-context learning? Investigations with linear models
  • Ask Me Anything: A simple strategy for prompting language models
  • Large Language Models are Human-Level Prompt Engineers
  • Using Both Demonstrations and Language Instructions to Efficiently Learn Robotic Tasks
  • kNN Prompting: Beyond-Context Learning with Calibration-Free Nearest Neighbor Inference
  • Guess the Instruction! Flipped Learning Makes Language Models Stronger Zero-Shot Learners
  • Selective Annotation Makes Language Models Better Few-Shot Learners
  • Active Example Selection for In-Context Learning
  • Rethinking the Role of Demonstrations: What Makes In-Context Learning Work?
  • In-Context Learning for Few-Shot Dialogue State Tracking
  • Few-Shot Anaphora Resolution in Scientific Protocols via Mixtures of In-Context Experts
  • ProGen: Progressive Zero-shot Dataset Generation via In-context Feedback
  • Controllable Dialogue Simulation with In-context Learning
  • Thinking about GPT-3 In-Context Learning for Biomedical IE? Think Again
  • XRICL: Cross-lingual Retrieval-Augmented In-Context Learning for Cross-lingual Text-to-SQL Semantic Parsing
  • On the Compositional Generalization Gap of In-Context Learning
  • Towards In-Context Non-Expert Evaluation of Reflection Generation for Counselling Conversations
  • Towards Few-Shot Identification of Morality Frames using In-Context Learning
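
As a concrete illustration of the setting studied by the papers above, here is a minimal sketch of in-context learning: a few labeled demonstrations and a new query are packed into a single prompt, and a frozen model is expected to infer the task from the examples alone. The `generate` call is a hypothetical stand-in for any LLM completion API, not a specific library.

```python
# Minimal in-context learning sketch: no parameter updates; the task is
# specified entirely through demonstrations inside the prompt.

def build_icl_prompt(demonstrations, query):
    """Concatenate (input, label) demonstrations followed by the new query."""
    blocks = [f"Review: {text}\nSentiment: {label}" for text, label in demonstrations]
    blocks.append(f"Review: {query}\nSentiment:")
    return "\n\n".join(blocks)

demos = [
    ("The plot was predictable and dull.", "negative"),
    ("A moving, beautifully shot film.", "positive"),
]
prompt = build_icl_prompt(demos, "I could not stop smiling the whole time.")
print(prompt)
# answer = generate(prompt)  # hypothetical LLM call; expected continuation: "positive"
```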

Chain-of-Thought Prompting

  • ReAct: Synergizing Reasoning and Acting in Language Models
  • Selection-Inference: Exploiting Large Language Models for Interpretable Logical Reasoning
  • Neuro-Symbolic Procedural Planning with Commonsense Prompting
  • Language Models Are Greedy Reasoners: A Systematic Formal Analysis of Chain-of-Thought
  • PINTO: Faithful Language Reasoning Using Prompt-Generated Rationales
  • Decomposed Prompting: A Modular Approach for Solving Complex Tasks
  • Complexity-Based Prompting for Multi-step Reasoning
  • Automatic Chain of Thought Prompting in Large Language Models
  • Compositional Semantic Parsing with Large Language Models
  • Self-Consistency Improves Chain of Thought Reasoning in Language Models
  • Least-to-Most Prompting Enables Complex Reasoning in Large Language Models
  • Entailer: Answering Questions with Faithful and Truthful Chains of Reasoning
  • Iteratively Prompt Pre-trained Language Models for Chain of Thought
  • ConvFinQA: Exploring the Chain of Numerical Reasoning in Conversational Finance Question Answering
  • Induced Natural Language Rationales and Interleaved Markup Tokens Enable Extrapolation in Large Language Models
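
For comparison with the in-context learning sketch above, here is a minimal sketch of chain-of-thought prompting: the demonstration includes an explicit reasoning trace before the final answer, which encourages the model to reason step by step on the new question. Again, `generate` is a hypothetical completion call, and the exemplar follows the style popularized by the chain-of-thought papers listed above.

```python
# Minimal chain-of-thought prompting sketch: the worked example shows its
# intermediate reasoning, so the model is prompted to do the same.

COT_DEMO = (
    "Q: Roger has 5 tennis balls. He buys 2 more cans of 3 tennis balls each. "
    "How many tennis balls does he have now?\n"
    "A: Roger starts with 5 balls. 2 cans of 3 balls is 6 balls. 5 + 6 = 11. "
    "The answer is 11.\n\n"
)

def build_cot_prompt(question):
    """Prepend a worked example with explicit reasoning, then ask the new question."""
    return COT_DEMO + f"Q: {question}\nA:"

prompt = build_cot_prompt(
    "The cafeteria had 23 apples. It used 20 to make lunch and bought 6 more. "
    "How many apples does it have?"
)
print(prompt)
# answer = generate(prompt)  # hypothetical call; expected to end with "The answer is 9."
```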

Compression

  • Understanding and Improving Knowledge Distillation for Quantization Aware Training of Large Transformer Encoders
  • The Optimal BERT Surgeon: Scalable and Accurate Second-Order Pruning for Large Language Models
  • AlphaTuning: Quantization-Aware Parameter-Efficient Adaptation of Large-Scale Pre-Trained Language Models

Others

  • BBTv2: Towards a Gradient-Free Future with Large Language Models
  • Compositional Task Representations for Large Language Models
  • Just Fine-tune Twice: Selective Differential Privacy for Large Language Models

Application

Multi-Modal

  • Visual Classification via Description from Large Language Models
  • Socratic Models: Composing Zero-Shot Multimodal Reasoning with Language
  • Plug-and-Play VQA: Zero-shot VQA by Conjoining Large Pretrained Models with Zero Training

Code

  • DocPrompting: Generating Code by Retrieving the Docs
  • Planning with Large Language Models for Code Generation
  • CodeT: Code Generation with Generated Tests
  • Language Models Can Teach Themselves to Program Better

Retrieval

  • Promptagator: Few-shot Dense Retrieval From 8 Examples
  • Recitation-Augmented Language Models
  • Generate rather than Retrieve: Large Language Models are Strong Context Generators
  • QUILL: Query Intent with Large Language Models using Retrieval Augmentation and Multi-stage Distillation

Text Generation

  • Generating Sequences by Learning to Self-Correct
  • RankGen: Improving Text Generation with Large Ranking Models
  • Eliciting Knowledge from Large Pre-Trained Models for Unsupervised Knowledge-Grounded Conversation

Others

  • Systematic Rectification of Language Models via Dead-end Analysis
  • Reward Design with Language Models
  • Bidirectional Language Models Are Also Few-shot Learners
  • Composing Ensembles of Pre-trained Models via Iterative Consensus
  • Binding Language Models in Symbolic Languages
  • Mind's Eye: Grounded Language Model Reasoning through Simulation

Analysis & Evaluation

  • WikiWhy: Answering and Explaining Cause-and-Effect Questions
  • ROSCOE: A Suite of Metrics for Scoring Step-by-Step Reasoning
  • Quantifying Memorization Across Neural Language Models
  • Mass-Editing Memory in a Transformer
  • Multi-lingual Evaluation of Code Generation Models
  • STREET: A Multi-Task Structured Reasoning and Explanation Benchmark
  • Leveraging Large Language Models for Multiple Choice Question Answering
  • Broken Neural Scaling Laws
  • Language models are multilingual chain-of-thought reasoners
  • Language Models are Realistic Tabular Data Generators
  • Task Ambiguity in Humans and Language Models
  • Discovering Latent Knowledge in Language Models Without Supervision
  • Prompting GPT-3 To Be Reliable
  • Large language models are few-shot clinical information extractors
  • How Large Language Models are Transforming Machine-Paraphrase Plagiarism
  • Neural Theory-of-Mind? On the Limits of Social Intelligence in Large LMs
  • SLING: Sino Linguistic Evaluation of Large Language Models
  • A Systematic Investigation of Commonsense Knowledge in Large Language Models
  • Lexical Generalization Improves with Larger Models and Longer Training
  • What do Large Language Models Learn beyond Language?
  • Probing for Understanding of English Verb Classes and Alternations in Large Pre-trained Language Models


