Synced (机器之心) & ArXiv Weekly
Contributors: Du Wei, Chu Hang, Luo Ruotian
This week's papers include Academician Li Deyi's forward-looking perspective paper "Cognitive Physics: The Enlightenment by Schrödinger, Turing, Wiener and Beyond," and a breakthrough in DeepMind's push toward general-purpose AI, with an agent that learns to play Minecraft from scratch.
Contents:

1. STAR: SQL Guided Pre-Training for Context-dependent Text-to-SQL Parsing
2. Cell-type-specific prediction of 3D chromatin organization enables high-throughput in silico genetic screening
3. Cognitive Physics - The Enlightenment by Schrödinger, Turing, Wiener and Beyond
4. Performance of ChatGPT on USMLE: Potential for AI-Assisted Medical Education Using Large Language Models
5. Mastering Diverse Domains through World Models
6. ParkPredict+: Multimodal Intent and Motion Prediction for Vehicles in Parking Lots with CNN and Transformer
7. ConvNeXt V2: Co-designing and Scaling ConvNets with Masked Autoencoders
8. ArXiv Weekly Radiostation: more selected papers in NLP, CV, and ML (with audio)

Paper 1: STAR: SQL Guided Pre-Training for Context-dependent Text-to-SQL Parsing
- Paper link: https://arxiv.org/pdf/2210.11888.pdf
Abstract: Alibaba DAMO Academy, together with the Shenzhen Institute of Advanced Technology, Chinese Academy of Sciences, recently proposed STAR, a SQL-query-guided pre-trained model for multi-turn Text-to-SQL semantic parsing. As of this writing, STAR has held first place on both the SParC and CoSQL leaderboards for ten consecutive months. The paper has been accepted to Findings of EMNLP 2022. The original post includes a worked example of context-dependent multi-turn Text-to-SQL parsing.

Recommendation: STAR, a new pre-trained model for multi-turn tabular dialogue, tops the authoritative conversational semantic parsing leaderboards SParC and CoSQL.

Paper 2: Cell-type-specific prediction of 3D chromatin organization enables high-throughput in silico genetic screening
- Paper link: https://www.nature.com/articles/s41587-022-01612-8
Abstract: This paper proposes C.Origami, a new multimodal machine learning model that predicts cell-type-specific chromatin conformation, and, building on the principles of genetic screening, introduces a new high-throughput in silico genetic screening (ISGS) method. C.Origami consists of three parts: an encoder that processes and compresses DNA and genomic information, a Transformer middle layer, and a Hi-C output decoder.

Recommendation: Nature sister journal | Tan Jimin, Xia Bo, et al. propose a genome conformation prediction model and a high-throughput in silico genetic screening method.

Paper 3: Cognitive Physics - The Enlightenment by Schrödinger, Turing, Wiener and Beyond
- Paper link: https://spj.science.org/doi/10.34133/icomputing.0009
Abstract: On January 3, 2023, Li Deyi, renowned AI scholar, academician of the Chinese Academy of Engineering and the International Eurasian Academy of Sciences, and honorary chairman of the Chinese Association for Artificial Intelligence, published the perspective paper "Cognitive Physics: The Enlightenment by Schrödinger, Turing, Wiener and Beyond" in Intelligent Computing, a Science Partner Journal. The paper revisits five classic works from the first half of the 20th century by three outstanding scholars: Norbert Wiener (1894-1964), father of cybernetics; Erwin Schrödinger (1887-1961), father of quantum mechanics; and Alan Turing (1912-1954), father of artificial intelligence. Inspired by them, it envisions future intelligent machines that live on negative entropy, interact, learn, and grow on their own, laying a foundation and direction for the development of machine intelligence. A figure in the original post shows the running workflow of such an interactive, learning, self-growing machine.

Recommendation: Academician Li Deyi's forward-looking perspective paper "Cognitive Physics: The Enlightenment by Schrödinger, Turing, Wiener and Beyond."

Paper 4: Performance of ChatGPT on USMLE: Potential for AI-Assisted Medical Education Using Large Language Models
- Paper link: https://www.medrxiv.org/content/10.1101/2022.12.19.22283643v2
Abstract: ChatGPT has drawn constant attention since its release and is considered one of the most capable language models available. Its text generation already rivals that of humans, to the point that a top machine-learning conference has explicitly banned researchers from using ChatGPT to write papers. Recently, however, one paper went so far as to list ChatGPT explicitly in its author line. How did that happen? The paper, "Performance of ChatGPT on USMLE: Potential for AI-Assisted Medical Education Using Large Language Models," was posted on the medical research platform medRxiv with ChatGPT as its third author.

Recommendation: A paper's author list goes viral; when will large language models like ChatGPT become accepted paper co-authors?

Paper 5: Mastering Diverse Domains through World Models
- Paper link: https://arxiv.org/abs/2301.04104v1
Abstract: General intelligence requires solving tasks across many domains. Reinforcement learning algorithms are believed to have this potential, but they have been held back by the resources and knowledge needed to tune them for new tasks. In new work from DeepMind, researchers present DreamerV3, a general and scalable algorithm based on world models that outperforms previous approaches across a wide range of domains with fixed hyperparameters. These domains span continuous and discrete actions, visual and low-dimensional inputs, 2D and 3D worlds, varying data budgets, reward frequencies, and reward scales. Notably, DreamerV3 is the first algorithm to collect diamonds in Minecraft from scratch without human data or curricula. The researchers say such a general algorithm could make reinforcement learning broadly applicable and extend to hard decision-making problems.

Recommendation: An AI learns to play Minecraft from scratch; DeepMind makes a breakthrough in AI generalization.

Paper 6: ParkPredict+: Multimodal Intent and Motion Prediction for Vehicles in Parking Lots with CNN and Transformer
- Paper link: https://arxiv.org/abs/2204.10777
Abstract: The Dragon Lake Parking (DLP) dataset provides a large volume of annotated high-definition 4K video and trajectory data captured from a drone's orthographic aerial view, recording the motion and interactions of different types of vehicles, pedestrians, and cyclists in a parking-lot environment. The dataset spans about 3.5 hours at a 25 Hz sampling rate, covers an area of roughly 140 m x 80 m containing about 400 parking spots, and records 5,188 agents in total. It is released in two formats, JSON and raw video plus annotations, and supports research directions including large-scale high-precision object detection and tracking, vacant-spot detection, behavior and trajectory prediction for vehicles and pedestrians, and imitation learning.

Recommendation: Berkeley open-sources the first high-definition dataset and prediction models for parking scenarios, supporting object detection and trajectory prediction.

Paper 7: ConvNeXt V2: Co-designing and Scaling ConvNets with Masked Autoencoders
- Paper link: https://arxiv.org/pdf/2301.00808v1.pdf
Abstract: Researchers from KAIST, Meta, and New York University (including ConvNeXt first author Zhuang Liu and ResNeXt first author Saining Xie) propose co-designing the network architecture and the masked autoencoder within a single framework, so that mask-based self-supervised learning works for ConvNeXt models and achieves results on par with Transformers.

Recommendation: ConvNeXt V2 arrives: using only the simplest convolutional architecture, it matches Transformer performance.
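The masked-autoencoder pretraining described above hides random patches of the input image and trains the network to reconstruct them. A minimal NumPy sketch of the masking step, for intuition only; the patch size, mask ratio, and zero-filling below are illustrative choices, not the paper's actual settings:

```python
import numpy as np

def random_patch_mask(image, patch_size=32, mask_ratio=0.6, rng=None):
    """Zero out a random subset of non-overlapping patches of `image`.

    Returns the masked image and a boolean grid marking hidden patches.
    In masked-autoencoder pretraining, the encoder sees only the visible
    patches and a decoder is trained to reconstruct the hidden ones.
    """
    if rng is None:
        rng = np.random.default_rng()
    h, w, _ = image.shape
    gh, gw = h // patch_size, w // patch_size
    n_patches = gh * gw
    n_masked = int(round(mask_ratio * n_patches))
    # Sample which patches to hide, without replacement.
    hidden = rng.choice(n_patches, size=n_masked, replace=False)
    mask = np.zeros(n_patches, dtype=bool)
    mask[hidden] = True
    out = image.copy()
    for idx in np.flatnonzero(mask):
        i, j = divmod(idx, gw)
        out[i * patch_size:(i + 1) * patch_size,
            j * patch_size:(j + 1) * patch_size, :] = 0.0
    return out, mask.reshape(gh, gw)
```

The reconstruction loss is then computed only on the masked patches, which is what makes the pretext task non-trivial for the network.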
ArXiv Weekly Radiostation

Synced has joined ArXiv Weekly Radiostation, launched by Chu Hang and Luo Ruotian, to select more important papers this week beyond the 7 Papers above: 10 picks each in NLP, CV, and ML, with audio summaries of the papers. Details below:

NLP

1. Improving And Analyzing Neural Speaker Embeddings for ASR. (from Hermann Ney)
2. Dual Learning for Large Vocabulary On-Device ASR. (from Kyunghyun Cho)
3. Structured Case-based Reasoning for Inference-time Adaptation of Text-to-SQL parsers. (from Soumen Chakrabarti, Sunita Sarawagi)
4. NarrowBERT: Accelerating Masked Language Model Pretraining and Inference. (from Noah A. Smith)
5. Multilingual Entity and Relation Extraction from Unified to Language-specific Training. (from Jian Yang)
6. Facilitating Contrastive Learning of Discourse Relational Senses by Exploiting the Hierarchy of Sense Relations. (from Bonnie Webber)
7. A Cognitive Evaluation of Instruction Generation Agents tl;dr They Need Better Theory-of-Mind Capabilities. (from Hal Daumé III)
8. ERNIE 3.0 Tiny: Frustratingly Simple Method to Improve Task-Agnostic Distillation Generalization. (from Yu Sun)
9. Scaling Laws for Generative Mixed-Modal Language Models. (from Luke Zettlemoyer)
10. A Multi-Modal Geographic Pre-Training Method. (from Xin Li)
CV

1. Dynamic Grained Encoder for Vision Transformers. (from Jian Sun, Nanning Zheng)
2. Accidental Light Probes. (from Richard Szeliski, Noah Snavely)
3. CARD: Semantic Segmentation with Efficient Class-Aware Regularized Decoder. (from Liang Chen)
4. In Defense of Structural Symbolic Representation for Video Event-Relation Prediction. (from Shih-Fu Chang)
5. Benchmarking Robustness in Neural Radiance Fields. (from Alan Yuille)
6. Edge Preserving Implicit Surface Representation of Point Clouds. (from Xiaogang Wang, Liang Wang)
7. Domain Expansion of Image Generators. (from Jun-Yan Zhu, Daniel Cohen-Or, Eli Shechtman)
8. Pix2Map: Cross-modal Retrieval for Inferring Street Maps from Images. (from Deva Ramanan)
9. TarViS: A Unified Approach for Target-based Video Segmentation. (from Deva Ramanan, Bastian Leibe)
10. Few-shot Semantic Segmentation with Support-induced Graph Convolutional Network. (from Yang Gao)
ML

1. Causal Triplet: An Open Challenge for Intervention-centric Causal Representation Learning. (from Bernhard Schölkopf)
2. Online Hyperparameter Optimization for Class-Incremental Learning. (from Bernt Schiele)
3. Does compressing activations help model parallel training?. (from Eric P. Xing)
4. ExcelFormer: A Neural Network Surpassing GBDTs on Tabular Data. (from Jian Wu)
5. Transferring Pre-trained Multimodal Representations with Cross-modal Similarity Matching. (from Honglak Lee)
6. A Dietary Nutrition-aided Healthcare Platform via Effective Food Recognition on a Localized Singaporean Food Dataset. (from Beng Chin Ooi)
7. Federated Learning and Blockchain-enabled Fog-IoT Platform for Wearables in Predictive Healthcare. (from Konstantinos Plataniotis)
8. Learning Symbolic Representations for Reinforcement Learning of Non-Markovian Behavior. (from Sheila A. McIlraith)
9. Architect, Regularize and Replay (ARR): a Flexible Hybrid Approach for Continual Learning. (from Davide Maltoni)
10. Adversarial training with informed data selection. (from Pascal Frossard)

© THE END
For reprints, please contact this official account for authorization.
Submissions or press inquiries: [email protected]