黄仁勋2024GTC演讲实录




作者:黄仁勋
来源:长江商学院EMBA(ID:CKEMBA201314)


北京时间2024年3月19日凌晨,Nvidia的联合创始人兼CEO黄仁勋发表了一场引人瞩目的演讲。作为图形处理与人工智能计算领域的领军企业,Nvidia的每一次动态都备受关注。黄仁勋的这场演讲不仅回顾了公司的辉煌历程,更展望了未来的技术趋势与行业变革。


在演讲中,黄仁勋详细阐述了GTC大会的多元化特色,强调了Nvidia从CUDA技术到加速计算、再到生成式AI的连续创新历程。他特别提到了新发布的Blackwell GPU以及DGX超级计算机在AI领域的关键作用,展现了公司在硬件研发方面的强大实力。


此外,黄仁勋还谈到了与全球科技巨头的紧密合作,以及Nvidia在数字孪生、环境模拟等前沿技术的深入探索。这些合作与探索不仅将推动Nvidia自身的持续发展,更有望为整个科技行业乃至社会带来深远的影响。


以下是黄仁勋演讲的重点内容总结:


一、GTC大会与计算机行业的转型


GTC大会是开发者和技术爱好者的盛会,聚焦前沿技术。


计算机行业正在经历根本性转型,这影响着每个行业,特别是加速计算和生成式AI的崛起。


二、生成式AI与新的软件时代


生成式AI技术催生了全新的行业和软件类型,改变了传统的软件开发和应用方式。


三、加速计算与数字孪生生态


加速计算技术已到达临界点,为各行业带来巨大影响。


英伟达构建了数字孪生生态系统,通过模拟和预测推动行业创新。


四、Blackwell GPU的突破与创新


Blackwell GPU是英伟达最新的产品,具有强大的计算能力和推理性能。


它不仅是一款芯片,更代表了英伟达在GPU技术上的根本性创新,包括更大的规模、更高的速度和新的功能。


五、数字化与生成AI的革命性应用


数字化与生成AI在工业、气候、药物发现等领域展现出巨大的应用价值和潜力。


通过数字孪生技术,企业可以实现更高效的生产、更准确的预测和更低的成本。


六、AI代工厂与软件操作的新方式


英伟达致力于打造一个AI代工厂,为开发者提供全新的软件接收和操作方式。


通过Nim和Nemo微服务,开发者可以更加灵活地构建、部署和管理AI应用。


七、AI机器人的未来与协同工作


AI机器人在智能、协同和集成方面取得突破性进展,与Omniverse平台的结合将推动更多创新。


未来的AI机器人将在各个领域发挥重要作用,与人类紧密合作,共同推动社会进步。


八、五大关键词总结与展望


计算革命、Blackwell、新软件、AI代工厂和AI机器人


它们代表了英伟达在技术创新和行业领导地位上的重要成就,也预示着未来技术发展的方向和趋势。


以下为黄仁勋演讲全文精编,后附英文版


01

计算机行业的转型


欢迎来到GTC。我希望你意识到这不是一场音乐会。


你已经到了一个开发者大会——将会有很多科学描述,算法,计算机架构,数学……


我突然感觉到房间里的气氛沉重起来,差点以为你们来错了地方。


世界上没有哪个会议汇聚了来自如此多不同科学领域的研究人员——从气候技术到无线电科学——他们试图弄清楚如何用AI以机器人方式控制下一代6G无线电的MIMO、自动驾驶汽车,甚至人工智能本身。


我突然感觉到一种如释重负,这次会议由一些了不起的公司代表参与。这个名单,这不是与会者,这些是演讲者。


令人惊奇的是,如果你把我所有的朋友、亲密的朋友都从名单上拿掉——迈克尔·戴尔就坐在那边,他来自IT行业——剩下的名单才是真正令人惊奇的地方。



这些是非IT行业的演讲者使用加速计算来解决普通计算机无法解决的问题;它代表了生命科学,医疗保健,基因组学,交通运输,零售,物流,制造,工业……代表的行业范围真的很惊人。


你来这里不仅仅是为了参加会议;你来这里是为了展示,谈论你的研究。


今天,世界上有100万亿美元的工业产值聚集在这个房间里,这绝对是惊人的!


02

生成式AI与新的软件时代


生成式AI带来全新的行业


有些事情正在发生。这个行业正在被改变,不仅仅是我们的行业;因为计算机行业——计算机是当今社会最重要的工具——根本性的转型和计算影响着每个行业。


但是我们是如何开始的呢?我们是怎么走到这一步的?我为你做了一幅小卡通,在一页纸上画了这个。


这是Nvidia的旅程,始于1993年。



我们成立于1993年,沿途发生了几个重要事件,我只想强调几个。


2006年,CUDA问世,它后来被证明是一种革命性的计算模型;我们当时就认为它是革命性的,以为它会一夜成名。差不多二十年后,这一天才真正到来。


2012年,AlexNet与CUDA首次相遇。2016年,我们认识到这种计算模型的重要性,发明了一种全新类型的计算机,称之为DGX-1,算力170 teraflops。


在这台超级计算机中,8个GPU首次连接在一起。我亲手将第一台DGX1交付给了位于旧金山的一家名为OpenAI的初创公司。


DGX-1是世界上第一台AI超级计算机。请记住,170 teraflops。


2017年,Transformer架构问世;2022年,ChatGPT抓住了全世界的想象力,让人们意识到AI的重要性和能力;2023年,生成式AI应运而生,一个新的行业开始了。


新的软件类型出现


为什么是一个新兴行业?因为该软件以前从未存在过。


我们现在正在生产使用计算机编写软件的软件,生产以前从未存在过的软件。这是一个全新的类别。


它从无到有,这是一个全新的类别,您生产软件的方式与我们以前在数据中心所做的任何事情都不同。


生成代币,以非常大的规模产生浮点数,就好像在上一次工业革命开始时,当人们意识到你会建立工厂,为其接入能源,这种看不见的、有价值的叫作电的东西就出来了——AC发电机。


100年后,200年后,我们现在正在创造新型的电子代币,使用我们称之为工厂的基础设施——AI工厂,来产生这种新的,难以置信的有价值的东西叫做AI。


一个新的行业已经出现。


我们将讨论关于这个新行业的很多事情。接下来我们将讨论如何进行计算,谈论由于这个新行业而构建的软件类型——新软件。您如何看待这个新软件,以及这个新行业的应用?接下来会发生什么,我们如何开始为即将到来的事情做准备?


但在开始之前,我想向您展示Nvidia的灵魂——在计算机图形学、物理学和人工智能的交汇处,所有这些都在Omniverse的计算机中交汇在虚拟世界模拟中。


我们今天要展示的一切都是模拟,而不是动画;它之所以美丽,是因为它是物理的。它之所以令人惊叹,是因为它是由机器人技术动画制作的,是由人工智能动画制作的。你一整天都要看到的东西,是完全生成的、完全模拟的、全宇宙的等等。您将要享受的是世界上第一场一切都是自制的音乐会;您将要观看一些家庭视频;所以坐下来享受吧。


03

加速计算与数字孪生生态


加速计算已经到达临界点


加速计算已经达到了临界点。通用计算已经失去了动力。


我们需要另一种计算方式,以便我们可以继续扩展,从而继续降低计算成本,以便我们可以继续消耗越来越多的计算,同时保持可持续性。


加速计算是对通用计算的巨大加速。我将向你们展示许多行业,其影响是巨大的;但没有哪个行业比我们自己的行业更重要,这个行业就是使用仿真工具来创造产品。


在这个行业中,这不是为了降低计算成本,而是为了提高计算的规模。


我们希望能够完全以全保真度、完全数字化的方式模拟我们所做的整个产品。从本质上讲,这就是我们所说的数字孪生。我们希望完全以数字方式设计、构建、模拟和操作它。为了做到这一点,我们需要加速整个行业的发展。


英伟达已经构建了数字孪生生态系统


今天,我想宣布,有一些合作伙伴正在加入我们的行列,以加速他们的整个生态系统,以便我们能够将世界带入加速计算。


我很高兴地宣布几个非常重要的合作伙伴关系。ANSYS是世界上最重要的公司之一,为世界上制造出来的各种产品做工程仿真。我们正在与他们合作,加速ANSYS生态系统,并将ANSYS连接到Omniverse数字孪生。


难以置信,真正伟大的是, GPU加速系统的安装基础遍布世界各地,在每个云中,在每个系统中,在整个企业中。因此,他们加速的应用程序将拥有庞大的安装基础来为最终用户提供服务,将拥有令人惊叹的应用程序。当然,系统制造商和CSP将拥有巨大的客户需求。


Synopsys是英伟达真正意义上的第一个软件合作伙伴,从我们公司成立第一天起他们就在。


Synopsys以高层次设计彻底改变了芯片行业。我们将用CUDA加速Synopsys,尤其是计算光刻,这是最重要的应用之一。为了制造芯片,我们必须将光刻技术推向极限。为此,Nvidia创建了一个特定领域的库,可以令人难以置信地加速计算光刻。


一旦我们能够对整个流程进行软件定义和加速,台积电(TSMC)——他们今天宣布——就将把英伟达cuLitho投入生产。


下一步是将生成式AI应用于半导体制造的未来,进一步推动几何结构的演进。Cadence构建了世界上必不可少的EDA和SDA工具。英伟达自己也在使用ANSYS、Synopsys和Cadence这三家公司的工具。


我们基本上是与他们一起构建英伟达的。我们正在用CUDA加速Cadence。他们还在用英伟达GPU构建一台超级计算机,这样他们的客户就可以以100倍、1000倍的规模进行流体动力学模拟。Cadence Millennium,一台搭载英伟达GPU的超级计算机,诞生在一家构建超级计算机的软件公司内部,我很高兴看到这一点。


我们还在共同打造Cadence Copilot。想象有一天,Cadence、Synopsys、ANSYS这些工具提供商会为您提供AI副驾驶,这样我们就有成千上万的副驾驶助手帮助我们设计芯片、设计系统。我们还将把Cadence的数字孪生平台连接到Omniverse。


正如您在这里看到的趋势,我们正在加速全球的CAE、EDA和SDA,以便我们可以在数字孪生中创造我们的未来。


我们将把它们全部连接到Omniverse,这是未来数字孪生的基本操作系统,这是从规模中受益匪浅的行业之一。


04

Blackwell GPU的突破与创新


根本性的创新让我们拥有更大的GPU


你们都非常了解大型语言模型。


基本上,在Transformer发明之后,我们能够以惊人的速度扩展大型语言模型,每六个月有效地翻一番。


那么,为什么每六个月翻一番,计算需求就会被推升到今天这样的规模?


原因很简单,如果你把模型的大小翻倍,你的大脑就会翻倍,你需要两倍的信息来填充它。因此,每次你将参数计数翻倍时,你也必须适当地增加训练代币计数。这两个数字的组合成为计算规模。


你必须能支撑最新、最先进的模型。OpenAI最新的模型大约有1.8万亿个参数,1.8万亿个参数需要几万亿个token才能完成训练。几万亿参数乘以几万亿token,总计算量大约是300亿到500亿个千万亿次(quadrillion)浮点运算。


我们取300亿这个数:1个千万亿次就是1 PetaFLOP。因此,如果你有一块算力为每秒1 PetaFLOPS的GPU,你需要300亿秒才能训练完这个模型,而300亿秒大约是1000年。好吧,1000年,这是值得的。你当然想早点做完,但它确实值得。


这通常是我的回答,当大多数人告诉我,嘿,做某事需要多长时间?所以20年,这是值得的,但我们下周能做到吗?所以1000年,那么我们需要更大的GPU。
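为了直观感受上面这笔账,下面给出一个最简化的估算脚本(假设总计算量按演讲口径取约3×10^25次浮点运算,数字仅为示意):

```python
# 粗略估算:按演讲中的数字,训练一个1.8万亿参数、数万亿token的模型
# 约需3e25次浮点运算(即"300亿个千万亿次")。数值仅为示意。

total_flops = 30e9 * 1e15        # 约3e25次浮点运算
petaflop_gpu = 1e15              # 一块1 PetaFLOPS的GPU,每秒1e15次运算

seconds = total_flops / petaflop_gpu      # 约3e10秒
years = seconds / (365 * 24 * 3600)       # 约951年

print(f"单块PetaFLOPS GPU训练耗时约 {seconds:.1e} 秒,约 {years:.0f} 年")
```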



我们很早就认识到了这一点,意识到答案是把一大堆GPU放在一起,当然,在此过程中还要创新一大堆东西,比如发明Tensor Core、推进NVLink,以便我们可以构建出事实上的巨型GPU,并用来自一家名为Mellanox的公司的出色InfiniBand网络将它们连接起来,从而构建这些巨型系统。


所以DGX1是我们的第一个版本,但不是最后一个。


我们一路建造超级计算机。2021年,我们建成了Selene,大约有4500个GPU;然后在2023年,我们建造了世界上最大的AI超级计算机之一,它刚刚上线。


为了帮助世界建造这些东西,我们构建芯片、系统、网络,以及所有必要的软件。您应该会看到这些系统。


想象一下,编写一个运行在整个系统中的软件,将计算分布在数千个GPU上,但内部有数千个较小的GPU、数百万个GPU,用于在所有这些系统中分配工作并平衡工作负载,以便您可以获得最高的能源效率、最佳的计算时间,从而降低成本。因此,这些根本性创新使我们走到了这一步。


在这里,当我们看到ChatGPT的奇迹出现在我们面前时,我们也意识到我们还有很长的路要走。我们需要更大的模型。


我们将用多模态数据来训练它,而不仅仅是互联网上的文本;我们还将在文本、图像、图形和图表上训练它,就像我们学习看电视一样。因此,将有一大堆观看视频,以便这些模型可以建立在物理学的基础上,理解手臂不能穿墙。因此,这些模型将通过观看世界上许多视频和许多世界语言来获得常识。


它会使用合成数据生成之类的东西,就像你和我尝试学习时所做的那样,我们可能会使用我们的想象力来模拟它的最终结果,就像我在准备这个主题演讲时所做的那样,我一直在模拟它。我希望它能像我在模拟这个主题演讲时所想象的那样好。确实有人说,另一位表演者完全在跑步机上表演,这样她就可以保持身材,以充分的精力进行表演。我没有那样做。


如果我讲到大约10分钟时开始上气不接下气,你们就知道是怎么回事了。所以,说回正题——我们会使用合成数据生成,会使用强化学习,会在"脑海"中反复练习。


我们将让AI与AI相互训练,就像学生教师辩手一样,所有这些都将增加我们模型的大小。这将增加我们拥有的数据量,我们将不得不构建更大的GPU。


Blackwell不止是芯片


Hopper 很棒,但我们需要更大的GPU。女士们,先生们,我想向你们介绍一个非常大的GPU,以数学家、博弈论家、概率学家David Blackwell的名字命名。


我们认为这是一个完美的名字。Blackwell,女士们,先生们,感受它吧。



Blackwell不是芯片。Blackwell是一个平台的名称。


人们认为我们制造了GPU,我们确实做到了,但GPU看起来不像以前那样了。如果你愿意的话,这是Blackwell系统的核心。


公司内部的这个不叫Blackwell。


这是旁边的Blackwell;这是当今世界上生产最先进的GPU;这是Hopper,Hopper改变了世界;这是Blackwell。



Blackwell拥有2080亿个晶体管。你可以看到,两块裸晶(die)之间有一条细细的线。这是我们第一次以这种方式把两块裸晶拼接在一起:两块裸晶的工作方式就像一颗芯片,它们之间的互连带宽高达每秒10TB。


因此,Blackwell芯片的这两面都不知道它们在哪一边。没有内存局部性问题,没有缓存问题。它只是一个巨大的芯片。


因此,当我们被告知Blackwell的野心超出了物理学的极限时,工程师说,那又怎样?这就是发生的事情。


Blackwell的两类系统远超Hopper的速度


这就是Blackwell芯片,它分为两种类型的系统。


第一种系统在外形、尺寸和功能(form-fit-function)上与Hopper兼容。


你把Hopper抽出来,再把Blackwell推进去,这就是产能爬坡能够如此高效的原因之一。世界各地已有大量Hopper装机,它们可以沿用完全相同的基础设施、设计、供电、散热和软件,直接替换即可。所以,这是面向当前HGX配置的Hopper兼容版本。而另一种、第二种系统长这样——这是一块原型板。


这是一块功能齐全的板子。我在这里会小心一点——它价值100亿美元,相当昂贵。这是启动板(bring-up board),量产版本就会是这个样子。它上面有两颗Blackwell芯片、四块Blackwell裸晶,连接到一颗Grace CPU。


Grace CPU具有超快的芯片到芯片链路。令人惊奇的是,这台计算机是同类计算机中的第一台,首先,如此多的计算适合这个小地方。其次,它是记忆连贯的。他们感觉就像是一个快乐的大家庭,一起处理一个应用程序。因此,其中的一切都是连贯的。


你已经看到了那些数字,这里有多少TB、那里有多少TB;但这本身就是个奇迹。我们看看板上有哪些东西:上面是NVLink,底部是PCI Express,还有CPU的芯片间互连,这就是Grace Blackwell系统。但还不止这些。


所以事实证明,所有的规格都很棒,但我们需要很多新功能,以便超越物理极限,如果你愿意的话,我们希望总是得到更多的X因素。


因此,我们做的一件事就是发明了第二代Transformer引擎。它能够动态、自动地重新缩放并转换数字格式,在保证结果的前提下使用更低的精度。


请记住,人工智能是关于概率的。因此,数学在管道的特定阶段保持必要的精度和范围的能力非常重要。


因此,这不仅仅是关于我们设计了一个更小的ALU的事实,世界并不是那么简单。


你必须弄清楚,什么时候可以在横跨数千个GPU、连续运行数周的计算中使用低精度,并确保训练任务仍然收敛;这就是新的Transformer引擎要解决的问题。此外,我们还有第五代NVLink。它的速度是Hopper的两倍,但非常重要的是,它在网络中具备了计算能力。


这样做的原因是,当你有这么多GPU一起工作时,它们必须相互共享信息、相互同步和更新。每隔一段时间,就必须对部分结果做归约(reduce),再把归约后的结果重新广播给所有GPU。因此存在大量所谓的all-reduce、all-to-all和all-gather操作,它们都属于同步与集合通信的范畴。让GPU之间拥有非常快的链路,并且能够直接在网络中完成这些数学运算,使我们能够进一步放大整体性能。


因此,即使它是每秒1.8TB,它实际上也高于此值。所以它是Hopper的很多倍。
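为了说明上面提到的all-reduce这类集合通信在多GPU训练同步中的作用,下面是一个最小化的示意(用PyTorch的gloo后端在CPU上模拟多个进程;真实系统会使用NCCL,并借助NVLink/InfiniBand在网络中完成归约):

```python
# 一个最小化的all-reduce示意:每个进程代表一块"GPU",各自持有本地梯度,
# all_reduce之后所有进程都拿到求和后的同一份结果。
import os
import torch
import torch.distributed as dist
import torch.multiprocessing as mp

def worker(rank, world_size):
    os.environ["MASTER_ADDR"] = "127.0.0.1"
    os.environ["MASTER_PORT"] = "29500"
    dist.init_process_group("gloo", rank=rank, world_size=world_size)

    # 每个进程上有一份本地梯度(这里用rank+1填充,便于观察归约结果)
    local_grad = torch.full((4,), float(rank + 1))

    # all-reduce:所有进程的梯度求和,结果同步回每个进程
    dist.all_reduce(local_grad, op=dist.ReduceOp.SUM)
    print(f"rank {rank} 归约后的梯度: {local_grad.tolist()}")
    dist.destroy_process_group()

if __name__ == "__main__":
    world_size = 4
    mp.spawn(worker, args=(world_size,), nprocs=world_size)
```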


Ras引擎、加密传输与解压缩


一台超级计算机连续运行数周而不出任何故障的概率几乎为零,因为有太多组件在同时工作,从统计上看,它们全部持续正常工作的概率非常低。


因此,我们需要尽可能频繁地做检查点(checkpoint),并在发生故障时尽快重启。但是,如果我们有能力及早发现一颗弱芯片或弱节点,就可以让它退役,换上另一个处理器。这种保持超级计算机高利用率的能力非常重要,尤其当你刚刚花了20亿美元建造它的时候。


因此,我们安装了一个Ras引擎,一个可靠性引擎,它可以对每个门、Blackwell芯片上的每一个内存以及与之连接的所有内存进行100%的系统测试,这几乎就像我们为每个芯片配备了自己的高级测试仪,我们用它来测试我们的芯片。


这是我们第一次这样做。对此超级兴奋——安全的AI!


大概只有在这样的大会上,人们才会为RAS鼓掌。接下来是安全AI(Secure AI)。显然,你刚刚花了数亿美元训练出一个非常重要的AI,而AI的代码和智能就编码在参数里。你要确保一方面不会丢失它,另一方面它不会被污染。


因此,我们现在有能力对数据进行加密:不仅是静态数据,传输中的数据也会加密,甚至在计算过程中数据同样是加密的。当我们进行计算时,它运行在一个可信的执行环境中。


最后一件事是解压缩,当计算速度如此之快时,将数据移入和移出这些节点变得非常重要。因此,我们采用了高速压缩引擎,有效地将数据移入和移出这些计算机的速度提高了20倍。


这些计算机是如此强大,而且投资如此之大,以至于我们最不想做的就是闲置。因此,所有这些功能都是为了让Blackwell保持忙碌。


总体而言,与Hopper相比,Blackwell每颗芯片的FP8训练性能是前者的2.5倍。它还支持名为FP6的新格式:即使计算速度相同,由于每个参数占用的内存更少,可存入内存的参数量和有效带宽也随之放大;而FP4则有效地将吞吐量再提高一倍。
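下面用一个简单的换算说明"精度越低、同样显存能装下的参数越多"这一点(各格式的位宽按常规定义取值,只计参数本身、不含激活和优化器状态,仅为示意):

```python
# 示意:不同数值精度下,1.8万亿参数本身占用的内存量。
params = 1.8e12
bits_per_param = {"FP16": 16, "FP8": 8, "FP6": 6, "FP4": 4}

for fmt, bits in bits_per_param.items():
    bytes_total = params * bits / 8
    print(f"{fmt}: 约 {bytes_total / 1e12:.2f} TB")
```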


推理的计算模式不同于传统的检索计算


这对于推理至关重要。


变得非常清楚的一件事是,每当你使用另一侧有AI的计算机时,当你与聊天机器人聊天时,当你要求它审查或制作图像时。请记住,后面是生成令牌的GPU。有些人称之为推理,但更恰当的是生成。


过去的计算方式是检索。你会拿起手机,触摸某些东西,一些信号就会响起。基本上,一封电子邮件会发送到某个地方的某个存储空间。有预先录制的内容。有人写了一个故事,有人做了一个图像,或者有人录制了一个视频。这些记录,预先录制的内容然后被流式传输回手机,并以基于推荐系统的方式重新组合,以向您呈现信息。


您知道,在未来,绝大多数内容将不会被检索。这样做的原因是,这是由不了解上下文的人预先录制的,这就是为什么我们必须检索这么多内容的原因。如果你能和一个理解上下文的AI一起工作,你是谁,你为什么要获取这些信息,并以你喜欢的方式为你生成信息。


我们节省的能源量,节省的网络带宽量,节省的时间浪费量将是巨大的。未来是生成式的,这就是为什么我们称之为生成式AI,这就是为什么这是一个全新的行业。


我们的计算方式根本不同。


MV Link的突破与DGX


我们为生成式AI时代创建了一个处理器,其中最重要的部分之一是内容令牌生成。我们称此格式为FP4,这是大量的计算。


5倍的代币生成,5倍的Hopper推理能力似乎已经足够了。但为什么要止步于此呢?答案是不够的,我将告诉你为什么。


因此,我们希望拥有更大的GPU,甚至比这个更大。因此,我们决定扩大规模,首先让我告诉你们我们是如何扩大规模的。


在过去的八年中,我们的计算能力增加了1000倍。八年,1000倍。还记得摩尔定律的美好时光吗?大约每五年10倍、每十年100倍——这是最容易算的账。在PC革命的鼎盛时期,计算性能大约每十年增长100倍。


在过去的八年里,我们已经走了1000次。我们还有两年的时间。因此,从这个角度来看,我们推进计算的速度是疯狂的,而且仍然不够快。


所以我们又造了一颗芯片,一颗不可思议的芯片,我们称之为NVLink交换机芯片。它有500亿个晶体管,本身几乎和Hopper一样大。这颗交换机芯片上有四条NVLink链路,每条链路每秒1.8TB。


正如我所提到的,它有计算能力,这个芯片是做什么用的?如果我们要构建这样的芯片,我们可以让每个GPU同时全速与其他每个GPU通信。这太疯狂了。


这听起来甚至有些不可思议。但如果你能找到一种方法做到这一点,并以具有成本效益的方式构建这样的系统,让所有GPU通过一条一致的链路连接起来,使它们在效果上成为一个巨型GPU,那将是多么了不起。为了让它具有成本效益,一项关键发明是让这颗芯片直接驱动铜缆——这颗芯片的SerDes本身就是一项了不起的发明,使我们可以直接驱动铜互连。因此,你可以构建出一个像这样的系统。


现在这个系统有点疯狂。这是一个DGX,这就是DGX现在的样子。


请记住,就在六年前,它很重,但我能够举起它。我向OpenAI和那里的研究人员交付了第一个DGX1。


那些照片都在网上,我们都在机器上签了名;如果你来我的办公室,还能看到那台有亲笔签名的机器,真的很漂亮。但那时你还能把它举起来。顺便说一句,那台DGX-1是170 teraflops;如果你不熟悉这些单位,那就是0.17 petaflops。我交付给OpenAI的第一台是0.17,你可以把它四舍五入成0.2,差别不大;而眼前这台是720 petaflops,几乎达到1 exaflop的训练算力——世界上第一台装进单个机架的exaflop级机器。要知道,在我们说话的此刻,地球上也只有两三台exaflops级的机器。所以,这是装在一个机架里的exaflop级AI系统。


让我们看看它的背面。这就是让一切成为可能的东西:DGX的NVLink主干(spine),每秒130TB的带宽从机箱背面穿过,这比整个互联网的总带宽还要高。


因此,我们基本上可以在一秒钟内把所有内容发送给每一个节点。我们总共用了5000根NVLink电缆,总长约两英里。令人惊奇的是:如果必须使用光学器件,我们就得用上收发器和重定时器,而仅这些器件就要消耗约20,000瓦功率,只是为了驱动NVLink主干。


而通过NVLink交换机,我们完全"免费"地做到了这一点,并因此为计算省下了20千瓦的功率——整个机架的功率是120千瓦,所以这20千瓦意义重大。它采用液冷,进水约25摄氏度,接近室温;出水约45摄氏度,像你家按摩浴缸的水温。


有人说你们是做GPU的,我们确实是,但在我眼里,GPU如今长这个样子。两年前,当我看到的GPU还是HGX时,它重70磅、由35,000个零件组成;我们现在的GPU由600,000个零件组成、重3,000磅——差不多相当于一辆碳纤维法拉利的重量。


你提到这一点,不知道3000磅是多少?好吧,3,000磅,一吨半。所以它不完全是一头大象。这就是DGX的样子。现在让我们看看它在运行中的样子,我们如何将其付诸实践,这意味着什么?


如果你要训练一个1.8万亿参数的GPT级模型,大约需要3到5个月、25,000块Ampere架构GPU。如果用Hopper来做,可能需要大约8,000个GPU、15兆瓦功率、90天(约三个月),就能训练出这样一个突破性的AI模型。这显然没有很多人想象的那么贵,但8,000个GPU仍然是一大笔钱。而如果用Blackwell来做,只需要2,000个GPU,同样是90天,但令人惊奇的是,功率只需4兆瓦。
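按演讲中给出的口径粗算一下两代平台训练同一模型的耗电量(数字直接取自上文,换算仅为示意):

```python
# 简化换算:90天训练周期内,Hopper方案(15兆瓦)与Blackwell方案(4兆瓦)的耗电量对比。
days = 90
hours = days * 24

hopper_energy_mwh = 15 * hours      # 15 MW × 2160 h = 32,400 MWh
blackwell_energy_mwh = 4 * hours    # 4 MW × 2160 h = 8,640 MWh

print(f"Hopper:    {hopper_energy_mwh:,.0f} MWh")
print(f"Blackwell: {blackwell_energy_mwh:,.0f} MWh")
print(f"能耗约为原来的 {blackwell_energy_mwh / hopper_energy_mwh:.0%}")
```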


我们的目标是不断降低成本和能源。它们彼此成正比,与计算相关的成本和能源成正比,以便我们可以继续扩展和扩展我们必须进行的计算,以训练下一代模型。


推理极其困难的原因


现在Nvidia GPU在云中的时间可能有一半,它被用于代币生成。他们要么在做副驾驶,要么在聊天;ChatGPT,当你与它交互或生成图像或生成视频、生成蛋白质、生成化学物质时正在使用的所有这些不同的模型,有一堆生成正在进行。


所有这些都基于我们称之为推理的计算类别。但是,对于大型语言模型来说,推理是极其困难的,因为这些大型语言模型具有几个属性。


第一,它们非常大,因此无法安装在一个GPU上。


想象一下Excel无法安装在一个GPU上。想象一下你每天运行的某些应用程序无法安装在一台计算机上,就像视频游戏无法安装在一台计算机上一样。过去很多时候,在超大规模计算中,许多人的许多应用程序都适合在同一台计算机上。


现在突然之间,这个推理应用程序,你正在与这个聊天机器人交互,那个聊天机器人需要一个超级计算机来运行它。这就是这些聊天机器人的生成未来。这些聊天机器人是数万亿个代币,数万亿个参数,它们必须以交互速率生成代币。


当你与它交互时,你希望令牌尽快回到你身边,并尽快读取它。因此,生成代币的能力非常重要。您必须在许多GPU上运行此模型的工作,以便实现几个目标。


一方面,您需要吞吐量,因为吞吐量降低了成本,即每个代币生成的总成本。因此,您的吞吐量决定了提供服务的成本。另一方面,您还有另一个交互速率,即每秒另一个代币,这与服务质量有关。


因此,这两件事相互竞争,我们必须找到一种方法,在所有这些不同的GPU上分配工作,并以一种使我们能够同时实现两者的方式运行。
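为了说明吞吐量如何直接决定服务成本,这里给一个非常简化的算例(GPU单价与总吞吐均为假设值,仅演示换算关系):

```python
# 简化算例:吞吐量如何决定"每百万token的服务成本"。所有数字均为假设。
gpu_count = 8
gpu_hour_price = 4.0          # 假设:每GPU每小时4美元
tokens_per_second = 6000      # 假设:整个部署的总吞吐

cost_per_hour = gpu_count * gpu_hour_price
tokens_per_hour = tokens_per_second * 3600
cost_per_million = cost_per_hour / tokens_per_hour * 1e6
print(f"每百万token成本约 {cost_per_million:.2f} 美元")
```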


事实证明,搜索空间是巨大的。刚才当我放上那张幻灯片时,我听到了一些喘息声。



在这张图里,y轴是数据中心每秒的token吞吐量,x轴是每位用户每秒的交互式token数。请注意,右上角是最好的:您希望每个用户每秒交互的token数非常高。


您希望每个数据中心的每秒代币数非常高。右上角太棒了。但是,要做到这一点非常困难。为了让我们在每个交叉点、x、y坐标上搜索最佳答案,你只需查看每一个x,y坐标。所有这些蓝点都来自软件的重新分区。



一些优化解决方案必须弄清楚是使用Tensor Parallel、Expert Parallel、Pipeline Parallel还是Data Parallel,并将这个巨大的模型分布在所有这些不同的GPU 上,并维持您需要的性能。如果不是因为Nvidia GPU的可编程性,这个探索空间将是不可能的。


因此,得益于CUDA和如此丰富的生态系统,我们可以探索这个空间,找到那条绿色的屋顶线(roofline)。请注意,在屋顶线的某一点上,最优配置是TP2、EP8、DP4。


这意味着在2个GPU之间做张量并行、在8个GPU上做专家并行、再做4路数据并行。而在曲线的另一端,最优配置则是4路张量并行加16路专家并行。


该软件的配置和分发,它是一个不同的运行时,会产生这些不同的结果。你必须去发现那条屋顶线。这只是计算机的一种配置。
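下面用一个玩具脚本示意"在固定GPU数量下枚举TP/EP/DP组合、寻找屋顶线"的思路;其中的性能模型完全是虚构的,真实系统依赖对每种配置的实测:

```python
# 玩具示例:枚举(张量并行TP, 专家并行EP, 数据并行DP)组合,
# 用一个完全虚构的代价模型挑出"吞吐×交互性"意义下的较优配置。
from itertools import product

TOTAL_GPUS = 64

def fake_perf(tp, ep, dp):
    # 虚构模型:TP提升交互性但通信开销随之增大,DP/EP主要提升吞吐
    interactivity = 100.0 * tp / (1 + 0.3 * tp)
    throughput = 50.0 * dp * ep / (1 + 0.1 * tp * ep)
    return throughput, interactivity

candidates = []
for tp, ep, dp in product([1, 2, 4, 8], repeat=3):
    if tp * ep * dp == TOTAL_GPUS:
        thr, inter = fake_perf(tp, ep, dp)
        candidates.append((thr * inter, tp, ep, dp, thr, inter))

for score, tp, ep, dp, thr, inter in sorted(candidates, reverse=True)[:3]:
    print(f"TP{tp} EP{ep} DP{dp}: 吞吐≈{thr:.0f}, 交互性≈{inter:.0f}")
```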


想象一下,世界各地正在创建的所有模型以及将要提供的所有不同配置的系统。


Blackwell的推理能力是Hopper的30倍


因此,现在您已经了解了基础知识,让我们来看看Blackwell与Hopper的推理。


这是一代人中非同寻常的事情,因为我们创建了一个专为万亿参数生成式人工智能设计的系统。Blackwell的推理能力超乎想象。事实上,它是Hopper的30倍。是的,对于像ChatGPT这样的大型语言模型和其他类似的模型,蓝线是Hopper。想象一下我们没有改变Hopper的架构,我们只是让它成为一个更大的芯片。


假设我们只是使用最新、最大的制程,把两块裸晶用每秒10TB的链路连接在一起,得到这颗拥有2080亿个晶体管的巨型芯片。如果其他都不变,性能会怎样?结果相当不错——这就是图中的紫线,但也仅此而已。


真正的跃升来自FP4 Tensor Core、新的Transformer引擎,以及非常重要的NVLink交换机。原因在于,所有这些GPU必须共享结果、交换部分积;无论何时它们相互通信,NVLink交换机的速度几乎是我们过去使用最快网络所能达到速度的10倍。


所以Blackwell将是一个了不起的生成式AI系统。正如我之前提到的,在未来,数据中心将被视为一个AI工厂。


每个CSP都在与Blackwell签约


AI工厂的目标是创造收入:就像上一次工业革命中交流发电机发电一样,在这次工业革命中,这些设施产生的是智能。


所以这种能力非常重要。Blackwell的兴奋程度真的非常高。两年前,当我们第一次开始与Hopper一起进入市场时,我们有幸有两家CSP加入我们的发布会,我们非常高兴。我们现在有更多。


大家对Blackwell的热情令人难以置信,而且它有一大堆不同的配置。我向您展示过可以直接滑入Hopper外形规格的配置,升级非常容易;


我也向你们展示了液冷的例子,那是它的极致版本:通过NVLink把72颗GPU连接起来的整个机架。Blackwell将走向全球的人工智能公司,其中很多公司正在以不同方式做着惊人的工作;每个CSP都已准备就绪,全球所有OEM和ODM、区域云、主权AI和电信公司都在与Blackwell签约。


这个Blackwell将是我们历史上最成功的产品发布,所以我迫不及待地想看到它。我要感谢一些加入我们的合作伙伴。


AWS正在为Blackwell做准备。他们将构建第一个具备安全AI能力的GPU系统,打造一个算力达222 exaflops的系统。


刚才我们制作数字孪生动画时,那不仅仅是艺术,这是我们正在构建的数字孪生。


这就是它的规模。除了基础设施之外,我们还与AWS一起做了很多事情。我们正在加速Sagemaker AI。我们可以加速BedrockAI。Amazon Robotics正在使用Nvidia Omniverse和Isaac Sim与我们合作。AWS health已将Nvidia health集成到其中。因此,AWS确实倾向于加速计算。


谷歌正在为Blackwell做准备。GCP已经拥有A100、H100等一整套Nvidia CUDA GPU。他们最近宣布了贯穿所有产品的Gemma模型,我们正在努力优化和加速GCP的各个方面:数据处理引擎Dataproc、JAX、XLA、Vertex AI,以及用于机器人的MuJoCo。


因此,我们正在与Google和GCP合作开展一系列计划。


甲骨文正在为Blackwell做准备。Oracle是Nvidia DGX Cloud的重要合作伙伴,我们也在共同努力加速对许多公司来说非常重要的Oracle数据库。


Microsoft正在加速,Microsoft正在为Blackwell做准备。Microsoft和Nvidia拥有广泛的合作伙伴关系。


我们正在加速Microsoft Azure中的各种AI服务;当你与这些服务聊天时,背后很可能就是英伟达在做推理和token生成。


他们建造了最大的Nvidia InfiniBand超级计算机,基本上就是我们那台机器的孪生。我们把Nvidia生态系统带入Azure,把Nvidia DGX Cloud带入Azure。


Nvidia Omniverse现在托管在Azure中,Nvidia Healthcare托管在Azure中,所有这些都与Microsoft Fabric深度集成并深度连接。


整个行业都在为Blackwell做准备。这就是我要向你展示的,到目前为止,你所看到的Blackwell的大部分场景都是Blackwell的全保真设计。


05

数字化与生成AI的革命性应用


数字孪生让Wistron实现了什么


我们公司的一切都有一个数字孪生。


事实上,这个数字孪生的想法确实在传播,它可以帮助公司在第一时间完美地构建非常复杂的东西。


还有什么比创建数字孪生来构建内置在数字孪生中的计算机更令人兴奋的呢?因此,让我向您展示Wistron在做什么。



为了满足Nvidia加速计算的需求,我们领先的制造合作伙伴之一纬创(Wistron)正在用基于Omniverse SDK和API开发的定制软件,构建Nvidia DGX和HGX工厂的数字孪生。


对于他们最新的工厂,Wistron从数字孪生开始,将他们的多CAD和过程模拟数据虚拟集成到一个统一的视图中。在这个物理准确的数字环境中测试和优化布局,将工人效率提高了51%。


在施工过程中,他们使用Omniverse数字孪生来验证实体建造是否与数字方案一致,及早发现差异有助于避免代价高昂的工程变更,结果令人印象深刻。


使用数字孪生帮助Wistron的工厂在一半的时间内上线,只需两个半月,而不是五个月。


Omniverse数字孪生有助于快速调整和测试新布局,以适应新流程或改善现有空间的运营,并使用来自生产线上每台机器的实时物联网数据监控实时运营,最终使纬创能够将端到端周期时间缩短50%,缺陷率降低40%。


借助Nvidia AI和Omniverse,Nvidia的全球合作伙伴生态系统正在构建加速AI数字化的新时代。


数字化与生成AI革命


这就是我们的现状,未来我们将首先数字化制造一切,然后再进行实体制造。


人们问我,它是如何开始的?是什么让你们如此兴奋?是什么让你们看到了这个令人难以置信的想法?就是这个。



那将是一个非常重要的时刻。正如你所知道的,2012年,Alex Net。


你把一只猫放进这台电脑里,它出来了,上面写着猫。我们说,天哪,这将改变一切。你在三个通道上获取100万个数字,RGB,这些数字对任何人都没有意义。你把它放到这个软件中,它会在维度上压缩它,减少它。它从一百万个维度减少,变成三个字母,一个向量,一个数字,它是通用的。


你可以让猫是不同的猫,你可以让它成为猫的正面和猫的背面。你看这个东西,难以置信,它能够认出所有这些猫。


我们意识到它是如何系统地、结构化地做到这一点的,它是可扩展的。你能做多大?你想把它做多大?所以我们想象这是一种全新的编写软件的方式。


今天,你可以输入C、a、t这个词,然后出来的是一只猫。它走了另一条路。我说得对吗?难以置信。这怎么可能?没错。你拿了三个字母,从中生成了一百万个像素,这是有意义的。这就是奇迹。


而现在,仅仅十年后,我们识别文本,我们识别图像,我们识别视频,声音和图像。


我们不仅认识到它们,而且理解它们的含义;我们理解文本的含义,这就是我能和你聊天的原因。它可以为您总结一下,它理解文本,不仅仅是识别英语,理解英语;它不仅能识别像素,还能理解像素。您甚至可以在两种模式之间调节它,你可以有语言条件图像并生成各种有趣的东西。


如果你能理解这些东西,你还能理解你数字化的其他东西吗?我们之所以从文本和图像开始,是因为我们把它们数字化了。但是我们还数字化了什么?事实证明,我们数字化了很多东西,蛋白质、基因和脑电波。


任何你可以数字化的东西,只要其中存在结构,我们大概就能从中学到某种模式;如果我们能从中学习模式,我们就能理解它的含义;如果我们能理解它的含义,我们也可能能够生成它。因此,生成式AI革命就在这里。


CorrDiff模型:高分辨率天气预测


那么,我们还能产生什么?我们还能学到什么?


我们想学习的一件事是气候;我们很想了解极端天气,如何以足够高的分辨率预测区域尺度的未来天气,以便我们可以在伤害来临之前让人们远离危险。


极端天气每年给世界造成高达1500亿美元的损失,实际肯定不止于此;而且这些损失并不是均匀分布的,而是集中在世界的某些地区、某些人群身上。


当然,对于世界上的一些人来说,我们需要适应,我们需要知道会发生什么。因此,我们正在创建地球2号,一个用于预测天气的地球数字孪生。


为此,我们发明了一项非凡的技术,称为CorrDiff,让我们来看看。



随着地球气候变化,AI驱动的天气预报使我们能够更准确地预测和跟踪像超强台风Chanthu这样的强风暴,这些风暴在2021年在台湾及周边地区造成广泛破坏。


目前的AI预报模型可以准确预测风暴的轨迹,但它们的分辨率仅限于25公里,这可能会错过重要的细节。


Nvidia的CorrDiff是一种革命性的新型生成式AI模型,它在高分辨率、经雷达资料同化的WRF天气预报数据以及ERA5再分析数据上训练。



像灿都(Chanthu)这样的极端事件可以从25公里分辨率被超分辨率重建到2公里分辨率,速度是传统天气模型的1000倍,能效是其3000倍。


通过结合Nvidia天气预报模型FourCastNet与生成式AI模型(如CorrDiff)的速度和准确性,我们可以探索数百甚至数千平方公里尺度的区域天气预报,清晰呈现风暴的最佳、最坏和最可能造成的影响。这些丰富的信息有助于最大限度地减少生命和财产损失。


今天,CorrDiff已针对台湾地区进行了优化;很快,这种生成式超分辨率采样将作为Nvidia Earth-2推理服务的一部分,提供给全球更多地区。


气象公司(The Weather Company)是许多人信赖的全球天气预报来源,我们正在共同努力加速他们基于第一性原理的天气模拟。


此外,他们还将整合Earth-2的CorrDiff能力,以便帮助企业和国家进行区域性的高分辨率天气预报。因此,如果您需要天气预报方面的能力,请与气象公司联系。这真是令人兴奋的工作。


Bio Nemo正在推动药物发现的新范式


还有Nvidia医疗保健,这是我们15年前就开始投入、并为之兴奋的领域。无论是医学成像、基因测序还是计算化学,背后的计算技术很可能都来自Nvidia。我们在这方面做了大量工作。


今天,我们宣布我们将做一些非常酷的事情。



想象一下,这些AI模型之所以能生成图像和音频,是因为它们理解了图像和音频;同样地,我们对基因、蛋白质和氨基酸所做的所有数字化,如今也可以通过机器学习来理解——这样我们就能理解生命的语言。


当然,我们在AlphaFold中看到了它的第一个证据,这真的是一件了不起的事情。在此之前,经过几十年的艰苦工作,世界仅用冷冻电镜、X射线晶体学等不同技术,煞费苦心地数字化并重建了大约20万个蛋白质结构。


在不到一年左右的时间里,Alpha Fold已经重建了2亿种蛋白质,基本上是每一种蛋白质,每一种生物都被测序了。这完全是革命性的。


这些模型很难使用,对人们来说很难构建。所以我们要做的就是为全世界的研究人员建造它们。而且它不会是唯一的。我们还将创建许多其他模型。让我来给你们展示一下我们要用它做什么。


新药的虚拟筛选是一个计算上难以解决的问题。现有技术只能扫描数十亿种化合物,并且需要数天时间在数千个标准计算节点上识别新的候选药物。



Nvidia BioNeMo NIM开启了一种新的生成式筛选范式:使用NIM进行蛋白质结构预测,使用MolMIM生成候选分子,再用DiffDock完成分子对接,我们现在可以在几分钟内生成并筛选候选分子。



MolMIM可以连接到自定义应用程序,以迭代方式引导生成过程,优化所需的属性。这些应用程序可以用BioNeMo微服务来定义,也可以从头构建。


在这里,基于物理的模拟优化了分子与靶蛋白结合的能力,同时优化其他有利的分子性质。MolMIM生成高质量的类药分子,这些分子既能与靶点结合又可被合成,从而提高了开发出成功药物的概率。


BioNeMo正在推动药物发现的新范式:NIM提供按需微服务,可以组合构建强大的药物发现工作流,例如从头(de novo)蛋白质设计,或用于虚拟筛选的引导式分子生成。BioNeMo NIM正在帮助研究人员和开发人员重塑计算药物设计。


06

英伟达如何做一个AI代工厂


接收和操作软件的新方式Nim


英伟达拥有一大批模型:CorrDiff等气候模型、计算机视觉模型、机器人模型,当然还有一些非常棒的开源语言模型。


这些模型是开创性的。然而,公司很难使用。你会如何使用它?您将如何将其带入您的公司并将其集成到您的工作流程中,您将如何打包并运行它?


还记得之前说过,推理是一个非同寻常的计算问题。您将如何对每个模型进行优化,并将运行该超级计算机所需的计算堆栈放在一起,以便您可以在公司中运行这些模型。


所以我们有了一个好主意。我们将发明一种新的方式,让您接收和操作软件。该软件基本上装在一个数字盒子里,我们称它为Nvidia Inference Micro Service,即Nim。


我想向你解释它是什么,一个Nim是一个预先训练好的模型;所以它非常聪明,它经过打包和优化,可以在Nvidia的安装基础上运行。该安装基础非常大,里面的东西令人难以置信;你已经拥有了所有这些预先训练的开源模型状态——它们可以是开源的,也可以来自我们的合作伙伴之一,也可以由我们创建;它与它的所有依赖项打包在一起。


因此,正确版本的CUDA、cuDNN、TensorRT、TensorRT-LLM、跨多GPU的分布式推理、Triton推理服务器,所有这些都被完整地打包在一起。它会根据您使用的是单GPU、多GPU还是多节点GPU进行优化,并提供易于使用的API。


想一想,AI的API是什么?就是一个你可以直接与之对话的接口。因此,这是一种未来的软件,它拥有一个非常简单的API。


这个API被称为human;这些软件包,令人难以置信的软件体系将被优化和打包,我们会把它放在一个网站上。你可以下载它,可以随身携带它,可以在任何云中运行它;你可以在你自己的数据中心运行它,如果它适合的话,你可以在工作站中运行它。


你所要做的就是来到 AI.Nvidia.com,我们称之为Nvidia Inference微服务,但在公司内部,我们都称它为Nims。
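下面是一个最小化的调用示意,假设某个NIM暴露了OpenAI兼容的chat completions接口;其中的URL、模型名和API密钥都是示意性占位,实际应以ai.nvidia.com上的说明为准:

```python
# 最小化示意:通过OpenAI兼容的HTTP接口调用一个NIM。
# URL、模型名与API密钥均为占位假设,实际请以官方文档为准。
import requests

API_KEY = "在此填入你的API密钥"                                   # 占位
URL = "https://integrate.api.nvidia.com/v1/chat/completions"   # 假设的OpenAI兼容端点

payload = {
    "model": "meta/llama3-8b-instruct",   # 示意性的模型名
    "messages": [{"role": "user", "content": "用一句话解释什么是NIM?"}],
    "max_tokens": 128,
}
headers = {"Authorization": f"Bearer {API_KEY}"}

resp = requests.post(URL, json=payload, headers=headers, timeout=60)
resp.raise_for_status()
print(resp.json()["choices"][0]["message"]["content"])
```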


使用一个超级AI完成各种任务


试想一下,有一天会有一个这样的聊天机器人,这些聊天机器人只是在Nim中,你会组装一大堆聊天机器人,这就是软件将来构建的方式。


未来我们如何构建软件?你不太可能从头开始编写它或编写一大堆Python代码或类似的东西。您很可能会组建一个AI团队,你可能会使用一个超级AI,它接受你赋予它的任务,并将其分解为执行计划。


其中一些执行计划可以交给某个NIM。那个NIM可能理解SAP——SAP的语言是ABAP;也可能理解ServiceNow,并从其平台上检索一些信息;然后它可能把结果交给另一个NIM去做计算,也许那是一个优化软件、一个组合优化算法,也许只是一个做数值分析的基础计算器;最后它带着自己的答案回来,与其他NIM的答案组合在一起。因为系统知道正确答案应该是什么样子,它就能产出并呈现正确的结果给你。


我们每天都可以在固定时间得到一份报告,内容可能与构建计划、预测、客户预警、bug数据库或其他任何东西有关。我们可以用这些NIM把它组装出来。由于这些NIM已经打包好、随时可以在您的系统上运行,只要您的数据中心或云里有Nvidia GPU,这些NIM就能像一个团队一样协作,完成一些了不起的事情。所以我们认定这是一个好主意,决定去做这件事——如今英伟达整个公司内部都在运行NIM。
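下面用一段概念性的Python代码示意"一个规划者把任务拆解、再分派给不同NIM"的组装方式;其中的call_nim、plan等函数名和服务名均为假设,仅说明流程,并非真实API:

```python
# 概念性示意:一个"规划者"把任务拆解成步骤,再分派给不同专长的NIM。
# call_nim()只是占位函数,代表对某个NIM端点的一次调用,并非真实接口。

def call_nim(service: str, prompt: str) -> str:
    # 占位:真实实现会向对应NIM的HTTP端点发送请求(参见上文的调用示意)
    return f"[{service} 对『{prompt}』的回答]"

def plan(task: str) -> list[tuple[str, str]]:
    # 占位:真实系统里由一个大模型生成执行计划
    return [
        ("sap-nim", f"从SAP(ABAP)中取出与『{task}』相关的数据"),
        ("optimizer-nim", "对取回的数据做组合优化"),
        ("report-nim", "把优化结果汇总成一页报告"),
    ]

def run(task: str) -> str:
    context = []
    for service, step in plan(task):
        context.append(call_nim(service, step))
    return "\n".join(context)

print(run("生成本周供应链预测报告"))
```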


自定义Nim与Nemo微服务


我们到处都在创建聊天机器人。当然,最重要的聊天机器人之一是芯片设计师聊天机器人。你可能不会感到惊讶——我们非常关心制造芯片。因此,我们想构建聊天机器人,人工智能副驾驶,与我们的工程师共同设计。这就是我们的做法。


所以我们拿来了一个Llama 2 70B模型,把它打包在NIM中。我们问它:你知道什么是CTL吗?CTL其实是我们的一个内部程序,有自己的内部专有语言;但它回答说CTL是一种组合时序逻辑,描述的是CTL的传统含义,这对我们来说没什么用。于是我们给了它一大堆新的示例——这与新员工入职培训没什么不同。我们说,谢谢你的回答,但它完全错了;然后我们告诉它CTL是什么。正如你所看到的,CTL代表计算跟踪库(Compute Trace Library),这就说得通了——我们一直在跟踪计算周期。最后它还帮我把程序写了出来,是不是很神奇?


因此,我们的芯片设计人员的生产力可以提高。这就是您可以使用Nim做的事情。你可以做的第一件事,自定义它。


我们有一项名为NeMo微服务的服务,可帮助您整理和准备数据,以便用这些数据训练AI;您可以对模型进行微调,再为它设置护栏,甚至可以评估它的回答,把它的表现与其他示例进行对比。这就是NeMo微服务。


AI代工厂的三个支柱


我们正在做的事情有三个要素,三个支柱。


第一个支柱当然是发明AI模型的技术,运行AI模型并为您打包。


第二种是创建工具来帮助您修改它:首先是拥有人工智能技术,其次是帮你修改它。


第三是基础设施供你微调。如果您喜欢部署它,您可以将其部署在我们名为DGX Cloud的基础设施上,或者您可以将其部署在本地。您可以将其部署在任何您喜欢的地方。一旦你开发了它,它就是你的,可以带到任何地方。


因此,我们实际上是一个AI代工厂。台积电为我们代工芯片,我们则在AI领域为您和整个行业做同样的事情:我们带着伟大的设计想法去台积电,由他们制造出来;AI代工厂与此完全相同,它的三大支柱就是NIM、NeMo微服务和DGX Cloud。


未来的AI数据库


你可以教Nim做的另一件事是了解你的专有信息。请记住,在我们公司内部,我们的绝大多数数据都不在云中。它在我们公司内部,它一直在那里,一直被使用。


这些数据基本上是尚未被提炼的智能。我们想获取这些数据,理解它的含义——就像我们理解前面谈到的几乎所有事物的含义一样——然后把这些知识重新索引到一种称为向量数据库的新型数据库中。也就是说,你拿到结构化或非结构化数据,理解它的含义,并把这种含义编码下来。


所以现在它变成了一个AI数据库。而未来的AI数据库,一旦你创建了它,你就可以和它对话。让我举个例子,说明你能做些什么。


假设你创建了一大堆多模态数据,一个很好的例子是PDF。你把你所有的PDF带到所有你最喜欢的地方,这些东西对你来说是专有的,对你的公司至关重要,你可以像我们编码猫的像素一样编码它,它变成了猫这个词,我们可以对你所有的PDF进行编码, 它变成了现在存储在向量数据库中的向量。


它成为贵公司的专有信息。一旦你有了这些专有信息,你就可以和它聊天了。这是一个智能数据库。所以你只是和数据聊天。这有多有趣?
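下面是一个把文档"编码成向量、存入向量库、再按相似度检索"的最小示意;其中用词袋哈希代替真实的嵌入模型,仅为说明流程,生产环境会使用NeMo Retriever之类的服务和真正的向量数据库:

```python
# 最小化示意:文档切块 -> 向量化 -> 存入内存中的"向量数据库" -> 按余弦相似度检索。
import numpy as np

def embed(text: str, dim: int = 256) -> np.ndarray:
    # 用词袋哈希代替真实嵌入模型,仅作演示
    vec = np.zeros(dim)
    for word in text.lower().split():
        vec[hash(word) % dim] += 1.0
    norm = np.linalg.norm(vec)
    return vec / norm if norm > 0 else vec

docs = [
    "Blackwell 平台 由 两块 裸晶 组成",
    "NIM 是 打包 好 的 推理 微服务 提供 易用 的 API",
    "Omniverse 是 数字孪生 的 操作系统",
]
index = np.stack([embed(d) for d in docs])   # 内存中的"向量数据库"

query = "推理 微服务 是 什么"
scores = index @ embed(query)                # 向量已归一化,点积即余弦相似度
print("最相关的文档:", docs[int(np.argmax(scores))])
```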


我们的软件团队,他们只是和错误数据库聊天;你知道昨晚有多少错误?我们取得了任何进展吗?然后,在你完成与这个错误数据库的交谈后,你需要治疗。因此我们为您提供了另一个聊天机器人,你可以做到。


好吧,所以我们称它为Nemo检索器。这样做的原因是,它的最终工作是尽快检索信息。你只需要和它说话,嘿,给我检索这些信息。它会把它带回来给你。你是指这个吗?你会说,是的,完美。好的。所以我们称它为Nemo检索器。


NeMo微服务可以帮助您创建所有这些东西。我们拥有各种不同的NIM,甚至还有数字人NIM。"我是Rachel,你的AI经理。"



这是一个非常短的片段,但是有很多视频要给你看,我想还有很多其他的演示要给你看,所以我不得不把这个视频剪短。


但这是Diane。她是一个数字人Nemo。你刚刚和她交谈过,在这种情况下,她与Hippocratic AI的医疗保健大型语言模型建立了联系。这真的很神奇,她在医疗保健方面非常聪明。所以在你完成之后,在我的软件工程副总裁 Dwight与聊天机器人交谈以获取数据库之后,你可以过来和Diane交谈。因此,Diane完全被人工智能动画化,她是一个数字人。


与英伟达合作AI工厂的客户们


有很多公司想要建立。他们坐在金矿上。企业IT行业坐拥金矿。这是一座金矿,因为他们对工作方式有如此多的了解。他们拥有多年来创造的所有这些令人惊叹的工具,并且他们坐拥大量数据。如果他们能把这个金矿变成副驾驶员,这些副驾驶员可以帮助我们做事情。因此,世界上几乎每一个拥有人们使用的宝贵工具的IT平台都坐在副驾驶的金矿上,他们想建立自己的副驾驶和自己的聊天机器人。


因此,我们宣布Nvidia AI Foundry正在与世界上一些伟大的公司合作。


全球87%的商业贸易都与SAP有关,基本上,世界运行在SAP之上,我们自己也在使用SAP。SAP正在使用Nvidia NeMo和DGX Cloud构建SAP Joule副驾驶。


ServiceNow为全球80%~85%的财富500强公司提供人员和客户服务运营,他们正在使用Nvidia AI Foundry构建ServiceNow虚拟助手。Cohesity为企业备份数据,坐拥数百EB、来自超过10,000家公司的"数据金矿";Nvidia AI Foundry正在与他们合作,帮助他们构建Gaia生成式AI代理。


Snowflake是一家把企业数据仓库放在云中的公司,每天为10,000家企业客户提供超过30亿次查询。Snowflake正在与Nvidia AI Foundry合作,使用Nvidia NeMo和NIM构建副驾驶。


世界上近一半的文件都存储在NetApp的本地存储上。Nvidia AI Foundry正在帮助他们使用Nvidia NeMo和NIM构建聊天机器人和副驾驶,例如向量数据库和检索器。


我们与戴尔建立了良好的合作伙伴关系,每个正在构建这些聊天机器人和生成式人工智能的人,当你准备好运行它时,你将需要一个AI工厂;没有人比戴尔更擅长为企业构建大规模的端到端系统。


因此,任何人、任何公司、每家公司都需要建立AI工厂。事实证明,迈克尔在这里。他很乐意接受你的订单。女士们,先生们,迈克尔·戴尔。


07

AI机器人的未来与协同工作


智能机器人的三个系统


让我们来谈谈下一波机器人,下一波人工智能机器人,物理AI。


到目前为止,我们讨论的所有AI都是一台计算机。数据进入一台计算机。如果你愿意的话,很多世界都以数字文本的形式体验。AI通过阅读大量语言来模仿我们来预测下一个单词。它通过研究所有模式和所有其他先前的例子来模仿你。当然,它必须理解上下文等等。但是一旦它理解了上下文,它本质上是在模仿你。


我们获取所有数据,将其放入像DGX这样的系统中,将其压缩为大型语言模型,数万亿个代币变成了数十亿个参数,这数十亿个参数将成为您的AI。


为了让我们进入下一波AI浪潮,让AI理解物理世界,我们将需要三台计算机。


第一台计算机仍然是同一台计算机。AI计算机正在观看视频,也许它正在生成合成数据。也许有一个人类的例子,就像我们有文本形式的人类例子一样,我们将有表达形式的人类例子,AI会观察我们,了解正在发生的事情,并尝试将其调整到上下文中。


因为它可以用这些基础模型进行泛化,也许这些机器人也可以在物理世界中相当普遍地执行。因此,我只是用非常简单的术语描述了大型语言模型中刚刚发生的事情,除了机器人技术的ChatGPT时刻可能即将到来。因此,一段时间以来,我们一直在为机器人技术构建端到端系统。我为我们的工作感到非常自豪。


用于训练AI的系统是DGX。我们还有一个低功耗系统,称为AGX,面向自主系统——这是世界上第一个机器人处理器。


当我们第一次建造这个东西时,人们会问,你们在建造什么?这是一个 Soc,它是一个芯片,被设计为非常低的功耗,但它是为高速传感器处理和AI而设计的。


因此,如果您想在汽车上、或者任何会移动的东西上运行Transformer模型,我们有最适合您的计算机,它叫Jetson。DGX用于训练AI,Jetson则是自主处理器。


在两者中间,我们还需要另一台计算机。大型语言模型的训练受益于你提供示例,然后进行基于人类反馈的强化学习(RLHF)。


那么,机器人版本的"人类反馈强化学习"是什么?是基于物理反馈的强化学习,这就是对齐机器人的方式。这样,机器人在学习关节运动和操作能力时,就能正确地遵循物理定律。


因此,我们需要一个模拟引擎,以数字方式代表机器人的世界,这样机器人就有一个健身房去学习如何成为机器人。我们称之为Omniverse虚拟世界。运行 Omniverse的计算机称为OVX;OVX(计算机本身)托管在 Azure 云中。


基本上我们构建了这三个东西,这三个系统之上,每一个系统都有算法。


AI与Omniverse将如何协同


现在,我将向您展示一个超级示例,说明AI和Omniverse将如何协同工作。我要给你们看的例子有点疯狂,但它离明天很近。


这是一座机器人化的建筑,我们称之为仓库。在这座机器人化的建筑里会有一些自主系统:有的是人,有的是叉车。当然,这些自主系统会相互交互;而这个仓库会统筹协调一切,让每个人都远离危险。仓库本质上就是一个空中交通管制员:每当它发现有情况发生,就会重新引导交通,给机器人和人下发新的航点,他们便确切地知道接下来该怎么做。


这个仓库,这栋楼,你也可以和它谈谈。例如,你今天感觉如何?你可以问仓库同样的问题。基本上,我刚才描述的系统将拥有Omniverse Cloud,它托管在DGX Cloud上运行的虚拟模拟和AI,所有这些都是实时运行的。让我们来看看。



重工业的未来始于数字孪生,帮助机器人、工人和基础设施在复杂的工业现实空间中应对不可预测的事件的AI代理将首先在复杂的数字孪生中构建和评估。


这个10万平方英尺仓库的Omniverse数字孪生作为模拟环境运行,它集成了数字工作者、运行Nvidia Isaac Perceptor堆栈的AMR(自主移动机器人)、用Nvidia Metropolis模拟的100个吸顶摄像头对整个仓库进行集中式活动建图,以及用Nvidia cuOpt进行的AMR路径规划;在这个物理精确的模拟环境中对AI代理做在环测试,使我们能够评估并完善系统应对现实世界不可预测性的能力。



在这里,沿着这个AMR的计划路线发生了一起事故,在它移动以捡起托盘时挡住了它的路径。Nvidia Metropolis更新并发送实时占用地图,以选择计算新最佳路线的位置。AMR能够看到拐角处,并通过生成式AI驱动的Metropolis Vision Foundation模型提高其任务效率。操作员甚至可以使用自然语言提出问题,可视化模型了解细微的活动,并可以提供即时见解以改进运营。


所有传感器数据都是在仿真中生成的,并传递给以Nvidia推理微服务(NIM)形式运行的实时AI。当AI准备好部署到物理孪生体——也就是真实仓库——中时,我们把Metropolis和Isaac的NIM连接到真实传感器,从而能够持续改进数字孪生体和AI模型。


是不是太不可思议了。所以请记住,未来的设施、仓库、厂房将由软件定义,所以软件正在运行。


将Omniverse集成到工作流程中


您还将如何测试该软件?所以你测试软件来建造仓库,数字孪生中的优化系统。那么所有的机器人呢?你刚才看到的所有这些机器人,它们都在运行自己的自主机器人堆栈。


因此,未来集成软件的方式,未来机器人系统的CICD采用数字孪生,我们使Omniverse更易于访问。我们将创建Omniverse Cloud API、四个简单的API和一个通道,您可以将应用连接到它。


因此,在未来,这将像Omniverse一样美妙、美丽地简单;有了这些 API,您将拥有这些神奇的数字孪生功能。


我们还把Omniverse变成了一个AI,并集成了"与USD对话"的能力。Omniverse的语言是USD(通用场景描述,Universal Scene Description),这种语言相当复杂,所以我们教会了Omniverse理解它。


因此,你可以用英语和它对话,它会直接生成USD,再以USD作答,并用英语与你交流。你也可以在这个世界里做语义搜索:信息不是按文字做语义编码,而是按场景做语义编码,所以你可以询问某些物体、某些条件或某些场景,它能帮你找到对应的场景。它还可以与你协同生成:你可以在3D中设计、模拟,也可以用AI直接在3D中生成内容。


让我们来看看这一切将如何运作。我们与西门子有着良好的合作关系。西门子是全球最大的工业工程和运营平台。你现在已经看到了工业领域中许多不同的公司。


重工业是其中最伟大的最终前沿之一,我们现在终于拥有了必要的技术来产生真正的影响。西门子正在构建工业元宇宙,今天,我们宣布西门子正在将其皇冠上的明珠加速器连接到Nvidia Omniverse。让我们来看看。



西门子的技术每天都在为每个人带来变革。Teamcenter X是我们领先的产品生命周期管理软件,来自西门子Xcelerator平台,我们的客户每天都在使用它来大规模开发和交付产品。


现在,我们通过集成Nvidia AI和Omniverse技术,让真实世界与数字世界靠得更近。Teamcenter X接入了Omniverse API,实现数据互操作性和基于物理的渲染,可用于工业规模的设计与制造项目。我们的客户HD现代(HD Hyundai)是可持续船舶制造领域的市场领导者,生产氨与氢动力船舶,一艘船通常包含超过700万个离散部件。借助Omniverse API,Teamcenter X让HD现代这样的公司能够交互式地统一并可视化这些庞大的工程数据集,并集成生成式AI来生成3D对象或HDRI背景,在完整上下文中查看项目。其结果是:基于超直观、逼真的物理数字孪生消除了浪费和错误,从而节省了大量成本和时间。


我们正在构建这一功能以支持协作,无论是跨越更多西门子Xcelerator工具(如NX或Star-CCM+),还是让团队在同一场景中使用各自喜爱的设备共同开发。而这仅仅是个开始,通过与Nvidia合作,我们将在整个西门子Xcelerator产品组合中实现加速计算、生成式AI与Omniverse的集成。


专业配音演员恰好是我的好朋友罗兰·布什,他恰好是西门子的CEO。


一旦您将Omniverse连接到您的工作流程,您的生态系统就会从设计开始到工程设计再到制造规划,一直到数字孪生运营。一旦你把所有东西连接在一起,您可以获得惊人的生产力。这真的很棒!


突然之间,每个人都在同一个地面实况上工作。您不必交换数据和转换数据,也不必犯错误。从设计部门到艺术部门、建筑部门,一直到工程部门,甚至营销部门,每个人都在同一个地面实况上工作。让我们来看看Nissan如何将 Omniverse集成到他们的工作流程中。这一切都是因为它被所有这些美妙的工具和我们正在合作的开发人员联系在一起。


那不是动画。这就是Omniverse。今天,我们宣布Omniverse Cloud将流式传输到Vision Pro,您在虚拟门周围走来走去是很奇怪的。当我从那辆车上下来时,每个人都这样做,这真的非常令人惊奇。Vision Pro连接到Omniverse将你带入Omniverse。由于所有这些CAD工具和所有这些不同的设计工具现在都已集成并连接到Omniverse,你可以拥有这种工作流程。真的令人难以置信。


下一代AV计算机Thor将被比亚迪使用


让我们谈谈机器人技术。所有移动的东西都将是机器人的,这是毫无疑问的;它更安全,更方便;最大的行业之一将是汽车。


正如我所提到的,我们从计算机系统开始,自上而下构建完整的机器人技术栈。就自动驾驶汽车而言,今年年底或明年年初,我们的自动驾驶系统将搭载在梅赛德斯车型上量产,随后不久是捷豹路虎。


因此,这些自主机器人系统是软件定义的。它们需要大量的工作来完成,具有计算机视觉,显然具有人工智能控制和规划,各种非常复杂的技术,需要数年时间来完善。


我们正在构建整个堆栈,但是,我们为所有汽车行业开放了整个堆栈。这正是我们在每个行业的工作方式。我们试图尽可能多地构建它,以便我们理解它,但后来我们开放它,以便每个人都可以访问它。


您可以只购买我们的计算机——这是世界上唯一能运行AI的全功能安全(ASIL-D)系统;也可以使用其上的操作系统;当然还有我们的数据中心,它几乎存在于世界上每一家自动驾驶公司。无论您想以哪种方式使用,我们都很高兴。


今天,我们宣布,全球最大的电动汽车公司比亚迪将采用我们的下一代产品Thor。Thor专为Transformer引擎设计,我们的下一代自动驾驶汽车计算机Thor将被比亚迪使用。


机器人计算机Jetson的新进展


你可能不知道我们有超过一百万的机器人开发人员。我们创造了Jetson,这台机器人计算机我们为此感到自豪。它上面的软件数量是疯狂的。


但是我们之所以能做到,是因为它是100%兼容的代码。我们在公司所做的一切都是为我们的开发人员服务的。通过我们能够维护这个丰富的生态系统,并使其与您从我们这里访问的所有内容兼容,我们可以将所有这些令人难以置信的功能带到我们称之为Jetson的小型计算机中,这是一台机器人计算机。我们今天也宣布了这个非常先进的新SDK。我们称之为Isaac Perceptor。


如今,大多数机器人都是预先编程的:它们要么沿着地面上的轨道(数字轨道)行驶,要么依赖AprilTag标记。但在未来,它们将具备感知能力。这样做的好处是编程更容易:你只需说"我要从A点到B点",它就会自己想办法导航过去。


因此,只需要编程航点,整条路线就可以自适应,整个环境也可以随时重新规划,就像我在开头那个仓库演示中展示的那样。你无法用预编程的AGV做到这一点:如果有箱子掉在路上,它们就会全部卡住,只能等人来清理。
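下面用一个极简的网格寻路示意"只给航点、路径自适应"的思路:障碍物(比如掉落的箱子)出现后重新调用规划函数即可得到新路径;真实的AMR会使用Isaac Perceptor、cuOpt等完整的感知与规划栈,这里仅演示概念:

```python
# 示意:只给定起点与航点(终点),用BFS在网格上自动规划路径;
# 障碍物出现后重新调用即可得到新路径。
from collections import deque

def plan_path(grid, start, goal):
    rows, cols = len(grid), len(grid[0])
    queue = deque([(start, [start])])
    visited = {start}
    while queue:
        (r, c), path = queue.popleft()
        if (r, c) == goal:
            return path
        for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            nr, nc = r + dr, c + dc
            if 0 <= nr < rows and 0 <= nc < cols and grid[nr][nc] == 0 and (nr, nc) not in visited:
                visited.add((nr, nc))
                queue.append(((nr, nc), path + [(nr, nc)]))
    return None

warehouse = [[0] * 5 for _ in range(5)]
print("原路径:", plan_path(warehouse, (0, 0), (4, 4)))

warehouse[2][1] = warehouse[2][2] = warehouse[2][3] = 1   # 走道上掉了几个箱子
print("重新规划:", plan_path(warehouse, (0, 0), (4, 4)))
```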


因此,现在有了Isaac Perceptor,我们拥有了令人难以置信的最先进的视觉里程计、3D重建,以及除了3D重建之外的深度感知。这样做的原因是,你可以有两种方式来关注世界上正在发生的事情。


当今使用最广泛的机器人是机械臂,即制造业中的机械手,它们同样是预先编程的。其中的计算机视觉算法、AI算法、控制与路径规划算法需要几何感知能力,计算量大得惊人。我们用CUDA加速了这些算法,因此我们拥有了世界上第一个具备几何感知能力的CUDA加速运动规划器:你在它前面放一个东西,它就会想出新的规划绕过去。它对3D物体的姿态估计也有出色的感知能力——不只是2D姿态,而是完整的3D姿态。


因此,它必须想象周围有什么、以及如何以最佳方式抓取。现在,姿态估计基础模型、抓取基础模型和关节控制算法都已可用,我们称之为Isaac Manipulator,它们同样运行在英伟达的计算机上。


我们开始在下一代机器人技术——很可能是人形机器人——上做一些非常出色的工作。正如我前面所描述的,我们现在已经拥有了设想通用人形机器人技术所需的技术,而且在某种意义上,人形机器人甚至可能更容易实现。


原因在于,我们有更多可以提供给机器人的模仿训练数据,因为我们和它们的构造非常相似。人形机器人在我们的世界里很可能更加有用,因为这个世界就是我们为自己打造、便于我们协作与工作的:我们设置工作站、安排制造和物流的方式都是为人类设计的。因此,当我们构建出完整的技术栈后,这些人形机器人的部署可能会更加高效,就像我们在其他领域所做的那样。


从顶层开始,是一个通过观看视频、学习人类示范来训练的基础模型,示范可以是视频形式,也可以是虚拟现实形式。然后,我们创建了名为Isaac Reinforcement Learning Gym的"健身房",让人形机器人学习如何适应物理世界。最后是一台令人难以置信的计算机——与自动驾驶汽车使用的是同一款——它将运行在人形机器人内部,名为Thor,专为Transformer引擎设计。我们把这几部分合成了一段视频,相信你会非常喜欢,来看看吧。


人类想象是不够的。我们必须发明、探索和超越已经完成的工作。


我们创造更智能、更快速。我们推动它失败,让它学习。我们教它,然后帮助它自学。我们拓宽它的理解,以绝对精确的方式应对新挑战并取得成功。我们让它感知、移动甚至推理,这样它就可以与我们分享我们的世界。


Nvidia Project GR00T

人形机器人通用基础模型


这就是灵感引领我们去往的下一个前沿:Nvidia Project GR00T,一个用于人形机器人学习的通用基础模型。GR00T模型以多模态指令和过去的交互作为输入,输出机器人接下来要执行的动作。


我们开发了Isaac Lab,这是一款机器人学习应用程序,用于在Omniverse Isaac Sim中训练GR00T;我们还扩展了OSMO,这是一种新的计算编排服务,可协调DGX系统上的训练工作流与OVX系统上的模拟工作流。


借助这些工具,我们可以在基于物理的模拟中训练GR00T,并将其零样本(zero-shot)迁移到现实世界。GR00T模型将使机器人能够从少量人类演示中学习,因此它可以通过观察我们来帮助完成日常任务,并模仿人类动作。


这得益于Nvidia的一系列技术,它们可以从视频中理解人类、训练模型并进行模拟,最终将模型直接部署到实体机器人上。将GR00T连接到大型语言模型后,它甚至可以按照自然语言指令生成相应的动作。


所有这些令人难以置信的智能都由专为GR00T设计的新型Jetson Thor机器人芯片提供支持。凭借Isaac Lab、OSMO和GR00T,我们正在为下一代AI驱动的机器人提供构建模块。



08

五大关键词总结与展望


总结今天演讲的五大内容。


首先,新的工业革命。每个数据中心都应该加速发展。未来几年,价值1万亿美元的已安装数据中心将变得现代化。其次,由于我们带来的计算能力,出现了一种新的软件方式,即生成式AI,它将创建新的基础设施,专门用于做一件事,而且只做一件事,不是针对多用户数据中心,而是AI生成器。这些AI将创造极其有价值的软件,一场新的工业革命。


第二,这场革命的计算机,这一代的计算机,生成AI,万亿参数,Blackwell,大量的计算机和计算。


第三,新计算机创造了新的软件类型。新类型的软件应该以新的方式分发,以便一方面它可以成为云中的端点并且易于使用,但仍然允许您随身携带它,因为您的智能应该以一种允许您随身携带的方式打包。我们称它们为NIM。


第四、这些NIM将帮助您为未来创建一种新型应用程序,而不是您完全从头开始编写的应用程序,而是您将集成它们。像团队一样创建这些应用程序。我们在Nims、AI技术、工具、Nemo和基础设施、DGX云之间拥有出色的能力,可帮助您创建专有应用程序、专有聊天机器人。


最后,未来一切会移动的东西都将是机器人。这些机器人系统——无论是人形机器人、AMR、自动驾驶汽车、叉车还是机械臂——以及巨型体育场、仓库、工厂(工厂里可以有机器人产线,由机器人化的工厂统筹,制造出机器人汽车),这些系统都需要同一样东西:一个平台,一个数字孪生平台,我们称之为Omniverse——机器人世界的操作系统。


这就是我们今天谈到的五大内容。


当我们谈论GPU时,Nvidia是什么样子的?当人们问我关于GPU的问题时,我有一个非常不同的印象。


首先,我看到的是一层层的软件堆栈之类的东西。其次,我看到的是我们今天向你们宣布的东西——Blackwell,这个平台:令人惊叹的处理器、NVLink交换机、网络系统,其系统设计堪称奇迹。这就是Blackwell,对我来说,这才是我心目中的GPU。


希望每个人都拥有一个很棒的GTC,谢谢大家的光临!


(以下为英文版演讲全文)


Jensen Huang:


Welcome to GTC. I hope you realize this is not a concert. You have arrived at a developers conference. There will be a lot of science described algorithms, computer architecture, mathematics.


I sensed a very heavy weight in the room all of a sudden, almost like you were in the wrong place. No conference in the world has a greater assembly of researchers from such diverse fields of science, from climate tech to radio sciences, trying to figure out how to use AI to robotically control MIMOs for next-generation 6G radios, robotic self-driving cars, even artificial intelligence itself. Then I noticed a sense of relief all of a sudden. Also, this conference is represented by some amazing companies. This list, this is not the attendees, these are the presenters. And what's amazing is this: if you take away all of my friends, close friends — Michael Dell is sitting right there in the IT industry.


All of the friends I grew up with in the industry — if you take away that list, this is what's amazing. These are the presenters of the non-IT industries using accelerated computing to solve problems that normal computers can't. It's represented in life sciences, healthcare, genomics, transportation, of course, retail, logistics, manufacturing, industrial. The gamut of industries represented is truly amazing. And you're not here only to attend, you're here to present, to talk about your research. $100 trillion of the world's industries is represented in this room today. This is absolutely amazing.


There is absolutely something happening. There is something going on. The industry is being transformed, not just hours because the computer industry, the computer is the single most important instrument of society today. Fundamental transformations in computing affects every industry, but how did we start?


How did we get here? I made a little cartoon for you. Literally, I drew this on one page. This is Nvidia's journey, started in 1993. This might be the rest of the talk: 1993, this is our journey. We were founded in 1993. There are several important events that happened along the way; I'll just highlight a few. In 2006, CUDA, which has turned out to have been a revolutionary computing model. We thought it was revolutionary then. It was going to be an overnight success. And almost 20 years later, it happened. We saw it coming, two decades later.


In 2012, AlexNet and CUDA made first contact. In 2016, recognizing the importance of this computing model, we invented a brand new type of computer we called the DGX-1: 170 teraflops. In this supercomputer, 8 GPUs were connected together for the very first time. I hand-delivered the very first DGX-1 to a startup located in San Francisco called OpenAI.


DGX-1 was the world's first AI supercomputer. Remember, 170 teraflops. In 2017, the Transformer arrived. In 2022, ChatGPT captured the world's imagination and made people realize the importance and the capabilities of artificial intelligence. And in 2023, generative AI emerged and a new industry began.


Why? Why is a new industry? Because the software never existed before. We are now producing software using computers to write software, producing software that never existed before. It is a brand new category, it took share from nothing, it's a brand new category, and the way you produce the software is unlike anything we've ever done before in data centers, generating tokens, producing floating point numbers at very large scale, as if in the beginning of this last industrial revolution when people realized that you would set up factories, apply energy to it, and this invisible valuable thing called electricity came out AC generators, and 100 years later, 200 years later, we are now creating new types of electrons, tokens using infrastructure, we call factories AI factories to generate this new incredibly valuable thing called artificial intelligence.


A new industry has emerged. Well, we're to talk about many things about this new industry. We're going to talk about how we're going to do computing next, we want to talk about the type of software that you build because of this new industry, the new software, how you would think about this new software, What about applications in this new industry? And then maybe what's next and how can we start preparing today for what is about to come next? Well, but before I start, I want to show you the soul of Nvidia, the soul of our company at the intersection of computer graphics, physics and artificial intelligence, all intersecting inside a computer in Omniverse, in a virtual world simulation. Everything we're going to show you today, literally everything we're going to show you today, is a simulation, not animation. It's only beautiful because it's physics. The world is beautiful, it's only amazing because it's being animated with robotics, it's being animated with artificial intelligence, what you're about to see all day, it's completely generated completely simulated and Omniverse and all of it, what you're about to enjoy is the world's first concert where everything is homemade.


Everything is homemade. You're about to watch some home videos, so sit back and enjoy yourself, God, I love Nvidia.


Accelerated computing has reached the tipping point. General purpose computing has run out of steam. We need another way of doing computing so that we can continue to scale, so that we can continue to drive down the cost of computing, so that we can continue to consume more and more computing while being sustainable. Accelerated computing is a dramatic speedup over general purpose computing. And in every single industry we engage, and I'll show you many, the impact is dramatic, but in no industry is it more important than our own.


The industry of using simulation tools to create products. In this industry, it is not about driving down the cost of computing, it's about driving up the scale of computing.


We would like to be able to simulate the entire product that we do completely in full fidelity, completely digitally, and essentially what we call digital twins. We would like to design it, build it, simulate it, operate it completely digitally. In order to do that, we need to accelerate an entire industry, and today I would like to announce that we have some partners who are joining us in this journey to accelerate their entire ecosystem so that we can bring the world into accelerated computing. But there's a bonus. When you become accelerated, your infrastructure is kuda Gpu's, and when that happens, it's exactly the same infrastructure for generative AI. And so I'm just delighted to announce several very important partnerships.


These are some of the most important companies in the world. Ansys does engineering simulation for what the world makes. We're partnering with them to CUDA-accelerate the Ansys ecosystem, to connect Ansys to the Omniverse digital twin. Incredible. The thing that's really great is that the installed base of Nvidia GPU accelerated systems is all over the world, in every cloud, in every system, all over enterprises. And so the applications they accelerate will have a giant installed base to go serve. End users will have amazing applications. And of course, system makers and CSPs will have great customer demand.


Synopsys is literally Nvidia's first software partner; they were there on the very first day of our company. Synopsys revolutionized the chip industry with high-level design. We are going to CUDA-accelerate Synopsys. We're accelerating computational lithography, one of the most important applications that nobody's ever known about. In order to make chips, we have to push lithography to its limit. Nvidia has created a domain-specific library that accelerates computational lithography incredibly. Once we can accelerate and software-define all of TSMC — who is announcing today that they're going to go into production with Nvidia cuLitho — once it's software-defined and accelerated, the next step is to apply generative AI to the future of semiconductor manufacturing, pushing geometry even further.


Cadence builds the world's essential Eda and SDA tools. We also use cadence between these three companies.


Ansys, Synopsys and Cadence — we basically build Nvidia together. We are CUDA-accelerating Cadence. They're also building a supercomputer out of Nvidia GPUs so that their customers can do fluid dynamics simulation at 100x, 1000x scale, basically a wind tunnel in real time. Cadence Millennium, a supercomputer with Nvidia GPUs inside a software company building supercomputers — I love seeing that. We're building Cadence copilots together. Imagine a day when Cadence, Synopsys, Ansys tool providers would offer you AI copilots, so that we have thousands and thousands of copilot assistants helping us design chips and design systems, and we're also going to connect Cadence's digital twin platform to Omniverse. As you can see the trend here, we're accelerating the world's CAE, EDA and SDA so that we can create our future in digital twins, and we're going to connect them all to Omniverse, the fundamental operating system for future digital twins — one of the industries that benefited tremendously from scale. And you all know this one very well.


Large language models. Basically, after the transformer was invented, we were able to scale large language models at incredible rates, effectively doubling every six months. Now, how is it possible that by doubling every six months that we have grown the industry, we have grown the computational requirements so far? And the reason for that is quite simply this, If you double the size of the model, you double the size of your brain, you need twice as much information to go fill it. And so every time you double your parameter count, you also have to appropriately increase your training token count. The combination of those two numbers becomes the computation scale.


You have to support the latest, state-of-the-art models. The latest OpenAI model is approximately 1.8 trillion parameters, and 1.8 trillion parameters required several trillion tokens to go train. So a few trillion parameters, on the order of a few trillion tokens — when you multiply the two of them together, approximately 30, 40, 50 billion quadrillion floating point operations. Now we just have to do some CEO math right now; just hang with me. So you have 30 billion quadrillion; 1 quadrillion is like a peta. And so if you had a petaflop GPU, you would need 30 billion seconds to go compute, to go train that model. 30 billion seconds is approximately 1,000 years. Well, 1,000 years — it's worth it.


I'd like to do it sooner, but it's worth it. Which is usually my answer when most people tell me, hey, how long, how long is it going to take to do something? So we have 20 years. It's worth it. But can we do it next week? And so 1000 years, 1000 years.


So what we need are bigger GPUs, much, much bigger GPUs. We recognized this early on, and we realized that the answer is to put a whole bunch of GPUs together and, of course, innovate a whole bunch of things along the way, like inventing tensor cores, advancing NVLink so that we could create essentially virtually giant GPUs, and connecting them all together with amazing networks from a company called Mellanox, InfiniBand, so that we could create these giant systems. And so DGX-1 was our first version, but it wasn't the last. We built supercomputers all along the way. In 2021, we had Selene, 4,500 GPUs or so. And then in 2023, we built one of the largest AI supercomputers in the world.


It has just come online: Eos. And as we're building these things, we're trying to help the world build these things, and in order to help the world build them, we've got to build them first. We build the chips, the systems, the networking, all of the software necessary to do this. You should see these systems.


Imagine writing a piece of software that runs across the entire system, distributing the computation across thousands of Gpu's, but inside are thousands of smaller Gpu's, millions of Gpu's to distribute work, across all of that and to balance the workload so that you can get the most energy efficiency, the best computation time, keep your costs down. And so those, those fundamental innovations is what got us here.


And here we are as we see the miracle of ChatGPT emerge in front of us, we also realize we have a long ways to go, we need even larger models, we're going to train it with multi modality data, not just text on the internet, but we're going to train it on texts and images and graphs and charts. And just as we learn watching TV. And so there's going to be a whole bunch of watching video so that these models can be grounded in physics understands that an arm doesn't go through a wall. And so these models would have common sense by watching a lot of the world's video combined with a lot of the world's languages. It'll use things like synthetic data generation, just as you and I do when we try to learn, we might use our imagination to simulate how it's going to end up, just as I, when I was preparing for this keynote, I was simulating it all along the way. I hope it's going to turn out as well as I had into my head.


As I was simulating how this keynote was going to turn out, somebody did say that another performer did her performance completely on a treadmill so that she could be in shape to deliver it with full energy. I didn't do that. If I get a low wind and about 10 minutes into this, you know what happened. And so, so where were we, we're seen here using synthetic data generation.


We're going to use reinforcement learning. We're going to practice it in our mind, we're going to have AI working with AI training each other, just like student, teacher, debaters. All of that is going to increase the size of our model. It's going to increase the amount of data that we have, and we're going to have to build even bigger Gpu's. Hopper is fantastic, but we need bigger Gpus. And so ladies and gentlemen, I would like to introduce you to a very, very, very big GPU.


Named after David Blackwell, a mathematician, game theorists probability we thought it was a perfect name. Blackwell, ladies and gentlemen, enjoy this.


Yeah.


Blackwell is not a chip. Blackwell is the name of a platform. People think we make Gpus and we do, but Gpu's don't look the way they used to. Here's the, if you will, the heart of the Blackwell system. And this inside the company is not called Blackwell is just the number and I this, this is Blackwell sitting next to Oh, this is the most advanced GPU in the world in production today. This is Hopper, this is hopper. Hopper changed the world. This is Blackwell.


It's okay hopper.


You're very good. Good, good boy. What the girl? 208 billion transistors.


And so you can see there's a small line between the two dies. This is the first time two dies have abutted like this together in such a way that the two dies think it's one chip. There's 10 TB of data between them, 10 TB per second, so that these two sides of the Blackwell chip have no clue which side they're on. There are no memory locality issues, no cache issues. It's just one giant chip. And so when we were told that Blackwell's ambitions were beyond the limits of physics, the engineers said, so what? And so this is what happened, and so this is the Blackwell chip.


And it goes into two types of systems. The first one, it's form fit function compatible to Hopper. And so you slide on Hopper and you push in Blackwall. That's the reason why one of the challenges of ramping is going to be so efficient. There are installations of hoppers all over the world and they could be, they could be, you know, the same infrastructure, same design, the power, the electricity, the thermals, the software, identical, push it right back. And so this is a hopper version for the current hgx configuration. And this is what the other, the second hopper looks like this. Now this is a prototype board and Janine, could I just borrow ladies and John and Janine Paul?


And so this is a fully functioning board. And I'll just be careful here. This right here is, I don't know, $10 billion. The second one's five. It gets cheaper after that. So any customers in the audience, it's okay. No, all right. But this is, this one's quite expensive.


This is the bring up board and the way it's going to go to production is like this one here, okay? And so you're going to take take this, it has 2 Blackwell die, 2 Blackwell chips and 4 Blackwell dyes connected to a Grace CPU. The Grace CPU has a super fast chip to chip link. What's amazing is this computer, first of its kind, where this much computation, first of all, fits into this small of a place. Second, it's memory coherent. They feel like they're just one big happy family working on one application location together, and so everything is coherent within it, just the amount of, you know, you saw the numbers, there's a lot of terabytes this and terabytes that's, but this is, this is a miracle.


This is a this. Let's see, what are some of the things on here? There's an Mv link on top PCI express on the bottom on on your which one is my and your left one of them it doesn't matter one of the one of them is a CPU chip to chip link is my left or you're depending on which side I was just I was trying to sort that out and I just kind of doesn't matter i'. Hopefully it comes plugged in so. Okay, so this is the Grace Blackwell system.


But there's more. So it turns out, it turns out all of the specs is fantastic, but we need a whole lot of new features in order to push the limits beyond, if you will, the limits of physics. We would like to always get a lot more X factors. And so one of the things that we did was we invented another transformer engine. Another transformer engine, the second generation, it has the ability to dynamically and automatically rescale and recast numerical formats to a lower precision.


Whenever you can remember, artificial intelligence is about probability. And so you kind of have, you know, 1.7, approximately 1.7 times approximately 1.4 to be approximately something else. Does that make sense? And so the ability for the mathematics to retain the precision and the range necessary in that particular stage of the pipeline, super important.


And so this is, it's not just about the fact that we designed a smaller Alu. It's not quite, the world's not quite that simple. You've got to figure out when you can use that across a computation that is thousands of Gpu's. It's running for weeks and weeks on weeks, and you want to make sure that the training job is going to converge.


And so this new transformer engine, we have a fifth generation NV link. It's now twice as fast as Hopper, but very importantly, it has computation in the network. And the reason for that is because when you have so many different Gpu's working together, we have to share our information with each other. We have to synchronize and update each other. And every so often we have to reduce the partial products and then rebroadcast out the partial products that some of the partial products back to everybody else. And so there's a lot of what is called all reduce and all to all and all gather.


It's all part of this area of synchronization and collectives so that we can have Gpu's working with each other, having extraordinarily fast links and being able to do mathematics right in the network allows us to essentially amplify even further.


So even though it's 1.8 TB per second, it's effectively higher than that. And so it's many times that of Hopper, the likelihood of a supercomputer running for weeks on end is approximately 0. And the reason for that is because there's so many components working at the same time. The statistic, the probability of them working continuously is very low. And so we need to make sure that whenever there is a well, we checkpoint and restart as often as we can. But if we have the ability to detect a weak chip or a weak note early, we can retire it and maybe swap in another processor.


That ability to keep the utilization of the supercomputer high, especially when you just spent $2 billion building it, is super important. And so we put in a Ras engine, a reliability engine that does 100% self test in system test of every single gate, every single bit of memory on the Blackwell chip and all the memory that's connected to it. It's almost as if we shipped with every single chip, its own advanced tester that we test our chips with. This is the first time we're doing this super excited about it secure AI.


Only this conference today, clap for Ras the secure AI. Obviously you've just spent hundreds of millions of dollars creating a very important AI and the code, the intelligence of that AI is encoded in the parameters. You want to make sure that on the one hand, you don't lose it, on the other hand, it doesn't get contaminated. And so we now have the ability to encrypt data, of course, at rest, but also in transit. And while it's being computed, it's all encrypted. And so we now have the ability to encrypt and transmission. And when we're computing it, it is in a trusted, trusted environment, trusted engine environment.


And the last thing is decompression, moving data in and out of these nodes when the compute is so fast becomes really essential. And so we've put in a high line speed compression engine, and it effectively moves data 20 times faster in and out of these computers. These computers are so powerful and they're such a large investment. The last thing we want to do is have them be idle, and so all of these capabilities are intended to keep Blackwell fed and as busy as possible.


Overall, compared to Hopper, it is 2.5 times the FP8 performance for training per chip. It also has this new format called FP6, so that even though the computation speed is the same, the effective bandwidth is amplified because of the memory — the amount of parameters you can store in the memory is now amplified. FP4 effectively doubles the throughput. This is vitally important for inference.


One of the things that is becoming very clear is that whenever you use a computer with AI on the other side, when you're chatting with the chat bot, when you're asking it to review or make an image, remember in the back is a GPU generating tokens. Some people call it inference, but it's more appropriately generation the way that computing has done in the past was retrieval. You would grab your phone, you would touch something, some signals go off, basically an email goes off to some storage somewhere there's prerecorded content, somebody wrote a story, or somebody made an image, or somebody recorded a video that record prerecorded content is then streamed back to the phone and recomposed in a way based on a recommender system to present the information to you. You know that in the future, the vast majority of that content will not be retrieved, and the reason for that is because that was prerecorded by somebody who doesn't understand the context, which is the reason why we have to retrieve so much content. If you can be working with an AI that understands the context, who you are, for what reason you're fetching this information, and produces the information for you just the way you like it, the amount of energy we save, the amount of networking, bandwidth we save, the amount of waste of time we save will be tremendous. The future is generative, which is the reason why we call it generative AI, which is the reason why this is a brand new industry.


The way we compute is fundamentally different. We created a processor for the generative AI era, and one of the most important parts of it is content token generation. We call it this format is FP 4.


Well, that's a lot of computation, 5x the token generation, 5x the inference capability of Hopper seems like enough. But why stop there? The answer is, it's not enough. And I'm going to show you why. I'm going to show you what. And so we would like to have a bigger GPU, even bigger than this one.


And so we decided to scale it and notice, but first, let me just tell you how we've scaled over the course of the last eight years. We've increased computation by 1000 times 8 years, 1000 times. Remember back in the good old days of Moore's Law, it was 2x, well, 5x every, well, 10x every five years, that's the easiest, easiest math, 10x every five years, 100 times every 10 years, 100 times every 10 years in the middle, in the heydays of the PC revolution, 100 times every 10 years. In the last eight years, we've gone 1000 times. We have two more years to go.


And so that puts it in perspective.


The rate at which we're advancing computing is insane, and it's still not fast enough. So we built another chip. This chip is just an incredible chip; we call it the NVLink Switch. It's 50 billion transistors, almost the size of Hopper all by itself. This switch chip has four NVLinks in it, each 1.8 TB per second. And it has computation, as I mentioned. What is this chip for? If we were to build such a chip, we could have every single GPU talk to every other GPU at full speed at the same time. That's insane.


It doesn't even make sense. But if you could do that, if you could find a way to do that and build a system to do it that's cost-effective, how incredible would it be that we could have all these GPUs connected over a coherent link so that they effectively are one giant GPU? Well, one of the great inventions in order to make it cost-effective is that this chip has to drive copper directly. The SerDes of this chip is just a phenomenal invention, so that we could do direct drive to copper, and as a result, you can build a system that looks like this.


Now this system, this system is kind of insane. This is one dgx, this is what a dgx looks like. Now remember, just six years ago, it was pretty heavy, heavy, but I was able to lift it. I delivered the first dgx 1 to OpenAI and the researchers there. It's on, you know, the pictures that are on the internet, and we all autographed it. And if you come to my office, it's autographed there. It's really beautiful, lifted at this dgx, this dgx that dgx, by the way, was 170 teraflops If you're not familiar with the numbering system, that's 0.17 petaflops.


So this is 720. The first 1 I delivered to OpenAI was 0.17. You could round it up to 0.2, won't make any difference, but and by then it was like, wow, you know, 30 more tariffs. And so this is now 720 petaflops, almost an exaflop for training. And the world's first one exaflops machine in one rack.


Just so you know, there are only a couple, 2, 3 exaflops machines on the planet as we speak. And so this is an exa flops AI system in one single rack. Well, let's take a look at the back of it. So this is what makes it possible. That's the back, that's the that's the back, the dgx Mv link spine 130 TB per second goes through the back of that chassis. That is more than the aggregate bandwidth of the internet.


So we could basically send everything to everybody within a second. And so we have 5,000 cables, 5,000 NVLink cables, in total two miles. Now, this is the amazing thing: if we had to use optics, we would have had to use transceivers and retimers, and those transceivers and retimers alone would have cost 20,000 watts, just to drive the NVLink spine. As a result, we did it completely for free over the NVLink switch, and we were able to save the 20 kW for computation. This entire rack is 120 kW, so that 20 kW makes a huge difference.


It's liquid cooled, what goes in is 25 degrees C about room temperature. What comes out is 45 degrees C about your Jacuzzi. So room temperature goes in, Jacuzzi comes out 2 l per second.


We could sell a peripheral.


600,000 parts. Somebody used to say, you know, you guys make GPUs — and we do, but this is what a GPU looks like to me. When somebody says GPU, I see this. Two years ago, when I saw a GPU, it was the HGX: it was 70 pounds, 35,000 parts. Our GPUs now are 600,000 parts and 3,000 pounds. That's kind of like the weight of a carbon fiber Ferrari. I don't know if that's a useful metric.


Everybody's going, "I feel it, I get it. Now that you mention that, I feel it." I don't know — what's 3,000 pounds? Okay, 3,000 pounds is a ton and a half, so it's not quite an elephant. So this is what a DGX looks like. Now let's see what it looks like in operation.


Okay, let's imagine how we put this to work and what that means. Well, if you were to train a GPT model — a 1.8 trillion parameter model — it apparently took about three to five months or so with 25,000 Amperes. If we were to do it with Hopper, it would probably take something like 8,000 GPUs, and it would consume 15 MW.


8,000 GPUs on 15 MW would take 90 days, about three months, and that allows you to train something that is, you know, this groundbreaking AI model. This is obviously not as expensive as anybody would think, but 8,000 GPUs is still a lot of money. So: 8,000 GPUs, 15 MW. If you were to use Blackwell to do this, it would only take 2,000 GPUs — 2,000 GPUs, same 90 days — but this is the amazing part: only 4 MW of power. Down from 15. That's right.
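
As a back-of-the-envelope check on those figures, here is a small sketch that turns the quoted GPU counts, durations, and power draws into GPU-days and energy; the inputs are the numbers from the talk, and the calculation is only illustrative.

```python
# Rough training-budget arithmetic using the figures quoted in the talk.
scenarios = {
    "Hopper":    {"gpus": 8000, "days": 90, "megawatts": 15},
    "Blackwell": {"gpus": 2000, "days": 90, "megawatts": 4},
}

for name, s in scenarios.items():
    gpu_days = s["gpus"] * s["days"]
    energy_gwh = s["megawatts"] * 24 * s["days"] / 1000  # MW * hours -> GWh
    print(f"{name:9s}: {gpu_days:,} GPU-days, ~{energy_gwh:.1f} GWh of energy")
```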


And that's our goal. Our goal is to continuously drive down the cost and the energy — they're directly proportional to each other — associated with computing, so that we can continue to expand and scale up the computation we have to do to train the next-generation models. Well, that's training. Inference, or generation, is vitally important going forward.


You know, probably some half of the time that NVIDIA GPUs are in the cloud these days, they're being used for token generation. They're either doing Copilot this, or ChatGPT that, or all these different models that are being used when you're interacting with them, or generating images, generating videos, generating proteins, generating chemicals. There's a bunch of generation going on. All of that is in the category of computing we call inference. But inference is extremely hard for large language models, because these large language models have several properties. One, they're very large, so they don't fit on one GPU. Imagine Excel doesn't fit on one GPU; imagine some application you run on a daily basis doesn't fit on one computer, like a video game that doesn't fit on one computer — and most in fact do. Many times in the past, in hyperscale computing, many applications for many people fit on the same computer. And now, all of a sudden, there's one inference application where you're interacting with a chatbot, and that chatbot requires a supercomputer in the back to run it. And that's the future.


The future is generative with these chatbots, and these chatbots are trillions of tokens, trillions of parameters, and they have to generate tokens at interactive rates.


Now, what does that mean? Well, three tokens is about a word. You know, "Space, the final frontier. These are the adventures..." — that's like 80 tokens. Okay, I don't know if that's useful to you. See, the art of communication is selecting good analogies. Yeah, this is not going well. Everybody's like, "I don't know what he's talking about. I've never seen Star Trek." And so here we are, trying to generate these tokens. When you're interacting with it, you're hoping that the tokens come back to you as quickly as possible, and as quickly as you can read them. And so the ability to generate tokens is really important.


You have to parallelize the work of this model across many, many GPUs so that you can achieve several things. On the one hand, you would like throughput, because throughput reduces the overall cost per token of generating; your throughput dictates the cost of delivering the service. On the other hand, you have the interactive rate — tokens per second per user — and that has everything to do with quality of service. These two things compete against each other, and we have to find a way to distribute the work across all of these different GPUs and parallelize it in a way that allows us to achieve both.


And it turns out the search space is enormous. You know, I told you there was going to be math involved, and everybody's going, "oh dear" — I heard some gasps just now when I put up that slide. On this chart, the Y axis is tokens per second of data center throughput; the X axis is tokens per second of interactivity for one person. Notice the upper right is the best: you want interactivity — tokens per second per user — to be very high, and you want tokens per second for the whole data center to be very high. The upper right is terrific. However, it's very hard to do that.


And in order for us to search for the best answer across every single one of those intersections — every X-Y coordinate — you just look at every single X-Y coordinate. All those blue dots came from some repartitioning of the software.


Some optimizing solution has to go and figure out whether to use tensor parallel, expert parallel, pipeline parallel, or data parallel, and distribute this enormous model across all these different GPUs while sustaining the performance that you need. This exploration space would be impossible if not for the programmability of NVIDIA's GPUs. Because we have such a rich ecosystem, we could explore this universe and find that green roofline. It turns out that on the green roofline — notice you've got TP2 EP8 DP4 — that means tensor parallel across 2 GPUs, expert parallel across 8, data parallel across 4. On the other end, you've got tensor parallel across 4 and expert parallel across 16. The configuration, the distribution of that software — it's a different runtime that produces these different results, and you have to go discover that roofline.
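
To give a feel for why that search space blows up, here is a toy sketch that enumerates combinations of tensor-, pipeline-, expert-, and data-parallel degrees whose product equals a fixed GPU count. The real search also has to model memory, communication, and batching, so this is only the combinatorial skeleton under an assumed cluster size.

```python
from itertools import product

TOTAL_GPUS = 64  # illustrative cluster size

def divisors(n):
    return [d for d in range(1, n + 1) if n % d == 0]

# Every (TP, PP, EP, DP) assignment that exactly uses all GPUs.
configs = [
    (tp, pp, ep, dp)
    for tp, pp, ep, dp in product(divisors(TOTAL_GPUS), repeat=4)
    if tp * pp * ep * dp == TOTAL_GPUS
]

print(f"{len(configs)} candidate partitionings of {TOTAL_GPUS} GPUs")
for tp, pp, ep, dp in configs[:5]:
    print(f"  TP={tp:2d}  PP={pp:2d}  EP={ep:2d}  DP={dp:2d}")
# Each point would then be benchmarked (or modeled) for data-center
# throughput vs. per-user tokens/s to trace out the roofline.
```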


Well, that's just one model. And this is just one configuration of a computer.


Imagine all of the models being created around the world and all the different configurations of systems that are going to be available. So now that you understand the basics, let's take a look at inference on Blackwell compared to Hopper. And this is the extraordinary thing: in one generation, because we created a system designed for trillion-parameter generative AI, the inference capability of Blackwell is off the charts. In fact, it is some 30 times Hopper.


For large language models — for large language models like ChatGPT and others like it — the blue line is Hopper.


Imagine we didn't change the architecture of Hopper and we just made it a bigger chip. We just used the latest, greatest 10 terabytes per second link, connected the two chips together, and got this giant 208 billion transistor chip. How would we have performed if nothing else changed? It turns out quite wonderfully — that's the purple line — but not as great as it could be.


And that's where the FP4 Tensor Core, the new Transformer Engine, and, very importantly, the NVLink Switch come in. The reason is that all these GPUs have to share results — partial products — whenever they do all-to-all and all-gather, whenever they communicate with each other. That NVLink Switch communicates almost 10 times faster than what we could do in the past using the fastest networks.


Okay, so Blackwell is going to be just an amazing system for generative AI. And in the future, data centers are going to be thought of, as I mentioned earlier, as AI factories. An AI factory's goal in life is to generate revenue — to generate, in this case, intelligence — in this facility, not to generate electricity as the AC generators of the last industrial revolution did. In this industrial revolution, it's the generation of intelligence. And so this ability is super, super important.


The excitement around Blackwell is really off the charts. You know, when we first started to go to market with Hopper — this was a year and a half, two years ago — we had the benefit of two CSPs joining us at launch, and we were delighted. So we had two customers. We have more now.


Unbelievable excitement for Blackwell. Unbelievable excitement. And there's a whole bunch of different configurations. Of course, I showed you the configurations that slide into the Hopper form factor, so it's easy to upgrade. I showed you examples that are liquid cooled, the extreme versions of it: one entire rack connected by NVLink — 72 GPUs. Blackwell is going to be ramping to the world's AI companies, of which there are so many now doing amazing work in different modalities; to the CSPs — every CSP is geared up; all the OEMs and ODMs, regional clouds, sovereign AIs, and telcos all over the world are signing up to launch with Blackwell.


Blackwell will be the most successful product launch in our history, and I can't wait to see that. I want to thank some partners that are joining us in this.


AWS is gearing up for Blackwell. They're going to build the first GPU with secure AI. They're building out a 222-exaflop system. Just now, when we animated the digital twin — if you saw all of those clusters coming down — by the way, that is not just art; that is a digital twin of what we're building. That's how big it's going to be. Besides infrastructure, we're doing a lot of things together with AWS. We're CUDA-accelerating SageMaker AI, we're CUDA-accelerating Bedrock AI. Amazon Robotics is working with us using NVIDIA Omniverse and Isaac Sim. AWS Health has NVIDIA Health integrated into it. AWS has really leaned into accelerated computing.


Google is gearing up for Blackwell. GCP already has A100s, H100s, T4s, L4s — a whole fleet of NVIDIA CUDA GPUs — and they recently announced a Gemma model that runs across all of it. We're working to optimize and accelerate every aspect of GCP: we're accelerating Dataproc, their data processing engine, as well as JAX, XLA, Vertex AI, and MuJoCo for robotics. So we're working with Google and GCP across a whole bunch of initiatives.


Oracle is gearing up for Blackwell. Oracle is a great partner of ours for NVIDIA DGX Cloud, and we're also working together to accelerate something that's really important to a lot of companies: Oracle Database. Microsoft is accelerating, and Microsoft is gearing up for Blackwell. Microsoft and NVIDIA have a wide-ranging partnership, accelerating all kinds of services: when you chat, obviously, with the AI services that are in Microsoft Azure, it's very, very likely NVIDIA is in the back doing the inference and the token generation. They built the largest NVIDIA InfiniBand supercomputer — basically a digital twin of ours, or a physical twin of ours. We're bringing the NVIDIA ecosystem to Azure, NVIDIA DGX Cloud to Azure; NVIDIA Omniverse is now hosted in Azure, NVIDIA Healthcare is in Azure, and all of it is deeply integrated and deeply connected with Microsoft Fabric.


The whole industry is gearing up for Blackwell. And this is what I'm about to show you: most of the scenes that you've seen so far of Blackwell are the full-fidelity design of Blackwell.


Everything in our company has a digital twin, and in fact this digital twin idea is really spreading. It helps companies build very complicated things perfectly the first time. And what could be more exciting than creating a digital twin to build a computer that was itself built in a digital twin? So let me show you what Wistron is doing. To meet the demand for NVIDIA accelerated computing, Wistron, one of our leading manufacturing partners, is building digital twins of NVIDIA DGX and HGX factories using custom software developed with Omniverse SDKs and APIs. For their newest factory, Wistron started with the digital twin to virtually integrate their multi-CAD and process simulation data into a unified view. Testing and optimizing layouts in this physically accurate digital environment increased worker efficiency by 51%. During construction, the Omniverse digital twin was used to verify that the physical build matched the digital plans; identifying discrepancies early helped avoid costly change orders. The results have been impressive: using a digital twin helped bring the Wistron factory online in half the time — just two and a half months instead of five. In operation, the Omniverse digital twin helps Wistron rapidly test new layouts to accommodate new processes or improve operations in the existing space, and monitor real-time operations using live IoT data from every machine on the production line, which ultimately enabled Wistron to reduce end-to-end cycle times by 50% and defect rates by 40%. With NVIDIA AI and Omniverse, NVIDIA's global ecosystem of partners is building a new era of accelerated, AI-enabled digitalization.


That's the way it's going to be in the future: we'll manufacture everything digitally first, and then we'll manufacture it physically. People ask me, how did it start? What got you guys so excited? What was it that you saw that caused you to go all in on this incredible idea? And it's this. Hang on a second.


Guys, that was going to be such a moment. That's what happens when you don't rehearse.


This, as you know, was first contact: 2012, AlexNet. You put a cat into this computer, and it comes out and it says "cat." And we said, oh my God, this is going to change everything.


You take a million numbers — a million numbers across three channels, RGB — and these numbers make no sense to anybody. You put them into this software, and it compresses them, dimensionally reduces them, from a million dimensions down into three letters, one vector, one number. And it's generalized.


You could have the cat be different cats, and you could have it be the front of the cat or the back of the cat. You look at this thing and think: unbelievable — you mean any cat? Yeah, any cat. And it was able to recognize all these cats. And we realized how it did it: systematically, structurally, it's scalable. How big can you make it? Well, how big do you want to make it? And so we imagined that this is a completely new way of writing software. And now today, as you know, you type in the word "cat" and what comes out is a cat. It went the other way, am I right?
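
Before the arrow flipped, the original direction looked like this: a million pixel values in, one label out. A minimal sketch with a modern pretrained classifier — using torchvision's ResNet-50 purely as a convenient stand-in for AlexNet, and a hypothetical image file — might look roughly like this.

```python
import torch
from PIL import Image
from torchvision import models

# Pretrained classifier: ~224*224*3 numbers in, one label out.
weights = models.ResNet50_Weights.DEFAULT
model = models.resnet50(weights=weights).eval()
preprocess = weights.transforms()

img = Image.open("cat.jpg")            # hypothetical input image
batch = preprocess(img).unsqueeze(0)   # shape [1, 3, 224, 224]

with torch.no_grad():
    logits = model(batch)
label = weights.meta["categories"][logits.argmax().item()]
print(label)   # e.g. "tabby" for a cat photo
```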


Unbelievable. How is it possible? That's right — how is it possible? You took three letters and you generated a million pixels from them, and it made sense?


Well, that's the miracle, and here we are, just literally 10 years later, where we recognize text, we recognize images, we recognize videos and sounds. Not only do we recognize them, we understand their meaning — we understand the meaning of the text. That's the reason it can chat with you, it can summarize for you. It understands the text; it didn't just recognize the English, it understood the English. It doesn't just recognize the pixels, it understands the pixels. You can even condition between two modalities: you can have language condition an image and generate all kinds of interesting things. Well, if you can understand these things, what else can you understand that you've digitized? The reason we started with text and images is because we digitized those. But what else have we digitized? It turns out we've digitized a lot of things: proteins and genes and brainwaves. Anything you can digitize, so long as there's structure, we can probably learn some patterns from it. And if we can learn the patterns from it, we can understand its meaning; if we can understand its meaning, we might be able to generate it as well. And so the generative AI revolution is here. Well, what else can we generate? What else can we learn?


Well, one of the things that we would love to learn is climate. We would love to learn extreme weather — how we can predict future weather at regional scales, at sufficiently high resolution, so that we can keep people out of harm's way before harm comes.


Extreme weather costs the world $150 billion — surely more than that — and it's not evenly distributed: that $150 billion is concentrated in some parts of the world, and of course on some people of the world. We need to adapt, and we need to know what's coming. And so we are creating Earth-2, a digital twin of the Earth, for predicting weather. And we've made an extraordinary invention: the ability to use generative AI to predict weather at extremely high resolution. Let's take a look.


As the Earth's climate changes, AI-powered weather forecasting is allowing us to more accurately predict and track severe storms like Super Typhoon Chanthu, which caused widespread damage in Taiwan and the surrounding region in 2021. Current AI forecast models can accurately predict the track of storms, but they are limited to 25-kilometer resolution, which can miss important details.


NVIDIA's CorrDiff is a revolutionary new generative AI model trained on high-resolution, radar-assimilated WRF weather forecasts and ERA5 reanalysis data. Using CorrDiff, extreme events like Chanthu can be super-resolved from 25-kilometer to 2-kilometer resolution, with 1,000 times the speed and 3,000 times the energy efficiency of conventional weather models.


By combining the speed and accuracy of NVIDIA's weather forecasting model, FourCastNet, with generative AI models like CorrDiff, we can explore hundreds or even thousands of kilometer-scale regional weather forecasts to provide a clear picture of the best, worst, and most likely impacts of a storm. This wealth of information can help minimize loss of life and property damage. Today CorrDiff is optimized for Taiwan, but soon generative super-sampling will be available as part of the NVIDIA Earth-2 inference service for many regions across the globe.


The Weather Company is the trusted source of global weather prediction. We are working together to accelerate their weather simulation — first-principles-based simulation. They're also going to integrate Earth-2 CorrDiff so that they can help businesses and countries do regional, high-resolution weather prediction. And so if you have some weather prediction you'd like to do, reach out to The Weather Company.


Really exciting, really exciting work. NVIDIA Healthcare is something we started 15 years ago. We're super excited about this. This is an area where we're very, very proud: whether it's medical imaging or gene sequencing or computational chemistry, it is very likely that NVIDIA is the computation behind it. We've done so much work in this area.


Today, we're announcing that we're going to do something really, really cool. Imagine all of these AI models that are being used to generate images and audio — but instead of images and audio, take all the digitization that we've done for genes and proteins and amino acids. That digitization capability is now passed through machine learning so that we understand the language of life. Of course, we saw the first evidence of this with AlphaFold. This is really quite an extraordinary thing: after decades of painstaking work, the world had only digitized and reconstructed — using cryo-electron microscopy or X-ray crystallography, these different techniques — about 200,000 proteins. In just, what is it, less than a year or so, AlphaFold has reconstructed 200 million proteins: basically every protein of every living thing that's ever been sequenced. This is completely revolutionary.


Well, those models are incredibly hard for people to build and use. And so what we're going to do is build them — build them for the researchers around the world. And it won't be the only one; there will be many other models that we create. So let me show you what we're going to do with it.


Virtual screening for new medicines is a computationally intractable problem. Existing techniques can only scan billions of compounds and require days on thousands of standard compute nodes to identify new drug candidates. NVIDIA BioNeMo NIMs enable a new generative screening paradigm. Using NIMs for protein structure prediction with AlphaFold, molecule generation with MolMIM, and docking with DiffDock, we can now generate and screen candidate molecules in a matter of minutes. MolMIM can connect to custom applications to steer the generative process, iteratively optimizing for desired properties. These applications can be defined with BioNeMo microservices or built from scratch. Here, a physics-based simulation optimizes for a molecule's ability to bind to a target protein while optimizing for other favorable molecular properties. In parallel, MolMIM generates high-quality, drug-like molecules that bind to the target and are synthesizable, translating to a higher probability of developing successful medicines faster. BioNeMo is enabling a new paradigm in drug discovery, with NIMs providing on-demand microservices that can be combined to build powerful drug discovery workflows, like de novo protein design or guided molecule generation for virtual screening. BioNeMo NIMs are helping researchers and developers reinvent computational drug design.


NVIDIA MolMIM, CorrDiff — and there's a whole bunch of other models: computer vision models, robotics models, and even, of course, some really terrific open-source language models. These models are groundbreaking. However, it's hard for companies to use them. How would you use them? How would you bring them into your company and integrate them into your workflow? How would you package them up and run them? Remember, earlier I said that inference is an extraordinary computation problem. How would you do the optimization for each and every one of these models and put together the computing stack necessary to run that supercomputer, so that you can run these models in your company?


And so we have a great idea.


We're going to invent a new way for you to receive and operate software. This software comes basically in a digital box; we call it a container, and we call it the NVIDIA Inference Microservice — a NIM. Let me explain what it is. It's a pretrained model, so it's pretty clever, and it is packaged and optimized to run across NVIDIA's installed base, which is very, very large. What's inside it is incredible. You have all these pretrained, state-of-the-art models — they could be open source, they could be from one of our partners, they could be created by us, like NVIDIA's own models — packaged up with all of their dependencies: CUDA, the right version; cuDNN, the right version; TensorRT-LLM for distributing across multiple GPUs; Triton Inference Server — all completely packaged together. It's optimized depending on whether you have a single GPU, multi-GPU, or multi-node GPUs, and it's connected up with APIs that are simple to use.


Now, think about what an AI API is. An AI API is an interface that you just talk to. So this is a piece of software in the future that has a really simple API, and that API is called "human." These packages — incredible bodies of software — will be optimized and packaged, and we'll put them on a website. You can download them, take them with you, run them in any cloud, run them in your own data center, run them on workstations if they fit. All you have to do is come to ai.nvidia.com. We call it NVIDIA Inference Microservice, but inside the company, we all just call them NIMs.
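
As a sketch of what "an API you just talk to" can look like in practice, here is a minimal example of calling a chat-style microservice over HTTP. The endpoint URL, model name, and API key are placeholders — I'm assuming an OpenAI-compatible chat-completions interface, which is a common convention for such services, not quoting a documented NVIDIA API.

```python
import requests

# Hypothetical endpoint and model name -- substitute the real values
# for whatever microservice you have deployed.
URL = "http://localhost:8000/v1/chat/completions"
payload = {
    "model": "example-llm",
    "messages": [{"role": "user", "content": "Summarize our Q3 build plan."}],
    "max_tokens": 256,
}
headers = {"Authorization": "Bearer YOUR_API_KEY"}

resp = requests.post(URL, json=payload, headers=headers, timeout=60)
resp.raise_for_status()
print(resp.json()["choices"][0]["message"]["content"])
```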


Just imagine: someday there's going to be one of these chatbots, and that chatbot is going to just be in a NIM. And you'll assemble a whole bunch of chatbots. And that's the way software is going to be built someday.


How do we build software in the future? It is unlikely that you'll write it from scratch or write a whole bunch of Python code or anything like that. It is very likely that you'll assemble a team of AIs. There's probably going to be a super-AI that you use, one that takes the mission you give it and breaks it down into an execution plan. Some of that execution plan could be handed off to another NIM — a NIM that maybe understands


SAP — the language of SAP is ABAP. It might understand ServiceNow and go retrieve some information from their platforms. It might then hand that result to another NIM, which goes off and does some calculation on it. Maybe it's an optimization piece of software, a combinatorial optimization algorithm; maybe it's just some basic calculator; maybe it's pandas doing some numerical analysis. And then it comes back with its answer, and it gets combined with everybody else's. And because it's been presented with what the right answer should look like, it knows what right answer to produce, and it presents it to you.


We could get a report every single day, at the top of the hour, that has something to do with a build plan, or some forecast, or some customer alert, or some bugs database, or whatever it happens to be. And we could assemble it using all these NIMs. And because these NIMs have been packaged up ready to work on your systems — so long as you have NVIDIA GPUs in your data center or in the cloud — these NIMs will work together as a team and do amazing things. And so we decided this is such a great idea, we're going to go do that. And so NVIDIA has NIMs running all over the company.
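
To make the "team of NIMs" idea concrete, here is a toy orchestrator sketch: a mission is broken into steps, each step is routed to a specialized service, and the results are combined. The service names and functions are entirely hypothetical stand-ins, not real NVIDIA, SAP, or ServiceNow APIs.

```python
# Toy orchestration of specialized "services" (stand-ins for NIMs).
def retrieve_erp_data(task):
    # Stand-in for a service that knows how to query an ERP system.
    return {"open_orders": 42}

def run_calculation(data):
    # Stand-in for an optimization / calculation service.
    return {"forecast": data["open_orders"] * 1.1}

def summarize(results):
    # Stand-in for a language-model service that writes the report.
    return f"Daily report: {results}"

def orchestrate(mission):
    # A "planner" would normally break the mission into steps; here the
    # plan is hard-coded purely for illustration.
    data = retrieve_erp_data(mission)
    calc = run_calculation(data)
    return summarize({"input": data, "result": calc})

print(orchestrate("prepare the morning build-plan report"))
```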


We have chatbots being created all over the place, and one of the most important chatbots, of course, is a chip designer chatbot. You might not be surprised — we care a lot about building chips, and so we want to build chatbots, AI copilots, that are co-designers with our engineers. And so this is the way we did it. We got ourselves a Llama 2 — this is the 70B — and it's packaged up in a NIM.


And we asked it, you know, what is CTL? Well, it turns out CTL is an internal program, and it has an internal proprietary language, but it thought CTL was "combinatorial timing logic." So it described conventional knowledge of CTL — which is not very useful to us. So we gave it a whole bunch of new examples. This is no different from onboarding an employee. We say, thanks for that answer, it's completely wrong, and then we present to it: this is what CTL is at NVIDIA. And, as you can see, CTL stands for Compute Trace Library, which makes sense.


You know, we were tracing compute cycles all the time and it wrote the program. Is that amazing?


And so the productivity of our chip designers can go up. This is what you can do with a NIM.


The first thing you can do with it is customize it. We have a service called NeMo Microservices that helps you curate and prepare the data so that you can teach — onboard — this AI. You fine-tune it, and then you guardrail it. You can even evaluate its answers, evaluate its performance against other examples. That's what the NeMo microservices are for.


Now, the thing that's emerging here is this: there are three elements, three pillars, of what we're doing. The first pillar is, of course, inventing the technology for AI models, running AI models, and packaging them up for you. The second is creating tools to help you modify them. First is having the AI technology, second is helping you modify it, and third is infrastructure for you to fine-tune it and, if you like, deploy it. You could deploy it on our infrastructure called DGX Cloud, or you can deploy it on-prem — you can deploy it anywhere you like. Once you develop it, it's yours to take anywhere. And so we are effectively an AI foundry. We will do for you and the industry, on AI, what TSMC does for us building chips: we go to TSMC with our big ideas, they manufacture them, and we take them with us. Exactly the same thing here: the AI Foundry, and its three pillars are the NIMs, NeMo microservices, and DGX Cloud.


The other thing that you could teach the NIM to do is to understand your proprietary information. Remember, inside our company, the vast majority of our data is not in the cloud; it's inside our company. It's been sitting there, being used all the time, and, gosh, it's basically NVIDIA's intelligence. We would like to take that data, learn its meaning — just like we learned the meaning of almost anything else we just talked about — and then reindex that knowledge into a new type of database called a vector database. You essentially take structured or unstructured data, you learn its meaning, you encode its meaning. Now this becomes an AI database. And that AI database, in the future, once you create it, you can talk to it.


So let me give you an example of what you could do. Suppose you've got a whole bunch of multi-modality data, and one good example of that is PDFs. So you take the PDFs — all your PDFs, the stuff that is proprietary to you, critical to your company. You can encode them, just as we encoded the pixels of a cat and it became the word "cat." We can encode all of your PDFs, and they turn into vectors that are now stored inside your vector database. It becomes the proprietary information of your company. And once you have that proprietary information, you can chat with it.


It's a smart database. And so you just chat with data. And how much more enjoyable is that?
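
Here is a self-contained toy sketch of that idea: documents are turned into vectors, stored, and the closest ones are retrieved for a question. The "embedding" is a deliberately naive hashed bag-of-words so the example runs with nothing but NumPy; a real system would use a learned embedding model and a proper vector database.

```python
import numpy as np

DIM = 256

def embed(text):
    """Naive hashed bag-of-words embedding (toy stand-in for a real model)."""
    v = np.zeros(DIM)
    for word in text.lower().split():
        v[hash(word) % DIM] += 1.0
    n = np.linalg.norm(v)
    return v / n if n else v

docs = [
    "Blackwell GPUs connect over NVLink into one giant GPU.",
    "The warehouse digital twin reroutes AMRs around obstacles.",
    "BioNeMo NIMs generate and screen candidate drug molecules.",
]
index = np.stack([embed(d) for d in docs])   # the "vector database"

query = "how are GPUs connected together?"
scores = index @ embed(query)                # cosine similarity (unit vectors)
best = docs[int(np.argmax(scores))]
print("Most relevant chunk:", best)
```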


You know, for our software team, they just chat with the bugs database: how many bugs were there last night? Are we making any progress? And then, after you're done talking to this bugs database, you need therapy. And so we have another chatbot for you.


Okay, so we call this NeMo Retriever. And the reason for that is because ultimately its job is to go retrieve information as quickly as possible. You just talk to it: hey, retrieve this information. It goes off and brings it back to you. Do you mean this? You go, yeah, perfect. Okay, so we call it NeMo Retriever. Well, the NeMo service helps you create all these things.


And we have all these different NIMs. We even have NIMs of digital humans.


"I'm Rachel, your AI care manager." Okay, so it's a really short clip — there were so many videos, so many other demos, to show you that I had to cut this one short. But this is Diana. She is a digital human NIM, and you just talk to her, and she's connected in this case to Hippocratic AI's large language model for healthcare, and it's truly amazing. She is just super smart about healthcare things, you know? And so after my Dwight, my VP of software engineering, talks to the chatbot for the bugs database, then you come over here and talk to Diana. She's completely animated with AI, and she's a digital human.


There are so many companies that would like to build this. They're sitting on gold mines. The enterprise IT industry is sitting on a gold mine because they have so much understanding of the way work is done. They have all these amazing tools that have been created over the years, and they're sitting on a lot of data. If they could take that gold mine and turn it into copilots, these copilots could help us do things. And so just about every IT franchise, every IT platform in the world, that has valuable tools that people use is sitting on a gold mine for copilots. And they would like to build their own copilots and their own chatbots. And so we're announcing that NVIDIA AI Foundry is working with some of the world's great companies.


SAP generates 87% of the world's global commerce. Basically, the world runs on SAP. We run on SAP.


NVIDIA and SAP are building SAP Joule copilots using NVIDIA NeMo and DGX Cloud. ServiceNow — 85% of the world's Fortune 500 companies run their people and customer service operations on ServiceNow — is using NVIDIA AI Foundry to build ServiceNow Assist virtual assistants. Cohesity backs up the world's data; they're sitting on a gold mine of data — hundreds of exabytes, over 10,000 companies — and NVIDIA AI Foundry is working with them, helping them build their Gaia generative AI agent. Snowflake is a company that stores the world's digital warehouse in the cloud and serves over three billion queries a day for 10,000 enterprise customers; Snowflake is working with NVIDIA AI Foundry to build copilots with NVIDIA NeMo and NIMs. NetApp — nearly half of the files in the world are stored on-prem on NetApp — NVIDIA AI Foundry is helping them build chatbots and copilots, like those vector databases and retrievers, with NVIDIA NeMo and NIMs. And we have a great partnership with Dell. Everybody who is building these chatbots and generative AI, when you're ready to run it, you're going to need an AI factory, and nobody is better at building end-to-end systems of very large scale for the enterprise than Dell. And so any company, every company, will need to build AI factories.


And it turns out that Michael is here. He's happy to take your order.


Ladies and gentlemen, Michael Dell. Okay, let's talk about the next wave of robotics — the next wave of AI: robotics, physical AI. So far, all of the AI that we've talked about is one computer: data comes into one computer, lots of the world's experience in digital text form, and the AI imitates us by reading a lot of language to predict the next word. It's imitating you by studying all of the patterns and all the previous examples. Of course, it has to understand context and so on, but once it understands the context, it's essentially imitating you. We take all of the data, we put it into a system like DGX, we compress it into a large language model: trillions and trillions of tokens become billions and billions of parameters, and those billions of parameters become your AI. Well, in order for us to go to the next wave of AI, where the AI understands the physical world, we're going to need three computers. The first computer is still the same computer — it's the AI computer — and now it's going to be watching video, and maybe it's doing synthetic data generation, and maybe there are a lot of human examples. Just as we have human examples in text form, we're going to have human examples in articulation form, and the AIs will watch us, understand what is happening, and try to adapt it for themselves into the context. And because they can generalize with these foundation models, maybe these robots can also perform in the physical world fairly generally. So I just described, in very simple terms, essentially what just happened in large language models — except the ChatGPT moment for robotics may be right around the corner. And so we've been building the end-to-end systems for robotics for some time. I'm super, super proud of the work.


We have the AI system, DGX. We have the lower system, which is called AGX, for autonomous systems — the world's first robotics processor. When we first built this thing, people asked, what are you guys building? It's an SoC, it's one chip, designed to be very low power, but it's designed for high-speed sensor processing and AI. And so if you want to run transformers in a car, or in anything that moves, we have the perfect computer for you; it's called Jetson. So the DGX on top is training the AI, the Jetson is the autonomous processor, and in the middle we need another computer. Whereas large language models have the benefit of you providing your examples and then doing reinforcement learning from human feedback,


what is the reinforcement learning from human feedback for a robot? Well, it's reinforcement learning from physical feedback. That's how you align the robot; that's how the robot knows that, as it's learning these articulation and manipulation capabilities, it's adapting properly to the laws of physics. And so we need a simulation engine that represents the world digitally for the robot, so that the robot has a gym in which to learn how to be a robot. We call that virtual world Omniverse, and the computer that runs Omniverse is called OVX. And OVX, the computer itself, is hosted in the Azure cloud, okay? So basically we built these three things — these three systems — and on top of them, we have algorithms for every single one.


Now I'm going to show you one super example of how AI and Omniverse are going to work together.


The example I'm going to show you is kind of insane, but it's going to be very, very close to tomorrow. It's a robotics building. This robotics building is called a warehouse. Inside the robotics building there are going to be some autonomous systems. Some of the autonomous systems are going to be called humans, and some of the autonomous systems are going to be called forklifts. These autonomous systems will interact with each other, of course, autonomously, and they'll be watched over by this warehouse to keep everybody out of harm's way. The warehouse is essentially an air traffic controller: whenever it sees something happening, it redirects traffic and gives new waypoints to the robots and the people, and they'll know exactly what to do.


This warehouse — this building — you can also talk to. Of course you could talk to it: hey, you know, SAP Center, how are you feeling today, for example? You could ask the warehouse the same kinds of questions. Basically, the system I just described will have Omniverse Cloud hosting the virtual simulation and AI running on DGX Cloud, and all of this running in real time.


Let's take a look. The future of heavy industries starts as a digital twin. The AI agents helping robots, workers, and infrastructure navigate unpredictable events in complex industrial spaces will be built and evaluated first in sophisticated digital twins.


This Omniverse digital twin of a 100,000-square-foot warehouse is operating as a simulation environment that integrates digital workers, AMRs running the NVIDIA Isaac Perceptor stack, centralized activity maps of the entire warehouse from 100 simulated ceiling-mounted cameras using NVIDIA Metropolis, and AMR route planning with NVIDIA cuOpt. Software-in-loop testing of AI agents in this physically accurate simulated environment enables us to evaluate and refine how the system adapts to real-world unpredictability. Here, an incident occurs along this AMR's planned route, blocking its path as it moves to pick up a pallet. NVIDIA Metropolis updates and sends a real-time occupancy map to cuOpt, where a new optimal route is calculated. The AMR is enabled to see around corners and improve its mission efficiency. With generative-AI-powered Metropolis vision foundation models, operators can even ask questions using natural language. The visual model understands nuanced activity and can offer immediate insights to improve operations.


All of the sensor data is created in simulation and passed to the real-time AI running as NVIDIA Inference Microservices, or NIMs. And when the AI is ready to be deployed in the physical twin — the real warehouse — we connect Metropolis and Isaac NIMs to real sensors, with the ability for continuous improvement of both the digital twin and the AI models.


Is that incredible? So remember: a future facility — warehouse, factory, building — will be software-defined. And so the software is running. How else would you test the software? You test the software by building the warehouse, the optimization system, in the digital twin. What about all the robots? All of those robots you were seeing just now are running their own autonomous robotics stack. And so the way you integrate software in the future — CI/CD in the future for robotic systems — is with digital twins.


We've made Omniverse a lot easier to access. We're going to create, basically, Omniverse Cloud APIs — simple APIs and a channel — and you can connect your application to them. This is going to be as wonderfully, beautifully simple as Omniverse is going to be in the future. And with these APIs, you're going to have this magical digital twin capability.


We've also turned Omniverse into an AI and integrated it with the ability to chat in USD. Our language is, you know, human, and Omniverse's language, as it turns out, is Universal Scene Description — and that language is rather complex. So we've taught Omniverse that language. You can speak to it in English, and it will directly generate USD; it will talk back in USD, but converse back to you in English.
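
To give a sense of what "generating USD" means, here is a minimal sketch that builds a tiny USD scene in Python with the usd-core (`pxr`) library. The scene contents are made up; a chat interface like the one described would emit something similar from an English prompt.

```python
# pip install usd-core  -- provides the `pxr` Python bindings
from pxr import Usd, UsdGeom, Gf

# A tiny scene like one that might be generated from the prompt:
#   "put a one-meter crate on the warehouse floor"
stage = Usd.Stage.CreateInMemory()
world = UsdGeom.Xform.Define(stage, "/World")
stage.SetDefaultPrim(world.GetPrim())

crate = UsdGeom.Cube.Define(stage, "/World/Crate")
crate.GetSizeAttr().Set(1.0)  # 1 unit on a side
UsdGeom.XformCommonAPI(crate.GetPrim()).SetTranslate(Gf.Vec3d(0.0, 0.5, 0.0))

print(stage.GetRootLayer().ExportToString())  # the generated USD text
```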


You could also look for information in this world semantically. Instead of the world being encoded semantically in language, now it's encoded semantically in scenes. So you could ask it about certain objects or certain conditions or certain scenarios, and it can go and find that scenario for you. It can also collaborate with you in generation: you could design some things in 3D, it could simulate some things in 3D, or you could use AI to generate something in 3D. Let's take a look at how this is all going to work.


We have a great partnership with Siemens. Siemens is the world's largest industrial engineering and operations platform. You've seen now so many different companies in the industrial space. Heavy industries is one of the greatest final frontiers of IT, and we finally now have the necessary technology to go and make a real impact. Siemens is building the industrial metaverse, and today we're announcing that Siemens is connecting their crown jewel, Xcelerator, to NVIDIA Omniverse. Let's take a look.


Siemens technology transforms the everyday for everyone. Teamcenter X, our leading product lifecycle management software from the Siemens Xcelerator platform, is used every day by our customers to develop and deliver products at scale.


Now we are bringing the real and digital worlds even closer by integrating NVIDIA AI and Omniverse technologies into Teamcenter X. Omniverse APIs enable data interoperability and physics-based rendering for industrial-scale design and manufacturing projects. Our customer HD Hyundai, a market leader in sustainable ship manufacturing, builds ammonia- and hydrogen-powered ships, often comprising over seven million discrete parts. Omniverse APIs let Teamcenter X and companies like HD Hyundai unify and visualize these massive engineering data sets interactively, and integrate generative AI to generate 3D objects or HDRI backgrounds to see their projects in context. The result: an ultra-intuitive, photoreal, physics-based digital twin that eliminates waste and errors, delivering huge savings in cost and time. And we are building this for collaboration — whether across more Siemens Xcelerator tools like Siemens NX or STAR-CCM+, or across teams working on their favorite devices in the same scene together.


And this is just the beginning. Working with NVIDIA, we will bring accelerated computing, generative AI, and Omniverse integration across the Siemens Xcelerator portfolio.


The professional voice actor happens to be a good friend of mine, Roland Busch, who happens to be the CEO of Siemens.


Once you get Omniverse connected into your workflow, your ecosystem — from the beginning of your design to engineering, to manufacturing planning, all the way to digital twin operations — once you connect everything together, it's insane how much productivity you can get. And it's just really, really wonderful. All of a sudden, everybody's operating on the same ground truth. You don't have to exchange data and convert data and make mistakes; everybody is working on the same ground truth, from the design department to the art department, the architecture department, all the way to engineering and even the marketing department. Let's take a look at how Nissan has integrated Omniverse into their workflow — and it's all because it's connected by all these wonderful tools and these developers that we're working with. Take a look.


[Video: Nissan Omniverse demo plays, with a non-English voiceover.]


That was not an animation. That was Omniverse. Today we're announcing that Omniverse Cloud streams to the Vision Pro.


It is very, very strange — you walk around virtual doors when you're getting out of that car, and everybody does it. It is really quite amazing. Vision Pro connected to Omniverse portals you into Omniverse, and because all of these CAD tools and all these different design tools are now integrated and connected to Omniverse, you can have this type of workflow. Really incredible.


Let's talk about robotics. Everything that moves will be robotic — there's no question about that. It's safer, it's more convenient, and one of the largest industries is going to be automotive. We build the robotics stack from top to bottom, as I mentioned, from the computer system up, and in the case of self-driving cars, including the self-driving car application. At the end of this year, or I guess the beginning of next year, we will be shipping in Mercedes, and then shortly after that, JLR. These autonomous robotic systems are software-defined. They take a lot of work to build: computer vision, obviously, artificial intelligence, control and planning — all kinds of very complicated technology that takes years to refine.


We're building the entire stack. However, we open up our entire stack for all of the automotive industry. This is just the way we work — the way we work in every single industry. We try to build as much of it as we can so that we understand it, but then we open it up so everybody can access it.


Whether you would like to buy just our computer — which is the world's only fully functional, functionally safe, ASIL-D system that can run AI — or the operating system on top, or, of course, our data centers, which are in basically every AV company in the world: however you would like to enjoy it, we're delighted.


Today, we're announcing that BYD, the world's largest EV company, is adopting our next-generation computer. It's called Thor. Thor is designed for transformer engines. Thor, our next-generation AV computer, will be used by BYD.


You probably don't know this fact: we have over a million robotics developers. We created Jetson, this robotics computer. We're so proud of it. The amount of software that goes on top of it is insane. But the reason we can do it at all is because it's 100% CUDA-compatible with everything we do. Everything that we do in our company is in service of our developers. And by being able to maintain this rich ecosystem and keep it compatible with everything that you access from us, we can bring all of that incredible capability to this little tiny computer we call Jetson, a robotics computer.


We're also announcing today an incredibly advanced new SDK. We call it Isaac Perceptor. Most of the robots today are pre-programmed.


They're either following rails on the ground — digital rails — or following AprilTags. But in the future, they're going to have perception. And the reason you want that is so that you can easily program them: you say, would you like to go from point A to point B, and it will figure out a way to navigate its way there. So by only programming waypoints, the entire route can be adaptive; the entire environment can be re-programmed, just as I showed you at the very beginning with the warehouse. You can't do that with pre-programmed AGVs: if those boxes fall down, they all just gum up and wait there for somebody to come clear it.
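
A toy sketch of the difference: instead of a fixed, pre-programmed path, the robot is given only a goal waypoint and re-plans around whatever it perceives. This grid-world breadth-first search is purely illustrative; a real stack uses perception, costmaps, and far more capable planners.

```python
from collections import deque

def plan(grid, start, goal):
    """Breadth-first search over a grid; 1 = obstacle, 0 = free."""
    rows, cols = len(grid), len(grid[0])
    frontier, came_from = deque([start]), {start: None}
    while frontier:
        cur = frontier.popleft()
        if cur == goal:
            path = []
            while cur is not None:
                path.append(cur)
                cur = came_from[cur]
            return path[::-1]
        r, c = cur
        for nr, nc in ((r + 1, c), (r - 1, c), (r, c + 1), (r, c - 1)):
            if 0 <= nr < rows and 0 <= nc < cols and grid[nr][nc] == 0 \
                    and (nr, nc) not in came_from:
                came_from[(nr, nc)] = cur
                frontier.append((nr, nc))
    return None  # goal unreachable

warehouse = [[0, 0, 0, 0],
             [0, 1, 1, 0],   # a fallen box blocks these cells
             [0, 0, 0, 0]]
print(plan(warehouse, (0, 0), (2, 3)))  # the route adapts around the obstacle
```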


And so now with Isaac Perceptor, we have incredible, state-of-the-art visual odometry, 3D reconstruction and, in addition to 3D reconstruction, depth perception. The reason for that is so that you can have two modalities keeping an eye on what's happening in the world. That's Isaac Perceptor. The most used robot today is the manipulator — manufacturing arms — and they are also pre-programmed.


The computer vision algorithms, the AI algorithms, the control and path-planning algorithms that are geometry-aware are incredibly computationally intensive. We have made these CUDA-accelerated, so we have the world's first CUDA-accelerated, geometry-aware motion planner. You put something in front of it, it comes up with a new plan and articulates around it. It has excellent perception for pose estimation of a 3D object — not just its pose in 2D, but its pose in 3D — so it knows what's around it and how best to grab it. The foundation pose model, the grasp foundation model, and the articulation algorithms are now available. We call it Isaac Manipulator, and it also just runs on NVIDIA's computers.


We are starting to do some really great work on the next generation of robotics. The next generation of robotics will likely be humanoid robotics.


We now have the necessary technology — as I was describing earlier — to imagine generalized humanoid robotics. In a way, humanoid robotics is likely easier, and the reason is that we have a lot more imitation training data that we can provide the robots, because we are constructed in a very similar way. It is very likely that humanoid robots will be much more useful in our world, because we created the world to be something we can interoperate with and work well in: the way we set up our workstations, manufacturing, and logistics — they were designed for humans, designed for people. So these humanoid robots will likely be much more productive to deploy. And, just like we're doing with the others, we're creating the entire stack, starting from the top: a foundation model that learns from watching video of human examples.


It could be in video form, it could be in virtual reality form. We then created a gym for it called Isaac Reinforcement Learning Gym, which allows the humanoid robot to learn how to adapt to the physical world. And then an incredible computer — the same computer that's going to go into a robotic car — will run inside the humanoid robot. It's called Thor, and it's designed for transformer engines. We've combined several of these into one video.


This is something that you're going to really love. Take a look. It's not enough for humans to imagine.


We have to invent. And explore, and push beyond what's been done.


We create smarter and faster. We push it to fail so it can learn. We teach it, then help it teach itself. We broaden its understanding to take on new challenges with absolute precision. And succeed. We make it perceive and move, and even reason, so it can share our world with us.


This is where inspiration leads us: the next frontier. This is NVIDIA Project GR00T, a general-purpose foundation model for humanoid robot learning. The GR00T model takes multimodal instructions and past interactions as input and produces the next action for the robot to execute. We developed Isaac Lab, a robot learning application, to train GR00T on Omniverse Isaac Sim, and we scale out with OSMO, a new compute orchestration service that coordinates workflows across DGX systems for training and OVX systems for simulation. With these tools, we can train GR00T in physically based simulation and transfer it zero-shot to the real world.


The GR00T model will enable a robot to learn from a handful of human demonstrations so it can help with everyday tasks, and emulate human movement just by observing us. This is made possible with NVIDIA's technologies that can understand humans from videos, train models in simulation, and ultimately deploy them directly to physical robots. Connecting GR00T to a large language model even allows it to generate motions by following natural language instructions.


Hi there — give me a high five. Let's high five. Can you give us some cool moves? Check this out. All this incredible intelligence is powered by the new Jetson Thor robotics chips, designed for GR00T. Built for the future, with Isaac Lab, OSMO, and GR00T, we're providing the building blocks for the next generation of AI-powered robotics.


About the same size.


The soul of NVIDIA — the intersection of computer graphics, physics, and artificial intelligence — it all came to bear at this moment. The name of that project? General Robotics 003. I know, super good. Super good. Well, I think we have some special guests.


Hey, guys. So I understand you guys are powered by Jetson — they're powered by Jetsons, little Jetson robotics computers inside. They learned to walk in Isaac Sim. Ladies and gentlemen, this is Orange, and this is the famous Green. They are the BDX robots of Disney — amazing, amazing Disney Research. Come on, you guys, let's wrap up. Let's go. Five things. Where are you going? I sit right here. Don't be afraid. Come here, Green. Hurry up. What are you saying? No, it's not time to eat. I'll give you a snack in a moment. Let me finish up real quick. Come on, Green, hurry up, stop wasting time. Five things, five things.


First, a new industrial revolution. Every data center should be accelerated; a trillion dollars' worth of installed data centers will be modernized over the next several years. And because of the computational capability we brought to bear, a new way of doing software has emerged: generative AI, which is going to create new infrastructure dedicated to doing one thing and one thing only — not multi-user data centers, but AI generators. These AI generators will create incredibly valuable software — a new industrial revolution.


Second, the computer of this revolution — the computer of this generation, generative AI, trillion parameters — is Blackwell: insane amounts of computing. Third — I'm trying to concentrate. Good job.


Third, new computers create new types of software, and new types of software should be distributed in a new way, so that it can, on the one hand, be an endpoint in the cloud and easy to use, but still allow you to take it with you, because it is your intelligence; your intelligence should be packaged up in a way that allows you to take it with you. We call them NIMs. Fourth, these NIMs are going to help you create a new type of application for the future — not one that you wrote completely from scratch, but one where you integrate them, like teams, to create these applications. We have a fantastic capability between NIMs, the AI technology; the tools, NeMo; and the infrastructure, DGX Cloud, in our AI Foundry, to help you create proprietary applications, proprietary chatbots. And then lastly, everything that moves in the future will be robotic.


You're not going to be the only one. And these robotic systems — whether they are humanoids, AMRs, self-driving cars, forklifts, manipulating arms — they will all need one thing. Giant stadiums, warehouses, factories: there will be factories that are robotic, orchestrating factories; manufacturing lines that are robotic, building cars that are robotic. These systems all need one thing: they need a platform, a digital platform, a digital twin platform, and we call that Omniverse, the operating system of the robotics world.


These are the five things that we talked about today. What does NVIDIA look like? When people ask me about GPUs, there's a very different image that I have. First, I see a bunch of software stacks and things like that. And second, I see this. This is what we announced to you today: this is Blackwell, this is the platform.


Amazing processors, NVLink switches, networking systems — and the system design is a miracle. This is Blackwell, and this, to me, is what a GPU looks like in my mind.


Listen, Orange, Green — I think we have one more treat for everybody. What do you think? Should we? Okay, we have one more thing to show you. Roll it.


Thank you. Thank you. Have a great GTC. Thank you all for coming. Thank you.

