Andrej Karpathy | Let's build GPT: from scratch, in code, spelled out


Blog

 

Andrej Karpathy is the former head of AI at Tesla.

 

Jan 17, 2023

 

Chapters:

00:00:00 intro: ChatGPT, Transformers, nanoGPT, Shakespeare baseline language modeling, code setup

00:07:52 reading and exploring the data

00:09:28 tokenization, train/val split (sketched in code after this list)

00:14:27 data loader: batches of chunks of data (see the get_batch sketch below)

00:22:11 simplest baseline: bigram language model, loss, generation (see the bigram sketch below)

00:34:53 training the bigram model

00:38:00 port our code to a script

Building the "self-attention"

00:42:13 version 1: averaging past context with for loops, the weakest form of aggregation

00:47:11 the trick in self-attention: matrix multiply as weighted aggregation (see the masked-softmax sketch below)

00:51:54 version 2: using matrix multiply

00:54:42 version 3: adding softmax

00:58:26 minor code cleanup

01:00:18 positional encoding

01:02:00 THE CRUX OF THE VIDEO: version 4: self-attention (see the single-head sketch below)

01:11:38 note 1: attention as communication

01:12:46 note 2: attention has no notion of space, operates over sets

01:13:40 note 3: there is no communication across batch dimension

01:14:14 note 4: encoder blocks vs. decoder blocks

01:15:39 note 5: attention vs. self-attention vs. cross-attention

01:16:56 note 6: "scaled" self-attention. why divide by sqrt(head_size) (the scaling appears in the single-head sketch below)

Building the Transformer

01:19:11 inserting a single self-attention block to our network

01:21:59 multi-headed self-attention (see the Transformer block sketch below)

01:24:25 feedforward layers of transformer block

01:26:48 residual connections

01:32:51 layernorm (and its relationship to our previous batchnorm)

01:37:49 scaling up the model! creating a few variables. adding dropout

Notes on Transformer

01:42:39 encoder vs. decoder vs. both (?) Transformers

01:46:22 super quick walkthrough of nanoGPT, batched multi-headed self-attention

01:48:53 back to ChatGPT, GPT-3, pretraining vs. finetuning, RLHF

01:54:32 conclusions
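
The sketches below reconstruct several of the steps above in PyTorch. They are minimal sketches in the spirit of the video's code, not verbatim excerpts; variable names (stoi, itos, block_size, n_embd, etc.) follow the video's conventions but may differ in detail. First, the character-level tokenization and train/val split (00:09:28):

    # Read the tiny-shakespeare text and build a character-level vocabulary.
    with open('input.txt', 'r', encoding='utf-8') as f:
        text = f.read()

    chars = sorted(set(text))                       # all distinct characters
    vocab_size = len(chars)
    stoi = {ch: i for i, ch in enumerate(chars)}    # char -> integer
    itos = {i: ch for i, ch in enumerate(chars)}    # integer -> char
    encode = lambda s: [stoi[c] for c in s]         # string -> list of ints
    decode = lambda l: ''.join(itos[i] for i in l)  # list of ints -> string

    import torch
    data = torch.tensor(encode(text), dtype=torch.long)
    n = int(0.9 * len(data))                        # first 90% trains, rest validates
    train_data, val_data = data[:n], data[n:]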
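
The data loader (00:14:27) samples random chunks of the data; the targets are simply the inputs shifted one character to the right. This sketch builds on train_data and val_data from above:

    import torch

    batch_size = 4   # independent sequences processed in parallel
    block_size = 8   # maximum context length for predictions

    def get_batch(split):
        # Pick batch_size random offsets and stack the chunks into a batch.
        data = train_data if split == 'train' else val_data
        ix = torch.randint(len(data) - block_size, (batch_size,))
        x = torch.stack([data[i:i+block_size] for i in ix])
        y = torch.stack([data[i+1:i+block_size+1] for i in ix])  # next-char targets
        return x, y

    xb, yb = get_batch('train')   # both (batch_size, block_size) long tensors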
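
The bigram baseline (00:22:11) is a single embedding table: each token's row is read off directly as the logits for the next token. A sketch of the model, loss, and generation loop:

    import torch
    import torch.nn as nn
    from torch.nn import functional as F

    class BigramLanguageModel(nn.Module):
        def __init__(self, vocab_size):
            super().__init__()
            # Each token directly reads off the logits for the next token.
            self.token_embedding_table = nn.Embedding(vocab_size, vocab_size)

        def forward(self, idx, targets=None):
            logits = self.token_embedding_table(idx)   # (B, T, vocab_size)
            if targets is None:
                return logits, None
            B, T, C = logits.shape
            loss = F.cross_entropy(logits.view(B*T, C), targets.view(B*T))
            return logits, loss

        def generate(self, idx, max_new_tokens):
            for _ in range(max_new_tokens):
                logits, _ = self(idx)
                probs = F.softmax(logits[:, -1, :], dim=-1)        # last time step
                idx_next = torch.multinomial(probs, num_samples=1) # sample one token
                idx = torch.cat((idx, idx_next), dim=1)            # append and repeat
            return idx

    m = BigramLanguageModel(vocab_size)
    logits, loss = m(xb, yb)   # xb, yb from get_batch above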
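
The mathematical trick (00:47:11 through 00:54:42): multiplying by a lower-triangular, row-normalized matrix averages every position over its past in one matmul, and writing the mask as -inf before a softmax (version 3) is what later allows the weights to become data-dependent:

    import torch
    import torch.nn.functional as F

    T = 8                                   # toy sequence length
    tril = torch.tril(torch.ones(T, T))     # lower-triangular: no looking ahead
    wei = torch.zeros(T, T)
    wei = wei.masked_fill(tril == 0, float('-inf'))  # block future positions
    wei = F.softmax(wei, dim=-1)            # each row sums to 1: uniform averaging

    x = torch.randn(4, T, 2)                # (B, T, C) toy activations
    out = wei @ x                           # (T,T) @ (B,T,C) -> (B,T,C), batched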
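
The crux (01:02:00), with the scaling from note 6 folded in: a single causal self-attention head. Queries and keys produce data-dependent affinities, scaled by 1/sqrt(head_size) so the softmax inputs keep roughly unit variance; the softmaxed weights then aggregate the values. A sketch (dropout omitted):

    import torch
    import torch.nn as nn
    from torch.nn import functional as F

    class Head(nn.Module):
        """One head of causal (decoder-style) self-attention."""
        def __init__(self, n_embd, head_size, block_size):
            super().__init__()
            self.key = nn.Linear(n_embd, head_size, bias=False)
            self.query = nn.Linear(n_embd, head_size, bias=False)
            self.value = nn.Linear(n_embd, head_size, bias=False)
            self.register_buffer('tril', torch.tril(torch.ones(block_size, block_size)))

        def forward(self, x):
            B, T, C = x.shape
            k = self.key(x)    # (B, T, head_size)
            q = self.query(x)  # (B, T, head_size)
            # Scaled dot-product affinities; 1/sqrt(head_size) keeps variance ~1.
            wei = q @ k.transpose(-2, -1) * k.shape[-1]**-0.5   # (B, T, T)
            wei = wei.masked_fill(self.tril[:T, :T] == 0, float('-inf'))
            wei = F.softmax(wei, dim=-1)
            return wei @ self.value(x)   # (B, T, head_size)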
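
Finally, the pieces from 01:21:59 through 01:32:51 assembled into a Transformer block: multi-headed self-attention, a per-token feedforward layer, residual connections, and pre-norm layernorm. This reuses Head from the previous sketch; dropout is again omitted for brevity:

    import torch
    import torch.nn as nn

    class MultiHeadAttention(nn.Module):
        """Several attention heads in parallel, concatenated then projected."""
        def __init__(self, num_heads, n_embd, head_size, block_size):
            super().__init__()
            self.heads = nn.ModuleList(
                [Head(n_embd, head_size, block_size) for _ in range(num_heads)])
            self.proj = nn.Linear(num_heads * head_size, n_embd)

        def forward(self, x):
            return self.proj(torch.cat([h(x) for h in self.heads], dim=-1))

    class FeedForward(nn.Module):
        """Per-token MLP with the 4x inner expansion from the paper."""
        def __init__(self, n_embd):
            super().__init__()
            self.net = nn.Sequential(
                nn.Linear(n_embd, 4 * n_embd), nn.ReLU(),
                nn.Linear(4 * n_embd, n_embd))

        def forward(self, x):
            return self.net(x)

    class Block(nn.Module):
        """Transformer block: communication (attention) then computation (MLP)."""
        def __init__(self, n_embd, n_head, block_size):
            super().__init__()
            head_size = n_embd // n_head
            self.sa = MultiHeadAttention(n_head, n_embd, head_size, block_size)
            self.ffwd = FeedForward(n_embd)
            self.ln1 = nn.LayerNorm(n_embd)
            self.ln2 = nn.LayerNorm(n_embd)

        def forward(self, x):
            # Pre-norm residual connections: x + sublayer(norm(x)).
            x = x + self.sa(self.ln1(x))
            x = x + self.ffwd(self.ln2(x))
            return x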

Corrections:

00:57:00 Oops "tokens from the future cannot communicate", not "past". Sorry! :)


 

Source: 文学城-蓝调