Andrej Karpathy | Let's build GPT: from scratch, in code, spelled out


Blog

 

Andrej Karpathy is the former head of AI at Tesla.

 

Jan 17, 2023

 

Chapters:

00:00:00 intro: ChatGPT, Transformers, nanoGPT, Shakespeare baseline language modeling, code setup

00:07:52 reading and exploring the data

00:09:28 tokenization, train/val split (sketched in code after this list)

00:14:27 data loader: batches of chunks of data (see the get_batch sketch below)

00:22:11 simplest baseline: bigram language model, loss, generation (see the bigram sketch below)

00:34:53 training the bigram model

00:38:00 port our code to a script

Building the "self-attention"

00:42:13 version 1: averaging past context with for loops, the weakest form of aggregation

00:47:11 the trick in self-attention: matrix multiply as weighted aggregation (see the masked-softmax sketch below)

00:51:54 version 2: using matrix multiply

00:54:42 version 3: adding softmax

00:58:26 minor code cleanup

01:00:18 positional encoding

01:02:00 THE CRUX OF THE VIDEO: version 4: self-attention (see the single-head sketch below)

01:11:38 note 1: attention as communication

01:12:46 note 2: attention has no notion of space, operates over sets

01:13:40 note 3: there is no communication across batch dimension

01:14:14 note 4: encoder blocks vs. decoder blocks

01:15:39 note 5: attention vs. self-attention vs. cross-attention

01:16:56 note 6: "scaled" self-attention. why divide by sqrt(head_size) (the scaling appears in the single-head sketch below)

Building the Transformer

01:19:11 inserting a single self-attention block to our network

01:21:59 multi-headed self-attention (see the Transformer block sketch below)

01:24:25 feedforward layers of transformer block

01:26:48 residual connections

01:32:51 layernorm (and its relationship to our previous batchnorm)

01:37:49 scaling up the model! creating a few variables. adding dropout

Notes on Transformer

01:42:39 encoder vs. decoder vs. both (?) Transformers

01:46:22 super quick walkthrough of nanoGPT, batched multi-headed self-attention

01:48:53 back to ChatGPT, GPT-3, pretraining vs. finetuning, RLHF

01:54:32 conclusions
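
The sketches below reconstruct several of the steps above in PyTorch. They are minimal sketches in the spirit of the video's code, not verbatim excerpts; variable names (stoi, itos, block_size, n_embd, etc.) follow the video's conventions but may differ in detail. First, the character-level tokenization and train/val split (00:09:28):

    # Read the tiny-shakespeare text and build a character-level vocabulary.
    with open('input.txt', 'r', encoding='utf-8') as f:
        text = f.read()

    chars = sorted(set(text))                       # all distinct characters
    vocab_size = len(chars)
    stoi = {ch: i for i, ch in enumerate(chars)}    # char -> integer
    itos = {i: ch for i, ch in enumerate(chars)}    # integer -> char
    encode = lambda s: [stoi[c] for c in s]         # string -> list of ints
    decode = lambda l: ''.join(itos[i] for i in l)  # list of ints -> string

    import torch
    data = torch.tensor(encode(text), dtype=torch.long)
    n = int(0.9 * len(data))                        # first 90% trains, rest validates
    train_data, val_data = data[:n], data[n:]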
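
The data loader (00:14:27) samples random chunks of the data; the targets are simply the inputs shifted one character to the right. This sketch builds on train_data and val_data from above:

    import torch

    batch_size = 4   # independent sequences processed in parallel
    block_size = 8   # maximum context length for predictions

    def get_batch(split):
        # Pick batch_size random offsets and stack the chunks into a batch.
        data = train_data if split == 'train' else val_data
        ix = torch.randint(len(data) - block_size, (batch_size,))
        x = torch.stack([data[i:i+block_size] for i in ix])
        y = torch.stack([data[i+1:i+block_size+1] for i in ix])  # next-char targets
        return x, y

    xb, yb = get_batch('train')   # both (batch_size, block_size) long tensors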
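
The bigram baseline (00:22:11) is a single embedding table: each token's row is read off directly as the logits for the next token. A sketch of the model, loss, and generation loop:

    import torch
    import torch.nn as nn
    from torch.nn import functional as F

    class BigramLanguageModel(nn.Module):
        def __init__(self, vocab_size):
            super().__init__()
            # Each token directly reads off the logits for the next token.
            self.token_embedding_table = nn.Embedding(vocab_size, vocab_size)

        def forward(self, idx, targets=None):
            logits = self.token_embedding_table(idx)   # (B, T, vocab_size)
            if targets is None:
                return logits, None
            B, T, C = logits.shape
            loss = F.cross_entropy(logits.view(B*T, C), targets.view(B*T))
            return logits, loss

        def generate(self, idx, max_new_tokens):
            for _ in range(max_new_tokens):
                logits, _ = self(idx)
                probs = F.softmax(logits[:, -1, :], dim=-1)        # last time step
                idx_next = torch.multinomial(probs, num_samples=1) # sample one token
                idx = torch.cat((idx, idx_next), dim=1)            # append and repeat
            return idx

    m = BigramLanguageModel(vocab_size)
    logits, loss = m(xb, yb)   # xb, yb from get_batch above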
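
The mathematical trick (00:47:11 through 00:54:42): multiplying by a lower-triangular, row-normalized matrix averages every position over its past in one matmul, and writing the mask as -inf before a softmax (version 3) is what later allows the weights to become data-dependent:

    import torch
    import torch.nn.functional as F

    T = 8                                   # toy sequence length
    tril = torch.tril(torch.ones(T, T))     # lower-triangular: no looking ahead
    wei = torch.zeros(T, T)
    wei = wei.masked_fill(tril == 0, float('-inf'))  # block future positions
    wei = F.softmax(wei, dim=-1)            # each row sums to 1: uniform averaging

    x = torch.randn(4, T, 2)                # (B, T, C) toy activations
    out = wei @ x                           # (T,T) @ (B,T,C) -> (B,T,C), batched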
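
The crux (01:02:00), with the scaling from note 6 folded in: a single causal self-attention head. Queries and keys produce data-dependent affinities, scaled by 1/sqrt(head_size) so the softmax inputs keep roughly unit variance; the softmaxed weights then aggregate the values. A sketch (dropout omitted):

    import torch
    import torch.nn as nn
    from torch.nn import functional as F

    class Head(nn.Module):
        """One head of causal (decoder-style) self-attention."""
        def __init__(self, n_embd, head_size, block_size):
            super().__init__()
            self.key = nn.Linear(n_embd, head_size, bias=False)
            self.query = nn.Linear(n_embd, head_size, bias=False)
            self.value = nn.Linear(n_embd, head_size, bias=False)
            self.register_buffer('tril', torch.tril(torch.ones(block_size, block_size)))

        def forward(self, x):
            B, T, C = x.shape
            k = self.key(x)    # (B, T, head_size)
            q = self.query(x)  # (B, T, head_size)
            # Scaled dot-product affinities; 1/sqrt(head_size) keeps variance ~1.
            wei = q @ k.transpose(-2, -1) * k.shape[-1]**-0.5   # (B, T, T)
            wei = wei.masked_fill(self.tril[:T, :T] == 0, float('-inf'))
            wei = F.softmax(wei, dim=-1)
            return wei @ self.value(x)   # (B, T, head_size)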
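
Finally, the pieces from 01:21:59 through 01:32:51 assembled into a Transformer block: multi-headed self-attention, a per-token feedforward layer, residual connections, and pre-norm layernorm. This reuses Head from the previous sketch; dropout is again omitted for brevity:

    import torch
    import torch.nn as nn

    class MultiHeadAttention(nn.Module):
        """Several attention heads in parallel, concatenated then projected."""
        def __init__(self, num_heads, n_embd, head_size, block_size):
            super().__init__()
            self.heads = nn.ModuleList(
                [Head(n_embd, head_size, block_size) for _ in range(num_heads)])
            self.proj = nn.Linear(num_heads * head_size, n_embd)

        def forward(self, x):
            return self.proj(torch.cat([h(x) for h in self.heads], dim=-1))

    class FeedForward(nn.Module):
        """Per-token MLP with the 4x inner expansion from the paper."""
        def __init__(self, n_embd):
            super().__init__()
            self.net = nn.Sequential(
                nn.Linear(n_embd, 4 * n_embd), nn.ReLU(),
                nn.Linear(4 * n_embd, n_embd))

        def forward(self, x):
            return self.net(x)

    class Block(nn.Module):
        """Transformer block: communication (attention) then computation (MLP)."""
        def __init__(self, n_embd, n_head, block_size):
            super().__init__()
            head_size = n_embd // n_head
            self.sa = MultiHeadAttention(n_head, n_embd, head_size, block_size)
            self.ffwd = FeedForward(n_embd)
            self.ln1 = nn.LayerNorm(n_embd)
            self.ln2 = nn.LayerNorm(n_embd)

        def forward(self, x):
            # Pre-norm residual connections: x + sublayer(norm(x)).
            x = x + self.sa(self.ln1(x))
            x = x + self.ffwd(self.ln2(x))
            return x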

Corrections:

00:57:00 Oops "tokens from the future cannot communicate", not "past". Sorry! :)


 

Source: 文学城-蓝调