Created
September 22, 2024 05:28
1
00:00:00,100 --> 00:00:02,533
Hey friends, this is Visionary 3D,
2
00:00:02,533 --> 00:00:03,700
and today we're going to
3
00:00:03,700 --> 00:00:06,766
learn how to use compute shaders in practice.
4
00:00:06,766 --> 00:00:11,200
Compute shaders in WebGPU are very appealing
5
00:00:11,200 --> 00:00:14,133
because they let us run large simulations on the GPU,
6
00:00:14,133 --> 00:00:19,000
in our case on the web platform.
7
00:00:19,133 --> 00:00:21,600
A compute shader is a small program
8
00:00:21,600 --> 00:00:23,900
that helps us run very fast
9
00:00:23,900 --> 00:00:25,966
parallel algorithms.
10
00:00:26,166 --> 00:00:28,100
We'll focus on the most important things
11
00:00:28,100 --> 00:00:29,466
you need
12
00:00:29,600 --> 00:00:32,166
to know to start writing your first
13
00:00:32,333 --> 00:00:34,300
practical compute shader.
14
00:00:34,366 --> 00:00:37,533
Practical means we won't build just a demo.
15
00:00:37,733 --> 00:00:42,500
Tools like this are cool because they solve problems.
16
00:00:42,566 --> 00:00:44,866
We'll use compute shaders to build
17
00:00:44,866 --> 00:00:46,966
a fast math library
18
00:00:46,966 --> 00:00:50,600
for matrices, and using this matrix library
19
00:00:50,666 --> 00:00:54,600
I'll train a very basic neural network
20
00:00:54,666 --> 00:00:56,000
in this episode.
21
00:00:56,133 --> 00:00:57,766
Don't worry too much about the math,
22
00:00:57,766 --> 00:01:00,066
because the math we cover today
23
00:01:00,066 --> 00:01:02,166
will be very simple.
24
00:01:02,166 --> 00:01:05,333
The neural network we'll build is meant to
25
00:01:05,500 --> 00:01:08,700
understand and recognize handwritten digits.
26
00:01:08,766 --> 00:01:12,466
The dataset we'll use is called the MNIST dataset.
27
00:01:12,466 --> 00:01:14,300
The first step is to load this
28
00:01:14,300 --> 00:01:16,166
dataset and send it to my
29
00:01:16,166 --> 00:01:18,300
full-screen fragment shader.
30
00:01:18,300 --> 00:01:19,100
With this code
31
00:01:19,100 --> 00:01:21,400
I compute an index that I'll
32
00:01:21,400 --> 00:01:22,866
use to read the color.
33
00:01:23,300 --> 00:01:24,566
If you want to learn more about
34
00:01:24,566 --> 00:01:27,466
fragment shaders and shader programming in general,
35
00:01:27,466 --> 00:01:29,100
you can check out my free
36
00:01:29,100 --> 00:01:30,766
shader programming crash course
37
00:01:30,866 --> 00:01:33,766
as well as my introduction to WebGPU course.
38
00:01:34,133 --> 00:01:35,566
I also offer one-on-one
39
00:01:35,566 --> 00:01:38,533
mentoring video calls on shader programming,
40
00:01:38,566 --> 00:01:40,200
3D web development,
41
00:01:40,300 --> 00:01:42,100
and threejs,
42
00:01:42,100 --> 00:01:45,733
so if you're interested, check the description below.
43
00:01:45,800 --> 00:01:48,133
You only need to know the basics of JavaScript,
44
00:01:48,133 --> 00:01:51,500
however, to follow along with this video.
45
00:01:51,733 --> 00:01:53,666
The setup I've shown you so far
46
00:01:53,666 --> 00:01:57,500
is only for visualizing the dataset we'll be working with
47
00:01:57,500 --> 00:02:00,166
and has nothing to do with compute shaders.
48
00:02:00,333 --> 00:02:02,666
To train a neural network to
49
00:02:02,800 --> 00:02:04,166
understand these images,
50
00:02:04,200 --> 00:02:06,600
we need a way to handle large
51
00:02:06,600 --> 00:02:08,100
matrix operations.
52
00:02:08,100 --> 00:02:10,100
When we add two matrices together,
53
00:02:10,266 --> 00:02:13,966
we get a third matrix, which is the result
54
00:02:14,266 --> 00:02:17,366
of the matrix operation.
55
00:02:17,533 --> 00:02:20,166
A tensor is an array of numbers.
56
00:02:20,200 --> 00:02:23,066
A vector is a one-dimensional tensor.
57
00:02:23,300 --> 00:02:25,666
A matrix is a two-dimensional tensor,
58
00:02:25,666 --> 00:02:28,100
so it can be represented as an array of
59
00:02:28,133 --> 00:02:30,000
arrays of numbers.
60
00:02:30,066 --> 00:02:30,733
Under the hood,
61
00:02:30,733 --> 00:02:32,900
this is actually an array of pointers,
62
00:02:32,900 --> 00:02:36,166
and each pointer points to an array of numbers.
63
00:02:36,166 --> 00:02:39,133
A 3D tensor would be the same, but with
64
00:02:39,133 --> 00:02:40,933
one more dimension.
65
00:02:41,066 --> 00:02:41,366
You can
66
00:02:41,366 --> 00:02:43,800
clearly see that this data structure becomes
67
00:02:43,800 --> 00:02:45,366
more and more complex
68
00:02:45,466 --> 00:02:48,533
as we increase the number of dimensions.
69
00:02:48,600 --> 00:02:52,066
We're going to use a better structure.
70
00:02:52,133 --> 00:02:55,766
The values of a matrix can be rearranged into
71
00:02:55,766 --> 00:02:58,300
a one-dimensional array instead,
72
00:02:58,300 --> 00:03:00,566
but given some x and y values
73
00:03:00,566 --> 00:03:03,000
we should still be able to access that number
74
00:03:03,000 --> 00:03:05,466
in the flattened version of the array.
75
00:03:05,733 --> 00:03:06,866
So to get an index,
76
00:03:06,866 --> 00:03:10,800
we multiply y by the width of the matrix
77
00:03:10,900 --> 00:03:13,866
and then add x as an offset.
78
00:03:14,200 --> 00:03:15,666
x is just x,
79
00:03:15,666 --> 00:03:18,933
but every time we increase the value of y
80
00:03:19,000 --> 00:03:22,566
we have to skip an entire row to reach the next point
81
00:03:22,666 --> 00:03:24,133
with the same x.
82
00:03:24,466 --> 00:03:28,133
That's why we multiply y by the number of columns,
83
00:03:28,133 --> 00:03:30,266
or the width of the matrix.
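The index rule described above can be sketched in plain JavaScript. This is only an illustration: the function name `flatIndex` and the sample 2x3 matrix are assumptions, not code from the video.

```javascript
// Hypothetical helper: maps a 2D coordinate to an index in the flattened,
// row-major array. `width` is the number of columns in the matrix.
function flatIndex(x, y, width) {
  return y * width + x;
}

// A 2x3 matrix (2 rows, 3 columns) stored as one flat array.
const flatMatrix = [
  1, 2, 3, // row y = 0
  4, 5, 6, // row y = 1
];
const matrixWidth = 3;

// Moving one row down (y + 1) skips `width` elements in the flat array.
const valueAtX1Y1 = flatMatrix[flatIndex(1, 1, matrixWidth)];
console.log(valueAtX1Y1); // 5
```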
84
00:03:30,300 --> 00:03:32,500
So to represent a tensor in memory
85
00:03:32,500 --> 00:03:34,400
you need at least two things:
86
00:03:34,733 --> 00:03:37,100
a buffer that holds the data
87
00:03:37,200 --> 00:03:40,333
and a view that helps us look at the data.
88
00:03:40,533 --> 00:03:43,200
This view is usually called the tensor shape.
89
00:03:43,333 --> 00:03:46,200
Think about it: we can represent any tensor
90
00:03:46,200 --> 00:03:48,133
with a one-dimensional array.
91
00:03:48,300 --> 00:03:50,133
That blew my mind.
92
00:03:50,300 --> 00:03:52,466
You don't have to change the underlying data;
93
00:03:52,533 --> 00:03:55,566
all you need to change is the shape.
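A minimal sketch of the "buffer plus shape" idea, assuming a plain object with `data` and `shape` fields. The names `createTensor` and `reshape` are hypothetical and are not taken from the video's library.

```javascript
// Hypothetical tensor representation: one flat buffer plus a shape "view".
function createTensor(data, shape) {
  // The shape must account for exactly as many values as the buffer holds.
  const expectedLength = shape.reduce((total, dim) => total * dim, 1);
  if (data.length !== expectedLength) {
    throw new Error(`shape [${shape}] needs ${expectedLength} values, got ${data.length}`);
  }
  return { data, shape };
}

// Reshaping reuses the same buffer; only the shape changes.
function reshape(tensor, newShape) {
  return createTensor(tensor.data, newShape);
}

const asMatrix = createTensor(new Float32Array([1, 2, 3, 4, 5, 6]), [2, 3]);
const asVector = reshape(asMatrix, [6]);
const asCube = reshape(asMatrix, [1, 2, 3]);
console.log(asVector.data === asMatrix.data); // true
```

The three tensors above share one `Float32Array`, which is the point the narration makes: the data stays put and only the view differs.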
94
00:03:55,700 --> 00:03:58,533
The first lesson I want to give you in this video is
95
00:03:58,533 --> 00:04:00,766
to always start from the problem.
96
00:04:01,000 --> 00:04:03,700
Try to figure out the best data structure
97
00:04:03,700 --> 00:04:05,600
to help you solve that problem
98
00:04:05,700 --> 00:04:07,566
in the most efficient way.
99
00:04:07,900 --> 00:04:10,266
I decided to first implement some basic tensor
100
00:04:10,266 --> 00:04:12,966
operations in JavaScript.
101
00:04:13,266 --> 00:04:14,900
Plain JavaScript lets us
102
00:04:14,900 --> 00:04:17,500
forget about the GPU side of things for now
103
00:04:17,500 --> 00:04:20,600
and understand the basic functionality.
104
00:04:20,600 --> 00:04:23,900
We need two matrices of the same size.
105
00:04:23,900 --> 00:04:28,066
Element-wise addition is very simple:
106
00:04:28,066 --> 00:04:30,400
it's like we line the two matrices up
107
00:04:30,400 --> 00:04:32,266
exactly on top of each other,
108
00:04:32,266 --> 00:04:36,200
which gives us a matrix of exactly the same size
109
00:04:36,266 --> 00:04:37,466
as the result.
110
00:04:37,666 --> 00:04:38,266
We loop over
111
00:04:38,266 --> 00:04:41,133
all the numbers, then use the same index
112
00:04:41,333 --> 00:04:44,600
to add and store the values in the matrix.
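The loop described above could look like the following sketch. It assumes the sum is stored back into the first matrix, which is one reading of the narration; `addInPlace` and the sample values are hypothetical.

```javascript
// Hypothetical in-place element-wise addition on flat arrays of equal size.
function addInPlace(target, source) {
  if (target.length !== source.length) {
    throw new Error("matrices must have the same size");
  }
  for (let i = 0; i < target.length; i++) {
    // The same index addresses the matching element in both matrices.
    target[i] += source[i];
  }
  return target;
}

const lhs = new Float32Array([1, 2, 3, 4]);
const rhs = new Float32Array([10, 20, 30, 40]);
addInPlace(lhs, rhs);
console.log(Array.from(lhs)); // [11, 22, 33, 44]
```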
113
00:04:44,866 --> 00:04:48,066
Addition, multiplication, subtraction and division aren't that interesting,
114
00:04:48,166 --> 00:04:50,466
but what we really care about
115
00:04:50,500 --> 00:04:52,666
is the efficiency of matrix-matrix multiplication.
116
00:04:52,733 --> 00:04:56,933
Here is a basic implementation in JavaScript.
117
00:04:57,300 --> 00:05:00,533
In this operation, unlike addition,
118
00:05:00,533 --> 00:05:03,066
we need to create a new matrix
119
00:05:03,066 --> 00:05:06,200
because the shape of the resulting matrix
120
00:05:06,300 --> 00:05:08,700
is fundamentally
121
00:05:08,766 --> 00:05:10,533
different.
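The on-screen implementation is not part of this transcript, so the following is only a sketch of a naive matrix-matrix multiplication over row-major flat arrays. The function name `matmul` and its parameter order are assumptions.

```javascript
// Hypothetical naive matrix multiplication on row-major flat arrays.
// `a` has shape [aRows, aCols]; `b` has shape [aCols, bCols].
function matmul(a, aRows, aCols, b, bCols) {
  if (a.length !== aRows * aCols || b.length !== aCols * bCols) {
    throw new Error("shape mismatch");
  }
  // The result has shape [aRows, bCols], so a new array is allocated.
  const out = new Float32Array(aRows * bCols);
  for (let row = 0; row < aRows; row++) {
    for (let col = 0; col < bCols; col++) {
      let acc = 0;
      for (let k = 0; k < aCols; k++) {
        acc += a[row * aCols + k] * b[k * bCols + col];
      }
      out[row * bCols + col] = acc;
    }
  }
  return out;
}

// A [2x3] matrix times a [3x1] matrix gives a [2x1] matrix.
const product = matmul([1, 2, 3, 4, 5, 6], 2, 3, [1, 1, 1], 1);
console.log(Array.from(product)); // [6, 15]
```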
122
00:05:10,966 --> 00:05:14,400
The sum operation is implemented like this:
123
00:05:14,500 --> 00:05:17,133
we create a variable to hold the sum
124
00:05:17,133 --> 00:05:20,866
and we accumulate all the values in a loop.
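As a sketch of the sum described above, assuming a flat array of numbers as input (the name `sumAll` is hypothetical):

```javascript
// Hypothetical sum over a flat array: one accumulator, one loop.
function sumAll(values) {
  let total = 0;
  for (let i = 0; i < values.length; i++) {
    total += values[i];
  }
  return total;
}

console.log(sumAll([1, 2, 3, 4])); // 10
```

Every iteration depends on the running total, which is why this is a reduce operation and does not map directly onto one independent GPU thread per element.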
125
00:05:21,000 --> 00:05:23,066
At this stage we've learned that
126
00:05:23,066 --> 00:05:26,166
every tensor operation is a function.
127
00:05:26,200 --> 00:05:28,866
A function has inputs and outputs,
128
00:05:28,866 --> 00:05:30,800
so tensor operations
129
00:05:30,966 --> 00:05:33,366
have inputs and outputs.
130
00:05:33,566 --> 00:05:36,066
I'm going to show you what I think is
131
00:05:36,166 --> 00:05:40,100
the best way to create these operations in WebGPU.
132
00:05:40,133 --> 00:05:45,366
I won't cover the implementation of sum today,
133
00:05:45,366 --> 00:05:47,300
because that is a reduce operation
134
00:05:47,300 --> 00:05:51,666
and implementing it in WebGPU would be very different.
135
00:05:52,066 --> 00:05:53,300
Three things are important
136
00:05:53,300 --> 00:05:55,733
when you want to use a compute shader:
137
00:05:56,066 --> 00:05:56,800
the shader,
138
00:05:56,800 --> 00:05:59,800
which is just some code written in a shading language;
139
00:05:59,900 --> 00:06:00,966
bind groups,
140
00:06:00,966 --> 00:06:03,800
which are the inputs and outputs of the shader;
141
00:06:03,966 --> 00:06:05,800
and the command encoder,
142
00:06:05,800 --> 00:06:09,000
which lets you encode commands for the GPU
143
00:06:09,000 --> 00:06:12,400
to execute. A shader is just some code written in a
144
00:06:12,400 --> 00:06:13,566
shading language.
145
00:06:13,566 --> 00:06:15,733
That means this code will
146
00:06:15,733 --> 00:06:17,933
live in our JavaScript application
147
00:06:18,200 --> 00:06:20,000
as a string.
148
00:06:20,500 --> 00:06:24,766
Strings are convenient because you can manipulate them.
149
00:06:25,100 --> 00:06:27,166
The code we'll write in this video
150
00:06:27,166 --> 00:06:28,866
is available for free
151
00:06:28,966 --> 00:06:30,266
in the description.
152
00:06:30,333 --> 00:06:33,766
We'll use WebGPU's shading language,
153
00:06:33,766 --> 00:06:36,266
also known as WGSL.
154
00:06:36,600 --> 00:06:40,366
u32 defines an unsigned 32-bit integer.
155
00:06:40,666 --> 00:06:42,600
vec3 is just a constructor
156
00:06:42,600 --> 00:06:44,466
that says we have a vector
157
00:06:44,466 --> 00:06:46,800
with three separate components.
158
00:06:46,966 --> 00:06:52,066
The components are named x, y and z, or r, g and b,
159
00:06:52,166 --> 00:06:55,966
and you can use these names interchangeably, in that order.
160
00:06:56,400 --> 00:06:58,700
f32 defines a 32-bit floating-point number,
161
00:06:58,700 --> 00:07:01,066
and wrapped in the vec3 constructor
162
00:07:01,066 --> 00:07:04,733
we get a three-dimensional f32 vector.
163
00:07:05,066 --> 00:07:08,533
To create something like a class, we use the struct keyword.
164
00:07:09,000 --> 00:07:09,300
Here
165
00:07:09,300 --> 00:07:12,200
I'm defining the inputs and outputs of our compute
166
00:07:12,200 --> 00:07:15,400
shader. The first binding is a uniform buffer,
167
00:07:15,400 --> 00:07:18,766
which holds some data that is the same for every
168
00:07:18,766 --> 00:07:20,800
single instance of the shader.
169
00:07:21,066 --> 00:07:21,500
The second
170
00:07:21,500 --> 00:07:24,966
binding is the array of values in our input matrix.
171
00:07:25,100 --> 00:07:27,500
The third binding is the output of our
172
00:07:27,500 --> 00:07:28,600
compute shader.
173
00:07:28,766 --> 00:07:30,533
Because we allow read_write
174
00:07:30,533 --> 00:07:32,400
access in our definition,
175
00:07:32,400 --> 00:07:34,333
we can write to that buffer
176
00:07:34,366 --> 00:07:36,366
after we compute the result.
177
00:07:36,466 --> 00:07:38,300
Functions are defined like this,
178
00:07:38,300 --> 00:07:39,400
and this function
179
00:07:39,400 --> 00:07:42,133
specifically checks whether a 3D point is
180
00:07:42,133 --> 00:07:43,700
out of bounds.
181
00:07:43,900 --> 00:07:45,800
In the end it returns a boolean,
182
00:07:45,800 --> 00:07:48,466
which can be true or false.
183
00:07:48,733 --> 00:07:51,766
The main part of our compute shader is defined here.
184
00:07:51,866 --> 00:07:55,366
We give this compute shader a workgroup size of 64,
185
00:07:55,466 --> 00:07:57,266
which I'll talk about later.
186
00:07:57,666 --> 00:07:58,700
In our main function
187
00:07:58,700 --> 00:08:02,000
we also have access to several built-in variables.
188
00:08:02,000 --> 00:08:05,300
Here we're specifically asking for the global ID.
189
00:08:05,533 --> 00:08:08,400
I'll also talk about what exactly that is
190
00:08:08,533 --> 00:08:09,733
in a few minutes.
191
00:08:10,066 --> 00:08:13,200
We finally get to the main logic of the compute shader.
192
00:08:13,200 --> 00:08:14,500
We check to see
193
00:08:14,500 --> 00:08:17,966
whether our global ID is out of bounds.
194
00:08:18,333 --> 00:08:19,766
If it is out of bounds,
195
00:08:19,766 --> 00:08:22,700
we simply return without doing anything.
196
00:08:22,866 --> 00:08:25,400
After that we can do some computation,
197
00:08:25,500 --> 00:08:26,966
which here means
198
00:08:27,066 --> 00:08:30,533
adding and storing the values in the output buffer.
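The shader described above might look roughly like the following WGSL, stored as a JavaScript string. This is a reconstruction from the narration, not the author's actual code: the names `Uniforms`, `bounds`, `input`, `output`, `isOutOfBounds` and `addShaderCode` are all assumptions, and so is the choice to add the input into the output buffer.

```javascript
// Hypothetical WGSL source for an element-wise add, kept as a string so it
// can later be handed to device.createShaderModule({ code: addShaderCode }).
const addShaderCode = /* wgsl */ `
struct Uniforms {
  // Number of valid threads per dimension, e.g. (elementCount, 1, 1).
  bounds: vec3<u32>,
}

@group(0) @binding(0) var<uniform> uniforms: Uniforms;
@group(0) @binding(1) var<storage, read> input: array<f32>;
@group(0) @binding(2) var<storage, read_write> output: array<f32>;

// Returns true when the 3D point lies outside the given bounds.
fn isOutOfBounds(p: vec3<u32>, bounds: vec3<u32>) -> bool {
  return p.x >= bounds.x || p.y >= bounds.y || p.z >= bounds.z;
}

@compute @workgroup_size(64)
fn main(@builtin(global_invocation_id) globalId: vec3<u32>) {
  // Over-dispatched threads return without touching memory.
  if (isOutOfBounds(globalId, uniforms.bounds)) {
    return;
  }
  let i = globalId.x;
  output[i] = output[i] + input[i];
}
`;
```

Because the shader is an ordinary string, a different operation could be produced by swapping the last line of `main` before the module is created.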
199
00:08:30,900 --> 00:08:33,866
If we compare this with our simple implementation
200
00:08:33,866 --> 00:08:35,100
in JavaScript,
201
00:08:35,100 --> 00:08:37,533
we can see some obvious differences.
202
00:08:37,800 --> 00:08:39,066
The first difference is that we should
203
00:08:39,066 --> 00:08:39,800
clearly define the inputs
204
00:08:39,800 --> 00:08:41,533
and the outputs
205
00:08:41,533 --> 00:08:43,266
of our shader.
206
00:08:43,500 --> 00:08:45,900
The second difference is that we should check
207
00:08:45,900 --> 00:08:49,133
to see whether we're outside the expected bounds.
208
00:08:49,133 --> 00:08:51,133
Finally, we have to deal with
209
00:08:51,133 --> 00:08:53,466
something called the workgroup size.
210
00:08:53,466 --> 00:08:56,100
Then, and only then, can we do
211
00:08:56,100 --> 00:08:57,766
our simple computation.
212
00:08:57,900 --> 00:09:01,066
A compute shader is like a parallel for loop:
213
00:09:01,166 --> 00:09:03,666
you place the body of the for loop
214
00:09:03,700 --> 00:09:07,166
inside the main function of the compute shader.
215
00:09:07,300 --> 00:09:09,733
The main function of the compute shader
216
00:09:09,766 --> 00:09:12,266
does some math
217
00:09:12,466 --> 00:09:14,700
based on the thread index,
218
00:09:14,800 --> 00:09:17,166
which is the global ID.
219
00:09:17,566 --> 00:09:20,366
That means the only tool you have
220
00:09:20,366 --> 00:09:23,400
that lets you do something different in each thread
221
00:09:23,400 --> 00:09:25,366
is the global thread ID.
222
00:09:25,733 --> 00:09:27,533
So when you write a compute shader,
223
00:09:27,533 --> 00:09:28,700
it is extremely
224
00:09:28,700 --> 00:09:31,966
important that you base all of your computation
225
00:09:32,100 --> 00:09:34,000
on this index.
226
00:09:34,166 --> 00:09:35,366
Let's take a step back
227
00:09:35,366 --> 00:09:38,500
and talk about how this code runs
228
00:09:38,500 --> 00:09:43,566
on the GPU. Modern GPUs have a huge number of cores;
229
00:09:43,666 --> 00:09:47,966
my GPU has 7,680 to be exact,
230
00:09:48,200 --> 00:09:50,800
but your GPU might be different.
231
00:09:50,900 --> 00:09:54,066
WebGPU has designed a robust system
232
00:09:54,133 --> 00:09:58,866
that works for every GPU, which I'm going to simplify.
233
00:09:59,166 --> 00:10:02,933
The smallest part of the GPU compute hierarchy
234
00:10:03,000 --> 00:10:05,600
is a work item, or a thread.
235
00:10:05,866 --> 00:10:07,566
I'm going to read part of this
236
00:10:07,566 --> 00:10:09,400
article by Ralph Lavigne
237
00:10:09,400 --> 00:10:10,600
to explain this.
238
00:10:11,000 --> 00:10:15,400
A thread in GPU compute is like a traditional CPU thread,
239
00:10:15,400 --> 00:10:17,333
but with some differences.
240
00:10:17,600 --> 00:10:18,600
While on a CPU
241
00:10:18,600 --> 00:10:21,700
it is expensive to spawn threads,
242
00:10:21,700 --> 00:10:24,166
on a GPU it is very cheap,
243
00:10:24,266 --> 00:10:27,566
and generally a pattern of many small threads
244
00:10:27,600 --> 00:10:30,700
is the best way to achieve performance.
245
00:10:31,200 --> 00:10:34,366
The hardware model of a GPU is often described as
246
00:10:34,366 --> 00:10:35,566
single instruction,
247
00:10:35,666 --> 00:10:38,466
multiple threads, or SIMT,
248
00:10:38,800 --> 00:10:41,666
which is similar to the more traditional single instruction,
249
00:10:41,666 --> 00:10:42,733
multiple data,
250
00:10:42,866 --> 00:10:44,400
or SIMD.
251
00:10:44,866 --> 00:10:47,600
The main function of our shader code will run on
252
00:10:47,600 --> 00:10:49,333
many, many threads,
253
00:10:49,333 --> 00:10:53,000
so they all perform the same task, but many,
254
00:10:53,000 --> 00:10:55,100
many times in parallel.
255
00:10:55,466 --> 00:10:57,533
The global ID built-in variable
256
00:10:57,533 --> 00:11:01,900
that we saw in the shader code is the global thread ID.
257
00:11:02,066 --> 00:11:03,133
The workgroup size
258
00:11:03,133 --> 00:11:06,200
that we saw in the basic compute shader code
259
00:11:06,300 --> 00:11:09,666
describes the number of threads we want to spawn
260
00:11:09,900 --> 00:11:12,000
per workgroup.
261
00:11:12,133 --> 00:11:14,066
This leads us to the next important part of
262
00:11:14,066 --> 00:11:16,600
the GPU compute hierarchy:
263
00:11:16,800 --> 00:11:20,200
a workgroup. A workgroup is a group of threads
264
00:11:20,200 --> 00:11:22,533
that usually execute in parallel.
265
00:11:22,533 --> 00:11:25,200
All threads in a workgroup are scheduled to run together.
266
00:11:25,266 --> 00:11:27,300
In WebGPU,
267
00:11:27,300 --> 00:11:30,566
the workload is modeled as a three-dimensional grid
268
00:11:30,733 --> 00:11:34,500
where each cube is a thread, and threads are grouped into
269
00:11:34,500 --> 00:11:36,866
bigger cubes that form a
270
00:11:36,966 --> 00:11:40,200
workgroup. When we define the workgroup size,
271
00:11:40,200 --> 00:11:43,100
we're defining the size in three dimensions,
272
00:11:43,100 --> 00:11:46,300
meaning that if you don't specify a component
273
00:11:46,466 --> 00:11:48,100
it will be set to 1.
274
00:11:48,300 --> 00:11:52,266
So this is equivalent to this, which is also equal to this.
275
00:11:52,933 --> 00:11:55,333
The next important point we need to consider is
276
00:11:55,466 --> 00:11:57,933
dispatching workgroups.
277
00:11:58,300 --> 00:12:01,100
A dispatch is one unit of computation
278
00:12:01,200 --> 00:12:04,533
in which everything shares the same inputs and outputs.
279
00:12:04,666 --> 00:12:07,366
We can only dispatch workgroups,
280
00:12:07,466 --> 00:12:10,700
meaning that if we want to run 21 threads
281
00:12:10,700 --> 00:12:13,266
with a workgroup size of 64,
282
00:12:13,400 --> 00:12:17,533
we need to run the shader at least 64 times
283
00:12:17,533 --> 00:12:18,700
to get that done.
284
00:12:19,133 --> 00:12:21,700
So we can't run individual threads;
285
00:12:21,700 --> 00:12:24,933
we can only dispatch workgroups.
286
00:12:25,200 --> 00:12:29,200
This leads to something unexpected called over-dispatching:
287
00:12:29,500 --> 00:12:32,466
we only want to run the code for 21 threads,
288
00:12:32,466 --> 00:12:35,900
but it will run for 64 threads instead.
289
00:12:36,266 --> 00:12:38,366
This is exactly why we need to check that
290
00:12:38,366 --> 00:12:40,566
we're not out of bounds in
291
00:12:40,700 --> 00:12:42,900
our GPU program.
292
00:12:43,266 --> 00:12:46,000
To calculate the number of workgroups you need to run,
293
00:12:46,066 --> 00:12:50,600
you can divide 21 by the workgroup size, which is 64.
294
00:12:50,766 --> 00:12:53,866
That gives you 0.32,
295
00:12:53,966 --> 00:12:57,366
but by using the ceil function on that number
296
00:12:57,500 --> 00:13:00,700
you get the minimum number of workgroups.
297
00:13:00,900 --> 00:13:04,366
In this case, the ceil of 0.32 is 1,
298
00:13:04,566 --> 00:13:08,666
so you only need to dispatch one workgroup.
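The arithmetic above can be checked with a few lines of JavaScript. The helper name `workgroupCount` is an assumption; the numbers 21 and 64 come from the narration.

```javascript
// Hypothetical helper: the smallest number of workgroups that covers
// `threadCount` threads when each workgroup runs `workgroupSize` threads.
function workgroupCount(threadCount, workgroupSize) {
  return Math.ceil(threadCount / workgroupSize);
}

const groups = workgroupCount(21, 64); // Math.ceil(0.328...) = 1
const launchedThreads = groups * 64;   // 64 threads actually run
const idleThreads = launchedThreads - 21; // 43 threads are over-dispatched

console.log(groups, launchedThreads, idleThreads); // 1 64 43
```

The 43 extra threads are the ones the out-of-bounds check in the shader has to turn away.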
299
00:13:09,200 --> 00:13:10,700
Now, with all of that in mind,
300
00:13:10,700 --> 00:13:14,066
let's run this WebGPU compute shader.
301
00:13:14,500 --> 00:13:17,066
First we'll define an operation class
302
00:13:17,066 --> 00:13:19,000
that will handle the shader code
303
00:13:19,000 --> 00:13:21,166
for each matrix operation.
304
00:13:21,533 --> 00:13:23,733
We'll construct a uniform buffer;
305
00:13:23,733 --> 00:13:26,533
this is a helper class that you can get from the
306
00:13:26,533 --> 00:13:27,700
description.
307
00:13:28,000 --> 00:13:30,933
This uniform buffer gives our shader access to some
308
00:13:30,933 --> 00:13:32,200
important information,
309
00:13:32,300 --> 00:13:34,166
such as the size of the matrix,
310
00:13:34,200 --> 00:13:36,066
the bounds of the computation,
311
00:13:36,166 --> 00:13:38,133
and things like that.
312
00:13:38,666 --> 00:13:41,300
Then we'll define our bind group layout,
313
00:13:41,300 --> 00:13:44,266
which describes the types of the inputs and outputs
314
00:13:44,266 --> 00:13:45,866
we're sending to the shader.
315
00:13:46,300 --> 00:13:47,166
Then we're going to
316
00:13:47,166 --> 00:13:47,666
generate
317
00:13:47,666 --> 00:13:50,700
our shader code and do some string manipulation
318
00:13:50,766 --> 00:13:52,533
to inject a custom function
319
00:13:52,700 --> 00:13:55,166
that performs the computation we need.
320
00:13:55,500 --> 00:13:58,766
Then we'll create a WebGPU pipeline,
321
00:13:58,766 --> 00:14:01,666
which basically constructs a compute pipeline
322
00:14:01,700 --> 00:14:05,666
using the shader module and our bind group layout.
323
00:14:06,000 --> 00:14:09,866
In WebGPU you need to define your bind group layout
324
00:14:09,900 --> 00:14:13,666
and be very specific about the types of the bindings
325
00:14:13,733 --> 00:14:15,866
you're sending to the GPU.
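A sketch of the layout and pipeline steps described above, using standard WebGPU calls. The helper name `createOperationPipeline`, the binding order (uniform, read-only input, read-write output) and the `main` entry point are assumptions based on the narration, not the author's operation class.

```javascript
// Hypothetical helper: builds a bind group layout with explicit binding
// types and a compute pipeline that uses it. `device` is a GPUDevice and
// `shaderCode` is WGSL source with a compute entry point named "main".
function createOperationPipeline(device, shaderCode) {
  const bindGroupLayout = device.createBindGroupLayout({
    entries: [
      // binding 0: uniforms shared by every thread
      { binding: 0, visibility: GPUShaderStage.COMPUTE, buffer: { type: "uniform" } },
      // binding 1: input matrix values, read-only
      { binding: 1, visibility: GPUShaderStage.COMPUTE, buffer: { type: "read-only-storage" } },
      // binding 2: output matrix values, writable
      { binding: 2, visibility: GPUShaderStage.COMPUTE, buffer: { type: "storage" } },
    ],
  });

  const pipeline = device.createComputePipeline({
    layout: device.createPipelineLayout({ bindGroupLayouts: [bindGroupLayout] }),
    compute: {
      module: device.createShaderModule({ code: shaderCode }),
      entryPoint: "main",
    },
  });

  return { bindGroupLayout, pipeline };
}
```

In a browser this would be called with a device obtained from `navigator.gpu.requestAdapter()` followed by `adapter.requestDevice()`.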
326
00:14:16,366 --> 00:14:18,366
Finally, we're going to call compute,
327
00:14:18,366 --> 00:14:21,466
which will create a compute pass encoder.
328
00:14:21,733 --> 00:14:24,733
The encoder will ask the GPU
329
00:14:24,800 --> 00:14:27,266
to execute our instructions.
330
00:14:27,466 --> 00:14:29,366
We call dispatch workgroups,
331
00:14:29,366 --> 00:14:30,466
which takes the number of
332
00:14:30,466 --> 00:14:32,600
workgroups we want to dispatch.
333
00:14:32,733 --> 00:14:34,200
We calculate that number
334
00:14:34,200 --> 00:14:37,066
by taking the number of threads we want to dispatch,
335
00:14:37,066 --> 00:14:39,900
dividing it by the workgroup size,
336
00:14:40,066 --> 00:14:43,366
and using ceil to compute the minimum number of
337
00:14:43,366 --> 00:14:45,466
workgroups we need to run
338
00:14:45,666 --> 00:14:47,533
in order to get the job done.
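The dispatch step described above could be sketched as follows with the standard WebGPU command API. The function name `runCompute`, its parameters and the default workgroup size of 64 are assumptions; the video's own `compute` method is not shown in the transcript.

```javascript
// Hypothetical helper: records one compute pass and submits it.
// `pipeline` is a GPUComputePipeline and `bindGroup` a matching GPUBindGroup.
function runCompute(device, pipeline, bindGroup, threadCount, workgroupSize = 64) {
  // Round up so every requested thread is covered by some workgroup.
  const workgroups = Math.ceil(threadCount / workgroupSize);

  const encoder = device.createCommandEncoder();
  const pass = encoder.beginComputePass();
  pass.setPipeline(pipeline);
  pass.setBindGroup(0, bindGroup);
  pass.dispatchWorkgroups(workgroups);
  pass.end();

  // Nothing runs until the recorded commands are submitted to the queue.
  device.queue.submit([encoder.finish()]);
  return workgroups;
}
```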
339
00:14:47,800 --> 00:14:49,400
Let's talk about some important
340
00:14:49,400 --> 00:14:51,866
parts I left out on purpose
341
00:14:51,966 --> 00:14:53,600
for the sake of simplicity.
342
00:14:54,100 --> 00:14:54,666
First,
343
00:14:54,666 --> 00:14:57,766
let's summarize all the steps for creating a single
344
00:14:57,766 --> 00:14:59,166
matrix operation.
345
00:14:59,366 --> 00:15:02,200
We need to create some matrices with some data,
346
00:15:02,533 --> 00:15:05,100
create the matrix operation compute shader,
347
00:15:05,333 --> 00:15:06,900
set the matrices we want
348
00:15:06,900 --> 00:15:09,866
as inputs and outputs using bind groups,
349
00:15:09,966 --> 00:15:12,733
update the uniforms, and finally
350
00:15:12,966 --> 00:15:14,333
execute the shader
351
00:15:14,333 --> 00:15:17,200
by dispatching a number of workgroups.
352
00:15:17,466 --> 00:15:21,166
To read and write data in GPU land,
353
00:15:21,266 --> 00:15:24,666
we need to create something called a GPU buffer.
354
00:15:24,733 --> 00:15:28,666
To me, a GPU buffer is an array that lives on the GPU.
355
00:15:29,000 --> 00:15:34,133
A GPU buffer can be created by providing two main parameters:
356
00:15:34,133 --> 00:15:37,066
a size, specified in bytes,
357
00:15:37,066 --> 00:15:38,100
and a usage,
358
00:15:38,100 --> 00:15:42,533
which we set using the GPU buffer usage flags.
359
00:15:42,766 --> 00:15:45,066
When creating a GPU buffer
360
00:15:45,100 --> 00:15:47,733
you need to be very careful about the size,
361
00:15:47,766 --> 00:15:49,566
because if the size of the buffer
362
00:15:49,566 --> 00:15:52,266
exceeds the WebGPU limits,
363
00:15:52,333 --> 00:15:54,933
buffer creation will fail.
364
00:15:55,366 --> 00:15:56,700
At the start of the video,
365
00:15:56,700 --> 00:15:59,400
when I initially loaded the MNIST dataset,
366
00:15:59,500 --> 00:16:02,566
I got an error without a good explanation
367
00:16:02,800 --> 00:16:04,566
for exactly this reason.
368
00:16:05,100 --> 00:16:06,666
My crude workaround
369
00:16:06,666 --> 00:16:10,800
was to divide the number of MNIST images by 2,
370
00:16:11,100 --> 00:16:14,200
and that solved the buffer creation problem.
371
00:16:14,400 --> 00:16:15,300
In this case
372
00:16:15,300 --> 00:16:18,200
we're using the copy destination flag,
373
00:16:18,300 --> 00:16:19,566
which means this buffer
374
00:16:19,566 --> 00:16:21,733
will be used as the destination
375
00:16:21,800 --> 00:16:24,366
in a GPU copy operation.
376
00:16:24,533 --> 00:16:26,200
Copying into this GPU buffer
377
00:16:26,200 --> 00:16:29,200
lets us move data from one matrix to another
378
00:16:29,266 --> 00:16:31,066
very quickly and easily.
379
00:16:31,066 --> 00:16:33,966
We're also using the copy source flag,
380
00:16:34,066 --> 00:16:36,466
which means this buffer will be used
381
00:16:36,466 --> 00:16:37,500
as the source
382
00:16:37,500 --> 00:16:39,900
in a GPU copy operation.
383
00:16:40,200 --> 00:16:43,100
This is useful when we want to read the data back
384
00:16:43,100 --> 00:16:45,166
to the CPU later.
385
00:16:45,366 --> 00:16:48,300
Finally, we use the storage flag,
386
00:16:48,333 --> 00:16:50,733
which creates a storage buffer.
387
00:16:50,900 --> 00:16:53,533
Storage buffers are large arrays of data
388
00:16:53,566 --> 00:16:55,900
that suit our use case.
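A sketch of creating such a buffer with the three usage flags mentioned above. The helper name `createMatrixBuffer` and the up-front limit check are assumptions added for illustration; the check compares against the device's `maxBufferSize` and `maxStorageBufferBindingSize` limits so the failure gets a readable message.

```javascript
// Hypothetical helper: allocates a storage buffer for `elementCount` f32
// values that can also act as a copy source and a copy destination.
function createMatrixBuffer(device, elementCount) {
  // Buffer sizes are given in bytes, not in elements.
  const size = elementCount * Float32Array.BYTES_PER_ELEMENT;

  const limit = Math.min(
    device.limits.maxBufferSize,
    device.limits.maxStorageBufferBindingSize
  );
  if (size > limit) {
    throw new RangeError(`buffer of ${size} bytes exceeds the device limit of ${limit} bytes`);
  }

  return device.createBuffer({
    size,
    usage:
      GPUBufferUsage.STORAGE |  // readable and writable from the shader
      GPUBufferUsage.COPY_SRC | // can be copied out, e.g. for readback
      GPUBufferUsage.COPY_DST,  // can be copied or written into
  });
}
```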
389
00:16:55,966 --> 00:16:58,700
The operation shader will run with everything
390
00:16:58,700 --> 00:16:59,866
we've provided.
391
00:16:59,966 --> 00:17:03,100
The catch is that to read the GPU buffer's data
392
00:17:03,166 --> 00:17:07,666
we have to move the GPU buffer over to CPU land.
393
00:17:07,900 --> 00:17:11,133
To do that, we'll create a temporary buffer.
394
00:17:11,266 --> 00:17:14,933
By copying the GPU buffer into that temporary buffer,
395
00:17:14,966 --> 00:17:17,800
we'll be able to call mapAsync,
396
00:17:17,900 --> 00:17:19,266
which will give us
397
00:17:19,400 --> 00:17:21,700
the array buffer of the result.
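The readback described above might be sketched like this. The function name `readBuffer` is an assumption, and the sketch assumes the source buffer was created with the COPY_SRC flag and holds f32 values. Note that `mapAsync` itself returns a promise; the ArrayBuffer comes from `getMappedRange` once that promise resolves.

```javascript
// Hypothetical helper: copies `byteLength` bytes from a GPU buffer into a
// temporary mappable buffer and returns them as a Float32Array on the CPU.
async function readBuffer(device, source, byteLength) {
  const staging = device.createBuffer({
    size: byteLength,
    usage: GPUBufferUsage.MAP_READ | GPUBufferUsage.COPY_DST,
  });

  const encoder = device.createCommandEncoder();
  encoder.copyBufferToBuffer(source, 0, staging, 0, byteLength);
  device.queue.submit([encoder.finish()]);

  // Resolves once the copy has finished and the memory is CPU-visible.
  await staging.mapAsync(GPUMapMode.READ);

  // Copy the bytes out before unmapping; the mapped range is detached after.
  const result = new Float32Array(staging.getMappedRange().slice(0));
  staging.unmap();
  staging.destroy();
  return result;
}
```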
398
00:17:21,966 --> 00:17:22,400
I'm going to
399
00:17:22,400 --> 00:17:25,100
build several operations and store them in an
400
00:17:25,100 --> 00:17:27,100
operation manager class.
401
00:17:27,133 --> 00:17:30,266
This way, we build them at
402
00:17:30,266 --> 00:17:31,933
initialization time.
403
00:17:32,300 --> 00:17:36,066
It's finally time to use this matrix computation system.
404
00:17:36,400 --> 00:17:37,166
I'm going to create
405
00:17:37,166 --> 00:17:39,766
two matrices with some random integer values
406
00:17:39,766 --> 00:17:41,166
between 1 and 4.
407
00:17:41,366 --> 00:17:44,100
Then I'm going to create a third matrix
408
00:17:44,266 --> 00:17:46,900
to hold the result of the multiplication.
409
00:17:47,466 --> 00:17:49,533
I'm going to print the first two matrices,
410
00:17:49,533 --> 00:17:50,200
then I'm going to
411
00:17:50,200 --> 00:17:52,500
run the multiplication
412
00:17:52,666 --> 00:17:55,266
and print the resulting matrix.
413
00:17:55,666 --> 00:17:58,966
If we check the console, it works:
414
00:17:58,966 --> 00:18:01,133
we get what we wanted.
415
00:18:01,500 --> 00:18:05,166
I spent a whole week adding more operations,
416
00:18:05,266 --> 00:18:06,766
optimizing the system,
417
00:18:06,966 --> 00:18:10,733
and finally running the neural network training loop
418
00:18:10,766 --> 00:18:12,800
on the MNIST dataset.
419
00:18:13,000 --> 00:18:15,900
After I implemented all the optimization tricks,
420
00:18:15,966 --> 00:18:17,300
I managed to get
421
00:18:17,400 --> 00:18:22,466
78% accuracy with 4 seconds of training time.
422
00:18:22,666 --> 00:18:24,366
Originally I wanted to talk about
423
00:18:24,366 --> 00:18:25,866
the optimizations, but
424
00:18:25,866 --> 00:18:28,266
I think this video is already too long,
425
00:18:28,500 --> 00:18:32,166
so it might be better if I talk about them later.
426
00:18:32,166 --> 00:18:35,100
If you're interested in watching my next devlog,
427
00:18:35,300 --> 00:18:36,400
hit the like button
428
00:18:36,400 --> 00:18:39,266
and I'll make sure to make more WebGPU content
429
00:18:39,400 --> 00:18:41,300
in future episodes.
430
00:18:41,566 --> 00:18:44,500
If you want to learn compute shaders quickly,
431
00:18:44,700 --> 00:18:46,766
I offer one-on-one mentoring.
432
00:18:46,966 --> 00:18:49,933
To book a call, check the description below.
433
00:18:50,300 --> 00:18:52,566
If you'd like to work with me on a project
434
00:18:52,566 --> 00:18:55,066
and build real-time applications on the web
435
00:18:55,066 --> 00:18:58,766
using threejs or WebGPU
436
00:19:02,766 --> 00:19:04,066
Before I say goodbye,
437
00:19:04,066 --> 00:19:06,733
let me give you some great resources to go further in
438
00:19:06,733 --> 00:19:08,400
learning compute shaders
439
00:19:08,400 --> 00:19:10,066
and WebGPU.
440
00:19:10,366 --> 00:19:13,366
I highly recommend reading this article by Surma,
441
00:19:13,566 --> 00:19:14,733
because it will give you
442
00:19:14,733 --> 00:19:16,966
a lot more detail
443
00:19:17,133 --> 00:19:18,733
about compute shaders.
444
00:19:18,766 --> 00:19:21,266
I used this article as a reference when creating this video,
445
00:19:21,266 --> 00:19:22,466
so make sure to check it out.
446
00:19:22,533 --> 00:19:24,333
It's awesome. I also highly recommend checking out
447
00:19:24,500 --> 00:19:27,766
WebGPU Fundamentals,
448
00:19:29,600 --> 00:19:31,200
because it gives you a great set of
449
00:19:31,200 --> 00:19:33,966
articles to help you learn WebGPU.
450
00:19:34,333 --> 00:19:35,966
That's it for this video, guys.
451
00:19:36,000 --> 00:19:37,966
I hope you learned something today.
452
00:19:38,166 --> 00:19:40,533
See you in the next video.