
Index

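The ch2 specialization stage adds a new topic on GPGPU architecture and modeling (two articles), which is the flagship lab project of this training camp. The lab implements a RISC-V GPGPU device prototype on top of QEMU, and prototype development is now complete.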

Simplified architecture diagram (based on the Vortex design):

+----------------------------------------------------------------------------------------+
| Guest OS (RISC-V or Other Arch)    GPGPU App/Kernel --> GPGPU Driver (MMIO + DMA)      |
+--------------------------------------+-------------------------------------------------+
                                       | PCIe
=======================================|==================================================
                         QEMU          | Device Model
=======================================|==================================================
+--------------------------------------v-------------------------------------------------+
| PCIe Frontend (gpgpu.c)                                                                |
|                                                                                        |
| BAR0 (CTRL 1MB)             BAR2 (VRAM 64MB)             BAR4 (DOORBELL 64KB)          |
| +----------------------+    +----------------------+      +----------------------+     |
| | Kernel Dispatch      |    |                      |      | DMA Engine           |     |
| |   kernel_addr/args   |    | BAR map (PCIe window)+--+   |   src/dst/size/ctrl  |     |
| |   grid_dim  (X,Y,Z)  |    |                      |  |   | MSI-X (4 vectors)    |     |
| |   block_dim (X,Y,Z)  |    |                      |  |   | IRQ enable/pending   |     |
| | Global Control       |    +----------------------+  |   +----------------------+     |
| | IRQ Status           |                              |                                |
| +----------+-----------+                              |                                |
|            | dispatch                                 | map                            |
+------------+------------------------------------------+--------------------------------+
             |                                          |
+------------v------------------------------------------+--------------------------------+
| SIMT Backend (gpgpu_core.c)                           |                                |
|                                                       |                                |
| +----------------------+                              |                                |
| | VRAM (64MB)          | <-- PCIe BAR2 maps here +----+                                |
| | GPU Local Memory     |                                                               |
| +----------^-----------+                                                               |
|            | ld/st                                                                     |
|                                                                                        |
| Grid --> Block(0,0)  Block(1,0)  Block(2,0) ...                                        |
|               |                                                                        |
|               v                                                                        |
|       +--- Block ------------------------------------------------------------+         |
|       |                                                                      |         |
|       |  +--- Warp 0 --------+  +--- Warp 1 --------+  +--- Warp 2 --+       |         |
|       |  | Lane 0 .. Lane 31 |  | Lane 0 .. Lane 31 |  | Lane 0..31  |       |         |
|       |  | +----+     +----+ |  | +----+     +----+ |  | +----+      |       |         |
|       |  | | PC |     | PC | |  | | PC |     | PC | |  | | PC |      |  ...  |         |
|       |  | | x0 |     | x0 | |  | | x0 |     | x0 | |  | | x0 |      |       |         |
|       |  | |... |     |... | |  | |... |     |... | |  | |... |      |       |         |
|       |  | |x31 |     |x31 | |  | |x31 |     |x31 | |  | |x31 |      |       |         |
|       |  | +----+     +----+ |  | +----+     +----+ |  | +----+      |       |         |
|       |  | active_mask (32b) |  | active_mask (32b) |  | active_mask |       |         |
|       |  +-------------------+  +-------------------+  +-------------+       |         |
|       |                                                                      |         |
|       | barrier / sync            mhartid = [block|warp|thread]              |         |
|       +----------------------------------------------------------------------+         |
|                                                                                        |
+----------------------------------------------------------------------------------------+

Key Features

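  • SIMT execution model: threads are organized in a Thread/Block/Grid hierarchy and scheduled in warps
  • PCIe device implementation: attaches as a standard PCIe device, with support for BARs/MMIO, DMA, and MSI-X
  • QTest test framework: integrates QEMU's QTest infrastructure for automated device-level testing
  • Layered front-end/back-end architecture: the PCIe front end handles the command queue and register interaction, while the cmodel back end executes kernel computation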

Assessment

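  • A GPGPU test suite built on the QTest framework verifies functional completeness; participants are scored by the number of tests that pass
  • Open problem: design a simple, CUDA-style AI software stack (programming model + driver) for this GPGPU
  • Open problem: integrate Vortex's simx directly into QEMU, and port its AI software stack to ArceOS/rCore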