Personal Website
Posts
PMPP Reading notes
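This post collects the interesting points I noted while reading the PMPP book.
Chapter 3: Scalable parallel execution
__syncthreads() is a barrier for all threads in a block. Note that if the barrier appears in each branch of an if-else, the two calls are different barrier points, so all threads in the block must take the same branch; otherwise the threads end up waiting for each other forever.
Threads are assigned execution resources block by block. The resources are organized into SMs (Streaming Multiprocessors). Each device limits resource usage in the following ways: the maximum number of blocks that can be assigned to each SM; the number of active blocks on each CUDA device; the maximum number of threads that can be assigned to each SM.
Once threads have been assigned to an SM block by block, they are scheduled in units of warps. Threads in the same warp share the same execution timing. Each SM can execute only a small subset of its warps at any moment, which raises a question: if an SM cannot execute all of its warps at the same time, why do we still want so many warps? The answer is that scheduling among them hides long-latency operations such as global memory accesses. Another benefit of this scheduling approach is that we do not need to provide many large caches the way a CPU does, so more chip area can be given to floating-point units and similar hardware.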
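The per-SM limits above can be made concrete with a little arithmetic. The following is a minimal sketch in Python, not a CUDA API: the function names are hypothetical, and the limits used in the usage note (1536 threads and 8 blocks per SM) are assumed values for illustration, not the limits of any particular device. It considers only the thread and block limits and ignores registers and shared memory.

```python
def resident_blocks_per_sm(threads_per_block, max_threads_per_sm, max_blocks_per_sm):
    """Number of blocks one SM can hold under the thread and block limits only."""
    if threads_per_block <= 0:
        raise ValueError("threads_per_block must be positive")
    # Blocks are assigned whole, so the thread limit allows only full blocks.
    limit_by_threads = max_threads_per_sm // threads_per_block
    # The tighter of the two limits decides how many blocks become resident.
    return min(limit_by_threads, max_blocks_per_sm)


def thread_occupancy(threads_per_block, max_threads_per_sm, max_blocks_per_sm):
    """Fraction of the SM's thread slots that resident blocks actually fill."""
    blocks = resident_blocks_per_sm(
        threads_per_block, max_threads_per_sm, max_blocks_per_sm
    )
    return blocks * threads_per_block / max_threads_per_sm


if __name__ == "__main__":
    # Assumed limits for illustration: 1536 threads and 8 blocks per SM.
    for threads in (64, 256, 1024):
        blocks = resident_blocks_per_sm(threads, 1536, 8)
        print(threads, blocks, thread_occupancy(threads, 1536, 8))
```

Under these assumed limits, 64-thread blocks hit the block limit first (8 blocks, 512 threads, one third of the thread slots), 256-thread blocks fill the SM exactly (6 blocks), and 1024-thread blocks leave room for only one block. Fewer resident threads means fewer warps available to hide long-latency operations.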
Last updated on Feb 12, 2025
1 min read
CS 144
notes of Stanford CS144 Computer networks
Last updated on Mar 25, 2024
CS course notes
CMU 15445
Database course notes
Last updated on Oct 2, 2023
1 min read
CS course notes
Distributed System course by Martin Kleppmann
notes of Distributed System course by Martin Kleppmann
Last updated on Oct 7, 2023
CS course notes
An intro survey of Federated Learning Privacy Protection
notes of the Future of Decentralization, AI, and Computing Summit at UC Berkeley
Last updated on Mar 8, 2024
8 min read
Talks
CS 15418
notes of 15418 Parallel Computer Architecture and Programming
Last updated on Oct 3, 2023
CS course notes
CS 61C
notes of 61C Great Ideas in Computer Architecture (Machine Structures)
Last updated on Aug 31, 2023
CS course notes
CS 106L
notes of 106L Standard C++ Programming
Last updated on Mar 26, 2024
CS course notes, Programming Language
CS 106x & UIUC CS 225
notes of CS 106x & UIUC CS 225 Data Structures and Algorithms
Last updated on Aug 31, 2023
CS course notes
CMake-Git
notes of how to use CMake and Git
Last updated on Aug 31, 2023
Tools