vLLM Framework Source Code Analysis: Block Allocation and Management
1. Block Overview A significant innovation of vLLM is the division of the physical layer GPU and CPU available memory into several blocks, which effectively reduces memory fragmentation issues. Specifically, vLLM’s blocks are divided into logical and physical levels, with a mapping relationship between the two. The following diagram explains the relationship between the two … Read more