About Me

I am a Ph.D. student at UC Berkeley, advised by Prof. Ion Stoica. I am broadly interested in LLM inference and AI infrastructure.

woosuk.kwon[at]berkeley.edu

Education
  • Ph.D. in Computer Science
    UC Berkeley, 2021 - Present
  • B.S. in Computer Science and Mathematics
    Seoul National University, 2015 - 2021

Current Projects

vLLM
A high-throughput and memory-efficient inference and serving engine for LLMs.

Publications

Gemma 2: Improving Open Language Models at a Practical Size
arXiv 2024
Optimizing Speculative Decoding for Serving Large Language Models Using Goodput
arXiv 2024
Efficient Memory Management for Large Language Model Serving with PagedAttention
SOSP 2023
SkyPilot: An Intercloud Broker for Sky Computing
NSDI 2023
A Fast Post-Training Pruning Framework for Transformers
NeurIPS 2022
Learned Token Pruning for Transformers
KDD 2022
Nimble: Lightweight and Parallel GPU Task Scheduling for Deep Learning
NeurIPS 2020 (Spotlight)
Graphene: Strong yet Lightweight Row Hammer Protection
MICRO 2020 (IEEE Micro Top Picks Honorable Mention)