Gemma 2: Improving Open Language Models at a Practical Size
Gemma Team
arXiv 2024
Optimizing Speculative Decoding for Serving Large Language Models Using Goodput
Xiaoxuan Liu, Cade Daniel, Langxiang Hu, Woosuk Kwon, Zhuohan Li, Xiangxi Mo, Alvin Cheung, Zhijie Deng, Ion Stoica, Hao Zhang
arXiv 2024
Efficient Memory Management for Large Language Model Serving with PagedAttention
Woosuk Kwon, Zhuohan Li, Siyuan Zhuang, Ying Sheng, Lianmin Zheng, Cody Hao Yu, Joseph E. Gonzalez, Hao Zhang, Ion Stoica
SOSP 2023
SkyPilot: An Intercloud Broker for Sky Computing
Zongheng Yang, Zhanghao Wu, Michael Luo, Wei-Lin Chiang, Romil Bhardwaj, Woosuk Kwon, Siyuan Zhuang, Frank Sifei Luan, Gautam Mittal, Scott Shenker, Ion Stoica
NSDI 2023
A Fast Post-Training Pruning Framework for Transformers
Woosuk Kwon, Sehoon Kim, Michael W. Mahoney, Joseph Hassoun, Kurt Keutzer, Amir Gholami
NeurIPS 2022
Learned Token Pruning for Transformers
Sehoon Kim, Sheng Shen, David Thorsley, Amir Gholami, Woosuk Kwon, Joseph Hassoun, Kurt Keutzer
KDD 2022
Nimble: Lightweight and Parallel GPU Task Scheduling for Deep Learning
Woosuk Kwon, Gyeong-In Yu, Eunji Jeong, Byung-Gon Chun
NeurIPS 2020 (Spotlight)
Graphene: Strong yet Lightweight Row Hammer Protection
Yeonhong Park, Woosuk Kwon, Eojin Lee, Tae Jun Ham, Jung Ho Ahn, Jae W. Lee
MICRO 2020 (IEEE Micro Top Picks Honorable Mention)