HPCA/CGO/PPoPP/CC 2026 Program
This program is tentative and subject to change.
Sat 31 Jan (displayed time zone: Hobart)
07:45 - 16:00 | |||
08:45 - 10:30 | MLPerf-Bench (HPCA Workshops and Tutorials) at Bilgola. Website: https://sites.google.com/g.harvard.edu/mlperf-bench-hpca26/home | ||
08:45 - 10:30 | CACHP (PPoPP Workshops and Tutorials) at Bondi. Chair(s): Jose Nelson Amaral University of Alberta, Bruce Hoppe Massachusetts Institute of Technology, Yihan Sun University of California, Riverside. Website with schedule: https://fastcode.org/events/coevolution-workshop/ | ||
08:45 - 10:30 | Opening and Keynote Talk (CC Main Conference) at Coogee. Chair(s): Uday Bondhugula Indian Institute of Science | ||
09:00 15mDay opening | Opening note from program chairs CC Main Conference Uday Bondhugula Indian Institute of Science | ||
09:15 75mKeynote | Building Compilers for AI Accelerators: Lessons from Real Hardware CC Main Conference | ||
08:45 - 10:30 | Arch4Health (HPCA Workshops and Tutorials) at Cronulla. Website with schedule: https://events.safari.ethz.ch/hpca26-arch4health/ | ||
08:45 - 10:30 | MLIR (PPoPP Workshops and Tutorials) at Curl Curl. Chair(s): Kunwar Grover AMD, Mahesh Ravishankar AMD, Saday Sadayappan University of Utah, USA | ||
10:30 - 11:00 | |||
10:30 30mCoffee break | Break Catering | ||
11:00 - 12:45 | CACHP (PPoPP Workshops and Tutorials) at Bondi. Chair(s): Jose Nelson Amaral University of Alberta, Bruce Hoppe Massachusetts Institute of Technology, Yihan Sun University of California, Riverside. Website with schedule: https://fastcode.org/events/coevolution-workshop/ | ||
11:00 - 12:45 | |||
11:00 26mTalk | GraalMHC: ML-Based Method-Hotness Classification for Binary-Size Reduction in Optimizing Compilers CC Main Conference Milan Cugurovic Oracle and University of Belgrade, Aleksandar Prokopec Oracle Labs, Boris Spasojevic Oracle Labs, Zurich, Switzerland, Vojin Jovanovic Oracle Labs, Milena Vujosevic Janicic University of Belgrade and Oracle | ||
11:26 26mTalk | It’s about Time - Temporal Abstractions for Asynchronous GPU Tensor Computations CC Main Conference | ||
11:52 26mTalk | Optimizing Sparse Tensor Compilation for Sparse Output CC Main Conference Shideh Hashemian University of Edinburgh, Michael F. P. O'Boyle University of Edinburgh, Amir Shaikhha University of Edinburgh | ||
12:18 26mTalk | RIFS: Run-time Invariant Function Specialization CC Main Conference Saba Jamilan University of California, Santa Cruz, Snehasish Kumar Google LLC, Heiner Litz UC Santa Cruz | ||
11:00 - 12:45 | Arch4Health (HPCA Workshops and Tutorials) at Cronulla. Website with schedule: https://events.safari.ethz.ch/hpca26-arch4health/ | ||
11:00 - 12:45 | MLIR (PPoPP Workshops and Tutorials) at Curl Curl. Chair(s): Kunwar Grover AMD, Mahesh Ravishankar AMD, Saday Sadayappan University of Utah, USA | ||
12:45 - 13:45 | |||
12:45 60mLunch | Lunch Catering | ||
13:45 - 15:30 | Optimizations for safety and more (CC Main Conference) at Coogee. Chair(s): V Krishna Nandivada IIT Madras | ||
13:45 26mTalk | DiTOX: Fault Detection and Localization in the ONNX Optimizer CC Main Conference | ||
14:11 26mTalk | SSMR: Statically Detecting Speculation Safe Memory Regions to Mitigate Transient Execution Attacks CC Main Conference Ange-Thierry Ishimwe University of Colorado Boulder, Sam Mcdiarmid-sterling University of Colorado Boulder, Zack McKevitt University of Colorado Boulder, Tamara Silbergleit Lehman University of Colorado Boulder | ||
14:37 26mTalk | CHEHAB: Automatic Compiler Code Optimization for Fully Homomorphic Encryption CC Main Conference Riyadh Baghdadi New York University Abu Dhabi, Abdessamed Seddiki New York University Abu Dhabi and Ecole Superieure d'Informatique, Arab Mohammed New York University Abu Dhabi and Ecole Superieure d'Informatique, Zakaria Hebbal Ecole nationale Supérieure d'Informatique, Aimad Chabounia Ecole Superieure d'Informatique; New York University Abu Dhabi, Eduardo Chielle New York University Abu Dhabi, Michail Maniatakos New York University Abu Dhabi, MENACER Djamel Eddine Ecole Superieure d'Informatique, Karima Benatchba Ecole Nationale Supérieure d'Informatique, Challal Yacine University of Doha for Science and Technology | ||
15:03 26mTalk | Parallel and Customizable Equality Saturation CC Main Conference Jonathan Van der Cruysse McGill University, Abd-El-Aziz Zayed McGill University, Mai Jacob Peng McGill University, Christophe Dubach McGill University | ||
13:45 - 15:30 | Arch4Health (HPCA Workshops and Tutorials) at Cronulla. Website with schedule: https://events.safari.ethz.ch/hpca26-arch4health/ | ||
13:45 - 15:30 | MLIR (PPoPP Workshops and Tutorials) at Curl Curl. Chair(s): Kunwar Grover AMD, Mahesh Ravishankar AMD, Saday Sadayappan University of Utah, USA | ||
15:30 - 16:00 | |||
15:30 30mCoffee break | Break Catering | ||
16:00 - 17:45 | |||
16:00 26mTalk | Accelerating Sparse Algebra with Program Synthesis CC Main Conference José Wesley De Souza Magalhães University of Edinburgh, Shideh Hashemian University of Edinburgh, Alexander Brauckmann University of Edinburgh, Jackson Woodruff University of Edinburgh, Elizabeth Polgreen University of Edinburgh, Michael F. P. O'Boyle University of Edinburgh | ||
16:26 26mTalk | Schedgehammer: Auto-Tuning Compiler Optimizations Beyond Numerical Parameters CC Main Conference Johannes Lenfers University of Münster, Martin Lücke AMD, Sven Spehr University of Münster, Justus Dieckmann University of Münster, Johannes Jansen University of Münster, Sergei Gorlatch University of Muenster | ||
16:52 26mTalk | TinyGen: Portable and Compact Code Generation for Tiny Machine Learning CC Main Conference | ||
17:18 26mTalk | CPerfSmith - A Randomized C Program Generator for Performance-Oriented Compiler Testing CC Main Conference Boda Yashwanth Indian Institute of Technology Roorkee, Chunduri Abhijit Indian Institute of Technology Roorkee, Ruchi Kumari Indian Institute of Technology Roorkee, Awanish Pandey IIT Roorkee | ||
16:00 - 17:45 | Arch4Health (HPCA Workshops and Tutorials) at Cronulla. Website with schedule: https://events.safari.ethz.ch/hpca26-arch4health/ | ||
16:00 - 17:45 | MLIR (PPoPP Workshops and Tutorials) at Curl Curl. Chair(s): Kunwar Grover AMD, Mahesh Ravishankar AMD, Saday Sadayappan University of Utah, USA | ||
Sun 1 Feb (displayed time zone: Hobart)
07:45 - 19:00 | |||
08:45 - 10:30 | LATHC (CGO Workshops and Tutorials) at Bronte. Website with schedule: https://jnamaral.github.io/LATHC26/ | ||
08:45 - 10:30 | MCCSys (HPCA Workshops and Tutorials) at Collaroy. Website with schedule: https://events.safari.ethz.ch/hpca26-MCCSys/ | ||
08:45 - 10:30 | |||
08:45 20mTalk | Inside VOLT: Designing of an Open-Source GPU Compiler (Tool) CC Main Conference Shinnung Jeong Georgia Institute of Technology, Chihyo Ahn Georgia Tech, Huanzhi Pu Georgia Institute of Technology, Jisheng Zhao Georgia Institute of Technology, Hyesoon Kim Georgia Institute of Technology, Blaise Tine University of California, Los Angeles | ||
09:05 20mTalk | Nsight Python: A Python-First Profiling Toolkit for Seamless GPU Kernel Analysis (Tool) CC Main Conference | ||
09:30 60mPanel | Panel: The role of compilers in the era of AI chips and programming frameworks CC Main Conference Panelists: Ayal Zaks Mobileye, Albert Cohen Google DeepMind, Nicholas Smith Tenstorrent, Uday Bondhugula Indian Institute of Science | ||
08:45 - 10:30 | ScaleDNN (PPoPP Workshops and Tutorials) at Curl Curl. Chair(s): Dhabaleswar K. Panda Ohio State University, Nawras Alnaasan Ohio State University. Website with schedule: https://nowlab.cse.ohio-state.edu/tutorials/hidl_PPoPP26/ | ||
10:30 - 11:00 | |||
10:30 30mCoffee break | Break Catering | ||
11:00 - 12:45 | LATHC (CGO Workshops and Tutorials) at Bronte. Website with schedule: https://jnamaral.github.io/LATHC26/ | ||
11:00 - 12:45 | MCCSys (HPCA Workshops and Tutorials) at Collaroy. Website with schedule: https://events.safari.ethz.ch/hpca26-MCCSys/ | ||
11:00 - 12:45 | |||
11:00 26mTalk | HORIZON: Estimating Alias Analysis Precision Bounds and Their Impact on Performance CC Main Conference | ||
11:26 26mTalk | Type Deduction Analysis: Reconstructing Transparent Pointer Types in LLVM-IR CC Main Conference Niccolò Nicolosi Politecnico di Milano, Gabriele Magnani Politecnico di Milano, Emilio Corigliano Politecnico di Milano, Davide Baroffio Politecnico di Milano, Federico Reghenzani Politecnico di Milano, Giovanni Agosta Politecnico di Milano, Italy | ||
11:52 26mTalk | Compact Representation and Interleaved Solving for Scalable Constraint-Based Points-to Analysis CC Main Conference | ||
12:18 26mTalk | Practical MHP Analysis for Java CC Main Conference | ||
11:00 - 12:45 | ScaleDNN (PPoPP Workshops and Tutorials) at Curl Curl. Chair(s): Dhabaleswar K. Panda Ohio State University, Nawras Alnaasan Ohio State University. Website with schedule: https://nowlab.cse.ohio-state.edu/tutorials/hidl_PPoPP26/ | ||
12:45 - 13:45 | |||
12:45 60mLunch | Lunch Catering | ||
13:45 - 15:30 | DiffPP (PPoPP Workshops and Tutorials) at Bondi. Chair(s): Paul Hovland Argonne National Laboratory, Jan Hueckelheim Argonne National Laboratory. Website with schedule: https://diffprog-ppopp.github.io/ | ||
13:45 - 15:30 | LATHC (CGO Workshops and Tutorials) at Bronte. Website with schedule: https://jnamaral.github.io/LATHC26/ | ||
13:45 - 15:30 | DDRP (PPoPP Workshops and Tutorials) at Bungan. Chair(s): Umang Mathur National University of Singapore, Andreas Pavlogiannis Aarhus University. More information at https://sites.google.com/view/race-prediction-tutorial. | ||
13:45 - 15:30 | MCCSys (HPCA Workshops and Tutorials) at Collaroy. Website with schedule: https://events.safari.ethz.ch/hpca26-MCCSys/ | ||
13:45 - 15:30 | DPC4 (HPCA Workshops and Tutorials) at Curl Curl. Website with schedule: https://sites.google.com/view/dpc4-2026/ | ||
15:30 - 16:00 | |||
15:30 30mCoffee break | Break Catering | ||
16:00 - 17:45 | DiffPP (PPoPP Workshops and Tutorials) at Bondi. Chair(s): Paul Hovland Argonne National Laboratory, Jan Hueckelheim Argonne National Laboratory. Website with schedule: https://diffprog-ppopp.github.io/ | ||
16:00 - 17:45 | LATHC (CGO Workshops and Tutorials) at Bronte. Website with schedule: https://jnamaral.github.io/LATHC26/ | ||
16:00 - 17:45 | DDRP (PPoPP Workshops and Tutorials) at Bungan. Chair(s): Umang Mathur National University of Singapore, Andreas Pavlogiannis Aarhus University. More information at https://sites.google.com/view/race-prediction-tutorial. | ||
16:00 - 17:45 | MCCSys (HPCA Workshops and Tutorials) at Collaroy. Website with schedule: https://events.safari.ethz.ch/hpca26-MCCSys/ | ||
16:00 - 17:45 | DPC4 (HPCA Workshops and Tutorials) at Curl Curl. Website with schedule: https://sites.google.com/view/dpc4-2026/ | ||
18:00 - 20:00 | Welcome Reception (Catering) at Parkside Ballroom. All attendees registered for the main conference are invited to attend the welcome reception from 18:00 on Sunday evening, where there will be great food and drink and an opportunity to engage with the vibrant HPCA/CGO/PPoPP/CC community. | ||
18:00 2hSocial Event | Welcome Reception Catering | ||
18:00 - 20:00 | |||
18:00 2hPoster | Tensor Abstraction Enabling Explicit Layout Optimization in Homomorphic Encryption CGO Student Research Competition | ||
18:00 2hPoster | UniCon: Unified Controllers for the Quantum Computers CGO Student Research Competition Ercüment Kaya Technical University of München and Leibniz Supercomputing Centre, Hossam Ahmed Technical University of München and Leibniz Supercomputing Centre, Martin Schulz Technical University of Munich | ||
18:00 2hPoster | MDH-DSL: Reduction-Aware Data Parallelism via Multi-Dimensional Homomorphisms CGO Student Research Competition | ||
18:00 2hPoster | Effective Tiling for the Snitch Cluster CGO Student Research Competition | ||
18:00 2hPoster | Automated Adversarial Test Generation for Debugging Neural Compiler Optimizations CGO Student Research Competition Vasu Jindal Columbia University | ||
18:00 2hPoster | Unlocking Vectorization Scope: Extensible Vectorization via Unified Dependence Semantics CGO Student Research Competition | ||
18:00 2hPoster | Unifying Medium Sparse Processing Frameworks CGO Student Research Competition | ||
18:00 2hPoster | Bridging Linalg Dialect with Gemmini Backend CGO Student Research Competition | ||
18:00 2hPoster | Leveraging Alias Analysis Without Porting CGO Student Research Competition Ravikiran Ravindranath Reddy University of Murcia, Alberto Ros University of Murcia, Alexandra Jimborean University of Murcia | ||
Mon 2 Feb (displayed time zone: Hobart)
07:45 - 16:00 | |||
08:30 - 08:45 | Welcome (Plenary Keynotes) at Pyrmont. Chair(s): Steve Blackburn Google and Australian National University, Tony Hosking Australian National University, Shuaiwen Leon Song Together AI and University of Sydney. The conference will formally open with a Welcome to Country from a Traditional Owner of the Eora Nation, where the ICC is located. Following that, the General Chairs will welcome you. | ||
08:30 15mDay opening | Welcome Plenary Keynotes Steve Blackburn Google and Australian National University, Tony Hosking Australian National University, Shuaiwen Leon Song Together AI and University of Sydney | ||
08:45 - 09:45 | 2025 ACM/IEEE-CS Ken Kennedy Award (Plenary Keynotes) at Pyrmont. Chair(s): Steve Blackburn Google and Australian National University | ||
08:45 60mKeynote | Compiler 2.0: Building the Next Generation Compilers with Machine Learning Plenary Keynotes Saman Amarasinghe Massachusetts Institute of Technology | ||
09:50 - 11:10 | Cache Coherence and Chiplet Interconnects (HPCA Main Conference) at Collaroy. Chair(s): Alberto Ros University of Murcia | ||
09:50 20mTalk | $C^3$: CXL Coherence Controllers for Heterogeneous Architectures HPCA Main Conference Anatole Lefort Technical University of Munich (TUM), David Schall Technical University of Munich, Nicolò Carpentieri Technical University of Munich, Julian Pritzi Technical University of Munich, Soham Chakraborty TU Delft, Nicolai Oswald NVIDIA, Pramod Bhatotia TU Munich Pre-print | ||
10:10 20mTalk | Cohet: A CXL-Driven Coherent Heterogeneous Computing Framework with Hardware-Calibrated Full-System Simulation HPCA Main Conference Yanjing Wang National University of Defense Technology, Lizhou Wu National University of Defense Technology, Sunfeng Gao National University of Defense Technology, Yibo Tang National University of Defense Technology, Junhui Luo National University of Defense Technology, Zicong Wang National University of Defense Technology, Yang Ou National University of Defense Technology, Dezun Dong NUDT, Nong Xiao National University of Defense Technology & Sun Yat-sen University, Mingche Lai National University of Defense Technology | ||
10:30 20mTalk | PhasedStore: Supporting High-performance Write-through Cache-coherence Protocols under TSO HPCA Main Conference Burak Ocalan University of Illinois Urbana-Champaign, Chloe Alverti University of Illinois at Urbana-Champaign, Shashwat Jaiswal University of Illinois Urbana-Champaign, USA, Antonis Psistakis University of Illinois Urbana-Champaign, David Koufaty Unaffiliated, Suyash Mahar UC San Diego, Steven Swanson University of California San Diego, Josep Torrellas University of Illinois at Urbana-Champaign | ||
10:50 20mTalk | Deadlock-Free Bridge Module for Inter-Chiplet Communication in Open Chiplet Ecosystem HPCA Main Conference Zhiqiang Chen National University of Defense Technology, Wenwen Fu National University of Defense Technology, Yongwen Wang National University of Defense Technology, Hongwei Zhou National University of Defense Technology | ||
09:50 - 11:10 | |||
09:50 20mTalk | Focus: A Streaming Concentration Architecture for Efficient Vision-Language Models HPCA Main Conference Chiyue Wei Duke University, Cong Guo Duke University, Junyao Zhang Duke University, Haoxuan Shan Duke University, Yifan Xu Duke University, Ziyue Zhang Duke University, Yudong Liu Duke University, Qinsi Wang Duke University, Changchun Zhou Duke University, Hai "Helen" Li Duke University, Yiran Chen Duke University | ||
10:10 20mTalk | LoCaLUT: Harnessing Capacity–Computation Tradeoffs for LUT-Based Inference in DRAM-PIM HPCA Main Conference Junguk Hong Seoul National University, Changmin Shin Seoul National University, Sukjin Kim Seoul National University, Si Ung Noh Seoul National University, Taehee Kwon Seoul National University, Seongyeon Park Seoul National University, Hanjun Kim Yonsei University, Youngsok Kim Yonsei University, Jinho Lee Seoul National University | ||
10:30 20mTalk | RPU - A Reasoning Processing Unit HPCA Main Conference Matthew Adiletta Harvard University, David Brooks Harvard University, Gu-Yeon Wei Harvard University | ||
10:50 20mTalk | PinDrop: Breaking the Silence on SDCs in a Large-Scale Fleet HPCA Main Conference Peter W. Deutsch Massachusetts Institute of Technology/Meta, Harish D. Dixit Meta, Gautham Vunnam Meta, Carl Moran Meta, Eleanor Ozer Meta, Sriram Sankar Meta | ||
09:50 - 11:10 | Homomorphic Encryption Acceleration (HPCA Main Conference) at Cronulla. Chair(s): Jung Ho Ahn Seoul National University | ||
09:50 20mTalk | UniFHE: Faster Accelerator for FHE with Diverse Algebraic Structure and Balanced Memory System HPCA Main Conference Qingyun Niu Key Laboratory of Cyberspace Security Defense, Institute of Information Engineering, CAS and School of Cyber Security, University of Chinese Academy of Sciences, Lutan Zhao State Key Laboratory of Cyberspace Security Defense, Institute of Information Engineering, CAS, Ming Cai Key Laboratory of Cyberspace Security Defense, Institute of Information Engineering, CAS and School of Cyber Security, University of Chinese Academy of Sciences, kai li Institute of Information Engineering, CAS, Dan Meng Institute of Information Engineering at Chinese Academy of Sciences; University of Chinese Academy of Sciences, Rui Hou Institute of Information Engineering, CAS | ||
10:10 20mTalk | Leveraging ASIC AI Chips for Homomorphic Encryption HPCA Main Conference Jianming Tong Georgia Institute of Technology, Tianhao Huang MIT, Leo de Castro MIT, Anirudh Itagi Georgia Institute of Technology, Jingtian Dang Georgia Tech, Anupam Golder Georgia Institute of Technology, Asra Ali Google, Jevin Jiang Google, Jeremy Kun Google, Arvind Massachusetts Institute of Technology, G. Edward Suh Cornell University, USA, Tushar Krishna Georgia Institute of Technology Pre-print | ||
10:30 20mTalk | CROPHE: Cross-Operator Dataflow Optimization for Fully Homomorphic Encryption Accelerators HPCA Main Conference Xinhua Chen Fudan University, Jiangbin Dong Xi'an Jiaotong University, Hongren Zheng Tsinghua University, Tian Tang Tsinghua University, Mingyu Gao Tsinghua University | ||
10:50 20mTalk | Peregrine: Accelerating TFHE Bootstrapping on GPUs via Multi-Level External Product Co-Design HPCA Main Conference Haoqi He State Key Laboratory of Cyberspace Security Defense, Institute of Information Engineering, Chinese Academy of Sciences and School of Cyber Security, University of Chinese Academy of Sciences, Zhiwei Wang State Key Laboratory of Cyberspace Security Defense, Institute of Information Engineering, CAS, Lutan Zhao State Key Laboratory of Cyberspace Security Defense, Institute of Information Engineering, CAS, Dian Jiao State Key Laboratory of Cyberspace Security Defense, Institute of Information Engineering, CAS, Dan Meng Institute of Information Engineering at Chinese Academy of Sciences; University of Chinese Academy of Sciences, Rui Hou Institute of Information Engineering, CAS | ||
09:50 - 11:10 | |||
09:50 20mTalk | Binary Compatible Critical Section Delegation (Best Paper Award) PPoPP Main Conference DOI | ||
10:10 20mTalk | Hapax Locks: Scalable Value-Based Mutual Exclusion PPoPP Main Conference DOI | ||
10:30 20mTalk | Fixing Non-blocking Data Structures for Better Compatibility with Memory Reclamation Schemes PPoPP Main Conference DOI | ||
10:50 20mTalk | Multiverse: Transactional Memory with Dynamic Multiversioning PPoPP Main Conference Gaetano Coccimiglio University of Waterloo, Trevor Brown University of Waterloo, Srivatsan Ravi University of Southern California DOI | ||
11:10 - 11:30 | |||
11:10 20mCoffee break | Break Catering | ||
11:30 - 12:50 | DRAM Security and Reliability (HPCA Main Conference) at Collaroy. Chair(s): Saugata Ghose University of Illinois Urbana-Champaign | ||
11:30 20mTalk | MIRZA: Efficiently Mitigating Rowhammer with Randomization and ALERT HPCA Main Conference Hritvik Taneja Georgia Tech, Ali Hajiabadi ETH Zurich, Michele Marazzi ABB Research, Kaveh Razavi ETH Zürich, Moinuddin K. Qureshi Georgia Tech | ||
11:50 20mTalk | SALT: Track-and-Mitigate Subarrays, Not Rows, for Blast-Radius-Free Rowhammer Defense HPCA Main Conference Moinuddin K. Qureshi Georgia Tech | ||
12:10 20mTalk | ReScue: Reliable and Secure CXL Memory HPCA Main Conference Chihun Song UIUC, Austin Antony Cruz UIUC, Michael Jaemin Kim Meta, Minbok Wi Seoul National University, Gaohan Ye UIUC, Kyungsan Kim Samsung Electronics, Sangyeol Lee Samsung Electronics, Jung Ho Ahn Seoul National University, Nam Sung Kim UIUC | ||
12:30 20mTalk | Secret Caching Sauce for High-Performance Secure Memory HPCA Main Conference Xu Jiang Huazhong University of Science and Technology, Xueliang Wei Huazhong University of Science and Technology, YiFei Qu Huazhong University of Science and Technology, Dan Feng Huazhong University of Science and Technology, China, Yulai Xie Huazhong University of Science and Technology, Wei Tong Huazhong University of Science and Technology, China | ||
11:30 - 12:50 | Near-Data Processing and Storage (HPCA Main Conference) at Coogee. Chair(s): Jisung Park POSTECH (Pohang University of Science and Technology) | ||
11:30 20mTalk | PIMphony: Overcoming Bandwidth and Capacity Inefficiency in PIM-based Long-Context LLM Inference System HPCA Main Conference hyucksung kwon Hanyang University, Kyungmo Koo Hanyang University, Janghyeon Kim Hanyang University, Woongkyu Lee Hanyang University, Minjae Lee Hanyang University, Gyeonggeun Jung KAIST, Hyungdeok Lee Solution Advanced Technology, SK hynix, Yousub Jung Solution Advanced Technology, SK hynix, Jaehan Park Solution Advanced Technology, SK hynix, Yosub Song Solution Advanced Technology, SK hynix, Byeongsu Yang Solution Advanced Technology, SK hynix, Haerang Choi Solution Advanced Technology, SK hynix, Guhyun Kim Solution Advanced Technology, SK hynix, Jongsoon Won Solution Advanced Technology, SK hynix, Woojae Shin Solution Advanced Technology, SK hynix, Changhyun Kim Solution Advanced Technology, SK hynix, Shin Gyeongcheol Solution Advanced Technology, SK hynix, Yongkee Kwon Tenstorrent, Ilkon Kim Solution Advanced Technology, SK hynix, Euicheol Lim SK hynix, John Kim KAIST, Jungwook Choi Hanyang University | ||
11:50 20mTalk | Adaptive Draft Sequence Length: Enhancing Speculative Decoding Throughput on PIM-Enabled Systems HPCA Main Conference Runze Wang Huazhong University of Science and Technology, Qinggang Wang Huazhong University of Science and Technology, Haifeng Liu Huazhong University of Science and Technology, Long Zheng Huazhong University of Science and Technology, XIAOFEI LIAO Huazhong University of Science and Technology, Hai Jin Huazhong University of Science and Technology, Jingling Xue University of New South Wales | ||
12:10 20mTalk | Conduit: Programmer-Transparent Near-Data Processing Using Multiple Compute-Capable Resources in SSDs HPCA Main Conference Rakesh Nadig ETH Zurich, Vamanan Arulchelvan ETH Zurich, Mayank Kabra ETH Zurich, Harshita Gupta ETH Zurich, Rahul Bera ETH Zurich, Nika Mansouri Ghiasi ETH Zurich, Nanditha Rao ETH Zurich, Qingcai Jiang ETH Zurich, Andreas Kosmas Kakolyris ETH Zurich, Yu Liang ETH Zurich, Mohammad Sadrosadati ETH Zürich, Onur Mutlu ETH Zurich | ||
12:30 20mTalk | N-DIPPER: A Distributed Inter-die Peak Power Management Network for NAND Systems HPCA Main Conference | ||
11:30 - 12:50 | Scheduling and Load Balancing (PPoPP Main Conference) at Pyrmont. Chair(s): V Krishna Nandivada IIT Madras | ||
11:30 20mTalk | Rethinking Thread Scheduling under Oversubscription: A User-Space Framework for Coordinating Multi-runtime and Multi-process Workloads (Best Paper Nominee) PPoPP Main Conference DOI | ||
11:50 20mTalk | Waste-Efficient Work Stealing PPoPP Main Conference Kyle Singer Massachusetts Institute of Technology, Kunal Agrawal Washington University in St. Louis, TB Schardl Massachusetts Institute of Technology DOI | ||
12:10 20mTalk | DiggerBees: Depth First Search Leveraging Hierarchical Block-Level Stealing on GPUs PPoPP Main Conference Yuyao Niu Barcelona Supercomputing Center, Yuechen Lu China University of Petroleum-Beijing, Weifeng Liu China University of Petroleum-Beijing, Marc Casas Barcelona Supercomputing Center DOI | ||
12:30 20mTalk | PANA: A Fine-Grained Runtime-Adaptive Load Balancing for Parallel SpMV on Multicore CPUs PPoPP Main Conference Haodong Bian Tsinghua University, Youhui Zhang Tsinghua University, Xiang Fei Tsinghua University, Jianqiang Huang Qinghai University, Xiaoying Wang Qinghai University DOI | ||
12:50 - 14:10 | |||
12:50 80mLunch | Lunch Catering | ||
14:10 - 15:30 | |||
14:10 20mTalk | Flow-Graph-Aware Tiling and Rescheduling for Memory-Efficient On-Device Inference CGO Main Conference Pre-print | ||
14:30 20mTalk | VFlatten: Selective Value-Object Flattening using Hybrid Static and Dynamic Analysis CGO Main Conference Arjun H. Kumar IIT Mandi, Bhavya Hirani SVNIT, Surat, Hang Shao IBM, Tobi Ajila IBM, Vijay Sundaresan IBM Canada, Daryl Maier IBM Canada, Manas Thakur IIT Bombay Pre-print Media Attached | ||
14:50 20mTalk | FRUGAL: Pushing GPU Applications beyond Memory Limits CGO Main Conference Lingqi Zhang RIKEN RCCS, Tengfei Wang Google Cloud, Jiajun Huang University of California, Riverside, Chen Zhuang Tokyo Institute of Technology, Riken Center for Computational Science, Ivan Ivanov Institute of Science Tokyo, Peng Chen RIKEN RCCS, Toshio Endo, Mohamed Wahib RIKEN Center for Computational Science Pre-print | ||
15:10 20mTalk | Automatic Data Enumeration for Fast Collections CGO Main Conference Pre-print Media Attached | ||
14:10 - 15:30 | |||
14:10 20mTalk | Predicting DRAM Failures at Scale: A Two-Stage Approach for Heterogeneous Systems HPCA Main Conference Chenglin Wang Xiamen University, Shouxin Wang Xiamen University, Shuyue Zhou Xiamen University, Ronglong Wu Xiamen University, Zhirong Shen Xiamen University, Lu Tang Xiamen University, Yiming Zhang Xiamen University, Jialiang Yu Huawei, Min Zhou Huawei | ||
14:30 20mTalk | MemSOS: OS-Guided Selective Memory Mirroring HPCA Main Conference Junghoon Kim Seoul National University & Samsung Electronics, Jongheon Jeong Seoul National University, Seokwon Moon Seoul National University, Seong Hoon Seo Seoul National University, Yeonhong Park Seoul National University, Jinkyu Jeong Yonsei University, Nam Sung Kim UIUC, Jae W. Lee Seoul National University | ||
14:50 20mTalk | ASPA: Reassigning DDR5 Parity Bandwidth HPCA Main Conference Fan Li University of Central Florida, Qiufeng Li George Washington University, Yanan Guo University of Rochester, Weidong Cao George Washington University, Xin Xin University of Central Florida | ||
15:10 20mTalk | HR-DCIM: High-Reliability Floating-Point Digital CIM Architecture with Unified Low-Cost Iterative Error Correction HPCA Main Conference Zhen He Tsinghua University, Yiqi Wang Tsinghua University, Zhiheng Yue Tsinghua University, Zihan Wu Tsinghua University, Huiming Han Tsinghua University, Shaojun Wei Tsinghua University, Yang Hu Tsinghua University, Fengbin Tu The Hong Kong University of Science and Technology, Shouyi Yin Tsinghua University | ||
14:10 - 15:30 | LLM Inference Serving Systems (HPCA Main Conference) at Coogee. Chair(s): Jian Li Chinese Academy of Meteorological Sciences | ||
14:10 20mTalk | Towards Resource-Efficient Serverless LLM Inference with SLINFER HPCA Main Conference | ||
14:30 20mTalk | ELORA: Efficient LoRA and KV Cache Management for Multi-LoRA LLM Serving HPCA Main Conference Jiuchen Shi Shanghai Jiao Tong University & The Hong Kong Polytechnic University, Hang Zhang Shanghai Jiao Tong University, Yixiao Wang Shanghai Jiao Tong University, Quan Chen Shanghai Jiao Tong University, China, Yizhou Shan Huawei Cloud, Kaihua Fu Hong Kong University of Science and Technology, Wei Wang Hong Kong University of Science and Technology, Minyi Guo Shanghai Jiao Tong University | ||
14:50 20mTalk | PASCAL: A Phase-Aware Scheduling Algorithm for Serving Reasoning-based Large Language Models HPCA Main Conference | ||
15:10 20mTalk | The Cost of Dynamic Reasoning: Demystifying AI Agents and Test-Time Scaling from an AI Infrastructure Perspective HPCA Main Conference | ||
14:10 - 15:30 | Quantum Compilation and Simulation (HPCA Main Conference) at Cronulla. Chair(s): Gokul Subramanian Ravi University of Michigan | ||
14:10 20mTalk | CLINE: Improving Control Flow Compilation of Quantum Programs with Control Line Encoding HPCA Main Conference Anbang Wu Shanghai Jiao Tong University, Liqiang Lu Zhejiang University, Jianwei Yin Zhejiang University, Jingwen Leng Shanghai Jiao Tong University, Minyi Guo Shanghai Jiao Tong University | ||
14:30 20mTalk | Fully Parallelized BP Decoding for Quantum LDPC Codes Can Outperform BP-OSD HPCA Main Conference Ming Wang North Carolina State University, Ang Li Pacific Northwest National Laboratory, Frank Mueller North Carolina State University, USA | ||
14:50 20mTalk | DC-MBQC: A Distributed Quantum Compilation Framework for Measurement-Based Quantum Computing HPCA Main Conference Yecheng Xue Peking University, Rui Yang Peking University, Zhiding Liang The Chinese University of Hong Kong, Tongyang Li Peking University | ||
15:10 20mTalk | TraceQ: Trace-Based Reconstruction of Quantum Circuit Dataflow in Surface-Code Fault-Tolerant Quantum Computing HPCA Main Conference Theodoros Trochatos Yale University, Christopher Kang University of Chicago, Andrew Wang Cornell University, Frederic T. Chong University of Chicago, Jakub Szefer Northwestern University | ||
14:10 - 15:30 | |||
14:10 20mTalk | UFO Trees: Practical and Provably-Efficient Parallel Batch-Dynamic Trees (Best Paper Nominee) PPoPP Main Conference Quinten De Man University of Maryland, Atharva Sharma University of Maryland, Kishen N Gowda University of Maryland, Laxman Dhulipala University of Maryland, College Park DOI | ||
14:30 20mTalk | Sharded Elimination and Combining for Highly-Efficient Concurrent Stacks PPoPP Main Conference Ajay Singh FORTH ICS, Nikos Metaxakis, Panagiota Fatourou FORTH ICS and University of Crete, Greece DOI | ||
14:50 20mTalk | Concurrent Balanced Augmented Trees PPoPP Main Conference Evan Wrench University of British Columbia, Ajay Singh FORTH ICS, Younghun Roh Massachusetts Institute of Technology, Panagiota Fatourou University of Crete & FORTH, Siddhartha Jayanti Google Research, Eric Ruppert York University, Yuanhao Wei University of British Columbia DOI | ||
15:10 20mTalk | Parallel Dynamic Spatial Indexes PPoPP Main Conference Ziyang Men University of California, Riverside, Bo Huang University of California, Riverside, Yan Gu University of California, Riverside, Yihan Sun University of California, Riverside DOI | ||
15:30 - 15:50 | |||
15:30 20mCoffee break | Break Catering | ||
15:50 - 17:10 | Parallelization / Vectorization (CGO Main Conference) at Bronte. Chair(s): V Krishna Nandivada IIT Madras | ||
15:50 20mTalk | Enabling Automatic Compiler-Driven Vectorization of Transformers CGO Main Conference Shreya Alladi University of Murcia, Alberto Ros University of Murcia, Alexandra Jimborean University of Murcia Pre-print Media Attached | ||
16:10 20mTalk | Unlocking Python Multithreading Capabilities using OpenMP-Based Programming with OMP4Py CGO Main Conference César Piñeiro University of Santiago de Compostela, Juan C. Pichel University of Santiago de Compostela Pre-print Media Attached | ||
16:30 20mTalk | The Parallel-Semantics Program Dependence Graph for Parallel Optimization CGO Main Conference Yian Su Northwestern University, Brian Homerding Northwestern University, Haocheng Gao Northwestern University, Federico Sossai Northwestern University, Yebin Chon Princeton University, David I. August Princeton University, Simone Campanoni Google / Northwestern University Pre-print Media Attached | ||
16:50 20mTalk | From Threads to Tiles: T2T, a Compiler for CUDA-to-NPU Translation via 2D Vectorization CGO Main Conference Shuaijiang Li Institute of Computing Technology at Chinese Academy of Sciences, Jiacheng Zhao Institute of Computing Technology at Chinese Academy of Sciences; University of Chinese Academy of Sciences; Zhongguancun Laboratory, Ying Liu Institute of Computing Technology, Chinese Academy of Sciences, Shuoming Zhang Institute of Computing Technology at Chinese Academy of Sciences, Lei Chen University of Chinese Academy of Sciences, Yijin Li Institute of Computing Technology at Chinese Academy of Sciences, Yangyu Zhang Institute of Computing Technology, Chinese Academy of Sciences, lizhicheng Institute of Computing Technology at Chinese Academy of Sciences, Runyu Zhou Institute of Computing Technology at Chinese Academy of Sciences, Xiyu Shi Institute of Computing Technology at Chinese Academy of Sciences, Chunwei Xia University of Leeds, Yuan Wen University of Aberdeen, Xiaobing Feng ICT CAS, Huimin Cui Institute of Computing Technology, Chinese Academy of Sciences Pre-print | ||
15:50 - 17:10 | Processing-in-Memory Architectures HPCA Main Conference at Collaroy Chair(s): Byeongho Kim Samsung Electronics | ||
15:50 20mTalk | The Memory Processing Unit: A Generalized Interface for End-to-End In-Memory Execution HPCA Main Conference Minh S. Q. Truong Carnegie Mellon University, Yiqiu Sun University of Illinois Urbana-Champaign, Dawei Xiong University of Illinois Urbana-Champaign, Amol Shah University of Illinois Urbana-Champaign, Alex Glass Carnegie Mellon University, Abraham Farrell University of Illinois Urbana-Champaign, James A. Bain Carnegie Mellon University, L. Richard Carley Carnegie Mellon University, Saugata Ghose University of Illinois Urbana-Champaign Link to publication | ||
16:10 20mTalk | CoCoTree: A Computation-Capable Architecture for Collective Communication in Scalable PIM HPCA Main Conference Shunchen Shi Institute of Computing Technology, Chinese Academy of Sciences; University of Chinese Academy of Sciences, Qijia Yang Institute of Computing Technology, Chinese Academy of Sciences; University of Chinese Academy of Sciences, Fan Yang Institute of Computing Technology, Chinese Academy of Sciences, Yu Huang Huazhong University of Science and Technology, Youwei Zhuo Peking University, Zhichun Li Institute of Computing Technology, Chinese Academy of Sciences; University of Chinese Academy of Sciences, Ninghui Sun State Key Laboratory of Computer Architecture, Institute of Computing Technology, Chinese Academy of Sciences, University of Chinese Academy of Sciences, Xueqi Li State Key Lab of Processors, Institute of Computing Technology, CAS | ||
16:30 20mTalk | PIM-malloc: A Fast and Scalable Dynamic Memory Allocator for Processing-In-Memory (PIM) Architectures HPCA Main Conference | ||
16:50 20mTalk | Count2Multiply: Reliable In-Memory High-Radix Counting HPCA Main Conference Joao Paulo Cardoso de Lima TU Dresden, ScaDS.AI, Benjamin F. Morris III Duke University, Asif Ali Khan TU Dresden, Germany, Jeronimo Castrillon TU Dresden, Germany, Alex Jones Syracuse University | ||
15:50 - 17:10 | Efficient LLM Inference Techniques HPCA Main Conference at Coogee Chair(s): Jovan Stojkovic University of Illinois at Urbana-Champaign | ||
15:50 20mTalk | PADE: A Predictor-Free Sparse Attention Accelerator via Unified Execution and Stage Fusion HPCA Main Conference Huizheng Wang Tsinghua University, Hongbin Wang Tsinghua University, Zichuan Wang Tsinghua University, Zhiheng Yue Tsinghua University, Yang Wang Tsinghua University, Chao Li Shanghai Jiao Tong University, Yang Hu Tsinghua University, Shouyi Yin Tsinghua University | ||
16:10 20mTalk | AQPIM: Breaking the PIM Capacity Wall for LLMs with In-Memory Activation Quantization HPCA Main Conference Kosuke Matsushima Institute of Science Tokyo, Yasuyuki Okoshi Institute of Science Tokyo, Masato Motomura Institute of Science Tokyo, Daichi Fujiki Institute of Science Tokyo | ||
16:30 20mTalk | BitDecoding: Unlocking Tensor Cores for Long-Context LLMs with Low-Bit KV Cache HPCA Main Conference Dayou Du University of Edinburgh, Shijie Cao Microsoft Research, Jianyi Cheng University of Edinburgh, UK, Luo Mai University of Edinburgh, Ting Cao Institute for AI Industry Research (AIR), Tsinghua University, Mao Yang Microsoft Research | ||
16:50 20mTalk | GyRot: Leveraging Hidden Synergy between Rotation and Fine-grained Group Quantization for Low-bit LLM Inference HPCA Main Conference | ||
15:50 - 17:10 | 3D Graphics and Rendering Acceleration HPCA Main Conference at Cronulla Chair(s): Yunho Oh Korea University | ||
15:50 20mTalk | GRTX: Efficient Ray Tracing for 3D Gaussian-Based Rendering HPCA Main Conference Junseo Lee Seoul National University, Sangyun Jeon Seoul National University, Jungi Lee Seoul National University, Junyong Park Seoul National University, Jaewoong Sim Seoul National University | ||
16:10 20mTalk | Splatonic: Architecture Support for 3D Gaussian Splatting SLAM via Sparse Processing HPCA Main Conference Xiaotong Huang Shanghai Jiao Tong University, He Zhu Shanghai Jiao Tong University, Tianrui Ma Institute of Computing Technology, Chinese Academy of Sciences, Yuxiang Xiong Shanghai Jiao Tong University, Fangxin Liu Shanghai Jiao Tong University, Zhezhi He Shanghai Jiao Tong University, Yiming Gan Institute of Computing Technology, Chinese Academy of Sciences, Zihan Liu Shanghai Jiao Tong University, Jingwen Leng Shanghai Jiao Tong University, Yu Feng Shanghai Jiao Tong University, Minyi Guo Shanghai Jiao Tong University | ||
16:30 20mTalk | FractalCloud: A Fractal-Inspired Architecture for Efficient Large-Scale Point Cloud Processing HPCA Main Conference Yuzhe Fu Duke University, Changchun Zhou Duke University, Hancheng Ye Duke University, Bowen Duan Duke University, Qiyu Huang Yale University, Chiyue Wei Duke University, Cong Guo Duke University, Hai "Helen" Li Duke University, Yiran Chen Duke University | ||
16:50 20mTalk | ORANGE: Exploring Ockham's Razor for Neural Rendering by Accelerating 3DGS on NPUs with GEMM-Friendly Blending and Balanced Workloads HPCA Main Conference Haomin Li Shanghai Jiao Tong University, Yue Liang Shanghai Jiao Tong University, Fangxin Liu Shanghai Jiao Tong University, Bowen Zhu Shanghai Jiao Tong University, Zongwu Wang Shanghai Jiao Tong University, Yu Feng Shanghai Jiao Tong University, Liqiang Lu Zhejiang University, Li Jiang Shanghai Jiaotong University, Haibing Guan Shanghai Jiao Tong University | ||
15:50 - 17:10 | GPU and Heterogeneous Computing PPoPP Main Conference at Pyrmont Chair(s): Frank Mueller North Carolina State University, USA | ||
15:50 20mTalk | PRISM: An Efficient GPU-Based Lossy Compression Framework for Progressive Data Retrieval with Multi-Level Interpolation (Best Paper Nominee) PPoPP Main Conference Bing Lu Institute of Computing Technology of Chinese Academy of Sciences, Zedong Liu University of Chinese Academy of Sciences, Hairui Zhao Jilin University, Dejun Luo University of Chinese Academy of Sciences, Wenjing Huang University of Chinese Academy of Sciences, Yida Gu University of Chinese Academy of Sciences, Jinyang Liu University of Houston, Guangming Tan University of Chinese Academy of Sciences, Dingwen Tao Institute of Computing Technology, Chinese Academy of Sciences DOI | ||
16:10 20mTalk | Dynamic Detection of Inefficient Data Mapping Patterns in Heterogeneous OpenMP Applications PPoPP Main Conference Luke Marzen Iowa State University, Junhyung Shim Iowa State University, Ali Jannesari Iowa State University DOI | ||
16:30 20mTalk | Root-Down Exposure for Maximal Clique Enumeration on GPUs PPoPP Main Conference DOI | ||
16:50 20mTalk | ROME: Maximizing GPU Efficiency for All-Pairs Shortest Path via Taming Fine-Grained Irregularities PPoPP Main Conference Weile Luo The Hong Kong University of Science and Technology, Guangzhou, Yuhan Chen The Hong Kong University of Science and Technology, Guangzhou, Xiangrui Yu The Hong Kong University of Science and Technology, Guangzhou, Qiang Wang Harbin Institute of Technology, Shenzhen, Ruibo Fan The Hong Kong University of Science and Technology, Guangzhou, Hongyuan Liu Stevens Institute of Technology, Xiaowen Chu The Hong Kong University of Science and Technology, Guangzhou DOI | ||
17:30 - 19:00 | Business Meeting CGO Main Conference at Bronte Chair(s): Steve Blackburn Google and Australian National University, Albert Cohen Google DeepMind, Timothy M. Jones University of Cambridge | ||
17:30 90mMeeting | CGO Business Meeting CGO Main Conference | ||
17:30 - 19:00 | |||
17:30 90mMeeting | HPCA Business Meeting HPCA Main Conference | ||
17:30 - 19:00 | Business Meeting PPoPP Main Conference at Cronulla Chair(s): Tony Hosking Australian National University, Madan Musuvathi Microsoft Research, Kenjiro Taura The University of Tokyo | ||
17:30 90mMeeting | PPoPP Business Meeting PPoPP Main Conference | ||
Tue 3 Feb Displayed time zone: Hobart
08:15 - 16:00 | |||
08:45 - 09:45 | |||
08:45 60mKeynote | Oracle Parfait – Scaling Vulnerability Detection from Enterprise Systems to Cloud-Scale Systems and Beyond Plenary Keynotes Cristina Cifuentes Oracle Software Assurance | ||
09:50 - 11:10 | |||
09:50 20mTalk | Binary Diffing via Library Signatures CGO Main Conference Andrei Rimsa CEFET-MG, Anderson Faustino da Silva State University of Maringá, Camilo Santana Melgaço Federal University of Minas Gerais, Fernando Magno Quintão Pereira Federal University of Minas Gerais Pre-print Media Attached | ||
10:10 20mTalk | BIT: Empowering Binary Analysis through the LLVM Toolchain CGO Main Conference Puzhuo Liu Ant Group & Tsinghua University, Peng Di Ant Group & UNSW, Jingling Xue University of New South Wales, Yu Jiang Tsinghua University Pre-print | ||
10:30 20mTalk | Dr.avx: A Dynamic Compilation System for Seamlessly Executing Hardware-Unsupported Vectorization Instructions CGO Main Conference Yue Tang East China Normal University, Mianzhi Wu East China Normal University, Yufeng Li East China Normal University, Haoyu Liao East China Normal University, Jianmei Guo East China Normal University, Bo Huang East China Normal University Pre-print Media Attached | ||
10:50 20mTalk | Practical: Are Abstract-Interpreter Baseline JITs Worth It? An Empirical Evaluation through Metacompilation CGO Main Conference Nahuel Palumbo Université Lille, CNRS, Centrale Lille, Inria, UMR 9189 - CRIStAL, Guillermo Polito Univ. Lille, Inria, CNRS, Centrale Lille, UMR 9189 CRIStAL, Stéphane Ducasse Inria; University of Lille; CNRS; Centrale Lille; CRIStAL, Pablo Tesone Univ. Lille, Inria, CNRS, Centrale Lille, UMR 9189 CRIStAL, Pharo Consortium Pre-print | ||
09:50 - 11:10 | |||
09:50 20mTalk | TPDE: A Fast Adaptable Compiler Back-End Framework CGO Main Conference Pre-print Media Attached | ||
10:10 20mTalk | Synthesizing Instruction Selection Back-Ends from ISA Specifications Made Practical CGO Main Conference Pre-print | ||
10:30 20mTalk | SparseX: Synergizing GPU Libraries for Sparse Matrix Multiplication on Heterogeneous Processors CGO Main Conference Ruifeng Zhang North Carolina State University, Xiangwei Wang North Carolina State University, Ang Li Pacific Northwest National Laboratory, Xipeng Shen North Carolina State University Pre-print Media Attached | ||
10:50 20mTalk | Compilation of Generalized Matrix Chains with Symbolic Sizes CGO Main Conference Pre-print Media Attached | ||
09:50 - 11:10 | |||
09:50 20mTalk | The Last-Level Branch Predictor Revisited HPCA Main Conference David Schall Technical University of Munich, Mária Ďuračková University of Edinburgh, Boris Grot University of Edinburgh, UK | ||
10:10 20mTalk | Tempranillo: Non-Speculative Early Register Release HPCA Main Conference Carlos Escuin Computing Systems Lab, Huawei Technologies Switzerland AG, Paolo Salvatore Galfano Computing Systems Laboratory, Zurich Research Center, Huawei Technologies, Switzerland, Davide Basilio Bartolini Computing Systems Laboratory, Zurich Research Center, Huawei Technologies, Switzerland, Leeor Peled Boole Labs, Tel-Aviv Research Center, Huawei Technologies, Israel, Mehdi Alipour Computing Systems Laboratory, Zurich Research Center, Huawei Technologies, Switzerland | ||
10:30 20mTalk | SMTcheck: Accurate SMT Interference Prediction to Improve Scheduling Efficiency in Datacenters HPCA Main Conference Sanghyun Kim Sungkyunkwan University, Jinhyeok Oh Sungkyunkwan University, Taehun Kim Sungkyunkwan University, Gyutae Kim Sungkyunkwan University, Youngsok Kim Yonsei University, Jaehyun Hwang Sungkyunkwan University, Joonsung Kim Sungkyunkwan University | ||
10:50 20mTalk | I-POP: Ignite Positive Prefetchers HPCA Main Conference Yiquan Lin Zhejiang University and Alibaba Group, Wenhai Lin Alibaba Group, Yiquan Chen Alibaba Group, Jiexiong Xu Zhejiang University and Alibaba Group, Shishun Cai Alibaba Group, Jiarong Ye Zhejiang University, Zonghui Wang Zhejiang University, Wenzhi Chen Zhejiang University | ||
09:50 - 11:10 | Wafer-Scale Systems for Large Models HPCA Main Conference at Coogee Chair(s): Hyesoon Kim Georgia Institute of Technology | ||
09:50 20mTalk | WATOS: Efficient LLM Training Strategies and Architecture Co-exploration for Wafer-scale Chip HPCA Main Conference Huizheng Wang Tsinghua University, Zichuan Wang Tsinghua University, Hongbin Wang Tsinghua University, Jingxiang Hou Tsinghua University, Taiquan Wei Tsinghua University, Chao Li Shanghai Jiao Tong University, Yang Hu Tsinghua University, Shouyi Yin Tsinghua University | ||
10:10 20mTalk | FACE: Fully PD Overlapped Scheduling and Multi-Level Architecture Co-Exploration on Wafer HPCA Main Conference Zheng Xu Tsinghua University, Dehao Kong Tsinghua University, Jiaxin Liu Tsinghua University, Dingcheng Jiang Tsinghua University, Xu Dai Shanghai Artificial Intelligence Laboratory, Jinyi Deng Tsinghua University, Yang Hu Tsinghua University, Shouyi Yin Tsinghua University | ||
10:30 20mTalk | TEMP: A Memory Efficient Physical-aware Tensor Partition-Mapping Framework on Wafer-scale Chips HPCA Main Conference Huizheng Wang Tsinghua University, Taiquan Wei Tsinghua University, Zichuan Wang Tsinghua University, Dingcheng Jiang Tsinghua University, Qize Yang Tsinghua University, Jiaxin Liu Tsinghua University, Jingxiang Hou Tsinghua University, Chao Li Shanghai Jiao Tong University, Jinyi Deng Tsinghua University, Yang Hu Tsinghua University, Shouyi Yin Tsinghua University | ||
10:50 20mTalk | MoEntwine: Unleashing the Potential of Wafer-scale Chips for Large-scale Expert Parallel Inference HPCA Main Conference Xinru Tang Tsinghua University, Jingxiang Hou Tsinghua University, Dingcheng Jiang Tsinghua University, Taiquan Wei Tsinghua University, Jiaxin Liu Tsinghua University, Jinyi Deng Tsinghua University, Huizheng Wang Tsinghua University, Qize Yang Tsinghua University, Haoran Shang Tsinghua University, Chao Li Shanghai Jiao Tong University, Yang Hu Tsinghua University, Shouyi Yin Tsinghua University | ||
09:50 - 11:10 | Stencil and Sparse Matrix Computation PPoPP Main Conference at Pyrmont Chair(s): Shoaib Kamil Adobe Research | ||
09:50 20mTalk | SPIDER: Unleashing Sparse Tensor Cores for Stencil Computation via Strided Swapping PPoPP Main Conference Qiqi Gu Shanghai Jiao Tong University, Chenpeng Wu Shanghai Jiao Tong University, Heng Shi, Jianguo Yao Shanghai Jiao Tong University; Shanghai Enflame Technology DOI | ||
10:10 20mTalk | ASM-SpMM: Unleashing the Potential of Arm SME for Sparse Matrix Multiplication Acceleration PPoPP Main Conference Jiazhi Jiang Sun Yat-sen University, Xijia Yao Sun Yat-sen University, Jiayu Chen Sun Yat-sen University, jinhui wei Sun Yat-sen University, Dan Huang, Yutong Lu Sun Yat-sen University DOI | ||
10:30 20mTalk | Exploiting Efficient Mapping and Pipelined Execution for Accelerating SpMV on Tensor Cores PPoPP Main Conference Kaige Zhang Beihang University, Hailong Yang Beihang University, Xin You Beihang University, Tianyu Feng Beihang University, Yufan Xu Independent Researcher, Zhongzhi Luan Beihang University, Yi Liu Beihang University, Depei Qian Beihang University DOI | ||
10:50 20mTalk | VDHA: Vector-Driven Hash Aggregation for Sparse Matrix-Sparse Vector Multiplication on GPUs PPoPP Main Conference Yuchen Li Tsinghua University, Zhe Pan Tsinghua University, Peng Qu Tsinghua University, Youhui Zhang Tsinghua University DOI | ||
11:10 - 11:30 | |||
11:10 20mCoffee break | Break Catering | ||
11:30 - 12:50 | Mixed Precision and Quantization PPoPP Main Conference at Balmoral Chair(s): Dingwen Tao Institute of Computing Technology, Chinese Academy of Sciences | ||
11:30 20mTalk | RoMeo: Mitigating Dual-dimensional Outliers with Rotated Mixed Precision Quantization PPoPP Main Conference Qihao Zhang Tsinghua University, MingLiang Tang Tsinghua University, Mingshu Zhai Tsinghua University, Kinman Lei Tsinghua University, Jidong Zhai Tsinghua University DOI | ||
11:50 20mTalk | High-Throughput Non-Uniformly Quantized 3-bit LLM Inference PPoPP Main Conference YuAng Chen Chinese University of Hong Kong, Wenqi Zeng Hong Kong University of Science and Technology, Jeffrey Xu Yu Chinese University of Hong Kong DOI | ||
12:10 20mTalk | JanusQuant: Accurate and Efficient 2-bit KV Cache Quantization for Long-Context Inference PPoPP Main Conference Chengyu Sun Wuhan University, Yaqi Xia Wuhan University, Hulin Wang, Donglin Yang Nvidia Corporation, Xiaobo Zhou University of Macau, Dazhao Cheng Wuhan University DOI | ||
12:30 20mTalk | HierCut: Enabling 16-bit Format Mixed Precision for Molecular Dynamics through Hierarchical Cutoff PPoPP Main Conference zeyu song Tsinghua University, Lin Gan Tsinghua University, Xiaohui Duan Shandong University, Jiayu Fu Tsinghua University, Zhengrui Li Tsinghua University, Yinuo Wang Tsinghua University, Guangzhao Li Chinese Academy of Sciences, Guangwen Yang Tsinghua University DOI | ||
11:30 - 12:50 | Caching and Prefetching HPCA Main Conference at Collaroy Chair(s): David Schall Technical University of Munich | ||
11:30 20mTalk | Athena: Synergizing Data Prefetching and Off-Chip Prediction via Online Reinforcement Learning HPCA Main Conference Zhenrong Lang ETH Zürich, Rahul Bera ETH Zurich, Caroline Hengartner ETH Zürich, Konstantinos Kanellopoulos ETH Zurich, Rakesh Kumar NTNU, Mohammad Sadrosadati ETH Zürich, Onur Mutlu ETH Zurich | ||
11:50 20mTalk | Streamlined On-Chip Temporal Prefetching HPCA Main Conference | ||
12:10 20mTalk | Intermittence-Aware Cache Compression HPCA Main Conference Gan Fang Purdue University, Jianping Zeng Arizona State University, Yuchen Zhou Purdue University, Changhee Jung Purdue University, USA | ||
12:30 20mTalk | TENET-v2: Applying Relation-centric Notation to Model and Optimize Data Swizzle in the Cache of Modern NPU HPCA Main Conference Hanyu Zhang Zhejiang University, Fangxu Guo Zhejiang University, Liqiang Lu Zhejiang University, Long Wang Huawei Technologies, Yunfei Du Huawei Technologies, Zhe Wang Huawei Technologies, Jinghan Zhang Huawei Technologies, Jie Zhang Peking University, Chenli Xue Zhejiang University, Chengpeng Wu Zhejiang University, Ziyi Zhang Zhejiang University, Yun Liang Peking University, Size Zheng Tsinghua University, Jianwei Yin Zhejiang University | ||
11:30 - 12:50 | Visual and Multimodal Acceleration HPCA Main Conference at Coogee Chair(s): Yu Feng Shanghai Jiao Tong University | ||
11:30 20mTalk | V-Rex: Real-Time Streaming Video LLM Acceleration via Dynamic KV Cache Retrieval HPCA Main Conference | ||
11:50 20mTalk | SFD: Towards Segment Fusion Dataflow for Spatial Accelerators HPCA Main Conference Fuyu Wang Sun Yat-sen University, Minghua Shen Sun Yat-sen University, Yufei Ding UCSD, Nong Xiao National University of Defense Technology & Sun Yat-sen University, Yutong Lu Sun Yat-sen University | ||
12:10 20mTalk | VAR-Turbo: Unlocking the Potential of Visual Autoregressive Models through Dual Redundancy HPCA Main Conference Xujiang Xiang The Hong Kong University of Science and Technology, Fengbin Tu The Hong Kong University of Science and Technology | ||
12:30 20mTalk | Cambricon-GS: An Accelerator for 3D Gaussian Splatting Training with Gaussian-Pixel Hybrid Parallelism HPCA Main Conference Rui Wen Institute of Computing Technology, Chinese Academy of Sciences, Zhifei Yue University of Science and Technology of China, Tianbo Liu University of Science and Technology of China, Xinkai Song Institute of Computing Technology, Chinese Academy of Sciences, Jin Li Institute of Computing Technology, Chinese Academy of Sciences, Di Huang Chinese Academy of Sciences, Institute of Computing Technology, Jiaming Guo Institute of Computing Technology, Chinese Academy of Sciences, Xing Hu Institute of Computing Technology, Chinese Academy of Sciences, zidong du Institute of Computing Technology, Chinese Academy of Sciences, Qi Guo Chinese Academy of Sciences, Tianshi Chen Cambricon Technologies | ||
11:30 - 12:50 | Zero-Knowledge and Private Information Retrieval HPCA Main Conference at Cronulla Chair(s): Hanjun Kim POSTECH | ||
11:30 20mTalk | zkPHIRE: A Programmable Accelerator for ZKPs over HIgh-degRee, Expressive Gates HPCA Main Conference Alhad Daftardar New York University, Jianqiao Cambridge Mo New York University, Joey Ah-kiow New York University, Benedikt Bünz New York University, Siddharth Garg New York University, Brandon Reagen New York University | ||
11:50 20mTalk | Conflux: A High-Performance Keyword Private Retrieval System for Dynamic Datasets HPCA Main Conference Zehao Chen Shandong University, Zhaoyan Shen Shandong University, Qian Wei Shandong University, Hang Lu Institute of Computing Technology, Chinese Academy of Sciences, Lei Ju Shandong University | ||
12:10 20mTalk | An Efficient and Scalable Hardware Architecture for Number Theoretic Transform on FPGA with Design Automation HPCA Main Conference Yilan Zhu Ant Group, Geng Yang Ant Group, Xingyu Tian Simon Fraser University, Dilshan Kumarathunga Simon Fraser University, Liang Kong Ant Group, Xianglong Deng UCAS, Shengyu Fan UCAS, Guang Fan Ant Group, Guiming Shi Tsinghua University, Lei Chen University of Chinese Academy of Sciences, Bo Zhang Ant Group, Yisong Chang Ant Group, Shoumeng Yan Ant Group, Zhenman Fang Simon Fraser University, Mingzhe Zhang Ant Group | ||
12:30 20mTalk | IVE: An Accelerator for Single-Server Private Information Retrieval Using a Versatile Processing Element HPCA Main Conference Sangpyo Kim Seoul National University, Hyesung Ji Seoul National University, Jongmin Kim Seoul National University, Jaiyoung Park Seoul National University, Wonseok Choi Seoul National University, Jung Ho Ahn Seoul National University Pre-print | ||
11:30 - 12:50 | Cluster and Cloud Computing PPoPP Main Conference at Pyrmont Chair(s): Ruslan Nikolaev Pennsylvania State University | ||
11:30 20mTalk | Cacheman: A Comprehensive Last-Level Cache Management System for Multi-tenant Clouds PPoPP Main Conference Xiaokang Hu Alibaba Cloud Computing, Yuchao Cao Alibaba Cloud Computing, Naixuan Guan Alibaba Cloud Computing, Yifan Wu Alibaba Cloud Computing, Xishi Qiu Alibaba Cloud Computing, Shengdong Dai Alibaba Cloud Computing, Ben Luo Alibaba Cloud Computing, Sanchuan Cheng Alibaba Cloud Computing, Fudong Qiu Alibaba Cloud Computing, Yibin Shen Alibaba Cloud, Jiesheng Wu Alibaba Cloud Computing DOI | ||
11:50 20mTalk | zBuffer: Zero-Copy and Metadata-Free Serialization for Fast RPC with Scatter-Gather Reflection PPoPP Main Conference Xiangyu Liu Xiamen University, Huiba Li Alibaba, Shun Gai Alibaba, Youmin Chen Shanghai Jiao Tong University, Yiming Zhang Xiamen University DOI | ||
12:10 20mTalk | Scaling GPU-to-CPU Migration for Efficient Distributed Execution on CPU Clusters PPoPP Main Conference DOI | ||
12:30 20mTalk | Trojan Horse: Aggregate-and-Batch for Scaling Up Sparse Direct Solvers on GPU Clusters (Best Paper Nominee) PPoPP Main Conference Yida Li China University of Petroleum-Beijing, Siwei Zhang China University of Petroleum-Beijing, Yiduo Niu China University of Petroleum-Beijing, Yang Du China University of Petroleum-Beijing, Qingxiao Sun China University of Petroleum-Beijing, Zhou Jin China University of Petroleum-Beijing, Weifeng Liu China University of Petroleum-Beijing DOI | ||
12:50 - 14:10 | |||
12:50 80mAwards | HPCA Awards Lunch Catering | ||
12:50 - 14:10 | |||
12:50 80mLunch | Lunch Catering | ||
14:10 - 15:30 | Distributed Training PPoPP Main Conference at Balmoral Chair(s): Bo Fang University of Texas at Arlington | ||
14:10 20mTalk | COCCL: A Collective Communication Library Supporting Easy Integration and Configuration of Customized Compression for Scalable LLM Training PPoPP Main Conference Xingchen Liu University of Chinese Academy of Sciences, Haoran Kong Chinese University of Hong Kong, Shenzhen, Hairui Zhao Jilin University, Shengkai Lyu University of Chinese Academy of Sciences, Zheng Wei University of Chinese Academy of Sciences, Man Liu University of Chinese Academy of Sciences, Xingjian Tian University of Chinese Academy of Sciences, Liyang Zhao University of Chinese Academy of Sciences, Zhuohan Chen University of Chinese Academy of Sciences, Fakang Wang Ant Group, Zizhong Chen Chinese University of Hong Kong, Shenzhen, Zhan Wang University of Chinese Academy of Sciences, Guangming Tan University of Chinese Academy of Sciences, Dingwen Tao Institute of Computing Technology, Chinese Academy of Sciences DOI | ||
14:30 20mTalk | Elastor: Elastic and Efficient Model Partitioning and Checkpointing for Fault-Tolerant Distributed Training PPoPP Main Conference Xuanyu Wang Peking University, Fangcheng FU Shanghai Jiao Tong University, Haoyang Li Peking University, Hao Ge Peking University, Sheng Lin Peking University, Jiawen Niu Peking University, Bin Cui Peking University DOI | ||
14:50 20mTalk | HelixPipe: Efficient Distributed Training of Long Sequence Transformers with Attention Parallel Pipeline Parallelism PPoPP Main Conference Geng Zhang National University of Singapore, Shenggan Cheng National University of Singapore, Xuanlei Zhao National University of Singapore, Ziming Liu, Yang You National University of Singapore DOI | ||
15:10 20mTalk | CCL-D: A High-Precision Diagnostic System for Slow and Hang Anomalies in Large-Scale Model Training (Best Paper Nominee) PPoPP Main Conference Yida Gu University of Chinese Academy of Sciences, Fakang Wang Ant Group, Jianhao Fu Ant Group, Zhenhang Sun Ant Group, Qianyu Zhang Ant Group, Hairui Zhao Jilin University, Xingchen Liu University of Chinese Academy of Sciences, Yang Tian Ant Group, Wenjing Huang University of Chinese Academy of Sciences, Zedong Liu University of Chinese Academy of Sciences, Yifan Chen Ant Group, Jinwu Yang University of Chinese Academy of Sciences, Yueyuan Zhou University of Chinese Academy of Sciences, Qian Zhao Ant Group, Haoxu Li University of Chinese Academy of Sciences, Tao Wang Ant Group, Feng Yu Ant Group, Zhan Wang University of Chinese Academy of Sciences, Guangming Tan University of Chinese Academy of Sciences, Dingwen Tao Institute of Computing Technology, Chinese Academy of Sciences DOI | ||
14:10 - 15:30 | |||
14:10 20mTalk | PIP: Making Andersen’s Points-to Analysis Sound and Practical for Incomplete C Programs CGO Main Conference Håvard Rognebakke Krogstie NTNU, Helge Bahmann Independent Researcher, Magnus Själander Norwegian University of Science and Technology (NTNU), Nico Reissmann Independent Researcher Pre-print Media Attached | ||
14:30 20mTalk | Thinking Fast and Correct: Automated Rewriting of Numerical Code through Compiler Augmentation CGO Main Conference Siyuan Brant Qian University of Illinois at Urbana-Champaign, Vimarsh Sathia University of Illinois Urbana Champaign, Ivan Ivanov Institute of Science Tokyo, Jan Hueckelheim Argonne National Laboratory, Paul Hovland Argonne National Laboratory, William S. Moses University of Illinois Urbana-Champaign Pre-print Media Attached | ||
14:50 20mTalk | PolyUFC: Polyhedral Compilation Meets Roofline Analysis for Uncore Frequency Capping CGO Main Conference Nilesh Rajendra Shah Indian Institute of Technology Hyderabad, India, M V V S Manoj Kumar IIT Hyderabad, Dhairya Baxi IIT Hyderabad, Ramakrishna Upadrasta IIT Hyderabad Pre-print | ||
15:10 20mTalk | Accelerating App Recompilation across Android System Updates by Code Reusing CGO Main Conference Hongtao Wu Wuhan University, Yu Chen Wuhan University, Mengfei Xie Wuhan University, Futeng Yang Guangdong OPPO Mobile Telecommunications, Jun Yan Guangdong OPPO Mobile Telecommunications, Jiang Ma OPPO Electronics Corp., Jianming Fu Wuhan University, Jason Xue MBZUAI, Qingan Li Wuhan University, China Pre-print | ||
14:10 - 15:30 | Memory Systems for Scalable Computing HPCA Main Conference at Collaroy Chair(s): Alexandros Daglis Georgia Tech | ||
14:10 20mTalk | BARD: Reducing Write Latency of DDR5 Memory by Exploiting Bank-Parallelism HPCA Main Conference | ||
14:30 20mTalk | RoMe: Row Granularity Access Memory System for Large Language Models HPCA Main Conference Hwayong Nam Seoul National University, Seungmin Baek Seoul National University, Jumin Kim Seoul National University, Michael Jaemin Kim Meta, Jung Ho Ahn Seoul National University Pre-print | ||
14:50 20mTalk | HDPAT: Hierarchical Distributed Page Address Translation for Wafer-Scale GPUs HPCA Main Conference daoxuan xu William & Mary, Ying Li William & Mary, Yuwei Sun UIUC, Jie Ren William & Mary, Yifan Sun William & Mary | ||
15:10 20mTalk | Pulse: Fine-Grained Hierarchical Hashing Index for Disaggregated Memory HPCA Main Conference Guangyang Deng Xiamen University, Zixiang Yu Xiamen University, Zhirong Shen Xiamen University, Qiangsheng Su Xiamen University, Jiwu Shu Xiamen University | ||
14:10 - 15:30 | |||
14:10 20mTalk | LILo: Harnessing the On-chip Accelerators in Intel CPUs for Compressed LLM Inference Acceleration HPCA Main Conference Hyungyo Kim UIUC, Qirong Xia UIUC, Jinghan Huang UIUC, Nachuan Wang UIUC, Jung Ho Ahn Seoul National University, Younjoo Lee Seoul National University, Wajdi K Feghali Intel, Ren Wang Intel Labs, Nam Sung Kim UIUC | ||
14:30 20mTalk | ReThermal: Co-Design of Thermal-Aware Static and Dynamic Scheduling for LLM Training on Liquid-Cooled Wafer-Scale Chips HPCA Main Conference Chengran Li Tsinghua University, Huizheng Wang Tsinghua University, Jiaxin Liu Tsinghua University, Jingyao Liu Tsinghua University, Zhiheng Yue Tsinghua University, Xia Li Shanghai AI Lab, Shenfei Jiang Shanghai AI Lab, Jinyi Deng Tsinghua University, Yang Hu Tsinghua University, Shouyi Yin Tsinghua University | ||
14:50 20mTalk | TraceRTL: Agile Performance Evaluation for Microarchitecture Exploration HPCA Main Conference Zifei Zhang SKLP, Institute of Computing Technology, Chinese Academy of Sciences; University of Chinese Academy of Sciences, Yinan Xu SKLP, Institute of Computing Technology, Chinese Academy of Sciences; University of Chinese Academy of Sciences, Sa Wang SKLP, Institute of Computing Technology, Chinese Academy of Sciences; University of Chinese Academy of Sciences, Dan Tang SKLP, Institute of Computing Technology, Chinese Academy of Sciences; Beijing Institute of Open Source Chip, Yungang Bao State Key Lab of Processors, Institute of Computing Technology, CAS; University of Chinese Academy of Sciences | ||
15:10 20mTalk | Nugget: Portable Program Snippets HPCA Main Conference Zhantong Qiu University of California, Davis, Mahyar Samani University of California, Davis, Jason Lowe-Power University of California, Davis & Google | ||
14:10 - 15:30 | |||
14:10 20mTalk | BASES: Enabling Energy-Efficient and Error-Resilient Analog CIM Acceleration via Reformation of Coding Bases HPCA Main Conference Hongrui Guo Institute of Computing Technology, Chinese Academy of Sciences, Tianrui Ma Institute of Computing Technology, Chinese Academy of Sciences, Zidong Du Institute of Computing Technology, Chinese Academy of Sciences, Mo Zou Institute of Computing Technology, Chinese Academy of Sciences, Yifan Hao ICT, Chinese Academy of Sciences, Yongwei Zhao Institute of Computing Technology, Chinese Academy of Sciences, Rui Zhang Chinese Academy of Sciences, Wei Li Institute of Software Chinese Academy of Sciences; University of Chinese Academy of Sciences, Xing Hu Institute of Computing Technology, Chinese Academy of Sciences, Zhiwei Xu Institute of Computing Technology of the Chinese Academy of Sciences, China, Qi Guo Chinese Academy of Sciences, Tianshi Chen Cambricon Technologies | ||
14:30 20mTalk | A PN-Free Digital SAT Accelerator Using Crossbar Architecture and Frequency-Controlled Counters HPCA Main Conference Zhezheng Ren University of Waterloo, Chenao Yuan University of Waterloo, Yuke Zhang University of Toronto, Shiyu Su University of Waterloo | ||
14:50 20mTalk | ESTroM: Element-Flow Architecture For Processing Sparse Tractable Probabilistic Models HPCA Main Conference Anjunyi Fan Peking University, Xuejie Liu Peking University, Anji Liu University of California, Los Angeles, Qiuping Wu Peking University, Jiaqi Yang Peking University, Yuchao Qin Peking University, Guy Van den Broeck University of California at Los Angeles, Yitao Liang Peking University, Bonan Yan Peking University | ||
15:10 20mTalk | GustavSNN: Unleashing the Power of Gustavson's Algorithm on SNN Acceleration with Column-Parallel Tick-Batch Dataflow HPCA Main Conference Sangwoo Hwang Korea University, Donghun Lee Korea University, Jahyun Koo DGIST, Jaeha Kung Korea University | ||
14:10 - 15:30 | |||
14:10 20mTalk | Pipelonk: Accelerating End-to-End Zero-Knowledge Proof Generation on GPUs for PLONK-Based Protocols PPoPP Main Conference Zhiyuan Zhang Shandong University, Yanxin Cai Shandong University, Wenhao Yin Shandong University, Xueyu Wu The University of Hong Kong, Yi Wang Shenzhen University, Lei Ju Shandong University, Zhuoran Ji Shandong University DOI | ||
14:30 20mTalk | ParDiff: Efficiently Parallelizing Reverse-Mode Automatic Differentiation with Direct Indexing PPoPP Main Conference Shuhong Huang Tsinghua University, Shizhi Tang Qingcheng.AI, Yuan Wen University of Aberdeen, Huanqi Cao Tsinghua University, Ruibai Tang Tsinghua University, Yidong Chen, Jiping Yu Tsinghua University, Yang Li Lenovo Research, Chao Jiang Lenovo Research, Limin Xiao Lenovo Research, Jidong Zhai Tsinghua University DOI | ||
14:50 20mTalk | Faster and Cheaper: Pushing the Sequence Alignment Throughput with Commercial CPUs PPoPP Main Conference Zhonghai Zhang Institute of Computing Technology, Chinese Academy of Sciences / University of Chinese Academy of Sciences, Yewen Li The Hong Kong University of Science and Technology, Ke Meng Chinese Academy of Sciences, Chunming Zhang Institute of Computing Technology, Chinese Academy of Sciences, Guangming Tan University of Chinese Academy of Sciences DOI | ||
15:10 20mTalk | PIM-zd-tree: A Fast Space-Partitioning Index Leveraging Processing-in-Memory PPoPP Main Conference Yiwei Zhao Carnegie Mellon University, Hongbo Kang Tsinghua University, Ziyang Men University of California, Riverside, Yan Gu University of California, Riverside, Guy E. Blelloch Carnegie Mellon University, Laxman Dhulipala University of Maryland, College Park, Charles McGuffey Reed College, Phil Gibbons Carnegie Mellon University DOI | ||
15:30 - 15:50 | |||
15:30 20mCoffee break | Break Catering | ||
15:50 - 17:10 | |||
15:50 20mTalk | BEEMS: Boosting Machine Vision Efficiency via Computation Graph-Based Memory Smoothing PPoPP Main Conference Hanjing Shen Shanghai Jiao Tong University, Fangxin Liu Shanghai Jiao Tong University, Jian Liu Beijing University of Aeronautics and Astronautics, Li Jiang Shanghai Jiaotong University, Haibing Guan Shanghai Jiao Tong University DOI | ||
16:10 20mTalk | Laser: Unlocking Layer-Level Scheduling for Efficient Multi-SLO LLM Serving PPoPP Main Conference Jianxiong Liao Sun Yat-sen University, Quanxing Dong Sun Yat-sen University, Yunkai Liang Sun Yat-sen University, Zhi Zhou Sun Yat-sen University, Xu Chen Sun Yat-sen University DOI | ||
16:30 20mTalk | MixFusion: A Patch-Level Parallel Serving System for Mixed-Resolution Diffusion Models PPoPP Main Conference DOI | ||
16:50 20mTalk | ChituDiffusion: A Data-Characteristic-Aware Serving System for Diffusion Models PPoPP Main Conference Chengzhang Wu Tsinghua University, Liyan Zheng Tsinghua University, Haojie Wang Tsinghua University, Kezhao Huang Tsinghua University, Zixuan Ma Tsinghua University, Dong Dong, Jidong Zhai Tsinghua University DOI | ||
15:50 - 17:10 | Compiling for ML 2 CGO Main Conference at Bronte Chair(s): Fabrice Rastello University Grenoble Alpes - Inria - CNRS - Grenoble INP - LIG | ||
15:50 20mTalk | QIGen: A Kernel Generator for Inference on Nonuniformly Quantized Large Language Models CGO Main Conference Pre-print Media Attached | ||
16:10 20mTalk | DyPARS: Dynamic-Shape DNN Optimization via Pareto-Aware MCTS for Graph Variants CGO Main Conference Hao Qian University of New South Wales, Guangli Li Institute of Computing Technology, Chinese Academy of Sciences, Qiuchu Yu Institute of Computing Technology at Chinese Academy of Sciences, Xueying Wang Beijing University of Posts and Telecommunications, Jingling Xue University of New South Wales Pre-print Media Attached | ||
16:30 20mTalk | Compiler-Runtime Co-operative Chain of Verification for LLM-Based Code Optimization CGO Main Conference Hyunho Kwon Yonsei University, Sanggyu Shin SAIT, Ju Min Lee Yonsei University, Hoyun Youm Yonsei University, Seungbin Song SAIT, Seongho Kim Yonsei University, Hanwoong Jung Samsung Advanced Institute of Technology, Seungwon Lee Samsung Advanced Institute of Technology, Hanjun Kim Yonsei University Pre-print | ||
16:50 20mTalk | Hexcute: A Compiler Framework for Automating Layout Synthesis in GPU Programs CGO Main Conference Xiao Zhang University of Toronto; NVIDIA, Yaoyao Ding University of Toronto; Vector Institute; NVIDIA, Bolin Sun University of Toronto; NVIDIA, Yang Hu NVIDIA, Tatiana Shpeisman Google, Gennady Pekhimenko University of Toronto / Vector Institute Pre-print Media Attached | ||
15:50 - 17:10 | |||
15:50 20mTalk | NPUWattch: ML-based Power, Area, and Timing Modeling for Neural Accelerators HPCA Main Conference Sehyeon Kim Yonsei University, Minkwan Kim Yonsei University, Chanho Park Yonsei University, Hanmok Park Kyungpook National University, Seonghoon Kim Kyungpook National University, Taigon Song Kyungpook National University, William Song Yonsei University | ||
16:10 20mTalk | Area Bloating and the Future of Specialization HPCA Main Conference | ||
16:30 20mTalk | Advancing Full-stack Acceleration for Schrödinger-Style Quantum Simulation HPCA Main Conference Shuang Liang Imperial College London, Yuncheng Lu Imperial College London, Ce Guo Imperial College London, Paul H J Kelly Imperial College London, Wayne Luk Imperial College London, Hongxiang Fan Imperial College London | ||
16:50 20mTalk | COMET: Communication and Memory Co-Design for Fine-Grained AI Inference in MCM Accelerators HPCA Main Conference Taishu Sheng College of Computer Science and Technology, National University of Defense Technology, Guangyu Sun Peking University, Dezun Dong NUDT | ||
15:50 - 17:10 | |||
15:50 20mTalk | Compression-Aware Gradient Splitting for Collective Communications in Distributed Training HPCA Main Conference Pranati Majhi Texas A&M University, Sabuj Laskar Texas A&M University, Abdullah Muzahid Texas A&M University, Eun Jung Kim | ||
16:10 20mTalk | SCALE: Tackling Communication Bottlenecks in Confidential Multi-GPU ML HPCA Main Conference Joongun Park Georgia Tech, Yongqin Wang University of Southern California, Huan Xu Georgia Institute of Technology, Hanjiang Wu Georgia Institute of Technology, Mengyuan Li USC, Tushar Krishna Georgia Institute of Technology | ||
16:30 20mTalk | AutoHAAP: Automated Heterogeneity-Aware Asymmetric Partitioning for LLM Training HPCA Main Conference Yuanyuan Wang Zhejiang Lab, Nana Tang Zhejiang Lab, Yuyang Wang Zhejiang Lab, Shu Pan Zhejiang Lab, Dingding Yu Zhejiang Lab, Zeyue Wang Zhejiang Lab, Mou Sun Zhejiang Lab, Kejie Fu Zhejiang Lab, Fangyu Wang Zhejiang Lab, Yunchuan Chen Zhejiang Lab, Ning Sun Zhejiang Lab, Fei Yang Zhejiang Lab | ||
16:50 20mTalk | Towards Compute-Aware In-Switch Computing for LLMs Tensor-Parallelism on Multi-GPU Systems HPCA Main Conference Chen Zhang Shanghai Jiao Tong University, Qijun Zhang Shanghai Jiao Tong University, Zhuoshan Zhou Shanghai Jiao Tong University, Yijia Diao Shanghai Jiao Tong University, Haibo Wang Huawei, Zhe Zhou Huawei, Zhipeng Tu Huawei, Zhiyao Li Huawei, Guangyu Sun Peking University, Zhuoran Song Shanghai Jiao Tong University, Zhigang Ji Shanghai Jiao Tong University, Jingwen Leng Shanghai Jiao Tong University, Minyi Guo Shanghai Jiao Tong University | ||
15:50 - 17:10 | Domain Specific Accelerators HPCA Main Conference at Cronulla Chair(s): Jaewoong Sim Seoul National University | ||
15:50 20mTalk | Uni-STC: Unified Sparse Tensor Core HPCA Main Conference Haocheng Lian China University of Petroleum-Beijing, Qiyue Zhang China University of Petroleum-Beijing, Xinran Zhao China University of Petroleum-Beijing, Meichen Dong China University of Petroleum-Beijing, Yijie Nie China University of Petroleum-Beijing, Zhengyi Zhao China University of Petroleum-Beijing, Junzhong Shen National University of Defense Technology, Wei Guo National University of Defense Technology, Chun Huang National University of Defense Technology, Bingcai Sui National University of Defense Technology, Weifeng Liu China University of Petroleum-Beijing | ||
16:10 20mTalk | AUM: Unleashing the Efficiency Potential of Shared Processors with Accelerator Units for LLM Serving HPCA Main Conference Xinkai Wang Shanghai Jiao Tong University, Chao Li Shanghai Jiao Tong University, Yiming Zhuansun Shanghai Jiao Tong University, Jinyang Guo Shanghai Jiao Tong University, Xiaofeng Hou Shanghai Jiao Tong University, Jing Wang Shanghai Jiao Tong University, Luping Wang Alibaba Group, Weigao Chen Alibaba Group, Cheng Huang Alibaba Group, Guodong Yang Alibaba Group, Liping Zhang Alibaba Group, Minyi Guo Shanghai Jiao Tong University | ||
16:30 20mTalk | DRACO: A Hardware-Efficient Robot Rigid Body Dynamics Accelerator with Precision-Aware Quantization Framework HPCA Main Conference Xingyu Liu The Hong Kong University of Science and Technology, Jiawei Liang The Hong Kong University of Science and Technology, Yipu Zhang The Hong Kong University of Science and Technology, Linfeng Du The Hong Kong University of Science and Technology, Chaofang Ma The Hong Kong University of Science and Technology, Hui Yu Hong Kong University of Science and Technology, Xu Jiang University of Electronic Science and Technology of China, Wei Zhang The Hong Kong University of Science and Technology | ||
16:50 20mTalk | REASON: Accelerating Probabilistic Logical Reasoning for Neuro-Symbolic Cognitive Intelligence HPCA Main Conference Zishen Wan Georgia Institute of Technology, Che-Kai Liu Georgia Institute of Technology, Jiayi Qian Georgia Institute of Technology, Hanchen Yang Georgia Institute of Technology, Arijit Raychowdhury Georgia Institute of Technology, Tushar Krishna Georgia Institute of Technology | ||
15:50 - 17:10 | Graphs and Graph Neural Networks PPoPP Main Conference at Pyrmont Chair(s): Ali Jannesari Iowa State University | ||
15:50 20mTalk | ElasGNN: An Elastic Training Framework for Distributed GNN Training PPoPP Main Conference Siqi Wang Beihang University, Hailong Yang Beihang University, Pengbo Wang Beihang University, Hongliang Cao Beihang University, Yufan Xu Independent Researcher, Xuezhu Wang Beihang University, Zhongzhi Luan Beihang University, Yi Liu Beihang University, Depei Qian Beihang University DOI | ||
16:10 20mTalk | APERTURE: Algorithm-System Co-optimization for Temporal Graph Network Inference PPoPP Main Conference Yiqing Wang Beihang University, Hailong Yang Beihang University, Enze Yu Beihang University, Qingxiao Sun Beihang University, Kejie Ma Beihang University, Kaige Zhang Beihang University, Chenhao Xie Beihang University, Depei Qian Beihang University DOI | ||
16:30 20mTalk | TAC: Cache-Based System for Accelerating Billion-Scale GNN Training on Multi-GPU Platform PPoPP Main Conference Zhiqiang Liang, Hongyu Gao, Fang Liu Computer Network Information Center, Chinese Academy of Sciences; University of Chinese Academy of Sciences, Jue Wang Computer Network Information Center, Chinese Academy of Sciences; University of Chinese Academy of Sciences, Xingguo Shi University of Chinese Academy of Sciences, Juyu Gu University of Chinese Academy of Sciences, Peng Di Ant Group & UNSW, San Li University of Chinese Academy of Sciences, Lei Tang University of Chinese Academy of Sciences, Chunbao Zhou University of Chinese Academy of Sciences, Lian Zhao University of Chinese Academy of Sciences, Yangang Wang University of Chinese Academy of Sciences, Xuebin Chi University of Chinese Academy of Sciences DOI | ||
16:50 20mTalk | DTMiner: A Data-Centric System for Efficient Temporal Motif Mining PPoPP Main Conference Hou Yinbo Huazhong University of Science and Technology, Hao Qi Huazhong University of Science and Technology, Ligang He University of Warwick, Jin Zhao Huazhong University of Science and Technology, Yu Zhang School of Computer Science and Technology, Huazhong University of Science and Technology, Hui Yu Hong Kong University of Science and Technology, Longlong Lin Southwest University, Lin Gu Huazhong University of Science and Technology, Wenbin Jiang Huazhong University of Science and Technology, Xiaofei Liao Huazhong University of Science and Technology, Hai Jin Huazhong University of Science and Technology DOI | ||
17:15 - 18:15 | |||
17:15 - 18:15 | Genomics and Bioinformatics HPCA Main Conference at Cronulla Chair(s): Abdulaziz Tabbakh King Fahd University of Petroleum and Minerals | ||
17:15 20mTalk | GenPairX: A Hardware-Algorithm Co-Designed Accelerator for Paired-End Read Mapping HPCA Main Conference Julien Eudine Huawei Technologies Switzerland AG, Chu Li Huawei Zurich Research Center, Zhuo Cheng Huawei Zurich Research Center, Renzo Andri Huawei Technologies Switzerland AG, Onur Mutlu ETH Zurich, Can Firtina ETH Zurich and UMD, Mohammad Sadrosadati ETH Zürich, Nika Mansouri Ghiasi ETH Zurich, Konstantina Koliogeorgi ETH Zurich, Anirban Nag Huawei Zurich Research Center, Arash Tavakkol Huawei Zurich Research Center, Haiyu Mao King's College London, Shai Bergman Huawei Zurich Research Center, Ji Zhang Huawei Zurich Research Center | ||
17:35 20mTalk | SAGe: A Lightweight Algorithm-Architecture Co-Design for Mitigating the Data Preparation Bottleneck in Large-Scale Genome Sequence Analysis HPCA Main Conference Nika Mansouri Ghiasi ETH Zurich, Talu Güloglu ETH Zurich, Harun Mustafa ETH Zurich and Johns Hopkins University, Can Firtina ETH Zurich and UMD, Konstantina Koliogeorgi ETH Zurich, Konstantinos Kanellopoulos ETH Zurich, Haiyu Mao King's College London, Rakesh Nadig ETH Zurich, Mohammad Sadrosadati ETH Zürich, Jisung Park POSTECH (Pohang University of Science and Technology), Onur Mutlu ETH Zurich | ||
17:55 20mTalk | NP-CAM: Efficient and Scalable DNA Classification using a NoC-Partitioned CAM Architecture HPCA Main Conference Benjamin F. Morris III Duke University, Tergel Molom-Ochir Duke University, Changchun Zhou Duke University, Yiran Chen Duke University, Alex Jones Syracuse University, Hai "Helen" Li Duke University | ||
17:15 - 18:15 | Optimizing Transformers PPoPP Main Conference at Pyrmont Chair(s): Shaoshuai Zhang University of Electronic Science and Technology of China | ||
17:15 20mTalk | FlashAttention-T: Towards Fully Tensorized Attention by Exploiting Tensor-Vector Parallelism PPoPP Main Conference Jianxing Xu University of Science and Technology of China, Yuanbo Wen, Jun Bi Chinese Academy of Sciences, Ruibai Xu University of Science and Technology of China, Guanglin Xu Chinese Academy of Sciences, Rui Zhang Chinese Academy of Sciences, Wei Li Chinese Academy of Sciences, Ling Li Institute of Software, Chinese Academy of Sciences, Tianshi Chen Cambricon Technologies, Qi Guo Chinese Academy of Sciences, Yunji Chen Chinese Academy of Sciences DOI | ||
17:35 20mTalk | Accelerating Sparse Transformer Inference on GPU PPoPP Main Conference Wenhao Dai China University of Petroleum-Beijing, Haodong Deng China University of Petroleum, Mengfei Rong China University of Petroleum, Xinyu Yang Beihang University, Hongyu Liu Baidu Inc., Fangxin Liu Shanghai Jiao Tong University, Hailong Yang Beihang University, Qianwen Cao China University of Petroleum, Qingxiao Sun Beihang University DOI | ||
17:55 20mTalk | MetaAttention: A Unified and Performant Attention Framework Across Hardware Backends PPoPP Main Conference Feiyang Chen Shanghai Jiao Tong University, Yu Cheng Peking University, Lei Wang Peking University, Yuqing Xia Microsoft Research, Ziming Miao Microsoft Research, Lingxiao Ma Microsoft Research, Fan Yang Microsoft Research Asia, Jilong Xue Microsoft Research, Zhi Yang Peking University, Mao Yang Microsoft Research, Xingda Wei Shanghai Jiao Tong University, Haibo Chen Shanghai Jiao Tong University DOI | ||
18:30 - 21:30 | |||
18:30 3hSocial Event | Excursion Catering | ||
Wed 4 Feb Displayed time zone: Hobart
08:15 - 10:00 | |||
08:30 - 08:45 | |||
08:30 15mDay opening | Didgeridoo Performance Plenary Keynotes | ||
08:45 - 09:45 | |||
08:45 60mKeynote | Architecting Resilience at Scale: From Research to Practice Plenary Keynotes | ||
09:50 - 11:10 | |||
09:50 20mTalk | Multidirectional Propagation of Sparsity Information across Tensor Slices CGO Main Conference Kaio Henrique Andrade Ananias Universidade Federal de Minas Gerais, Danila Seliayeu University of Alberta, Jose Nelson Amaral University of Alberta, Fernando Magno Quintão Pereira Federal University of Minas Gerais Pre-print Media Attached | ||
10:10 20mTalk | Synthesizing Specialized Sparse Tensor Accelerators for FPGAs via High-Level Functional Abstractions CGO Main Conference Pre-print | ||
10:30 20mTalk | Progressive Low-Precision Approximation of Tensor Operators on GPUs: Enabling Greater Trade-Offs between Performance and Accuracy CGO Main Conference Fan Luo Institute of Computing Technology at Chinese Academy of Sciences, Guangli Li Institute of Computing Technology, Chinese Academy of Sciences, Zhaoyang Hao Institute of Computing Technology at Chinese Academy of Sciences, Xueying Wang Beijing University of Posts and Telecommunications, Xiaobing Feng ICT CAS, Huimin Cui Institute of Computing Technology, Chinese Academy of Sciences, Jingling Xue University of New South Wales Pre-print | ||
10:50 20mTalk | Tensor Program Superoptimization through Cost-Guided Symbolic Program Synthesis CGO Main Conference Alexander Brauckmann University of Edinburgh, Aarsh Chaube University of Edinburgh, José Wesley De Souza Magalhães University of Edinburgh, Elizabeth Polgreen University of Edinburgh, Michael F. P. O'Boyle University of Edinburgh Pre-print Media Attached | ||
09:50 - 11:10 | Hardware Security and Side-Channel Defenses HPCA Main Conference at Collaroy Chair(s): Georgios Vavouliotis Huawei Zurich Research Center, Switzerland | ||
09:50 20mTalk | DSASSASSIN: Cross-VM Side-Channel Attacks by Exploiting Intel Data Streaming Accelerator HPCA Main Conference Ben Chen The Hong Kong University of Science and Technology (Guangzhou), Kunlin Li The Hong Kong University of Science and Technology (Guangzhou), Shuwen Deng Tsinghua University, Dongsheng Wang Tsinghua University, Yun Chen The Hong Kong University of Science and Technology (Guangzhou) | ||
10:10 20mTalk | SSBleed: Non-speculative Side-channel Attacks via Speculative Store Bypass on Armv9 CPUs HPCA Main Conference Chang Liu Tsinghua University, Hongpei Zheng Tsinghua University, Xin Zhang Peking University, Dapeng Ju Tsinghua University, Dongsheng Wang Tsinghua University, Yinqian Zhang Southern University of Science and Technology, Trevor E. Carlson National University of Singapore | ||
10:30 20mTalk | Protean: A Programmable Spectre Defense HPCA Main Conference Nicholas Mosier Stanford University, Hamed Nemati KTH Royal Institute of Technology, John C. Mitchell Stanford University, Caroline Trippel Stanford University | ||
10:50 20mTalk | HERO-Sign: Hierarchical Tuning and Efficient Compiler-Time GPU Optimizations for SPHINCS$^+$ Signature Generation HPCA Main Conference Yaoyun Zhou University of California, Merced, Qian Wang University of California, Merced (UC Merced) | ||
09:50 - 11:10 | Graph Neural Networks and Retrieval Systems HPCA Main Conference at Coogee Chair(s): Amir Yazdanbakhsh Google Research, Brain Team | ||
09:50 20mTalk | VeloxGNN: Accelerating Out-of-Core based GNN Training with Low Data Migration and High Accuracy via Delayed Gradient Propagation HPCA Main Conference Yi Li University of Texas at Dallas, Tsun-Yu Yang Center for Computational Evolutionary Intelligence, Electrical & Computer Engineering, Duke University, Zhaoyan Shen Shandong University, Ming-Chang Yang The Chinese University of Hong Kong (CUHK), Bingzhe Li University of Texas at Dallas | ||
10:10 20mTalk | AutoGNN: End-to-End Hardware-Driven Graph Preprocessing for Enhanced GNN Performance HPCA Main Conference Seungkwan Kang KAIST, Seungjun Lee KAIST, Donghyun Gouk Panmnesia, Miryeong Kwon Panmnesia, Hyunkyu Choi Panmnesia, Junhyeok Jang Panmnesia, Sangwon Lee Panmnesia, Huiwon Choi KAIST, Jie Zhang Peking University, Wonil Choi Hanyang University, Mahmut Taylan Kandemir Pennsylvania State University, Myoungsoo Jung KAIST | ||
10:30 20mTalk | Scaling Graph Neural Network Training via Geometric Optimization HPCA Main Conference Fangzhou Ye University of Central Florida, Lingxiang Yin University of Central Florida, Hao Zheng University of Central Florida | ||
10:50 20mTalk | VectorLiteRAG: Latency-Aware and Fine-Grained Resource Partitioning for Efficient RAG HPCA Main Conference | ||
09:50 - 11:10 | GPU Kernel Optimization and Resource Sharing HPCA Main Conference at Cronulla Chair(s): Hyojin Sung Seoul National University | ||
09:50 20mTalk | μShare: Non-Intrusive Kernel Co-Locating on NVIDIA GPUs HPCA Main Conference Wenhao Huang Tianjin University, Zhaolin Duan Tianjin University, Laiping Zhao Tianjin University, Yuhao Zhang Tianjin University, Yanjie Wang Tianjin University, Yiming Li Tianjin University, Yihan Wang Tianjin University, Yichi Chen Tianjin University, Zhihang Tang Tianjin University, Kang Chen Tsinghua University, Deze Zeng China University of Geosciences, Wenxin Li Tianjin University, Keqiu Li Tianjin University | ||
10:10 20mTalk | FlashFuser: Expanding the Scale of Kernel Fusion for Compute-Intensive operators via Inter-Core Connection HPCA Main Conference Huang Ziyu Shanghai Jiao Tong University, Yangjie Zhou National University of Singapore, Zihan Liu Shanghai Jiao Tong University, Xinhao Luo Shanghai Jiao Tong University, Yijia Diao Shanghai Jiao Tong University, Minyi Guo Shanghai Jiao Tong University, Jidong Zhai Tsinghua University, Yu Feng Shanghai Jiao Tong University, Chen Zhang Shanghai Jiao Tong University, Anbang Wu Shanghai Jiao Tong University, Jingwen Leng Shanghai Jiao Tong University | ||
10:30 20mTalk | Swift: High-Performance Sparse-Dense Matrix Multiplication on GPUs HPCA Main Conference Jinyu Hu Hunan University, Huizhang Luo Hunan University, Hong Jiang UT Arlington, Marc Casas Barcelona Supercomputing Center, Kenli Li National Supercomputing Center in Changsha, Hunan University, Chubo Liu Hunan University | ||
10:50 20mTalk | QuCo: Efficient and Flexible Hardware-Driven Automatic Configuration of Tile Transfers in GPUs HPCA Main Conference Nicolas Meseguer University of Murcia, Daoxuan Xu William & Mary, Yifan Sun William & Mary, Michael Pellauer Nvidia, José L. Abellán University of Murcia, Manuel E. Acacio Universidad de Murcia (UMU) | ||
09:50 - 11:10 | Matrix and Linear Algebra Algorithms PPoPP Main Conference at Pyrmont Chair(s): Tony Hosking Australian National University | ||
09:50 20mTalk | Towards Singular Value Decomposition for Rank-Deficient Matrices: An Efficient and Accurate Algorithm on GPU Architectures PPoPP Main Conference Lu Shi University of Electronic Science and Technology of China, WeiWei Xu Nanjing University of Information Science and Technology, Shaoshuai Zhang University of Electronic Science and Technology of China DOI | ||
10:10 20mTalk | A Diagonal Block Memory-Aware Polynomial Preconditioner for Linear and Eigenvalue Solvers PPoPP Main Conference Xiaojian Yang National University of Defense Technology, Yuhui Ni National University of Defense Technology, Fan Yuan Xiangtan University, Shengguo Li National University of Defense Technology, Dezun Dong NUDT, xuchuanfu National University of Defense Technology, Haipeng Jia Jia, Jie Liu National University of Defense Technology DOI | ||
10:30 20mTalk | A Distributed Matrix-Block-Vector Multiplication in Presence of System Performance Variability PPoPP Main Conference Yuchen Ma College of William & Mary, Bin Ren College of William & Mary, Andreas Stathopoulos College of William & Mary DOI | ||
10:50 20mTalk | Characterizing Matrix Multiplication Units across General Parallel Patterns in Scientific Computing PPoPP Main Conference Yuechen Lu China University of Petroleum-Beijing, Hongwei Zeng, Marc Casas Barcelona Supercomputing Center, Weifeng Liu China University of Petroleum-Beijing DOI | ||
11:10 - 11:30 | |||
11:10 20mCoffee break | Break Catering | ||
11:30 - 12:50 | |||
11:30 20mTalk | A Reinforcement Learning Environment for Automatic Code Optimization in the MLIR Compiler CGO Main Conference Mohammed Tirichine New York University Abu Dhabi; École Nationale Supérieure d’Informatique, Nassim Ameur NYU Abu Dhabi; École Nationale Supérieure d’Informatique, Nazim Bendib NYU Abu Dhabi; École Nationale Supérieure d’Informatique, Iheb Nassim Aouadj NYU Abu Dhabi, Djad Bouchama NYU Abu Dhabi; University of Science and Technology Houari Boumediene, Rafik Bouloudene NYU Abu Dhabi; University of Science and Technology Houari Boumediene, Riyadh Baghdadi New York University Abu Dhabi Pre-print Media Attached | ||
11:50 20mTalk | Towards Threading the Needle of Debuggable Optimized Binaries CGO Main Conference Cristian Assaiante Sapienza University of Rome, Simone Di Biasio Sapienza University of Rome, Snehasish Kumar Google LLC, Giuseppe Antonio Di Luna Sapienza University of Rome, Daniele Cono D'Elia Sapienza University of Rome, Leonardo Querzoni Sapienza University of Rome Pre-print Media Attached | ||
12:10 20mTalk | Compiler-Assisted Instruction Fusion CGO Main Conference Ravikiran Ravindranath Reddy University of Murcia, Sawan Singh AMD, Arthur Perais CNRS, Alberto Ros University of Murcia, Alexandra Jimborean University of Murcia Pre-print | ||
12:30 20mTalk | LLM-VeriOpt: Verification-Guided Reinforcement Learning for LLM-Based Compiler Optimization CGO Main Conference Xiangxin Fang Queen Mary University of London; University of Edinburgh, Jiaqin Kang Queen Mary University of London, Rodrigo C. O. Rocha University of Edinburgh, Sam Ainsworth University of Edinburgh, Lev Mukhanov IMEC (Cambridge); Queen Mary University of London Pre-print Media Attached | ||
11:30 - 12:50 | FPGA, SmartNIC, and Reconfigurable Computing HPCA Main Conference at Collaroy Chair(s): Jinho Lee Seoul National University | ||
11:30 20mTalk | RidgeWalker: Perfectly Pipelined Graph Random Walks on FPGAs HPCA Main Conference Hongshi Tan National University of Singapore, Yao Chen, Xinyu Chen Hong Kong University of Science and Technology, Qizhen Zhang University of Toronto, Cheng Chen ByteDance, China, Weng-Fai Wong National University of Singapore, Bingsheng He National University of Singapore | ||
11:50 20mTalk | DP-HLS: A High-Level Synthesis Framework for Accelerating Dynamic Programming Algorithms in Bioinformatics HPCA Main Conference Anshu Gupta UC San Diego, Yingqi Cao UC San Diego, Jason Liang UC San Diego, Yatish Turakhia UC San Diego | ||
12:10 20mTalk | Sassy: SmartNIC-Assisted Notification Delivery for μs-scale RDMA Workloads HPCA Main Conference | ||
12:30 20mTalk | TurboFuzz: FPGA Accelerated Hardware Fuzzing for Processor Agile Verification HPCA Main Conference Yang Zhong Institute of Computing, Chinese Academy of Sciences, Haoran Wu University of Cambridge, Xueqi Li State Key Lab of Processors, Institute of Computing Technology, CAS, Sa Wang SKLP, Institute of Computing Technology, Chinese Academy of Sciences; University of Chinese Academy of Sciences, David Boland The University of Sydney, Yungang Bao State Key Lab of Processors, Institute of Computing Technology, CAS; University of Chinese Academy of Sciences, Kan Shi Institute of Computing, Chinese Academy of Sciences | ||
11:30 - 12:50 | Efficient Serving and Resource Management HPCA Main Conference at Coogee Chair(s): Mohammad A. Islam University of Texas at Arlington | ||
11:30 20mTalk | Near-Zero-Overhead Freshness for Recommendation Systems via Inference-Side Model Updates HPCA Main Conference Wenjun Yu Hong Kong Baptist University, Sitian Chen Hong Kong Baptist University, Amelie Chi Zhou Hong Kong Baptist University, Cheng Chen ByteDance, China | ||
11:50 20mTalk | AccelFlow: Orchestrating an On-Package Ensemble of Fine-Grained Accelerators for Microservices HPCA Main Conference Jovan Stojkovic University of Illinois at Urbana-Champaign, Abraham Farrell University of Illinois Urbana-Champaign, Zhangxiaowen Gong Intel, Christopher J. Hughes Intel, Josep Torrellas University of Illinois at Urbana-Champaign | ||
12:10 20mTalk | SpotCC: Facilitating Coded Computation for Prediction Serving Systems on Spot Instances HPCA Main Conference Lin Wang, Yuchong Hu Huazhong University of Science and Technology, Ziling Duan Huazhong University of Science and Technology, Mingqi Li Huazhong University of Science and Technology, Chenxuan Yao Huazhong University of Science and Technology, feifanliu Huazhong University of Science and Technology, Xiaolu Li Huazhong University of Science and Technology, Leihua Qin Huazhong University of Science and Technology, Dan Feng Huazhong University of Science and Technology, China | ||
12:30 20mTalk | LowCarb: Carbon-Aware Scheduling of Serverless Functions HPCA Main Conference | ||
11:30 - 12:50 | GPU Memory Management and Multi-Chiplet Systems HPCA Main Conference at Cronulla Chair(s): EJ Kim Texas A&M University | ||
11:30 20mTalk | Exploration of LLM Workload Reliability based on di/dt effects and Voltage Droops HPCA Main Conference Zhixing Jiang University of Texas at Austin, Justin Garrigus University of Texas at Austin, Allison Seigler University of Texas at Austin, Ethan Syed University of Texas at Austin, Yan-Lun Huang University of Texas at Austin, Mehdi Sadi Advanced Micro Devices, Tawfik Rahal-Arabi Advanced Micro Devices, Lizy John University of Texas, Austin | ||
11:50 20mTalk | ARIADNE: Adaptive UVM Management for Efficient GPU Memory Oversubscription HPCA Main Conference Hyunkyun Shin Yonsei University, Seongtae Bang DGIST, Hyungwon Park DGIST, Daehoon Kim Yonsei University | ||
12:10 20mTalk | LRM-GPU: Alleviating Synchronization Overhead for Multi-Chiplet GPU Architecture HPCA Main Conference Baiqing Zhong Sun Yat-Sen University, Zhirong Ye Sun Yat-Sen University, Xiaojie Li Sun Yat-Sen University, Peilin Wang Sun Yat-Sen University, Haiqiu Huang Sun Yat-Sen University, Zhaolin Li Tsinghua University, Zhiyi Yu Sun Yat-sen University, Mingyu Wang Sun Yat-Sen University | ||
12:30 20mTalk | LEGO: Supporting LLM-enhanced Games with One Gaming GPU HPCA Main Conference Han Zhao Shanghai Jiao Tong University, Weihao Cui Shanghai Jiao Tong University, Zeshen Zhang Tongji University, Wenhao Zhang Shanghai Jiao Tong University, Jiangtong Li Tongji University, Quan Chen Shanghai Jiao Tong University, China, Youmin Chen Shanghai Jiao Tong University, Pu Pang Shanghai Jiao Tong University, Zijun Li Shanghai Jiao Tong University, Zhenhua Han The University of Hong Kong, Yuqing Yang Microsoft Research, Minyi Guo Shanghai Jiao Tong University | ||
12:50 - 13:20 | |||