Spotlight

Designing Computer Systems for Software 2.0
ISCA '18: 45th International Symposium on Computer Architecture, Keynote

Employing Machine Learning to generate models from data is replacing traditional software development in many applications. This fundamental shift in how we develop software is known as Software 2.0. However, the continued success of Software 2.0 relies on the availability of powerful, efficient and flexible computer systems.

This talk will introduce a design paradigm that exploits the characteristics of Software 2.0 to create computer systems that are optimized for both programmability and performance. The key to the design paradigm is a full-stack approach that integrates algorithms, domain-specific languages, advanced compilation technology and new hardware architectures.

Publications

Spatial: A Language and Compiler for Application Accelerators
David Koeplinger, Matthew Feldman, Raghu Prabhakar, Yaqi Zhang, Stefan Hadjis, Ruben Fiszel, Tian Zhao, Luigi Nardi, Ardavan Pedram, Christos Kozyrakis, Kunle Olukotun
PLDI '18: Programming Language Design and Implementation
Paper PDF

Plasticine: A Reconfigurable Architecture For Parallel Patterns
Raghu Prabhakar, Yaqi Zhang, David Koeplinger, Matt Feldman, Tian Zhao, Stefan Hadjis, Ardavan Pedram, Christos Kozyrakis, Kunle Olukotun
ISCA '17: 44th International Symposium on Computer Architecture, June 2017.
Top Picks special issue of IEEE Micro, May/June 2018.
Paper PDF

EmptyHeaded: A Relational Engine for Graph Processing
Christopher R. Aberger, Susan Tu, Kunle Olukotun, and Christopher Ré
SIGMOD '16: Special Interest Group on Management of Data, June 2016. (Best Of Award)
Paper PDF | Slides

Ensuring Rapid Mixing and Low Bias for Asynchronous Gibbs Sampling
Christopher De Sa, Kunle Olukotun, and Christopher Ré
ICML '16: Proceedings of the 33rd Intl. Conference on Machine Learning, June 2016. (Best Paper Award)
Paper PDF | Slides | Poster

Automatic Generation of Efficient Accelerators for Reconfigurable Hardware
David Koeplinger, Raghu Prabhakar, Yaqi Zhang, Christina Delimitrou, Christos Kozyrakis, and Kunle Olukotun
ISCA '16: 43rd International Symposium on Computer Architecture, June 2016.
Paper PDF | Slides

Generating Configurable Hardware from Parallel Patterns
Raghu Prabhakar, David Koeplinger, Kevin J. Brown, HyoukJoong Lee, Christopher De Sa, Christos Kozyrakis, and Kunle Olukotun
ASPLOS '16: 21st International Conference on Architectural Support for Programming Languages and Operating Systems, April 2016.
Paper PDF

Have Abstraction and Eat Performance, Too: Optimized Heterogeneous Computing with Parallel Patterns
Kevin J. Brown, HyoukJoong Lee, Tiark Rompf, Arvind K. Sujeeth, Christopher De Sa, Christopher Aberger, and Kunle Olukotun
CGO '16: International Symposium on Code Generation and Optimization, March 2016.
Paper PDF

Rapidly Mixing Gibbs Sampling for a Class of Factor Graphs Using Hierarchy Width
Christopher De Sa, Ce Zhang, Christopher Ré, and Kunle Olukotun
NIPS '15: Proceedings of the 28th Neural Information Processing Systems Conference, December 2015.
Paper PDF | Poster

Taming the Wild: A Unified Analysis of Hogwild!-Style Algorithms
Christopher De Sa, Ce Zhang, Christopher Ré, and Kunle Olukotun
NIPS '15: Proceedings of the 28th Neural Information Processing Systems Conference, December 2015.
Paper PDF | Poster

Global Convergence of Stochastic Gradient Descent for Some Non-convex Matrix Problems
Christopher De Sa, Kunle Olukotun, and Christopher Ré
ICML '15: Proceedings of the 32nd Intl. Conference on Machine Learning, July 2015.
Paper PDF | Slides | Poster

Locality-Aware Mapping of Nested Parallel Patterns on GPUs
HyoukJoong Lee, Kevin J. Brown, Arvind K. Sujeeth, Tiark Rompf, and Kunle Olukotun
MICRO '14: 47th International Symposium on Microarchitecture, December 2014.
Paper PDF | Slides | Poster

Delite: A Compiler Architecture for Performance-Oriented Embedded Domain-Specific Languages
Arvind K. Sujeeth, Kevin J. Brown, HyoukJoong Lee, Tiark Rompf, Hassan Chafi, Martin Odersky, and Kunle Olukotun
TECS '14: ACM Transactions on Embedded Computing Systems, July 2014.
Paper PDF

Simplifying Scalable Graph Processing with a Domain-Specific Language
Sungpack Hong, Semih Salihoglu, Jennifer Widom, and Kunle Olukotun
CGO '14: International Symposium on Code Generation and Optimization, February 2014.
Paper PDF

Hardware Acceleration of Database Operations
Jared Casper and Kunle Olukotun
FPGA '14: Proceedings of the 2014 ACM/SIGDA international symposium on Field-programmable gate arrays, February 2014.
Paper PDF | Slides

On Fast Parallel Detection of Strongly Connected Components (SCC) in Small-World Graphs
Sungpack Hong, Nicole C. Rodia, and Kunle Olukotun
SC '13: International Conference for High Performance Computing, Networking, Storage, and Analysis, November 2013.
Paper PDF | Slides | Code

Forge: Generating a High Performance DSL Implementation from a Declarative Specification
Arvind K. Sujeeth, Austin Gibbons, Kevin J. Brown, HyoukJoong Lee, Tiark Rompf, Martin Odersky, and Kunle Olukotun
GPCE '13: 12th International Conference on Generative Programming: Concepts & Experiences, October 2013.
Paper PDF

Composition and Reuse with Compiled Domain-Specific Languages
Arvind K. Sujeeth, Tiark Rompf, Kevin J. Brown, HyoukJoong Lee, Hassan Chafi, Victoria Popic, Michael Wu, Aleksander Prokopec, Vojin Jovanovic, Martin Odersky, and Kunle Olukotun
ECOOP '13: European Conference on Object-Oriented Programming, July 2013.
Paper PDF

Optimizing Data Structures in High-Level Programs: New Directions for Extensible Compilers based on Staging
Tiark Rompf, Arvind K. Sujeeth, Nada Amin, Kevin J. Brown, Vojin Jovanovic, HyoukJoong Lee, Manohar Jonnalagedda, Kunle Olukotun, and Martin Odersky
POPL '13: 40th ACM SIGPLAN-SIGACT Symposium on Principles of Programming Languages, January 2013.
Paper PDF | Slides

A Case of System-level Hardware/Software Co-design and Co-verification of a Commodity Multi-Processor System with Custom Hardware
Sungpack Hong, Tayo Oguntebi, Jared Casper, Nathan Bronson, Christos Kozyrakis, and Kunle Olukotun
CODES+ISSS '12: 17th International Conference on Hardware/Software Codesign and System Synthesis, October 2012.
Paper PDF | Slides

Green-Marl: A DSL for Easy and Efficient Graph Analysis
Sungpack Hong, Hassan Chafi, Eric Sedlar, and Kunle Olukotun
ASPLOS '12: 17th International Conference on Architectural Support for Programming Languages and Operating Systems, March 2012.
Paper PDF | Slides

SCD: A Scalable Coherence Directory with Flexible Sharer Set Encoding
Daniel Sanchez and Christos Kozyrakis
HPCA '12: 18th International Symposium on High Performance Computer Architecture, February 2012.
Paper PDF | Slides

CudaDMA: Optimizing GPU Memory Bandwidth via Warp Specialization
Michael Bauer, Henry Cook, and Brucek Khailany
SC '11: Proceedings of the 2011 ACM/IEEE International Conference for High Performance Computing, Networking, Storage and Analysis, November 2011.
Paper PDF | Slides

Liszt: A Domain Specific Language for Building Portable Mesh-based PDE Solvers
Zachary DeVito, Niels Joubert, Francisco Palacios, Stephen Oakley, Montserrat Medina, Mike Barrientos, Erich Elsen, Frank Ham, Alex Aiken, Karthik Duraisamy, Eric Darve, Juan Alonso, and Pat Hanrahan
SC '11: Proceedings of the 2011 ACM/IEEE International Conference for High Performance Computing, Networking, Storage and Analysis, November 2011.
Paper PDF | Slides

Dynamic Fine-Grain Scheduling of Pipeline Parallelism
Daniel Sanchez, David Lo, Richard M. Yoo, Jeremy Sugerman, and Christos Kozyrakis
PACT '11: 20th International Conference on Parallel Architectures and Compilation Techniques, October 2011.
Paper PDF | Slides

Efficient Parallel Graph Exploration on Multi-Core CPU and GPU
Sungpack Hong, Tayo Oguntebi, and Kunle Olukotun
PACT '11: 20th International Conference on Parallel Architectures and Compilation Techniques, October 2011.
Paper PDF | Slides

A Heterogeneous Parallel Framework for Domain-Specific Languages
Kevin J. Brown, Arvind K. Sujeeth, HyoukJoong Lee, Tiark Rompf, Hassan Chafi, Martin Odersky, and Kunle Olukotun
PACT '11: 20th International Conference on Parallel Architectures and Compilation Techniques, October 2011.
Paper PDF | Slides

Implementing Domain-Specific Languages for Heterogeneous Parallel Computing
HyoukJoong Lee, Kevin J. Brown, Arvind K. Sujeeth, Hassan Chafi, Tiark Rompf, Martin Odersky, and Kunle Olukotun
IEEE Micro: Special Issue on CPU, GPU, and Hybrid Computing, September 2011.
Paper PDF

Building-Blocks for Performance Oriented DSLs
Tiark Rompf, Arvind K. Sujeeth, HyoukJoong Lee, Kevin J. Brown, Hassan Chafi, Martin Odersky, and Kunle Olukotun
DSL '11: IFIP Working Conference on Domain-Specific Languages, September 2011.
Paper PDF | Slides

OptiML: An Implicitly Parallel Domain-Specific Language for Machine Learning
Arvind K. Sujeeth, HyoukJoong Lee, Kevin J. Brown, Tiark Rompf, Hassan Chafi, Michael Wu, Anand R. Atreya, Martin Odersky, and Kunle Olukotun
ICML '11: Proceedings of the 28th Intl. Conference on Machine Learning, June 2011.
Paper PDF | Slides

Vantage: Scalable and Efficient Fine-Grain Cache Partitioning
Daniel Sanchez and Christos Kozyrakis
ISCA '11: Proceedings of the 38th Intl. Symposium on Computer Architecture, June 2011.
Paper PDF | Slides

Hardware Acceleration of Transactional Memory on Commodity Systems
Jared Casper, Tayo Oguntebi, Sungpack Hong, Nathan G. Bronson, Christos Kozyrakis, and Kunle Olukotun
ASPLOS '11: Proceedings of the 16th Intl. Conference on Architectural Support for Programming Languages and Operating Systems, March 2011.
Paper PDF | Slides

Programming the Memory Hierarchy Revisited: Supporting Irregular Parallelism in Sequoia
Michael Bauer, John Clark, Eric Schkufza, and Alex Aiken
PPoPP '11: Proceedings of the 16th Annual Symposium on Principles and Practice of Parallel Programming, February 2011.
Paper PDF

Accelerating CUDA Graph Algorithms at Maximum Warp
Sungpack Hong, Sang Kyun Kim, Tayo Oguntebi, and Kunle Olukotun
PPoPP '11: Proceedings of the 16th Annual Symposium on Principles and Practice of Parallel Programming, February 2011.
Paper PDF | Slides

A Domain-Specific Approach to Heterogeneous Parallelism
Hassan Chafi, Arvind K. Sujeeth, Kevin J. Brown, HyoukJoong Lee, Anand R. Atreya, and Kunle Olukotun
PPoPP '11: Proceedings of the 16th Annual Symposium on Principles and Practice of Parallel Programming, February 2011.
Paper PDF | Slides

EigenBench: A Simple Exploration Tool for Orthogonal TM Characteristics
Sungpack Hong, Tayo Oguntebi, Jared Casper, Nathan Bronson, Christos Kozyrakis, and Kunle Olukotun
IISWC '10: Proceedings of the IEEE International Symposium on Workload Characterization, December 2010. (Best Paper Award)
Paper PDF | Slides

The ZCache: Decoupling Ways and Associativity
Daniel Sanchez and Christos Kozyrakis
MICRO '10: Proceedings of the 43rd Intl. Symposium on Microarchitecture, December 2010.
Paper PDF | Slides

Language Virtualization for Heterogeneous Parallel Computing
Hassan Chafi, Zach DeVito, Adriaan Moors, Tiark Rompf, Arvind K. Sujeeth, Pat Hanrahan, Martin Odersky, and Kunle Olukotun
Onward! '10: Proceedings of the ACM International Conference on Object Oriented Programming Systems Languages and Applications, October 2010.
Paper PDF | Slides

Transactional Predication: High-Performance Concurrent Sets and Maps for STM
Nathan G. Bronson, Jared Casper, Hassan Chafi, and Kunle Olukotun
PODC '10: Proceedings of the 29th Annual ACM Conference on Principles of Distributed Computing, July 2010.
Paper PDF | Slides

Implementing and Evaluating Nested Parallel Transactions in Software Transactional Memory
Woongki Baek, Nathan Bronson, Christos Kozyrakis, and Kunle Olukotun
SPAA '10: Proceedings of the 22nd ACM Symposium on Parallelism in Algorithms and Architectures, June 2010.
Paper PDF | Slides

Making Nested Parallel Transactions Practical using Lightweight Hardware Support
Woongki Baek, Nathan Bronson, Christos Kozyrakis, and Kunle Olukotun
ICS '10: Proceedings of the 24th Intl. Conference on Supercomputing, June 2010.
Paper PDF | Slides

Understanding Sources of Inefficiency in General-Purpose Chips
Rehan Hameed, Wajahat Qadeer, Megan Wachs, Omid Azizi, Alex Solomatnikov, Benjamin C. Lee, Stephen Richardson, Christos Kozyrakis, and Mark Horowitz
ISCA '10: Proceedings of the 37th Intl. Symposium on Computer Architecture, June 2010.
Paper PDF | Slides

Evaluating Bufferless Flow Control for On-Chip Networks
George Michelogiannakis, Daniel Sanchez, William J. Dally, and Christos Kozyrakis
NOCS '10: Proceedings of the 4th ACM/IEEE International Symposium on Networks-on-Chip, May 2010.
Paper PDF | Slides

A Large-scale Architecture for Restricted Boltzmann Machines
Sang Kyun Kim, Peter L. McMahon, and Kunle Olukotun
FCCM '10: Proceedings of the IEEE Symposium on Field-Programmable Custom Computing Machines, May 2010.
Paper PDF

FARM: A Prototyping Environment for Tightly-Coupled, Heterogeneous Architectures
Tayo Oguntebi, Sungpack Hong, Jared Casper, Nathan Bronson, Christos Kozyrakis, and Kunle Olukotun
FCCM '10: The 18th Annual International IEEE Symposium on Field-Programmable Custom Computing Machines, May 2010.
Paper PDF | Slides

An Analysis of On-Chip Interconnection Networks for Large-Scale Chip Multiprocessors
Daniel Sanchez, George Michelogiannakis, and Christos Kozyrakis
TACO: ACM Transactions on Architecture and Code Optimization, April 2010.
Paper PDF

CCSTM: A Library-Based STM for Scala
Nathan G. Bronson, Hassan Chafi, and Kunle Olukotun
The First Annual Scala Workshop at Scala Days 2010, April 2010.
Paper PDF | Slides

Implementing and Evaluating a Model Checker for Transactional Memory Systems
Woongki Baek, Nathan G. Bronson, Christos Kozyrakis, and Kunle Olukotun
ICECCS '10: Proceedings of the 15th IEEE International Conference on Engineering of Complex Computer Systems, March 2010.
Paper PDF

Flexible Architectural Support for Fine-Grain Scheduling
Daniel Sanchez, Richard Yoo, and Christos Kozyrakis
ASPLOS '10: Proceedings of the 15th Intl. Conference on Architectural Support for Programming Languages and Operating Systems, March 2010.
Paper PDF

A Practical Concurrent Binary Search Tree
Nathan G. Bronson, Jared Casper, Hassan Chafi, and Kunle Olukotun
PPoPP '10: Proceedings of the 15th Annual Symposium on Principles and Practice of Parallel Programming, January 2010.
Paper PDF | Slides

Phoenix Rebirth: Scalable MapReduce on a Large-Scale Shared-Memory System
Richard M. Yoo, Anthony Romano, and Christos Kozyrakis
IISWC '09: Proceedings of the 2009 IEEE International Symposium on Workload Characterization, October 2009.
Paper PDF

A Highly Scalable Restricted Boltzmann Machine FPGA Implementation
Sang Kyun Kim, Lawrence C. McAfee, Peter L. McMahon, and Kunle Olukotun
FPL '09: Proceedings of the IEEE Conference on Field Programmable Logic and Applications, September 2009.
Paper PDF

GRAMPS: A Programming Model for Graphics Pipelines
Jeremy Sugerman, Kayvon Fatahalian, Solomon Boulos, Kurt Akeley, and Pat Hanrahan
ACM Transactions on Graphics, January 2009.
Paper PDF

Feedback-Directed Barrier Optimization in a Strongly Isolated STM
Nathan G. Bronson, Christos Kozyrakis, and Kunle Olukotun
POPL '09: Proceedings of the 36th Annual ACM SIGPLAN-SIGACT Symposium on Principles of Programming Languages, January 2009.
Paper PDF | Slides

Members

Principal Investigator

Kunle Olukotun is the Cadence Design Professor of Electrical Engineering and Computer Science at Stanford University. Olukotun is well known as a pioneer in multicore processor design and the leader of the Stanford Hydra chip multiprocessor (CMP) research project. Olukotun founded Afara Websystems to develop high-throughput, low-power multicore processors for server systems. Afara and its multicore processor, called Niagara, were acquired by Sun Microsystems. Niagara-derived processors now power all Oracle SPARC-based servers.

Olukotun currently directs the Stanford Pervasive Parallelism Lab (PPL), which seeks to proliferate the use of heterogeneous parallelism in all application areas using domain-specific languages (DSLs). Olukotun is a member of the Data Analytics for What’s Next (DAWN) Lab, which is developing infrastructure for usable machine learning. Olukotun is an ACM Fellow and IEEE Fellow for contributions to multiprocessors on a chip and multi-threaded processor design, and is the recipient of the 2018 IEEE Harry H. Goode Memorial Award. Olukotun received his Ph.D. in Computer Engineering from the University of Michigan.

PhD Students

Raghu Prabhakar

David Koeplinger

Christopher R. Aberger

Stefan Hadjis

Yaqi Zhang

Matthew Feldman

Tian Zhao

Tushar Swamy

Matthew Villim

Alex Rucker

Nathan Zhang

Daisy

Postdoc

Muhammad Shahbaz

Research Associates

Luigi Nardi

Ardavan Pedram

Visiting Researcher

Rekha Singhal

Alumni

Projects

Spatial

A high level language for programming accelerators

Argon

An experimental framework for creating staged DSLs embedded in Scala
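To illustrate what "staged" means: a staged DSL runs the user's program once at staging time to build an intermediate representation (IR), rewrites that IR, and only then generates the code that actually executes. The sketch below shows the idea in Python with a toy arithmetic IR. The names Exp, Sym, Const, Op, mk, emit, stage, and power are hypothetical illustrations, not Argon's API (Argon itself is embedded in Scala).

```python
from dataclasses import dataclass
from typing import Callable, Tuple


class Exp:
    """Staged expression: arithmetic on an Exp builds IR instead of computing a value."""

    def __add__(self, other):
        return mk("+", self, other)

    def __radd__(self, other):
        return mk("+", other, self)

    def __mul__(self, other):
        return mk("*", self, other)

    def __rmul__(self, other):
        return mk("*", other, self)


@dataclass
class Sym(Exp):
    name: str  # a staged input, unknown until the generated code runs


@dataclass
class Const(Exp):
    value: float  # a value already known at staging time


@dataclass
class Op(Exp):
    op: str
    lhs: Exp
    rhs: Exp


def lift(x) -> Exp:
    """Turn a host-language constant into an IR node."""
    return x if isinstance(x, Exp) else Const(x)


def mk(op: str, a, b) -> Exp:
    """Smart constructor: applies simple rewrites while the IR is being built."""
    a, b = lift(a), lift(b)
    if isinstance(a, Const) and isinstance(b, Const):  # constant folding
        return Const(a.value + b.value if op == "+" else a.value * b.value)
    unit = 0 if op == "+" else 1  # identity element: x + 0 = x, x * 1 = x
    if isinstance(b, Const) and b.value == unit:
        return a
    if isinstance(a, Const) and a.value == unit:
        return b
    return Op(op, a, b)


def emit(e: Exp) -> str:
    """Code generation: print the IR back out as Python source."""
    if isinstance(e, Sym):
        return e.name
    if isinstance(e, Const):
        return repr(e.value)
    return f"({emit(e.lhs)} {e.op} {emit(e.rhs)})"


def stage(f: Callable, *params: str) -> Tuple[str, Callable]:
    """Run f once on symbolic inputs, then compile the IR it produced."""
    body = emit(lift(f(*(Sym(p) for p in params))))
    src = f"def staged({', '.join(params)}):\n    return {body}\n"
    namespace: dict = {}
    exec(src, namespace)
    return src, namespace["staged"]


def power(b, n: int):
    # n is an ordinary Python int, so this recursion is unrolled at staging time.
    return 1 if n == 0 else b * power(b, n - 1)


src, cube = stage(lambda x: power(x, 3), "x")
print(src)
# def staged(x):
#     return (x * (x * x))
print(cube(2))  # 8
```

Because n is a plain Python int while x is staged, the recursion in power disappears during staging and only the multiplications on x survive in the generated code; the trailing x * 1 is removed by the rewrite in mk. Staged DSL frameworks rely on the same split between staging-time and run-time values, with much richer IRs, optimizations, and code generators than this toy.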

HyperMapper

A multi-objective autotuner tool for optimizing software and hardware parameters
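A multi-objective autotuner usually has no single best answer: one configuration may be faster while another uses less area or energy. The usual target is therefore the Pareto front, the set of configurations that no other configuration beats on every objective at once. The sketch below computes that front from already-measured results. The parameter names (tile, par), the two objectives, and all numbers are hypothetical, and this is not HyperMapper's interface or its search algorithm.

```python
from typing import Dict, List, Sequence, Tuple

Config = Dict[str, int]
# (parameters, objective values); lower is better for every objective
Result = Tuple[Config, Tuple[float, ...]]


def dominates(a: Sequence[float], b: Sequence[float]) -> bool:
    """a dominates b if it is no worse in every objective and strictly better in one."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))


def pareto_front(results: List[Result]) -> List[Result]:
    """Keep only configurations that no other measured configuration dominates."""
    return [
        (cfg, obj)
        for cfg, obj in results
        if not any(dominates(other, obj) for _, other in results)
    ]


# Hypothetical measurements: (runtime in ms, area in arbitrary units).
results: List[Result] = [
    ({"tile": 16, "par": 1}, (10.0, 1.0)),
    ({"tile": 32, "par": 2}, (6.0, 2.0)),
    ({"tile": 32, "par": 4}, (4.0, 4.0)),
    ({"tile": 64, "par": 2}, (7.0, 3.0)),  # dominated by tile=32, par=2
    ({"tile": 64, "par": 8}, (4.0, 8.0)),  # dominated by tile=32, par=4
]

front = pareto_front(results)
print([cfg for cfg, _ in front])
# [{'tile': 16, 'par': 1}, {'tile': 32, 'par': 2}, {'tile': 32, 'par': 4}]
```

This brute-force filter compares every pair of points, so it needs O(n²) comparisons; that is adequate for a few hundred measurements. A real tuner also has to decide which configurations are worth measuring next, which this sketch does not attempt.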

Talks

Fall 2018:

Date | Speaker | Topic | Lunch Buyer
9/27 | N/A | Status Updates | Alex
10/4 | Rekha | Overview of Projects at Tata Group | Tushar
10/11 | Alex | Tungsten: A Simulator for Data-Dependent Plasticine Applications | Tian
10/18 | Luigi | ASPLOS rebuttal | Matt V.
10/25 | Tushar | N/A | Nathan
11/1 | Matt V. | N/A | Yaqi
11/8 | Nathan | N/A | Matt F.
11/15 | Shahbaz | N/A | Luigi
11/22 | Thanksgiving | No Talk | N/A
11/29 | Stefan | N/A | Rekha
12/6 | Yaqi | N/A | Stefan
12/13 | Rotation Students | N/A | Dan

Previous Talks
