TileLang is a deep learning compiler designed for efficient and scalable machine learning systems. I am currently a core contributor to the project, responsible for the following parts:
Multi-backend support. We plan to design a unified plugin interface so that multiple backends can be supported.
IR (Intermediate Representation) design and implementation, in close collaboration with the TVM community. We plan to adopt TIR-X, the next-generation tensor IR in TVM.
The host side of the compiler, including reducing host-side overhead through Host CodeGen.
Tutorials (TileLang-Puzzles), documentation, and user support.
Twilight aims to optimize the attention mechanism in LLMs through sparse attention.
In this project, we systematized the general paradigm of Top-K attention and implemented high-performance kernels so that the performance results can be reproduced.
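To make the Top-K idea concrete, here is a minimal single-query sketch in plain Python, assuming standard scaled dot-product attention: only the k keys with the highest scores take part in the softmax and the weighted sum. The function name `topk_attention` and the fixed budget `k` are illustrative assumptions; this is not the Twilight kernel, which targets GPUs and may choose which tokens to keep differently.

```python
import math
from typing import List

Vector = List[float]

def topk_attention(q: Vector, keys: List[Vector], values: List[Vector], k: int) -> Vector:
    """Single-query attention restricted to the k keys with the highest scores (hypothetical helper)."""
    scale = 1.0 / math.sqrt(len(q))
    # Scaled dot-product score for every key.
    scores = [scale * sum(qi * ki for qi, ki in zip(q, key)) for key in keys]
    # Sparse step: keep only the indices of the k largest scores.
    k = min(k, len(keys))
    top = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)[:k]
    # Numerically stable softmax over the selected scores only.
    m = max(scores[i] for i in top)
    weights = [math.exp(scores[i] - m) for i in top]
    total = sum(weights)
    # Weighted sum of the selected value vectors.
    out = [0.0] * len(values[0])
    for w, i in zip(weights, top):
        for j, v in enumerate(values[i]):
            out[j] += (w / total) * v
    return out
```

With `k` equal to the number of keys this reduces to ordinary dense attention; smaller `k` trades a little accuracy for skipping most of the value reads.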
ParrotServe is a distributed, multi-tenant serving system for various LLM applications. It is the open-source implementation of our OSDI'24 paper.
This project was done during my undergraduate internship at MSRA. As the project leader, I implemented the core of the system and many of its key algorithms.
The system highlights the following techniques built around the Semantic Variable abstraction:
Automatic parallelization and batching of LLM requests in complex LLM applications, with asynchronous communication between dependent requests.
Performance objective deduction and DAG-aware scheduling.
Context-aware scheduling, with prefix sharing backed by an optimized attention kernel.
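The idea of automatically parallelizing and batching dependent requests can be illustrated with a level-by-level topological sort: all requests whose inputs are already available form one batch, and a request becomes ready once every request it depends on has finished. The Python sketch below is a hypothetical simplification (the name `batch_waves` and the dependency-map input format are assumptions); it does not model the other techniques listed above, such as performance objective deduction or context-aware scheduling.

```python
from typing import Dict, List, Set

def batch_waves(deps: Dict[str, Set[str]]) -> List[List[str]]:
    """Group requests into waves; each request in a wave has all of its dependencies done.

    deps maps every request ID to the set of request IDs whose outputs it consumes.
    Every dependency is assumed to also appear as a key of deps.
    """
    indegree = {r: len(d) for r, d in deps.items()}
    users: Dict[str, List[str]] = {r: [] for r in deps}
    for r, d in deps.items():
        for p in d:
            users[p].append(r)
    # Requests with no pending dependencies can run together immediately.
    ready = sorted(r for r, n in indegree.items() if n == 0)
    waves: List[List[str]] = []
    done = 0
    while ready:
        waves.append(ready)
        done += len(ready)
        nxt: List[str] = []
        for r in ready:
            for u in users[r]:
                indegree[u] -= 1
                if indegree[u] == 0:
                    nxt.append(u)
        ready = sorted(nxt)
    if done != len(deps):
        raise ValueError("dependency graph has a cycle")
    return waves
```

For example, two independent prompts `a` and `b`, a request `c` that combines both, a request `d` that uses only `a`, and a final request `e` that uses `c` and `d` yield the waves `[a, b]`, `[c, d]`, `[e]`.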
TVM is an open deep learning compiler stack for CPUs, GPUs, and specialized accelerators. In this sub-project, I worked on the TVM Unity compiler and Relax IR (the new-generation graph-level IR), and developed an end-to-end framework for training models in Relax IR. (TVMCon'23 | PRs)
I developed an automatic differentiation (AD) algorithm, implemented as an IR transformation pass, and integrated useful training utilities such as loss functions and optimizers.
I also helped build the infrastructure of Relax IR, including many basic operators.
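The core of reverse-mode AD can be illustrated on a toy straight-line instruction list: evaluate the instructions forward, then walk them in reverse and accumulate each variable's adjoint (the derivative of the output with respect to it). The Python sketch below is hypothetical (the names `forward` and `grad`, the tuple instruction format, and the two ops `add`/`mul` are all assumptions). Unlike a real IR pass such as the one in Relax, which rewrites the IR into a new gradient function, it simply computes gradient values numerically.

```python
from typing import Dict, List, Tuple

# A toy straight-line "IR": each instruction is (dest, op, operands).
Instr = Tuple[str, str, Tuple[str, ...]]

def forward(program: List[Instr], env: Dict[str, float]) -> Dict[str, float]:
    """Evaluate the program and return the values of all variables."""
    env = dict(env)
    for dest, op, args in program:
        a = [env[x] for x in args]
        if op == "add":
            env[dest] = a[0] + a[1]
        elif op == "mul":
            env[dest] = a[0] * a[1]
        else:
            raise ValueError(f"unsupported op: {op}")
    return env

def grad(program: List[Instr], inputs: Dict[str, float], output: str) -> Dict[str, float]:
    """Reverse-mode AD: run forward, then propagate adjoints in reverse instruction order."""
    vals = forward(program, inputs)
    adj = {name: 0.0 for name in vals}
    adj[output] = 1.0
    for dest, op, args in reversed(program):
        g = adj[dest]
        if op == "add":
            # d(a + b)/da = d(a + b)/db = 1
            adj[args[0]] += g
            adj[args[1]] += g
        elif op == "mul":
            # d(a * b)/da = b, d(a * b)/db = a
            adj[args[0]] += g * vals[args[1]]
            adj[args[1]] += g * vals[args[0]]
    return {name: adj[name] for name in inputs}
```

For y = x * x + x * w at x = 3, w = 2, the sweep gives dy/dx = 2x + w = 8 and dy/dw = x = 3; note that a variable used twice (as in `x * x`) correctly receives two contributions.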
A compiler from a C++/Java-like language to RV32I assembly, with many optimizations on LLVM IR, e.g. ADCE, CSE, SCCP, and LICM. Its performance is close to GCC -O2 on the test cases.
It received a perfect score in two different compiler courses.
Also check out the sub-project for JIT (just-in-time) compilation, a (quite simple) virtual machine for LLVM: [DarkSwordVM].
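CSE, one of the optimizations listed for the compiler above, can be sketched in its simplest local form as value numbering over a single basic block: an expression that has already been computed is not recomputed, and later uses are redirected to the earlier result. The Python snippet below is a hypothetical illustration, not code from the compiler; it assumes a toy three-address instruction format in SSA form, pure (side-effect-free) operations, and treats `add` and `mul` as commutative.

```python
from typing import Dict, List, Tuple

Instr = Tuple[str, str, Tuple[str, ...]]  # (dest, op, operands)

COMMUTATIVE = {"add", "mul"}

def local_cse(block: List[Instr]) -> Tuple[List[Instr], Dict[str, str]]:
    """Remove repeated pure expressions in one SSA basic block.

    Returns the rewritten block and a map from eliminated names to the names
    that replace them, so that uses outside the block can be rewritten too.
    """
    seen: Dict[Tuple[str, Tuple[str, ...]], str] = {}
    rename: Dict[str, str] = {}
    out: List[Instr] = []
    for dest, op, args in block:
        # Rewrite operands that refer to already-eliminated values.
        args = tuple(rename.get(a, a) for a in args)
        # Normalize operand order so that a + b and b + a get the same key.
        key_args = tuple(sorted(args)) if op in COMMUTATIVE else args
        key = (op, key_args)
        if key in seen:
            rename[dest] = seen[key]  # reuse the earlier result
        else:
            seen[key] = dest
            out.append((dest, op, args))
    return out, rename
```

Because operands are renamed before lookup, eliminating one expression can expose further redundancy: `t2 = b + a` collapses into `t1 = a + b`, after which `t4 = t2 * t1` becomes identical to `t3 = t1 * t2`.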
A RISC-V CPU implemented in Verilog HDL. It uses the Tomasulo algorithm for dynamic scheduling and passes all test cases at clock rates of up to 120 MHz. It runs on a Basys3 FPGA board (XC7A35T-1CPG236C).
I modified the original framework of this course project so that it can operate two RAMs, which makes it possible to split memory into an IRAM and a DRAM.
There are also two simulators written in C++ for debugging and understanding the algorithms: [Simulator-Pipeline], [Simulator-Tomasulo].
A File System Escape game, with the file system implemented using Linux FUSE. Use `cd` to move around and find `exit` in this randomly generated file system. If you are a proficient shell user, try breaking the SPEEDRUN record!
Alice is an [A]utomatic [Li]near Temporal Logic [C]hecking Syst[e]m implemented in C++. It implements several model checking algorithms covered in class and can check the correctness of given LTL formulas.
DHTengu is "DHT (Distributed Hash Table)" + "Tengu (a supernatural spirit)". It implements two DHT protocols (Chord and Kademlia) in Go. On top of it, I developed an application that enables P2P file sharing and music playback.
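Kademlia, one of the two protocols above, measures the distance between node IDs as their bitwise XOR; a node files its contacts into k-buckets by the highest bit in which the IDs differ, and lookups repeatedly query the closest nodes it knows. The Python sketch below shows just the metric, the bucket index, and the k-closest selection. The function names are hypothetical, and it is not DHTengu's Go code: networking, bucket size limits, and the iterative lookup loop are all omitted.

```python
from typing import List

def xor_distance(a: int, b: int) -> int:
    """Kademlia's distance metric: bitwise XOR of two node IDs, read as an integer."""
    return a ^ b

def bucket_index(self_id: int, other_id: int) -> int:
    """Index of the k-bucket holding other_id: position of the highest differing bit."""
    d = xor_distance(self_id, other_id)
    if d == 0:
        raise ValueError("a node does not store itself in its routing table")
    return d.bit_length() - 1

def k_closest(target: int, known: List[int], k: int) -> List[int]:
    """The k known node IDs closest to target under the XOR metric."""
    return sorted(known, key=lambda n: xor_distance(n, target))[:k]
```

With 4-bit IDs, for instance, the two nodes closest to `0b1000` among `0b0000`, `0b1001`, `0b1100`, and `0b1111` are `0b1001` (distance 1) and `0b1100` (distance 4), even though `0b0000` is numerically closer to `0b1000` than `0b1100` or `0b1111` are.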
A system for selling train tickets. It was my first collaborative project during my B.S. and consists of two parts: a frontend (Web) and a backend (database).
For the backend, data queries are implemented with a B+ tree; for the frontend, we used the Flask framework to build a user-friendly web application.
A toy Python 3 interpreter written in C++. The frontend parser is built with the ANTLR framework. It does not support the full syntax of the Python language.