Computer Systems

Overview

Computer systems research investigates how the layers between a user's program and the physical machine (compilers, architecture, operating systems, and databases) fit together to determine performance, correctness, and energy use. These four pillars are usually taught separately, but in practice they interact constantly: a compiler optimization that ignores the memory hierarchy is often wasted; a database engine that ignores OS-level I/O scheduling will struggle to meet its latency targets; a CPU pipeline that does not anticipate the code modern compilers emit will stall under real workloads.

My interest in this thread comes from one observation: every other research area I work on (AI security, federated learning, quantum computing, medical AI, autonomous driving) ultimately runs on a stack of systems software. Understanding that stack deeply is what turns a model on paper into a system that actually works in the world.

1. Compilers

Compilers are the bridge between high-level program intent and the physical hardware that executes it. My work spans four complementary directions in compiler technology:

AI / ML Compilers — graph-level optimization passes for deep learning models (TVM, XLA, MLIR), operator fusion, layout transformation, kernel scheduling, and autotuning to extract maximum throughput from accelerators.
Hardware-aware Compilation — lowering computational graphs onto FPGAs, GPUs, and emerging NPUs, with explicit modeling of memory hierarchies, dataflow patterns, and polyhedral loop transformations.
Quantum Compilers — circuit optimization, gate decomposition, qubit routing, and noise-aware transpilation that adapts logical quantum circuits to real NISQ devices.
Classical Compiler & PL Theory — LLVM-based static analysis, SSA transformations, and formal semantics that make optimizations provably correct.
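
To make the operator fusion mentioned above concrete, here is a minimal sketch in plain Python. The tiny "graph" (a list of named elementwise functions) and the helpers `run_unfused`, `fuse_elementwise`, and `run_fused` are hypothetical names chosen for illustration; they are not the API of TVM, XLA, or MLIR. The sketch only shows the core idea: a chain of elementwise ops can be collapsed into one pass over the data, which avoids materializing an intermediate buffer per op.

```python
from typing import Callable, List, Sequence, Tuple

# Hypothetical IR: each op is a (name, elementwise function) pair.
Op = Tuple[str, Callable[[float], float]]


def run_unfused(ops: Sequence[Op], data: List[float]) -> Tuple[List[float], int]:
    """Apply each op as its own pass; return the result and the pass count."""
    passes = 0
    for _, fn in ops:
        data = [fn(x) for x in data]  # one full intermediate buffer per op
        passes += 1
    return data, passes


def fuse_elementwise(ops: Sequence[Op]) -> Op:
    """Compose a chain of elementwise ops into a single elementwise op."""
    def fused(x: float) -> float:
        for _, fn in ops:
            x = fn(x)
        return x
    return ("+".join(name for name, _ in ops), fused)


def run_fused(ops: Sequence[Op], data: List[float]) -> Tuple[List[float], int]:
    """Apply the fused op in a single pass over the data."""
    _, fn = fuse_elementwise(ops)
    return [fn(x) for x in data], 1


ops: List[Op] = [
    ("scale", lambda x: 2.0 * x),
    ("bias", lambda x: x + 1.0),
    ("relu", lambda x: max(x, 0.0)),
]
data = [-2.0, -0.5, 0.0, 3.0]

unfused_out, unfused_passes = run_unfused(ops, data)
fused_out, fused_passes = run_fused(ops, data)
print(fuse_elementwise(ops)[0], fused_out, unfused_passes, fused_passes)
```

Both paths compute the same values; the fused path touches the input once instead of three times. Real ML compilers make the same trade at the kernel level, where the saving is memory traffic rather than Python list allocations.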

2. Computer Architecture

Computer architecture studies the design and behavior of the processor itself: pipelines, caches, memory systems, and the parallelism available across cores and accelerators. Software optimizations, however modest, pay off only if the underlying machine actually delivers what its instruction set advertises.

Memory hierarchy and cache behavior — cache-aware data structures, blocking, and prefetching strategies that exploit temporal and spatial locality.
Instruction-level and thread-level parallelism — out-of-order execution, branch prediction, SIMD vector units, and multi-core / multi-socket scaling.
Accelerator architectures — GPUs (warps, tensor cores), TPUs (systolic arrays), NPUs, and how their execution models shape software design.
Energy- and reliability-aware architecture — DVFS, approximate computing, and resilience to soft errors / bit-flips — directly relevant to AI security research on fault injection.
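
The effect of spatial locality on cache behavior can be sketched with a toy model. The code below simulates a direct-mapped cache and compares row-major and column-major traversal of a square matrix. The cache geometry (64 lines of 64 bytes), the 8-byte element size, the 64 x 64 matrix, and all function names are assumptions chosen for illustration; real caches are set-associative and have prefetchers, so this is a qualitative sketch rather than a model of any specific CPU.

```python
from typing import Iterable, Iterator

ELEM_SIZE = 8  # bytes per element (assumed, e.g. a float64)


def simulate_hit_rate(addresses: Iterable[int], num_lines: int = 64,
                      line_size: int = 64) -> float:
    """Direct-mapped cache model: fraction of accesses that hit."""
    resident = [None] * num_lines  # block id currently held by each cache line
    hits = 0
    total = 0
    for addr in addresses:
        block = addr // line_size
        index = block % num_lines
        if resident[index] == block:
            hits += 1
        else:
            resident[index] = block  # miss: evict whatever was there
        total += 1
    return hits / total


def row_major(n: int) -> Iterator[int]:
    """Byte addresses for visiting an n x n row-major matrix row by row."""
    for i in range(n):
        for j in range(n):
            yield (i * n + j) * ELEM_SIZE


def column_major(n: int) -> Iterator[int]:
    """Byte addresses for visiting the same matrix column by column."""
    for j in range(n):
        for i in range(n):
            yield (i * n + j) * ELEM_SIZE


N = 64
row_hit = simulate_hit_rate(row_major(N))
col_hit = simulate_hit_rate(column_major(N))
print(f"row-major hit rate:    {row_hit:.3f}")
print(f"column-major hit rate: {col_hit:.3f}")
```

In this model, row-major traversal misses once per 64-byte line and then hits on the remaining seven elements, while column-major traversal strides a full row between accesses and evicts each line before it is reused. Blocking (tiling) addresses exactly this by restructuring loops so that the working set of each tile fits in the cache.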

3. Operating Systems

The operating system is the layer that turns a piece of physical hardware into a programmable platform — managing processes, memory, files, and network I/O, and providing the isolation guarantees that everything above it depends on.

Process and thread scheduling — CFS, real-time schedulers, and how scheduling decisions show up as latency variance in user-space workloads.
Virtual memory and address translation — page tables, TLBs, huge pages, and the cost of memory mapping for large model weights.
File systems and I/O — block layer, page cache, I/O schedulers, and modern storage stacks (NVMe, io_uring) that matter when reading TB-scale datasets.
Virtualization and containers — KVM, namespaces, cgroups, and the OS primitives behind every modern ML training cluster.
Kernel security — privilege separation, SMAP / SMEP, and side-channel mitigations — overlapping with my AI security work on adversarial robustness.
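
The cost of mapping large model weights can be estimated with back-of-the-envelope arithmetic. The sketch below compares 4 KiB pages with 2 MiB huge pages. The 16 GiB weight size and the 1536-entry TLB are hypothetical numbers picked for illustration, and the 8-byte entry size assumes x86-64 page-table entries; the function names are likewise illustrative.

```python
KIB, MIB, GIB = 1024, 1024 ** 2, 1024 ** 3
PTE_BYTES = 8  # assumed size of one page-table entry (x86-64)


def pages_needed(size_bytes: int, page_size: int) -> int:
    """Number of pages required to map size_bytes (ceiling division)."""
    return -(-size_bytes // page_size)


def leaf_table_bytes(size_bytes: int, page_size: int) -> int:
    """Memory consumed by the last-level mapping entries alone."""
    return pages_needed(size_bytes, page_size) * PTE_BYTES


def tlb_reach(entries: int, page_size: int) -> int:
    """Bytes of address space the TLB can cover without a page walk."""
    return entries * page_size


WEIGHTS = 16 * GIB   # hypothetical model size
TLB_ENTRIES = 1536   # hypothetical TLB capacity

for label, page_size in (("4 KiB", 4 * KIB), ("2 MiB", 2 * MIB)):
    print(
        f"{label} pages: {pages_needed(WEIGHTS, page_size)} mappings, "
        f"{leaf_table_bytes(WEIGHTS, page_size) // KIB} KiB of leaf entries, "
        f"TLB reach {tlb_reach(TLB_ENTRIES, page_size) // MIB} MiB"
    )
```

Under these assumptions, huge pages cut the number of mappings by a factor of 512 and extend TLB reach by the same factor, which is why they matter for workloads that stream through multi-gigabyte weight files.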

4. Databases

Database research sits at the intersection of algorithms, systems, and storage. Modern AI workloads have made databases doubly important: training data pipelines, vector search, and online inference all increasingly look like database problems.

Query processing and optimization — relational algebra, cost-based optimizers, join algorithms, and adaptive query execution.
Transactions and consistency — ACID, MVCC, isolation levels, and distributed consensus (Paxos / Raft).
Storage engines — B+ trees, LSM-trees, columnar storage, compression, and how storage layout determines query performance.
Vector and ML-native databases — approximate nearest-neighbor indexes (HNSW, IVF-PQ), hybrid SQL + vector workloads, and serving infrastructure for retrieval-augmented generation (RAG).
Distributed databases — sharding, replication, and the CAP and PACELC trade-offs that decide whether a multi-region system can stay responsive under failure.
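
To illustrate how storage layout shapes reads and writes, here is a minimal sketch of the LSM-tree idea: writes go to an in-memory table, full tables are flushed as immutable sorted runs, reads check the newest data first, and compaction merges runs. The class `TinyLSM` and its parameters are hypothetical and heavily simplified (no write-ahead log, no levels, no Bloom filters); it is not how RocksDB or any other engine is implemented.

```python
import bisect
from typing import Any, Dict, List, Optional, Tuple

TOMBSTONE = object()  # sentinel marking a deleted key

Run = Tuple[List[str], List[Any]]  # parallel sorted keys and their values


class TinyLSM:
    def __init__(self, memtable_limit: int = 2) -> None:
        self.memtable_limit = memtable_limit
        self.memtable: Dict[str, Any] = {}
        self.runs: List[Run] = []  # oldest first, newest last

    def put(self, key: str, value: Any) -> None:
        self.memtable[key] = value
        if len(self.memtable) >= self.memtable_limit:
            self.flush()

    def delete(self, key: str) -> None:
        self.put(key, TOMBSTONE)  # deletes are writes of a tombstone

    def flush(self) -> None:
        """Write the memtable out as an immutable sorted run."""
        if self.memtable:
            items = sorted(self.memtable.items())
            self.runs.append(([k for k, _ in items], [v for _, v in items]))
            self.memtable = {}

    def get(self, key: str) -> Optional[Any]:
        """Look in the memtable first, then in runs from newest to oldest."""
        if key in self.memtable:
            value = self.memtable[key]
            return None if value is TOMBSTONE else value
        for keys, values in reversed(self.runs):
            i = bisect.bisect_left(keys, key)  # binary search in a sorted run
            if i < len(keys) and keys[i] == key:
                return None if values[i] is TOMBSTONE else values[i]
        return None

    def compact(self) -> None:
        """Merge all runs into one; newer values win, tombstones are dropped."""
        merged: Dict[str, Any] = {}
        for keys, values in self.runs:  # oldest to newest
            merged.update(zip(keys, values))
        live = sorted((k, v) for k, v in merged.items() if v is not TOMBSTONE)
        self.runs = [([k for k, _ in live], [v for _, v in live])] if live else []


db = TinyLSM(memtable_limit=2)
db.put("a", 1)
db.put("b", 2)      # memtable full: flushed as the first run
db.put("a", 10)
db.delete("b")      # memtable full again: flushed as the second run
db.put("c", 3)      # stays in the memtable
runs_before = len(db.runs)
db.compact()
runs_after = len(db.runs)
```

The sketch shows the basic trade-off: writes are cheap and sequential, while a read may have to consult several runs until compaction merges them. A B+ tree makes the opposite choice, paying for in-place updates to keep lookups to a single structure.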

Why study these four together?

Modern ML systems are already systems projects: a training pipeline touches every one of these layers at once. A compiler decision about operator fusion changes what the CPU pipeline sees; an OS scheduling policy changes how data loaders compete for I/O; a database choice between a row store and a vector store changes which optimizations the compiler can even apply. Treating these four threads as one research area is what makes it possible to reason about end-to-end performance honestly rather than locally.

Tools & Frameworks

LLVM / MLIR, TVM, XLA, gem5, Intel VTune, perf / eBPF, Linux kernel, QEMU / KVM, PostgreSQL, RocksDB, DuckDB, FAISS, Milvus
