Blog — Peripatos
Thinking is done in motion.
-
LoRA Without Regret
An exploration of when low-rank adaptation matches full fine-tuning, covering learning-rate invariance, capacity limits, and the geometry of parameter updates.
-
Dissecting ThunderKittens: Anatomy of a Compact DSL for High-Performance AI Kernels
A deep dive into Stanford's embedded DSL for CUDA, exploring how its 16×16 tile abstractions map directly to Tensor Cores, shared memory, and warp-group MMA on Hopper.