About — Prosopon
The face one turns to the world.
I am Achille Triomphe, a research engineer focused on high-performance computing for deep learning. I design GPU kernels, optimize distributed training systems, and build compiler tools that bridge the gap between research ideas and hardware-efficient execution.
My work sits at the intersection of systems engineering, machine learning, and computer architecture. I believe that understanding hardware deeply is the only way to build software that truly scales.
Currently thinking about: warp-group MMA scheduling on Hopper, the geometry of low-rank adaptation, and how to make kernel DSLs as expressive as they are fast.
Experience
- Research Engineer, Independent
  Designing and optimizing high-performance AI kernels, GPU memory hierarchies, and distributed training systems. Focus on CUDA, Tensor Cores, and compiler toolchains.
- GPU Systems Researcher, Academic / Open Source
  Deep dives into CUDA GEMM optimization, PTX assembly, and embedded DSLs for kernel generation. Contributed to open-source HPC tooling and profiling infrastructure.
Capabilities
Languages
Frameworks
Tools
Domains
Publications
- LoRA Without Regret. Scientific Worklog, 2026
- Dissecting ThunderKittens. Scientific Worklog, 2026