About — Prosopon

The face one turns to the world.

I am Achille Triomphe, a research engineer focused on high-performance computing for deep learning. I design GPU kernels, optimize distributed training systems, and build compiler tools that bridge the gap between research ideas and hardware-efficient execution.

My work sits at the intersection of systems engineering, machine learning, and computer architecture. I believe that understanding hardware deeply is the only way to build software that truly scales.

Currently thinking about: warp-group MMA scheduling on Hopper, the geometry of low-rank adaptation, and how to make kernel DSLs as expressive as they are fast.

Experience

  • 2024 — Present

    Research Engineer

    Independent

    Designing and optimizing high-performance AI kernels, GPU memory hierarchies, and distributed training systems. Focus on CUDA, Tensor Cores, and compiler toolchains.

  • 2022 — 2024

    GPU Systems Researcher

    Academic / Open Source

    Deep dives into CUDA GEMM optimization, PTX assembly, and embedded DSLs for kernel generation. Contributed to open-source HPC tooling and profiling infrastructure.

Capabilities

Languages

C++20, CUDA C++, Python, PTX/SASS

Frameworks

PyTorch, Triton, ThunderKittens, CUTLASS

Tools

Nsight Compute, LLVM, GDB, Git

Domains

GPU Kernels, Distributed Training, Compilers, Deep Learning

Publications

Contact