I Wrote an MXFP4 Quantization Kernel and Ranked #1 on Tensara

Sun, 05 Apr 2026 00:00:00 +0000

Why I Did This

I’m building an FP4 fused attention kernel for consumer Blackwell GPUs (SM120). That means I spend my days thinking about how to squeeze 32-bit numbers into 4 bits without losing too much information.

Tensara is a platform where you submit GPU kernels and compete on real hardware. They had an MXFP4 quantization problem with almost no submissions. I figured: I already know this format inside out on SM120, so how hard could it be to write a standalone quantization kernel?
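For readers new to the format, here is what "squeezing 32-bit numbers into 4 bits" means for MXFP4. As defined in the OCP Microscaling (MX) v1.0 spec, values are grouped into blocks of 32 that share one power-of-two scale, stored as an 8-bit exponent (E8M0), and each element is stored as a 4-bit E2M1 float. The snippet below is a minimal scalar Python reference, an illustrative sketch and not the kernel submitted to Tensara. It assumes finite inputs, round-to-nearest with ties to even, and clamping of scaled values above 6.0 to the largest FP4 magnitude. The names `quantize_block`, `dequantize_block` and `fp4_encode` are hypothetical.

```python
import math

# FP4 E2M1 magnitudes indexed by the 3-bit code (2 exponent bits, 1 mantissa bit).
FP4_E2M1 = [0.0, 0.5, 1.0, 1.5, 2.0, 3.0, 4.0, 6.0]
FP4_EMAX = 2        # exponent of the largest E2M1 value (6.0 = 1.5 * 2**2)
BLOCK_SIZE = 32     # elements sharing one E8M0 scale


def fp4_encode(v: float) -> int:
    """Round v to the nearest E2M1 value (ties to even code); assumes finite v."""
    sign = 0x8 if math.copysign(1.0, v) < 0 else 0x0
    mag = min(abs(v), FP4_E2M1[-1])  # assumption: clamp |v| > 6.0 to 6.0
    # Nearest magnitude; on a tie prefer the code whose mantissa bit is 0.
    code = min(range(8), key=lambda c: (abs(FP4_E2M1[c] - mag), c & 1))
    return sign | code


def quantize_block(block: list[float]) -> tuple[int, list[int]]:
    """Quantize up to 32 floats to (E8M0 scale byte, list of 4-bit codes)."""
    assert 0 < len(block) <= BLOCK_SIZE
    amax = max(abs(x) for x in block)
    if amax == 0.0:
        shared_exp = -127                 # any scale works for an all-zero block
    else:
        _, e = math.frexp(amax)           # amax = m * 2**e with 0.5 <= m < 1
        shared_exp = (e - 1) - FP4_EMAX   # floor(log2(amax)) - emax of E2M1
    shared_exp = max(-127, min(127, shared_exp))  # representable E8M0 range
    codes = [fp4_encode(math.ldexp(x, -shared_exp)) for x in block]
    return shared_exp + 127, codes        # E8M0 stores the exponent with bias 127


def dequantize_block(scale_byte: int, codes: list[int]) -> list[float]:
    """Map (scale byte, 4-bit codes) back to floats."""
    scale = 2.0 ** (scale_byte - 127)
    return [(-1.0 if c & 0x8 else 1.0) * FP4_E2M1[c & 0x7] * scale for c in codes]


print(quantize_block([12.0, 1.0]))  # (128, [7, 1])
```

The block maximum 12.0 lands on the top FP4 value (6.0) after dividing by the shared scale 2**1. Every other element in the block is forced onto that same coarse grid, which is where the information loss comes from. Storage-wise, each block costs 32 × 4 + 8 = 136 bits, or 4.25 bits per value.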

Quantization on Florian Mattana
