I’m Florian Mattana, a GPU kernel engineer based in France.

I write CUDA kernels at the PTX level for LLM inference: fused attention, quantized GEMM, online softmax. You can read about the technical work on the blog and on GitHub.

I got into GPU computing through a weird path. I started in finance (a master’s from the Sorbonne), then built a crypto mining rig around 2015 and got hooked on understanding why memory bandwidth matters more than clock speed. That led to production CUDA work at Geopost, Airbus, and Melexis, and eventually to writing inference kernels full time.
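The bandwidth-versus-clock-speed point is the classic roofline argument: a kernel is memory-bound whenever its arithmetic intensity (FLOPs per byte of DRAM traffic) falls below the machine's balance point, so a faster clock buys nothing. Here's a minimal sketch of that calculation, using made-up peak numbers rather than any specific GPU:

```python
# Roofline back-of-the-envelope: is a kernel bandwidth-bound or compute-bound?
# Peak figures below are illustrative assumptions, not a real card's specs.

PEAK_FLOPS = 60e12  # assumed peak fp32 throughput: 60 TFLOP/s
PEAK_BW = 3e12      # assumed peak HBM bandwidth: 3 TB/s

def arithmetic_intensity(flops_per_elem, bytes_per_elem):
    """FLOPs performed per byte of DRAM traffic."""
    return flops_per_elem / bytes_per_elem

# Balance point: intensity needed to saturate compute instead of memory.
balance = PEAK_FLOPS / PEAK_BW  # 20 FLOP/byte with the numbers above

# fp32 vector add c = a + b: 1 FLOP per element, 12 bytes moved
# (two 4-byte loads plus one 4-byte store).
ai_vec_add = arithmetic_intensity(1, 12)

print(f"balance point: {balance:.0f} FLOP/byte")
print(f"vector add:    {ai_vec_add:.3f} FLOP/byte -> "
      + ("bandwidth-bound" if ai_vec_add < balance else "compute-bound"))
```

At roughly 0.08 FLOP/byte against a balance point of 20, the vector add is deep in bandwidth-bound territory, which is exactly why mining rigs (and LLM inference at small batch sizes) live and die by memory throughput.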

I’ve lived in five countries (South Korea, Spain, France, the UK, and Russia), which taught me how to work with anyone, adapt fast, and communicate across cultures and time zones. I hold PMP and Agile certifications from years of shipping under heavy production constraints, so I know how to scope work, hit deadlines, and push back when a plan doesn’t make sense.

When I’m not staring at NCU reports, I’m watching RC Lens lose in creative ways, rewatching Arcane for the fourth time, or playing Hunt: Showdown. I like building things from scratch, understanding how they work at every level, and explaining what I learned along the way.

Contact

If you want to discuss GPU kernel work, inference optimization, or have a project where low-level CUDA expertise would help, reach out on LinkedIn or Twitter.

I’m open to kernel engineering roles at companies working on inference, GPU compilers, or high-performance computing. I’m an EU citizen, open to full remote or relocation. If you’re hiring for that, let’s talk.