<?xml version="1.0" encoding="utf-8" standalone="yes"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:content="http://purl.org/rss/1.0/modules/content/">
  <channel>
    <title>Quantization on Florian Mattana</title>
    <link>https://florianmattana.com/tags/quantization/</link>
    <description>Recent content in Quantization on Florian Mattana</description>
    <generator>Hugo</generator>
    <language>en</language>
    <lastBuildDate>Sun, 05 Apr 2026 00:00:00 +0000</lastBuildDate>
    <atom:link href="https://florianmattana.com/tags/quantization/index.xml" rel="self" type="application/rss+xml" />
    <item>
      <title>I Wrote an MXFP4 Quantization Kernel and Ranked #1 on Tensara</title>
      <link>https://florianmattana.com/posts/mxfp4_article/</link>
      <pubDate>Sun, 05 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://florianmattana.com/posts/mxfp4_article/</guid>
      <description>&lt;h2 id=&#34;why-i-did-this&#34;&gt;Why I Did This&lt;/h2&gt;
&lt;p&gt;I&amp;rsquo;m building an FP4 fused attention kernel for consumer Blackwell GPUs (SM120). That means I spend my days thinking about how to squeeze 32-bit numbers into 4 bits without losing too much information.&lt;/p&gt;
&lt;p&gt;&lt;a href=&#34;https://tensara.org&#34;&gt;Tensara&lt;/a&gt; is a platform where you submit GPU kernels and compete on real hardware. They had an MXFP4 quantization problem with almost no submissions. I figured: I already know this format inside out on SM120, so how hard can it be to write a standalone quantization kernel?&lt;/p&gt;</description>
    </item>
  </channel>
</rss>
