<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom"><channel><title>Posts on Matthew Penney</title><link>https://matthewpenney.net/posts/</link><description>Recent content in Posts on Matthew Penney</description><generator>Hugo -- gohugo.io</generator><language>en-gb</language><lastBuildDate>Sat, 09 May 2026 17:50:18 +0100</lastBuildDate><atom:link href="https://matthewpenney.net/posts/index.xml" rel="self" type="application/rss+xml"/><item><title>Reverse Engineering my CPU's Cache Sizes</title><link>https://matthewpenney.net/posts/cache_size/</link><pubDate>Sat, 09 May 2026 17:50:18 +0100</pubDate><guid>https://matthewpenney.net/posts/cache_size/</guid><description>&lt;p&gt;In this investigation, I aim to estimate the sizes of my CPU&amp;rsquo;s L1D and LLC
caches.&lt;/p&gt;
&lt;h2 id="general-approach-and-hypothesis"&gt;General Approach and Hypothesis&lt;/h2&gt;
&lt;p&gt;The approach I took was to process arrays of varying length and measure median
cache miss rates for each array size.&lt;/p&gt;
&lt;p&gt;Once the array exceeds the capacity of the CPU&amp;rsquo;s cache, I expect the cache
misses to sharply increase since the array no longer fits entirely in the
cache, and some reads will need to go to RAM or the next-level cache.&lt;/p&gt;</description><content type="html"><![CDATA[<p>In this investigation, I aim to estimate the sizes of my CPU&rsquo;s L1D and LLC
caches.</p>
<h2 id="general-approach-and-hypothesis">General Approach and Hypothesis</h2>
<p>The approach I took was to process arrays of varying length and measure median
cache miss rates for each array size.</p>
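<p>The median is used rather than the mean because it is less sensitive to a single noisy run. A minimal sketch of taking the median over a batch of runs is shown below. The helper name <code>median</code> and the sample values are made up for illustration; they are not Cyclops output and this is not necessarily how Cyclops computes it.</p>

```shell
#!/usr/bin/env bash
# Print the median of the numeric arguments.
# Hypothetical helper for illustration only; not part of Cyclops.
median() {
    printf '%s\n' "$@" | LC_ALL=C sort -n | awk '
        { v[NR] = $1 }
        END {
            if (NR % 2) print v[(NR + 1) / 2]
            else        print (v[NR / 2] + v[NR / 2 + 1]) / 2
        }'
}

# Five invented miss rates, one of them an outlier.
median 0.12 0.10 0.35 0.11 0.13   # prints 0.12
```

<p>The outlier of 0.35 would pull the mean up to about 0.16, while the median stays at 0.12.</p>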
<p>Once the array exceeds the capacity of the CPU&rsquo;s cache, I expect the cache
misses to sharply increase since the array no longer fits entirely in the
cache, and some reads will need to go to RAM or the next-level cache.</p>
<p>Thus, by plotting the cache miss rate against array size, I should be
able to estimate the size of my cache.</p>
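<p>Reading the estimate off a plot can also be approximated numerically. The sketch below finds the pair of adjacent array sizes with the largest rise in miss rate. It assumes input lines of the form <code>size_in_bytes miss_rate</code> sorted by size; the function name <code>largest_jump</code> and the sample data are invented for illustration and are not measurements.</p>

```shell
#!/usr/bin/env bash
# Read "size_in_bytes miss_rate" lines (sorted by size) from stdin and
# print the two adjacent sizes between which the miss rate rises most.
# Hypothetical helper for illustration only.
largest_jump() {
    awk '
        NR > 1 {
            d = $2 - prev_rate
            if (d > best) { best = d; lo = prev_size; hi = $1 }
        }
        { prev_size = $1; prev_rate = $2 }
        END { print lo, hi }'
}

# Invented sample points, not real results.
printf '%s\n' '16384 0.01' '24576 0.02' '32768 0.03' '40960 0.45' '49152 0.60' \
    | largest_jump   # prints 32768 40960
```

<p>On real data a simple largest-difference rule is sensitive to noise, so it is best treated as a cross-check on the plot rather than a replacement for it.</p>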
<h2 id="methods">Methods</h2>
<h3 id="workload">Workload</h3>
<p>I use the <code>STRIDED_ARRAY</code> workload for this investigation.</p>
<p>With this workload I can &ldquo;process&rdquo; an array, setting the number of elements
at runtime with the <code>array-elements</code> workload param.</p>
<p>I keep the array stride (<code>stride-bytes</code> param) constant at the default value of
64 Bytes, resulting in arrays of (64 * <code>array-elements</code>) Bytes.</p>
<p><em>I want one element per cache line (64 Bytes) because caches store whole
lines rather than individual Bytes, so each element access touches a distinct line.</em></p>
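<p>As a quick check of that arithmetic, the sketch below converts between element counts and array sizes, assuming the default 64-Byte stride. The function names are illustrative and are not part of Cyclops.</p>

```shell
#!/usr/bin/env bash
# Assumed stride: one array element per 64-Byte cache line.
STRIDE_BYTES=64

# Array size in Bytes for a given number of elements.
array_bytes() {
    echo $(( $1 * STRIDE_BYTES ))
}

# Number of elements needed to cover a given size in Bytes.
elements_for_bytes() {
    echo $(( $1 / STRIDE_BYTES ))
}

array_bytes 100           # prints 6400
array_bytes 1000          # prints 64000
elements_for_bytes 6400   # prints 100
```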
<p>During the processing of the array, the workload uses a random-access pattern
to access array elements instead of processing them sequentially.
This defeats the prefetcher, which would otherwise reduce the clarity of the
results.</p>
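<p>To illustrate the idea, the sketch below shuffles the element indices and prints the Byte offset of each visit, so consecutive accesses have no fixed distance for a prefetcher to learn. It is an illustration only: the function name <code>shuffled_offsets</code> is made up, and the actual <code>STRIDED_ARRAY</code> implementation may choose its access order differently.</p>

```shell
#!/usr/bin/env bash
# Print the Byte offsets of n elements (stride Bytes apart) in a random
# visiting order, using a Fisher-Yates shuffle of the indices.
# Hypothetical helper; RANDOM limits this to small n (below 32768).
shuffled_offsets() {
    local n=$1 stride=$2 i j tmp
    local -a order=()

    # Start from the sequential order 0, 1, ..., n-1.
    i=0
    while [ "$i" -lt "$n" ]; do
        order[i]=$i
        i=$(( i + 1 ))
    done

    # Swap each position with a random earlier (or same) position.
    i=$(( n - 1 ))
    while [ "$i" -gt 0 ]; do
        j=$(( RANDOM % (i + 1) ))
        tmp=${order[i]}
        order[i]=${order[j]}
        order[j]=$tmp
        i=$(( i - 1 ))
    done

    for i in "${order[@]}"; do
        echo $(( i * stride ))
    done
}

# Eight elements, 64 Bytes apart; the order differs from run to run.
shuffled_offsets 8 64
```

<p>Every offset still appears exactly once, so the amount of data touched is the same as in a sequential pass; only the order changes.</p>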
<h3 id="metric-groups">Metric Groups</h3>
<p>I use the following metric groups to record cache miss rates:</p>
<ul>
<li><code>L1D_READS</code> for the L1 data cache</li>
<li><code>LLC_READS</code> for the last-level cache (L3 on my CPU)</li>
</ul>
<h3 id="warmup-runs">Warmup Runs</h3>
<p>For each array size, I perform several warmup runs (e.g. <code>-u 5</code>) to initialise
the caches.
This avoids the first few recorded runs of the batch having elevated miss rates
due to cold caches.</p>
<h3 id="final-cyclops-commands">Final Cyclops Commands</h3>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"><code class="language-bash" data-lang="bash"><span style="display:flex;"><span>./cyclops <span style="color:#ae81ff">\
</span></span></span><span style="display:flex;"><span>    -u <span style="color:#ae81ff">5</span> <span style="color:#ae81ff">\
</span></span></span><span style="display:flex;"><span>    -r <span style="color:#ae81ff">5</span> <span style="color:#ae81ff">\
</span></span></span><span style="display:flex;"><span>    -w STRIDED_ARRAY <span style="color:#ae81ff">\
</span></span></span><span style="display:flex;"><span>    -m L1D_READS <span style="color:#ae81ff">\
</span></span></span><span style="display:flex;"><span>    -s array-elements<span style="color:#f92672">=</span>100:50000:100
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span>./cyclops <span style="color:#ae81ff">\
</span></span></span><span style="display:flex;"><span>    -u <span style="color:#ae81ff">5</span> <span style="color:#ae81ff">\
</span></span></span><span style="display:flex;"><span>    -r <span style="color:#ae81ff">5</span> <span style="color:#ae81ff">\
</span></span></span><span style="display:flex;"><span>    -w STRIDED_ARRAY <span style="color:#ae81ff">\
</span></span></span><span style="display:flex;"><span>    -m LLC_READS <span style="color:#ae81ff">\
</span></span></span><span style="display:flex;"><span>    -s array-elements<span style="color:#f92672">=</span>50000:1000000:10000
</span></span></code></pre></div><h2 id="results">Results</h2>
<p><img src="/img/estimate_cache_size.png" alt="L1D and LLC miss rate curves"></p>
<p>The figure above shows clear increases in cache miss rates:</p>
<ul>
<li>between 2*10^4 and 4*10^4 Bytes for L1D</li>
<li>between 2*10^6 and 4*10^6 Bytes for LLC (L3)</li>
</ul>
<p>These ranges align with the actual cache sizes for my CPU:</p>
<ul>
<li><strong>L1D:</strong> 32KiB per physical core</li>
<li><strong>L3:</strong> 3MiB</li>
</ul>
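<p>The comparison can be spelled out with a little arithmetic. The sketch below converts the two cache sizes to Bytes and checks that each falls inside the corresponding range read off the figure. The helper name <code>in_range</code> is illustrative only.</p>

```shell
#!/usr/bin/env bash
# Succeed if value lies in [low, high]. Usage: in_range value low high
in_range() {
    if [ "$1" -ge "$2" ]; then
        [ "$1" -le "$3" ]
    else
        return 1
    fi
}

L1D_BYTES=$(( 32 * 1024 ))        # 32KiB = 32768 Bytes
L3_BYTES=$(( 3 * 1024 * 1024 ))   # 3MiB  = 3145728 Bytes

if in_range "$L1D_BYTES" 20000 40000; then
    echo "L1D: $L1D_BYTES Bytes is inside 2*10^4..4*10^4"
fi
if in_range "$L3_BYTES" 2000000 4000000; then
    echo "L3: $L3_BYTES Bytes is inside 2*10^6..4*10^6"
fi

# Capacity expressed in 64-Byte cache lines (one array element each).
echo "L1D lines: $(( L1D_BYTES / 64 ))"   # prints L1D lines: 512
echo "L3 lines: $(( L3_BYTES / 64 ))"     # prints L3 lines: 49152
```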
<p><em><strong>SUCCESS!</strong></em></p>
<h2 id="questions">Questions</h2>
<p>Here are some open questions for future investigations:</p>
<ul>
<li>Why does the LLC curve spike at ~3MiB, then go down, then follow a smooth
curve upwards towards 100% misses?</li>
<li>For both curves, why is there a smaller spike right before the main spike?</li>
</ul>
]]></content></item></channel></rss>