<?xml version="1.0" encoding="utf-8" standalone="yes"?><rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom"><channel><title>Local Llms on ✰Vicki Boykis✰</title><link>https://vickiboykis.com/tags/local-llms/</link><description>Recent content in Local Llms on ✰Vicki Boykis✰</description><generator>Hugo</generator><language>en-US</language><copyright>Copyright © 2026, Vicki Boykis.</copyright><lastBuildDate>Mon, 15 Jun 2026 00:00:00 +0000</lastBuildDate><atom:link href="https://vickiboykis.com/tags/local-llms/index.xml" rel="self" type="application/rss+xml"/><item><title>Running local models is good now</title><link>https://vickiboykis.com/2026/06/15/running-local-models-is-good-now/</link><pubDate>Mon, 15 Jun 2026 08:00:00 -0500</pubDate><guid>https://vickiboykis.com/2026/06/15/running-local-models-is-good-now/</guid><description>&lt;p>I&amp;rsquo;ve been working &lt;a href="https://vickiboykis.com/2024/02/28/gguf-the-long-way-around/">with local models&lt;/a> since they came out, and finally, they&amp;rsquo;re surprisingly good now.&lt;/p>
&lt;p>I have a 2022 M2 Mac with 64 GB of RAM and 1 TB of storage, and I&amp;rsquo;ve used&lt;/p>
&lt;ul>
&lt;li>&lt;a href="https://mistral.ai/news/announcing-mistral-7b/">Mistral 7B&lt;/a>&lt;/li>
&lt;li>&lt;a href="https://deepmind.google/models/gemma/gemma-3/">Gemma 3&lt;/a>&lt;/li>
&lt;li>&lt;a href="https://huggingface.co/openai/gpt-oss-20b">OpenAI OSS-20B&lt;/a>&lt;/li>
&lt;li>&lt;a href="https://huggingface.co/Qwen/Qwen3-30B-A3B">Qwen 3 MOE&lt;/a>, as well as a number of other Qwen variants like &lt;a href="https://ollama.com/library/qwen2.5-coder">Qwen 2.5 Coder&lt;/a>&lt;/li>
&lt;/ul>
&lt;p>across &lt;a href="https://vickiboykis.com/2026/05/18/tagging-my-blog-posts-with-bertopic-and-llms/">a lot of different system setups&lt;/a> like&lt;/p>
&lt;ul>
&lt;li>raw llama.cpp with &lt;a href="https://github.com/open-webui/open-webui">Open WebUI&lt;/a>&lt;/li>
&lt;li>llama-cpp-python&lt;/li>
&lt;li>Ollama&lt;/li>
&lt;li>llamafiles and&lt;/li>
&lt;li>LM Studio&lt;/li>
&lt;/ul>
&lt;h2 id="where-are-local-models-now">Where are local models now?&lt;/h2>
&lt;p>Early on, models were slow, hard to use, and just not that accurate for most programming tasks. The idea that local models were severely lagging behind was largely true until, for me, the release of GPT-OSS. I have no concrete scientific evidence for this; my personal vibe metric for &amp;ldquo;is a model good enough&amp;rdquo; is &amp;ldquo;do I have to double-check it against an API model?&amp;rdquo;, and GPT-OSS was the first model where I found myself doing that a lot less often.&lt;/p>
&lt;p>As a result, I&amp;rsquo;ve mostly been using local models as a fast, personalized Google for development questions that don&amp;rsquo;t require recency.&lt;/p>
&lt;p>But with the most recent releases from Google in the &lt;a href="https://deepmind.google/models/gemma/gemma-4/">Gemma 4&lt;/a> family, I&amp;rsquo;ve finally been able to do agentic coding locally, with agent loops working at roughly 75% of the accuracy and speed of frontier models, which is incredible.&lt;/p>
&lt;p>I&amp;rsquo;ve been using the &lt;a href="https://lmstudio.ai/models/google/gemma-4-26b-a4b">LM Studio implementation&lt;/a> of &lt;code>gemma-4-26b-a4b&lt;/code> as my default local model. So far, I&amp;rsquo;ve used this local setup to refactor a Python script that started out as a notebook into a repo of 5-6 modules, and to &lt;a href="https://peps.python.org/pep-0585/">lint those modules&lt;/a> so they use the correct type hints for generics (most frontier models now do this automatically, but not always).&lt;/p>
&lt;figure>&lt;img src="https://vickiboykis.com/images/refactor_gemma.png" width="600">
&lt;/figure>

&lt;p>I&amp;rsquo;ve also used it to proofread some blog posts, write unit tests, and bootstrap a repo that stands up a two-tower recommendation model, just to see what the agent would do with a blank slate. Here&amp;rsquo;s what it generated, which was pretty basic but still beyond anything I would have thought possible last year:&lt;/p>
&lt;figure>&lt;img src="https://vickiboykis.com/images/twotower.png" width="600">
&lt;/figure>
&lt;figure>&lt;img src="https://vickiboykis.com/images/twotower2.png" width="600">
&lt;/figure>

&lt;p>Note that the environment is restricted because I run all my agentic workflows in a Docker container with limited access to execution.&lt;/p>
&lt;p>I&amp;rsquo;m also building an app that surfaces trending topics from arXiv papers. Out of curiosity, I had Pi (the agent harness I describe below) go through my past LM Studio session logs and figure out what I&amp;rsquo;d been using LM Studio for:&lt;/p>
&lt;figure>&lt;img src="https://vickiboykis.com/images/summarize.png" width="600">
&lt;/figure>

&lt;figure>&lt;img src="https://vickiboykis.com/images/lmstudio1.png" width="600">
&lt;/figure>

&lt;p>Unsurprisingly, since I&amp;rsquo;ve &lt;a href="https://vickiboykis.com/2026/04/20/build-yourself-flowers/">been working on Rijksearch&lt;/a>, a lot of those sessions were about it:&lt;/p>
&lt;figure>&lt;img src="https://vickiboykis.com/images/lmstudio.png" width="600">
&lt;/figure>

&lt;p>None of these are groundbreaking tasks (again, a lot of them are personalized Google/docs lookups), but working on them does give my GPU and RAM a workout: as the K-V cache grows, memory usage climbs to the full 64 GB of RAM.&lt;/p>
&lt;figure>&lt;img src="https://vickiboykis.com/images/gemma_gpu.png" width="600">
&lt;/figure>
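&lt;p>A rough way to see why memory climbs like this: in a standard transformer, the K-V cache stores one key vector and one value vector per layer, per K-V head, for every token in the context, so its size grows linearly with context length. The sketch below is a back-of-the-envelope calculator under that assumption. The helper names are made up, and the model shape in the example (48 layers, 8 K-V heads, head dimension 128, 2-byte cache entries) is a set of hypothetical round numbers, not Gemma 4&amp;rsquo;s actual configuration; architectures that use sliding-window or shared K-V layers will need less than this estimate.&lt;/p>
&lt;div class="highlight">&lt;pre tabindex="0">&lt;code class="language-sh" data-lang="sh">#!/usr/bin/env bash
# Back-of-the-envelope K-V cache size for a standard transformer (hypothetical helpers).
# bytes = 2 (one K and one V) * layers * kv_heads * head_dim * context_tokens * bytes_per_element
kv_cache_bytes() {
  local layers="$1" kv_heads="$2" head_dim="$3" context_tokens="$4" bytes_per_elem="$5"
  echo $(( 2 * layers * kv_heads * head_dim * context_tokens * bytes_per_elem ))
}

# Convert bytes to MiB (integer division).
to_mib() {
  echo $(( $1 / 1024 / 1024 ))
}

# Hypothetical model shape: 48 layers, 8 K-V heads, head_dim 128, 2-byte (fp16/bf16) cache entries.
for ctx in 8192 32768 131072; do
  bytes="$(kv_cache_bytes 48 8 128 "$ctx" 2)"
  echo "context=${ctx} tokens: ~$(to_mib "$bytes") MiB of K-V cache"
done
&lt;/code>&lt;/pre>&lt;/div>
&lt;p>With these made-up numbers, the loop reports 1536, 6144, and 24576 MiB: every 4x increase in context means a 4x larger cache, and that comes on top of the memory the model weights already occupy.&lt;/p>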

&lt;p>But the larger story for me is that these tasks, simple as they are, were impossible for local models as recently as six months ago.&lt;/p>
&lt;p>&lt;a href="https://blog.google/innovation-and-ai/technology/developers-tools/quantization-aware-training-gemma-4/">&lt;code>Gemma-4-12b-qat&lt;/code>&lt;/a> just came out but I&amp;rsquo;ve already also really been impressed with its performance relative to its size. The model architecture itself is &lt;a href="https://newsletter.maartengrootendorst.com/p/a-visual-guide-to-gemma-4-12b">really interesting&lt;/a> and proposes a bunch of interesting questions like, &amp;ldquo;if we are constrained by performance and price, what architectural tradeoffs do we need to make?&amp;rdquo; a question that so far has not really been asked in the mad token gold rush.&lt;/p>
&lt;h2 id="running-agentic-models-locally-today">Running agentic models locally today&lt;/h2>
&lt;p>But don&amp;rsquo;t take my word for any of this; try it out for yourself! To run local agentic flows, you&amp;rsquo;ll need three pieces: a local model inference engine, an agent harness, and the local model artifact. The inference engine serves the &lt;a href="https://vickiboykis.com/2024/02/28/gguf-the-long-way-around/">downloaded model artifact&lt;/a> at a local endpoint, and you set up the harness to point at that endpoint.&lt;/p>
&lt;p>For my local setup, I&amp;rsquo;m currently using &lt;a href="https://pi.dev/">Pi&lt;/a> as the agent harness and &lt;a href="https://lmstudio.ai/">LM Studio&lt;/a> as the inference server, although it would likely be faster to use llama.cpp directly, which is a potential direction for a future experiment.&lt;/p>
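&lt;p>Before wiring up a harness, it can help to confirm that the inference server is actually reachable. Here is a minimal sketch, assuming LM Studio&amp;rsquo;s OpenAI-compatible server is running on its default port 1234 (the same port used in the config below) and that &lt;code>curl&lt;/code> is installed; &lt;code>LOCAL_LLM_BASE_URL&lt;/code> is a variable name made up for this example. It asks the server&amp;rsquo;s &lt;code>/v1/models&lt;/code> endpoint which models it is exposing.&lt;/p>
&lt;div class="highlight">&lt;pre tabindex="0">&lt;code class="language-sh" data-lang="sh">#!/usr/bin/env bash
# Check that a local OpenAI-compatible server is up before starting an agent harness.
# Assumes LM Studio's default port; LOCAL_LLM_BASE_URL is a hypothetical override variable.
base_url="${LOCAL_LLM_BASE_URL:-http://localhost:1234/v1}"

# GET /models lists the models the server exposes.
# --fail turns HTTP errors into a non-zero exit code; --max-time bounds the wait.
if curl --silent --fail --max-time 5 "${base_url}/models"; then
  echo
  echo "Server is reachable at ${base_url}"
else
  echo "No server answering at ${base_url}; is the LM Studio server running?"
fi
&lt;/code>&lt;/pre>&lt;/div>
&lt;p>From inside a Docker container, the same check would target &lt;code>http://host.docker.internal:1234/v1&lt;/code> instead, which is the base URL the &lt;code>models.json&lt;/code> below points at.&lt;/p>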
&lt;p>&lt;a href="https://patloeber.com/gemma-4-pi-agent/">This post was very easy to follow&lt;/a> to set up agentic coding with Pi and LM Studio, although I did make a few tweaks to the post&amp;rsquo;s setup.&lt;/p>
&lt;ol>
&lt;li>&lt;strong>Model:&lt;/strong> The post recommends &lt;code>Gemma 26B A4B&lt;/code>, but &lt;code>gemma-4-12b-qat&lt;/code> is more recent, smaller, and faster, without much sacrifice in accuracy.&lt;/li>
&lt;li>&lt;strong>Security:&lt;/strong> I run every Pi session in a Docker container and give it access only to bash, so it can&amp;rsquo;t run Python code or browse the web, although I do plan to allow curl in a different image for some research work I&amp;rsquo;m doing.&lt;/li>
&lt;li>&lt;strong>Agent Harness Config:&lt;/strong> Since I run everything in Docker, I edited Pi&amp;rsquo;s &lt;code>models.json&lt;/code> so that Pi, running inside the container, can reach the model LM Studio serves on the host via &lt;code>host.docker.internal&lt;/code>.&lt;/li>
&lt;/ol>
&lt;div class="highlight">&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;">&lt;code class="language-sh" data-lang="sh">&lt;span style="display:flex;">&lt;span>&lt;span style="color:#e6db74">&amp;#34;lmstudio&amp;#34;&lt;/span>: &lt;span style="color:#f92672">{&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#e6db74">&amp;#34;baseUrl&amp;#34;&lt;/span>: &lt;span style="color:#e6db74">&amp;#34;http://host.docker.internal:1234/v1&amp;#34;&lt;/span>,
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#e6db74">&amp;#34;api&amp;#34;&lt;/span>: &lt;span style="color:#e6db74">&amp;#34;openai-completions&amp;#34;&lt;/span>,
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#e6db74">&amp;#34;apiKey&amp;#34;&lt;/span>: &lt;span style="color:#e6db74">&amp;#34;not-needed&amp;#34;&lt;/span>,
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#e6db74">&amp;#34;models&amp;#34;&lt;/span>: &lt;span style="color:#f92672">[&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#f92672">{&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#e6db74">&amp;#34;id&amp;#34;&lt;/span>: &lt;span style="color:#e6db74">&amp;#34;google/gemma-4-12b-qat&amp;#34;&lt;/span>,
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#e6db74">&amp;#34;input&amp;#34;&lt;/span>: &lt;span style="color:#f92672">[&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#e6db74">&amp;#34;text&amp;#34;&lt;/span>,
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#e6db74">&amp;#34;image&amp;#34;&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#f92672">]&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#f92672">}&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#f92672">]&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#f92672">}&lt;/span>
&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>&lt;p>Here&amp;rsquo;s my Docker Compose config:&lt;/p>
&lt;div class="highlight">&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;">&lt;code class="language-sh" data-lang="sh">&lt;span style="display:flex;">&lt;span>services:
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> pi:
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> build:
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> context: .
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> dockerfile: Dockerfile
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> image: pi-agent:0.74.0
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> init: true
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> stdin_open: true
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> tty: true
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> extra_hosts:
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> - &lt;span style="color:#e6db74">&amp;#34;host.docker.internal:host-gateway&amp;#34;&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> environment:
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> ANTHROPIC_API_KEY: &lt;span style="color:#e6db74">${&lt;/span>ANTHROPIC_API_KEY&lt;span style="color:#66d9ef">:-&lt;/span>&lt;span style="color:#e6db74">}&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> OPENAI_API_KEY: &lt;span style="color:#e6db74">${&lt;/span>OPENAI_API_KEY&lt;span style="color:#66d9ef">:-&lt;/span>not-needed&lt;span style="color:#e6db74">}&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> GEMINI_API_KEY: &lt;span style="color:#e6db74">${&lt;/span>GEMINI_API_KEY&lt;span style="color:#66d9ef">:-&lt;/span>&lt;span style="color:#e6db74">}&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> OPENAI_API_BASE: &lt;span style="color:#e6db74">${&lt;/span>OPENAI_API_BASE&lt;span style="color:#66d9ef">:-&lt;/span>http://host.docker.internal:1234/v1&lt;span style="color:#e6db74">}&lt;/span> &lt;span style="color:#75715e"># note that you&amp;#39;ll need to specify a base if you also use OpenAI to access OpenAI&amp;#39;s actual completions endpoint&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> WHATEVER_API_KEY: &lt;span style="color:#e6db74">${&lt;/span>WHATEVER_API_KEY&lt;span style="color:#66d9ef">:-&lt;/span>&lt;span style="color:#e6db74">}&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> volumes:
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> - &lt;span style="color:#e6db74">${&lt;/span>HOME&lt;span style="color:#e6db74">}&lt;/span>/.pi/agent/models.json:/config/models.json
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> - &lt;span style="color:#e6db74">${&lt;/span>WORKSPACE&lt;span style="color:#66d9ef">:-&lt;/span>.&lt;span style="color:#e6db74">}&lt;/span>:/workspace
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> - pi-config:/config
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> - pi-sessions:/sessions
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> working_dir: /workspace
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>volumes:
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> pi-config:
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> pi-sessions:
&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>&lt;p>And here&amp;rsquo;s the Bash script that launches &lt;code>pi&lt;/code>:&lt;/p>
&lt;div class="highlight">&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;">&lt;code class="language-sh" data-lang="sh">&lt;span style="display:flex;">&lt;span>&lt;span style="color:#75715e">#!/usr/bin/env bash
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#75715e">&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#75715e"># Pi — Start the containerized Pi agent.&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#75715e"># Directory containing this script and the compose files.&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>SCRIPT_DIR&lt;span style="color:#f92672">=&lt;/span>&lt;span style="color:#e6db74">&amp;#34;&lt;/span>&lt;span style="color:#66d9ef">$(&lt;/span>cd -- &lt;span style="color:#e6db74">&amp;#34;&lt;/span>&lt;span style="color:#66d9ef">$(&lt;/span>dirname &lt;span style="color:#e6db74">&amp;#34;&lt;/span>&lt;span style="color:#e6db74">${&lt;/span>BASH_SOURCE[0]&lt;span style="color:#e6db74">}&lt;/span>&lt;span style="color:#e6db74">&amp;#34;&lt;/span>&lt;span style="color:#66d9ef">)&lt;/span>&lt;span style="color:#e6db74">&amp;#34;&lt;/span> &lt;span style="color:#f92672">&amp;amp;&amp;amp;&lt;/span> pwd&lt;span style="color:#66d9ef">)&lt;/span>&lt;span style="color:#e6db74">&amp;#34;&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#75715e"># Workspace to mount into the container. &lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>WORKSPACE_DIR&lt;span style="color:#f92672">=&lt;/span>&lt;span style="color:#e6db74">&amp;#34;&lt;/span>&lt;span style="color:#e6db74">${&lt;/span>WORKSPACE&lt;span style="color:#66d9ef">:-$(&lt;/span>pwd&lt;span style="color:#66d9ef">)&lt;/span>&lt;span style="color:#e6db74">}&lt;/span>&lt;span style="color:#e6db74">&amp;#34;&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#66d9ef">case&lt;/span> &lt;span style="color:#e6db74">&amp;#34;&lt;/span>$WORKSPACE_DIR&lt;span style="color:#e6db74">&amp;#34;&lt;/span> in
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> /*&lt;span style="color:#f92672">)&lt;/span> ;; 
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> *&lt;span style="color:#f92672">)&lt;/span> WORKSPACE_DIR&lt;span style="color:#f92672">=&lt;/span>&lt;span style="color:#e6db74">&amp;#34;&lt;/span>&lt;span style="color:#66d9ef">$(&lt;/span>cd -- &lt;span style="color:#e6db74">&amp;#34;&lt;/span>$WORKSPACE_DIR&lt;span style="color:#e6db74">&amp;#34;&lt;/span> &lt;span style="color:#f92672">&amp;amp;&amp;amp;&lt;/span> pwd&lt;span style="color:#66d9ef">)&lt;/span>&lt;span style="color:#e6db74">&amp;#34;&lt;/span> ;; 
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#66d9ef">esac&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>export WORKSPACE&lt;span style="color:#f92672">=&lt;/span>&lt;span style="color:#e6db74">&amp;#34;&lt;/span>$WORKSPACE_DIR&lt;span style="color:#e6db74">&amp;#34;&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>sandbox&lt;span style="color:#f92672">=&lt;/span>&lt;span style="color:#e6db74">&amp;#34;&lt;/span>&lt;span style="color:#e6db74">${&lt;/span>PI_SANDBOX&lt;span style="color:#66d9ef">:-&lt;/span>0&lt;span style="color:#e6db74">}&lt;/span>&lt;span style="color:#e6db74">&amp;#34;&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>pi_args&lt;span style="color:#f92672">=()&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#66d9ef">while&lt;/span> &lt;span style="color:#f92672">((&lt;/span>$#&lt;span style="color:#f92672">))&lt;/span>; &lt;span style="color:#66d9ef">do&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#66d9ef">case&lt;/span> &lt;span style="color:#e6db74">&amp;#34;&lt;/span>$1&lt;span style="color:#e6db74">&amp;#34;&lt;/span> in
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> --sandbox&lt;span style="color:#f92672">)&lt;/span> sandbox&lt;span style="color:#f92672">=&lt;/span>&lt;span style="color:#ae81ff">1&lt;/span> ;;
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> --no-sandbox&lt;span style="color:#f92672">)&lt;/span> sandbox&lt;span style="color:#f92672">=&lt;/span>&lt;span style="color:#ae81ff">0&lt;/span> ;;
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> *&lt;span style="color:#f92672">)&lt;/span> pi_args&lt;span style="color:#f92672">+=(&lt;/span>&lt;span style="color:#e6db74">&amp;#34;&lt;/span>$1&lt;span style="color:#e6db74">&amp;#34;&lt;/span>&lt;span style="color:#f92672">)&lt;/span> ;;
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#66d9ef">esac&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> shift
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#66d9ef">done&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>compose_files&lt;span style="color:#f92672">=(&lt;/span> -f &lt;span style="color:#e6db74">&amp;#34;&lt;/span>$SCRIPT_DIR&lt;span style="color:#e6db74">/docker-compose.yml&amp;#34;&lt;/span> &lt;span style="color:#f92672">)&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#66d9ef">if&lt;/span> &lt;span style="color:#f92672">[[&lt;/span> &lt;span style="color:#e6db74">&amp;#34;&lt;/span>$sandbox&lt;span style="color:#e6db74">&amp;#34;&lt;/span> &lt;span style="color:#f92672">==&lt;/span> &lt;span style="color:#e6db74">&amp;#34;1&amp;#34;&lt;/span> &lt;span style="color:#f92672">]]&lt;/span>; &lt;span style="color:#66d9ef">then&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#75715e"># an even more secure sandbox&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> compose_files&lt;span style="color:#f92672">+=(&lt;/span> -f &lt;span style="color:#e6db74">&amp;#34;&lt;/span>$SCRIPT_DIR&lt;span style="color:#e6db74">/docker-compose.sandbox.yml&amp;#34;&lt;/span> &lt;span style="color:#f92672">)&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#66d9ef">fi&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#75715e"># Derive a container name from the workspace directory&amp;#39;s basename.&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#75715e"># Sanitize to characters Docker accepts: [a-zA-Z0-9][a-zA-Z0-9_.-]*&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>repo_slug&lt;span style="color:#f92672">=&lt;/span>&lt;span style="color:#e6db74">&amp;#34;&lt;/span>&lt;span style="color:#66d9ef">$(&lt;/span>basename -- &lt;span style="color:#e6db74">&amp;#34;&lt;/span>$WORKSPACE_DIR&lt;span style="color:#e6db74">&amp;#34;&lt;/span> | tr -c &lt;span style="color:#e6db74">&amp;#39;a-zA-Z0-9_.-&amp;#39;&lt;/span> &lt;span style="color:#e6db74">&amp;#39;-&amp;#39;&lt;/span> | sed &lt;span style="color:#e6db74">&amp;#39;s/^-*//&amp;#39;&lt;/span>&lt;span style="color:#66d9ef">)&lt;/span>&lt;span style="color:#e6db74">&amp;#34;&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#f92672">[[&lt;/span> -z &lt;span style="color:#e6db74">&amp;#34;&lt;/span>$repo_slug&lt;span style="color:#e6db74">&amp;#34;&lt;/span> &lt;span style="color:#f92672">]]&lt;/span> &lt;span style="color:#f92672">&amp;amp;&amp;amp;&lt;/span> repo_slug&lt;span style="color:#f92672">=&lt;/span>&lt;span style="color:#e6db74">&amp;#34;workspace&amp;#34;&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>container_name&lt;span style="color:#f92672">=&lt;/span>&lt;span style="color:#e6db74">&amp;#34;pi-&lt;/span>&lt;span style="color:#e6db74">${&lt;/span>repo_slug&lt;span style="color:#e6db74">}&lt;/span>&lt;span style="color:#e6db74">-&lt;/span>$$&lt;span style="color:#e6db74">&amp;#34;&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>api_key_args&lt;span style="color:#f92672">=(&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> -e OPENAI_API_KEY
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> -e DEEPSEEK_API_KEY
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> -e ANTHROPIC_API_KEY
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> -e GEMINI_API_KEY
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#f92672">)&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>cmd&lt;span style="color:#f92672">=(&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> docker compose
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> --project-directory &lt;span style="color:#e6db74">&amp;#34;&lt;/span>$SCRIPT_DIR&lt;span style="color:#e6db74">&amp;#34;&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#e6db74">&amp;#34;&lt;/span>&lt;span style="color:#e6db74">${&lt;/span>compose_files[@]&lt;span style="color:#e6db74">}&lt;/span>&lt;span style="color:#e6db74">&amp;#34;&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> run --rm
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> --name &lt;span style="color:#e6db74">&amp;#34;&lt;/span>$container_name&lt;span style="color:#e6db74">&amp;#34;&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#e6db74">&amp;#34;&lt;/span>&lt;span style="color:#e6db74">${&lt;/span>api_key_args[@]&lt;span style="color:#e6db74">}&lt;/span>&lt;span style="color:#e6db74">&amp;#34;&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> pi
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#f92672">)&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#66d9ef">if&lt;/span> &lt;span style="color:#f92672">((&lt;/span>&lt;span style="color:#e6db74">${#&lt;/span>pi_args[@]&lt;span style="color:#e6db74">}&lt;/span>&lt;span style="color:#f92672">))&lt;/span>; &lt;span style="color:#66d9ef">then&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> cmd&lt;span style="color:#f92672">+=(&lt;/span>&lt;span style="color:#e6db74">&amp;#34;&lt;/span>&lt;span style="color:#e6db74">${&lt;/span>pi_args[@]&lt;span style="color:#e6db74">}&lt;/span>&lt;span style="color:#e6db74">&amp;#34;&lt;/span>&lt;span style="color:#f92672">)&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#66d9ef">fi&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>exec &lt;span style="color:#e6db74">&amp;#34;&lt;/span>&lt;span style="color:#e6db74">${&lt;/span>cmd[@]&lt;span style="color:#e6db74">}&lt;/span>&lt;span style="color:#e6db74">&amp;#34;&lt;/span>
&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>&lt;p>I build the Docker image and keep its files (the Dockerfile, the Compose files, and this script) in their own repo. Then I run the script from whichever repo I&amp;rsquo;m working in, which starts a container so that Pi only sees the mounted workspace and can&amp;rsquo;t wipe files or directories elsewhere on my physical hard drive. The Compose file also mounts my custom &lt;code>models.json&lt;/code> into the container, so Pi can see my model config there. All of this has been working fairly well for my experiments.&lt;/p>
&lt;p>There are still issues with local models: inference can be slow, context windows are small and limited by your own hardware, and the ecosystem still takes some effort to navigate, although tooling like LM Studio and Hugging Face&amp;rsquo;s &lt;a href="https://huggingface.co/blog/yagilb/lms-hf">Use This Model button&lt;/a> has made it a ton easier. Early releases often suffer from &lt;a href="https://docs.langchain.com/langsmith/prompt-template-format">prompt template mismatches&lt;/a>, but these are usually patched extremely quickly. All that said, I&amp;rsquo;m not sure this setup is ready for production software development quite yet.&lt;/p>
&lt;p>The benefits, though, are numerous, and the ecosystem is critical to invest in, particularly now. One of the coolest parts of local models is that you can introspect almost everything, like watching the token inference process live,&lt;/p>
&lt;figure>&lt;img src="https://vickiboykis.com/images/logs.png" width="600">
&lt;/figure>

&lt;p>and watching tokens in/out.&lt;/p>
&lt;figure>&lt;img src="https://vickiboykis.com/images/lm_tokens.png" width="600">
&lt;/figure>

&lt;p>You can change the local context window and watch performance improve or degrade, and really dig into how your tokens are processed on the GPU. You can change the system prompt or swap quantizations. You can pit models against each other. You can also change and introspect the harness side.&lt;/p>
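&lt;p>For the context-window experiment, one option is to script it with LM Studio&amp;rsquo;s &lt;code>lms&lt;/code> command-line tool instead of clicking through the UI. This is a minimal sketch that assumes &lt;code>lms&lt;/code> is installed and on your &lt;code>PATH&lt;/code>, and that your version&amp;rsquo;s &lt;code>lms load&lt;/code> accepts a &lt;code>--context-length&lt;/code> flag and &lt;code>lms unload&lt;/code> accepts &lt;code>--all&lt;/code> (check &lt;code>lms load --help&lt;/code> first). The model key is assumed to match the &lt;code>id&lt;/code> in the &lt;code>models.json&lt;/code> above, the context sizes are arbitrary, and &lt;code>RUN&lt;/code> is a made-up switch: by default the script only prints the commands it would run.&lt;/p>
&lt;div class="highlight">&lt;pre tabindex="0">&lt;code class="language-sh" data-lang="sh">#!/usr/bin/env bash
# Sketch: reload the same model at several context lengths to compare speed and memory use.
# Assumptions: the lms CLI is installed and supports these flags on your version.
model="google/gemma-4-12b-qat"       # assumed to match the id in models.json
context_lengths=(4096 16384 65536)   # arbitrary sizes to compare

# Print each command, and only execute it when RUN=1 (a hypothetical switch for this sketch).
run() {
  echo "+ $*"
  if [[ "${RUN:-0}" == "1" ]]; then
    "$@"
  fi
}

for ctx in "${context_lengths[@]}"; do
  run lms unload --all
  run lms load "$model" --context-length "$ctx"
  # Send your usual prompt at this point and note tokens/sec and memory before moving on.
done
&lt;/code>&lt;/pre>&lt;/div>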
&lt;p>The possibilities are endless, and the tools only keep getting better.&lt;/p></description></item></channel></rss>