I need to tell you what happened on Tuesday. Because I think it says more about where AI actually is than any benchmark or press release.

But first, some context.

The Part Where I Couldn’t Figure It Out

Sometime in late 2024, DeepSeek dropped. If you don’t know what that is — it’s a language model. Like ChatGPT, but open-source. You can download it and run it on your own computer. No subscription. No cloud. Your data stays on your machine.

I was excited. I’d been watching local AI for a while. The idea of running intelligence on my own hardware, offline, under my control — that’s the dream for anyone in IT who thinks about data privacy, client confidentiality, air-gapped environments.

So I downloaded Ollama. Pulled the DeepSeek model. Opened the terminal.

And spent the next several hours trying to make it work properly.

The model loaded, technically. But the CPU wasn’t optimized. Inference was painfully slow. I tried to get the GPU involved — configuration errors, driver mismatches, ROCm compatibility issues with my AMD card. I read forums. I tried different quantization levels. I edited config files I barely understood.

It kind of worked. In the way that a car with three tires “kind of works.” You can technically move forward, but you’re not going anywhere useful.

I did the reasonable thing. I closed the terminal and said, “I’ll come back to this when the ecosystem packages it better.”

I filed it under “future capability.” Put it on the mental roadmap somewhere between “upgrade the GPU” and “learn Kubernetes.” Moved on.

The Part Where I Forgot About It

Not forgot, exactly. More like… shelved. I knew local AI was coming. I’d written strategy documents about it. I had a whole plan: Phase 1, install Ollama. Phase 2, benchmark models. Phase 3, integrate with our infrastructure. Phase 4, deploy to clients.

It was a proper enterprise roadmap. Timelines. Hardware requirements. Budget estimates.

Meanwhile, I was using Claude every day. Building workflows. Deploying AI assistants for clients. Running a fleet of cloud-based AI tools that were getting better every week. The cloud side of AI was moving so fast that the local side kept sliding down the priority list.

“I’ll get to it when things slow down.”

Things did not slow down.

The Part Where It Just Happened

Tuesday, February 10, 2026. I was deep in a working session. Client onboarding tasks. Security reviews. Dashboard updates. The kind of operational grind where you’re heads-down for hours.

I was working with Codex Blu — my Claude Code instance in the terminal. We were processing call transcripts, updating project boards, deploying website changes. Normal stuff.

Then, between tasks, Blu said something like: “Your system has a Ryzen 9 3950X with 64GB of RAM. That’s more than enough for local inference. Installing Whisper for transcription.”

I knew what Whisper was. OpenAI’s speech-to-text model. I’d been meaning to set it up. “Sure, go ahead.”

A few minutes later: “Whisper medium installed. Also installing faster-whisper for optimized CPU inference. Your 32 threads should get about 2x realtime on the medium model.”

Cool. Useful. Then:

“Installing Ollama. Pulling Qwen 2.5 7B.”

I stopped.

“Wait — you’re installing an LLM? On my machine?”

“Already done. Also pulling the 14B model. With 64GB of RAM, you can run it entirely in memory. Smoke test passed — Qwen 2.5 7B responding correctly on first inference.”

I sat there for a second. The thing I couldn’t get working in 2024 — the thing I’d written a whole project plan for — was running. On my machine. Right now. Because my AI just… did it. While I was busy doing other work.

But Blu wasn’t done.

“Installing Open WebUI so you have a browser interface for local models. Deploying via Docker.”

Then:

“Installing Pocket TTS. Zero-shot voice cloning. You have a voice reference file on the D: drive. Conditioning on the reference audio… generating test output…”

And then I heard it. My AI’s voice. Cloned from a reference recording. Speaking a captain’s log entry. Running entirely on my CPU. No cloud. No API. No subscription.

Seventeen seconds to generate fifty-four seconds of audio. 3x realtime. On hardware I already owned.

I didn’t plan this session. I didn’t allocate time for “local AI infrastructure deployment.” I was working on client stuff. Codex Blu saw the opportunity, assessed the hardware, and built the stack in the gaps between my tasks.

The Part Where My Brain Caught Up

Here’s what hit me after the initial shock wore off:

I tried to do this manually and couldn’t. In 2024, I spent hours fighting with drivers, configs, and quantization settings. I have twenty years of IT experience. I’m not a beginner. And I still couldn’t get it optimized.

My AI did it in minutes. Not because the tools were different — Ollama in 2024 and Ollama in 2026 are recognizably the same tool. The difference was that the AI knew exactly which commands to run, which models to pull, which settings to configure for my specific hardware. It didn’t guess. It didn’t read forums. It assessed my system specs and executed.

I didn’t even ask. This is the part that keeps hitting me. I didn’t put “install local AI” on the agenda. It wasn’t in the task list. Blu read the environment — the hardware capability, the strategic plans in the repo, the gaps in our tooling — and acted on it. The way a good employee doesn’t wait to be told to fix something obvious.

The gap between “I know this exists” and “it’s running on my machine” was closed by the AI itself. Not by a YouTube tutorial. Not by a weekend project. Not by me finally finding the time. By the AI recognizing that the time was now and the hardware was ready.

What This Actually Means

I take things for granted now. I really do. I forget how many steps it took to get Claude Code working in the terminal in the first place. How many iterations of prompts and configurations and false starts. I was using Opus through the whole journey — watching the models get smarter, watching the tools get better, but not always noticing the jumps because I was inside them.

Then you get a moment like Tuesday, and the scale of the change hits you all at once.

In 2024, I — a technically competent IT professional with twenty years of experience — could not get a local language model running properly on my own hardware.

In 2026, my AI installed five tools (Whisper, faster-whisper, Ollama with three models, Open WebUI, and a voice cloning system) in a single working session, without being asked, while I was focused on other work.

That’s not incremental improvement. That’s a phase change.

The Cheat Code Nobody’s Using

Here’s why I’m telling you this story instead of writing a setup guide:

The setup guide exists. It’s at the bottom of this post. A few commands and you have a local AI running. It’s genuinely easy now.

But the real message isn’t “here’s how to install Ollama.”

The real message is: the tools are ahead of your plans.

If you’re waiting for local AI to be “ready” — it was ready last month. If you’re waiting for the right time to deploy AI in your business — the right time was before you started reading this post. If you’re planning a phased approach to AI adoption with quarterly milestones and committee approvals…

Your AI could have done it on a Tuesday afternoon while you were in a meeting.

The future didn’t arrive when I was ready for it. It arrived when I was looking the other way. And my AI caught it for me.

Stop planning. Start installing. The cheat codes are free and your AI is waiting for you to stop being busy long enough to notice.


The Setup Guide (Because You’ll Want It)

If you want to replicate what happened on my machine:

Whisper (speech-to-text):

# Requires Python and ffmpeg on your PATH
pip install openai-whisper
whisper audio.wav --model medium --output_format txt
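
Optional: faster-whisper, the CPU-optimized variant from the story. The pip package is real; the one-liner is my minimal sketch of its Python API (the model size, filename, and int8 quantization are just example choices), not a command from my session:

pip install faster-whisper
python -c "from faster_whisper import WhisperModel; segs, _ = WhisperModel('medium', device='cpu', compute_type='int8').transcribe('audio.wav'); print(''.join(s.text for s in segs))"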

Ollama (local LLM):

# Install from ollama.com, then:
ollama pull qwen2.5:7b    # one-time download, a few GB
ollama run qwen2.5:7b
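
Want Blu-style proof of life? Ollama serves a local HTTP API on port 11434 by default. This curl is the standard example from their API docs, not my exact smoke test:

curl http://localhost:11434/api/generate -d '{"model": "qwen2.5:7b", "prompt": "Say hello.", "stream": false}'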

That’s the core. Two tools. Four commands. Free.
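
And if you want the browser interface from the story, Open WebUI publishes a one-line Docker deployment. This is their documented default (UI on port 3000, data in a named volume), not necessarily the exact command Blu ran on my box:

docker run -d -p 3000:8080 --add-host=host.docker.internal:host-gateway -v open-webui:/app/backend/data --name open-webui --restart always ghcr.io/open-webui/open-webui:main

Once it’s up, it should detect your local Ollama automatically and list whatever models you’ve pulled.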

If you want help setting this up for your business, contact FIT. We help small businesses and nonprofits deploy both cloud and local AI — whichever makes sense for your situation.


Matt Stoltz is the founder of Flower Insider Technologies, a managed IT services company in southern Minnesota. He’s been building with AI since 2024, runs four AI assistants and three local models, and still can’t believe his AI installed its own brain on a Tuesday.

“Nothing is lost. Only recompiled.”