Walking around the app

Sep 9 2025

There is a very vigorous debate happening online right now about what shape evaluation for LLM-based products should take. I don’t want to rehash all of it, other than to say this: if you are building applications with non-deterministic components (most AI-powered apps) or applications that are data-intensive (also most AI-powered apps!), and you are serious about these features, you should, at a bare minimum, be testing your online outputs in production every day. If you can also afford offline testing (evaluating model outputs internally before they go live in production) given your engineering capacity and tooling maturity, even better!

More importantly: if you are building an app, you need to also constantly be testing and touching different parts of it to see if the app flow makes sense. Look at the UI dropdowns, try the search bar. Toggle the toggles. How long does it take to load a result? What do the results look like? What colors are the buttons? How do they work on different devices? At different internet speeds? What about your payment gateway? How does onboarding look for a brand new user in North America? In New Zealand? Could you onboard from a fresh device in under three minutes? Do images load from your CDN?

I’ve heard this kind of testing called different things in different product cultures, but the phrase that stuck with me most was Dan’s: “walking around the app”. Walking around the app is exactly what it sounds like - checking out the sidewalks, picking up stray pieces of trash, letting the store across the street know their street lamp is out. You could call this QA, although it’s more than that. QA is scoped transactionally and takes place at the end of a PR. Walking around the app is a mindset, a process, something you do every day, broadly, habitually, passionately, with interest and curiosity, because the app is a neighborhood you’re building for the people who will live in it.

If you don’t walk around the app regularly, you start to get broken windows. The broken windows theory posits that visible signs of decay in a neighborhood signal that further decay is acceptable: littering begets more littering. People understand this intuitively. In the Italian town of Brescia, for example, an anonymous man goes around at night painting over malignant graffiti.

In the applications we build, builders are both the graffiti creators and the graffiti erasers, and we have a responsibility to make our applications as habitable as possible for both our users and ourselves. What does cleaning up broken windows mean for users? Buttons that go nowhere, results pages that don’t render, links that are missing results, misspellings, cache misses, timeouts, misaligned CSS. Deprecated routes. Latency.

How about for ourselves? Consumer-facing apps with front-ends are the easiest to walk around, but APIs and backends can be walked around, too. Naming inconsistencies. Broken local builds. A class inheritance hierarchy nested several levels deep is a broken window because it’s hard to trace in our heads - humans can only really hold seven, plus or minus two, things in working memory. A hard-coded environment variable that creates extra mental load. A method that depends on an untestable internal method, forcing you to reach for a mock. A build process so long it causes you to lose interest and tab away to Reddit. Anything that creates cognitive overhead, the kind of complexity Grug doesn’t like, can be a broken window.

There is no single fix that makes an app work well. But end users can intuitively tell what a well-loved app feels like, because it’s constantly being walked around and fixed in a million small ways, every day.

We are mere mortals and will always create bugs, especially when it comes to data work. Walking around the app will not prevent this: to write code is to generate bugs, and the best code is the code never written.

But if we walk around our app every day, we’ll catch at least as many problems as we create. And once that habit is in place, evaluation of any kind becomes infinitely easier to establish as a habit, too.

#development #apps #engineering #product culture