Build yourself flowers
I first wrote a draft of this talk by hand. This part took 2 months.
I then recorded myself giving a version of this talk with MacWhisper, and transcribed it with Whisper locally. This part took 45 minutes (the total time of my practice run).
Then, I ran the transcript through Gemini 2.5 Flash running in Pi to break it into paragraphs. I also had Gemini split my slide deck (a PDF I generated from Google Slides) into individual images to insert into the blog post, and convert them to webp so they render well on the blog. This part took about 10 minutes.
Then I went through the text paragraph by paragraph manually to make corrections, remove redundant phrasing and pauses, and add clarifications to make it more legible than a talk. This part took 3 hours.
Before all that, I generated the content for this talk. This part took 13 years.
I’m Vicki, and I build machine learning systems.

I debated for a long time how to introduce myself. Am I a data scientist? Am I still a machine learning engineer? Am I an AI engineer now? I’m not really sure. I think, like a lot of people over the past six months in the industry, I’ve been having existential angst. Like “what even am I? What am I doing” type angst. So I’ll go with it. I build machine learning systems.

I’ve built and broken systems at Tumblr, at Automattic, at Duo, at Mozilla.ai. Now, I build systems at Malachyte, working on realtime personalization and search in e-commerce.

As I’ve been doing machine learning, I’ve been wondering: what is the state of machine learning engineering as an industry today? And underneath the question of how to introduce myself is a big existential question for a machine learning conference: is it still worth doing machine learning engineering?

We’ve had a lot of different conversations in all of the corners of the internet about how we should incorporate LLMs into machine learning workflows. And the question that came to me is: if all we’re doing is generative AI, where does traditional machine learning even fit in? Are we still doing machine learning at all?
And the second question that came to me was: in an era where we’re generating a lot of code, is it still worth doing machine learning well?

It seems like the modus operandi is mainly speed, as opposed to good, quality code. The most important thing is for us to ship quickly, to get to a prototype quickly.
So hopefully we’ll be able to definitively solve all of my existential questions in 45 minutes, but until then, I wanted to take a step back and talk about flowers.

In 2024, the lovely folks at PyData Amsterdam invited me to give a keynote there. I called it Build and Keep Your Context Window. ChatGPT was still fairly new, out for just about a year and a half, and everyone was freaking out. I talked about how important it is to build and keep your own context window, the software development skills you’ve developed historically, so that you can use the new tools and understand where they fit into the context of software development.

The analogy I used was the ChatGPT UI. At the time, everybody was using the text interface. So you have a sidebar and you have the big text box. In the sidebar live all your chats as sessions, and the sessions have titles. And I talked about that as being part of a context window. The context window is the amount of text that a model can recall at any given time, as measured in tokens.

If the conversation grows past the context window, the model loses the thread and can’t reply coherently anymore. And what determines the context window is the attention mechanism in the Transformer model. And the attention mechanism is really a cache, which is really a hash map. So if you understand these four things and how they fit together, it’s perhaps not as surprising that we get the emergence of LLMs.
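
If that compression sounds too fast, here’s a toy sketch in Go (nothing like a real transformer implementation) of attention as a “soft” hash map: instead of one exact key lookup, the query matches every key with a weight and retrieves a blend of the values.

```go
package main

import (
	"fmt"
	"math"
)

// dot computes the dot product of two equal-length vectors.
func dot(a, b []float64) float64 {
	var s float64
	for i := range a {
		s += a[i] * b[i]
	}
	return s
}

// attend is a toy version of single-query attention: score the query
// against every key, softmax the scores into weights, and return a
// weighted sum of the values. A hash map is the "hard" version of this
// lookup; attention is the soft version.
func attend(query []float64, keys, values [][]float64) []float64 {
	weights := make([]float64, len(keys))
	var norm float64
	for i, k := range keys {
		weights[i] = math.Exp(dot(query, k) / math.Sqrt(float64(len(query))))
		norm += weights[i]
	}
	out := make([]float64, len(values[0]))
	for i, v := range values {
		w := weights[i] / norm
		for j := range v {
			out[j] += w * v[j]
		}
	}
	return out
}

func main() {
	keys := [][]float64{{1, 0}, {0, 1}}
	values := [][]float64{{10, 0}, {0, 10}}
	// A query close to the first key mostly retrieves the first value.
	fmt.Println(attend([]float64{0.9, 0.1}, keys, values))
}
```
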
Before I gave this talk, I got to spend a few days in Amsterdam (I was told they were the only two sunny days in the Netherlands in September) and particularly in the Rijksmuseum, which is the national Dutch museum of art.

It’s absolutely enormous, you could spend days there. The headline piece in the Rijksmuseum is the Night Watch by Rembrandt. When I went to see it, it was covered by this big plastic window and there was machinery set up all around it. I was thinking, what is this? Why is this blocking the painting?
When I went to PyData Amsterdam, Robert Erdmann, who was working at the Rijksmuseum, gave an incredible talk about how he was using machine learning and especially imaging, pixel detection, and high resolution photography to see details in art at the museum that nobody was able to see before.
So for example, for The Night Watch, they set up all this equipment to be able to photograph it many times at many resolutions. And as a result, they were able to get high-resolution pictures like the figure on the left, who is believed to actually be Rembrandt, the artist, in a cameo. And if you zoom in further and further, you can see in his eye there’s a little white sliver.

And so the question is, how do you become good enough that you know if you put just a little bit of a white dot, it will render as the iris of an eye to somebody? Those are the kinds of questions, around the mystery of mastery, that he was now able to explore as a result of this photography.
Rachel Ruysch
One of the artists whose paintings are also on display at the Rijksmuseum is Rachel Ruysch, one of the premier flower painters during the Dutch Golden Age. To be a premier painter of anything during the Dutch Golden Age was a huge accomplishment, and that’s who I really want to talk about, because she’s someone I’ve been using as a guide to navigate this crazy time when we’re generating a lot of code with LLMs.

So who was she? Rachel was a premier flower painter during the Dutch Golden Age and all she did was paint flowers. Lots and lots of different compositions of flowers in all sorts of shapes and sizes, starting from when she was 15, well into her 80s.

And why the Dutch Golden Age? What made this era of creativity possible? In the late 1500s, the Dutch won independence from Spain. Political stability was followed by economic stability and the rise of the Dutch empire. As Dutch life got significantly easier and more people joined the growing middle class, they wanted to buy art. Art, which had previously been mostly religious in theme, became secular, and people started painting scenes from real life, science, and nature. Art was seen as an important aspect of the Dutch national character. All of this resulted in hundreds of thousands of paintings, many of them scientific or representative of everyday life.
What people wanted to see was pictures of everyday life that were accurate, that were true to what was going on around them: local scenes, scientific discovery, and themes of religion and mortality, conveyed in a different way.
So they painted stuff like gentlemen smoking and playing backgammon in a tavern. They wanted to paint stuff like people skating in the winter.

They wanted to paint stuff like people getting rowdy in their house while the mistress was asleep.

And if you look at all these paintings, you can see something come through, which is that the colors are down to earth. They’re very normcore.
Another thing people wanted was a lot of different paintings of flowers, hundreds and thousands of paintings of flowers.

Out of the five to ten million paintings that we think were created during the Dutch Golden Age, thousands of them were of flowers. Why flowers? People wanted flowers in their homes that wouldn’t wilt over the long Dutch winters, and that showcased how worldly and interesting the homeowners were. Exotic and unusual flowers in particular (such as the tulips of the famous Dutch tulip mania) were prized, and flower painters sought to accommodate this trend. There were many very famous Dutch flower painters, and hundreds to thousands of lesser-known ones, all focusing on flowers. For artists, painting flowers accurately and lifelike was an important test of mastery.
And so Rachel Ruysch came into her skills at a time when people demanded this. The question came about: if you have all these people painting flowers, why is it important to paint good flowers? Why do you want to stand out?

And the same question of, in a world where it’s easy and fast to write code, why is technical excellence still important?

And as I was researching this, I came across a report from NASA.
NASA has been on my mind lately, with Artemis II and its orbit of the moon and triumphant return home. Because spaceflight is so hard and so enormously error-prone, NASA has been continuously on a loop to improve quality; it’s already very dangerous to be an astronaut at NASA. I’d seen somewhere that the Artemis crew knew they had a 1 in 30 chance of dying.
But NASA is always looking at risk, and in 2012 they did an investigation and came out with this report on what happens when things fail, and why. They came up with five problems, and the one that really stuck out to me was the first one: shifting engineering excellence to insight and oversight.

So there were periods when NASA outsourced a lot of their engineering to subcontractors. What happened was that the engineers who understood the system design from actual hands-on experience left. The shift that resulted eliminated independent analysis, testing, and understanding of the systems. And this is what led to the degradation of engineering quality.

And if it sounds familiar, this is because it’s a very easy analogy for what happens when we outsource our work as engineers and data scientists and machine learning engineers entirely to AI.

So, striving for technical excellence in industry saves us from engineering, product, and cost problems. Quality is important.
The second reason is that software engineering is a craft, just like painting.

And it takes time to get good at painting. It takes time to understand where to put the little iris. It takes time to get good at painting flowers. And so it’s important to understand that because it takes time to get to quality.
And finally, technical excellence is important because striving for mastery is part of what it means to be human.

We like it when people do things well. We appreciate people who are competent. We like it when systems are built to help us. It feels good for us to be part of systems that work.
And one tangential idea that I’ve seen going around the internet is that you shouldn’t talk down to LLMs, or call them “stupid”. Not because they understand (they’re just models), but because it’s bad for us as humans. In the same way, it’s important for us to strive for good things as humans.

Okay, so if we want to paint good flowers, how do we become good at painting good flowers? What made Rachel technically excellent? What might we be able to replicate?
The first one is that she was in an artistic family, one of the premier families of Amsterdam, where all her great-uncles were painters and her grandfather was involved in art. Her father was a great artist and botanist, Frederik Ruysch. You can see him here with this creepy skull; he collected a lot of really creepy dead things, which is where she learned about scientific accuracy, embalming, preserving flowers, and so on. It was here that she learned discipline: how to capture minute details. She was known for her technical skill, and for gathering flowers in botanic gardens and pressing them so she could have different compositions for different seasons, surprising the Dutch, who were generally orderly and used to seeing only seasonal bouquets. During her life, her paintings sold for three times what Rembrandt’s did.
And then, when she was 15, she was apprenticed to Willem van Aelst, who was one of the premier painters in Amsterdam at the time. It was tradition then for masters to take on apprentices who would live with them, start with things like washing paintbrushes, work all the way up to painting, and finally produce a work of art which, if the master accepted it, made the apprentice a master in their own right. So she had a very big support system, and she was steeped in a culture where mentorship was extremely important.

Second, she painted her whole life. She was keenly interested in botany and science, and made sure that her art reflected the true appearance of nature. She spent over 60 years honing and working on her career, not even slowing down while having ten children. For her, her career was a vocation. She would get inspiration from many different places, and she would do the same thing, again and again, but differently.

And finally, she herself remixed and experimented. So you can see in the background of this painting, there’s a cactus. And it’s not easy to get a cactus in Amsterdam in the summer or the winter. So she got this from the New World because she had connections to Amsterdam botanical societies and she knew people who were coming and going all the time. And so she could barter for and preserve exotic flowers. And she also remixed flowers from different seasons by drying them. She collected them all the time. She was always changing something about how she was working.

So, given all of this, how can we, as engineers, be like Rachel?

Building Rijksearch
So I thought about all this and decided to see how I could apply it to a concrete side project that I was working on. And the side project is called Rijksearch, a semantic search engine.

A semantic search engine searches by intent rather than keyword. So what you do is you type in “chill dude” and it’ll return, for example, this portrait of a man with a dancing dog, which I think is actually a really good example. Or “a confidential chat”, or “man smoking a pipe”, which is a really chill dude. So it’s a search engine that works by intent.
Or you type in “pensive” and it will return a philosopher in his study, which is definitely pensive, and a boy with a golf club, which is probably not (you don’t want toddlers to start thinking too hard about what they can do with a golf club). But, ok, these results are not bad!

Or “flower party”, which didn’t work that well because you can see it’s doing more keyword matching for “flower” than semantic matching for “party”, but this is where the craft of tuning semantic search comes into play.

There are already a lot of wonderful tools, like Art Explorer, that the Rijksmuseum itself has built on its public data and that you can build with, too.
But I decided to build my own because I wanted to remix stuff. Rachel learned how to remix as an apprentice and from the artists she talked to. I learned it from working at Tumblr, whose reblog feature was one of the first places on the social web where you could post something and somebody else could riff on top of it.

So what was I thinking about remixing? Oh, I had a lot of stuff going in my head. First, over the past couple years, I’ve been thinking about embeddings a lot. In fact, I’m thinking about embeddings right now during this talk.

In search and recommendations, embeddings, vector representations of text or images, matter a lot. They also matter within LLMs themselves, because embeddings are how related concepts end up near each other when we train the models. And so I wrote a really long text about this called What Are Embeddings?
Second, I’ve also been working with Redis for a while now. It’s one of my favorite pieces of software in the world because it just works: it’s extremely cleanly designed, really fast, extremely reliable, and very elegantly put together. I had worked with Redis for search before, and I wanted to see if I could use it again. Around a year back, Salvatore Sanfilippo (antirez), who initially wrote Redis, picked up working on it again and created a new data structure called vector sets. Vector sets in Redis work much like sorted sets, which are collections of unique, non-repeating strings sorted by a score, only in the case of vector sets the string elements are associated with a vector. For anyone working with embeddings, this unlocks a huge amount of very fast, efficient vector operations, and they allow for vector similarity using HNSW, a popular and efficient approximate nearest neighbor search algorithm.
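
To give a flavor of the commands, here’s a minimal sketch of talking to vector sets from Go through go-redis’s generic Do call. The command syntax follows the early vector sets release, so double-check the current Redis docs; the key name, vectors, and attributes here are toys.

```go
package main

import (
	"context"
	"fmt"

	"github.com/redis/go-redis/v9"
)

func main() {
	ctx := context.Background()
	rdb := redis.NewClient(&redis.Options{Addr: "localhost:6379"})

	// VADD adds an element with an associated vector, much like ZADD
	// adds a member with a score. (Toy 3-dimensional vectors here;
	// real embeddings have hundreds or thousands of dimensions.)
	if err := rdb.Do(ctx, "VADD", "paintings", "VALUES", "3",
		"0.1", "0.8", "0.2", "night-watch").Err(); err != nil {
		panic(err)
	}

	// VSETATTR attaches a JSON attribute blob to an element, since
	// vector sets are otherwise schema-free.
	rdb.Do(ctx, "VSETATTR", "paintings", "night-watch",
		`{"artist":"Rembrandt","year":1642}`)

	// VSIM returns the elements whose vectors are most similar to the
	// query vector, using HNSW under the hood.
	res, err := rdb.Do(ctx, "VSIM", "paintings", "VALUES", "3",
		"0.1", "0.7", "0.3", "COUNT", "5").Result()
	if err != nil {
		panic(err)
	}
	fmt.Println(res)
}
```
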
Third, I’ve also been hacking around atproto, the protocol that powers Bluesky, the social network that has become popular in the past couple of years. It turns out that a lot of the Bluesky codebase is open, and when I started looking around, it turned out that a lot of Bluesky services were written in Go. (And TypeScript, but I scoped myself.)
Fourth, I’ve worked in Python (and Java and Scala, unfortunately) before, and I was curious to explore a new language that was meant for high concurrency and is the spiritual successor to Java. I’d done a couple of projects with atproto where I created a feed of content in Go, so I wanted to see if I could do something else with Go on my own.
Of course, all the other stuff swirling around in my head was LLMs.

It’s impossible not to hear about them. I wish I could block the news out, because it does feel like it distracts me enormously in my day-to-day life. There’s a whole world of stuff with LLMs, but what I’m particularly interested in is the power of small, local models. So when llama.cpp came out and you could use it in places like LM Studio to serve GGUF artifacts, I became interested in that.
I also became interested in hosted embedding APIs, because before, I’d been working with models that you either train or fine-tune yourself and then have to serve. OpenAI made embeddings accessible as API endpoints, and Gemini started serving new multimodal embedding models, so I became interested in those.

And then finally, the humanities. It’s something I’m always interested in. A lot of my projects revolve around the humanities. A lot of the stuff that I write ties back in some way to the humanities because I don’t think you can really have a balanced view of what developing a good product looks like if you’re just looking at tech. And, I like art. So I’m always down to look at art.

Build the same thing again and again
So all of this stuff percolated for a year or so in my mind, and the result was that, at some point, I realized I wanted to re-implement Viberary, which is a semantic search engine.

A quick note on search engines: they come in several different forms. There’s keyword search, where if you type in “apple”, you get back results for apple varieties. And there’s semantic search, where you type in “Apple” and you get back iPhone, and maybe also Android, depending on what you’re looking for. This is why the “chill dude” query works.

And there’s many different ways to implement semantic search. The way that I had implemented it before, one of the ways, was in Viberary.

Viberary was a side project that I’d done in 2023, based on text-only representations of books from Goodreads and all their reviews. The idea was that you type in a phrase like “funny sci-fi” or “moody drama”, and you get back a list of books recommended based on vibe rather than the books themselves, which is something that I’d always wanted.
I’m not always in the mood for a particular title, but I’m sometimes in the mood for a particular type of book. And so you can see for this one, “funny sci-fi space with feeling”, one of the results you get back is Asimov, which, I guess, yes, kind of, and Haruki Murakami, which kind of also makes sense? So these vibes kind of fit.
What allows us to do this is that we’re not comparing just text or images. We project them into the embedding space by creating dense vectors out of them and comparing the vectors, the numerical representations.

That way, we can compare dog and bird and fly and see that bird and fly are closer to each other because they’re semantically similar. And this is basically the way the transformer attention mechanism works too, on a much, much, much larger scale.
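
As a toy illustration with made-up three-dimensional vectors (real embeddings have hundreds or thousands of dimensions), here’s the cosine similarity math that makes bird and fly come out closer together than dog and fly.

```go
package main

import (
	"fmt"
	"math"
)

// cosine returns the cosine similarity between two vectors: the dot
// product divided by the product of their magnitudes.
func cosine(a, b []float64) float64 {
	var dot, na, nb float64
	for i := range a {
		dot += a[i] * b[i]
		na += a[i] * a[i]
		nb += b[i] * b[i]
	}
	return dot / (math.Sqrt(na) * math.Sqrt(nb))
}

func main() {
	// Made-up "embeddings"; real ones come from a trained model.
	dog := []float64{0.9, 0.1, 0.3}
	bird := []float64{0.2, 0.9, 0.8}
	fly := []float64{0.1, 0.8, 0.9}

	fmt.Printf("bird vs fly: %.2f\n", cosine(bird, fly)) // high, ~0.99
	fmt.Printf("dog  vs fly: %.2f\n", cosine(dog, fly))  // lower, ~0.38
}
```
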
It used to be the case that people only trained or fine-tuned their own embeddings, but now all of the major AI providers have their own hosted embeddings. Particularly common ones are OpenAI’s, with Cohere and Nomic close behind. Google also recently released embeddings for Gemini, with both API and local versions available.
Two Tower Models
Building semantic search again involves implementing a two tower architecture, a really common paradigm in both search and recommendations. The two tower architecture and semantic search are for me what flowers were for Rachel.

There are generally three steps to creating a two-tower model: embed the items once, in batch; embed your query in realtime at inference; compare the distance between the two. But there are just so many decisions to make in this process that only become apparent once you start looking at the constraints of the data and the engineering space you’re working in.
So when I search for “chill dude”, it generates an embedding for that phrase, and then we do a similarity computation, usually using cosine similarity or dot product (there are other distance measures you can use), that says “these embeddings are most similar to the query embedding”, and then it surfaces those results, just as we do here.
This is a very simple structure, but it’s very elegant, because it allows for fast indexing and fast lookup on a large, coarse dataset. When we do any kind of information retrieval, whether it’s for recommendations or for search, our goal is to get from tens of millions of items (the entire internet, or a whole catalog of 800,000 objects like the Rijksmuseum has) down to the tens that you can look at in a carousel or a screen or a list of results.
Two towers is one approach that allows us to pre-compute the embeddings for our entire catalog and then filter them quickly using approximate nearest neighbor search, a fast algorithm that says this thing is closer to this group of things than to that other group of things.
That allows us to do a first pass from tens of millions down to tens of thousands or hundreds, and then we can use more expensive models to rank the smaller result sets and really think about how we present those last, smaller result sets to the user.
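
Here’s a minimal sketch of that second, more expensive stage; expensiveScore is a stand-in, where a real system might use a cross-encoder or a learned ranking model.

```go
package main

import (
	"fmt"
	"sort"
)

type candidate struct {
	ID    string
	Score float64 // starts as the cheap ANN similarity score
}

// rerank takes the small candidate set that survived the cheap
// approximate-nearest-neighbor pass and rescores it with a more
// expensive function, then sorts best-first.
func rerank(cands []candidate, expensiveScore func(id string) float64) []candidate {
	for i := range cands {
		cands[i].Score = expensiveScore(cands[i].ID)
	}
	sort.Slice(cands, func(i, j int) bool { return cands[i].Score > cands[j].Score })
	return cands
}

func main() {
	// Pretend ANN already cut 800,000 paintings down to these three.
	cands := []candidate{{"pyramid", 0.81}, {"boat", 0.80}, {"windmill", 0.79}}
	top := rerank(cands, func(id string) float64 {
		if id == "boat" {
			return 0.95 // the expensive model knows better
		}
		return 0.5
	})
	fmt.Println(top[0].ID) // boat
}
```
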

Many large companies then expand on this architecture to create search and recommendations data sources for different product surfaces.
Here are a couple of examples. The one on the left is YouTube’s recommender system, where they take millions of videos that they could recommend to you, narrow them down to a few hundred, and then rank the ones that are actually on display.

The input features for their custom-trained model include videos you’ve watched before, videos you’ve searched for, geography, and other metadata about you. The query is the video in question, and the model similarity lookup generates a list of candidates for that video based on all this input data. Facebook Search also uses embeddings.
The query side contains the search text plus the query context, like geography or device. The document side includes names, titles, and graph-derived signals, like relationships to other people and geographic relationships, which surfaces the results that you’re looking for in the search bar.
So, there are many versions of these systems. How does this look in practice for Rijksearch in particular? We have our input query, which comes from the front end as “happy flowers”. The embeddings are from the Gemini API. And we also already have all our document metadata that we’ve embedded beforehand. The indexing app is in Go, writing into Redis vector sets, and the serving web app is written in pure Go. Redis runs as a Docker container that connects to the web app, with everything orchestrated in docker-compose.
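
As a rough sketch of what that serving path can look like (not Rijksearch’s actual code: embedQuery is a hypothetical stub standing in for the Gemini embeddings call, and the route and key names are made up):

```go
package main

import (
	"context"
	"encoding/json"
	"log"
	"net/http"

	"github.com/redis/go-redis/v9"
)

var rdb = redis.NewClient(&redis.Options{Addr: "redis:6379"})

// embedQuery is a stand-in for the real call to a hosted embeddings
// API (in Rijksearch's case, Gemini). It would POST the query text and
// get back a vector, formatted here as strings for the VSIM command.
func embedQuery(ctx context.Context, q string) ([]string, error) {
	return []string{"0.1", "0.7", "0.3"}, nil // placeholder vector
}

// searchHandler embeds the incoming query and asks the vector set for
// the nearest paintings.
func searchHandler(w http.ResponseWriter, r *http.Request) {
	ctx := r.Context()
	vec, err := embedQuery(ctx, r.URL.Query().Get("q"))
	if err != nil {
		http.Error(w, err.Error(), http.StatusBadGateway)
		return
	}
	args := []interface{}{"VSIM", "paintings", "VALUES", len(vec)}
	for _, v := range vec {
		args = append(args, v)
	}
	args = append(args, "COUNT", 10)
	res, err := rdb.Do(ctx, args...).Result()
	if err != nil {
		http.Error(w, err.Error(), http.StatusInternalServerError)
		return
	}
	json.NewEncoder(w).Encode(res)
}

func main() {
	http.HandleFunc("/search", searchHandler)
	log.Fatal(http.ListenAndServe(":8080", nil))
}
```
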
It’s not a particularly original idea: there are now a lot of semantic image/vibe search engines out there today. But it’s the one that I personally wanted to do because I’d developed a connection to the Rijksmuseum, and because the amount of data they generously make public made it lovely to work with.

Getting Data for Our Semantic Search System
Something that you’ll notice is missing here if you’ve worked in data before is the actual source data. Where do we actually get this image data? How do we collect it? And that’s its own hard problem that we’ll talk about in a minute.
But okay, so how do we build this, given that we’ve already built Viberary and other versions of this system before? The way that I thought about it is that every company gets about three innovation tokens. The idea comes from an amazing and often-cited blog post by Dan McKinley.

You can spend those however you want, but then the supply is fixed. That’s also true for any project, whether it’s at work or a side project. For a specific project, a way to think about innovation tokens is that you can try new things in several dimensions:
- The domain area
- The language stack
- The app architecture
- How/where you deploy it
If you take on three new things that you can’t do or don’t know how to do, you’re just going to get held up shipping your product. So where should we spend our innovation tokens?

We should spend them on what helps us grow, because humans want to reach mastery. So we should spend them on what helps us reach mastery.

Taking a look at the same steps for Viberary and Rijksearch, you can see what I implemented differently. So there’s a couple steps. First, we have to collect the data. Then we have to think about what embedding model we want to use and how we want to embed it. Then we need to index our embeddings. This is an important step. And then we need to serve our embeddings and create a web server so we can actually stand up the front end and return results.

So the first time around, I created Viberary using text-only data from Goodreads, a locally-hosted embedding model artifact, indexing via FT.SEARCH in Redis, and Python’s Flask to serve the app.
So I lied earlier about using 3 innovation tokens: I cheated a bit and used four here.
For Rijksearch data, I used multimodal text and image data because I was interested in creating text and image embeddings. This made processing the data trickier because I needed to associate a given set of text metadata to a link of an image, as well as process the images in an efficient way so they’d download quickly and non-disruptively.
For the actual embedding inference, I changed from a hosted model to an API, which was easier in some ways, but now required cost and latency considerations. Additionally, text and image processing together was trickier, because you have to think about text and image aggregation together.
For indexing, I switched to vector sets, since vector sets don’t require the construction of a search index and are a native data structure. This makes things easier in that you don’t need to update fields, but vector sets also don’t support metadata updates in the same way that search indices do, because vector sets are schema-free (though you can add JSON through VSETATTR), so there is a tradeoff.
For serving, I changed from Flask to Go, which was a pretty big change because I had to relearn web development in a new language. Go has a lot of tooling for web development out of the box, as well as support for concurrency, and for processing web data. This was probably the innovation token that took me the longest time to overcome.
And, interestingly throughout this process, I also tracked the steps where AI helped. The yellow parts are where I included AI or I had AI assistance.
I consider myself a fairly normcore AI user. I’ll usually wait a couple of months after something comes on the market to try it. If something has been on the market for six months and has lasted, it probably makes sense to keep using it. And so in 2023, what was really on the market was Midjourney for generating some of my logos, ChatGPT for answering some questions, and then some of the open models. Mistral was making good open models at the time, and I was using them locally via llama-cpp-python, though getting limited results beyond summarization and tagging.
When I did Rijksearch, I found several new use cases for AI. For example, I used Claude, as well as GPT-OSS 20B locally to help me figure out some of the paradigms for data collection from the Rijksmuseum, as well as write some quick data analysis scripts. I used Gemini APIs for embeddings and Perplexity to help me decide which embeddings to use. And I used Claude Code to generate some of the front-end code, and Qwen Coder and GPT-OSS and Perplexity to help me learn about Go.
Data Sense

For Rijksearch, data collection was also a lot of work, even more than scaffolding the app. But as ML and data professionals, what helps us here is the same thing that helped Rachel. She had help and mentorship from her family. As data practitioners, what we have from doing this for a while, and being surrounded by data-intensive environments, is data sense. We have been in the trenches of collecting data for a long time. We understand the pain points. We understand NaNs and what loopholes to look for in non-deterministic data flows, such as the ones generated by LLMs. We understand the pleasures and pain of “looking at the data.”

And this is something that Rob Pike said: if you’ve chosen the right data structures and organized things well, the algorithms will almost always be self-evident. Data structures, not algorithms, are central to programming. This was really important for me, because good similarity matching between images and text requires good data for both.

Luckily, the Rijksmuseum has a lot of data, and it’s all open and accessible to the public. It’s an amazing resource.

They have metadata for 800,000 objects, plus high-resolution photographs. But something that was new to me in working with this data was OAI-PMH, the format they offer their artifacts in. It’s available through an API, but OAI-PMH is an XML-based protocol over HTTP, very different from the JSON-based REST APIs I’ve generally worked with.
And LLMs helped me quite a bit here, because I needed to understand, for example, which sets and records I needed from verbs like ListSets and ListRecords. And since this was open data, it was easy to traverse the API results and get an understanding of what lives in that dataset.

And it turns out there’s a lot of data to figure out.

First, I pulled all the paintings we needed from the record files. The data’s stored across multiple collections, so I chose to use the “Dutch Paintings of the Seventeenth Century in the Rijksmuseum” set, because that collection had the most data. From there, I extracted each entry, making sure to keep the full metadata (title, artist, date, medium, etc.) instead of just a short snippet. This gave me a clean, complete list to work with for the rest of the project.

How do we actually collect that data into our app? Here’s where Go really helps. Go structs are typed collections of fields, much like Pydantic models, and they really allow you to do this well. If you have an XML snippet, you can either run it through an LLM or paste it into a site like XML to Go to get the struct definitions, no codegen tooling needed. And, of course, LLMs do really well with Go because it is so strongly typed.
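
For example, here’s roughly how that goes for a simplified, hypothetical slice of a record; the real OAI-PMH envelope has more nesting, namespaces, and a resumptionToken for paging, which I’m skipping here.

```go
package main

import (
	"encoding/xml"
	"fmt"
)

// Record is a simplified, hypothetical slice of an OAI-PMH Dublin
// Core record; the real envelope nests this under OAI-PMH >
// ListRecords > record > metadata.
type Record struct {
	Title   string   `xml:"title"`
	Creator string   `xml:"creator"`
	Date    string   `xml:"date"`
	Formats []string `xml:"format"`
}

func main() {
	snippet := []byte(`
	  <record>
	    <title>Still Life with Flowers</title>
	    <creator>Rachel Ruysch</creator>
	    <date>1716</date>
	    <format>oil on canvas</format>
	  </record>`)

	var rec Record
	if err := xml.Unmarshal(snippet, &rec); err != nil {
		panic(err)
	}
	fmt.Printf("%+v\n", rec)
}
```
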

It really takes a while to wrangle all the data into the correct format, from collecting it as XML to writing it into Redis.
Then the fun starts, because you collect these records and you start to see that not every piece of data is labeled correctly. This is true of any given dataset, but particularly true here, because this data has been collected over a very long time, and who amongst us has not had a title in the wrong language, or duplicate data, or anything else in any of our datasets? This is where data sense, and really, really looking at and being hands-on with the data, helps.
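
As a sketch of the kind of boring, essential cleaning pass I mean, with hypothetical fields: drop records with missing titles and deduplicate by object ID.

```go
package main

import "fmt"

// Painting holds hypothetical fields; real Rijksmuseum records have
// many more, in several languages.
type Painting struct {
	ID, Title string
}

// cleanPaintings drops records with missing titles and deduplicates
// by object ID, keeping the first copy seen.
func cleanPaintings(recs []Painting) []Painting {
	seen := make(map[string]bool)
	var out []Painting
	for _, r := range recs {
		if r.Title == "" || seen[r.ID] {
			continue
		}
		seen[r.ID] = true
		out = append(out, r)
	}
	return out
}

func main() {
	recs := []Painting{
		{ID: "SK-C-5", Title: "The Night Watch"},
		{ID: "SK-C-5", Title: "The Night Watch"}, // duplicate
		{ID: "SK-A-1", Title: ""},                // missing title
	}
	fmt.Println(len(cleanPaintings(recs))) // 1
}
```
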

Picking Embedding Models
The second part of data is picking an embedding model. For Viberary, I used MiniLM, just because it was small, fast, and standard. Now we have larger models that are hosted as APIs. Generally, depending on your use case, it may make sense to use a domain-specific model, or to fine-tune. But for a side project, a general model was just fine.

There are a lot of other factors we care about when selecting an embedding model, such as inference costs, storage costs, and the overhead of setting up a new account. For me, multimodality was also important (whether the model was trained on both images and text) because that becomes relevant for retrieval.
To help narrow down my options, I used Perplexity to generate a comparison grid and verify the information. One thing I like about Perplexity is that it cites its sources, so you can click through and confirm that the model didn’t hallucinate anything.
And so I came to three choices based on price and storage costs. What was interesting is that what really unblocked me was that I already had a Google account with access to Google AI, so I could set up an account key more easily than I could for Cohere or Nomic, although I’ve heard good things about both.
And then you have to pick which vector index you’d like to use. The index is the data structure that helps you quickly find related items.

Vector sets already implement HNSW by default. That makes the application really fast at retrieval time, which is what I care about. I didn’t care as much about how long batch indexing took, because in the end I only had something like 2,000 paintings to index at a time. If I had a larger collection, the choice between HNSW and IVF, which is great for datasets with large memory constraints, would have become less clear.

So then you embed everything, and then you start to get your results. And of course, because this is a data-intensive project, you get this thing where you query for “boat”, and you get the same thing for every single result, which doesn’t look like a boat at all. But there’s a pyramid, and you’re like, okay, why is this happening?
So then you have to go through and debug step by step, and it turns out the issue is not in the data structure, and not in the ingest, and not in the Gemini embeddings, and not in the indexing, and not in the web server, but in the fact that you picked an art category that only has 100 artifacts and your query is similar to all of them. And that’s all part of the process of looking at your data, too.
Where AI helped and where it didn’t
Rachel painted her cactus. This was her remixing and tastefully adding small, exotic elements to the larger whole. For me, this was AI and really evaluating where it might help or hurt my process to create good things and get better at what I do. So, where did I end up?

I used AI to build the front end, which turned out nice. It wasn’t anything out of the ordinary, and I’m not sure how solid the actual code is, or whether it’s production-ready, but it was nice to get going, since I’m not a front-end person at all and we always want to get to a good demo. Last time, I used a Bootstrap template that I tweaked slightly, which I guess is not that much different.
LLMs also helped me with Go, but they didn’t help me by writing all of the code. How I’ve been learning Go is step by step: first, by reading dead tree books, then by trying out Go by Example, and finally, by asking about different Go concepts from the LLMs, or working with a very small, method-level snippet at a time.
Because what happened when I had the model write Go directly was that, due to cognitive offloading, I found I couldn’t understand what “I’d” written at all, and when I got up from the computer, I couldn’t replicate it. But it helped me answer questions like how structs work, what you should put in a struct, what a channel does. The kind of stuff that you would have to constantly bug a senior person about, because Go is a new language for me, the LLM will just answer. And particularly in cases where LLM systems have RAG results, they will lead you to the source documentation, because Golang has really, really excellent documentation, and that’s really what we want out of learning by reinforcement.
LLMs also helped with data formatting and exploration: writing all of the matplotlib methods, the pandas methods, all that stuff where you need to constantly look things up or pull data from an API. The models, any of them really, are good with that kind of glue work.

So where did it not help? Largely with architecture changes and decisions. Understanding that I needed a two-tower model, what two-tower models were, how they’d been used previously in search and recommendations, understanding my latency constraints, knowing that serving this app has to be fast and that the indexing job’s speed didn’t matter as much: all of that came from my head and past experience rather than AI.
Picking vector sets as a storage and retrieval primitive was similar. I only learned they existed because I pay close attention to what’s happening in the vector space. (haha) You could probably build a bot to synthesize that kind of knowledge for you, but it was much easier to verify for myself.
More complex concepts like Go channels, which I haven’t gotten to yet, were also a weak spot. I couldn’t verify whether what it gave me in Go was actually correct, and I had a lot of non-idiomatic code until I got an actual, experienced Go developer to take a look and review “my” code. And of course, AI didn’t help in writing this talk. It took a very long time, and none of it was written with AI. It all came from ideas that had been slowly combining in my head. Which brings me to the final point: it takes time to build good things.
It takes time to build good things

It took Rachel 60 years to get good at painting.

You can see one of her first paintings on the left, and then one of her later, much, much later paintings on the right. Take a look at how much more complex the composition is, how the flowers are behind each other, how much more delicate the background is, the movement of the flowers in the space. It takes so much time to get good at that stuff. And we have to remind ourselves in the world of speed that that’s true for programming too.

Peter Norvig wrote this post a long time ago about teaching yourself programming in ten years. There is no shortcut for deliberate practice, which is what we’re trying to do here when we build ourselves flowers.

For me, I found it takes time because I was going through my tweet archive to see what I could find for this talk. In 2013, I wrote about how I was learning Python and counting words and just really struggling through it: it’s not very optimal, I don’t really understand what’s going on, what even is argv, etc. And then ten years later, I was able to do Viberary, just through deliberate practice. And three years after that, I did it again, in Go, with multimodal embeddings.

And the other thing is, I didn’t work on my craft alone. Another thing I’ve written about is writing with your alphabet radio on, a concept from Bird by Bird by Anne Lamott, which is a book on how to write, but also on how to be creative.
Anne Lamott talks about this alphabet radio that plays in your head, of all the past people who have either criticized or praised you, and how you have to selectively tune some of them out when you work. For me, it’s all the people who have reviewed my code and that I’ve worked with; the ones I respect are always echoing in my head. That kind of feedback comes from good mentorship. That’s the apprenticeship that Rachel went through as well, right? So it takes time to get good at that.

It also takes time to teach others. This is another small example from a side project I did before, called Gandinsky, where I tried to train a generative adversarial network to paint pictures in the style of Vassily Kandinsky. At the time, this didn’t work well. This was like eight years ago, and it didn’t work because you need thousands of samples to train a GAN, and Kandinsky, unfortunately, in all his wisdom, only painted around 200 paintings.
Now, you can just get Nano Banana to do style replication for almost anything for free in an API call because we’ve been working on these models for such a long time. But it’s taken years and years for the models to get this good.
And finally, it takes time to own your own vision, as in this quote from antirez and his recent thoughts on agentic programming.

Programming as an automatic vision is not here yet. That vision only comes from doing the work over and over and over again. So building a semantic search engine really took me about a month once I buckled down and did it (though I’m still not entirely done). But it also took me ten years to be able to do this. And I think that’s important to keep in mind.

So, okay, so now we’re at the end of the talk, and the question is, does this still matter? Yes, I think so. I think data sense matters, which we all inherit from having worked in machine learning. I think mastery matters. I think craft still matters.

So let’s try to build ourselves flowers. Thank you.

#machine learning #art #engineering #search #go #ai #software craft #recommendations