
What we don't talk about when we talk about building AI apps

Every day I open my LinkedIn and Twitter (and Mastodon and Bluesky and Threads….) and am inundated with the same messages: LLMs are sent to us from above, they make everyone’s life easier, we are quantizing and pruning, going faster, getting smaller, they will change education, they will write our poetry, they will outlive us all and overthrow humanity and build a happy fruitful LLM robot society, generating art and text, a society where humans exist solely to bring them cyberdrinks with small digital umbrellas.

I am currently building a semantic search application that doesn’t use OpenAI but uses the same architectural patterns as most vector search applications: a query tower that encodes the query text into an embedding with BERT sentence transformers and runs a KNN lookup against a collection of pre-encoded document vectors, returning the top semantically similar results by cosine similarity.
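For concreteness, here’s roughly what that pattern looks like - a minimal sketch using the sentence-transformers library, with a placeholder model name and a tiny hypothetical in-memory corpus standing in for the real store of pre-encoded vectors:

```python
# A minimal sketch of the pattern, not the actual app: encode documents once,
# encode the query at request time, rank by cosine similarity.
from sentence_transformers import SentenceTransformer, util

# Placeholder model name; any BERT-style sentence transformer works the same way.
model = SentenceTransformer("all-MiniLM-L6-v2")

# Hypothetical stand-in for the pre-encoded document vectors.
corpus = [
    "a moody literary novel set in the Arctic",
    "a fast-paced legal thriller",
    "a cozy mystery with a lot of baking",
]
corpus_embeddings = model.encode(corpus, convert_to_tensor=True)

def search(query: str, k: int = 2):
    # Query tower: project the incoming text into the same embedding space.
    query_embedding = model.encode(query, convert_to_tensor=True)
    # KNN lookup: top-k nearest corpus vectors by cosine similarity.
    hits = util.semantic_search(query_embedding, corpus_embeddings, top_k=k)[0]
    return [(corpus[hit["corpus_id"]], hit["score"]) for hit in hits]

print(search("atmospheric books about cold places"))
```

In the real application the corpus vectors are pre-encoded and live in a vector store rather than in memory, but the query-side flow is the same: encode, compare, return the top k by cosine similarity.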

From a technical perspective, here’s what we don’t discuss when we talk about deep learning applications. Or, more specifically, here are pain points I’ve come across that I don’t see being discussed much - if you are developing deep learning apps and don’t have any of these issues, let me know which god you offered a sacrifice to in the woods under the pale light of the waning crescent moon:

People spend an inordinate amount of time engineering Docker images for deep learning apps so that they work correctly in the cloud and in production environments.

Build times are one of the top productivity killers in 2023, and deep learning is at the forefront of the problem.

Which leads me to more Docker issues.

If I do somehow get it to work, I’ll now need to maintain two Docker images: one for local development and one for my staging server on DigitalOcean, which means more GitHub Actions workflows. And you can’t test GitHub Actions locally, anyway. Maybe kind of? But not entirely.

I don’t think I’m alone here. My project is just a small hobby project, but the normcore problems of neural nets and LLMs in production are a whole genre of issues that aren’t getting coverage proportional to what people are actually experiencing. Just take a look at the logbook the researchers kept while training OPT-175B at Meta, which reads like a thriller.

This also doesn’t include any of the traditional, normcore concerns of machine learning in production: concept drift, SLAs for large services, working with Kubernetes, the guarantees of distributed systems, and YAML config file drift. Finally, we haven’t talked at all about the UI users actually navigate - the search bars and text boxes - which I’ve spent just as much time thinking about as the model itself.

I am not blaming anyone who develops in the ecosystem for these problems. Everyone is working with a very deep and cloudy stack and doing the best they can, particularly in the pressure cooker environment of open-source ML tooling that has only accelerated since the release of ChatGPT.

But I think it’s important to talk about these issues publicly - they are not sexy and they are not LinkedIn-worthy, but these are the problems that make up our days and our lives on the bleeding edge.