How I search in 2024
We are now in a very weird liminal space in information retrieval for consumers, particularly those attuned to trends in search and working on the bleeding edge of LLMs.
On the one hand, we have the fall of old companies. Broadcast-based centralized social media, which steadily served as a newsfeed and realtime search for a small, vocal minority, is basically dead, or on its last legs. Search, namely Google, is basically a useless pile of ads and SEO gaming at this point, and mostly a stopover on the way to Reddit results. Everyone has written about this extensively.
On the other hand, we have the meteoric rise of these weird LLM things, which are really just statistical text generators, but which people are using for information retrieval through paradigms like RAG, which is a little like using a hammer to go fishing.
On the third hand (a very apt metaphor for generative art), on the heels of the large companies of the last 15 years declining, we have a new indie search engine scene emerging, hungry, armed with AI tooling, and ready to take back quality on the web.
I’ve noticed myself using more and more of these new tools, both for work and out of a personal interest in information retrieval and finding good stuff online, so I thought I’d share what I’m doing lately with search:
Kagi: I’ve been using DuckDuckGo as my default search engine for at least the last 8 years, and I want to say 10. I loved their initial mission, but the results have never been great, and I’d always have to append !g, which meant that I was doing a Google search anyway.
I’ve been hearing more and more about Kagi and finally decided to try it out in January. The results were so good that I immediately bought a subscription and switched the default search engine in my desktop and mobile browsers to Kagi. They’ve had some controversy lately, among other things for their product sprawl, but the search results are so very good and the business model is so pro-user that I can’t see myself going back to Google unless something very bad happens (please stick around, Kagi.)
I also really love their summarization tool, and Small Web. I’ve been browsing Small Web serendipitously non-stop for the past couple days and have been super surprised and refreshed to find actual people writing actual content that’s not bots, commercials, SEO spam, or aggrandizing. One time, I got a post about a guy sailing his boat down the coast of Mexico, and another about the history of Unix packaging. It’s pleasant and relaxing.
The only thing Kagi is NOT good for yet is local results: finding out when restaurants or parks are open, and directions. I’m fine with that, because localization means personalization, which I don’t want from this new crop of search engines…yet, or at least not in the same way that I previously wanted personalization in my products.
Marginalia - I love everything about Marginalia, which is a search engine with its own index, run by a single dude, initially on a server in his apartment. Marginalia surfaces only non-commercial content, and as the search engine says, “This search engine isn’t particularly well equipped to answering queries posed like questions, instead try to imagine some text that might appear in the website you are looking for, and search for that.” The idea is that you find not exactly what you’re looking for, but something related on pages that have long been removed from relevance due to the merciless PageRank.
One example of something I’ve searched for is “transformers architecture”, not because I was looking to understand transformers, but because I wanted to see what sources other than Medium posts or the usual mainstream outlets had to say about it. I think of Marginalia as a way to find content that is outside of the mainstream, but genuine and deep. It’s like taking a cool dip in the true pool of internet knowledge.
Viktor, the developer behind Marginalia, also has amazing technical write-ups of what it’s like to run your own search engine and optimize it for scale, and all the code is open.
Perplexity - This one has been making headlines, and I’ve been reluctant to lean too heavily on any new LLM hype cycle products just yet, but Perplexity has been surprisingly good for the use case where I want to look something technical up and also want the citations and the links, because I don’t trust LLMs to just generate the answer for me. An example is, “how does a solar eclipse work?” Another thing I use it for is summarizing PDFs of arXiv papers.
The TL;DR is that we are now, to some extent, in the unbundling of search engines and information retrieval into task-specific tools, and I’m excited to see what else emerges in the aftermath of the era of Big Search.