In the last 12 months there has been a proliferation of vector DB startups. I’m not here to debate the specific design tradeoffs of any of them. Instead, I want to push back on several common approaches to what a vector database is, what it’s for, and how you should use one to solve problems.
Vector databases aren’t memory
Many vector databases frame their basic utility as solving the problem of language models lacking long-term memory, or the fact that you can’t place all of the context for a question into your prompt.
However, vector search is ultimately just a particular kind of search. Giving your LLM access to a database it can write to and search across is very useful, but it’s ultimately best conceptualized as giving an agent access to a search engine, versus actually “having more memory”.
Imagine you’re a company that wants to build an LLM-powered documentation experience. If you think of a vector database as just providing an expanded memory to your language model, you might just embed all of your company’s product docs, and then let users ask your bot questions. When a user hits enter, you do a vector search for their query, take the closest-matching chunks, load them into context, and then have your language model try to answer the question. In fact, that’s the approach we initially took at Stripe when I worked on their AI docs product.
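To make that pipeline concrete, here is a minimal sketch of the "embed, search, stuff into the prompt" loop. It is not the Stripe implementation: `embed` is a toy bag-of-words stand-in for a real embedding model, and the function names and sample docs are made up for illustration.

```python
import math
from collections import Counter


def embed(text: str) -> Counter:
    # Stand-in for a real embedding model: a bag-of-words count vector.
    return Counter(text.lower().split())


def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[token] * b[token] for token in a)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve(query: str, chunks: list[str], k: int = 2) -> list[str]:
    # Always returns k chunks, whether or not they are actually relevant.
    q = embed(query)
    return sorted(chunks, key=lambda c: cosine(q, embed(c)), reverse=True)[:k]


def build_prompt(query: str, chunks: list[str], k: int = 2) -> str:
    context = "\n".join(f"- {c}" for c in retrieve(query, chunks, k))
    return f"Answer using only this context:\n{context}\n\nQuestion: {query}"


docs = [
    "Rotate api keys from the dashboard settings page.",
    "Webhooks notify your server about events.",
    "Api keys authenticate requests; rotate keys every 90 days.",
]
query = "how do I rotate api keys"
prompt = build_prompt(query, docs)
```

Note that `retrieve` hands back `k` chunks no matter what: ask for three and the unrelated webhooks chunk goes into the prompt too, with nothing to tell the model it is noise.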
Ultimately though, I found that approach to be a dead-end. The crux is that while vector search is better along some axes than traditional search, it's not magic. Just like regular search, you'll end up with irrelevant or missing documents in your results. Language models, just like humans, can only work with what they have and those irrelevant documents will likely mislead them.
If you want to make a good RAG tool that uses your documentation, you should start by making a search engine over those documents that would be good enough for a human to use themselves. This is likely something your organization has considered before, and if it doesn’t exist it’s because building a good search engine has traditionally been a significant undertaking.
The good news
You’ve sat down and decided to build good search. How do you actually do it? It turns out that in this case LLMs can actually save the day.
Embeddings, for all that they aren’t a magic wand, are still pretty amazing. High-quality embedding search will have a lower false negative rate than keyword search, and combining the two results in much better performance than any pure full-text search (Google has been doing this for years with BERT). What has changed is that both embeddings themselves and the tools needed to use them in large-scale search have improved by leaps and bounds. There are plenty of battle-tested databases that let you combine keyword and vector search, and I highly recommend using one of these (at Elicit we use Vespa, but vector databases like Chroma now often support this as well).
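The post doesn't prescribe how to blend the two result lists, and engines like Vespa have their own ranking machinery. As one illustration, here is a sketch of reciprocal rank fusion, a common and simple way to merge a keyword ranking with a vector ranking. The document IDs are invented, and `k=60` is just the conventional default for this technique, not a tuned value.

```python
def reciprocal_rank_fusion(rankings: list[list[str]], k: int = 60) -> list[str]:
    """Merge several ranked lists of document IDs into one ranking.

    Each document scores 1 / (k + rank) per list it appears in, so items
    that rank well in both keyword and vector search rise to the top.
    """
    scores: dict[str, float] = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    # Highest score first; ties broken by ID so the output is deterministic.
    return sorted(scores, key=lambda doc_id: (-scores[doc_id], doc_id))


keyword_hits = ["doc_a", "doc_b", "doc_c"]  # e.g. from a BM25 query
vector_hits = ["doc_b", "doc_d", "doc_a"]   # e.g. from nearest-neighbor search
fused = reciprocal_rank_fusion([keyword_hits, vector_hits])
print(fused)  # ['doc_b', 'doc_a', 'doc_d', 'doc_c']
```

Rank-based fusion sidesteps the problem that keyword scores and cosine similarities live on different scales, which is why it makes a reasonable first baseline before anything fancier.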
Once you’ve improved your overall search by blending embeddings with more traditional methods, you get to the fun stuff. A savvy human trying to find information via a search engine knows how to structure their query in order to ensure they find relevant information (Google-fu used to be a powerful art form), and language models can do the same. If your model wants to find “what’s the latest news on malaria vaccines,” you could have a language model construct a query that includes a date filter. There is a ton of low-hanging fruit here, and after that an almost endless amount of tweaking that can be done to produce incredibly high-quality search. Like in many other cases, similar things were possible in the world before LLMs, but they took a lot of specialized skill and effort. Now you can get competitive performance with a few hours of your time and some compute.
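Here is a rough sketch of what that query construction step could look like. Everything named here is an assumption for illustration: `llm` is any callable that takes a prompt string and returns the model's text, `SearchRequest` is a made-up structure, and `fake_llm` stands in for a real model so the example runs on its own.

```python
import json
from dataclasses import dataclass
from datetime import date
from typing import Callable, Optional


@dataclass
class SearchRequest:
    text: str
    published_after: Optional[date] = None


PROMPT = """Today is {today}. Turn the question into a search request.
Reply with JSON only: {{"text": "<keywords>", "published_after": "<YYYY-MM-DD or null>"}}
Question: {question}"""


def build_search_request(
    question: str, llm: Callable[[str], str], today: date
) -> SearchRequest:
    raw = llm(PROMPT.format(today=today.isoformat(), question=question))
    try:
        data = json.loads(raw)
        after = data.get("published_after")
        return SearchRequest(
            text=str(data["text"]),
            published_after=date.fromisoformat(after) if after else None,
        )
    except (ValueError, KeyError, TypeError, AttributeError):
        # Model output was unusable: fall back to searching the raw question.
        return SearchRequest(text=question)


def fake_llm(prompt: str) -> str:
    # Hypothetical model reply, hard-coded for the demo.
    return '{"text": "malaria vaccine news", "published_after": "2024-01-01"}'


request = build_search_request(
    "what's the latest news on malaria vaccines", fake_llm, date(2024, 6, 1)
)
```

The fallback matters in practice: a malformed model reply should degrade to an ordinary search rather than an error.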
The final stage in the traditional search pipeline is re-ranking. It used to be the case that to do re-ranking you would train a relevancy model on signals like which items a user clicks on for a given search results page, and then use that model to sort your top results. If you’re not a whole team structured around building a search engine, this isn’t a viable problem to tackle. Now with language models, you can provide some details on a query:result pair to a model and get a relevancy score that will beat out all but the best purpose-built systems.
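As a sketch of that last step, the function below re-sorts a page of results using a per-pair relevance score. The `score` callable is where a language model call would go. Here it is replaced by a toy word-overlap function (an assumption purely so the example runs), and the sample results are invented.

```python
from typing import Callable


def rerank(
    query: str,
    results: list[str],
    score: Callable[[str, str], float],
    top_n: int = 3,
) -> list[str]:
    # Score each query:result pair independently, then sort best-first.
    scored = [(score(query, result), result) for result in results]
    scored.sort(key=lambda pair: pair[0], reverse=True)  # stable for ties
    return [result for _, result in scored[:top_n]]


def toy_score(query: str, result: str) -> float:
    # Placeholder for an LLM relevance judgment: fraction of query words found.
    query_words = set(query.lower().split())
    result_words = set(result.lower().split())
    return len(query_words & result_words) / len(query_words) if query_words else 0.0


results = [
    "Stock market results today",
    "Malaria vaccine trial shows strong results",
    "Vaccine storage guidelines",
]
reranked = rerank("malaria vaccine trial results", results, toy_score, top_n=2)
```

Because each pair is scored independently, the model calls can run in parallel, and you only need to re-rank the top handful of candidates that the earlier stages return.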
Ultimately, recent advancements in AI make it much easier to build cutting-edge search, using orders of magnitude less effort than once required. Because of that, the return on sitting down and seriously building good search is extremely high.
If you want to build a RAG-based tool, first build search.
Postscript (The bad news)
You’ve built a nice search engine using the above techniques, and now it’s time to deploy it. Unfortunately, language models don’t let you avoid the other half of building a search engine: evaluating it.
Specifically, this means being able to answer questions like:
- “When is doing a search appropriate?”
- “When you do a search, what content are you actually trying to locate?”
- “How high does that content rank in your results?”
Answering any of those questions requires building evaluation and monitoring infrastructure that you can use to iterate on your search pipeline and know whether the changes you make are improvements. For a follow-up on evaluating search engines, I recommend this excellent series of posts.
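As a starting point, the second and third questions map onto two standard retrieval metrics: recall@k (did the content you wanted show up at all?) and mean reciprocal rank (how high did it rank?). The sketch below assumes you have a small hand-labeled set of queries with known relevant document IDs. The `search` callable, the IDs, and the canned results are all hypothetical.

```python
from typing import Callable


def reciprocal_rank(ranked: list[str], relevant: set[str]) -> float:
    # 1 / position of the first relevant result, or 0 if none was returned.
    for rank, doc_id in enumerate(ranked, start=1):
        if doc_id in relevant:
            return 1.0 / rank
    return 0.0


def recall_at_k(ranked: list[str], relevant: set[str], k: int) -> float:
    # Fraction of the relevant documents that appear in the top k results.
    if not relevant:
        return 0.0
    return len(set(ranked[:k]) & relevant) / len(relevant)


def evaluate(
    cases: list[tuple[str, set[str]]],
    search: Callable[[str], list[str]],
    k: int = 5,
) -> dict[str, float]:
    rr = [reciprocal_rank(search(q), rel) for q, rel in cases]
    rec = [recall_at_k(search(q), rel, k) for q, rel in cases]
    n = len(cases) or 1
    return {"mrr": sum(rr) / n, "recall_at_k": sum(rec) / n}


# Hand-labeled test cases and a canned "search engine" for the demo.
cases = [("rotate keys", {"d1"}), ("webhook retries", {"d4", "d5"})]
canned = {"rotate keys": ["d2", "d1", "d3"], "webhook retries": ["d4", "d9", "d8"]}
metrics = evaluate(cases, lambda q: canned[q], k=3)
print(metrics)  # {'mrr': 0.75, 'recall_at_k': 0.75}
```

Re-running a harness like this after every pipeline change is what tells you whether a tweak was actually an improvement.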