You Don't Need a Vector Database

Every AI startup's tech stack includes Pinecone, Weaviate, or Qdrant. Vector databases are the new must-have infrastructure. Except that for most applications, they're overkill. A simple in-memory array and cosine similarity will handle 90% of semantic search use cases.

Vector databases are powerful, but they're also complex, expensive, and often unnecessary. Before you add one to your stack, ask: do I actually need this?

What Vector Databases Actually Do

A vector database stores embeddings and provides fast similarity search. Instead of comparing your query vector to every document vector sequentially, it uses indexing structures (HNSW, IVF, etc.) to find approximate nearest neighbors quickly.

This matters at scale. Comparing a query to 10 million vectors sequentially takes seconds. Using an indexed vector database takes milliseconds. The speedup is real and necessary — if you have millions of vectors.

But if you have 10,000 vectors? Or 100,000? Sequential comparison is fast enough. Modern CPUs can compute millions of cosine similarities per second. You don't need specialized infrastructure until you're well past 100K vectors.

The In-Memory Alternative

Store your embeddings in a simple array. When a query comes in, compute cosine similarity between the query embedding and every stored embedding. Sort by similarity. Return the top N results.

This is O(n) complexity, which sounds bad. But for n < 100,000, it's fast enough. The entire operation takes under 100ms on a modern CPU. And the code is trivial — no external dependencies, no infrastructure to maintain.
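Here is how small "trivial" actually is — a dependency-free sketch of the whole search loop (illustrative function names; in practice the embeddings would come from your embedding model):

```python
import math

def cosine_similarity(a, b):
    # Dot product divided by the product of the two vector magnitudes.
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

def top_similar(query, embeddings, top_n=5):
    # Score every stored embedding, sort descending, return the best N.
    scored = sorted(
        ((cosine_similarity(query, e), i) for i, e in enumerate(embeddings)),
        reverse=True,
    )
    return scored[:top_n]
```

That's the entire "database": a list of vectors and twenty lines of code.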

You can optimize further with numpy or similar libraries that vectorize the similarity computation. This brings the time down to 10-20ms for 100K vectors. Still no specialized database needed.
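With numpy, the Python loop collapses into a single matrix-vector product (a sketch, assuming a float32 array of shape `(n, d)`; the function name is hypothetical):

```python
import numpy as np

def search_vectorized(query, embeddings, top_n=5):
    # Normalize the query; cosine similarity then reduces to dot products.
    q = query / np.linalg.norm(query)
    norms = np.linalg.norm(embeddings, axis=1)
    sims = (embeddings @ q) / norms
    # argpartition finds the top-N candidates without fully sorting all n scores.
    top = np.argpartition(-sims, top_n)[:top_n]
    return top[np.argsort(-sims[top])]  # indices of the best matches, best first
```

Using `argpartition` instead of a full sort keeps the selection step O(n) even as the corpus grows.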

Vector databases solve a real problem, but only at scales most applications never reach.

When You Actually Need a Vector DB

Millions of vectors. Real-time updates. Multi-tenant isolation. Distributed search across multiple nodes. These are the use cases where vector databases justify their complexity.

If you're building a search engine for a large corpus, you need a vector database. If you're building RAG for a chatbot with 10,000 documents, you probably don't.

The threshold isn't precise, but a good rule of thumb: if your embeddings fit comfortably in RAM and your search latency is acceptable with sequential comparison, you don't need a vector database yet.
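The "fits comfortably in RAM" check is back-of-envelope arithmetic. Assuming 1536-dimensional float32 embeddings (the dimensionality of OpenAI's text-embedding-3-small, used here purely as an example):

```python
n_vectors = 100_000
dims = 1536          # e.g. OpenAI text-embedding-3-small
bytes_per_float = 4  # float32

total_bytes = n_vectors * dims * bytes_per_float
print(f"{total_bytes / 1024**2:.0f} MB")  # prints "586 MB"
```

Just over half a gigabyte for 100K vectors — well within the RAM of any modern server, and far below the point where specialized indexing pays for itself.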

The Hidden Costs

Vector databases aren't free. Pinecone charges based on the number of vectors stored and queries per second. Self-hosted options like Weaviate require infrastructure, monitoring, and maintenance.

There's also complexity cost. Vector databases introduce new failure modes, new monitoring requirements, and new operational overhead. Your application now depends on external infrastructure that can fail, degrade, or require updates.

For a startup moving fast, this complexity is a tax. Every additional piece of infrastructure is something that can break, something that needs to be learned, something that slows down iteration.

The SQLite Approach

SQLite with a JSON column for embeddings is surprisingly effective. Store your documents in SQLite, store embeddings as JSON arrays, and do similarity search in application code.

This gives you persistence, transactions, and familiar SQL tooling. The search isn't as fast as a specialized vector database, but it's fast enough for many use cases. And you avoid adding new infrastructure.
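A minimal sketch of that pattern, using only the standard library (illustrative schema and helper names, not any library's API):

```python
import json
import math
import sqlite3

conn = sqlite3.connect(":memory:")  # use a file path for real persistence
conn.execute("CREATE TABLE docs (id INTEGER PRIMARY KEY, text TEXT, embedding TEXT)")

def add_doc(text, embedding):
    # Embeddings live as JSON arrays in an ordinary TEXT column.
    conn.execute("INSERT INTO docs (text, embedding) VALUES (?, ?)",
                 (text, json.dumps(embedding)))

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(x * x for x in b)))

def search_docs(query_embedding, top_n=3):
    # Pull every row and score it in application code — fine at small scale.
    rows = conn.execute("SELECT id, text, embedding FROM docs").fetchall()
    scored = [(cosine(query_embedding, json.loads(emb)), doc_id, text)
              for doc_id, text, emb in rows]
    scored.sort(reverse=True)
    return scored[:top_n]
```

Inserts are transactional, the data survives restarts (with a file-backed database), and you can join embeddings against the rest of your relational data with plain SQL.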

Postgres with pgvector is another option. It's not as fast as dedicated vector databases, but it's fast enough for moderate scale, and you're probably already running Postgres.

When to Upgrade

Start simple. Use in-memory arrays or SQLite. Measure your latency. If search takes under 100ms and you're not hitting memory limits, you're fine.

When you cross 100K vectors, start benchmarking. If latency is still acceptable, keep your simple solution. If latency becomes a problem, consider a vector database.
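Measuring that latency takes only a few lines (a sketch with synthetic data and an assumed 768-dimensional model; substitute your real embeddings and your own latency budget):

```python
import time
import numpy as np

rng = np.random.default_rng(0)
n, d = 100_000, 768  # adjust to your corpus size and embedding model
embeddings = rng.normal(size=(n, d)).astype(np.float32)
# Pre-normalize rows so a dot product equals cosine similarity.
embeddings /= np.linalg.norm(embeddings, axis=1, keepdims=True)
query = embeddings[0]

start = time.perf_counter()
sims = embeddings @ query
top = np.argsort(-sims)[:10]
elapsed_ms = (time.perf_counter() - start) * 1000
print(f"search over {n:,} vectors: {elapsed_ms:.1f} ms")
```

If the printed number sits comfortably under your latency budget, the simple solution is still the right one.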

The key is to let actual performance problems drive infrastructure decisions, not anticipated scale. Most applications never reach the scale where vector databases are necessary.

The Right Tool for the Job

Vector databases are excellent at what they do. For applications that need sub-millisecond search across millions of vectors, they're essential. But that's a small fraction of AI applications.

For most use cases — RAG chatbots, document search, recommendation systems with moderate catalogs — simpler solutions work fine. Don't add complexity before you need it.

Start with the simplest thing that works. Measure. Optimize when you have real performance problems. This approach keeps your stack lean and your iteration speed high.

Experiment with semantic search using LLM Utils Embeddings Tool — see how cosine similarity works without any database infrastructure.