AI & ML Advanced By Samson Tanimawo, PhD Published Dec 27, 2026 5 min read

Vector Index Types: HNSW, IVF, ScaNN, DiskANN

Vector databases hide the index type. Knowing what each index is doing matters when you scale past a few million vectors.

HNSW

Hierarchical Navigable Small World. The dominant in-memory vector index. Builds layered graphs where higher layers have longer-distance edges and lower layers have shorter ones. Search starts at the top layer and descends. Excellent recall and speed; the default choice for most production vector stores.

The mechanics. Multiple layers, with progressively more nodes in each lower layer. The top layer has few nodes with long edges; the bottom layer has all nodes with short edges. Search starts from a top-layer entry point, walks greedily to the node nearest the query, descends one layer, and repeats. The hierarchy converges fast.
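A minimal sketch of the layered greedy walk, in plain Python (3.8+ for math.dist). It makes two simplifying assumptions: each layer is a brute-force nearest-neighbour graph rather than the result of HNSW's incremental insertion, and the bottom layer uses a plain greedy walk instead of an efSearch-wide candidate list. The names build_layers, greedy_search, m and level_mult are illustrative, not any library's API.

```python
import math
import random


def build_layers(vectors, m=4, level_mult=0.5, seed=0):
    """Toy layered graph: brute-force m-nearest-neighbour edges per layer."""
    rng = random.Random(seed)
    # Each node's highest layer follows an exponentially decaying distribution,
    # so layer 0 holds every node and higher layers hold progressively fewer.
    levels = [int(-math.log(1.0 - rng.random()) * level_mult) for _ in vectors]
    top = max(levels)
    layers = []
    for layer in range(top + 1):
        members = [i for i, lv in enumerate(levels) if lv >= layer]
        graph = {}
        for i in members:
            others = sorted(
                (j for j in members if j != i),
                key=lambda j: math.dist(vectors[i], vectors[j]),
            )
            graph[i] = others[:m]
        layers.append(graph)
    entry = levels.index(top)  # any node that exists on the top layer
    return layers, entry


def greedy_search(vectors, layers, entry, query):
    """Walk greedily on each layer, top layer first, then descend."""
    current = entry
    for graph in reversed(layers):
        while True:
            # Listing current first means ties keep us in place, so the walk ends.
            best = min(
                [current] + graph[current],
                key=lambda i: math.dist(vectors[i], query),
            )
            if best == current:
                break
            current = best
    return current


rng = random.Random(1)
points = [[rng.random() for _ in range(4)] for _ in range(100)]
layers, entry = build_layers(points)
nearest = greedy_search(points, layers, entry, [0.5, 0.5, 0.5, 0.5])
print("approximate nearest neighbour id:", nearest)
```

The walk only ever moves to a strictly closer node, so it can stop at a local minimum. Real implementations reduce that risk by tracking several candidates at once on the bottom layer.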

The performance characteristics. Sub-millisecond query latency for moderate-sized indexes (millions of vectors). Recall@10 above 95% with reasonable parameters. Memory footprint: roughly 1.5-2x the raw vector size, depending on dimension and M. Build time: roughly O(N log N) in dataset size.
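A back-of-envelope check on the memory figure. It assumes float32 vectors and a common rule of thumb of roughly 2*M four-byte links per vector on the bottom layer; upper layers and allocator overhead are ignored. The function name and defaults are illustrative.

```python
def hnsw_memory_bytes(n_vectors, dim, m, bytes_per_float=4, bytes_per_link=4):
    """Return (raw_bytes, estimated_total_bytes) under the 2*M-links assumption."""
    raw = n_vectors * dim * bytes_per_float
    links = n_vectors * 2 * m * bytes_per_link
    return raw, raw + links


raw, total = hnsw_memory_bytes(n_vectors=10_000_000, dim=128, m=32)
print(f"raw {raw / 1e9:.2f} GB, with graph {total / 1e9:.2f} GB, ratio {total / raw:.2f}x")
```

Under this estimate the 1.5-2x range corresponds to low-dimensional vectors (around 128 dimensions) with M of 32-64. At 768 dimensions and M=32 the links add under 10%, so it is worth running the numbers for your own embedding size.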

The parameter knobs. M (graph degree, 16-64 typical), efConstruction (build-time accuracy, 100-500), efSearch (query-time accuracy, 10-200). Raising any of them improves recall at the cost of speed or memory. Tune for your workload.

The when-it-fits. Datasets that fit in RAM. Latency-critical applications. Workloads where recall and speed both matter. The default choice for most teams; deviate only when specific constraints force you to.

IVF

Inverted File Index. Cluster vectors using k-means; build inverted index from cluster to vectors. Query: find nearest clusters; search vectors within. Trades recall for speed at scale; combined with PQ (product quantisation) for compression. Used in FAISS, Milvus.

The mechanics. Cluster N vectors into K clusters. Each vector belongs to its nearest cluster. To search, compute query distance to all K cluster centroids, pick the closest few clusters, exhaustively search vectors within those clusters. K=sqrt(N) is a common choice.

The recall-speed trade-off. Fewer clusters searched per query = faster but lower recall (might miss the right vector if it's in an unsearched cluster). More clusters = slower but higher recall. The "nprobe" parameter controls this; tune to your latency budget and recall requirement.
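The cluster-then-probe flow and the nprobe knob can be sketched in plain Python. This is a toy under stated assumptions: a few Lloyd iterations stand in for a production k-means, distances are squared Euclidean, and train_centroids, build_ivf and ivf_search are illustrative names rather than the FAISS or Milvus API.

```python
import math
import random


def sq_dist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def train_centroids(vectors, k, iters=5, seed=0):
    """A few Lloyd iterations starting from k randomly sampled vectors."""
    rng = random.Random(seed)
    centroids = [list(v) for v in rng.sample(vectors, k)]
    for _ in range(iters):
        buckets = [[] for _ in range(k)]
        for v in vectors:
            c = min(range(k), key=lambda i: sq_dist(v, centroids[i]))
            buckets[c].append(v)
        for i, bucket in enumerate(buckets):
            if bucket:  # keep the old centroid if a cluster went empty
                centroids[i] = [sum(col) / len(bucket) for col in zip(*bucket)]
    return centroids


def build_ivf(vectors, centroids):
    """Inverted lists: cluster id -> ids of the vectors assigned to it."""
    lists = [[] for _ in centroids]
    for idx, v in enumerate(vectors):
        c = min(range(len(centroids)), key=lambda i: sq_dist(v, centroids[i]))
        lists[c].append(idx)
    return lists


def ivf_search(vectors, centroids, lists, query, k=5, nprobe=2):
    """Rank centroids, then exhaustively scan only the nprobe closest lists."""
    order = sorted(range(len(centroids)), key=lambda i: sq_dist(query, centroids[i]))
    candidates = [idx for c in order[:nprobe] for idx in lists[c]]
    candidates.sort(key=lambda idx: sq_dist(query, vectors[idx]))
    return candidates[:k]


rng = random.Random(42)
data = [[rng.random() for _ in range(4)] for _ in range(200)]
n_clusters = int(math.sqrt(len(data)))  # the K = sqrt(N) rule of thumb
centroids = train_centroids(data, n_clusters)
lists = build_ivf(data, centroids)
query = [0.3, 0.7, 0.1, 0.9]
for nprobe in (1, 4, n_clusters):
    print(nprobe, ivf_search(data, centroids, lists, query, k=5, nprobe=nprobe))
```

With nprobe equal to the number of clusters the search scans everything and matches brute force; smaller values scan fewer vectors and may miss neighbours that sit just across a cluster boundary.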

The PQ pairing. IVF often combines with Product Quantisation for compression. PQ splits each vector into sub-vectors, quantises each sub-vector to the id of its nearest codebook entry, and reconstructs the vector approximately from those ids. Memory drops 4-32x; recall drops moderately. IVF+PQ scales to billion-vector datasets that pure HNSW couldn't fit in RAM.
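The compression range follows from simple arithmetic. The sketch below assumes float32 input and one code of bits_per_code bits per sub-vector; the codebooks themselves are shared across all vectors and are ignored here. The function name and parameters are illustrative.

```python
import math


def pq_compression(dim, num_subvectors, bits_per_code=8, bytes_per_float=4):
    """Return (raw_bytes, code_bytes, compression_ratio) for one vector."""
    if dim % num_subvectors != 0:
        raise ValueError("dim must be divisible by num_subvectors")
    raw_bytes = dim * bytes_per_float
    code_bytes = math.ceil(num_subvectors * bits_per_code / 8)
    return raw_bytes, code_bytes, raw_bytes / code_bytes


for m in (768, 192, 96):
    raw, code, ratio = pq_compression(dim=768, num_subvectors=m)
    print(f"{m} sub-vectors: {raw} B -> {code} B ({ratio:.0f}x)")
```

Fewer, longer sub-vectors compress harder but each code has to summarise more dimensions, which is where the recall loss comes from.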

The when-it-fits. Datasets too big for RAM at full precision. Workloads where moderate recall is acceptable. Cost-sensitive deployments. The compression+approximation trade-off is right when you can't afford full-precision vectors.

ScaNN

Google's research-derived index. Asymmetric hashing + scoring. Excellent recall-speed at large scale (tens of millions of vectors). Less commonly available than HNSW; rising in adoption.

The mechanics. Anisotropic vector quantisation: compress vectors with awareness of which directions matter most for inner-product similarity. Hash-based first pass narrows candidates; exact rescoring on the candidates. Combines benefits of hashing (fast first pass) with exact search (high accuracy on small candidate set).
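The two-stage shape (cheap approximate pass, exact rescoring on a shortlist) can be sketched as follows. This is not ScaNN's algorithm: plain uniform scalar quantisation is swapped in for the anisotropic quantiser, and there is no hashing or SIMD lookup. The query stays at full precision while the database side is compressed, which is the asymmetric part. All names are illustrative.

```python
import random


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def quantise(vectors, levels=15):
    """Uniform scalar quantisation to small integers (stand-in quantiser)."""
    scale = (max(abs(x) for v in vectors for x in v) / levels) or 1.0
    codes = [[round(x / scale) for x in v] for v in vectors]
    return codes, scale


def two_stage_search(vectors, codes, scale, query, k=5, n_candidates=50):
    # Stage 1: approximate inner product from the compressed codes.
    approx = sorted(range(len(codes)), key=lambda i: -dot(codes[i], query) * scale)
    shortlist = approx[:n_candidates]
    # Stage 2: exact inner product, but only on the shortlist.
    shortlist.sort(key=lambda i: -dot(vectors[i], query))
    return shortlist[:k]


rng = random.Random(7)
data = [[rng.uniform(-1, 1) for _ in range(8)] for _ in range(100)]
codes, scale = quantise(data)
query = [rng.uniform(-1, 1) for _ in range(8)]
print(two_stage_search(data, codes, scale, query, k=5, n_candidates=20))
```

The size of the shortlist plays the same role as nprobe in IVF: a larger n_candidates costs more exact scoring but lowers the chance that quantisation error pushed a true neighbour out of the first pass.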

The performance state. ScaNN often beats HNSW on Google's benchmarks at larger scales. The advantage is most pronounced at 10M-100M vector range. Below 1M, HNSW is competitive; above 100M, hybrid disk-based approaches like DiskANN may win.

The deployment reality. Available as a Python library; integrated into some vector stores (Vertex Vector Search uses ScaNN under the hood). Less ubiquitous than HNSW in third-party stores; the gap is closing.

The when-it-fits. Large-scale (10M+) datasets where HNSW's memory overhead is constraining. Workloads with budget for the engineering integration. Google-stack teams that have ScaNN available natively.

DiskANN

Microsoft's billion-scale graph index. Stores the graph and full-precision vectors on SSD; keeps compressed vectors and hot nodes in RAM. Achieves billion-scale at a fraction of HNSW's memory cost. Higher latency than RAM-only indexes (10-50ms vs sub-ms) but enables datasets nothing else can serve cheaply.

The mechanics. Build a proximity graph in the same family as HNSW, but as a single flat layer (the Vamana graph) with edges chosen to keep the number of hops, and therefore SSD reads, low. Store the graph on disk; cache hot nodes in RAM. Query traverses the graph with strategic caching to hide SSD latency.
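A toy model of the disk-plus-cache idea, in plain Python (3.8+ for math.dist). A dict stands in for SSD-resident adjacency lists, an LRU cache stands in for the RAM-resident hot nodes, and a counter records simulated disk reads. SsdGraphStore and greedy_walk are hypothetical names, and the vectors are kept in memory here purely to keep the sketch short.

```python
import math
from collections import OrderedDict


class SsdGraphStore:
    """Hypothetical store: adjacency lists 'on disk' behind a small LRU cache."""

    def __init__(self, adjacency, cache_size):
        self._disk = adjacency        # stands in for SSD-resident data
        self._cache = OrderedDict()   # node id -> neighbour list, in LRU order
        self._cache_size = cache_size
        self.disk_reads = 0

    def neighbours(self, node):
        if node in self._cache:
            self._cache.move_to_end(node)  # mark as recently used
            return self._cache[node]
        self.disk_reads += 1               # cache miss: simulated SSD read
        nbrs = self._disk[node]
        self._cache[node] = nbrs
        if len(self._cache) > self._cache_size:
            self._cache.popitem(last=False)  # evict the least recently used node
        return nbrs


def greedy_walk(store, vectors, start, query):
    """Follow the closest neighbour until no neighbour improves on the current node."""
    current = start
    while True:
        best = min(
            [current] + store.neighbours(current),
            key=lambda i: math.dist(vectors[i], query),
        )
        if best == current:
            return current
        current = best


# Ten points on a line, each linked to its immediate neighbours.
points = [[float(i)] for i in range(10)]
adjacency = {i: [j for j in (i - 1, i + 1) if 0 <= j < 10] for i in range(10)}
store = SsdGraphStore(adjacency, cache_size=16)
for attempt in range(2):
    found = greedy_walk(store, points, start=0, query=[7.2])
    print(f"query {attempt}: node {found}, cumulative disk reads {store.disk_reads}")
```

Every hop that misses the cache costs one simulated read, which is why a graph with few hops and a cache holding the nodes near the entry point matter so much for a disk-based index.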

The cost advantage. RAM costs far more per byte than SSD, typically by one to two orders of magnitude. For billion-vector datasets, RAM is the dominant cost. DiskANN cuts the RAM requirement by 10-100x, and total cost falls with it.

The latency cost. Each SSD read adds 10-100µs per node access. Total query latency is 10-50ms, much higher than HNSW's <1ms. For latency-tolerant use cases, the trade-off is favourable; for latency-critical ones, RAM-only wins.

The when-it-fits. Billion-vector datasets. Cost-sensitive deployments. Latency budgets in the 10-100ms range. Search applications that prioritise scale over speed.

Picking one

Heuristic. Under 1M vectors and you have RAM: HNSW. 1-100M and budget-conscious: IVF-PQ in FAISS or Milvus. 100M+ and SSDs: DiskANN. The big managed vector stores (Pinecone, Weaviate, Qdrant) abstract the index choice; you mostly just pick the store. Don't agonise over index type for sub-100M datasets.
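The heuristic is small enough to write down as a function. The thresholds come straight from the paragraph above; the branches it does not cover (a mid-sized dataset with RAM to spare, or a very large one without SSDs) are filled in with assumptions that are marked in the comments. suggest_index and its parameters are illustrative names.

```python
def suggest_index(n_vectors, fits_in_ram=True, budget_conscious=True, has_ssd=True):
    """Map dataset size and constraints to a starting-point index type."""
    if n_vectors < 1_000_000 and fits_in_ram:
        return "HNSW"
    if n_vectors <= 100_000_000:
        # Assumption: with RAM to spare and no cost pressure, stay on HNSW.
        if fits_in_ram and not budget_conscious:
            return "HNSW"
        return "IVF-PQ"
    # Assumption: without SSDs, fall back to compressed IVF-PQ.
    return "DiskANN" if has_ssd else "IVF-PQ"


for size in (500_000, 50_000_000, 2_000_000_000):
    print(f"{size:>13,} vectors -> {suggest_index(size)}")
```

Treat the output as a shortlist for benchmarking rather than a decision.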

The managed-store reality. Most teams don't pick an index; they pick a store. Pinecone, Weaviate, Qdrant, and Postgres with pgvector all handle index construction and maintenance for you. The store's defaults are usually right; tune only when needed.

The pgvector option. PostgreSQL with the pgvector extension. Combines vector search with traditional SQL. Lower performance than dedicated stores but operationally simple: one database. For sub-10M datasets, often the right choice.

The benchmark reality. Vendor benchmarks favour the vendor. Run your own benchmark on your data, your queries, your hardware. The numbers will often surprise you; pick based on your evidence, not theirs.
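A minimal harness for that kind of benchmark, assuming you can wrap each candidate store in a search_fn(query, k) callable and that you have exact (brute-force) neighbour ids as ground truth. The function names and the shape of the report are illustrative.

```python
import statistics
import time


def recall_at_k(retrieved, relevant, k):
    """Fraction of the true top-k ids that appear in the retrieved top-k."""
    return len(set(retrieved[:k]) & set(relevant[:k])) / k


def benchmark(search_fn, queries, ground_truth, k=10):
    """Run every query once; report mean recall@k and latency percentiles."""
    latencies_ms, recalls = [], []
    for query, truth in zip(queries, ground_truth):
        start = time.perf_counter()
        result = search_fn(query, k)
        latencies_ms.append((time.perf_counter() - start) * 1000)
        recalls.append(recall_at_k(result, truth, k))
    latencies_ms.sort()
    p95_index = min(len(latencies_ms) - 1, int(0.95 * len(latencies_ms)))
    return {
        "recall": statistics.mean(recalls),
        "p50_ms": statistics.median(latencies_ms),
        "p95_ms": latencies_ms[p95_index],
    }
```

Run it with the same queries and ground truth against each candidate, and compare recall and tail latency side by side rather than either number alone.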

The hybrid pattern. Combine vector search with metadata filtering (find products in this category that are similar to X). Most stores support this; some better than others. Verify hybrid query performance separately, because pure vector benchmarks don't reflect production needs.
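One reason filtered queries behave differently is the order in which the filter and the nearest-neighbour step run. The sketch below uses brute-force ranking as a stand-in for the index, with a hypothetical item layout (id, vec, category), to show how filtering after the top-k can return too few results.

```python
def post_filter_search(items, query, k, predicate, score):
    """Take the top-k by similarity, then drop non-matching items."""
    top = sorted(items, key=lambda it: score(it, query))[:k]
    return [it for it in top if predicate(it)]


def pre_filter_search(items, query, k, predicate, score):
    """Filter first, then rank only the survivors."""
    matching = [it for it in items if predicate(it)]
    return sorted(matching, key=lambda it: score(it, query))[:k]


# Hypothetical catalogue: the items nearest the query are all in the wrong category.
catalogue = [
    {"id": i, "vec": float(i), "category": "hats" if i < 5 else "shoes"}
    for i in range(10)
]
distance = lambda item, q: abs(item["vec"] - q)
is_shoe = lambda item: item["category"] == "shoes"

print([it["id"] for it in post_filter_search(catalogue, 0.0, 3, is_shoe, distance)])
print([it["id"] for it in pre_filter_search(catalogue, 0.0, 3, is_shoe, distance)])
```

Stores differ in which strategy they use and how well it scales with selective filters, which is what a hybrid benchmark should expose.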

Common antipatterns

Picking an index by reading papers. Workload matters more than algorithm. Benchmark on YOUR data.

Tuning index parameters without an eval. The right tuning depends on what "good enough" means for your application. Define the eval first.

Over-investing in index choice for small datasets. For sub-1M vectors, HNSW is fine. Spend the optimisation time elsewhere.

Ignoring metadata-filter performance. Pure vector benchmarks miss the case where most queries combine vector and filter. Test hybrid queries.

What to do this week

Three moves. (1) Compute your dataset size. The classification (small/medium/large) immediately rules out some choices. (2) Define your latency budget. <10ms rules out DiskANN; 10-100ms opens it up. (3) Run a 1-day benchmark of 2-3 candidate stores on your real data shape. The benchmark gives you the answer; agonising about theory doesn't.