HNSW

This backend uses Hierarchical Navigable Small World graphs for approximate nearest neighbour search. It is the recommended choice for high-dimensional data.


When to use

  • The data has many features: HNSW handles high-dimensional spaces better than FAISS IVF, which degrades as dimensionality increases
  • A good balance of build time, query speed, and recall is needed on a large dataset
  • Avoid it on small datasets, where the graph overhead is not justified and preset="exact" is simpler

How it works

At fit time, HNSW builds a multi-layer graph in which each node is a validation sample. M sets how many edges each node has: higher values produce a denser graph with better recall, at the cost of more memory and longer build times. ef_construction sets the search width (the number of candidates kept) while the graph is built; higher values produce a higher-quality graph but slow down fitting. At query time, ef_search sets the same search width; higher values improve recall at the cost of query speed.

Two backends are supported: hnswlib (default) and nmslib.
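
The effect of ef_search can be illustrated with a minimal, pure-Python sketch of the best-first search that HNSW runs on a single graph layer. This is not how hnswlib or nmslib are implemented, and the helper names build_knn_graph and search_layer are hypothetical. The graph here is built by brute force, has a single layer, and does not model ef_construction at all.

```python
import heapq
import math
import random


def build_knn_graph(points, M):
    """Link every point to its M nearest neighbours (brute force, undirected).

    Stand-in for HNSW's incremental construction; because links are made
    symmetric, a node may end up with more than M edges.
    """
    graph = {i: set() for i in range(len(points))}
    for i, p in enumerate(points):
        others = (j for j in range(len(points)) if j != i)
        nearest = sorted(others, key=lambda j: math.dist(p, points[j]))[:M]
        for j in nearest:
            graph[i].add(j)
            graph[j].add(i)
    return graph


def search_layer(points, graph, query, entry, ef, k):
    """Best-first search keeping at most `ef` results; returns the k closest.

    Requires ef >= k. Returns a list of (distance, index), nearest first.
    """
    d0 = math.dist(query, points[entry])
    visited = {entry}
    candidates = [(d0, entry)]  # min-heap: closest unexplored node on top
    results = [(-d0, entry)]    # max-heap via negation: farthest result on top
    while candidates:
        dist, node = heapq.heappop(candidates)
        if dist > -results[0][0]:
            break  # closest candidate is farther than the worst kept result
        for nb in graph[node]:
            if nb in visited:
                continue
            visited.add(nb)
            d = math.dist(query, points[nb])
            if len(results) < ef or d < -results[0][0]:
                heapq.heappush(candidates, (d, nb))
                heapq.heappush(results, (-d, nb))
                if len(results) > ef:
                    heapq.heappop(results)  # drop the farthest result
    return sorted((-neg, i) for neg, i in results)[:k]


# Toy data: 200 random 2-D points, searched with a result set of 50 entries.
random.seed(0)
points = [(random.random(), random.random()) for _ in range(200)]
graph = build_knn_graph(points, M=8)
hits = search_layer(points, graph, query=(0.5, 0.5), entry=0, ef=50, k=5)
```

A larger ef lets the search keep more candidates before it stops, which is why recall tends to rise and query time grows with ef_search.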


Presets

Preset             M   ef_construction  ef_search
high_dim_balanced  32  400              200
high_dim_fast      16  200              100
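
For reference, the presets can be written out as plain data. The sketch below is illustrative only: HNSW_PRESETS, HNSW_DEFAULTS, and resolve_hnsw_params are hypothetical names, not part of the library. It assumes that preset="custom" starts from the defaults listed under Parameters and that keyword overrides are applied on top.

```python
# Values copied from the presets table; all names here are hypothetical.
HNSW_PRESETS = {
    "high_dim_balanced": {"M": 32, "ef_construction": 400, "ef_search": 200},
    "high_dim_fast": {"M": 16, "ef_construction": 200, "ef_search": 100},
}

# Defaults from the Parameters table, assumed to be the base for "custom".
HNSW_DEFAULTS = {"M": 32, "ef_construction": 400, "ef_search": 200}


def resolve_hnsw_params(preset, **overrides):
    """Return the HNSW parameters for a preset, with optional overrides."""
    if preset == "custom":
        params = dict(HNSW_DEFAULTS)
    elif preset in HNSW_PRESETS:
        params = dict(HNSW_PRESETS[preset])
    else:
        raise ValueError(f"unknown HNSW preset: {preset!r}")
    unknown = set(overrides) - set(params)
    if unknown:
        raise ValueError(f"unknown HNSW parameters: {sorted(unknown)}")
    params.update(overrides)
    return params
```

Note that high_dim_balanced carries the same values as the parameter defaults.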

Parameters

Parameter        Type  Default    Description
k                int   10         Number of neighbours
space            str   "l2"       Distance metric: "l2", "cosine", or "ip"
M                int   32         Number of graph connections per node
ef_construction  int   400        Search width during graph construction
ef_search        int   200        Search width at query time
backend          str   "hnswlib"  Underlying library: "hnswlib" or "nmslib"
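
To get a feel for how M affects memory, the following rough estimate may help. It is a rule of thumb, not a measurement: it assumes float32 vectors and 4-byte neighbour ids with up to 2*M links per node on the bottom layer, and it ignores upper layers and bookkeeping. The function name estimate_hnsw_bytes is hypothetical.

```python
def estimate_hnsw_bytes(n_samples, n_features, M):
    """Rough lower bound on index size in bytes (assumptions in the text above)."""
    vector_bytes = 4 * n_features  # one float32 per feature
    link_bytes = 4 * 2 * M         # up to 2*M four-byte ids on the bottom layer
    return n_samples * (vector_bytes + link_bytes)


# One million 128-dimensional samples at M=32 and at M=16.
balanced = estimate_hnsw_bytes(1_000_000, 128, M=32)
fast = estimate_hnsw_bytes(1_000_000, 128, M=16)
```

Under these assumptions the vectors dominate for wide data, and halving M only shrinks the link portion of the total.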

Dependencies

hnswlib>=0.7


Example

from deskit.des.knorau import KNORAU

# Using a preset
router = KNORAU(task="classification", metric="accuracy", mode="max", k=20,
                preset="high_dim_balanced")
router.fit(X_val, y_val, val_preds)
weights = router.predict(x)

# Custom configuration
router = KNORAU(task="classification", metric="accuracy", mode="max", k=20,
                preset="custom", finder="hnsw", M=48, ef_construction=600, ef_search=300)
router.fit(X_val, y_val, val_preds)
weights = router.predict(x)

Notes

For datasets with a large number of samples, an ef_construction value below 300 may produce a low-quality graph; a warning is emitted in that case.
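
The kind of check described above might look like the sketch below. The 300 threshold comes from the note; the sample-count cutoff LARGE_N_SAMPLES is an assumed placeholder, since the page does not say what counts as large, and check_ef_construction is a hypothetical name.

```python
import warnings

MIN_EF_CONSTRUCTION = 300  # threshold stated in the note above
LARGE_N_SAMPLES = 100_000  # assumed cutoff; the real value is not documented here


def check_ef_construction(n_samples, ef_construction):
    """Warn when a large dataset is paired with a small ef_construction.

    Returns True if the setting looks fine, False if a warning was emitted.
    """
    if n_samples >= LARGE_N_SAMPLES and ef_construction < MIN_EF_CONSTRUCTION:
        warnings.warn(
            f"ef_construction={ef_construction} is below {MIN_EF_CONSTRUCTION} "
            f"for {n_samples} samples; the graph may be low quality.",
            UserWarning,
            stacklevel=2,
        )
        return False
    return True
```

If the check also applies to presets, high_dim_fast (ef_construction=200) would trigger it on large datasets.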

nmslib is an alternative backend to hnswlib that supports additional distance spaces. It requires a separate install: pip install nmslib.
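
Because the two libraries are installed separately, it can be useful to verify that the chosen one is importable before fitting. The helper below is a hypothetical sketch that uses only the standard library; it is not part of deskit.

```python
from importlib.util import find_spec

SUPPORTED_BACKENDS = ("hnswlib", "nmslib")


def require_backend(name="hnswlib"):
    """Return `name` if that library is importable, otherwise raise."""
    if name not in SUPPORTED_BACKENDS:
        raise ValueError(f"backend must be one of {SUPPORTED_BACKENDS}, got {name!r}")
    if find_spec(name) is None:
        raise ImportError(f"backend {name!r} is not installed; try: pip install {name}")
    return name
```

Calling require_backend("nmslib") before constructing the router surfaces a missing install early, with the pip command in the error message.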