Annoy

This backend uses Spotify's Annoy library for approximate nearest neighbour search. It is memory-efficient and supports persisting the index to disk, making it useful when the index needs to be saved and reloaded across sessions.


When to use

  • Memory efficiency is a priority
  • The index needs to be persisted to disk and reloaded without rebuilding
  • Not recommended for very low-dimensional data, where the tree structure can degenerate
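
To illustrate the persistence point, here is a minimal sketch that uses the annoy package directly rather than deskit. The helper names `save_index` and `load_index` are hypothetical and are not part of deskit's API; only `AnnoyIndex`, `save`, and `load` come from Annoy itself.

```python
def save_index(index, path):
    """Write an already-built Annoy index to disk."""
    index.save(path)


def load_index(path, dim, metric="euclidean"):
    """Reload a saved index without rebuilding the trees.

    dim and metric must match the values used when the index was built.
    """
    # Imported lazily so this module still loads when the optional
    # annoy dependency is not installed.
    from annoy import AnnoyIndex

    index = AnnoyIndex(dim, metric)
    index.load(path)  # Annoy memory-maps the file rather than rebuilding
    return index
```

A typical round trip would be `save_index(index, "val_features.ann")` in one session and `load_index("val_features.ann", dim=...)` in the next; the file name here is only an example.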

How it works

At fit time, Annoy builds a forest of n_trees random projection trees over the validation features. More trees improve recall at the cost of more memory and longer build times. At query time, search_k controls how many nodes are visited across all trees: higher values improve recall at the cost of slower queries. By default, search_k is set to n_trees * k.
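
The build and query trade-offs described above can be sketched with the annoy package directly. This is an illustration under stated assumptions, not deskit's internal code: `resolve_search_k`, `build_forest`, and `query` are hypothetical helper names, and only the `AnnoyIndex` methods are Annoy's own.

```python
def resolve_search_k(n_trees, k, search_k=None):
    """Return the node budget per query, defaulting to n_trees * k."""
    return n_trees * k if search_k is None else search_k


def build_forest(features, n_trees=100, metric="euclidean", seed=0):
    """Build an Annoy forest over a list of equal-length float vectors."""
    # Imported lazily so this module still loads when the optional
    # annoy dependency is not installed.
    from annoy import AnnoyIndex

    index = AnnoyIndex(len(features[0]), metric)
    index.set_seed(seed)  # reproducible tree construction
    for i, vector in enumerate(features):
        index.add_item(i, vector)
    index.build(n_trees)  # more trees: better recall, more memory, longer build
    return index


def query(index, x, k=10, search_k=None):
    """Return the ids of up to k approximate nearest neighbours of x."""
    budget = resolve_search_k(index.get_n_trees(), k, search_k)
    # A larger budget visits more nodes: better recall, slower query.
    return index.get_nns_by_vector(x, k, search_k=budget)
```

With a forest built from `build_forest(features, n_trees=20)`, calling `query(forest, x, k=5)` uses a budget of 100 nodes, while `query(forest, x, k=5, search_k=5000)` trades query speed for recall.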


Parameters

Parameter   Type   Default        Description
k           int    10             Number of neighbours
n_trees     int    100            Number of trees in the forest. Higher = better recall, more memory
metric      str    "euclidean"    Distance metric: "euclidean", "angular", "manhattan", "hamming", "dot"
search_k    int    n_trees * k    Nodes visited per query. Higher = better recall, slower queries

Dependencies

annoy>=1.17


Example

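from deskit.des.knorau import KNORAU

# Annoy has no built-in preset, so select it with preset="custom" and finder="annoy".
router = KNORAU(task="classification", metric="accuracy", mode="max", k=20,
                preset="custom", finder="annoy", n_trees=100)
# The Annoy index is built over the validation features at fit time.
router.fit(X_val, y_val, val_preds)
weights = router.predict(x)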

Notes

Annoy has a known bug on Apple Silicon (M1/M2/M3) where the index silently returns only one neighbour regardless of k. deskit detects this at fit time and raises a RuntimeError with instructions to switch to preset="fast" or preset="exact".
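
For anyone using Annoy outside deskit, a similar sanity check could look like the sketch below. It is not deskit's actual detection code: `check_neighbour_count` is a hypothetical helper that assumes only an object exposing Annoy's `get_n_items` and `get_nns_by_vector` methods.

```python
def check_neighbour_count(index, query, k):
    """Raise RuntimeError if the index returns fewer neighbours than expected.

    index is any object with Annoy's get_n_items() and
    get_nns_by_vector(vector, n) methods.
    """
    # An index holding fewer than k items can legitimately return fewer.
    expected = min(k, index.get_n_items())
    found = index.get_nns_by_vector(query, k)
    if len(found) < expected:
        raise RuntimeError(
            f"Annoy returned {len(found)} neighbour(s) but {expected} were "
            'expected; consider preset="fast" or preset="exact".'
        )
    return found
```

Running the check once, right after the index is built, surfaces the problem early instead of letting later queries silently use a single neighbour.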

Annoy has no built-in preset; it is available only via preset="custom" with finder="annoy".