Annoy

This backend uses Spotify's Annoy library for approximate nearest neighbour search. It is memory-efficient and supports persisting the index to disk, making it useful when the index needs to be saved and reloaded across sessions.

When to use

Memory efficiency is a priority
You need to persist the index to disk and reload it without rebuilding
Avoid it on very low-dimensional data, where the tree structure can degenerate

How it works

At fit time, Annoy builds a forest of n_trees random projection trees over the validation features. More trees give better recall at the cost of more memory and longer build times. At query time, search_k controls how many nodes are visited across all trees: higher values improve recall but slow queries down. By default, search_k is set to n_trees * k.

Parameters

Parameter	Type	Default	Description
`k`	int	10	Number of neighbours
`n_trees`	int	100	Number of trees in the forest. Higher = better recall, more memory, longer build
`metric`	str	`"euclidean"`	Distance metric: `"euclidean"`, `"angular"`, `"manhattan"`, `"hamming"`, `"dot"`
`search_k`	int	`n_trees * k`	Nodes visited per query. Higher = better recall, slower queries

Dependencies

annoy>=1.17

Example

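from deskit.des.knorau import KNORAU

# Annoy has no built-in preset, so select it with preset="custom" and finder="annoy".
router = KNORAU(task="classification", metric="accuracy", mode="max", k=20,
                preset="custom", finder="annoy", n_trees=100)
# search_k is not passed here, so it falls back to n_trees * k = 2000.
router.fit(X_val, y_val, val_preds)
weights = router.predict(x)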

Notes

Annoy has a known bug on Apple Silicon (M1/M2/M3) where the index silently returns only 1 neighbour regardless of k. deskit detects this at fit time and raises a RuntimeError with instructions to switch to preset="fast" or preset="exact" instead.
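A check of this kind could look roughly like the sketch below. It is not deskit's actual implementation: the function name, the probe strategy, and the error message are assumptions, and the two stand-in query functions exist only to demonstrate the behaviour without a real index.

```python
from typing import Callable, Sequence


def assert_returns_k_neighbours(
    query: Callable[[Sequence[float], int], Sequence[int]],
    probe: Sequence[float],
    k: int,
    n_items: int,
) -> None:
    """Raise RuntimeError if a probe query returns fewer neighbours than expected."""
    expected = min(k, n_items)  # an index cannot return more items than it holds
    got = len(query(probe, k))
    if got < expected:
        raise RuntimeError(
            f"Index returned {got} neighbour(s) but {expected} were requested; "
            'consider preset="fast" or preset="exact" instead.'
        )


# Stand-ins for a real index (hypothetical, for demonstration only).
def healthy_query(vector, k):
    return list(range(k))


def broken_query(vector, k):
    return [0]  # mimics an index that returns one neighbour regardless of k


assert_returns_k_neighbours(healthy_query, [0.0, 0.0], k=5, n_items=100)

try:
    assert_returns_k_neighbours(broken_query, [0.0, 0.0], k=5, n_items=100)
    message = ""
except RuntimeError as exc:
    message = str(exc)

print(message)
```

With a real Annoy index, the bound method `get_nns_by_vector` could be passed as `query`.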

Annoy has no built-in preset. It is available only via preset="custom" with finder="annoy".