FAISS
This backend uses Facebook's FAISS library for nearest neighbour search, either exact (flat index) or approximate (ivf and hnsw indexes). It is the recommended default for most datasets and ships with three presets.
When to use
- The default choice for most low- to medium-dimensional datasets
- Performs best on large datasets, where a small loss in recall is an acceptable price for less computation
- Performs worst on very high-dimensional data
How it works
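At fit time, FAISS builds an index over the validation features. The index type determines the tradeoff between speed and recall:
- flat performs exact search using FAISS's optimised brute-force L2 scan.
- ivf (Inverted File Index) partitions the dataset into n_cells cells using k-means. At query time it searches only the n_probes cells whose centroids are closest to the query; more probes means higher recall but slower queries. If n_cells is not specified, it is set to sqrt(n_samples), capped at 4096.
- hnsw (Hierarchical Navigable Small World) builds a layered proximity graph with hnsw_M connections per node. hnsw_efConstruction sets the search width used while building the graph and hnsw_efSearch the width used at query time; larger values generally improve recall but make building or querying slower.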
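The ivf strategy is easiest to see in miniature. The sketch below is a toy, pure-Python inverted file index, not FAISS's implementation (FAISS provides an optimised version, for example faiss.IndexIVFFlat) and not necessarily how this backend configures FAISS. ToyIVF, default_n_cells and exact_search are hypothetical names, and rounding sqrt(n_samples) down to an integer is an assumption.

```python
import math
import random


def sq_dist(a, b):
    """Squared Euclidean (L2) distance between two equal-length vectors."""
    return sum((ai - bi) ** 2 for ai, bi in zip(a, b))


def default_n_cells(n_samples, cap=4096):
    """sqrt(n_samples), capped at `cap` (rounding down is an assumption)."""
    return max(1, min(cap, int(math.sqrt(n_samples))))


def kmeans(points, n_cells, n_iter=10, seed=0):
    """Plain Lloyd's k-means; returns the list of centroids."""
    rng = random.Random(seed)
    centroids = [list(p) for p in rng.sample(points, n_cells)]
    for _ in range(n_iter):
        buckets = [[] for _ in range(n_cells)]
        for p in points:
            c = min(range(n_cells), key=lambda j: sq_dist(p, centroids[j]))
            buckets[c].append(p)
        for j, bucket in enumerate(buckets):
            if bucket:  # keep the old centroid if a cell ends up empty
                centroids[j] = [sum(col) / len(bucket) for col in zip(*bucket)]
    return centroids


class ToyIVF:
    """Inverted file index: k-means cells, then probe only the nearest cells."""

    def __init__(self, points, n_cells=None, seed=0):
        self.points = points
        self.n_cells = n_cells or default_n_cells(len(points))
        self.centroids = kmeans(points, self.n_cells, seed=seed)
        # Inverted lists: cell id -> indices of the points assigned to it.
        self.lists = [[] for _ in range(self.n_cells)]
        for i, p in enumerate(points):
            c = min(range(self.n_cells), key=lambda j: sq_dist(p, self.centroids[j]))
            self.lists[c].append(i)

    def search(self, query, k, n_probes):
        # Rank cells by centroid distance and keep the n_probes closest.
        order = sorted(range(self.n_cells), key=lambda j: sq_dist(query, self.centroids[j]))
        candidates = [i for j in order[:n_probes] for i in self.lists[j]]
        # Exact L2 scan restricted to the candidate points.
        return sorted(candidates, key=lambda i: sq_dist(query, self.points[i]))[:k]


def exact_search(points, query, k):
    """Brute-force L2 search, i.e. what the flat index computes."""
    return sorted(range(len(points)), key=lambda i: sq_dist(query, points[i]))[:k]


# Usage: a few probes search part of the data; all probes reduce to exact search.
rng = random.Random(42)
data = [[rng.random() for _ in range(4)] for _ in range(400)]
index = ToyIVF(data)  # n_cells defaults to int(sqrt(400)) = 20
query = [0.5] * 4
print(index.search(query, k=5, n_probes=3))
print(index.search(query, k=5, n_probes=index.n_cells) == exact_search(data, query, 5))  # True
```

Probing every cell makes IVF equivalent to the exact scan that flat performs; lowering n_probes shrinks the candidate set, which is where both the speed gain and the recall loss come from.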
Presets
| Preset | index_type | n_probes | Recall |
|---|---|---|---|
| balanced | ivf | 50 | ~98% |
| fast | ivf | 30 | ~95% |
| turbo | flat | n/a | 100% |
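Read as configuration, each preset is a fixed choice of index_type and n_probes. The sketch below restates the table as a dictionary; PRESETS and preset_kwargs are hypothetical, and the sketch assumes the presets set only the options shown, which this page does not confirm.

```python
# Hypothetical restatement of the preset table as data; deskit's internal
# representation may differ, and presets may set options not shown here.
PRESETS = {
    "balanced": {"index_type": "ivf", "n_probes": 50},  # ~98% recall
    "fast": {"index_type": "ivf", "n_probes": 30},      # ~95% recall
    "turbo": {"index_type": "flat"},                    # exact search, 100% recall
}


def preset_kwargs(name):
    """Return the FAISS settings listed for a preset, as keyword arguments."""
    try:
        return dict(PRESETS[name])  # copy so callers cannot mutate the table
    except KeyError:
        raise ValueError(
            f"unknown preset {name!r}; expected one of {sorted(PRESETS)}"
        ) from None


print(preset_kwargs("fast"))  # {'index_type': 'ivf', 'n_probes': 30}
```

Under that assumption, preset="fast" would correspond to passing preset="custom", finder="faiss" together with the keyword arguments returned by preset_kwargs("fast").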
Parameters
| Parameter | Type | Default | Description |
|---|---|---|---|
| k | int | 10 | Number of neighbours |
| index_type | str | "flat" | Index type: "flat", "ivf", or "hnsw" |
| n_cells | int | sqrt(n_samples) | Number of IVF cells. Only used when index_type="ivf" |
| n_probes | int | 50 | Number of cells searched per query. Only used when index_type="ivf" |
| hnsw_M | int | 32 | HNSW graph connections per node. Only used when index_type="hnsw" |
| hnsw_efConstruction | int | 400 | HNSW build-time search width. Only used when index_type="hnsw" |
| hnsw_efSearch | int | 200 | HNSW query-time search width. Only used when index_type="hnsw" |
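Because most parameters apply to only one index type, it can help to think of them as a config object that reports which settings actually take effect. FaissParams and its active method are hypothetical, not part of deskit; the defaults come from the table, and rounding sqrt(n_samples) down is an assumption.

```python
import math
from dataclasses import dataclass
from typing import Optional


@dataclass
class FaissParams:
    """Hypothetical mirror of the parameter table (not deskit's API)."""

    k: int = 10
    index_type: str = "flat"
    n_cells: Optional[int] = None  # None -> sqrt(n_samples), capped at 4096
    n_probes: int = 50
    hnsw_M: int = 32
    hnsw_efConstruction: int = 400
    hnsw_efSearch: int = 200

    def __post_init__(self):
        if self.index_type not in ("flat", "ivf", "hnsw"):
            raise ValueError(
                f"index_type must be 'flat', 'ivf' or 'hnsw', got {self.index_type!r}"
            )

    def active(self, n_samples):
        """Return only the settings that apply to the chosen index type."""
        params = {"k": self.k, "index_type": self.index_type}
        if self.index_type == "ivf":
            # Assumed default rule: floor(sqrt(n_samples)), at least 1, at most 4096.
            default_cells = max(1, min(4096, int(math.sqrt(n_samples))))
            params["n_cells"] = self.n_cells or default_cells
            params["n_probes"] = self.n_probes
        elif self.index_type == "hnsw":
            params["hnsw_M"] = self.hnsw_M
            params["hnsw_efConstruction"] = self.hnsw_efConstruction
            params["hnsw_efSearch"] = self.hnsw_efSearch
        return params


print(FaissParams(index_type="ivf").active(n_samples=10_000))
# {'k': 10, 'index_type': 'ivf', 'n_cells': 100, 'n_probes': 50}
```

With the default index_type="flat", only k and index_type remain, which matches the "Only used when" notes in the table.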
Dependencies
faiss-cpu>=1.7
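To check this requirement at runtime without importing FAISS, the standard library's importlib.metadata can read the installed version. This sketch assumes FAISS was installed from the faiss-cpu distribution (for example via pip); under a different distribution name the lookup returns None. The helper names are hypothetical, and for full PEP 440 handling the third-party packaging library's packaging.version.Version is the more robust choice.

```python
import re
from importlib.metadata import PackageNotFoundError, version


def installed_faiss_cpu_version():
    """Return the installed faiss-cpu version string, or None if it is not found."""
    try:
        return version("faiss-cpu")
    except PackageNotFoundError:
        return None


def release_tuple(v):
    """Leading numeric components of a version string: '1.8.0.post1' -> (1, 8, 0)."""
    parts = []
    for piece in v.split("."):
        m = re.match(r"\d+", piece)
        if m is None:
            break
        parts.append(int(m.group()))
    return tuple(parts)


def meets_requirement(v, minimum=(1, 7)):
    """Compare numerically, so that '1.10' correctly counts as newer than '1.7'."""
    return release_tuple(v) >= minimum


v = installed_faiss_cpu_version()
print(v, v is not None and meets_requirement(v))
```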
Example
```python
from deskit.des.knorau import KNORAU

# Assumed inputs, prepared beforehand: X_val and y_val are the validation features
# and labels, val_preds the base models' predictions on X_val, and x the input(s)
# to compute ensemble weights for.

# Using a preset
router = KNORAU(task="classification", metric="accuracy", mode="max", k=20, preset="balanced")
router.fit(X_val, y_val, val_preds)
weights = router.predict(x)

# Custom IVF configuration
router = KNORAU(task="classification", metric="accuracy", mode="max", k=20,
                preset="custom", finder="faiss", index_type="ivf", n_probes=80)
router.fit(X_val, y_val, val_preds)
weights = router.predict(x)
```
Notes
FAISS IVF requires at least 40 samples per cell to train. If the validation set is too small for the requested n_cells, the backend automatically reduces n_cells and emits a warning.
FAISS Flat may have floating-point precision issues on data with two or fewer features; a warning is emitted in that case.
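Both checks can be sketched in a few lines. This is a hypothetical illustration, not the backend's code: the exact reduction rule is not documented here, so the sketch assumes n_cells is capped at n_samples // 40, and effective_n_cells and check_flat_dims are made-up names.

```python
import warnings

MIN_SAMPLES_PER_CELL = 40  # threshold stated in the note above


def effective_n_cells(n_samples, n_cells):
    """Cap n_cells so each IVF cell has at least 40 training samples on average."""
    max_cells = max(1, n_samples // MIN_SAMPLES_PER_CELL)
    if n_cells > max_cells:
        warnings.warn(
            f"{n_samples} samples are too few for {n_cells} IVF cells; "
            f"reducing n_cells to {max_cells}"
        )
        return max_cells
    return n_cells


def check_flat_dims(n_features):
    """Warn when flat search runs on data with two or fewer features."""
    if n_features <= 2:
        warnings.warn(
            f"flat index on {n_features} feature(s) may suffer from "
            "floating-point precision issues"
        )
```

For example, 1,000 validation samples give a default of int(sqrt(1000)) = 31 cells, but under this rule effective_n_cells(1000, 31) returns 25 (1000 // 40) and warns, so even the default can trigger the reduction on small validation sets.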