DEWS-T
This algorithm is designed to be consistent and flexible for both classification and regression tasks. It is a variation of DEWS-I that fits a weighted trend line over each model's scores across the K neighbours and extrapolates it to estimate competence at the test point itself, rather than averaging the neighbours' scores. It then softly blends the top experts in the test point's region of competence to produce a set of model weights.
When to use
- DEWS-T is currently the general recommendation when you want a single, consistent algorithm across both classification and regression. It works best with soft metrics, so it suits regression and classification with confidence scores, but works less well with hard predictions.
- It performs best when competence regions have a smooth, directional structure, that is, when model quality changes roughly linearly with distance from the test point.
- It performs worst on homogeneous datasets, noisy neighborhoods, and classification with hard predictions.
How it works
When fit is called, DEWS-T fits a KNN index on the validation data and builds a criterion score matrix holding each model's score on each validation sample. For MAE and MSE, signed residuals are stored instead of raw metric values so that directional information is preserved across neighbors.
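To illustrate why signed residuals carry more information than raw error values, here is a minimal sketch. The helper name `signed_residual_matrix`, the list-of-lists layout, and the sign convention (prediction minus target) are assumptions made for illustration, not deskit's internals.

```python
def signed_residual_matrix(y_val, val_preds):
    """Return one row per validation sample and one column per model.

    Each entry is prediction - target (sign convention assumed here).
    """
    n_models = len(val_preds)
    return [
        [val_preds[m][i] - y_val[i] for m in range(n_models)]
        for i in range(len(y_val))
    ]


y_val = [10.0, 12.0, 14.0]
val_preds = [
    [11.0, 13.0, 15.0],  # hypothetical model 0: always over-predicts by 1
    [9.0, 13.0, 13.0],   # hypothetical model 1: error changes sign
]

residuals = signed_residual_matrix(y_val, val_preds)
# Absolute errors discard the sign, so both models look identical.
abs_errors = [[abs(r) for r in row] for row in residuals]
```

Both models have an absolute error of 1 on every sample, yet the signed residuals show that the first model is biased in one direction while the second is not. That difference is what a trend fit across neighbours can pick up.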
When predict is called, it finds the K nearest neighbors of the test point. For each model, it fits a weighted least squares line over the K neighbors, using inverse-distance weights so that closer neighbors pull the fit more strongly. The line is then extrapolated to distance = 0 to estimate the model's competence at the test point.
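The trend fit for a single model can be sketched as follows. This is an illustrative implementation under stated assumptions, not the library's code: the function name `extrapolate_to_zero` and the `eps` guard against zero distances are hypothetical.

```python
def extrapolate_to_zero(distances, scores, eps=1e-8):
    """Fit score ~ intercept + slope * distance by weighted least squares.

    Weights are 1 / (distance + eps), so closer neighbours count more.
    Returns (intercept, slope); the intercept is the estimate at distance 0.
    """
    weights = [1.0 / (d + eps) for d in distances]
    w_sum = sum(weights)
    d_mean = sum(w * d for w, d in zip(weights, distances)) / w_sum
    s_mean = sum(w * s for w, s in zip(weights, scores)) / w_sum
    s_dd = sum(w * (d - d_mean) ** 2 for w, d in zip(weights, distances))
    if s_dd == 0.0:
        # All neighbours are equally far away: no slope can be estimated.
        return s_mean, 0.0
    s_ds = sum(
        w * (d - d_mean) * (s - s_mean)
        for w, d, s in zip(weights, distances, scores)
    )
    slope = s_ds / s_dd
    intercept = s_mean - slope * d_mean
    return intercept, slope


# Scores fall by 0.2 per unit of distance from the test point.
distances = [1.0, 2.0, 3.0, 4.0]
scores = [0.8, 0.6, 0.4, 0.2]
intercept, slope = extrapolate_to_zero(distances, scores)
```

On this perfectly linear toy data the extrapolated competence at distance 0 is about 1.0, while a plain average of the four scores would give 0.5. That gap is the reason for extrapolating instead of averaging.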
The quality of each trend line is evaluated using weighted R². If R² is below r2_threshold, the algorithm falls back to the DEWS-I estimate. This fallback happens per model per sample, so for the same test point some models may use the trend while others fall back.
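A minimal sketch of the R² gate for one model at one test point is shown below. Several details are assumptions: the function name `trend_or_fallback` is hypothetical, weighted R² is computed as the squared weighted correlation (which equals 1 - SS_res / SS_tot for a weighted straight-line fit with an intercept), and the DEWS-I fallback is modelled as an inverse-distance weighted mean of the neighbour scores.

```python
def trend_or_fallback(distances, scores, r2_threshold=0.7, eps=1e-8):
    """Return (estimate, used_trend) for one model at one test point."""
    weights = [1.0 / (d + eps) for d in distances]
    w_sum = sum(weights)
    d_mean = sum(w * d for w, d in zip(weights, distances)) / w_sum
    s_mean = sum(w * s for w, s in zip(weights, scores)) / w_sum
    s_dd = sum(w * (d - d_mean) ** 2 for w, d in zip(weights, distances))
    s_ss = sum(w * (s - s_mean) ** 2 for w, s in zip(weights, scores))
    s_ds = sum(
        w * (d - d_mean) * (s - s_mean)
        for w, d, s in zip(weights, distances, scores)
    )
    if s_dd == 0.0 or s_ss == 0.0:
        # No spread in distances or scores: there is no trend to trust.
        return s_mean, False
    r2 = s_ds ** 2 / (s_dd * s_ss)  # weighted R^2 of the line fit
    if r2 < r2_threshold:
        return s_mean, False        # assumed DEWS-I style weighted mean
    slope = s_ds / s_dd
    return s_mean - slope * d_mean, True


distances = [1.0, 2.0, 3.0, 4.0]
clean_est, clean_used = trend_or_fallback(distances, [0.8, 0.6, 0.4, 0.2])
noisy_est, noisy_used = trend_or_fallback(distances, [0.5, 0.9, 0.1, 0.5])
```

The clean series has a near-perfect fit, so the extrapolated value is used. The noisy series has a weighted R² well below 0.7, so the estimate is the weighted mean instead.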
Afterwards, it normalizes the resulting competence scores using min-max normalization and removes the models whose normalized score falls below the threshold. Finally, it converts the scores of the remaining models into weights using a softmax with temperature.
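The normalization, cutoff, and softmax steps can be sketched as below. This is an illustration only: the function name `blend_weights`, the handling of mode="min" by negating scores, and the use of `>=` at the cutoff are assumptions.

```python
import math


def blend_weights(scores, mode="max", threshold=0.5, temperature=1.0):
    """Turn per-model competence scores into blending weights."""
    if mode == "min":
        # Assumed convention: flip the sign so that higher is always better.
        scores = [-s for s in scores]
    lo, hi = min(scores), max(scores)
    if hi == lo:
        # All models look equally competent: blend them uniformly.
        return [1.0 / len(scores)] * len(scores)
    norm = [(s - lo) / (hi - lo) for s in scores]
    # The best model always normalizes to 1.0, so it survives any cutoff <= 1.
    kept = [i for i, s in enumerate(norm) if s >= threshold]
    top = max(norm[i] for i in kept)
    # Subtracting the maximum keeps the exponentials numerically stable.
    exps = {i: math.exp((norm[i] - top) / temperature) for i in kept}
    total = sum(exps.values())
    return [exps.get(i, 0.0) / total for i in range(len(scores))]


# Hypothetical MAE estimates for three models (lower is better).
weights = blend_weights([1.0, 1.5, 4.0], mode="min", threshold=0.5, temperature=0.1)
```

Here the third model falls below the cutoff and receives a weight of zero, while the first two share the total weight, with most of it going to the model with the lowest error.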
Parameters
| Parameter | Type | Default | Description |
|---|---|---|---|
| task | str | — | "classification" or "regression" |
| metric | str or callable | — | Scoring function per sample. Built-ins: accuracy, mae, mse, rmse, log_loss, prob_correct. Custom callables (y_true, y_pred) -> float are accepted |
| mode | str | — | "max" if higher is better, "min" if lower |
| k | int | 10 | Number of neighbours |
| threshold | float | 0.5 | Competence cutoff |
| temperature | float | 0.1 (regression), 1.0 (classification) | Defines how smooth the model blend is |
| r2_threshold | float | 0.7 | Minimum weighted R² for the trend line to be trusted |
| preset | str | "balanced" | ANN backend preset |
| finder | str | —, optional | Only if the preset is "custom"; Options: "knn", "faiss", "annoy", "hnsw" |
Example
# Regression
from deskit.des.dewst import DEWST
router = DEWST(task="regression", metric="mae", mode="min", k=20)
router.fit(X_val, y_val, val_preds)
weights = router.predict(x)
# Classification
from deskit.des.dewst import DEWST
router = DEWST(task="classification", metric="log_loss", mode="min", k=20)
router.fit(X_val, y_val, val_preds)
weights = router.predict(x)
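Once weights are available, they can be used to blend the individual models' outputs. The sketch below assumes the weights can be read as one non-negative number per model that sums to 1. The actual return type of predict is not specified here, so the literal list stands in for it, and the helper `blend` is hypothetical.

```python
def blend(weights, model_outputs):
    """Weighted average of per-model outputs for a single test point."""
    return sum(w * out for w, out in zip(weights, model_outputs))


weights = [0.6, 0.4, 0.0]           # stand-in for the router's weights
model_outputs = [10.0, 12.0, 30.0]  # hypothetical predictions of three models
blended = blend(weights, model_outputs)
```

The third model has zero weight, so its outlying prediction does not affect the blended value of about 10.8.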
Notes
A lower temperature is recommended for regression because regression metrics tend to produce scores on a continuous scale where differences can be large, so a low temperature sharpens the softmax to reflect that. In contrast, classification metrics tend to produce scores that are closer together, so a higher temperature keeps the blend soft.
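The effect of temperature can be seen on a toy pair of scores. This is a generic softmax sketch, not the library's code, and the scores are made up for illustration.

```python
import math


def softmax(scores, temperature):
    """Softmax with temperature; lower temperature gives a sharper result."""
    top = max(scores)  # subtract the maximum for numerical stability
    exps = [math.exp((s - top) / temperature) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


scores = [1.0, 0.8]  # two hypothetical normalized competence scores
sharp = softmax(scores, temperature=0.1)
soft = softmax(scores, temperature=1.0)
```

With a temperature of 0.1 the better model receives roughly 0.88 of the weight. With a temperature of 1.0 it receives roughly 0.55, so the blend stays close to an even split.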
The r2_threshold parameter controls how often the trend line is trusted over the DEWS-I fallback. A higher value makes the algorithm more conservative: it only extrapolates when the evidence is strong, and it converges toward DEWS-I behavior on noisy datasets. A value of 0.7 is recommended as it filters out weak trends while still exploiting genuine directional structure when present.