k-Nearest Neighbors (kNN)


The first similarity-based model we’ll look at is k-nearest neighbors (kNN), which follows a rather naive and straightforward approach, but nevertheless often achieves a good performance on complex problems.

Main idea

For a new sample, identify the k most similar training data points and predict the average of their target values / their most frequent class:


This kind of approach is also called Lazy Learning, since the model doesn’t actually learn any kind of internal parameters, but all the real computation only happens when we make a prediction for a new data point.
(When calling the fit-method on the sklearn model, a search tree is built to efficiently identify the nearest neighbors for a new data point.)

from sklearn.neighbors import KNeighborsRegressor, KNeighborsClassifier

Important Parameters:

  • n_neighbors: How many nearest neighbors to consider when making a prediction.

  • metric: How to compute the similarity between the samples (default: Euclidean distance).

  • weights: By setting this to 'distance' instead of the default 'uniform', the labels of the nearest neighbors contribute to the prediction proportionally to their distance to the new data point.

  • Intuitive approach.

  • Results completely depend on the similarity measure.