k-Nearest Neighbors (kNN)


The first similarity-based model we’ll look at is k-nearest neighbors (kNN). It follows a rather naive and straightforward approach, yet it often performs well on complex problems.


Main idea

For a new sample, identify the k most similar training data points and predict the average of their target values (regression) or their most frequent class (classification).


This kind of approach is also called lazy learning, since the model doesn’t actually learn any internal parameters; all the real computation happens only when we make a prediction for a new data point.
(When calling the fit method on the sklearn model, a search tree is built so that the nearest neighbors of a new data point can be identified efficiently.)
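
To make this concrete, here is a minimal brute-force sketch of the prediction step in NumPy (for illustration only; the function name knn_predict and the assumption of integer class labels are choices made for this example, and sklearn’s actual implementation relies on the search tree mentioned above):

import numpy as np

def knn_predict(X_train, y_train, x_new, k=5, classify=False):
    # distances from the new sample to all training samples (Euclidean)
    dists = np.linalg.norm(X_train - x_new, axis=1)
    # indices of the k most similar, i.e., closest, training points
    nearest = np.argsort(dists)[:k]
    if classify:
        # classification: most frequent class among the neighbors
        # (assumes the labels are nonnegative integers)
        return np.bincount(y_train[nearest]).argmax()
    # regression: average of the neighbors' target values
    return y_train[nearest].mean()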

from sklearn.neighbors import KNeighborsRegressor, KNeighborsClassifier

Important Parameters:

  • n_neighbors: How many nearest neighbors to consider when making a prediction.

  • metric: How to compute the similarity between the samples (default: Euclidean distance).

  • weights: By setting this to 'distance' instead of the default 'uniform', the labels of the nearest neighbors are weighted by the inverse of their distance to the new data point, i.e., closer neighbors have a stronger influence on the prediction (see the usage sketch after this list).
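
For illustration, a small usage sketch with these parameters (the iris dataset and the concrete parameter values are arbitrary choices for this example):

from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

# 5 nearest neighbors, Euclidean distance, closer neighbors count more
knn = KNeighborsClassifier(n_neighbors=5, metric="euclidean", weights="distance")
knn.fit(X_train, y_train)          # this is where the search tree is built
print(knn.score(X_test, y_test))   # accuracy on the test set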

Pros
  • Intuitive approach.

Careful
  • Results depend entirely on the chosen similarity measure (see the example below).
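
For example, with the default Euclidean distance, a feature with a much larger numerical range dominates the result, which is why the features usually need to be scaled to a comparable range first. A tiny sketch with made-up values:

import numpy as np
from sklearn.preprocessing import StandardScaler

# hypothetical data: feature 0 in meters, feature 1 in grams
X = np.array([[1.70, 65000.0],
              [1.60, 65200.0],
              [1.85, 90000.0]])

# raw distances are driven almost entirely by the gram-scale feature
print(np.linalg.norm(X[0] - X[1]))  # ~200.0

# after standardization, both features contribute on a comparable scale
X_scaled = StandardScaler().fit_transform(X)
print(np.linalg.norm(X_scaled[0] - X_scaled[1]))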