k-Nearest Neighbors (kNN)


The first similarity-based model we’ll look at is k-nearest neighbors (kNN). It follows a rather naive and straightforward approach, yet it often performs well on complex problems.


Main idea

For a new sample, identify the k most similar training data points and predict the average of their target values (regression) or their most frequent class (classification).


This kind of approach is also called lazy learning, since the model doesn’t actually learn any internal parameters; all the real computation happens only when we make a prediction for a new data point.
(When calling the fit method on the sklearn model, a search tree is built so that the nearest neighbors of a new data point can be identified efficiently.)
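
To make this concrete, here is a minimal brute-force sketch of the prediction step in NumPy (for illustration only; the function name knn_predict and the assumption of integer class labels are choices made for this example, and sklearn’s actual implementation relies on the search tree mentioned above):

import numpy as np

def knn_predict(X_train, y_train, x_new, k=5, classify=False):
    # distances from the new sample to all training samples (Euclidean)
    dists = np.linalg.norm(X_train - x_new, axis=1)
    # indices of the k most similar, i.e., closest, training points
    nearest = np.argsort(dists)[:k]
    if classify:
        # classification: most frequent class among the neighbors
        # (assumes the labels are nonnegative integers)
        return np.bincount(y_train[nearest]).argmax()
    # regression: average of the neighbors' target values
    return y_train[nearest].mean()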

from sklearn.neighbors import KNeighborsRegressor, KNeighborsClassifier

Important Parameters:

  • n_neighbors: How many nearest neighbors to consider when making a prediction.

  • metric: How to compute the similarity between the samples (default: Euclidean distance).

  • weights: By setting this to 'distance' instead of the default 'uniform', the labels of the nearest neighbors are weighted by the inverse of their distance to the new data point, i.e., closer neighbors have a stronger influence on the prediction (see the usage sketch after this list).
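
For illustration, a small usage sketch with these parameters (the iris dataset and the concrete parameter values are arbitrary choices for this example):

from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

# 5 nearest neighbors, Euclidean distance, closer neighbors count more
knn = KNeighborsClassifier(n_neighbors=5, metric="euclidean", weights="distance")
knn.fit(X_train, y_train)          # this is where the search tree is built
print(knn.score(X_test, y_test))   # accuracy on the test set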

Pros
  • Intuitive approach.

Careful
  • Results depend entirely on the chosen similarity measure (see the example below).
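
For example, with the default Euclidean distance, a feature with a much larger numerical range dominates the result, which is why the features usually need to be scaled to a comparable range first. A tiny sketch with made-up values:

import numpy as np
from sklearn.preprocessing import StandardScaler

# hypothetical data: feature 0 in meters, feature 1 in grams
X = np.array([[1.70, 65000.0],
              [1.60, 65200.0],
              [1.85, 90000.0]])

# raw distances are driven almost entirely by the gram-scale feature
print(np.linalg.norm(X[0] - X[1]))  # ~200.0

# after standardization, both features contribute on a comparable scale
X_scaled = StandardScaler().fit_transform(X)
print(np.linalg.norm(X_scaled[0] - X_scaled[1]))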