Detecting outlier datapoints is referred to as Anomaly Detection

How does Spotify make such great playlists on the fly just based on a single song? How do credit card companies detect fraud from hundreds of thousands of accounts without using labeled training data? Unsupervised learning! Unlike supervised learning, where we train our algorithm to label data based on previously labeled training sets, unsupervised learning can help us glean information from our data that would otherwise be hidden. I’ve put together a notebook that takes you through K-means clustering (with cluster count optimization) to identify how samples may fall into groups. You’ll also learn about Gaussian mixture models, and how they can help us with anomaly detection.
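To give a taste of both ideas, here is a minimal sketch, not the notebook's actual code. It assumes scikit-learn and NumPy are installed, uses made-up synthetic data (three 2-D blobs), picks the cluster count by silhouette score (one common choice among several), and flags anomalies with a Gaussian mixture model using a 1st-percentile log-likelihood cutoff, which is an arbitrary threshold chosen purely for illustration.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(0)

# Synthetic data (an assumption for illustration): three well-separated 2-D blobs.
centers = np.array([[0.0, 0.0], [10.0, 0.0], [0.0, 10.0]])
X = np.vstack([c + rng.normal(scale=0.5, size=(100, 2)) for c in centers])

# Cluster count optimization: fit K-means for several values of k
# and keep the k with the highest silhouette score.
scores = {}
for k in range(2, 7):
    labels = KMeans(n_clusters=k, n_init=10, random_state=0).fit_predict(X)
    scores[k] = silhouette_score(X, labels)
best_k = max(scores, key=scores.get)

# Anomaly detection: fit a Gaussian mixture model, then treat points whose
# log-likelihood falls below a cutoff as anomalies. The 1st percentile of the
# training scores is an arbitrary, illustrative threshold.
gmm = GaussianMixture(n_components=best_k, random_state=0).fit(X)
threshold = np.percentile(gmm.score_samples(X), 1)

# One point near a blob center, one point far from every blob.
new_points = np.array([[0.1, -0.2], [5.0, 5.0]])
is_anomaly = gmm.score_samples(new_points) < threshold

print("best k:", best_k)
print("anomaly flags:", is_anomaly.tolist())
```

On real data the silhouette curve is rarely this clean, so it is worth plotting the score for each k and sanity-checking the chosen threshold against known examples.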

You can run all the code in Colab. Enjoy!