Clustering overview
Clustering is an unsupervised machine learning technique you can use to group
similar records together. It is useful when you want to understand what groups,
or clusters, exist in your data but don't have labeled data to train a model
on. For example, if you had unlabeled data about
subway ticket purchases, you could cluster that data by ticket purchase time to
better understand what time periods have the heaviest subway usage. For more
information, see "What is clustering?"
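To make the subway example concrete, the following is a minimal sketch of centroid-based clustering (k-means) applied to ticket purchase times. It is an illustration only, not how any particular service implements clustering. The purchase times are made up, the starting centroids are hand-picked assumptions, and hours are treated as plain numbers, so purchases just before and just after midnight would not be grouped together. The names `assign` and `kmeans_1d` are hypothetical helpers.

```python
from statistics import mean


def assign(values, centroids):
    """Put each value into the cluster of its nearest centroid."""
    clusters = [[] for _ in centroids]
    for v in values:
        nearest = min(range(len(centroids)), key=lambda i: abs(v - centroids[i]))
        clusters[nearest].append(v)
    return clusters


def kmeans_1d(values, initial_centroids, max_iters=100):
    """Run k-means (Lloyd's iterations) on one-dimensional data."""
    centroids = list(initial_centroids)
    for _ in range(max_iters):
        clusters = assign(values, centroids)
        # Move each centroid to the mean of its cluster; keep it if the cluster is empty.
        updated = [mean(c) if c else centroids[i] for i, c in enumerate(clusters)]
        if updated == centroids:
            break  # No centroid moved, so the grouping is stable.
        centroids = updated
    return centroids, assign(values, centroids)


# Made-up purchase times, as hours of the day (8.25 means 08:15).
purchase_hours = [7.5, 8.0, 8.25, 8.5, 9.0, 12.0, 12.5, 13.0,
                  17.0, 17.5, 18.0, 18.0, 18.5, 19.0]

# Assumed starting points: early morning, midday, and evening.
centroids, clusters = kmeans_1d(purchase_hours, initial_centroids=[6.0, 12.0, 20.0])

# The cluster with the most purchases marks the period of heaviest usage.
busiest = max(range(len(clusters)), key=lambda i: len(clusters[i]))

for center, members in zip(centroids, clusters):
    print(f"around {center:.2f}h: {len(members)} purchases")
print(f"heaviest usage is centered near {centroids[busiest]:.2f}h")
```

No labels are supplied anywhere in this sketch: the groups emerge from the purchase times alone, which is what makes the technique unsupervised. In practice the number of clusters and the starting centroids are choices that affect the result.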
By using the default settings in the CREATE MODEL statements and the
inference functions, you can create and use a clustering model even
without much ML knowledge. However, having basic knowledge about
ML development, and clustering models in particular,
helps you optimize both your data and your model to
deliver better results. We recommend using the following resources to develop
familiarity with ML techniques and processes:
K-means models, a widely used clustering method, can be used with the `ML.PREDICT` function to cluster data or with the `ML.DETECT_ANOMALIES` function for anomaly detection. K-means models use centroid-based clustering, and you can get information about a model's centroids by using the `ML.CENTROIDS` function.
Last updated 2025-03-05 UTC.