Data clustering method and apparatus based on k-nearest neighbor and computer readable storage medium

    Publication (Announcement) No.: US11210348B2

    Publication (Announcement) Date: 2021-12-28

    Application No.: US16396682

    Application Date: 2019-04-27

    Abstract: The present disclosure provides a data clustering method based on K-nearest neighbors. The data points to be clustered are sorted in ascending order of the maximum radius of each point's K nearest neighbors, that is, from the densest point to the sparsest (a smaller radius indicates a denser neighborhood). A first pass over the sorted data points assigns points that satisfy a statistical-similarity criterion to the same cluster. A second pass over the data points in lower-density clusters, performed according to the scale required for the clustering, identifies all noise points and merges the non-noise points into the nearest high-density cluster, thereby completing the clustering. The method does not require the number of clusters to be preset or the probability distribution of the data to be known, and its parameters are easy to set.
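    The abstract describes the procedure only at a high level. The following Python/NumPy sketch illustrates the two-pass structure under stated assumptions, and is not the patented implementation. The maximum K-NN radius is taken as the distance to the k-th nearest neighbor. The unspecified statistical-similarity test is replaced by a hypothetical ratio test against the cluster's mean radius (`max_ratio`). The "scale" is modeled as a hypothetical minimum cluster size (`min_cluster_size`). A point is treated as noise if it lies farther than a hypothetical `noise_factor` times the nearest large cluster's mean radius. The function name and all of these parameters are illustrative only.

```python
import numpy as np


def knn_two_pass_cluster(X, k=4, max_ratio=2.0, min_cluster_size=5, noise_factor=2.0):
    """Illustrative two-pass K-nearest-neighbour clustering sketch.

    max_ratio, min_cluster_size and noise_factor are hypothetical stand-ins
    for criteria the patent abstract does not specify. Assumes len(X) > k.
    Returns one integer label per point; -1 marks noise.
    """
    X = np.asarray(X, dtype=float)
    n = len(X)
    # Brute-force pairwise Euclidean distances (O(n^2) memory).
    dist = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
    d = dist.copy()
    np.fill_diagonal(d, np.inf)  # a point is not its own neighbour
    neighbours = np.argsort(d, axis=1)[:, :k]  # K nearest, nearest first
    # Maximum radius of the K nearest neighbours: distance to the k-th one.
    radius = dist[np.arange(n), neighbours[:, -1]]

    # First pass: visit points from densest (smallest radius) to sparsest.
    labels = np.full(n, -1)
    counts, sums = [], []  # per-cluster size and sum of member radii
    for i in np.argsort(radius, kind="stable"):
        for j in neighbours[i]:
            c = labels[j]
            # Hypothetical similarity test: the point's radius is within
            # max_ratio times the mean radius of a labelled neighbour's cluster.
            if c >= 0 and radius[i] <= max_ratio * sums[c] / counts[c]:
                labels[i] = c
                break
        else:  # no similar labelled neighbour: start a new cluster
            labels[i] = len(counts)
            counts.append(0)
            sums.append(0.0)
        counts[labels[i]] += 1
        sums[labels[i]] += radius[i]

    # Second pass: revisit the points of small (low-density) clusters.
    counts = np.array(counts)
    mean_radius = np.array(sums) / counts
    in_large = counts[labels] >= min_cluster_size
    final = np.where(in_large, labels, -1)
    large_idx = np.flatnonzero(in_large)
    if large_idx.size:
        for i in np.flatnonzero(~in_large):
            j = large_idx[np.argmin(dist[i, large_idx])]  # nearest large-cluster point
            # Hypothetical noise test: merge only if close relative to that
            # cluster's mean K-NN radius; otherwise the point stays noise (-1).
            if dist[i, j] <= noise_factor * mean_radius[labels[j]]:
                final[i] = labels[j]

    # Renumber the surviving clusters as 0, 1, 2, ...
    keep = final >= 0
    final[keep] = np.unique(final[keep], return_inverse=True)[1]
    return final


if __name__ == "__main__":
    grid = [(x, y) for x in range(5) for y in range(5)]
    X = grid + [(x + 100, y + 100) for x, y in grid] + [(2, 6), (2, 30)]
    labels = knn_two_pass_cluster(X)
    # Each grid forms one cluster, (2, 6) is merged into the first grid's
    # cluster in the second pass, and the distant point (2, 30) is noise.
    print(labels.tolist())
```

    In the usage example, the point (2, 6) is too sparse to pass the first-pass similarity test and starts its own one-point cluster. The second pass then merges it into the nearby dense grid, while (2, 30) is flagged as noise. The sketch uses a brute-force distance matrix for clarity. For large data sets, a spatial index such as a KD-tree would normally replace it.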

    Data Clustering Method and Apparatus Based on K-Nearest Neighbor and Computer Readable Storage Medium

    Publication (Announcement) No.: US20190251121A1

    Publication (Announcement) Date: 2019-08-15

    Application No.: US16396682

    Application Date: 2019-04-27
