CS 506 – HW1 Clustering and Visualization
Package Limitations: Problem 1: NumPy is allowed; Problem 2: You may
use libraries such as scikit-learn that implement clustering methods; Problem 3:
Suggested packages: Folium, Pandas; Problem 4: Suggested packages: K-means
in the CS506 Python package or the K-means implemented in Problem 1.
In this exercise, you will implement the K-means clustering algorithm. You will
start with an example 2D dataset that will help you build intuition for how the
K-means algorithm works. You will be using k_means_clustering.py for this part
of the exercise.
The K-means algorithm is a method for automatically clustering similar data
examples together. Concretely, you are given a training set and want to group
the data into a few cohesive clusters. K-means is an iterative procedure that
starts by guessing the initial centroids, and then refines this guess by
repeatedly assigning examples to their closest centroids and then recomputing
the centroids based on the assignments.
The inner loop of the algorithm repeatedly carries out two steps:
Assigning each training example x to its closest centroid
Recomputing the mean of each centroid using the points assigned to it.
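The two steps above can be sketched as a short loop. This is a minimal
illustration using NumPy only, not the assignment's reference solution: the
function name k_means_sketch, its parameters, the max_iters cap, and the choice
to leave an empty cluster's centroid unchanged are all assumptions made for
the example.

```python
import numpy as np

def k_means_sketch(samples, initial_centroids, max_iters=10):
    """Illustrative K-means loop (hypothetical name and signature)."""
    samples = np.asarray(samples, dtype=float)
    centroids = np.asarray(initial_centroids, dtype=float).copy()
    clusters = np.zeros(len(samples), dtype=int)
    for _ in range(max_iters):
        # Step 1: assign each example to its closest centroid.
        # dists[i, j] is the squared Euclidean distance from sample i to centroid j.
        dists = ((samples[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
        clusters = dists.argmin(axis=1)
        # Step 2: recompute each centroid as the mean of the points assigned to it.
        new_centroids = centroids.copy()
        for j in range(len(centroids)):
            members = samples[clusters == j]
            if len(members) > 0:  # assumption: an empty cluster keeps its old centroid
                new_centroids[j] = members.mean(axis=0)
        # Stop once the centroids no longer move.
        if np.allclose(new_centroids, centroids):
            break
        centroids = new_centroids
    return centroids, clusters

# Usage on a small made-up dataset with two obvious groups.
samples = np.array([[0.0, 0.0], [0.0, 1.0], [10.0, 10.0], [10.0, 11.0]])
centroids, clusters = k_means_sketch(samples, [[0.0, 0.0], [10.0, 10.0]])
print(clusters)
print(centroids)
```

If the loop hits max_iters before converging, the returned assignments belong
to the centroids from just before the final update.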
The K-means algorithm will always converge to some final set of means for
the centroids. Note that the converged solution may not always be ideal and
depends on the initial setting of the centroids. Therefore, in practice the
K-means algorithm is usually run a few times with different random
initializations. One way to choose between the solutions produced by different
random initializations is to pick the one with the lowest cost function value
(distortion).
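The restart strategy can be sketched as follows. The handout does not define
the distortion formula, so this example assumes one common definition: the
mean squared Euclidean distance between each example and its assigned centroid.
The names distortion, assign, run_k_means, and best_of_restarts are
hypothetical, as is the choice to initialize centroids by sampling k distinct
training examples.

```python
import numpy as np

def assign(samples, centroids):
    """Index of the nearest centroid for every sample (squared Euclidean distance)."""
    dists = ((samples[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
    return dists.argmin(axis=1)

def distortion(samples, centroids, clusters):
    """Assumed cost: mean squared distance from each sample to its assigned centroid."""
    diffs = samples - centroids[clusters]
    return float((diffs ** 2).sum(axis=1).mean())

def run_k_means(samples, centroids, max_iters=100):
    """Compact K-means loop used only to drive the restart example."""
    for _ in range(max_iters):
        clusters = assign(samples, centroids)
        updated = np.array([
            samples[clusters == j].mean(axis=0) if np.any(clusters == j) else centroids[j]
            for j in range(len(centroids))
        ])
        if np.allclose(updated, centroids):
            break
        centroids = updated
    # Re-assign so the returned clusters match the returned centroids.
    return centroids, assign(samples, centroids)

def best_of_restarts(samples, k, n_restarts=5, seed=0):
    """Run K-means from several random initializations; keep the lowest-cost result."""
    rng = np.random.default_rng(seed)
    best = None
    for _ in range(n_restarts):
        # Assumption: initialize with k distinct training examples chosen at random.
        init = samples[rng.choice(len(samples), size=k, replace=False)].astype(float)
        centroids, clusters = run_k_means(samples, init)
        cost = distortion(samples, centroids, clusters)
        if best is None or cost < best[0]:
            best = (cost, centroids, clusters)
    return best

samples = np.array([[0.0, 0.0], [0.0, 1.0], [10.0, 10.0], [10.0, 11.0]])
cost, centroids, clusters = best_of_restarts(samples, k=2)
print(cost, clusters)
```

Fixing the seed makes the restarts reproducible, which helps when comparing
runs while debugging.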
You will implement the two phases of the K-means algorithm separately in the
sections that follow.
In the cluster assignment phase of the K-means algorithm, the algorithm assigns
every training example $x_i$ to its closest centroid, given the current positions
of the centroids. Specifically, for every example $x_i$ we set
$$c_i := \arg\min_j \lVert x_i - \mu_j \rVert^2$$
where $c_i$ is the index of the centroid that is closest to $x_i$, and $\mu_j$ is
the position of the j-th centroid.
Your task is to complete the code in the function find_closest_centroids. This
function takes the data matrix samples and the locations of all centroids in
centroids, and should output a one-dimensional array clusters that holds the
index (a value in {0, …, K-1}, where K is the total number of centroids) of the
closest centroid to every training example. You can implement this using a loop
over every training example and every centroid. Once you have completed the code
in find_closest_centroids, you can run it and you should see the output
[0, 2, 1], corresponding to the centroid assignments for the first 3 examples.
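One possible shape for the double loop is sketched below. It is an
illustration under assumptions rather than the expected solution: it uses
0-based indices (matching the 0 that appears in the expected output), squared
Euclidean distance, and a small made-up dataset that is not the assignment's
data.

```python
import numpy as np

def find_closest_centroids(samples, centroids):
    """Return, for each sample, the 0-based index of its nearest centroid."""
    samples = np.asarray(samples, dtype=float)
    centroids = np.asarray(centroids, dtype=float)
    clusters = np.zeros(len(samples), dtype=int)
    for i, x in enumerate(samples):
        best_dist = np.inf
        for j, mu in enumerate(centroids):
            # Squared Euclidean distance between example i and centroid j.
            dist = np.sum((x - mu) ** 2)
            if dist < best_dist:
                best_dist = dist
                clusters[i] = j
    return clusters

# Invented data for illustration only (three points, three centroids).
samples = [[1.0, 1.0], [8.0, 8.0], [5.0, 1.0]]
centroids = [[0.0, 0.0], [5.0, 0.0], [9.9, 9.0]]
print(find_closest_centroids(samples, centroids))  # prints [0 2 1]
```

Because the comparison is a strict less-than, a tie between two centroids goes
to the one with the lower index. Once the loop version works, the same result
can be obtained without explicit loops by broadcasting the distance
computation and calling argmin along the centroid axis.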
Please take a look at Figure 1 to gain an understanding of the distribution
of the data. It is two-dimensional, with features x1 and x2.