# CS 506 – HW1: Clustering and Visualization

This homework covers clustering and visualization in Python.

Package limitations:

- Problem 1: NumPy is allowed.

- Problem 2: You may use libraries such as scikit-learn that implement clustering methods.

- Problem 3: Suggested packages: Folium, Pandas.

- Problem 4: Suggested packages: the K-means in the CS506 Python package, or the K-means implemented in Problem 1.

In this exercise, you will implement the K-means clustering algorithm. You will

start on an example 2D dataset that will help you gain an intuition of how the

K-means algorithm works. You will be using `k_means_clustering.py` for this part of the exercise.

The K-means algorithm is a method to automatically cluster similar data ex-

amples together. Concretely, you are given a training set and want to group

the data into a few cohesive clusters. The intuition behind K-means is an iterative

procedure that starts by guessing the initial centroids, and then refines this guess

by repeatedly assigning examples to their closest centroids and then recomputing

the centroids based on the assignments.

The inner loop of the algorithm repeatedly carries out two steps:

1. Assigning each training example x to its closest centroid.

2. Recomputing the mean of each centroid using the points assigned to it.
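The two-step loop above can be sketched in NumPy as follows. This is only an illustrative sketch, not the interface the assignment requires; the function and variable names (`run_kmeans`, `samples`, `initial_centroids`) are chosen for this example.

```python
import numpy as np

def run_kmeans(samples, initial_centroids, max_iters=10):
    """Alternate the two K-means steps for a fixed number of iterations (sketch)."""
    centroids = initial_centroids.astype(float)
    for _ in range(max_iters):
        # Step 1: assign each example to its nearest centroid
        # (pairwise distances via broadcasting: shape (n_samples, K)).
        dists = np.linalg.norm(samples[:, None, :] - centroids[None, :, :], axis=2)
        clusters = np.argmin(dists, axis=1)
        # Step 2: move each centroid to the mean of the points assigned to it.
        for j in range(centroids.shape[0]):
            if np.any(clusters == j):
                centroids[j] = samples[clusters == j].mean(axis=0)
    return centroids, clusters
```

A production version would also stop early once the assignments no longer change; the fixed iteration count keeps the sketch short.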

The K-means algorithm will always converge to some final set of means for

the centroids. Note that the converged solution may not always be ideal and

depends on the initial setting of the centroids. Therefore, in practice the K-means algorithm is usually run a few times with different random initializations.

One way to choose between these different solutions from different random initializations is to choose the one with the lowest cost function value (distortion).
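The restart strategy above can be sketched as follows: run K-means several times from random initial centroids and keep the run with the lowest distortion (here, the mean squared distance from each example to its assigned centroid). All names (`kmeans_once`, `distortion`, `best_of_n`) are illustrative, not part of the assignment's interface.

```python
import numpy as np

def kmeans_once(samples, centroids, max_iters=10):
    """One K-means run from the given initial centroids (sketch)."""
    for _ in range(max_iters):
        dists = np.linalg.norm(samples[:, None, :] - centroids[None, :, :], axis=2)
        clusters = np.argmin(dists, axis=1)
        for j in range(len(centroids)):
            if np.any(clusters == j):
                centroids[j] = samples[clusters == j].mean(axis=0)
    return centroids, clusters

def distortion(samples, centroids, clusters):
    """Mean squared distance from each example to its assigned centroid."""
    return np.mean(np.sum((samples - centroids[clusters]) ** 2, axis=1))

def best_of_n(samples, K, n_runs=5, seed=0):
    """Repeat K-means from random initializations; keep the lowest-distortion run."""
    rng = np.random.default_rng(seed)
    best_cost, best_result = np.inf, None
    for _ in range(n_runs):
        # A common initialization: K distinct training examples chosen at random.
        idx = rng.choice(len(samples), size=K, replace=False)
        centroids, clusters = kmeans_once(samples, samples[idx].astype(float))
        cost = distortion(samples, centroids, clusters)
        if cost < best_cost:
            best_cost, best_result = cost, (centroids, clusters)
    return best_cost, best_result
```

Initializing centroids from randomly chosen training examples is one standard choice; other schemes (e.g. k-means++) exist but are not required here.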

You will implement the two phases of the K-means algorithm separately in the

next sections.

In the cluster assignment phase of the K-means algorithm, the algorithm assigns every training example x_i to its closest centroid, given the current positions of the centroids. Specifically, for every example x_i we set

c_i := argmin_j ‖x_i − μ_j‖²

where c_i is the index of the centroid that is closest to x_i, and μ_j is the position of the j-th centroid.

Your task is to complete the code in the function `find_closest_centroids`. This function takes the data matrix `samples` and the locations of all centroids inside `centroids`, and should output a one-dimensional array `clusters` that holds the index (a value in {0, …, K−1}, where K is the total number of centroids) of the closest centroid to every training example. You can implement this using a loop over every training example and every centroid. Once you have completed the code in `find_closest_centroids`, you can run it, and you should see the output [0, 2, 1] corresponding to the centroid assignments for the first 3 examples.
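A loop-based version of this function, following the approach the text suggests, might look like the sketch below. The argument names `samples` and `centroids` come from the assignment; the internals are one possible implementation, not the required one.

```python
import numpy as np

def find_closest_centroids(samples, centroids):
    """Return, for each example, the index (0..K-1) of its nearest centroid."""
    clusters = np.zeros(samples.shape[0], dtype=int)
    for i, x in enumerate(samples):            # loop over every training example
        best_j, best_dist = 0, np.inf
        for j, mu in enumerate(centroids):     # loop over every centroid
            d = np.sum((x - mu) ** 2)          # squared Euclidean distance
            if d < best_dist:
                best_j, best_dist = j, d
        clusters[i] = best_j
    return clusters
```

Since argmin over squared distances equals argmin over distances, the square root can be skipped. A vectorized version using broadcasting and `np.argmin` would also be acceptable wherever NumPy is allowed.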

Please take a look at Figure 1 to gain an understanding of the distribution of the data. It is two-dimensional, with features x1 and x2.