CISC/CMPE 452/COGS 400 Assignment 3 – Unsupervised Learning (10 points)

This is an assignment from Canada about building models with NumPy.

Files to be uploaded for this assignment: A3.ipynb, output.wav, and output.csv

Part 1 Principal Component Analysis Network (5 points)

The dataset "data/sound.csv" contains two sounds recorded by two microphones. The goal of this part is to use a PCA network to find an approximation of the first principal component.

Build a PCA network (refer to Principal Component Analysis slides #22 and #23) to reduce the number of features from 2 to 1 (3 points)

Train the model and generate the processed data (1 point)

Save the data into output.wav and output.csv files (1 point)

Compare sound_o.wav (the noisy audio) with output.wav (the denoised audio)
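A PCA network in this setting is typically a single linear neuron trained with a normalized Hebbian update (Oja's rule); assuming that is the variant on the slides, each training step computes the output and nudges the weights:

y = w^T x
w <- w + lr * y * (x - y * w)

With a small enough learning rate lr, w converges toward the first principal component of the (zero-mean) data, so y is the 1-D projection used as the denoised signal.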

In [ ]: import numpy as np
import pandas as pd
from scipy.io import wavfile

samrate = 8000

# read csv into array
txtData = np.genfromtxt('data/sound.csv', delimiter=',')
txtData.shape

# save array to WAV file
scaledData = np.int16(txtData * samrate)
wavfile.write('data/sound_o.wav', samrate, scaledData)

# read WAV file into array
# The data in sound.csv is already processed.
# If you use the data read back here, you need to rescale it first: wavData = wavData / samrate
samrate, wavData = wavfile.read('data/sound_o.wav')
samrate, wavData.shape

# save array to csv file
np.savetxt('data/sound_o.csv', txtData, delimiter=',')

# build PCA model and only NumPy can be used
class PCA(object):
    def __init__(self, lr, epoch):
        # lr - learning rate; epoch - number of training epochs
        pass

    def train(self, x, n_components):
        # initialize the weights and train the network here
        pass

# initialize and train the model

# save the data
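For orientation, here is a minimal sketch of one way the template above could be completed with Oja's subspace rule. The mean-centering, random initialization, `transform` helper, and the output-saving lines are assumptions for illustration, not the official solution:

import numpy as np
from scipy.io import wavfile

class PCA(object):
    def __init__(self, lr, epoch):
        self.lr = lr        # learning rate for the Hebbian updates
        self.epoch = epoch  # number of passes over the data
        self.w = None       # weights, shape (n_components, n_features)

    def train(self, x, n_components):
        # center the data so the network estimates principal components
        self.mean = x.mean(axis=0)
        xc = x - self.mean
        rng = np.random.default_rng(0)
        self.w = rng.normal(scale=0.1, size=(n_components, x.shape[1]))
        for _ in range(self.epoch):
            for sample in xc:
                y = self.w @ sample  # network output
                # Oja's subspace rule; for n_components=1 this is plain Oja's rule
                self.w += self.lr * np.outer(y, sample - y @ self.w)
        return self

    def transform(self, x):
        # project onto the learned component(s)
        return (x - self.mean) @ self.w.T

# initialize and train the model (hypothetical hyperparameters)
pca = PCA(lr=0.01, epoch=10)
pca.train(txtData, n_components=1)
reduced = pca.transform(txtData)  # shape (n_samples, 1)

# save the data, reusing the samrate scaling from the starter code above
wavfile.write('output.wav', samrate, np.int16(reduced[:, 0] * samrate))
np.savetxt('output.csv', reduced, delimiter=',')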

Part 2 K-Means Clustering Algorithm (5 points)

The dataset is the Palmer Archipelago (Antarctica) penguin data (https://www.kaggle.com/datasets/parulpandey/palmer-archipelago-antarctica-penguin-data), which has 6 features and 1 label called species (Chinstrap, Adélie, or Gentoo).

The dataset is saved in the "data/penguins_size.csv" file and preprocessed into x_train, x_test, y_train, and y_test.

Build a K-Means clustering algorithm (refer to Unsupervised Learning slide #29) to cluster the preprocessed data (2 points)

Standardize the data and train the model with the training set (1 point)

Evaluate the model and print the confusion matrices for both the training and test sets (2 points)

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from sklearn.model_selection import train_test_split

# load the dataset
data = pd.read_csv('data/penguins_size.csv')
data.head()

# data preprocessing
data = data.dropna()
data = data[data['sex'] != '.']
cleanup_nums = {"species": {"Adelie": 0, "Chinstrap": 1, "Gentoo": 2},
                "island": {"Biscoe": 0, "Dream": 1, "Torgersen": 2},
                "sex": {"MALE": 0.0, "FEMALE": 1.0}}
data = data.replace(cleanup_nums)
data.head()

x = np.array(data.drop(['species'], axis=1).copy())
y = np.array(data['species'].copy()).astype(int)

# data standardization

x =
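The standardization blank above can be filled with a simple per-feature z-score (a sketch; the assignment only says to standardize the data):

x = (x - x.mean(axis=0)) / x.std(axis=0)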

# split the dataset
x_train, x_test, y_train, y_test = train_test_split(x, y, test_size=0.2)
x_train.shape, x_test.shape, y_train.shape, y_test.shape

# calculate the confusion matrix
def evaluator(y, y_pred):
    confusion_matrix =
    print('Confusion matrix:\n', confusion_matrix)
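One pure-NumPy way to fill in the blank above (a sketch, assuming integer labels 0..K-1):

def evaluator(y, y_pred):
    # count how often each true label is assigned each predicted label
    n = int(max(y.max(), y_pred.max())) + 1
    confusion_matrix = np.zeros((n, n), dtype=int)
    for true_label, pred_label in zip(y, y_pred):
        confusion_matrix[true_label, pred_label] += 1
    print('Confusion matrix:\n', confusion_matrix)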

# setup a baseline model
from sklearn.cluster import KMeans
km = KMeans(n_clusters=3)  # n_clusters - the number of clusters
km.fit(x_train)

y_pred = km.predict(x_train)
evaluator(y_train, y_pred)

y_pred = km.predict(x_test)
evaluator(y_test, y_pred)
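One caveat when reading these confusion matrices: K-Means cluster indices are arbitrary, so cluster 0 need not correspond to species 0, and the matrix can look permuted even for a good clustering. A common fix (an illustrative helper, not part of the assignment text) is to remap each cluster to the majority true label it received on the training set:

def map_clusters(y_true, clusters, n_clusters=3):
    # majority-vote mapping from cluster index to class label
    mapping = {}
    for c in range(n_clusters):
        members = y_true[clusters == c]
        mapping[c] = np.bincount(members).argmax() if members.size else c
    return np.array([mapping[c] for c in clusters])

# e.g. evaluator(y_train, map_clusters(y_train, km.predict(x_train)))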

# build K-means model and only NumPy can be used
# (note: this class shadows the sklearn KMeans imported above)
class KMeans(object):
    def __init__(self):
        pass

    def train(self, x, y, x_test, y_test, learning_rate, n_iters):
        # fit the centroids on the training data
        pass

    def predict(self, x):
        # assign each sample to its nearest centroid
        pass

# initialize and train the model

# evaluate the model and print the confusion matrices for both the training and test sets
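For reference, here is one minimal NumPy sketch that matches the template's method names. The competitive (online) centroid update is an assumption suggested by the learning_rate and n_iters parameters in the template signature; y, x_test, and y_test are accepted but unused here, and the hyperparameter values are hypothetical:

class KMeans(object):
    def __init__(self, n_clusters=3):
        self.n_clusters = n_clusters
        self.centroids = None

    def train(self, x, y, x_test, y_test, learning_rate, n_iters):
        # y and the test arrays match the template signature but are not
        # needed for the unsupervised updates themselves
        rng = np.random.default_rng(0)
        idx = rng.choice(len(x), self.n_clusters, replace=False)
        self.centroids = x[idx].astype(float).copy()
        for _ in range(n_iters):
            for sample in x:
                # move the winning (nearest) centroid toward the sample
                winner = np.argmin(np.linalg.norm(self.centroids - sample, axis=1))
                self.centroids[winner] += learning_rate * (sample - self.centroids[winner])
        return self

    def predict(self, x):
        # label each sample with the index of its nearest centroid
        dists = np.linalg.norm(x[:, None, :] - self.centroids[None, :, :], axis=2)
        return np.argmin(dists, axis=1)

# initialize and train the model (hypothetical hyperparameters)
km = KMeans(n_clusters=3)
km.train(x_train, y_train, x_test, y_test, learning_rate=0.05, n_iters=20)

# evaluate the model and print the confusion matrices for both sets
evaluator(y_train, km.predict(x_train))
evaluator(y_test, km.predict(x_test))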