CMPSC 448: Machine Learning and AI HW1

Instruction

Homework 1 (Due 02/14/2021 11:59 PM)

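Problem 1 [20 points] In this problem, you are given two matrices $A, B \in \mathbb{R}^{2 \times 2}$ and a vector $x \in \mathbb{R}^2$:

$$A = \begin{bmatrix} 1 & 2 \\ 2 & 4 \end{bmatrix}, \quad B = \begin{bmatrix} 1 & 2 \\ 3 & 4 \end{bmatrix}, \quad x = \begin{bmatrix} 2 \\ 1 \end{bmatrix}$$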
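Hand computations with these operands can be checked numerically. The following is a minimal NumPy sketch, assuming A has rows (1, 2) and (2, 4), B has rows (1, 2) and (3, 4), and x = (2, 1). The two products shown are chosen only for illustration and are not taken from the assignment.

```python
import numpy as np

# Operands of Problem 1, entered row by row.
A = np.array([[1.0, 2.0],
              [2.0, 4.0]])
B = np.array([[1.0, 2.0],
              [3.0, 4.0]])
x = np.array([2.0, 1.0])

# Illustrative operations (an assumption, not the assignment's question list).
AB = A @ B   # matrix-matrix product
Ax = A @ x   # matrix-vector product

print("A @ B =\n", AB)
print("A @ x =", Ax)
```

The `@` operator performs matrix multiplication; `*` would multiply elementwise instead, which is a common source of wrong answers when checking work by hand.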

Problem 2 [10 points] For this problem, we use the following notation for random variables:

• $X \sim N(\mu, \sigma^2)$: X is a Gaussian random variable with mean $\mu$ and variance $\sigma^2$.
• $X \sim \mathrm{Bern}(p)$: X is a {0, 1}-valued Bernoulli random variable with expectation p.
• $E[X]$: the expected value of the random variable X.

(a) If $X \sim N(1, 2)$, then what is $E[X]$? What is $(E[X])^2 - E[X^2]$?
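A pencil-and-paper answer to part (a) can be sanity-checked by simulation. The sketch below is a Monte Carlo estimate, not a derivation. The seed and sample size are arbitrary choices. Note that NumPy's `normal` takes the standard deviation as `scale`, so the variance 2 from the notation above has to be converted.

```python
import numpy as np

rng = np.random.default_rng(0)   # arbitrary seed, for reproducibility only
mu, variance = 1.0, 2.0          # X ~ N(1, 2): the second parameter is the variance

# `scale` is the standard deviation, hence the square root.
samples = rng.normal(loc=mu, scale=np.sqrt(variance), size=1_000_000)

mean_est = samples.mean()                    # estimate of E[X]
second_moment_est = (samples ** 2).mean()    # estimate of E[X^2]
diff_est = mean_est ** 2 - second_moment_est # estimate of (E[X])^2 - E[X^2]

print(f"E[X] ~ {mean_est:.3f}")
print(f"(E[X])^2 - E[X^2] ~ {diff_est:.3f}")
```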

Problem 3 [5 points] What is the rank of the following matrix and why?

1 2 1 1 0 3 112

Problem 4 [5 points] Use either numpy.linalg or scipy.linalg to find the eigendecomposition of the following matrix:

$$X = \begin{bmatrix} 3 & 1 & 1 \\ 2 & 4 & 2 \\ -1 & -1 & 1 \end{bmatrix}$$

Problem 5 [5 points] For the function $f(x) = \ln\left(1 + e^{-2x}\right)$, what is its derivative $f'(x) = \frac{df(x)}{dx}$?
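For Problem 4, one possible starting point is `numpy.linalg.eig`, which returns the eigenvalues and a matrix whose columns are the corresponding eigenvectors. The sketch below assumes X has rows (3, 1, 1), (2, 4, 2) and (-1, -1, 1). It verifies the defining relation X V = V Λ rather than printing a specific answer.

```python
import numpy as np

# Matrix of Problem 4, entered row by row.
X = np.array([[3.0, 1.0, 1.0],
              [2.0, 4.0, 2.0],
              [-1.0, -1.0, 1.0]])

# eigenvalues[j] pairs with the column eigenvectors[:, j].
eigenvalues, eigenvectors = np.linalg.eig(X)

# Check X v_j = lambda_j v_j for every column at once:
# multiplying by `eigenvalues` scales column j by eigenvalues[j].
residual = X @ eigenvectors - eigenvectors * eigenvalues
max_residual = np.max(np.abs(residual))

print("eigenvalues:", eigenvalues)
print("max |X V - V diag(w)|:", max_residual)
```

`eig` is meant for general square matrices. For a symmetric matrix, `numpy.linalg.eigh` would be the more appropriate call, but X here is not symmetric.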
Problem 6 [10 points] Let $x \in \mathbb{R}^d$ be a vector in d-dimensional space and define the real-valued function $f : \mathbb{R}^d \to \mathbb{R}$ by

$$f(x) = \frac{1}{2} x^\top A x + b^\top x,$$

where $A \in \mathbb{R}^{d \times d}$ is a symmetric matrix and $b \in \mathbb{R}^d$ is a fixed vector. Using the definition of the gradient, show that

$$\nabla f(x) = Ax + b.$$

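The identity in Problem 6 can be checked numerically before attempting the proof. The sketch below compares a central finite-difference gradient with Ax + b. The dimension, seed, step size and the helper name `numerical_gradient` are all illustrative assumptions, and a numerical check of this kind is no substitute for the derivation the problem asks for.

```python
import numpy as np

def f(x, A, b):
    """Quadratic form f(x) = 0.5 * x^T A x + b^T x."""
    return 0.5 * x @ A @ x + b @ x

def numerical_gradient(func, x, h=1e-6):
    """Central finite-difference approximation of the gradient of func at x."""
    grad = np.zeros_like(x)
    for i in range(x.size):
        step = np.zeros_like(x)
        step[i] = h
        grad[i] = (func(x + step) - func(x - step)) / (2 * h)
    return grad

rng = np.random.default_rng(0)   # arbitrary seed
d = 4                            # arbitrary dimension
M = rng.standard_normal((d, d))
A = (M + M.T) / 2                # symmetrize, as the problem requires
b = rng.standard_normal(d)
x = rng.standard_normal(d)

numeric = numerical_gradient(lambda v: f(v, A, b), x)
analytic = A @ x + b
max_abs_error = np.max(np.abs(numeric - analytic))

print("max |numeric - (Ax + b)|:", max_abs_error)
```

Replacing `A` with the unsymmetrized `M` makes the two gradients disagree, which shows where the symmetry assumption enters the proof.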
Problem 7 [5 points]

(a) What is the maximizer of $g : [-4, 4] \to \mathbb{R}$ given by $g(x) = \frac{1}{2}x^3 - \frac{1}{2}x^2 - 6x + \frac{27}{2}$?

(b) What is $\int_0^1 g(x)\,dx$ for g defined above?
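Both parts of Problem 7 can be cross-checked with `numpy.polynomial.Polynomial`. The sketch assumes g(x) = x³/2 − x²/2 − 6x + 27/2 on [−4, 4] and integration limits 0 and 1, as read from the problem statement. If the coefficients in your copy differ, only the coefficient list needs to change. The maximizer is found by comparing g at the interval endpoints and at the real critical points inside the interval.

```python
import numpy as np
from numpy.polynomial import Polynomial

# Coefficients in increasing degree: 27/2 - 6x - x^2/2 + x^3/2 (assumed reading of g).
g = Polynomial([27 / 2, -6.0, -1 / 2, 1 / 2])
lo, hi = -4.0, 4.0

# Candidates for the maximum on a closed interval:
# the endpoints plus real roots of g' that lie inside the interval.
critical = g.deriv().roots()
candidates = [lo, hi] + [
    float(r.real) for r in critical
    if abs(r.imag) < 1e-12 and lo <= r.real <= hi
]
maximizer = max(candidates, key=g)

# Definite integral over [0, 1] via the antiderivative.
G = g.integ()
integral = G(1.0) - G(0.0)

print("maximizer:", maximizer, "g(maximizer):", g(maximizer))
print("integral over [0, 1]:", integral)
```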

Exploratory Data Analysis with pandas

Problem 8 [40 points] The goal of this problem is to do basic data analysis on a simple data set using the pandas package in Python (no machine learning for now). As emphasized in the lectures, we need a good understanding of the data before training a machine learning model. In this assignment, you are asked to analyze the UCI Adult data set, a standard machine learning data set that contains demographic information about US residents. The data was extracted from the census bureau database found at: http://www.census.gov/ftp/pub/DES/www/welcome.html. The data set contains 32561 instances and 15 features of different types, categorical and continuous (please check the notebook for the possible values of each feature).

The data is provided as a CSV file and can be loaded into a pandas DataFrame object as shown:

data = pd.read_csv('adult.data.csv')
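A first look at a freshly loaded DataFrame usually covers its shape, which columns are numeric and which are categorical, and how many values are missing. The sketch below wraps those checks in a helper. The function name `summarize` is hypothetical and is not part of the assignment notebook. No column names are assumed, and the file is read only if it is present in the working directory.

```python
from pathlib import Path

import pandas as pd

def summarize(df: pd.DataFrame) -> dict:
    """Collect a few first-look facts about a DataFrame (hypothetical helper)."""
    numeric = df.select_dtypes(include="number").columns.tolist()
    # Everything that is not numeric is treated as categorical here.
    categorical = [c for c in df.columns if c not in numeric]
    return {
        "shape": df.shape,
        "numeric": numeric,
        "categorical": categorical,
        "missing": df.isna().sum().to_dict(),
    }

csv_path = Path("adult.data.csv")  # file name from the assignment text
if csv_path.exists():
    data = pd.read_csv(csv_path)
    info = summarize(data)
    print("rows, columns:", info["shape"])
    print("numeric columns:", info["numeric"])
    print("categorical columns:", info["categorical"])
    print(data.head())
```

Missing values in this data set may be encoded with a placeholder string rather than as empty cells, in which case `isna()` will not count them. Inspecting `value_counts()` on the categorical columns is one way to find out.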