December 3, 2023

Machine Learning Assignment Writing | Naive Bayes Classifier

In this assignment you will implement the Naive Bayes Classifier. Before starting, make sure you understand the Naive Bayes concepts discussed in the Week 2 videos. You may also find it useful to read Chapter 1 of the textbook.

Also, make sure that you are familiar with the numpy.ndarray class of Python's NumPy library and that you can answer the following questions:

Let’s assume a is a numpy array.

You can answer all of these questions by

The UC Irvine Machine Learning Repository hosts a famous dataset, the Pima Indians dataset, on whether a patient has diabetes. It was originally owned by the National Institute of Diabetes and Digestive and Kidney Diseases and donated by Vincent Sigillito. You can find it at https://www.kaggle.com/uciml/pima-indians-diabetes-database/data. The data consists of a set of patient attributes and a categorical variable indicating whether the patient is diabetic. For several attributes in this dataset, a value of 0 may indicate a missing value. There are 768 data points in total.

Report the accuracy of the classifier on the 20% evaluation data, where accuracy is the number of correct predictions as a fraction of total predictions.
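The accuracy defined above is simply the mean of an element-wise equality test. A minimal NumPy sketch follows; the function name `accuracy` is hypothetical and not prescribed by the assignment.

```python
import numpy as np

def accuracy(predictions, labels):
    """Fraction of predictions that equal the true labels (hypothetical helper)."""
    predictions = np.asarray(predictions).reshape(-1)  # flatten (N,) or (N, 1)
    labels = np.asarray(labels).reshape(-1)
    return float(np.mean(predictions == labels))       # mean of a boolean array

print(accuracy([1, 0, 1, 1, 0], [1, 0, 0, 1, 0]))  # 0.8
```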

Report the accuracy of the classifier on the held-out 20% of the data.

The UC Irvine Machine Learning Repository hosts on Kaggle a famous collection of data on whether a patient has diabetes (the Pima Indians dataset), originally owned by the National Institute of Diabetes and Digestive and Kidney Diseases and donated by Vincent Sigillito.

You can find this data at https://www.kaggle.com/uciml/pima-indians-diabetes-database/data. The Kaggle dashboard offers useful visualizations of each attribute in the original data, and it is worth taking the time to explore the data there before applying any method to it.

First, we shuffle the data completely, discarding the row order of the original CSV file.
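A minimal sketch of the shuffle followed by the 80/20 train/evaluation split used for reporting accuracy. It assumes the data is already a 2-D NumPy array whose last column is the 0/1 label (as in the Kaggle CSV); the function name `shuffle_and_split` and the seed are illustrative choices, not part of the assignment.

```python
import numpy as np

def shuffle_and_split(data, eval_fraction=0.2, seed=12345):
    """Shuffle the rows, then split them into training and evaluation parts.

    Assumes the last column of `data` holds the 0/1 label (hypothetical layout).
    """
    rng = np.random.default_rng(seed)
    shuffled = data[rng.permutation(data.shape[0])]   # forget the original order
    n_eval = int(round(eval_fraction * shuffled.shape[0]))
    train, evaluation = shuffled[n_eval:], shuffled[:n_eval]
    return (train[:, :-1], train[:, -1].astype(int),
            evaluation[:, :-1], evaluation[:, -1].astype(int))

# Toy stand-in for the CSV: 768 rows, 8 feature columns plus a label column.
toy = np.hstack([np.arange(768 * 8, dtype=float).reshape(768, 8),
                 (np.arange(768) % 2).reshape(-1, 1)])
train_features, train_labels, eval_features, eval_labels = shuffle_and_split(toy)
print(train_features.shape, eval_features.shape)  # (614, 8) (154, 8)
```

With the real file, something like `np.loadtxt("diabetes.csv", delimiter=",", skiprows=1)` should produce such an array, assuming the download is saved under that name and has a single header row.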

Some columns contain missing values, which the Naive Bayes Classifier we build later will treat in a special way. Specifically, for attribute 3 (diastolic blood pressure), attribute 4 (triceps skin fold thickness), attribute 6 (body mass index), and attribute 8 (age), a value of 0 should be regarded as missing.

Therefore, we create the train_featues_with_nans and eval_features_with_nans NumPy arrays, which are identical to their train_features and eval_features counterparts except that the zero values in these columns are replaced with NaNs.
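A possible way to build these arrays: attributes 3, 4, 6 and 8 in the 1-based numbering above become column indices 2, 3, 5 and 7 in a 0-based NumPy array. The helper name `with_nans` is hypothetical; a call such as `train_featues_with_nans = with_nans(train_features)` would then produce the array named in the text.

```python
import numpy as np

# 0-based indices of attributes 3, 4, 6 and 8: diastolic blood pressure,
# triceps skin fold thickness, body mass index and age.
MISSING_IF_ZERO = [2, 3, 5, 7]

def with_nans(features, cols=MISSING_IF_ZERO):
    """Return a float copy of `features` with zeros in `cols` replaced by NaN."""
    out = np.array(features, dtype=float)  # copy, so the input is left untouched
    block = out[:, cols]                    # fancy indexing returns a copy
    block[block == 0] = np.nan
    out[:, cols] = block                    # write the modified columns back
    return out

example = np.array([[6, 148, 72, 35, 0, 33.6, 0.627, 50],
                    [1, 85, 0, 0, 0, 26.6, 0.351, 31]])
cleaned = with_nans(example)
# In row 1, columns 2 and 3 become NaN; column 4 (insulin) is not in the list and keeps its 0.
```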

Consider a single sample $(x, y)$, where the feature vector is denoted by $x$ and the label by $y$. We denote the $j^{\text{th}}$ feature of $x$ by $x^{(j)}$.

According to the textbook, the Naive Bayes Classifier uses the following decision rule:

“Choose $y$ such that

$$\left[\sum_{j} \log p(x^{(j)} | y)\right] + \log p(y)$$

is the largest.”
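To make the decision rule concrete, here is a small NumPy sketch. It assumes the log-prior is a $(2, 1)$ column and that the per-feature log-likelihoods $\log p(x^{(j)} | y)$ have already been computed into an array of shape `(2, N, d)`. Skipping NaN entries with `np.nansum` is one way of ignoring missing features, and is an assumption here; the function name `nb_predict` and the numbers are hypothetical.

```python
import numpy as np

def nb_predict(log_prior, log_likelihoods):
    """For each sample, pick the y maximising sum_j log p(x^(j)|y) + log p(y).

    log_prior:       shape (2, 1), entries log p(y=0) and log p(y=1)
    log_likelihoods: shape (2, N, d), entry [y, i, j] = log p(x_i^(j) | y);
                     NaN marks a missing feature, which nansum skips.
    """
    scores = np.nansum(log_likelihoods, axis=2) + log_prior  # shape (2, N)
    return np.argmax(scores, axis=0)                         # shape (N,)

lp = np.log(np.array([[0.6], [0.4]]))
ll = np.log(np.array([[[0.2, 0.5], [0.9, 0.8]],      # class 0, samples 0 and 1
                      [[0.7, 0.6], [0.1, np.nan]]]))  # class 1, one missing feature
print(nb_predict(lp, ll))  # [1 0]
```

For sample 0 the class-1 score is $\log(0.4 \cdot 0.7 \cdot 0.6) = \log 0.168$, which beats $\log(0.6 \cdot 0.2 \cdot 0.5) = \log 0.06$, so class 1 wins.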

However, we first need to define, using the training data, probabilistic models for the prior $p(y)$ and for the class-conditional feature distributions $p(x^{(j)} | y)$.
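The text does not fix the form of $p(x^{(j)} | y)$. One common choice, assumed here purely for illustration, is a univariate Gaussian per feature and class, fitted with NaN-aware statistics so that missing values do not bias the estimates. The helper names `fit_gaussians` and `log_gaussian` are hypothetical.

```python
import numpy as np

def fit_gaussians(train_features, train_labels):
    """Per-class, per-feature mean and standard deviation, ignoring NaNs.

    Returns two arrays of shape (2, d). Assumes labels are coded 0/1.
    """
    X = np.asarray(train_features, dtype=float)
    y = np.asarray(train_labels).reshape(-1)
    mu = np.stack([np.nanmean(X[y == c], axis=0) for c in (0, 1)])
    sigma = np.stack([np.nanstd(X[y == c], axis=0) for c in (0, 1)])
    return mu, sigma

def log_gaussian(X, mu, sigma):
    """log N(x | mu, sigma^2) for every class, sample and feature: shape (2, N, d).

    NaN inputs propagate to NaN outputs, marking missing features.
    """
    X = np.asarray(X, dtype=float)[None, :, :]      # (1, N, d)
    mu, sigma = mu[:, None, :], sigma[:, None, :]  # (2, 1, d)
    return -0.5 * ((X - mu) / sigma) ** 2 - np.log(sigma) - 0.5 * np.log(2 * np.pi)
```

A feature that is constant within one class gives `sigma == 0`, which this sketch does not guard against.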

Write a function log_prior that takes a NumPy array train_labels as input and outputs the following vector as a column NumPy array (i.e., with shape $(2, 1)$):

$$\begin{bmatrix} \log p(y=0) \\ \log p(y=1) \end{bmatrix}$$

Try to avoid loops as much as possible; none are necessary.

Hint: Make sure all array shapes are what you need and expect. You can reshape any NumPy array without noticeable computational overhead.
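A loop-free sketch of `log_prior`, assuming the labels are coded 0 (not diabetic) and 1 (diabetic). Reshaping the input first lets it accept either an `(N,)` or an `(N, 1)` array, and reshaping the output gives the required $(2, 1)$ column.

```python
import numpy as np

def log_prior(train_labels):
    """Return [[log p(y=0)], [log p(y=1)]] as a (2, 1) array, without loops."""
    y = np.asarray(train_labels).reshape(-1)  # accept (N,) or (N, 1) input
    p_y1 = np.mean(y == 1)                    # fraction of positive labels
    priors = np.array([1.0 - p_y1, p_y1])
    return np.log(priors).reshape(2, 1)

print(log_prior(np.array([0, 1, 1, 1])))
```

For the labels `[0, 1, 1, 1]` this returns $\log 0.25$ and $\log 0.75$ in a single column.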
