December 3, 2023

Python Data Mining Assignment Help | CSE 158/258 Fall 2021: Homework 1

This is a Python homework assignment on data mining and predictive analytics.

Please submit your solution by the beginning of the week 3 lecture (Oct 11). Submissions should be made on Gradescope. Please complete the homework individually.

This specification includes both questions from the undergraduate (CSE158) and graduate (CSE258) classes.
You are welcome to attempt questions from both classes but will only be graded on those for the class in which you are enrolled.

You will need the following files:

GoodReads Fantasy Reviews: https://cseweb.ucsd.edu/classes/fa21/cse258-b/data/fantasy_10000.json.gz

Beer Reviews: https://cseweb.ucsd.edu/classes/fa21/cse258-b/data/beer_50000.json

The data above are in JSON format. Data can be read using the json.loads function in Python, or by using eval.
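A minimal sketch of reading such a file, assuming each line holds one JSON object. The helper name `read_json_lines`, the demo file, and the field names `rating` and `review_text` are illustrative assumptions, not part of the handout:

```python
import gzip
import json
import os
import tempfile


def read_json_lines(path):
    """Read one JSON object per line; .gz files are decompressed on the fly."""
    opener = gzip.open if path.endswith(".gz") else open
    records = []
    with opener(path, "rt", encoding="utf-8") as f:
        for line in f:
            line = line.strip()
            if line:  # skip blank lines
                records.append(json.loads(line))
    return records


# Demo on a tiny made-up file, so the sketch runs without downloading anything.
demo_path = os.path.join(tempfile.mkdtemp(), "demo.json.gz")
with gzip.open(demo_path, "wt", encoding="utf-8") as f:
    f.write('{"rating": 5, "review_text": "Great book"}\n')
    f.write('{"rating": 3, "review_text": "It was fine"}\n')

records = read_json_lines(demo_path)
print(len(records), records[0]["rating"])
```

If a line turns out to be a Python literal rather than strict JSON (for example, single-quoted keys), `json.loads` will raise an error. In that case `ast.literal_eval` is a safer alternative to `eval`.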

Code examples: http://cseweb.ucsd.edu/classes/fa21/cse258-b/code/week1.py (regression) and http://cseweb.ucsd.edu/classes/fa21/cse258-b/code/week2.py (classification)

Executing the code requires a working install of Python 2.7 or Python 3 with the scipy packages installed.

Please include the code of (the important parts of) your solutions.

First, using the book review data, let’s see whether ratings can be predicted as a function of review length, or by using temporal features associated with a review.

1. (CSE158 only) What is the distribution of ratings and review lengths in the dataset? Report the number of 1-, 2-, 3-star (etc.) ratings, and show the relationship with length (e.g. via a scatterplot) (1 mark).
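As a rough illustration of the counting step, the sketch below tallies ratings and measures review lengths on a few made-up records. The field names `rating` and `review_text` are assumptions about the dataset layout:

```python
from collections import Counter

# Made-up records standing in for the parsed reviews.
reviews = [
    {"rating": 5, "review_text": "Loved every page."},
    {"rating": 5, "review_text": "Great."},
    {"rating": 3, "review_text": "Fine, but slow in the middle."},
    {"rating": 1, "review_text": "Not for me."},
]

ratings = [r["rating"] for r in reviews]
lengths = [len(r["review_text"]) for r in reviews]  # length in characters

# Number of reviews at each star level.
rating_counts = Counter(ratings)
for stars in sorted(rating_counts):
    print(stars, "star:", rating_counts[stars])
```

The paired `lengths` and `ratings` lists can then be handed to a plotting library, for instance matplotlib's `plt.scatter(lengths, ratings)`.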

2. Train a simple predictor that estimates rating from review length, i.e.,
$$\text{star rating} \simeq \theta_0 + \theta_1 \times [\text{review length in characters}]$$
Report the values $\theta_0$ and $\theta_1$, and the Mean Squared Error of your predictor (on the entire dataset) (1 mark).
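A predictor of this form (an offset plus a slope on review length) could be fitted with ordinary least squares. The sketch below uses NumPy's `lstsq` on made-up numbers. It is one possible approach under those assumptions, not necessarily the one used in the course code:

```python
import numpy as np

# Made-up (length, rating) pairs standing in for the real reviews.
lengths = np.array([120.0, 450.0, 80.0, 1500.0, 300.0, 900.0])
ratings = np.array([4.0, 3.0, 5.0, 4.0, 2.0, 5.0])

# Design matrix: a column of ones for the offset, then the length for the slope.
X = np.column_stack([np.ones_like(lengths), lengths])

# Least-squares solution of X @ theta ~= ratings.
theta, _, _, _ = np.linalg.lstsq(X, ratings, rcond=None)

predictions = X @ theta
mse = np.mean((ratings - predictions) ** 2)
print("offset =", theta[0], "slope =", theta[1], "MSE =", mse)
```

Because the constant predictor is a special case of this model, the fitted MSE on the same data can never exceed the variance of the ratings.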

3. Extend your model to include (in addition to the length) features based on the time of the review. You can parse the time data as follows:

import dateutil.parser
t = dateutil.parser.parse(d['date_added'])
t.weekday(), t.year # etc.
Using a one-hot encoding for the weekday and year, write down feature vectors for the first two examples (1 mark).
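One way such a feature vector could be laid out is sketched below. The helper names `one_hot` and `feature_vector`, the list of years, and the example review are all hypothetical. Dropping the first category of each encoding (so it maps to all zeros) is a design choice made here to avoid redundancy with the constant term. The handout does not mandate it.

```python
import datetime


def one_hot(index, size):
    """One-hot vector of length size - 1; index 0 maps to all zeros."""
    vec = [0] * (size - 1)
    if index > 0:
        vec[index - 1] = 1
    return vec


def feature_vector(length, t, years):
    """Build [1, length] + weekday one-hot + year one-hot.

    `t` is anything with .weekday() and .year, such as the value
    returned by dateutil.parser.parse; `years` is the sorted list of
    years seen in the dataset.
    """
    weekday_part = one_hot(t.weekday(), 7)
    year_part = one_hot(years.index(t.year), len(years))
    return [1, length] + weekday_part + year_part


# Made-up example: a 200-character review added on Sunday 2017-07-30.
years = [2015, 2016, 2017]
t = datetime.datetime(2017, 7, 30)
x = feature_vector(200, t, years)
print(x)
```

With seven weekdays and three years this gives 1 + 1 + 6 + 2 = 10 entries per review.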

4. Train models that

• use the weekday and year values directly as features, i.e.,
$$\text{star rating} \simeq \theta_0 + \theta_1 \times [\text{review length in characters}] + \theta_2 \times [\text{t.weekday()}] + \theta_3 \times [\text{t.year}]$$

• use the one-hot encoding from Question 3.

Report the MSE of each (1 mark).
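The two feature choices can be compared side by side. The sketch below does so on randomly generated stand-in data, with hypothetical helpers `fit_mse` and `one_hot_columns`. The numbers it prints say nothing about the real dataset.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200

# Made-up data standing in for the real reviews.
length = rng.integers(10, 2000, size=n).astype(float)
weekday = rng.integers(0, 7, size=n)       # 0 = Monday ... 6 = Sunday
year = rng.integers(2012, 2018, size=n)    # 2012 ... 2017
rating = rng.integers(1, 6, size=n).astype(float)


def fit_mse(X, y):
    """Fit least squares and return the MSE on the same data."""
    theta, _, _, _ = np.linalg.lstsq(X, y, rcond=None)
    return np.mean((y - X @ theta) ** 2)


def one_hot_columns(values):
    """One indicator column per distinct value, dropping the first."""
    categories = np.unique(values)
    return (values[:, None] == categories[None, 1:]).astype(float)


ones = np.ones(n)

# Model A: weekday and year used directly as numbers.
X_direct = np.column_stack([ones, length, weekday, year])

# Model B: weekday and year one-hot encoded.
X_onehot = np.column_stack(
    [ones, length, one_hot_columns(weekday), one_hot_columns(year)]
)

mse_direct = fit_mse(X_direct, rating)
mse_onehot = fit_mse(X_onehot, rating)
print("direct:", mse_direct, "one-hot:", mse_onehot)
```

When both models are evaluated on the data they were fitted to, the one-hot model cannot do worse. Any linear function of the weekday or year can also be written as a weighted sum of the indicator columns.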

5. Repeat the above question, but this time split the data into a training and test set. You should split the data randomly into 50%/50% train/test fractions. Report the MSE of each model separately on the training and test sets.
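A random 50%/50% split can be made by shuffling row indices once and cutting them in half. The sketch below shows this on made-up data with a fixed seed. The seed and variable names are illustrative choices.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 100

# Made-up features and labels standing in for the real dataset.
X = np.column_stack([np.ones(n), rng.integers(10, 2000, size=n)])
y = rng.integers(1, 6, size=n).astype(float)

# Shuffle the row indices once, then cut them in half.
order = rng.permutation(n)
train_idx, test_idx = order[: n // 2], order[n // 2 :]

# Fit on the training half only.
theta, _, _, _ = np.linalg.lstsq(X[train_idx], y[train_idx], rcond=None)


def mse(X, y, theta):
    """Mean Squared Error of the linear predictor X @ theta."""
    return np.mean((y - X @ theta) ** 2)


train_mse = mse(X[train_idx], y[train_idx], theta)
test_mse = mse(X[test_idx], y[test_idx], theta)
print("train MSE:", train_mse, "test MSE:", test_mse)
```

Fitting on the training half only keeps the test MSE an honest estimate of performance on unseen reviews.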

6. (CSE258 only) Show that for a trivial predictor, i.e., $y = \theta_0$, the best possible value of $\theta_0$ in terms of the Mean Absolute Error is the median of the label $y$. Hint: compute the derivative of the model's MAE and solve for $\theta_0$.
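The claim can be sanity-checked numerically before attempting the derivation. The sketch below is not a proof. It evaluates the MAE of a constant predictor over a grid of candidate values on made-up labels and compares the minimiser with the median.

```python
import statistics

# Made-up labels; an odd count keeps the median unique.
y = [1, 2, 2, 3, 5, 5, 4]


def mae(theta0):
    """Mean Absolute Error of the constant predictor that always outputs theta0."""
    return sum(abs(label - theta0) for label in y) / len(y)


# Evaluate the MAE on a grid of candidate values from 0.00 to 6.00.
candidates = [i / 100 for i in range(601)]
best = min(candidates, key=mae)

print("grid minimiser:", best, "median:", statistics.median(y))
```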
