Code Writing Service | FIT2086 Assignment 3

This is Assignment 3 from an Australian course; it requires solving a set of problems with code.

Question 1 (8 marks)

This question will require you to analyse a regression dataset. In particular, you will be looking at predicting the fuel efficiency of a car (in kilometers per litre) based on characteristics of the car and its engine. This is clearly an important and useful problem. The dataset fuel.ass3.2022.csv contains n = 500 observations on p = 9 predictors obtained from actual fuel efficiency tables for car models available for sale during the years 2017 through to 2020. The target is the fuel efficiency of the car measured in kilometers per litre. The higher this score, the better the fuel efficiency of the car. The data dictionary for this dataset is given in Table 1. Provide working/R code/justifications for each of these questions as required.

[2 marks]

(a) Use your BIC model to predict the mean fuel efficiency for this new car. Provide a 95% confidence interval for this prediction. [1 mark]

(b) The current car that you own has a mean fuel efficiency of 11 km/l (measured over the lifetime of your ownership). Does your model suggest that the new car will have better fuel efficiency than your current car? [1 mark]
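One way to read part (b) is to check where 11 km/l falls relative to the confidence interval from part (a). The sketch below illustrates only that comparison. The point prediction and standard error are made-up placeholder values, not results from the dataset, and the interval uses a normal approximation (an assumption made here for simplicity; R's predict() on a linear model reports a t-based interval, which is very close for n = 500). It is written in Python for brevity; the same arithmetic carries over to R.

```python
from statistics import NormalDist


def confidence_interval(pred, se, level=0.95):
    """Normal-approximation confidence interval for a predicted mean."""
    z = NormalDist().inv_cdf(0.5 + level / 2)  # about 1.96 for level = 0.95
    return pred - z * se, pred + z * se


def compare_to_reference(interval, reference):
    """Say how the new car compares with a reference efficiency (km/l)."""
    lower, upper = interval
    if reference < lower:
        return "better"        # whole interval lies above the reference
    if reference > upper:
        return "worse"         # whole interval lies below the reference
    return "inconclusive"      # reference lies inside the interval


# Hypothetical numbers for illustration only (not taken from the data).
pred, se = 12.4, 0.35
ci = confidence_interval(pred, se)
verdict = compare_to_reference(ci, 11.0)
print(f"95% CI: ({ci[0]:.3f}, {ci[1]:.3f}) -> {verdict}")
```

If 11 km/l sits inside the interval, the model does not give clear evidence either way, which is why the function has a third outcome rather than a simple yes/no.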

| Variable name | Description | Values |
|---|---|---|
| Model.Year | Year of sale | 2017 - 2020 |
| Eng.Displacement | Engine Displacement (litres, l) | 0.9 - 8.4 |
| No.Cylinders | Number of Cylinders | 3 - 16 |
| Aspiration | Engine Aspiration (Oxygen intake) | N: Naturally; OT: Other; SC: Supercharged; TC: Turbocharged; TS: Turbo+supercharged |
|  | Number of Gears | 1 - 10 |
| Lockup.Torque.Converter | Lockup torque converter present? | N and Y |
| Drive.Sys | Drive System | 4: 4-wheel drive; A: All-wheel; F: Front-wheel; P: Part-time 4-wheel; R: Rear-wheel |
| Max.Ethanol | Maximum % of Ethanol allowed | 10 - 85 |
| Fuel.Type | Type of Fuel | G: Regular Unleaded; GM: Mid-grade Unleaded Recommended; GP: Premium Unleaded Recommended; GPR: Premium Unleaded Required |
|  | Fuel Efficiency (km/l) | 4.974 - 26.224 |

Table 1: Fuel efficiency data dictionary. The denotes the reference category for each categorical variable.

Question 2 (18 marks)

In this question we will analyse the data in heart.train.ass3.2022.csv. In this dataset, each observation represents a patient at a hospital that reported showing signs of possible heart disease.

The outcome is presence of heart disease (HD), or not, so this is a classification problem. The predictors are summarised in Table 2. We are interested in learning a model that can predict heart disease from these measurements. To answer this question you must:

When answering this question, you must use the rpart package that we used in Studio 9. The wrapper function for learning a tree using cross-validation that we used in Studio 9 is contained in the file wrappers.R. Don’t forget to source this file to get access to the function.

However, if you examine the tree structure in its textual representation on the console, you can determine the probabilities of having heart disease (see Question 2.3 from Studio 9 as a guide) in each leaf (terminal node). Take a screen-capture of the plot of the tree (don’t forget to use the “zoom” button to get a larger image) or save it as an image using the “Export” button in RStudio.

Then, use the information from the textual representation of the tree available at the console and annotate the tree in your favourite image editing software; next to all the leaves in the tree, add text giving the probability of contracting heart disease. Include this annotated image in your report file. [1 mark]
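As a rough guide to reading the console output: for a classification tree, each line printed by rpart lists the node's number of observations (n), the number misclassified (loss), the predicted class (yval) and the class probabilities, with terminal nodes marked by an asterisk. The sketch below shows how the leaf probability follows from n, loss and yval for a two-class problem. The class labels "Y"/"N" and the leaf counts are hypothetical placeholders, not values from the heart disease data; Python is used only to show the arithmetic.

```python
def prob_heart_disease(n, loss, yval, positive="Y"):
    """Probability of the positive class in a leaf of a two-class tree.

    n    : number of training observations in the leaf
    loss : number of those observations not in the predicted class
    yval : the class predicted by the leaf
    """
    p_predicted = 1 - loss / n  # proportion belonging to the predicted class
    return p_predicted if yval == positive else 1 - p_predicted


# Hypothetical leaves for illustration only.
leaf_a = prob_heart_disease(n=40, loss=6, yval="Y")  # leaf predicting disease
leaf_b = prob_heart_disease(n=30, loss=3, yval="N")  # leaf predicting no disease
print(round(leaf_a, 3), round(leaf_b, 3))
```

The same numbers also appear directly in the parenthesised class probabilities on each printed line, so this calculation is mainly a cross-check when annotating the plot.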

Contrast and compare the two models in terms of the various prediction statistics. Does one seem better than the other? Justify your answer. [2 marks]

(a) the tree model found using cross-validation; and

(b) the step-wise logistic regression model.

How do the predicted odds for the two models compare? [2 marks]
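To put the two models on the same scale, recall that odds are p / (1 - p) for a probability p, and that a logistic regression's linear predictor is the log-odds, so exponentiating it gives the odds directly. The sketch below illustrates both conversions. The probability and linear predictor used are invented for illustration and are not outputs of either fitted model; Python is used only to show the arithmetic.

```python
import math


def odds_from_probability(p):
    """Convert a probability (e.g. from a tree leaf) into odds."""
    return p / (1 - p)


def odds_from_log_odds(eta):
    """Convert a logistic regression linear predictor (log-odds) into odds."""
    return math.exp(eta)


# Hypothetical values for illustration only.
tree_odds = odds_from_probability(0.80)  # leaf probability of heart disease
logit_odds = odds_from_log_odds(1.2)     # linear predictor for the same patient
print(f"tree odds: {tree_odds:.2f}, logistic odds: {logit_odds:.2f}")
```

Odds above 1 mean heart disease is predicted to be more likely than not; comparing the two values shows how strongly each model leans for the same patient.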

Using these intervals, do you think there is any evidence to suggest that there is a real difference in the population odds of having heart disease between these two individuals? [2 marks]