SIT718 Real World Analytics Assessment Task 3: Problem Solving
ASSESSMENT DETAILS
Submission details
• No more than 7 A4 sides, including Figures, Tables, Appendices and References. The report should be typed. Use a minimum font size of 11pt and 2.5cm side margins. If the page limit is exceeded, only the first 7 pages will be marked.
• The assignment (a report in pdf format, software code and/or data) must be submitted via the assignment folder in the unit site (accessed via the unit Program page).
• No e-mail or hardcopy submissions are accepted.
Extension requests
Using aggregation functions for data analysis
Download SIT718_Assessment-Task_3-T2_2019-dataandscript.zip. It contains:
• the data file [Energy19.txt], and
• the R code [AggWaFit718.R] to use with the following tasks.
Include both files in your R working directory.
Total Marks 100, Weighting 30%
Energy Prediction of Domestic Appliances Dataset
The given dataset, “Energy19.txt”, can be used to create models of the energy use of appliances in an energy-efficient house. The dataset records the Energy use of appliances (denoted as Y) in 671 samples. It is a modified version of the data used in the study cited in footnote i. The dataset includes six variables, the five inputs X1, X2, X3, X4, X5 and the output Y, described as follows:
X1: Temperature in kitchen area, in Celsius
X2: Humidity in kitchen area, given as a percentage
X3: Temperature outside (from weather station), in Celsius
X4: Humidity outside (from weather station), given as a percentage
X5: Visibility (from weather station), in km
Y: Energy use of appliances, in Wh
Assignment Tasks
1. Understand the data [20 marks]
(i) Take the txt file (Energy19.txt) that you downloaded from FutureLearn and add it to your R working directory.
(ii) Assign the data to a matrix, e.g. using:
the.data <- as.matrix(read.table("Energy19.txt"))
© Copyright Deakin University

(iii) The variable of interest is Energy use of appliances (Y). To investigate Y, generate a random subset of 300 rows, e.g. using:
my.data <- the.data[sample(1:671,300),c(1:6)]
(iv) Using scatter plots and histograms, report on the general relationship between each of the variables X1, X2, X3, X4, X5 and the variable of interest Y. Include 5 scatter plots, 6 histograms, and 1 or 2 sentences for each of the variables, including the variable of interest Y.
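A minimal R sketch of the plots for step (iv), assuming the sampled matrix has six columns in the order X1, X2, X3, X4, X5, Y. The function name plot_energy_overview, the output file name and the 2 x 3 page layout are illustrative choices, not part of the task; the seed in the usage comment is likewise an addition so the random subset can be reproduced.

```r
# Draw 5 scatter plots (each Xi against Y) and 6 histograms (X1..X5 and Y)
# for a numeric matrix whose columns are X1, X2, X3, X4, X5, Y.
plot_energy_overview <- function(my.data, file = "energy-overview.pdf") {
  stopifnot(ncol(my.data) == 6)
  var.names <- c("X1", "X2", "X3", "X4", "X5", "Y")
  colnames(my.data) <- var.names
  pdf(file, width = 11, height = 8)   # all panels go into one PDF file
  on.exit(dev.off())                  # close the device even if plotting fails
  par(mfrow = c(2, 3))                # 2 x 3 grid of panels per page
  # Page 1: scatter plot of each input variable against Y
  for (i in 1:5) {
    plot(my.data[, i], my.data[, "Y"], xlab = var.names[i], ylab = "Y (Wh)",
         main = paste("Y vs", var.names[i]))
  }
  plot.new()                          # leave the sixth panel on page 1 empty
  # Page 2: histogram of every variable, including Y
  for (i in 1:6) {
    hist(my.data[, i], xlab = var.names[i],
         main = paste("Histogram of", var.names[i]))
  }
  invisible(file)
}

# Usage, assuming Energy19.txt is in the working directory:
# the.data <- as.matrix(read.table("Energy19.txt"))
# set.seed(718)   # hypothetical seed, only for reproducibility
# my.data <- the.data[sample(1:671, 300), c(1:6)]
# plot_energy_overview(my.data)
```

Writing to a PDF keeps the figures reproducible from the submitted code; the panels can then be placed in the report.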
2. Transform the data [10 marks]
(i) Choose any four of the five variables (X1, X2, … , X5). Apply appropriate transformations to the four chosen variables and to the variable of interest Y so that their values can be aggregated to predict the variable of interest. Assign the transformed variables, together with the transformed variable of interest, to an array of 300 rows and 5 columns. Save it to a txt file titled “name-transformed.txt” using:
where “name” is replaced with your name – you can use your surname or first name.
(ii) Briefly explain the transformations applied to the four selected variables and the variable of interest. (1-2 sentences each)
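One common choice, shown here only as an assumption, is min-max scaling, which maps each column linearly onto [0, 1] so that the inputs and Y share a common scale. The choice of columns (dropping X5), the function names and the placeholder name "smith" in the file name are hypothetical; skewed variables may need a log or other transform before scaling, which this sketch does not do.

```r
# Linearly rescale a numeric vector onto [0, 1].
minmax <- function(x) {
  rng <- range(x)
  if (rng[1] == rng[2]) stop("a constant column cannot be min-max scaled")
  (x - rng[1]) / (rng[2] - rng[1])
}

# Build the 300 x 5 array: four chosen inputs plus Y, each min-max scaled.
# 'keep' indexes the chosen inputs; c(1, 2, 3, 4) (dropping X5) is only an example.
transform_energy <- function(my.data, keep = c(1, 2, 3, 4)) {
  stopifnot(ncol(my.data) == 6, length(keep) == 4)
  out <- apply(my.data[, c(keep, 6)], 2, minmax)   # scale each column separately
  colnames(out) <- c(paste0("X", keep), "Y")
  out
}

# Usage, assuming my.data from Task 1 is in the workspace:
# data.transformed <- transform_energy(my.data)
# write.table(data.transformed, "smith-transformed.txt")
```

Keep the column minima and maxima: the same ranges are needed to transform new inputs in Task 4 and to map predictions back to Wh.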
3. Build models and investigate the importance of each variable [30 marks]
(i) Add the AggWaFit718.R file (downloaded from FutureLearn) to your working directory and load it into the R workspace using:
source("AggWaFit718.R")
(ii) Use the fitting functions to learn the parameters for
• a weighted arithmetic mean (WAM),
• weighted power means (WPM) with p = 0.5 and p = 2,
• an ordered weighted averaging function (OWA), and
• a Choquet integral.
(iii) Include two tables in your report: one with the error measures and correlation coefficients, and one summarising the weights/parameters and any other useful information learned for your data.
(iv) Compare and interpret the data in your tables. Comment on
a. how good the model is,
b. the importance of each of the variables (the four variables that you have selected),
c. any interaction between those variables (are they complementary or redundant?), and
d. whether better models favour higher or lower inputs. (1-3 paragraphs for part 3(iv))
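The fitting functions in AggWaFit718.R learn the parameters; to interpret them it can help to see how each model aggregates a single input vector. The R functions below are written independently of AggWaFit718.R (their names and argument order are hypothetical, not that script's API) and assume inputs in [0, 1] with non-negative weights summing to 1. The Choquet integral takes its fuzzy measure as a vector indexed by binary subset codes, one common convention that may differ from the layout AggWaFit718.R reports.

```r
# Weighted arithmetic mean: sum_i w_i * x_i
wam <- function(x, w) sum(w * x)

# Weighted power mean with exponent p (p != 0): (sum_i w_i * x_i^p)^(1/p)
wpm <- function(x, w, p) sum(w * x^p)^(1 / p)

# OWA: the weights are applied to the inputs sorted in decreasing order,
# so w[1] multiplies the largest input.
owa <- function(x, w) sum(w * sort(x, decreasing = TRUE))

# Choquet integral with respect to a fuzzy measure v.
# v has length 2^n - 1; v[k] is the measure of the subset whose binary code
# is k (bit j set <=> input j is in the subset); v[2^n - 1] should equal 1.
choquet <- function(x, v) {
  n <- length(x)
  stopifnot(length(v) == 2^n - 1)
  ord <- order(x)              # input indices, smallest value first
  xs <- c(0, x[ord])           # sorted inputs with x_(0) = 0 prepended
  total <- 0
  for (i in 1:n) {
    members <- ord[i:n]        # inputs whose value is >= the i-th smallest
    code <- sum(2^(members - 1))
    total <- total + (xs[i + 1] - xs[i]) * v[code]
  }
  total
}
```

With an additive fuzzy measure (each subset's measure equal to the sum of its members' weights) the Choquet integral reduces to the WAM, which is a quick sanity check. A pair whose joint measure exceeds the sum of the individual measures suggests complementary variables, and one whose joint measure falls short suggests redundancy, which is what 3(iv)c asks about.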

4. Use your model for prediction [20 marks]
(i) Choose your best-fitting model and use it to predict the Energy use of appliances for the following input: X1=18; X2=44; X3=4; X4=74.8; X5=31.4.
(ii) Give your result and comment on whether you think it is reasonable. (1-2 sentences).
(iii) Comment on the best conditions (in terms of your chosen four variables) under which low Energy use of appliances will occur. (1-2 sentences).
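Prediction means transforming the new input exactly as the training data were transformed, aggregating with the learned parameters, and mapping the result back to Wh. A sketch under loudly stated assumptions: the best model is a WAM, every variable was min-max scaled using the ranges of the 300-row sample, and X5 was the dropped variable. The function name predict_wam_wh and the workspace object w.learned are hypothetical and should be replaced with your own model and transforms.

```r
# Predict Y (in Wh) for one new input with a fitted WAM, ASSUMING every
# variable was min-max scaled using the training sample's ranges.
#   new.x        : raw values of the four chosen inputs
#   w            : learned WAM weights (non-negative, summing to 1)
#   x.min, x.max : training minima/maxima of the four chosen inputs
#   y.min, y.max : training minimum/maximum of Y
predict_wam_wh <- function(new.x, w, x.min, x.max, y.min, y.max) {
  z <- (new.x - x.min) / (x.max - x.min)   # same scaling as the training data
  z <- pmin(pmax(z, 0), 1)                 # clip inputs outside the training range
  y.scaled <- sum(w * z)                   # WAM output, in [0, 1]
  y.min + y.scaled * (y.max - y.min)       # undo the min-max scaling of Y
}

# Usage with the input from Task 4(i), assuming X1-X4 were kept and that
# my.data and the learned weights w.learned are in the workspace:
# keep <- 1:4
# predict_wam_wh(c(18, 44, 4, 74.8), w.learned,
#                apply(my.data[, keep], 2, min), apply(my.data[, keep], 2, max),
#                min(my.data[, 6]), max(my.data[, 6]))
```

Clipping keeps the transformed input inside [0, 1], but it hides extrapolation; checking whether the given input lies within the training ranges is a useful part of judging whether the prediction is reasonable in 4(ii).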
5. Comparing with a linear regression model [20 marks]
Linear regression is used to predict the value of an outcome variable Y based on one or more input predictor variables X. The equation is $Y = \beta_0 + \beta_1 X_1 + \beta_2 X_2 + \cdots + \beta_n X_n + \epsilon$.
The built-in function lm() is used to fit linear models in R.
(i) Build your linear model using the same dataset as in Question 3 and describe the summary statistics for your model using the function summary().
(ii) Compare the performance of your linear model with your best-fitting model from Question 4. Visualise the predicted Y values of both models on the 300 data points and compare them with the true Y values.
(iii) Give your comment on the differences between the linear model and your best fitting model. (2-4 sentences).
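A sketch of Task 5 in R, assuming the transformed 300 x 5 matrix from Task 2 (with the last column named Y) and a vector of the best model's predictions on the same 300 rows are available. The function names, the plot file name and the particular set of measures in fit_stats() (RMSE, average absolute error, Pearson and Spearman correlations) are assumptions; match them to the measures you report from AggWaFit718.R so the comparison is like for like.

```r
# Error measures and correlations between observed and predicted values
fit_stats <- function(obs, pred) {
  c(RMSE       = sqrt(mean((obs - pred)^2)),
    AvAbsError = mean(abs(obs - pred)),
    Pearson    = cor(obs, pred, method = "pearson"),
    Spearman   = cor(obs, pred, method = "spearman"))
}

# Fit Y ~ all other columns, compare with the best aggregation model,
# and plot both sets of predictions against the true Y values.
compare_with_lm <- function(data.transformed, best.pred, plot.file = "lm-vs-best.pdf") {
  df <- as.data.frame(data.transformed)
  stopifnot("Y" %in% names(df), length(best.pred) == nrow(df))
  fit <- lm(Y ~ ., data = df)              # Y regressed on the four chosen inputs
  print(summary(fit))                      # coefficients, p-values, R-squared, F-statistic
  lm.pred <- fitted(fit)
  stats <- rbind(LinearModel = fit_stats(df$Y, lm.pred),
                 BestModel   = fit_stats(df$Y, best.pred))
  pdf(plot.file)
  on.exit(dev.off())
  plot(df$Y, lm.pred, col = "blue", pch = 1,
       ylim = range(c(lm.pred, best.pred)),
       xlab = "True Y", ylab = "Predicted Y", main = "Predicted vs true Y")
  points(df$Y, best.pred, col = "red", pch = 2)
  abline(0, 1, lty = 2)                    # points on this line are predicted exactly
  legend("topleft", legend = c("Linear model", "Best model"),
         col = c("blue", "red"), pch = c(1, 2))
  list(model = fit, stats = stats)
}

# Usage, assuming data.transformed and best.pred are in the workspace:
# res <- compare_with_lm(data.transformed, best.pred)
# res$stats
```

Both models are evaluated on the transformed scale here; if you compare in Wh instead, back-transform both sets of predictions first.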
Submit to SIT718 CloudDeakin.
Your final submission must include the following three files (your assignment will not be assessed if we cannot reproduce your results with your R code and the data file):
1. “name-report.pdf”: a report, in pdf format (created in any word processor), covering all of the items above (where “name” is replaced with your name; you can use your surname or first name). With plots and tables, it should be up to 8 pages.
2. A data file named … (just to help us distinguish them!).
3. The R code file (that you have written to produce your results) named … (where “name” is replaced with your name; you can use your surname or first name).
i Luis M. Candanedo, Veronique Feldheim, Dominique Deramaix. Data driven prediction models of energy use of appliances in a low-energy house. Energy and Buildings, Volume 140, 1 April 2017, Pages 81-97, ISSN 0378-7788. http://archive.ics.uci.edu/ml/datasets/Appliances+energy+prediction