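This assignment asks you to modify the basic decision tree algorithm to take into account the count of each generalized data tuple (i.e., of each row entry).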

ITEC3040 Introduction to Data Analytics
Assignment 2

1. Textbook, page 387, Exercise 8.7
(a) How would you modify the basic decision tree algorithm to take into consideration the count
of each generalized data tuple (i.e., of each row entry)?
(b) Use your algorithm to construct a decision tree from the given data.
(c) Given a data tuple having the values “systems”, “26...30”, and “46–50K” for the attributes
department, age, and salary, respectively, what would the decision tree classification of the
status for the tuple be?
(d) Construct the Naïve Bayesian classifier and redo part (c) with it.
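
One possible reading of parts (a) and (d) is that each generalized tuple should contribute its count, rather than 1, wherever the basic algorithms count tuples. The sketch below illustrates that idea for information gain and for the Naïve Bayesian probabilities. It is only a sketch under stated assumptions: the rows in ROWS are made up for illustration and are not the textbook table, the function names are hypothetical, only two attributes are used, and no smoothing is applied to zero probabilities.

```python
from collections import defaultdict
from math import log2

# Hypothetical generalized tuples (NOT the textbook table): each row
# stands for `count` identical original records.
ROWS = [
    {"department": "sales",   "age": "31...35", "status": "senior", "count": 30},
    {"department": "sales",   "age": "26...30", "status": "junior", "count": 40},
    {"department": "systems", "age": "26...30", "status": "junior", "count": 20},
    {"department": "systems", "age": "31...35", "status": "senior", "count": 10},
]

def class_weights(rows, target):
    """Sum the counts per class label instead of counting rows."""
    totals = defaultdict(int)
    for row in rows:
        totals[row[target]] += row["count"]
    return totals

def entropy(rows, target):
    """Count-weighted entropy: p_i = (counts in class i) / (all counts)."""
    totals = class_weights(rows, target)
    n = sum(totals.values())
    return -sum((c / n) * log2(c / n) for c in totals.values() if c > 0)

def info_gain(rows, attribute, target):
    """Information gain of a split, with partitions weighted by count."""
    n = sum(row["count"] for row in rows)
    partitions = defaultdict(list)
    for row in rows:
        partitions[row[attribute]].append(row)
    remainder = sum(
        sum(r["count"] for r in part) / n * entropy(part, target)
        for part in partitions.values()
    )
    return entropy(rows, target) - remainder

def naive_bayes_scores(rows, query, target):
    """Unnormalized P(class) * product of P(value | class), from counts."""
    totals = class_weights(rows, target)
    n = sum(totals.values())
    scores = {}
    for label, class_total in totals.items():
        score = class_total / n  # prior, weighted by count
        for attribute, value in query.items():
            matching = sum(
                r["count"] for r in rows
                if r[target] == label and r[attribute] == value
            )
            score *= matching / class_total  # conditional, no smoothing
        scores[label] = score
    return scores

if __name__ == "__main__":
    for attribute in ("department", "age"):
        print(attribute, info_gain(ROWS, attribute, "status"))
    query = {"department": "systems", "age": "26...30"}
    print(naive_bayes_scores(ROWS, query, "status"))
```

On this made-up data, age separates the classes perfectly, so it has the larger gain and would be chosen as the root split. The rest of the tree-building procedure (recursing on each partition, majority voting at leaves) would use the same count-weighted totals.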
2. You are given the data set ‘SampledSeeds.csv’, sampled from the ‘seeds Data Set’. The attribute
information can be found here.
(a) Use KNN to classify the tuple with the values “16.17”, “15.38”, “0.8588”, “5.762”, “3.387”,
“4.286” and “5.703” for the attributes ‘area A’, ‘perimeter P’, ‘compactness C’, ‘length of
kernel’, ‘width of kernel’, ‘asymmetry coefficient’ and ‘length of kernel groove’, respectively.
What would the class attribute value for this tuple be when K = 3, 5 and 7, using Euclidean
distance?
(b) Use min-max normalization to normalize each attribute value into [0, 1], then redo
part (a). Is there a difference in classification? Why?
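
The two steps in question 2, KNN with Euclidean distance and min-max normalization, can be sketched as follows. This is a minimal illustration, not a solution: TRAIN, LABELS and QUERY are invented two-attribute values (loosely an "area" and a "compactness" column) rather than rows from ‘SampledSeeds.csv’, and the function names are hypothetical. The scaling uses the training minimum and maximum, so a query outside the training range can fall outside [0, 1]. When a vote is tied, the class whose member appears first among the nearest neighbours is returned.

```python
from collections import Counter
from math import dist  # Euclidean distance, Python 3.8+

def knn_predict(train, labels, query, k):
    """Majority vote among the k training rows closest to `query`."""
    order = sorted(range(len(train)), key=lambda i: dist(train[i], query))
    return Counter(labels[i] for i in order[:k]).most_common(1)[0][0]

def min_max_scale(train, query):
    """Rescale each attribute with the training min and max of that column."""
    lows = [min(col) for col in zip(*train)]
    highs = [max(col) for col in zip(*train)]

    def scale(row):
        return [
            (v - lo) / (hi - lo) if hi > lo else 0.0
            for v, lo, hi in zip(row, lows, highs)
        ]

    return [scale(row) for row in train], scale(query)

# Hypothetical data: first column has a wide range, second a narrow one.
TRAIN = [
    [14.0, 0.80], [16.0, 0.81], [15.5, 0.82],
    [11.0, 0.89], [19.0, 0.90], [12.0, 0.88],
]
LABELS = ["A", "A", "A", "B", "B", "B"]
QUERY = [15.0, 0.88]

if __name__ == "__main__":
    scaled_train, scaled_query = min_max_scale(TRAIN, QUERY)
    for k in (3, 5):
        raw = knn_predict(TRAIN, LABELS, QUERY, k)
        scaled = knn_predict(scaled_train, LABELS, scaled_query, k)
        print(k, raw, scaled)
```

On this toy data the unscaled distances are dominated by the wide-range first column, so the raw vote goes to class A, while after scaling both columns carry comparable weight and the vote goes to class B. Whether the real seeds data behaves the same way depends on the actual attribute ranges.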