S2, 2019 – Python Tutoring | Introduction to Statistical Machine Learning A1
This Python machine learning tutoring assignment mainly covers the use of SVMs.
Introduction to Statistical Machine Learning, Semester 2, 2019, The University of Adelaide
Assignment 1
DUE 11:00 PM WED, 4 SEPT. 2019
Instructions and submission guidelines:
• The assignment consists of a report and a MATLAB (or Python) implementation of binary-class linear Support Vector Machines (SVMs).
• Explain the key points in the report.
• Make sure that your writing is legible and clear, and the mathematical symbols are consistent.
• Make sure that your MATLAB or Python code is well commented and can be executed directly.
• You should sign an assessment declaration coversheet and submit it with your assignment. The assessment declaration coversheet is included in the zip file.
• Submit your report and all your code via Canvas on the course web page.
Reading
We have briefly covered soft margin binary SVMs in Lecture 3. Please read the SVM tutorial [1] and the guide [3] for more details to complete the assignment.
Report
Please write down your understanding of binary-class linear SVMs (within 3 pages) and an experimental comparison of your code with an existing implementation of SVMs, libsvm [2] (within 3 pages) in the report. So in total, you have at most 8 pages for the report. There is no strict format for the report (other than the page limit). The purpose of the report is to show what you have understood about SVMs and what you have done to make your code correct.
The report should cover at least the following key points (but is not limited to them):
• The primal form and its dual form for both the hard margin and soft margin cases;
• Concept of support vectors;
• Why max margin is good;
• Concepts of generalisation/test error;
• Experimental results, including a comparison between your implementation and libsvm.
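For orientation, the soft margin primal and dual listed above are usually written as follows. This is a sketch in the common C-SVM notation of [1] and [3]; the symbols $w$, $b$, $\xi_i$, $\alpha_i$ and $C$ are assumed here rather than taken from the lecture notes, and the course may scale the slack term differently (for example by $1/n$).

```latex
% Soft margin primal, for training pairs (x_i, y_i), y_i \in \{-1, +1\}, i = 1, \dots, n
\min_{w,\, b,\, \xi} \quad \frac{1}{2}\|w\|^2 + C \sum_{i=1}^{n} \xi_i
\qquad \text{s.t.} \quad y_i\,(w^\top x_i + b) \ge 1 - \xi_i, \quad \xi_i \ge 0 .

% Soft margin dual
\max_{\alpha} \quad \sum_{i=1}^{n} \alpha_i
  - \frac{1}{2} \sum_{i=1}^{n} \sum_{j=1}^{n} \alpha_i \alpha_j\, y_i y_j\, x_i^\top x_j
\qquad \text{s.t.} \quad 0 \le \alpha_i \le C, \quad \sum_{i=1}^{n} \alpha_i y_i = 0 .
```

The hard margin case follows by removing the slack variables $\xi_i$ from the primal, which removes the upper bound $\alpha_i \le C$ from the dual.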
Code
• Please implement soft margin binary class linear SVMs by
– solving the primal problem.
– solving the dual problem.
• You are encouraged to use the MATLAB optimisation tool cvx (http://cvxr.com/cvx/) or the Python tool cvxopt (https://cvxopt.org/) to solve the above two optimisation problems (the primal and the dual).
• You are not allowed to directly call an SVM API in MATLAB or Python, apart from calling libsvm for comparison. (Note that the formulations in libsvm and yours might be slightly different, so you will need to figure out the differences and the correspondence.)
• The data for training and testing are included. Please check the README file inside the zip file. You need to run your SVM on the provided dataset.
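One possible way to set up the two optimisation problems for cvxopt is sketched below. It assumes the common C-SVM formulation without any 1/n scaling, which may differ from the one used in the lectures, and casts each problem into the standard form accepted by `cvxopt.solvers.qp`: minimise (1/2) zᵀPz + qᵀz subject to Gz ≤ h and Az = b. The function names `dual_qp_matrices`, `primal_qp_matrices` and `solve_qp` are hypothetical helpers, not part of any library.

```python
import numpy as np


def dual_qp_matrices(X, y, C):
    """Build (P, q, G, h, A, b) for the soft margin dual, variable alpha."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    n = X.shape[0]
    Yx = y[:, None] * X                       # row i is y_i * x_i
    P = Yx @ Yx.T                             # P_ij = y_i y_j x_i^T x_j
    q = -np.ones(n)                           # maximising sum(alpha) = minimising -sum(alpha)
    G = np.vstack([-np.eye(n), np.eye(n)])    # -alpha <= 0 and alpha <= C
    h = np.concatenate([np.zeros(n), C * np.ones(n)])
    A = y.reshape(1, n)                       # sum_i alpha_i y_i = 0
    b = np.zeros(1)
    return P, q, G, h, A, b


def primal_qp_matrices(X, y, C):
    """Build (P, q, G, h) for the soft margin primal, variable z = [w, b, xi]."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    n, d = X.shape
    P = np.zeros((d + 1 + n, d + 1 + n))
    P[:d, :d] = np.eye(d)                     # only w enters the quadratic term
    q = np.concatenate([np.zeros(d + 1), C * np.ones(n)])
    # -y_i (w.x_i + b) - xi_i <= -1
    margin = np.hstack([-y[:, None] * X, -y[:, None], -np.eye(n)])
    # -xi_i <= 0
    slack = np.hstack([np.zeros((n, d + 1)), -np.eye(n)])
    G = np.vstack([margin, slack])
    h = np.concatenate([-np.ones(n), np.zeros(n)])
    return P, q, G, h


def solve_qp(P, q, G, h, A=None, b=None):
    """Pass the matrices to cvxopt (imported lazily) and return the solution vector."""
    from cvxopt import matrix, solvers
    args = [matrix(P), matrix(q), matrix(G), matrix(h)]
    if A is not None:
        args += [matrix(A), matrix(b)]
    return np.array(solvers.qp(*args)["x"]).ravel()
```

With these helpers, `solve_qp(*dual_qp_matrices(X, y, C))` would return the dual variables α, and `solve_qp(*primal_qp_matrices(X, y, C))` would return the stacked vector [w, b, ξ], from which w is the first d entries and b the next one.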
Marking criteria
The total score is 100 points, with the following breakdown:
1. Define variables before you use them [5 points]
2. Primal and dual forms of hard and soft margin SVMs [20 points]
3. Concept of support vectors (two types) [10 points]
4. Discussions on why max margin is good [5 points]
5. Experiments
(a) compare the w, b obtained by solving the primal problem with the w, b reconstructed from the dual variables obtained by solving the dual problem (report both the results and the reconstruction formula) [5 points]
(b) check the duality gap of your solution (both the result and the formulation) [5 points]
(c) compare your w, b, α with those of libsvm [5 points]
(d) compare training and testing errors of your code and libsvm [5 points]
(e) code [40 points]
Please note that all responses/answers to all of the above checkpoints should be included in the report, not in the code.
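Experiments (a) and (b) could be checked along the following lines. This is a minimal sketch that assumes the common C-SVM formulation without 1/n scaling, with w = Σ αᵢ yᵢ xᵢ and b averaged over the support vectors that lie exactly on the margin (0 < αᵢ < C). The function names and the tolerance `tol` are hypothetical, and the fallback used when no such support vector exists is an assumption rather than a prescribed method.

```python
import numpy as np


def reconstruct_w_b(alpha, X, y, C, tol=1e-6):
    """Recover (w, b) from the dual variables alpha."""
    w = (alpha * y) @ X                            # w = sum_i alpha_i y_i x_i
    on_margin = (alpha > tol) & (alpha < C - tol)  # support vectors with 0 < alpha_i < C
    if not on_margin.any():
        on_margin = alpha > tol                    # assumption: fall back to all support vectors
    # On the margin, y_i (w.x_i + b) = 1, so b = y_i - w.x_i; average for stability.
    b = np.mean(y[on_margin] - X[on_margin] @ w)
    return w, b


def primal_objective(w, b, X, y, C):
    """0.5 ||w||^2 + C * sum of slacks, using the smallest feasible slack per point."""
    xi = np.maximum(0.0, 1.0 - y * (X @ w + b))
    return 0.5 * w @ w + C * xi.sum()


def dual_objective(alpha, X, y):
    """sum(alpha) - 0.5 * || sum_i alpha_i y_i x_i ||^2."""
    v = (alpha * y) @ X
    return alpha.sum() - 0.5 * v @ v


def duality_gap(alpha, X, y, C):
    """Primal objective at the reconstructed (w, b) minus the dual objective at alpha."""
    w, b = reconstruct_w_b(alpha, X, y, C)
    return primal_objective(w, b, X, y, C) - dual_objective(alpha, X, y)
```

At an exact optimum the gap is zero; with a numerical solver it should be a small non-negative number, so comparing it against the solver's tolerance is one way to report the result.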
References
[1] Christopher J.C. Burges. A tutorial on support vector machines for pattern recognition. Data Mining and Knowledge Discovery, 2:121–167, 1998.
[2] C.C. Chang and C.J. Lin. LIBSVM: a library for support vector machines, 2001. Software available at http://www.csie.ntu.edu.tw/~cjlin/libsvm.
[3] Chih-Wei Hsu, Chih-Chung Chang, and Chih-Jen Lin. A Practical Guide to Support Vector Classification, 2010.