# Assignment 2 COMP9418 – Advanced Topics in Statistical Machine Learning

Description

In this assignment, you will write a program that plays the part of a “smart building”. This program will receive a real-time stream of sensor data and use this data to decide whether to turn on the lights in each room. Your goal is to minimise the cost of lighting in the building while also ensuring that the lights stay on if there are people in a room. Every 15 seconds, you will receive a new data point and have to decide whether each light should be turned on or off. There are several types of sensors in the building, each with different reliability and data output. You will receive two files called data1.csv and data2.csv containing two days of complete data with all sensor values and the number of people in each room.

You can approach this assignment in many different ways. We will not be giving any guidance on what algorithms are most appropriate.

Your solution must include a Probabilistic Graphical Model as the core component. This core component should be your own implementation and can include any tutorial code, but it cannot simply be a sklearn implementation. You can use any algorithm as part of your approach, including any algorithm available in Python’s sklearn library.
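As one illustration of what a hand-written probabilistic graphical model could look like, here is a minimal sketch of a single forward-filtering step in a two-state hidden Markov model for one room (empty vs. occupied). The choice of an HMM, the names `TRANSITION`, `EMISSION` and `forward_step`, and all the probabilities are assumptions made for illustration; none of them come from the assignment or from the training data.

```python
# Hypothetical two-state HMM for one room: state 0 = empty, state 1 = occupied.
# All probabilities below are illustrative placeholders, not learned values.
TRANSITION = [[0.95, 0.05],   # P(next state | currently empty)
              [0.10, 0.90]]   # P(next state | currently occupied)
EMISSION = {"motion":    [0.10, 0.80],   # P(reading | empty), P(reading | occupied)
            "no motion": [0.90, 0.20]}


def forward_step(belief, reading):
    """Return the updated belief [P(empty), P(occupied)] after one time step."""
    # Predict: push the current belief through the transition model.
    predicted = [sum(belief[i] * TRANSITION[i][j] for i in range(2))
                 for j in range(2)]
    # A failed sensor reads None; with no evidence, keep the prediction as-is.
    if reading is None:
        return predicted
    # Update: weight by the likelihood of the observed reading, then normalise.
    weighted = [predicted[j] * EMISSION[reading][j] for j in range(2)]
    total = sum(weighted)
    return [w / total for w in weighted]


belief = forward_step([0.5, 0.5], "motion")
```

Applying `forward_step` once per data point keeps a running belief for the room. A full solution would need a model covering every room and sensor type, with parameters learned from the training files.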

It is recommended you start this assignment by brainstorming several different possible approaches. Make sure you understand what information is available, what information is uncertain, and what assumptions it may be reasonable to make.

Every area on the floor plan is named with a string: r followed by a number (for example r4), c followed by a number (for example c2), or outside. The prefixes r and c stand for room and corridor, respectively.

Data

The files data1.csv and data2.csv each contain one complete day of data that is representative of a typical weekday in the office building. This data includes the output of each sensor as well as the true number of people in each room. This data was generated using a simulation of the building, and your program will be tested against many days of data generated by the same simulation. Because this data would be expensive to collect, you are only given two sets of 2400 complete data points, from two workdays. The simulation attempts to be a realistic approximation, so it includes many different types of noise and bias. You should treat this project as if the data came from a real office building and your program will be tested on real data from that building. You can make any assumptions that you think would be reasonable in the real world, and you should describe all assumptions in the report. Part of your mark will be determined by how feasible your assumptions would be if applied to the real world.

The number of people who come to the office each day varies according to this distribution:

num_people = round(Normal(mean=20, stddev=3)).

This information was obtained from records of the number of workers present each day, and the empirical distribution of num_people was found to be identical to the above distribution.
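The distribution above can be sampled with a few lines of Python. This is only a sketch: the function name `sample_num_people` and the fixed seed are illustrative choices, not part of the assignment.

```python
import random


def sample_num_people(rng):
    """Draw one day's head count: a Normal(20, 3) sample rounded to an integer."""
    return round(rng.gauss(20, 3))


rng = random.Random(0)          # fixed seed so the sketch is repeatable
samples = [sample_num_people(rng) for _ in range(10_000)]
mean = sum(samples) / len(samples)
```

Sampling like this can be handy for building a prior over the total number of people in the building on a given day.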

Data format specification

Sensor data

Your submission file must contain a function called get_action(sensor_data), which receives sensor data in the following format:

    sensor_data = {'Motion_Sensor1': 'motion', 'Motion_Sensor2': 'motion',
                   'Motion_Sensor3': 'motion', 'Motion_Sensor4': 'motion', 'Motion_Sensor5': 'motion',
                   'Motion_Sensor6': 'motion', 'Motion_Sensor7': 'motion', 'Motion_Sensor8': 'motion',
                   'Motion_Sensor9': 'motion', 'Motion_Sensor10': 'motion', 'door_sensor1': 0,
                   'door_sensor2': 0, 'door_sensor3': 0, 'door_sensor4': 0, 'door_sensor5': 0, 'door_sensor6': 0,
                   'door_sensor7': 0, 'door_sensor8': 0, 'door_sensor9': 0, 'door_sensor10': 0, 'door_sensor11': 0,
                   'Camera_sensor1': 0, 'Camera_sensor2': 0, 'Camera_sensor3': 0, 'Camera_sensor4': 0,
                   'robot1': ('r1', 0), 'robot2': ('r8', 0), 'time': datetime.time(8, 0)}

We have four different kinds of sensors for this building:

Motion sensors: a reading of motion means the sensor detected movement in the room, and no motion means the sensor detected nothing. These sensors are known to be inaccurate, but they are cheap and accessible.

For example, if the robot goes into r4 and counts 8 people, it would have the value (‘r4’,8). If it goes into corridor ‘c2’ and no one is present, it would have value (‘c2’,0).

If a sensor fails, it will read None. The value of time is a datetime.time object representing the current time.
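Because any sensor can read None, it may help to separate usable readings from failed ones before doing inference. The helper below is a minimal sketch; `split_readings` is a hypothetical name, and the example dictionary is a shortened version of the format shown above.

```python
import datetime


def split_readings(sensor_data):
    """Separate usable readings from failed sensors (those reading None)."""
    observed = {k: v for k, v in sensor_data.items() if v is not None}
    failed = sorted(k for k, v in sensor_data.items() if v is None)
    return observed, failed


example = {"door_sensor1": 0, "robot1": ("r4", 8), "robot2": None,
           "time": datetime.time(8, 0)}
observed, failed = split_readings(example)
```

A model can then condition only on the entries in `observed` and treat the sensors in `failed` as missing evidence for that time step.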

Data points are provided at 15-second resolution, i.e., your function will be called once for every 15-second interval from 8 am to 6 pm.
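Ten hours at one data point every 15 seconds gives 2400 steps per day, which matches the 2400 data points in each training file. The sketch below converts the `time` field into a step index; the names `step_index` and `seconds_since_midnight` are illustrative, and whether the 6 pm boundary itself is included is an assumption to check against the data.

```python
import datetime

STEP_SECONDS = 15
START = datetime.time(8, 0)
END = datetime.time(18, 0)


def seconds_since_midnight(t):
    """Convert a datetime.time to whole seconds since midnight."""
    return t.hour * 3600 + t.minute * 60 + t.second


def step_index(t):
    """Index of the 15-second step containing time t, with 8 am as step 0."""
    return (seconds_since_midnight(t) - seconds_since_midnight(START)) // STEP_SECONDS


steps_per_day = (seconds_since_midnight(END)
                 - seconds_since_midnight(START)) // STEP_SECONDS
```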

Training data

Both data1.csv and data2.csv have the same structure. They contain a column for each of the above sensors, as well as columns for each room, which tell you the current number of people in that room. The columns of data files are the following and can be divided into two groups:

Note that the first column of data files is the index, and has no name.

You should use this data to learn the parameters of your model. You can also save the parameters to CSV files that are loaded during testing.
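As a rough sketch of parameter learning, the code below estimates how often a motion sensor fires when its room is occupied or empty, using add-one smoothing, and writes the result in CSV form. The column names `Motion_Sensor1` and `r1`, the tiny inline data set, and the function name `learn_motion_cpt` are all assumptions for illustration; check the real column names in data1.csv and data2.csv.

```python
import csv
import io

# Hypothetical excerpt of a training file; real column names may differ.
TRAINING_CSV = """,Motion_Sensor1,r1
0,motion,2
1,no motion,0
2,motion,1
3,no motion,0
4,motion,0
"""


def learn_motion_cpt(rows, sensor_col, room_col, alpha=1.0):
    """Estimate P(sensor = 'motion' | room occupied?) with add-alpha smoothing."""
    counts = {True: {"motion": 0, "total": 0}, False: {"motion": 0, "total": 0}}
    for row in rows:
        occupied = int(row[room_col]) > 0
        counts[occupied]["total"] += 1
        if row[sensor_col] == "motion":
            counts[occupied]["motion"] += 1
    return {occ: (c["motion"] + alpha) / (c["total"] + 2 * alpha)
            for occ, c in counts.items()}


rows = list(csv.DictReader(io.StringIO(TRAINING_CSV)))
cpt = learn_motion_cpt(rows, "Motion_Sensor1", "r1")

# Persist the parameters so they can be reloaded at test time.
out = io.StringIO()   # stand-in for open("params.csv", "w", newline="")
writer = csv.writer(out)
writer.writerow(["occupied", "p_motion"])
for occupied, p in sorted(cpt.items()):
    writer.writerow([int(occupied), round(p, 4)])
```

Smoothing keeps the estimates away from exactly 0 or 1, which matters with only two days of training data.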

Action data

get_action(sensor_data) must return a dictionary with the following format. Note that every room whose name begins with r has lights that you can turn on or off. All other areas (such as corridors) have lights that are permanently on, which you have no control over, and which do not affect the cost.

    actions_dict = {'lights1': 'off', 'lights2': 'off', 'lights3': 'off',
                    'lights4': 'off', 'lights5': 'off', 'lights6': 'off', 'lights7': 'off',
                    'lights8': 'off', 'lights9': 'off', 'lights10': 'off'}

Each action takes one of two values: on or off.
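A trivial baseline that satisfies the return format is to switch every light on regardless of the sensors. The sketch below also includes a small check of the format; `LIGHT_NAMES`, `VALID_STATES` and `is_valid_actions` are hypothetical helpers, not part of the provided files.

```python
VALID_STATES = {"on", "off"}
LIGHT_NAMES = [f"lights{i}" for i in range(1, 11)]   # lights1 .. lights10


def get_action(sensor_data):
    """Baseline sketch: ignore the sensors and switch every controllable light on."""
    return {name: "on" for name in LIGHT_NAMES}


def is_valid_actions(actions):
    """Check that every light is present and set to 'on' or 'off'."""
    return (set(actions) == set(LIGHT_NAMES)
            and all(v in VALID_STATES for v in actions.values()))


actions = get_action({})
```

Running a check like `is_valid_actions` during development can catch a missing or misspelled key before submission.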

The provided example_solution.py contains a code stub that shows how to set up your code.

Figure 1 shows the floor plan specification. Note that there is a door between room 3 and the outside area (the entrance door), but this door has no sensor.

Cost specification

If a light is on in a room for 15 seconds, it costs you 1 cent. If there are people in a room and there is no light on, it costs you 4 cents per person every 15 seconds, because of lost productivity. The cost can be calculated exactly using the complete training data, so it is also based on an instantaneous count of the number of people in each room.
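The cost rule for a single 15-second step can be sketched as follows. The function name `step_cost` and the assumption that `lightsN` controls room `rN` are illustrative; example_test.py remains the authoritative definition of the cost.

```python
def step_cost(people_in_room, lights):
    """Cost in cents for one 15-second step.

    people_in_room: dict like {"r1": 2, ...} with the true head counts.
    lights: dict like {"lights1": "on", ...} as returned by get_action.
    """
    cost = 0
    for name, state in lights.items():
        room = "r" + name[len("lights"):]   # assumes lightsN controls room rN
        if state == "on":
            cost += 1                        # 1 cent per lit room per step
        else:
            # 4 cents per person left in an unlit room
            cost += 4 * people_in_room.get(room, 0)
    return cost


cost = step_cost({"r1": 2, "r2": 0, "r3": 1},
                 {"lights1": "off", "lights2": "off", "lights3": "on"})
```

In this example, room r1 is dark with two people (8 cents), r2 is dark and empty (0 cents), and r3 is lit (1 cent), for a total of 9 cents.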

Your goal is to minimise the total cost of lighting plus lost productivity, added up over the whole day. You do not need to calculate this cost yourself. The testing code will calculate it using the actions returned by your function and the true locations of people (which are unavailable to you). The file example_test.py shows exactly how the cost is calculated.
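Given the stated prices (1 cent per lit room per step, 4 cents per person per step in an unlit room), a simple per-step decision rule follows: leaving a light off has expected cost 4 × E[people], so switching it on is cheaper whenever the expected number of people exceeds 0.25. The sketch below encodes that rule; `choose_light` is a hypothetical name, and the rule considers each room and step in isolation, which is an assumption rather than a requirement.

```python
LIGHT_COST = 1        # cents per step for a lit room
DARK_COST = 4         # cents per person per step in an unlit room


def choose_light(expected_people):
    """Pick the action with the lower expected cost for one room and one step."""
    cost_on = LIGHT_COST
    cost_off = DARK_COST * expected_people
    return "on" if cost_off > cost_on else "off"
```

For example, `choose_light(0.3)` returns `"on"` because 4 × 0.3 = 1.2 cents exceeds the 1 cent lighting cost, while `choose_light(0.2)` returns `"off"`.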

Testing specification

Your program must be submitted as a Python file called solution.py. During testing, solution.py will be placed in a folder with test.py. A simpler version of test.py has been provided (called example_test.py), so you can confirm that testing will work. A more elaborate version of test.py will be used to grade your solution.

There is a strict upper time limit for the final submission. If your submission takes longer than 1800 seconds to process 10 days of data (180 seconds per day), it will be cut off, and you will receive 0 marks for the programming section of the assignment. You should aim to be much faster than this, to receive efficiency marks.
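With 2400 data points per day, the limit works out to 1800 / (10 × 2400) = 0.075 seconds per call on average, and that budget also has to cover any daily setup. The sketch below measures the average time per call; `mean_call_time` and `dummy_get_action` are hypothetical names, and the dummy function merely stands in for a real solution.

```python
import time

TIME_LIMIT_SECONDS = 1800
DAYS = 10
STEPS_PER_DAY = 2400          # 8 am to 6 pm at 15-second intervals

# Average number of seconds available per get_action call.
budget_per_call = TIME_LIMIT_SECONDS / (DAYS * STEPS_PER_DAY)


def mean_call_time(fn, sample_input, repeats=1000):
    """Average wall-clock seconds per call of fn(sample_input)."""
    start = time.perf_counter()
    for _ in range(repeats):
        fn(sample_input)
    return (time.perf_counter() - start) / repeats


def dummy_get_action(sensor_data):   # hypothetical stand-in for a real solution
    return {f"lights{i}": "on" for i in range(1, 11)}


elapsed = mean_call_time(dummy_get_action, {})
```

Comparing `elapsed` against `budget_per_call` on your own machine gives only a rough guide, since the grading hardware may be slower or faster.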

The file solution.py will be reloaded for each new day (using importlib.reload), so you may do any daily setup in that file outside of the get_action function, and it will still work.
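One way to lay out solution.py, given the reload behaviour described above, is to keep per-day state at module level so that it is reset automatically at the start of each day. This is a sketch only: the `state` dictionary, `NUM_ROOMS`, and the placeholder policy are assumptions, not part of the provided stub.

```python
# Sketch of a solution.py layout. Everything at module level runs once per
# import and again on every importlib.reload, so state resets for each new day.
NUM_ROOMS = 10
state = {"step": 0, "belief": [0.0] * NUM_ROOMS}   # hypothetical per-day state


def get_action(sensor_data):
    """Update the per-day state, then return an action for every light."""
    state["step"] += 1
    # Placeholder policy: a real solution would update state["belief"] from
    # sensor_data here and decide per room.
    return {f"lights{i}": "on" for i in range(1, NUM_ROOMS + 1)}


get_action({})
get_action({})
steps_seen = state["step"]
```

Because the module is re-executed on reload, any parameters loaded from files at module level are also re-read once per day, which counts towards the time limit.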

We will run a test evaluation one week before the deadline, and release the cost and time of each student's model. This is to help you confirm that your model does not have any major errors and works with the evaluation system. To participate, make a submission for the assignment at least one week before the final deadline. We will make an announcement to remind you. To make sure you have something to submit, aim to have at least a minimal working model by this time.

Report

Your report should cover the following points:

The report must be fewer than 2000 words (around 4 pages of text). The only accepted format is PDF.

Marking Criteria

This assignment will be marked according to the following criteria:

Items 2 and 3 will be assessed using the report. Items 1 and 4 will be assessed using python files.

Please include the code you used for learning your parameters, even if that code is never called during test time and parameters are simply loaded from files.

Bonus Marks

Bonus marks will be given to the top 10 performing programs (10 percentage points for 1st place, 1 percentage point for 10th place). E.g. if you score 98% in the assignment, and come fifth in the final ranking, then you would receive 103% for the assignment.