5011CEM Big Data Programming Project

This is a programming assignment from the UK on a big data programming project.

The report is graded out of 150 and contributes 10 credits towards the module. Resit marks are capped at 40%.
For detailed guidance on mark allocation, see the grading scheme below.
This is also available as a separate Excel document on Aula.

Resit Information

Your original submission has been graded and feedback provided. By considering the written feedback, along with the marks for each part, you are required to improve your work before resubmitting for the resit assessment. For convenience, the details are repeated below.
Please note that work which has not been improved may attract lower marks at the second submission.

Assessment Overview

Over the course of this module you have been introduced to a range of techniques that may be used for programming a big data project. This assessment allows you to pull together these techniques in a realistic scenario to complete a big data analysis project. Below is a realistic project scenario. By using the techniques presented during class you are to carry out the project and write a final project report for your client.
In line with real-world projects, where the client has rejected your work and requested improvements, work which is not improved in line with the feedback may be marked lower.

Project Scenario

You have been approached by a client who analyses atmospheric science and climate model data. They have developed a new analysis technique, but it takes too long to run for them to use it. They have asked you to investigate the use of big data techniques to reduce the processing time.

They have a large volume of data to process, and the analysis needs to be repeated frequently. They have the following basic requirements:
1.Current analysis time is approximately 2.5 hours to analyse the climate model output data for a 1-hour time period.
2.The data for a single day of model output is approximately 250MB. However, they have over 100 years’ worth of data to analyse, making a total of over 9TB.
3.Each day, they need to analyse the new data set for that day, so they wish to complete the analysis of the data for a 24-hour period (25 data sets) in under 2 hours.
4.It is not possible to hold all of this data in memory at one time, so the new process should load only 1 hour of data for processing at a time. If parallel processing is to occur, then 1 hour of data per worker can be loaded as needed.
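
The figures in these requirements can be turned into a rough target for the investigation. The following is a minimal sketch in Python (used here purely for illustration; the brief does not prescribe a language). It assumes every data set takes the same time to analyse, that a year has 365.25 days, and that 1TB is 1,000,000MB. The variable names are illustrative only.

```python
import math

# Figures taken from the client's requirements above.
HOURS_PER_DATA_SET = 2.5   # current analysis time for one 1-hour data set
DATA_SETS_PER_DAY = 25     # data sets making up one 24-hour period
TARGET_HOURS = 2.0         # required wall-clock time for one day's analysis
MB_PER_DAY = 250           # approximate size of one day of model output
YEARS = 100

# Sequential time for one day, ASSUMING every data set costs the same.
sequential_hours = HOURS_PER_DATA_SET * DATA_SETS_PER_DAY

# Speed-up needed to bring one day's analysis down to the target.
required_speedup = sequential_hours / TARGET_HOURS

# Lower bound on workers, ASSUMING perfect linear scaling (an idealisation).
min_workers_ideal = math.ceil(required_speedup)

# Archive size check, ASSUMING 365.25 days per year and 1 TB = 1,000,000 MB.
archive_tb = MB_PER_DAY * 365.25 * YEARS / 1_000_000

print(sequential_hours, required_speedup, min_workers_ideal, round(archive_tb, 2))
```

Under these assumptions the sequential run takes 62.5 hours, so a speed-up of a little over 31 is needed, and 32 workers is only an idealised lower bound. Note also that a single data set currently takes 2.5 hours, which is already longer than the 2-hour target, so sharing whole data sets between workers may not be sufficient unless the work within each hour can also be sped up or divided.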

You have been tasked with investigating the use of parallel processing to achieve the analysis speed required, with the following expectations:
1.Test and compare the processing speed of sequential and parallel processing
2.Extrapolate your findings to indicate the number of processors required to achieve the target processing time.
3.Test how your code responds to common errors, e.g. data that is text instead of numeric, use of NaN in the data as an error code.
4.Run automated tests that allow your client to set the tests running and return later to see the results, without user intervention.
The data has been provided by the European Centre for Medium-Range Weather Forecasts (ECMWF).
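
The first two expectations above can be sketched as follows. This is a hedged illustration only: `analyse_hour` is a hypothetical placeholder for the client's analysis code (which is not reproduced in this brief), and the extrapolation uses Amdahl's law as one possible model, which is an assumption rather than a method required by the brief. All function names are illustrative.

```python
import math
import time
from concurrent.futures import ProcessPoolExecutor


def analyse_hour(hour):
    """Hypothetical stand-in for the client's analysis of one hour of data.

    A real worker would load only this hour's data here, then analyse it.
    """
    total = 0
    for i in range(50_000):  # placeholder CPU-bound work
        total += (i * (hour + 1)) % 7
    return hour, total


def run_sequential(hours):
    """Process each hour in turn; return (results, elapsed seconds)."""
    start = time.perf_counter()
    results = [analyse_hour(h) for h in hours]
    return results, time.perf_counter() - start


def run_parallel(hours, workers, executor_cls=ProcessPoolExecutor):
    """Process hours on a pool of workers; return (results, elapsed seconds)."""
    start = time.perf_counter()
    with executor_cls(max_workers=workers) as pool:
        results = list(pool.map(analyse_hour, hours))
    return results, time.perf_counter() - start


def serial_fraction(t_one, t_n, n):
    """Serial fraction f implied by Amdahl's law, for n > 1 workers.

    Model: T(n) = T(1) * (f + (1 - f) / n).
    """
    return (t_n / t_one - 1 / n) / (1 - 1 / n)


def workers_needed(f, required_speedup):
    """Smallest worker count reaching the speed-up under Amdahl's law, or None."""
    headroom = 1 / required_speedup - f
    if headroom <= 0:
        return None  # the serial part alone already exceeds the time budget
    return math.ceil((1 - f) / headroom)
```

A typical use would time `run_sequential(range(25))`, then `run_parallel(range(25), workers)` for several worker counts, and pass the measured times to `serial_fraction` and `workers_needed`. When a process pool is used, these calls are best placed under an `if __name__ == "__main__":` guard, which platforms that start workers by spawning a new interpreter require.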

Project Deliverables

Your project should deliver the following:

1.Working code that demonstrates:

a.Loading of only the data required for the processing taking place

b.Sequential processing of the data

c.Parallel processing of the data

d.Plots of the comparisons between sequential processing and parallel processing with different numbers of workers

e.Automated testing of your code to deal with pre-defined data error types.

2.A formal project report for your client covering:
a.Comparisons between parallel and sequential data processing

b.Estimated number of processors required to achieve the goal of processing 24-hours of data in under 2 hours.

c.Testing the code to see how it deals with:
i.Text instead of numeric values
ii.NaN values indicating data errors.
iii.Note: it is not necessary to solve these problems to pass, but you should be able to suggest methods of dealing with these problems so code will not crash.

d.A summary of the evidence generated during your project and how it helps you arrive at your conclusions

e.Recommendations

f.References

g.Appendices containing:
i.Code flow charts
ii.Gantt chart for your project
iii.Logbook
iv.Specification items

3.VIVA / presentation. You will be expected to present your work in a formal presentation / VIVA. Details of this can be found in the VIVA assessment brief.
This assessment brief covers only parts 1 and 2. The assessment brief for part 3, VIVA, is found in a separate document.
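
Deliverables 1.e and 2.c ask for automated testing against text values and NaN values. One way this could be approached is sketched below, using only the Python standard library. The function names, the flat list representation of a data set, and the log file format are all assumptions made for illustration; they are not part of the provided code.

```python
import math


def classify(value):
    """Label one raw value as 'ok', 'nan', or 'text' without raising."""
    if isinstance(value, (str, bytes)):
        return "text"  # a number stored as text still counts as a storage error
    try:
        number = float(value)
    except (TypeError, ValueError):
        return "text"  # any other non-numeric value is counted with text
    return "nan" if math.isnan(number) else "ok"


def summarise(values):
    """Count how many values fall into each category."""
    counts = {"ok": 0, "nan": 0, "text": 0}
    for value in values:
        counts[classify(value)] += 1
    return counts


def run_unattended(data_sets, log_path):
    """Check every data set, logging results instead of stopping on errors.

    data_sets maps a (hypothetical) data set name to an iterable of values.
    """
    results = {}
    with open(log_path, "w", encoding="utf-8") as log:
        for name, values in data_sets.items():
            try:
                counts = summarise(values)
                clean = counts["nan"] == 0 and counts["text"] == 0
                status = "PASS" if clean else "FAIL"
            except Exception as exc:  # keep going so no intervention is needed
                counts, status = {}, f"ERROR {exc!r}"
            results[name] = (status, counts)
            log.write(f"{name}: {status} {counts}\n")
    return results
```

Because every outcome is written to the log and no exception is allowed to stop the loop, the run can be started and left, with the results read from the log file afterwards.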

Additional Information

1.You will be provided with NetCDF data files:
a.One complete, correct data file
b.One file containing instrument errors, recorded as NaN.
c.One file containing a data storage error, where the numerical values have been saved as text.

2.You are provided with code files for the analysis technique. You should not edit these files in any way. You are required to run the analysis, for timing purposes, but are not expected to analyse, display, report on, or deal with the results of the analysis in any way.

3.You are expected to define your project by means of a list of 5 SMART specification items. These should be included in an appendix.

4.You are expected to plan the work required for this project and provide a complete Gantt chart, including identifying the critical path. This should be included in an appendix.

5.This is a formal report and it is expected that appropriate formal grammar and language are to be used. Where this is not the case, a penalty of up to 10% may be applied to the marks for the report structure. For help with formal writing, please contact the Centre for Academic Writing.
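
Since the data is supplied as NetCDF files and only 1 hour of data may be loaded at a time, the loading step might look something like the sketch below. It assumes the third-party `netCDF4` Python package is available, that the variable's first dimension is time, and that there is one time step per hour. None of these is confirmed by this brief, and `path`, `var_name`, `read_step` and `load_hour` are hypothetical names.

```python
def read_step(dataset, var_name, hour):
    """Return one time step of a variable from an already open data set.

    ASSUMPTION: time is the first dimension, with one step per hour.
    Indexing a netCDF4 variable reads only the requested slice from disk.
    """
    return dataset.variables[var_name][hour]


def load_hour(path, var_name, hour):
    """Open a NetCDF file, read a single hour of one variable, and close it.

    ASSUMPTION: the third-party netCDF4 package is installed. It is imported
    here so that the rest of the module can be used without it.
    """
    from netCDF4 import Dataset

    with Dataset(path, "r") as ds:
        return read_step(ds, var_name, hour)


def iter_hours(path, var_name, n_hours):
    """Yield (hour, data) pairs one at a time, never holding a full day."""
    for hour in range(n_hours):
        yield hour, load_hour(path, var_name, hour)
```

Keeping `read_step` separate from the file handling means the slicing logic can be checked against a small in-memory stand-in, without needing the real data files.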

1.You are expected to use the Coventry University APA style for referencing. For support and advice on this, students can contact the Centre for Academic Writing (CAW).

2.Please notify your registry course support team and module leader for disability support.

3.Any student requiring an extension or deferral should follow the university process as outlined here.

4.The University cannot take responsibility for any coursework lost or corrupted on disks, laptops or personal computers. Students should therefore regularly back up any work and are advised to save it on the University system.

5.If there are technical or performance issues that prevent students submitting coursework through the online coursework submission system on the day of a coursework deadline, an appropriate extension to the coursework submission deadline will be agreed. This extension will normally be 24 hours or the next working day if the deadline falls on a Friday or over the weekend period. This will be communicated via your Module Leader.

6. You are encouraged to check the originality of your work by using the draft Turnitin links on Aula.

7.Collusion between students (where sections of your work are similar to the work submitted by other students in this or previous module cohorts) is taken extremely seriously and will be reported to the academic conduct panel. This applies to both coursework and exam answers.

8.A marked difference between your writing style, knowledge and skill level demonstrated in class discussion, any test conditions and that demonstrated in a coursework assignment may result in you having to undertake a Viva Voce in order to prove the coursework assignment is entirely your own work.

9.If you make use of the services of a proof reader in your work you must keep your original version and make it available as a demonstration of your written efforts.

10.You must not submit work for assessment that you have already submitted (partially or in full), either for your current course or for another qualification of this university, with the exception of resits, where for the coursework, you may be asked to rework and improve a previous attempt. This requirement will be specifically detailed in your assignment brief or specific course or module information. Where earlier work by you is citable, i.e. it has already been published/submitted, you must reference it clearly. Identical pieces of work submitted concurrently may also be considered to be self-plagiarism.