2023年12月3日

Data Analysis Ghostwriting | Project Statement for Milestone 3

This US ghostwriting order is a data analysis and visualization assignment.

In this milestone, you need to finish all required functions for the candidate project you chose
except the visualization part. You need to submit the printed version of your Jupyter notebook.
This notebook should show the code of each required function and the output of the code. This
milestone is worth 10% of your overall grade.

To print your notebook as a PDF, right-click the notebook and choose “print” -> “saveAsPDF”. If
you cannot save it as a PDF, you can take screenshots and paste them into a Word document.

Project 1: YouTube Analyzer (10%)

Video search (6%). You must use PySpark for the following functions; Pandas and other
Python libraries are not allowed.

– (2%) Categorized statistics: frequency histogram of videos partitioned by a search
condition: categorization, size of videos, view count, etc. For example, count of videos
per category.

– (2%) Top-k queries: (1) find the top k categories in which the most videos are
uploaded; (2) find the top k most viewed videos.

– (2%) Range queries: (1) find all videos in categories X with duration within a range [t1,
t2]; (2) find all videos with size in range [x,y].

Graph analytics (4%): You must use Spark GraphX or GraphFrame

Hint: you can use the PySpark “explode” function to create a proper edge DataFrame for
GraphFrame:
https://stackoverflow.com/questions/40099706/splitting-a-row-in-a-pyspark-dataframe-into-multiple-rows

– (2%) Network aggregation: report the following statistics of the YouTube video network: (1)
in-degree and out-degree of each video; (2) average, maximum, and minimum degree
across the video dataset.

– (2%) Top-k influence analysis: Use the PageRank algorithm over the YouTube network to
compute the scores efficiently. Intuitively, a video with a high PageRank score is
related to many videos in the graph and thus has high influence. Find the top k
most influential videos in the YouTube network. Choose the initial values and number of
iterations as you see fit.
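
To make the "initial values and number of iterations" choice concrete, here is a small pure-Python reference PageRank for sanity-checking on toy graphs. It is not the required Spark implementation: with the graphframes package, the equivalent call would be `g.pageRank(resetProbability=0.15, maxIter=...)`, where a reset probability of 0.15 corresponds to the damping factor of 0.85 used below. The uniform starting rank, the default of 20 iterations, and the function names are all choices made for this sketch. GraphFrame's scores may be scaled differently, so compare rankings rather than raw values:

```python
def pagerank(edges, damping=0.85, iterations=20):
    """Power-iteration PageRank over a list of (src, dst) pairs."""
    nodes = sorted({n for edge in edges for n in edge})
    out_links = {n: [] for n in nodes}
    for src, dst in edges:
        out_links[src].append(dst)

    n = len(nodes)
    # Initial values: every video starts with the same share of rank.
    rank = {v: 1.0 / n for v in nodes}

    for _ in range(iterations):
        # Rank held by videos with no outgoing links is spread evenly.
        dangling = sum(rank[v] for v in nodes if not out_links[v])
        new_rank = {v: (1 - damping) / n + damping * dangling / n for v in nodes}
        for v in nodes:
            for dst in out_links[v]:
                # Each video passes its rank on in equal parts to its targets.
                new_rank[dst] += damping * rank[v] / len(out_links[v])
        rank = new_rank
    return rank


def top_k(rank, k):
    """Return the k (video, score) pairs with the highest scores."""
    return sorted(rank.items(), key=lambda kv: (-kv[1], kv[0]))[:k]


# Hypothetical toy network: three videos point at c, and c points back at a.
toy_edges = [("a", "c"), ("b", "c"), ("d", "c"), ("c", "a")]
ranks = pagerank(toy_edges)
print(top_k(ranks, 2))
```

More iterations bring the scores closer to their fixed point; on large graphs a convergence tolerance is a common alternative to a fixed iteration count.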
