Data 102: Data, Inference, and Decisions Final Project

In this project, you will complete a guided analysis for a dataset of your choice. We have curated a
list of suggested datasets, but you are welcome to select an external dataset. Your analysis should
include the following steps:

1. Data Overview: Introduce your dataset and describe the process by which it was generated.

2. EDA: Perform exploratory data analysis (EDA) and describe key features of your dataset. You
may want to complete this step before deciding on the research questions below.

3. Research Questions: List two research questions that you will explore in this project. Between
the two research questions, you should use at least two of the following four techniques
that you’ve learned this semester. You may use more than two techniques, and you may use
more than one technique for any particular question.

• Binary decision-making and hypothesis testing,
• Bayesian hierarchical modeling,
• Causal inference,
• Comparing generalized linear models (GLMs) to nonparametric methods.

At least one of your techniques should be either Bayesian hierarchical modeling or causal
inference. Please see Section 1 for examples and clarification.

4. Inference and Decisions: Apply the two techniques you chose above to answer your research
questions, explaining your choices.

5. Conclusion: Highlight key findings, identify potential next steps, and assess the strengths and
limitations of your analysis.

1 Research Question Examples

Here are some examples of research questions on hypothetical datasets. Note that all of them
use either Bayesian hierarchical modeling or causal inference, all of them use at least two of the
techniques described, and all of them answer at least two research questions.

• If you were looking at a dataset of Data 102 students, you might choose as your research
questions (1) does attending office hours cause an improvement in homework grades (causal
inference), and (2) can we fit a Bayesian Gaussian mixture model to the distributions of
assignment grades by student year (Bayesian hierarchical modeling).

• If you were looking at a dataset involving jellybean consumption, acne, and other
demographics, you might choose as your research questions (1) does consuming different colors of
jellybeans cause acne (causal inference and multiple hypothesis testing), and (2) predicting
jellybean consumption from personal demographics, using negative binomial regression and
random forests (prediction with GLMs and nonparametric methods).

• If you were looking at a dataset involving characters in a TV show (lines of dialogue, gender,
age, etc.), you might choose as your research questions (1) how well does character demographic
information predict lines of dialogue for each season, using a Bayesian GLM and nonparametric
methods (Bayesian hierarchical modeling and prediction with GLMs and nonparametrics);
and (2) for each season of the show, is there a significant association between gender and lines
spoken (multiple hypothesis testing).
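
Several of the examples above involve multiple hypothesis testing. As a rough illustration, the sketch below applies one common correction, the Benjamini-Hochberg procedure, to a set of p-values. The project does not prescribe a particular correction, and the function name, the color labels, and the p-values are all hypothetical, made up only to show the mechanics.

```python
def benjamini_hochberg(p_values, alpha=0.05):
    """Return a list of booleans, True where the null hypothesis is rejected.

    Benjamini-Hochberg step-up procedure: sort the p-values, find the
    largest rank k with p_(k) <= (k / m) * alpha, and reject the k
    hypotheses with the smallest p-values.
    """
    m = len(p_values)
    # Indices of the hypotheses, ordered from smallest to largest p-value.
    order = sorted(range(m), key=lambda i: p_values[i])

    max_k = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= rank / m * alpha:
            max_k = rank  # keep the largest rank that passes its threshold

    rejected = [False] * m
    for idx in order[:max_k]:
        rejected[idx] = True
    return rejected


# Hypothetical p-values, one test per jellybean color (illustrative only).
p_by_color = {"red": 0.001, "green": 0.008, "blue": 0.039,
              "yellow": 0.041, "purple": 0.60}
decisions = benjamini_hochberg(list(p_by_color.values()), alpha=0.05)

for color, reject in zip(p_by_color, decisions):
    print(f"{color}: {'reject' if reject else 'fail to reject'}")
```

With these made-up numbers, only the two smallest p-values pass their thresholds. Note that the blue test would count as significant at 0.05 if it were considered on its own, which is the kind of difference your report should discuss when you run many tests at once.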

2 Section Guidelines

Your report should include each of the following sections and address the listed questions at a
minimum. You should add further relevant discussion to each section that is specific to the
features of your dataset.

Depending on your research questions, you should choose at least two of the corresponding sections
for options A through D.

2.1 Data Overview

• How were your data generated? Are they a sample or a census?
• If you chose to use your own data, describe the data source and download process.
• If you chose to add additional data sources, explain why.
• If your data represent a sample:

– Compare the distribution of one of your variables to what is expected in the population.
For example, if your data have an age variable, compare it to the age structure of the
population.

∗ Do you notice any differences?
∗ How does this affect the generalizability of your results?
• If your data represent a census:

– Are there any groups that were systematically excluded from your data?
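
For the sample case above, one way to compare a variable's distribution to the population is to line up the sample proportions against known population shares and summarize the gap with a chi-square goodness-of-fit statistic. The sketch below assumes a categorical age-group variable. The function name, the group labels, the sample, and the population shares are all hypothetical placeholders, and you would substitute figures from a real reference source such as a census table.

```python
from collections import Counter


def compare_to_population(sample_values, population_shares):
    """Compare sample proportions with population shares.

    Returns a list of (group, sample_share, population_share) rows and
    the chi-square goodness-of-fit statistic, sum((O - E)^2 / E).
    """
    n = len(sample_values)
    counts = Counter(sample_values)
    rows = []
    chi2 = 0.0
    for group, pop_share in population_shares.items():
        observed = counts.get(group, 0)
        expected = n * pop_share  # count expected if the sample matched the population
        rows.append((group, observed / n, pop_share))
        chi2 += (observed - expected) ** 2 / expected
    return rows, chi2


# Hypothetical sample of 100 respondents and made-up population shares.
sample_ages = ["18-29"] * 50 + ["30-49"] * 30 + ["50+"] * 20
population = {"18-29": 0.20, "30-49": 0.35, "50+": 0.45}

rows, chi2 = compare_to_population(sample_ages, population)
for group, sample_share, pop_share in rows:
    print(f"{group}: sample {sample_share:.2f} vs population {pop_share:.2f}")
print(f"chi-square statistic: {chi2:.2f}")
```

A large statistic relative to the number of groups suggests the sample over- or under-represents some groups. In this made-up case the youngest group is heavily over-represented, which would limit how far the results generalize.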