# Data 102: Data, Inference, and Decisions Final Project


In this project, you will complete a guided analysis of a dataset of your choice. We have curated a list of suggested datasets, but you are welcome to select an external dataset. Your analysis should include the following steps:

1. Data Overview: Introduce your dataset and describe the process by which it was generated.

2. EDA: Perform exploratory data analysis (EDA) and describe key features of your dataset. You may want to complete this step before deciding on the research questions below.

3. Research Questions: List two research questions that you will explore in this project. Between the two research questions, you should use at least two of the following four techniques that you’ve learned this semester. You may use more than two techniques, and you may use more than one technique for any particular question.

• Binary decision-making and hypothesis testing,

• Bayesian hierarchical modeling,

• Causal inference,

• Comparing generalized linear models (GLMs) to nonparametric methods.

At least one of your techniques should be either Bayesian hierarchical modeling or causal inference. Please see Section 1 for examples and clarification.

4. Inference and Decisions: Apply the two techniques you chose above to answer your research questions, explaining your choices.

5. Conclusion: Highlight key findings, identify potential next steps, and assess the strengths and limitations of your analysis.
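To make the hypothesis testing technique in step 3 more concrete, here is a minimal sketch of the Benjamini-Hochberg procedure, one common way to control the false discovery rate when a research question involves many tests at once. It is an illustration only, not a required method: the function name and the p-values are hypothetical, and it assumes the p-values have already been computed by whatever test suits your data.

```python
def benjamini_hochberg(p_values, alpha=0.05):
    """Return one boolean per hypothesis: True where the null is rejected.

    Implements the Benjamini-Hochberg step-up rule: find the largest
    rank k such that the k-th smallest p-value is <= (k / m) * alpha,
    then reject the k hypotheses with the smallest p-values.
    """
    m = len(p_values)
    # Indices of the hypotheses, ordered from smallest to largest p-value.
    order = sorted(range(m), key=lambda i: p_values[i])

    max_rank = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= rank / m * alpha:
            max_rank = rank

    rejected = [False] * m
    for idx in order[:max_rank]:
        rejected[idx] = True
    return rejected


# Hypothetical p-values, e.g. one test per group in your dataset.
example_p_values = [0.001, 0.012, 0.030, 0.041, 0.20, 0.74]
decisions = benjamini_hochberg(example_p_values, alpha=0.05)
```

With these made-up values, only the two smallest p-values fall under their rank-based thresholds, so only those two hypotheses are rejected. A naive cutoff of 0.05 on each test would have rejected four.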

**1 Research Question Examples**

Here are some examples of research questions on hypothetical datasets. Note that all of them use either Bayesian hierarchical modeling or causal inference, all of them use at least two of the techniques described, and all of them answer at least two research questions.

• If you were looking at a dataset of Data 102 students, you might choose as your research questions (1) does attending office hours cause an improvement in homework grades (causal inference), and (2) can we fit a Bayesian Gaussian mixture model to the distributions of assignment grades by student year (Bayesian hierarchical modeling).

• If you were looking at a dataset involving jellybean consumption, acne, and other demographics, you might choose as your research questions (1) does consuming different colors of jellybeans cause acne (causal inference and multiple hypothesis testing), and (2) predicting jellybean consumption from personal demographics, using negative binomial regression and random forests (prediction with GLMs and nonparametric methods).

• If you were looking at a dataset involving characters in a TV show (lines of dialogue, gender, age, etc.), you might choose as your research questions (1) how well does character demographic information predict lines of dialogue for each season, using a Bayesian GLM and nonparametric methods (Bayesian hierarchical modeling and prediction with GLMs and nonparametrics); and (2) for each season of the show, is there a significant association between gender and lines spoken (multiple hypothesis testing).
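As a rough illustration of why a causal question like the office hours example needs more than a simple comparison of group averages, the sketch below contrasts a naive difference in means with a stratified estimate that adjusts for a single confounder. Everything here is hypothetical: the data are invented, the confounder (a prior GPA stratum) is assumed to be the only one, and every stratum is assumed to contain both attendees and non-attendees. A real analysis would need to justify those assumptions.

```python
from collections import defaultdict


def mean(values):
    return sum(values) / len(values)


def naive_difference(rows):
    """Difference in mean outcome between treated and untreated rows."""
    treated = [y for t, _, y in rows if t == 1]
    control = [y for t, _, y in rows if t == 0]
    return mean(treated) - mean(control)


def stratified_effect(rows):
    """Average of within-stratum differences, weighted by stratum size.

    rows: iterable of (treatment, stratum, outcome) tuples.
    Assumes the stratum captures all confounding and that each stratum
    has at least one treated and one untreated row.
    """
    strata = defaultdict(lambda: {0: [], 1: []})
    for treatment, stratum, outcome in rows:
        strata[stratum][treatment].append(outcome)

    n = len(rows)
    effect = 0.0
    for groups in strata.values():
        size = len(groups[0]) + len(groups[1])
        effect += size / n * (mean(groups[1]) - mean(groups[0]))
    return effect


# Invented data: (attended office hours, prior GPA stratum, homework grade).
students = [
    (1, "high", 92), (1, "high", 94), (1, "high", 96), (0, "high", 90),
    (1, "low", 74), (0, "low", 68), (0, "low", 70), (0, "low", 72),
]

naive = naive_difference(students)
adjusted = stratified_effect(students)
```

In this invented dataset, students with a high prior GPA attend office hours more often, so the naive comparison (14 points) overstates the within-stratum difference (4 points).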

**2 Section Guidelines**

Your report should include each of the following sections and, at minimum, address the listed questions. You should also add relevant discussion to each section that is specific to the features of your dataset.

Depending on your research questions, you should choose at least two of the corresponding sections for options A through D.

**2.1 Data Overview**

• How were your data generated? Are they a sample or a census?

• If you chose to use your own data, describe the data source and download process.

• If you chose to add additional data sources, explain why.

• If your data represents a sample:

– Compare the distribution of one of your variables to what is expected in the population. For example, if your data has an age variable, compare it to the age structure of the population.

∗ Do you notice any differences?

∗ How does this affect the generalizability of your results?

• If your data represents a census:

– Are there any groups that were systematically excluded from your data?
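For the sample case above, one possible way to compare a variable's distribution with the population is a chi-square goodness-of-fit statistic. The sketch below is only an illustration under stated assumptions: the age groups, sample counts, and population shares are all invented, and the statistic is compared with the standard 5% critical value for 3 degrees of freedom (7.815) rather than converted to a p-value.

```python
def chi_square_statistic(observed_counts, population_shares):
    """Chi-square goodness-of-fit statistic.

    observed_counts: counts per category in the sample.
    population_shares: expected proportion per category (should sum to 1).
    """
    total = sum(observed_counts)
    statistic = 0.0
    for observed, share in zip(observed_counts, population_shares):
        expected = total * share
        statistic += (observed - expected) ** 2 / expected
    return statistic


# Invented example: age structure of a sample versus the population.
age_groups = ["18-29", "30-44", "45-64", "65+"]
sample_counts = [120, 90, 60, 30]
population_shares = [0.20, 0.25, 0.35, 0.20]

statistic = chi_square_statistic(sample_counts, population_shares)

# 0.95 quantile of the chi-square distribution with
# len(age_groups) - 1 = 3 degrees of freedom.
CRITICAL_VALUE_DF3 = 7.815
sample_differs = statistic > CRITICAL_VALUE_DF3
```

A statistic well above the critical value, as with these made-up counts, would suggest the sample over-represents some groups, which is worth discussing when you assess generalizability.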