# Data Analysis Writing Service | Data 102: Data, Inference, and Decisions Final Project

This US writing-service assignment is a data analysis project.

Here are some examples of research questions on hypothetical datasets. Note that all of them use either Bayesian hierarchical modeling or causal inference, all of them use at least two of the techniques described, and all of them answer at least two research questions.

• If you were looking at a dataset of Data 102 students, you might choose as your research questions (1) does attending office hours cause an improvement in homework grades (causal inference), and (2) can we fit a Bayesian Gaussian mixture model to the distributions of assignment grades by student year (Bayesian hierarchical modeling).

• If you were looking at a dataset involving jellybean consumption, acne, and other demographics, you might choose as your research questions (1) does consuming different colors of jellybeans cause acne (causal inference and multiple hypothesis testing), and (2) can we predict jellybean consumption from personal demographics, using negative binomial regression and random forests (prediction with GLMs and nonparametric methods).

• If you were looking at a dataset involving characters in a TV show (lines of dialogue, gender, age, etc.), you might choose as your research questions (1) how well does character demographic information predict lines of dialogue for each season, using a Bayesian GLM and nonparametric methods (Bayesian hierarchical modeling and prediction with GLMs and nonparametrics); and (2) for each season of the show, is there a significant association between gender and lines spoken (multiple hypothesis testing).
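
The last example above tests one hypothesis per season, so the per-season p-values need a multiple-testing correction. One common choice is the Benjamini-Hochberg procedure, which controls the false discovery rate; the handout does not prescribe a specific correction, so treat this as a minimal sketch of one option. The function name `benjamini_hochberg` and the p-values below are hypothetical and made up purely for illustration.

```python
def benjamini_hochberg(p_values, alpha=0.05):
    """Return one boolean per hypothesis: True if rejected at FDR level alpha."""
    m = len(p_values)
    # Indices of the hypotheses, ordered from smallest to largest p-value.
    order = sorted(range(m), key=lambda i: p_values[i])
    # Step-up rule: find the largest rank k (1-based) with p_(k) <= k * alpha / m.
    max_k = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= rank * alpha / m:
            max_k = rank
    # Reject the max_k hypotheses with the smallest p-values.
    rejected = [False] * m
    for idx in order[:max_k]:
        rejected[idx] = True
    return rejected


# Hypothetical per-season p-values (illustrative only, not real results).
season_p_values = [0.001, 0.045, 0.025, 0.20, 0.008]
decisions = benjamini_hochberg(season_p_values, alpha=0.05)

for season, (p, reject) in enumerate(zip(season_p_values, decisions), start=1):
    print(f"Season {season}: p = {p:.3f}, reject null = {reject}")
```

Because the rule is step-up, a p-value that misses its own threshold can still be rejected when a larger p-value meets a later threshold, which is why the loop keeps scanning instead of stopping at the first failure.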

Your report should include each of the following sections and, at a minimum, address the listed questions. You should also add relevant discussion to each section that is specific to the features of your dataset.

Depending on your research questions, you should choose at least two of the corresponding sections for options A through D.

• How were your data generated? Is it a sample or census?

• If you chose to use your own data, describe the data source and download process.

• If you chose to add additional data sources, explain why.

• If your data represents a sample:
  – Compare the distribution of one of your variables to what is expected in the population. For example, if your data has an age variable, compare it to the age structure of the population.
    ∗ Do you notice any differences?
    ∗ How does this affect the generalizability of your results?

• If your data represents a census:
  – Are there any groups that were systematically excluded from your data?

• To what extent were participants aware of the collection/use of this data?

• What is the granularity of your data? What does each row represent? How will that impact the interpretation of your findings?

• Are any of the following concerns relevant in the context of your data?
  – Selection bias
  – Measurement error
  – Convenience sampling

• Are there important features/columns that you wish you had, but are unavailable? What are they and what questions would they help you answer?

• Your research questions should involve using the methods mentioned above (i.e., the ones you learned in Data 102) to answer them. For each research question, describe:
  – What is your first research question? What real-world decision(s) could be made by answering it?
  – Explain why the method you will use is a good fit for the question (for example, if you choose causal inference, you should explain why causal inference is a good fit for answering your research question).
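
For the sample-versus-population comparison requested earlier (for example, comparing an age variable to the population's age structure), a small sketch like the following may help. It tabulates the sample share of each category next to an assumed population share and summarizes the gap with the total variation distance. The function names, age groups, and population shares are all hypothetical and chosen only for illustration; real shares would come from a reference source such as a census table.

```python
from collections import Counter


def compare_to_population(sample_values, population_shares):
    """Return {category: (sample_share, population_share, difference)}.

    Categories in the sample that are missing from population_shares
    are ignored by this sketch.
    """
    counts = Counter(sample_values)
    n = len(sample_values)
    comparison = {}
    for category, pop_share in population_shares.items():
        sample_share = counts.get(category, 0) / n
        comparison[category] = (sample_share, pop_share, sample_share - pop_share)
    return comparison


def total_variation_distance(comparison):
    """Half the sum of absolute share differences (0 means identical)."""
    return 0.5 * sum(abs(diff) for _, _, diff in comparison.values())


# Hypothetical data: age groups in the sample and assumed population shares.
sample_ages = ["18-29"] * 50 + ["30-49"] * 30 + ["50+"] * 20
population_shares = {"18-29": 0.20, "30-49": 0.35, "50+": 0.45}

comparison = compare_to_population(sample_ages, population_shares)
for group, (s, p, d) in comparison.items():
    print(f"{group}: sample {s:.2f}, population {p:.2f}, difference {d:+.2f}")
print(f"Total variation distance: {total_variation_distance(comparison):.2f}")
```

A large gap for some group (here the made-up sample over-represents the youngest group) is the kind of difference the report should discuss when assessing how well the results generalize.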