Data 102: Data, Inference, and Decisions Final Project

Here are some examples of research questions on hypothetical datasets. Note that all of them
use either Bayesian hierarchical modeling or causal inference, all of them use at least two of the
techniques described, and all of them answer at least two research questions.

• If you were looking at a dataset of Data 102 students, you might choose as your research
questions (1) does attending office hours cause an improvement in homework grades (causal
inference), and (2) can we fit a Bayesian Gaussian mixture model to the distributions of
assignment grades by student year (Bayesian hierarchical modeling).

• If you were looking at a dataset involving jellybean consumption, acne, and other
demographics, you might choose as your research questions (1) does consuming different colors
of jellybeans cause acne (causal inference and multiple hypothesis testing), and (2) how well
can personal demographics predict jellybean consumption, using negative binomial regression
and random forests (prediction with GLMs and nonparametric methods).

• If you were looking at a dataset involving characters in a TV show (lines of dialogue, gender,
age, etc.), you might choose as your research questions (1) how well does character demographic
information predict lines of dialogue for each season, using a Bayesian GLM and nonparametric
methods (Bayesian hierarchical modeling and prediction with GLMs and nonparametrics);
and (2) for each season of the show, is there a significant association between gender and lines
spoken (multiple hypothesis testing).

Your report should include each of the following sections and, at minimum, address the listed
questions. You should also include additional relevant discussion in each section that is
specific to the features of your dataset.

Depending on your research questions, choose at least two of the corresponding sections from
options A through D.

• How were your data generated? Is it a sample or a census?
• If you chose to use your own data, describe the data source and download process.
• If you chose to add additional data sources, explain why.
• If your data represents a sample:
  – Compare the distribution of one of your variables to what is expected in the population.
    For example, if your data has an age variable, compare it to the age structure of the
    population.
    ∗ Do you notice any differences?
    ∗ How does this affect the generalizability of your results?

• If your data represents a census:
  – Are there any groups that were systematically excluded from your data?

• To what extent were participants aware of the collection/use of this data?

• What is the granularity of your data? What does each row represent? How will that impact
the interpretation of your findings?

• Are any of the following concerns relevant in the context of your data?
  – Selection bias
  – Measurement error
  – Convenience sampling

• Are there important features/columns that you wish you had, but are unavailable? What are
they and what questions would they help you answer?
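One way to make the sample-versus-population comparison concrete is to compare category proportions with the total variation distance (TVD) and use simulation to judge whether the gap is larger than random sampling alone would produce. This is a minimal sketch, assuming a categorical age-group variable and known population proportions; the age bins, the proportions, the counts, and the helper names (`proportions`, `total_variation_distance`, `simulated_p_value`) are all hypothetical, not census figures or a prescribed course method.

```python
import random
from typing import Dict


def proportions(counts: Dict[str, int]) -> Dict[str, float]:
    """Convert category counts to proportions."""
    total = sum(counts.values())
    return {k: v / total for k, v in counts.items()}


def total_variation_distance(p: Dict[str, float], q: Dict[str, float]) -> float:
    """Half the summed absolute differences; assumes p and q share the same categories."""
    return 0.5 * sum(abs(p[k] - q[k]) for k in p)


def simulated_p_value(sample_counts: Dict[str, int],
                      population_props: Dict[str, float],
                      n_sims: int = 2000,
                      seed: int = 0) -> float:
    """Share of simulated random samples whose TVD from the population is at least the observed TVD."""
    rng = random.Random(seed)
    categories = list(population_props)
    weights = [population_props[c] for c in categories]
    n = sum(sample_counts.values())
    observed = total_variation_distance(proportions(sample_counts), population_props)
    at_least_as_extreme = 0
    for _ in range(n_sims):
        # Draw a sample of the same size from the population distribution.
        sim_counts = {c: 0 for c in categories}
        for c in rng.choices(categories, weights=weights, k=n):
            sim_counts[c] += 1
        if total_variation_distance(proportions(sim_counts), population_props) >= observed:
            at_least_as_extreme += 1
    return at_least_as_extreme / n_sims


if __name__ == "__main__":
    # Hypothetical population age structure and hypothetical sample counts.
    population = {"18-29": 0.20, "30-49": 0.33, "50-64": 0.25, "65+": 0.22}
    sample = {"18-29": 120, "30-49": 90, "50-64": 50, "65+": 40}
    print("observed TVD:", round(total_variation_distance(proportions(sample), population), 3))
    print("simulated p-value:", simulated_p_value(sample, population))
```

For these made-up counts the observed TVD is 0.2, far larger than what random samples of 300 from the stated population typically produce, so the simulated p-value should be at or near zero; in a real report that would suggest the younger age group is over-represented and results may not generalize.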

• Your research questions should be answerable with the methods mentioned above (i.e., the ones
you learned in Data 102). For each research question, describe:

  – What is the research question? What real-world decision(s) could be made by answering it?
  – Explain why the method you will use is a good fit for the question (for example, if you
    choose causal inference, you should explain why causal inference is a good fit for answering
    your research question).
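When causal inference is the chosen method, the justification usually rests on identification assumptions, such as all confounders being observed and every group having some chance of receiving the treatment (positivity). As an illustration only, the sketch below simulates the office-hours example with a single binary confounder (a hypothetical "strong prior preparation" flag `z`) and estimates the average treatment effect by inverse propensity weighting (IPW), with propensity scores estimated as the attendance rate within each stratum of `z`. The data-generating process, the assumed true effect of 5 points, and all function names are made up for the example.

```python
import random
from typing import Dict, List, Tuple

# (z, t, y): confounder, treatment, outcome for one hypothetical student.
Record = Tuple[int, int, float]


def simulate(n: int, true_effect: float = 5.0, seed: int = 0) -> List[Record]:
    """Simulate a confounded office-hours study (all numbers are made up)."""
    rng = random.Random(seed)
    data: List[Record] = []
    for _ in range(n):
        # z = 1: hypothetical strong prior preparation, present in half of students.
        z = 1 if rng.random() < 0.5 else 0
        # Prepared students attend office hours more often (t = 1 means attended).
        t = 1 if rng.random() < (0.7 if z else 0.3) else 0
        # Homework grade: baseline 70, +10 for preparation, +true_effect for attending.
        y = 70 + 10 * z + true_effect * t + rng.gauss(0, 5)
        data.append((z, t, y))
    return data


def naive_difference(data: List[Record]) -> float:
    """Difference in mean grades between attendees and non-attendees (ignores z)."""
    treated = [y for _, t, y in data if t == 1]
    control = [y for _, t, y in data if t == 0]
    return sum(treated) / len(treated) - sum(control) / len(control)


def ipw_ate(data: List[Record]) -> float:
    """Inverse-propensity-weighted estimate of the average treatment effect.

    The propensity e(z) = P(t = 1 | z) is estimated as the attendance rate in each
    z stratum; this requires 0 < e(z) < 1 in every stratum (positivity).
    """
    propensity: Dict[int, float] = {}
    for z_val in {z for z, _, _ in data}:
        stratum = [t for z, t, _ in data if z == z_val]
        propensity[z_val] = sum(stratum) / len(stratum)
    n = len(data)
    treated_mean = sum(t * y / propensity[z] for z, t, y in data) / n
    control_mean = sum((1 - t) * y / (1 - propensity[z]) for z, t, y in data) / n
    return treated_mean - control_mean


if __name__ == "__main__":
    students = simulate(20_000)
    print("naive difference:", round(naive_difference(students), 2))
    print("IPW estimate:   ", round(ipw_ate(students), 2))
```

Under these assumptions the naive difference should come out near 9 = 5 + 10 × (0.7 − 0.3), because attendees are disproportionately well prepared, while the IPW estimate should land close to the assumed effect of 5. With one discrete confounder and stratum-level propensities, this IPW estimate coincides with standardizing within strata; real datasets with many covariates usually call for a fitted propensity model (for example, logistic regression) and a check of covariate overlap.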