The Data Scientist's Toolbox Quiz 3 (JHU) Coursera

GitHub repo for the rest of the specialization: Data Science Coursera

Question 1

We take a random sample of individuals in a population and identify whether they smoke and if they have cancer. We observe that there is a strong relationship between whether a person in the sample smoked or not and whether they have lung cancer. We claim that smoking is related to lung cancer in the larger population. We explain that we think the reason for this relationship is that cigarette smoke contains known carcinogens such as arsenic and benzene, which make cells in the lungs become cancerous.

This is an example of a causal data analysis.
This is an example of an inferential data analysis.
This is an example of a descriptive data analysis.
This is an example of a predictive data analysis.

Answer:
This is an example of an inferential data analysis.
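The analysis is inferential because it uses a sample to make a claim about the larger population. A minimal sketch of how that generalization is usually quantified: a difference in cancer rates between smokers and non-smokers with a normal-approximation (Wald) confidence interval. The counts and the function name `diff_in_proportions_ci` are hypothetical, invented for illustration; an interval that excludes zero supports an association in the population, not causation.

```python
import math


def diff_in_proportions_ci(cases_a, n_a, cases_b, n_b, z=1.96):
    """Difference in proportions (a minus b) with a normal-approximation
    (Wald) confidence interval; z=1.96 gives roughly 95% coverage."""
    p_a = cases_a / n_a
    p_b = cases_b / n_b
    diff = p_a - p_b
    # Standard error of the difference between two independent proportions
    se = math.sqrt(p_a * (1 - p_a) / n_a + p_b * (1 - p_b) / n_b)
    return diff, (diff - z * se, diff + z * se)


# Hypothetical sample counts (invented for illustration, not real study data)
diff, (low, high) = diff_in_proportions_ci(cases_a=30, n_a=300,   # smokers
                                           cases_b=14, n_b=700)   # non-smokers
print(f"difference in cancer rate: {diff:.3f}, 95% CI ({low:.3f}, {high:.3f})")
```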

Question 2

What is the most important thing in Data Science?

The data.
Hacking skills.
Knowing Hadoop and Pig.
Working with large data sets.
The question you are trying to answer.

Answer:
The question you are trying to answer.

Question 3

If the goal of a study was to relate Martha Stewart Living Subscribers to Our Site's Users based on the number of people that lived in each region of the US, what would be the potential problem?

There would be confounding because the number of people that live in an area is related to both Martha Stewart Living Subscribers and Our Site's Users.
We couldn't be sure whether subscribing to Martha Stewart Living causes people to be Users of Our Site or the other way around.
We would be performing inference on the relationship between Martha Stewart Living Subscribers and Our Site's Users.
We wouldn't know the sensitivity of our predictions.

Answer:
There would be confounding because the number of people that live in an area is related to both Martha Stewart Living Subscribers and Our Site's Users.
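To see why population confounds the comparison, here is a small simulation sketch with entirely hypothetical regions. Both counts are generated as population times a per-person rate, and the two rates are drawn independently, so there is no real link between subscribing and using the site. Raw regional counts still correlate strongly because both grow with population; dividing by population (per-capita rates) removes that shared driver. The rate ranges, region count, and helper names are assumptions for illustration only.

```python
import random


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5


def simulate_regions(n_regions=500, seed=0):
    """Hypothetical regions: both counts scale with population,
    but the per-person rates are drawn independently of each other."""
    rng = random.Random(seed)
    population, subscribers, users = [], [], []
    for _ in range(n_regions):
        pop = rng.uniform(10_000, 5_000_000)
        sub_rate = rng.uniform(0.018, 0.022)   # assumed subscription rate
        user_rate = rng.uniform(0.09, 0.11)    # assumed site-usage rate, independent of sub_rate
        population.append(pop)
        subscribers.append(pop * sub_rate)
        users.append(pop * user_rate)
    return population, subscribers, users


population, subscribers, users = simulate_regions()
raw_r = pearson(subscribers, users)
per_capita_r = pearson(
    [s / p for s, p in zip(subscribers, population)],
    [u / p for u, p in zip(users, population)],
)
print(f"raw counts r = {raw_r:.2f}")
print(f"per-capita r = {per_capita_r:.2f}")
```

Normalizing by population is only one way to adjust for this particular confounder; other confounders would need their own adjustment.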

Question 4

What is an experimental design tool that can be used to address variables that may be confounders at the design phase of an experiment?

Using regression models.
Fixing variables.
Data cleaning.
Using data from a database.

Possible Answers:
Fixing variables.
Stratifying variables.
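Fixing a variable means holding it at one level for the whole study (for example, enrolling only non-smokers). Stratifying means splitting subjects by the variable and randomizing within each level, so the variable ends up balanced across the arms. A minimal sketch of stratified randomization, assuming smoking status as a hypothetical confounder; the subjects and the function name `stratified_assignment` are invented for illustration.

```python
import random
from collections import defaultdict


def stratified_assignment(subjects, stratum_of, seed=0):
    """Randomize subjects to treatment or control separately within each
    stratum, so the stratifying variable is balanced across the two arms.
    With an odd-sized stratum, control receives the extra subject."""
    rng = random.Random(seed)
    strata = defaultdict(list)
    for subject in subjects:
        strata[stratum_of(subject)].append(subject)

    assignment = {}
    for members in strata.values():
        shuffled = list(members)
        rng.shuffle(shuffled)
        half = len(shuffled) // 2
        for subject in shuffled[:half]:
            assignment[subject] = "treatment"
        for subject in shuffled[half:]:
            assignment[subject] = "control"
    return assignment


# Hypothetical subjects as (id, is_smoker): 8 smokers and 12 non-smokers
subjects = [(i, i < 8) for i in range(20)]
assignment = stratified_assignment(subjects, stratum_of=lambda s: s[1])

for smoker in (True, False):
    arm_counts = defaultdict(int)
    for subject, arm in assignment.items():
        if subject[1] == smoker:
            arm_counts[arm] += 1
    print("smokers" if smoker else "non-smokers", dict(arm_counts))
```

Each arm receives 4 smokers and 6 non-smokers regardless of the seed, so smoking status cannot differ systematically between treatment and control.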

Question 5

What is the reason behind the explosion of interest in big data?

We recently discovered ways to use data to answer scientific and business questions.
We recently discovered ways to use data to make predictions.
We have better experimental design now than previously.
The price and difficulty of collecting and storing data have dropped dramatically.

Answer:
The price and difficulty of collecting and storing data have dropped dramatically.

mGalarnyk/quiz3.md