CNN vs. The Onion
CNN (the Cable News Network) is widely considered a reputable news source. The Onion, on the other hand, is (according to Wikipedia) “an American news satire organization. It is an entertainment newspaper and a website featuring satirical articles reporting on international, national, and local news.” Put another way, The Onion is “fake news” for entertainment purposes.
In this exercise you will assess your ability to distinguish real news stories published on cnn.com from fake news stories published on theonion.com.
Each of you will take a quiz consisting of 15 questions. Each question has the same possible answers: CNN or The Onion.
Learning Objectives
- Generate Binomial data
- Calculate the probability mass function and cumulative distribution function using dbinom and pbinom
- Generate random data from the Binomial distribution using rbinom
Pre-Activity: Introducing dbinom, pbinom, and rbinom
Bernoulli\((p)\) and Binomial\((n,p)\)
dbinom computes the probability mass function (pmf) of a Binomial\((n,p)\) random variable \(X\), \(f(k) = P(X = k)\), for \(k = 0, 1, \ldots, n\).
- Arguments:
  - x: the value of \(k\) in \(f(k)\)
  - size: the parameter \(n\), the number of trials
  - prob: the parameter \(p\), the probability of success
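As a minimal sketch of how dbinom is called, the snippet below uses illustrative values (10 trials, success probability 0.3) that are assumptions chosen for this example and do not come from the activity. The variable names are arbitrary.

```r
# P(X = 3) for X ~ Binomial(n = 10, p = 0.3); the values are illustrative only
p_three <- dbinom(x = 3, size = 10, prob = 0.3)
p_three

# dbinom is vectorized in x, so this gives the pmf at every k = 0, 1, ..., 10
pmf <- dbinom(x = 0:10, size = 10, prob = 0.3)
sum(pmf)  # a pmf sums to 1, up to floating-point rounding
```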
pbinom computes the cumulative distribution function (cdf) \(F(x) = P(X \le x)\).
- Arguments:
  - q: the value of \(x\) in \(F(x)\)
  - size: the parameter \(n\), the number of trials
  - prob: the parameter \(p\), the probability of success
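A short sketch of pbinom, again with illustrative values (10 trials, success probability 0.3) that are assumptions for this example. It also shows that the cdf is the running sum of the pmf, and two equivalent ways to get an upper-tail probability.

```r
# P(X <= 3) for X ~ Binomial(n = 10, p = 0.3); the values are illustrative only
cdf_at_3 <- pbinom(q = 3, size = 10, prob = 0.3)

# The cdf at 3 is the sum of the pmf over k = 0, 1, 2, 3
pmf_sum <- sum(dbinom(x = 0:3, size = 10, prob = 0.3))
all.equal(cdf_at_3, pmf_sum)

# Upper-tail probability P(X > 3), computed two equivalent ways
upper_a <- 1 - cdf_at_3
upper_b <- pbinom(q = 3, size = 10, prob = 0.3, lower.tail = FALSE)
all.equal(upper_a, upper_b)
```

Note that pbinom includes the endpoint: it returns \(P(X \le x)\), not \(P(X < x)\).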
rbinom generates a sample (random numbers) from the Binomial\((n,p)\) distribution.
- Arguments:
  - n: the sample size
  - size: the parameter \(n\), the number of trials
  - prob: the parameter \(p\), the probability of success
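A small sketch of rbinom with illustrative values (1000 draws, 10 trials each, success probability 0.3); the seed and all numbers are assumptions chosen for this example. The main thing to watch is that the argument n is the number of draws, while size is the number of trials per draw.

```r
set.seed(1)  # arbitrary seed so the draws are reproducible

# 1000 draws from Binomial(n = 10, p = 0.3):
# n is the number of draws, size is the number of trials behind each draw
draws <- rbinom(n = 1000, size = 10, prob = 0.3)

head(draws)   # first few simulated counts
mean(draws)   # should be close to size * prob = 3
table(draws)  # how often each count occurred
```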
Explore the binomial distribution in the following exercises. What do you observe? When is the binomial distribution skewed vs symmetric?
- What happens if you change the number of trials n?
- What happens if you change prob to 0.05?
- What happens if you change prob to 0.95?
- Explore some other parameters of your choosing using the widget below:
You can also access the widget through Carnegie Mellon University’s Integrated Statistics Learning Environment.
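If the widget is unavailable, the same exploration can be sketched in base R. The number of trials (15) and the three values of prob below are assumptions chosen for illustration; change them to try other settings.

```r
size <- 15                   # number of trials (an illustrative choice)
probs <- c(0.05, 0.5, 0.95)  # success probabilities to compare

# Draw one bar chart of the pmf per value of prob, side by side
old_par <- par(mfrow = c(1, length(probs)))
for (p in probs) {
  barplot(dbinom(0:size, size = size, prob = p),
          names.arg = 0:size,
          main = paste("prob =", p),
          xlab = "k", ylab = "P(X = k)")
}
par(old_par)  # restore the previous plotting layout
```

Rerunning the loop with a different value of size shows how the shape changes with the number of trials.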
Activity
Let X = the number of questions you answer correctly on the CNN vs. The Onion quiz.
- Does X follow a binomial distribution? What assumptions must you make if you want to treat X as a binomial count?
  - The number of observations n is fixed.
  - Each observation is independent.
  - Each observation represents one of two outcomes: “success” or “failure”.
  - The probability of “success” p is the same for each observation.
Let p denote the proportion of questions that you answer correctly. Make a prediction – what do you think your value of p is?
Take the Quiz. Make sure you record the number that you answer correctly.
You can take the quiz through our Google Form.
Suppose that instead of thinking about each question and answering to the best of your ability, you randomly guessed answers (e.g. you flipped a coin – heads = CNN, tails = The Onion). Under this scenario, what would you expect p to be? What would you expect X to be?
Suppose you chose to take the quiz using the ‘random guessing’ strategy. A passing grade on most exams is a score of at least 60%. Using this rubric, you would need to answer at least 9 of the 15 questions correctly in order to pass. Use the binomial formula to calculate the probability of getting exactly 9 questions right.
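For reference, here is the binomial formula for a Binomial\((n,p)\) random variable \(X\), followed by the same expression with this scenario's values substituted in: \(n = 15\) questions, \(k = 9\) correct, and \(p = 0.5\), which assumes the fair coin flip described above. The final arithmetic is left to you.

\[
P(X = k) = \binom{n}{k} p^{k} (1-p)^{n-k}
\qquad\Longrightarrow\qquad
P(X = 9) = \binom{15}{9} (0.5)^{9} (0.5)^{6}
\]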
Use the appropriate function in R to calculate the probability of getting exactly 9 questions right.
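One way to check a hand calculation is to compute the same quantity twice in R, once from the binomial formula with choose and once with dbinom. This is a sketch that assumes \(p = 0.5\) for random guessing; the variable names are arbitrary.

```r
n <- 15   # questions on the quiz
k <- 9    # number answered correctly
p <- 0.5  # assumed success probability under random guessing

# Binomial formula written out: choose(n, k) * p^k * (1 - p)^(n - k)
by_formula <- choose(n, k) * p^k * (1 - p)^(n - k)

# The same probability from the built-in pmf
by_dbinom <- dbinom(x = k, size = n, prob = p)

all.equal(by_formula, by_dbinom)  # the two approaches should agree
```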
- Assuming ‘random guessing’, what is the probability of getting at least 12 questions right? What is the probability of getting more than half (i.e., 8 or more) right? Write your code in R.
- Using data from everyone in class, create an appropriate figure that displays the distribution of quiz scores. If everyone simply guessed at random, the average score of the class should be about 7.5. Is it? Fill in the blanks below.
- What percent of your classmates scored 12 or more? 8 or more? How do these proportions compare to the probabilities you calculated earlier? What does this tell you about your ability to correctly distinguish between headlines from CNN and The Onion?
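The last three questions can be approached along the lines of the sketch below. It assumes \(p = 0.5\) for random guessing, and the scores vector is placeholder data simulated for a hypothetical class of 30. It is not real class data and should be replaced with the scores recorded in class.

```r
n <- 15   # questions on the quiz
p <- 0.5  # assumed success probability under random guessing

# Theoretical tail probabilities under random guessing.
# P(X >= 12) = 1 - P(X <= 11), so the cutoff passed to pbinom is 11, not 12.
p_at_least_12 <- 1 - pbinom(q = 11, size = n, prob = p)
p_at_least_8  <- pbinom(q = 7, size = n, prob = p, lower.tail = FALSE)

# PLACEHOLDER DATA: simulated guessers standing in for the real class scores.
# Replace this line with the scores actually recorded in class.
set.seed(2024)
scores <- rbinom(n = 30, size = n, prob = p)

# Distribution of scores: one bar per possible score from 0 to 15
counts <- table(factor(scores, levels = 0:n))
barplot(counts, xlab = "Number correct", ylab = "Number of students",
        main = "Quiz scores (placeholder data)")

mean(scores)  # compare with the random-guessing expectation n * p = 7.5

# Observed proportions to set against the theoretical probabilities
prop_at_least_12 <- mean(scores >= 12)
prop_at_least_8  <- mean(scores >= 8)
c(theory = p_at_least_12, observed = prop_at_least_12)
c(theory = p_at_least_8,  observed = prop_at_least_8)
```

If the observed proportions from the real class data sit well above the theoretical ones, that suggests the class is doing better than coin flipping.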