library(tidyverse)
library(bayesrules)
library(gridExtra)
library(googlesheets4)
library(googledrive)
CNN vs. The Onion - Beta Binomial
CNN vs The Onion
CNN (the Cable News Network) is widely considered a reputable news source. The Onion, on the other hand, is (according to Wikipedia) “an American news satire organization. It is an entertainment newspaper and a website featuring satirical articles reporting on international, national, and local news.” Put another way, The Onion is “fake news” for entertainment purposes.
In this exercise you will assess your ability to distinguish real news stories published on cnn.com from fake news stories published on theonion.com.
Learning Objectives
- Introduce the concept of a prior
- Explore the effect of different priors on posteriors
- Plot priors and posteriors
- Calculate summary statistics of priors and posteriors
Packages
Priors
The CNN vs. The Onion quiz consists of 15 questions. Each question has the same possible answers: CNN or The Onion. Before we take the quiz, predict how many headlines you will guess correctly out of 15. You might think about your ability to determine fact from fiction or your familiarity with CNN and The Onion.
Let \(\pi\) be the proportion of headlines you identify correctly in the CNN vs. The Onion quiz. Keeping your prediction in mind, let’s explore three different priors for \(\pi\) from three different people, shown in the table below.
Good Guesser | Unpredictable Guesser | Poor Guesser |
---|---|---|
Beta(14, 1) | Beta(1, 1) | Beta(4, 11) |
Plotting the Priors
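The three priors can be drawn side by side. The following is a minimal sketch, assuming `plot_beta()` from bayesrules and `grid.arrange()` from gridExtra (both loaded at the top of this exercise). The plot titles are illustrative additions, and the Poor Guesser's shape parameters follow the `summarize_beta_binomial()` call used later in this exercise (alpha = 4, beta = 11).

```r
library(ggplot2)     # ggtitle(); also attached by library(tidyverse)
library(bayesrules)  # plot_beta()
library(gridExtra)   # grid.arrange()

# One density plot per guesser
p_good <- plot_beta(alpha = 14, beta = 1) + ggtitle("Good Guesser")
p_unpredictable <- plot_beta(alpha = 1, beta = 1) + ggtitle("Unpredictable Guesser")
p_poor <- plot_beta(alpha = 4, beta = 11) + ggtitle("Poor Guesser")

# Arrange the three plots in a single row
grid.arrange(p_good, p_unpredictable, p_poor, ncol = 3)
```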
Where does your prediction fall?
When we construct our priors from the Beta distribution, the shape parameters \(\alpha\) and \(\beta\) can be interpreted as the approximate number of successes and the approximate number of failures. In constructing your prior, you can translate how many questions out of 15 you expect to get correct into alpha and beta parameters: Beta(approx_number_correct, approx_number_wrong).
Returning to your own prediction, replace approx_number_correct and approx_number_wrong with your predictions.
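As a minimal sketch, suppose you expect to get 10 of the 15 questions right. The value 10 is a hypothetical placeholder, not a recommendation; substitute your own prediction. The plotting call assumes `plot_beta()` from bayesrules.

```r
library(bayesrules)  # plot_beta()

# Hypothetical prediction: replace 10 with your own expected number correct
approx_number_correct <- 10
approx_number_wrong <- 15 - approx_number_correct

# Prior mean implied by these shape parameters: alpha / (alpha + beta)
prior_mean <- approx_number_correct / (approx_number_correct + approx_number_wrong)
prior_mean

# Plot your Beta(approx_number_correct, approx_number_wrong) prior
plot_beta(alpha = approx_number_correct, beta = approx_number_wrong)
```

Both shape parameters must be positive, so a prediction of 0 or 15 correct needs a small adjustment; the Good Guesser's Beta(14, 1) is one way to express "nearly all correct".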
Looking at the graph of your prior, which guesser is your prior most similar to: Good, Unpredictable or Poor?
Vocabulary
We often describe priors in terms of how much information they give about the unknown variable:
Informative prior: An informative prior reflects specific information about the unknown variable with high certainty (i.e. low variability).
Vague (diffuse) prior: A vague or diffuse prior reflects little specific information about the unknown variable. A flat prior, which assigns equal prior plausibility to all possible values of the variable, is a special case.
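One way to make "informative" concrete is to compare the spread of the three guessers' priors. This sketch uses the standard formulas for the mean and standard deviation of a Beta distribution in base R. The helper `beta_sd()` is a name introduced here for illustration, and the Poor Guesser's shape parameters follow the `summarize_beta_binomial()` call used later in this exercise.

```r
# Standard deviation of a Beta(alpha, beta) distribution
beta_sd <- function(alpha, beta) {
  sqrt(alpha * beta / ((alpha + beta)^2 * (alpha + beta + 1)))
}

priors <- data.frame(
  guesser = c("Good", "Unpredictable", "Poor"),
  alpha   = c(14, 1, 4),
  beta    = c(1, 1, 11)
)

# Add the prior mean and standard deviation for each guesser
priors$mean <- priors$alpha / (priors$alpha + priors$beta)
priors$sd   <- beta_sd(priors$alpha, priors$beta)
priors
```

The flat Beta(1, 1) prior has the largest standard deviation (about 0.29), while the Good Guesser's Beta(14, 1) prior is the most concentrated (about 0.06), which is what makes it the most informative of the three.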
Activity
Data
Our data, with the results from the quiz, are in a data frame called cnn_onion.
Based on the observed data, we will update the priors of our three guessers, and our own prior, to posteriors.
Next, we calculate the summary statistics for the prior and posterior for all four priors using the function:
summarize_beta_binomial(alpha, beta, y = NULL, n = NULL)
This function summarizes the mean, mode, and variance of the prior and posterior Beta models of \(\pi\).
Arguments:
- alpha, beta: positive shape parameters of the prior Beta model
- y: number of successes
- n: number of trials
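Under the Beta-Binomial model, a Beta(\(\alpha\), \(\beta\)) prior combined with y successes in n trials gives a Beta(\(\alpha\) + y, \(\beta\) + n - y) posterior. The sketch below applies that standard conjugacy result by hand in base R. `update_beta()` is a hypothetical helper introduced for illustration, not part of bayesrules, and y = 45, n = 150 are the values used in the calls later in this exercise.

```r
# Hypothetical helper: posterior shape parameters and summaries for a Beta prior
update_beta <- function(alpha, beta, y, n) {
  post_alpha <- alpha + y        # prior successes + observed successes
  post_beta  <- beta + (n - y)   # prior failures + observed failures
  total <- post_alpha + post_beta
  list(
    alpha = post_alpha,
    beta  = post_beta,
    mean  = post_alpha / total,
    # This mode formula is valid when both shape parameters exceed 1
    mode  = (post_alpha - 1) / (total - 2),
    var   = post_alpha * post_beta / (total^2 * (total + 1))
  )
}

# Good Guesser: Beta(14, 1) prior, 45 correct out of 150
good <- update_beta(alpha = 14, beta = 1, y = 45, n = 150)
good$alpha  # 59
good$beta   # 106
good$mean   # about 0.3576
```

These hand-computed values can be checked against the posterior row that `summarize_beta_binomial()` reports for the same inputs.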
Next, we plot the prior, likelihood, and the posterior for all four.
Lastly, we examine the effect of different priors on the posterior.
Quiz scores
student question correct year institution
1 1 1 1 2010 Colby
2 1 2 1 2010 Colby
3 1 3 1 2010 Colby
4 1 4 0 2010 Colby
5 1 5 1 2010 Colby
6 1 6 1 2010 Colby
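Each row records one answer, with correct equal to 1 for a right answer and 0 for a wrong one, so the number of successes and the number of trials can be tallied directly. A minimal sketch of that tally, using only the six rows printed above as a stand-in for the full cnn_onion data frame (the name `cnn_onion_head` is introduced here for illustration):

```r
# The six rows shown above, rebuilt by hand as a stand-in for cnn_onion
cnn_onion_head <- data.frame(
  student     = rep(1, 6),
  question    = 1:6,
  correct     = c(1, 1, 1, 0, 1, 1),
  year        = rep(2010, 6),
  institution = rep("Colby", 6)
)

y <- sum(cnn_onion_head$correct)  # number of correct answers
n <- nrow(cnn_onion_head)         # number of answers recorded
c(y = y, n = n)
```

For these six rows the tally is 5 correct out of 6. Applied to the full data frame, the same two lines would presumably produce the y = 45 and n = 150 used in the calls below; that link is an assumption, since the full data are not shown here.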
Calculating the summary statistics and plotting the distribution
The Good Guesser
summarize_beta_binomial(alpha = 14, beta = 1, y = 45, n = 150)
model alpha beta mean mode var sd
1 prior 14 1 0.9333333 1.0000000 0.003888889 0.06236096
2 posterior 59 106 0.3575758 0.3558282 0.001383827 0.03719983
plot_beta_binomial(alpha = 14, beta = 1, y = 45, n = 150)
The Unpredictable Guesser
summarize_beta_binomial(alpha = 1, beta = 1, y = 45, n = 150)
model alpha beta mean mode var sd
1 prior 1 1 0.5000000 NaN 0.083333333 0.28867513
2 posterior 46 106 0.3026316 0.3 0.001379384 0.03714006
plot_beta_binomial(alpha = 1, beta = 1, y = 45, n = 150)
The Poor Guesser
summarize_beta_binomial(alpha = 4, beta = 11, y = 45, n = 150)
model alpha beta mean mode var sd
1 prior 4 11 0.2666667 0.2307692 0.012222222 0.11055416
2 posterior 49 116 0.2969697 0.2944785 0.001257703 0.03546411
plot_beta_binomial(alpha = 4, beta = 11, y = 45, n = 150)
Your turn
Fill in the alpha and beta shape parameters from your prior.
Comparison of the priors
Fill in the gaps with the alpha and beta shape parameters from your guess:
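A minimal sketch of such a comparison in base R, using the conjugate update Beta(\(\alpha\) + y, \(\beta\) + n - y) with y = 45 and n = 150. The "You" row uses a hypothetical Beta(10, 5) prior as a placeholder; replace it with your own shape parameters.

```r
y <- 45   # correct answers in the data
n <- 150  # total answers in the data

comparison <- data.frame(
  guesser = c("Good", "Unpredictable", "Poor", "You"),
  alpha   = c(14, 1, 4, 10),  # last value is a hypothetical placeholder
  beta    = c(1, 1, 11, 5)    # last value is a hypothetical placeholder
)

# Prior mean, then posterior mean after the conjugate Beta-Binomial update
comparison$prior_mean <- comparison$alpha / (comparison$alpha + comparison$beta)
comparison$posterior_mean <- (comparison$alpha + y) /
  (comparison$alpha + comparison$beta + n)
comparison
```

The prior means range from about 0.27 to 0.93, yet the posterior means all land between about 0.30 and 0.36, close to the observed proportion of 45/150 = 0.30. With 150 observations, even the strongly informative Good Guesser prior is pulled most of the way toward the data.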
Recap
Take the quiz
Now that we’ve updated our posteriors, let’s take the quiz and add our data to the dataset of trials and successes.
Each of you will take a quiz consisting of 15 questions. Each question has the same possible answers: CNN or The Onion. You can take the quiz through our Google Form: