library(tidyverse)
library(bayesrules)
library(gridExtra)
library(googlesheets4)
library(googledrive)
CNN vs. The Onion - Beta Binomial
Activity Introduction
The goal of this activity is to explore how prior beliefs (what we think is likely before seeing any data) influence the conclusions we draw after seeing new evidence (our posterior). To make things interesting, we will use a quiz where you’ll try to tell whether a headline came from CNN (a real news site) or The Onion (a fake, satirical news site).
Before taking the quiz, you’ll think about how many headlines you expect to guess correctly, and you’ll turn that guess into a prior, or starting point for your beliefs. Then, you’ll update your beliefs using data from actual quiz results and compare your updated beliefs (your posterior) to those of three other types of guessers: someone confident (optimistic), someone with no strong expectation (unsure), and someone not very confident (pessimistic) about how many answers they will get correct in the game.
Learning objectives
By the end of the activity you’ll be able to:

* Understand what a prior and a posterior are
* See how beliefs change when we get new information
* Make and interpret plots of different prior and posterior distributions
* Calculate and compare summary statistics like the mean, mode, and standard deviation of these distributions

Let’s dive in, see how well we can tell news from fiction, and learn about Bayesian thinking along the way!
CNN vs The Onion
CNN (the Cable News Network) is widely considered a reputable news source. The Onion, on the other hand, is (according to Wikipedia) “an American news satire organization. It is an entertainment newspaper and a website featuring satirical articles reporting on international, national, and local news.” Put another way, The Onion is “fake news” for entertainment purposes.
In this exercise you will assess your ability to distinguish real news stories published on cnn.com from fake news stories published on theonion.com.
Packages
Priors
The CNN vs. The Onion quiz consists of 15 questions. Each question has the same possible answers: CNN or The Onion. Before we take the quiz, predict how many headlines you will guess correctly out of 15. You might think about your ability to determine fact from fiction or your familiarity with CNN and The Onion.
Let \(\pi\) be the proportion of headlines you guess correctly in the CNN vs. The Onion quiz. Keeping your prediction in mind, let’s explore three different priors for \(\pi\), held by three different people, in the table below.
Optimistic | Unsure | Pessimistic |
---|---|---|
Beta(14, 1) | Beta(1, 1) | Beta(4, 11) |
Plotting the Priors
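As a rough sketch of how the three priors can be drawn, the base R code below evaluates each Beta density on a grid of values for \(\pi\) and overlays the curves. The grid `pi_grid` and the data frame `priors` are names introduced here for illustration only. The pessimist's shape parameters are taken from the `summarize_beta_binomial()` call used later in this activity (alpha = 4, beta = 11).

```r
# Grid of candidate values for pi, the proportion of correct answers
pi_grid <- seq(0.001, 0.999, length.out = 500)

# Shape parameters for the three guessers
# (pessimist values as used in the summarize_beta_binomial() call later on)
priors <- data.frame(
  guesser = c("Optimistic", "Unsure", "Pessimistic"),
  alpha   = c(14, 1, 4),
  beta    = c(1, 1, 11)
)

# Evaluate each Beta density on the grid: one column per guesser
dens <- sapply(seq_len(nrow(priors)), function(i) {
  dbeta(pi_grid, priors$alpha[i], priors$beta[i])
})
colnames(dens) <- priors$guesser

# Overlay the three prior densities
matplot(pi_grid, dens, type = "l", lty = 1, lwd = 2, col = 1:3,
        xlab = expression(pi), ylab = "prior density")
legend("top", legend = priors$guesser, col = 1:3, lty = 1, lwd = 2)
```

With bayesrules loaded, a single prior can also be drawn with `plot_beta(alpha = 14, beta = 1)`.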
Where does your prediction fall?
When we construct our priors from the Beta distribution, the shape parameters \(\alpha\) and \(\beta\) can be interpreted as the approximate number of successes and the approximate number of failures. To construct your own prior, translate how many questions out of 15 you expect to get correct into shape parameters: Beta(approx_number_correct, approx_number_wrong).
Returning to your own prediction, replace approx_number_correct and approx_number_wrong with your predictions.
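To make the translation concrete, here is a minimal sketch that turns a guess into shape parameters. The helper `prior_from_guess()` and the guess of 10 correct answers are hypothetical, introduced only for illustration. Because both shape parameters must be positive, the helper rejects guesses of 0 or 15.

```r
# Hypothetical helper: convert a guess of k correct out of n_questions
# into Beta shape parameters (approximate successes, approximate failures)
prior_from_guess <- function(k, n_questions = 15) {
  stopifnot(k > 0, k < n_questions)  # both shape parameters must be positive
  c(alpha = k, beta = n_questions - k)
}

# Example: a guess of 10 correct out of 15 (illustrative value only)
my_prior <- prior_from_guess(10)
my_prior

# Prior mean alpha / (alpha + beta): the expected proportion correct
prior_mean <- unname(my_prior["alpha"] / sum(my_prior))
prior_mean
```

A guess of 10 out of 15 corresponds to Beta(10, 5), whose prior mean is 10/15, the same proportion as the guess itself.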
Looking at the graph of your prior, which guesser is your prior most similar to: Optimistic, Unsure or Pessimistic?
Vocabulary
We often describe priors in terms of how much information they give about the unknown variable. Priors are often described as:
Informative prior: An informative prior reflects specific information about the unknown variable with high certainty (i.e. low variability).
Vague (diffuse) prior: A vague or diffuse prior reflects little specific information about the unknown variable. A flat prior, which assigns equal prior plausibility to all possible values of the variable, is a special case.
Reflection: How would you classify your prior? Is it informative or vague? Why?
Activity
Data
Our data, the results from the quiz, are in a data frame called cnn_onion. Based on the observed data, we will update the priors of our three guessers, and our own prior, to obtain their posteriors.
Next, we calculate the summary statistics for the prior and posterior for all four priors using the function summarize_beta_binomial(alpha, beta, y = NULL, n = NULL). This function summarizes the mean, mode, variance, and standard deviation of the prior and posterior Beta models of \(\pi\).
Arguments:
alpha, beta: positive shape parameters of the prior Beta model
y: number of successes
n: number of trials
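For reference, the summaries can be checked by hand using the standard Beta-Binomial conjugate result. This is the textbook formula, not something taken from the function's source code, although the output tables in this activity are consistent with it. With a Beta(\(\alpha\), \(\beta\)) prior and \(y\) successes in \(n\) trials, the posterior is

\[
\pi \mid (Y = y) \sim \text{Beta}(\alpha + y,\ \beta + n - y)
\]

and for any Beta(\(a\), \(b\)) model,

\[
E(\pi) = \frac{a}{a + b}, \qquad
\text{Mode}(\pi) = \frac{a - 1}{a + b - 2} \quad (a, b > 1), \qquad
\text{Var}(\pi) = \frac{ab}{(a + b)^2 (a + b + 1)}
\]

The standard deviation is the square root of the variance. The flat Beta(1, 1) prior has no unique mode, which is why its mode is reported as NaN.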
Next, we plot the prior, likelihood, and the posterior for all four.
Lastly, we examine the effect of different priors on the posterior.
Quiz scores
student question correct year institution
1 1 1 1 2010 Colby
2 1 2 1 2010 Colby
3 1 3 1 2010 Colby
4 1 4 0 2010 Colby
5 1 5 1 2010 Colby
6 1 6 1 2010 Colby
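Since each row records one answered question, the number of successes and trials can be read off the `correct` column. The sketch below rebuilds only the six rows shown above in a stand-in data frame named `cnn_onion_head` (a name introduced here), so its counts describe those six rows and not the full data.

```r
# Stand-in for the six rows of cnn_onion printed above
# (the full data frame has many more rows)
cnn_onion_head <- data.frame(
  student     = rep(1, 6),
  question    = 1:6,
  correct     = c(1, 1, 1, 0, 1, 1),
  year        = rep(2010, 6),
  institution = rep("Colby", 6)
)

# Each row is one answered question: 1 = correct, 0 = incorrect
y <- sum(cnn_onion_head$correct)   # number of successes
n <- nrow(cnn_onion_head)          # number of trials
c(y = y, n = n)
```

The calls that follow use y = 45 and n = 150, which presumably come from the same kind of count on the data used in this activity.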
Calculating the summary statistics and plotting the distribution
The Optimist
summarize_beta_binomial(alpha = 14, beta = 1, y = 45, n = 150)
model alpha beta mean mode var sd
1 prior 14 1 0.9333333 1.0000000 0.003888889 0.06236096
2 posterior 59 106 0.3575758 0.3558282 0.001383827 0.03719983
plot_beta_binomial(alpha = 14, beta = 1, y = 45, n = 150)
Unsure
summarize_beta_binomial(alpha = 1, beta = 1, y = 45, n = 150)
model alpha beta mean mode var sd
1 prior 1 1 0.5000000 NaN 0.083333333 0.28867513
2 posterior 46 106 0.3026316 0.3 0.001379384 0.03714006
plot_beta_binomial(alpha = 1, beta = 1, y = 45, n = 150)
The Pessimist
summarize_beta_binomial(alpha = 4, beta = 11, y = 45, n = 150)
model alpha beta mean mode var sd
1 prior 4 11 0.2666667 0.2307692 0.012222222 0.11055416
2 posterior 49 116 0.2969697 0.2944785 0.001257703 0.03546411
plot_beta_binomial(alpha = 4, beta = 11, y = 45, n = 150)
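The posterior rows in the three tables above can be reproduced by hand in base R. The sketch below assumes the standard conjugate update, where a Beta(alpha, beta) prior combined with y successes in n trials gives a Beta(alpha + y, beta + n - y) posterior. The helpers `beta_summary()` and `posterior_summary()` are hypothetical names and are not part of bayesrules.

```r
# Summaries of a Beta(a, b) distribution
beta_summary <- function(a, b) {
  v <- a * b / ((a + b)^2 * (a + b + 1))
  c(alpha = a, beta = b,
    mean = a / (a + b),
    mode = (a - 1) / (a + b - 2),   # NaN for the flat Beta(1, 1)
    var  = v,
    sd   = sqrt(v))
}

# Conjugate update: Beta(alpha, beta) prior plus y successes in n trials
posterior_summary <- function(alpha, beta, y, n) {
  beta_summary(alpha + y, beta + n - y)
}

y <- 45
n <- 150
post <- rbind(
  optimist  = posterior_summary(14, 1, y, n),
  unsure    = posterior_summary(1, 1, y, n),
  pessimist = posterior_summary(4, 11, y, n)
)
round(post, 4)
```

Each row of `post` should agree with the corresponding posterior row printed by `summarize_beta_binomial()`.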
Your turn
Fill in the alpha and beta shape parameters from your prior.
Comparison of the priors
Fill in the gaps to add your alpha and beta shape parameters with your guess:
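One way to compare the priors numerically is to write each posterior mean as a weighted average of the prior mean and the observed proportion y / n. The sketch below does this for the three guessers plus a hypothetical fourth prior, Beta(10, 5), labeled "mine". Replace those two values with your own shape parameters. The column names are introduced here for illustration.

```r
y <- 45
n <- 150

# Three guessers from the activity plus a hypothetical own prior, Beta(10, 5)
priors <- data.frame(
  guesser = c("optimist", "unsure", "pessimist", "mine"),
  alpha   = c(14, 1, 4, 10),
  beta    = c(1, 1, 11, 5)
)

priors$prior_mean <- priors$alpha / (priors$alpha + priors$beta)

# Weight on the prior: prior "sample size" relative to prior plus data
priors$prior_weight <- (priors$alpha + priors$beta) /
  (priors$alpha + priors$beta + n)

# Posterior mean as a weighted average of prior mean and observed proportion
priors$post_mean <- priors$prior_weight * priors$prior_mean +
  (1 - priors$prior_weight) * (y / n)

priors
```

With 150 observed answers, a prior worth 15 pseudo-observations gets a weight of only 15/165 (about 0.09), so even very different prior means are pulled close to the observed proportion of 0.30.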
Recap
In this activity, you:
Made a prediction about how well you’d do on a quiz and turned that into a prior distribution
Learned about different types of priors: optimistic, unsure, and pessimistic
Updated your prior using data from the quiz to get a posterior distribution
Compared how different priors affect the posterior, even when we see the same data
Practiced reading and interpreting plots of priors, likelihoods, and posteriors
Calculated key summary statistics like the mean, mode, and standard deviation
This exercise shows how Bayesian thinking helps us combine what we already believe with new evidence to make better, more informed decisions. And sometimes, it reminds us that even when we think we’re great at telling real news from fake news, the data might say otherwise!
Reflection
Think about your prior prediction and how it compared to the data.
How similar or different was your prior to the actual results?
Did updating your beliefs with data change your thinking? In what way?
If you had chosen a different prior (e.g., more vague or more confident), how would your posterior have changed?
What does this activity show about the role of prior knowledge or assumptions in data analysis?
Take a few minutes to jot down your thoughts before sharing your ideas with a partner or group.
Take the quiz
Now that we’ve updated our posteriors, let’s take the quiz and add our data to the dataset of trials and successes.
Each of you will take a quiz consisting of 15 questions. Each question has the same possible answers: CNN or The Onion. You can take the quiz through our Google Form: