CNN vs. The Onion - Beta Binomial

CNN vs The Onion

CNN (the Cable News Network) is widely considered a reputable news source. The Onion, on the other hand, is (according to Wikipedia) “an American news satire organization. It is an entertainment newspaper and a website featuring satirical articles reporting on international, national, and local news.” Put another way, The Onion is “fake news” for entertainment purposes.

In this exercise you will assess your ability to distinguish real news stories published on cnn.com from fake news stories published on theonion.com.

Learning Objectives

  • Introduce the concept of a prior
  • Plot priors and posteriors
  • Calculate summary statistics of priors and posteriors
  • Explore the effect of different priors on posteriors

Packages

library(tidyverse)
library(bayesrules)
library(gridExtra)
library(googlesheets4)
library(googledrive)

Priors

The CNN vs. The Onion quiz consists of 15 questions. Each question has the same possible answers: CNN or The Onion. Before we take the quiz, predict how many headlines you will guess correctly out of 15. You might think about your ability to determine fact from fiction or your familiarity with CNN and The Onion.

Let \(\pi\) be the proportion of questions you answer correctly in the CNN vs. The Onion quiz. Keeping your prediction in mind, let’s explore the three priors for \(\pi\) in the table below, each from a different person.

Good Guesser    Unpredictable Guesser    Poor Guesser
Beta(14, 1)     Beta(1, 1)               Beta(4, 11)

Plotting the Priors
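
The three priors can be drawn with a short base R sketch. It evaluates dbeta() on a grid of values for \(\pi\) and overlays the three curves. The Poor Guesser's shape parameters are the ones passed to summarize_beta_binomial() later in this activity (4 and 11); the grid, the object names, and the line styles are arbitrary choices made for this illustration.

```r
# Grid of candidate values for pi, the proportion of correct answers.
# The endpoints 0 and 1 are left out so the grid stays inside (0, 1).
pi_grid <- seq(0.001, 0.999, length.out = 999)

# Shape parameters for each guesser (Poor Guesser as used in the summaries).
priors <- list(
  "Good Guesser"          = c(alpha = 14, beta = 1),
  "Unpredictable Guesser" = c(alpha = 1,  beta = 1),
  "Poor Guesser"          = c(alpha = 4,  beta = 11)
)

# One column of Beta densities per guesser.
prior_density <- sapply(priors, function(p) dbeta(pi_grid, p["alpha"], p["beta"]))

# Overlay the three prior densities.
matplot(pi_grid, prior_density, type = "l", lty = 1:3, col = 1:3,
        xlab = expression(pi), ylab = "prior density")
legend("top", legend = names(priors), lty = 1:3, col = 1:3, bty = "n")
```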

Where does your prediction fall?

When we construct a prior from the Beta distribution, the shape parameters \(\alpha\) and \(\beta\) can be interpreted as the approximate number of successes and the approximate number of failures. To construct your own prior, translate the number of questions out of 15 that you expect to get correct into alpha and beta parameters: Beta(approx_number_correct, approx_number_wrong).

Returning to your own prediction, replace approx_number_correct and approx_number_wrong with your predictions.
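
As a minimal sketch of this step, suppose you predict 11 correct answers out of 15. That number is a made-up placeholder, not a recommendation; substitute your own prediction. The code uses base R only and marks the prior mean on the density curve.

```r
# Hypothetical prediction: 11 of 15 headlines correct (swap in your own numbers).
approx_number_correct <- 11
approx_number_wrong   <- 15 - approx_number_correct

# Prior mean of a Beta(alpha, beta) model: alpha / (alpha + beta).
prior_mean <- approx_number_correct / (approx_number_correct + approx_number_wrong)

# Draw the prior density over pi.
curve(dbeta(x, approx_number_correct, approx_number_wrong),
      from = 0, to = 1, xlab = expression(pi), ylab = "prior density")
abline(v = prior_mean, lty = 2)  # dashed line marks the prior mean
```

With bayesrules loaded, plot_beta(approx_number_correct, approx_number_wrong) should draw the same curve.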

Looking at the graph of your prior, which guesser is your prior most similar to: Good, Unpredictable or Poor?

Vocabulary

Priors are often described in terms of how much information they give about the unknown variable:

  • Informative prior: An informative prior reflects specific information about the unknown variable with high certainty (i.e. low variability).

  • Vague (diffuse) prior: A vague or diffuse prior reflects little specific information about the unknown variable. A flat prior, which assigns equal prior plausibility to all possible values of the variable, is a special case.
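
One way to make the distinction concrete is to compare standard deviations. The sketch below uses the standard formula for the variance of a Beta model; beta_sd() is a helper name introduced here for illustration, not a bayesrules function.

```r
# Standard deviation of a Beta(alpha, beta) model.
beta_sd <- function(alpha, beta) {
  sqrt(alpha * beta / ((alpha + beta)^2 * (alpha + beta + 1)))
}

sd_informative <- beta_sd(14, 1)  # Good Guesser: mass concentrated near 1
sd_flat        <- beta_sd(1, 1)   # Unpredictable Guesser: flat on [0, 1]

round(c(informative = sd_informative, flat = sd_flat), 4)
```

The Good Guesser's prior has a standard deviation of about 0.06, while the flat Beta(1, 1) prior has one of about 0.29, which is why the former counts as informative and the latter as vague.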

Activity

Data

  • The quiz results are stored in a data frame called cnn_onion.

  • Based on the observed data, we will update the prior for each of our three guessers, and our own prior, to a posterior.

  • Next, we calculate summary statistics of the prior and posterior for all four priors using summarize_beta_binomial(alpha, beta, y = NULL, n = NULL), which summarizes the mean, mode, and variance of the prior and posterior Beta models of \(\pi\).

  • Arguments:

    • alpha, beta: positive shape parameters of the prior Beta model
    • y: number of successes
    • n: number of trials
  • Next, we plot the prior, likelihood, and the posterior for all four.

  • Lastly, we examine the effect of different priors on the posterior.
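
The update behind these steps is the conjugate Beta-Binomial rule: a Beta(\(\alpha\), \(\beta\)) prior combined with y successes in n trials gives a Beta(\(\alpha\) + y, \(\beta\) + n - y) posterior. The following base R sketch computes the posterior summaries by hand. beta_binomial_update() is a name made up for this example; it is not the bayesrules implementation, only a check on the arithmetic.

```r
# Conjugate Beta-Binomial update: prior Beta(alpha, beta) plus y successes
# in n trials gives posterior Beta(alpha + y, beta + n - y).
beta_binomial_update <- function(alpha, beta, y, n) {
  post_alpha <- alpha + y
  post_beta  <- beta + n - y
  total      <- post_alpha + post_beta
  data.frame(
    alpha = post_alpha,
    beta  = post_beta,
    mean  = post_alpha / total,
    # The mode formula holds only when both shape parameters exceed 1.
    mode  = (post_alpha - 1) / (total - 2),
    var   = post_alpha * post_beta / (total^2 * (total + 1))
  )
}

# Good Guesser's prior with the quiz data used in this activity.
good_posterior <- beta_binomial_update(alpha = 14, beta = 1, y = 45, n = 150)
good_posterior
```

For the Good Guesser this gives a Beta(59, 106) posterior with mean about 0.358, matching the summarize_beta_binomial() output in the next section.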

Quiz scores

  student question correct year institution
1       1        1       1 2010       Colby
2       1        2       1 2010       Colby
3       1        3       1 2010       Colby
4       1        4       0 2010       Colby
5       1        5       1 2010       Colby
6       1        6       1 2010       Colby
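
The Beta-Binomial update needs only two numbers from these rows: the count of correct answers (y) and the count of answers given (n). The sketch below shows the calculation on a stand-in data frame, cnn_onion_head, rebuilt from the six rows printed above; the name is an assumption for this example. Applied to the full cnn_onion data frame, the same two lines would presumably yield the y = 45 and n = 150 used in the summaries that follow.

```r
# Stand-in for cnn_onion built from the six rows printed above;
# the full data frame has one row per student-question pair.
cnn_onion_head <- data.frame(
  student     = rep(1, 6),
  question    = 1:6,
  correct     = c(1, 1, 1, 0, 1, 1),
  year        = rep(2010, 6),
  institution = rep("Colby", 6)
)

y <- sum(cnn_onion_head$correct)  # number of correct answers (successes)
n <- nrow(cnn_onion_head)         # number of questions answered (trials)
c(y = y, n = n)
```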

Calculating the summary statistics and plotting the distribution

The Good Guesser

summarize_beta_binomial(alpha = 14, beta = 1, y = 45, n = 150)
      model alpha beta      mean      mode         var         sd
1     prior    14    1 0.9333333 1.0000000 0.003888889 0.06236096
2 posterior    59  106 0.3575758 0.3558282 0.001383827 0.03719983
plot_beta_binomial(alpha = 14, beta = 1, y = 45, n = 150)

The Unpredictable Guesser

summarize_beta_binomial(alpha = 1, beta = 1, y = 45, n = 150)
      model alpha beta      mean mode         var         sd
1     prior     1    1 0.5000000  NaN 0.083333333 0.28867513
2 posterior    46  106 0.3026316  0.3 0.001379384 0.03714006
plot_beta_binomial(alpha = 1, beta = 1, y = 45, n = 150)

The Poor Guesser

summarize_beta_binomial(alpha = 4, beta = 11, y = 45, n = 150)
      model alpha beta      mean      mode         var         sd
1     prior     4   11 0.2666667 0.2307692 0.012222222 0.11055416
2 posterior    49  116 0.2969697 0.2944785 0.001257703 0.03546411
plot_beta_binomial(alpha = 4, beta = 11, y = 45, n = 150)

Your turn

Fill in the alpha and beta shape parameters from your prior.

Comparison of the priors

Fill in the gaps to add your alpha and beta shape parameters with your guess:
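
A minimal base R sketch of such a comparison follows. The row labelled "Yours" uses a hypothetical Beta(11, 4) prior as a placeholder; replace it with your own shape parameters. The other three rows use the shape parameters passed to summarize_beta_binomial() above.

```r
# Prior shape parameters; "Yours" is a hypothetical Beta(11, 4) placeholder.
priors <- data.frame(
  guesser = c("Good", "Unpredictable", "Poor", "Yours"),
  alpha   = c(14, 1, 4, 11),
  beta    = c(1, 1, 11, 4)
)

y <- 45   # correct answers in the quiz data
n <- 150  # total answers in the quiz data

# Prior mean, then posterior mean after the conjugate update.
priors$prior_mean     <- priors$alpha / (priors$alpha + priors$beta)
priors$posterior_mean <- (priors$alpha + y) / (priors$alpha + priors$beta + n)

priors
```

The prior means range from about 0.27 to 0.93, while the posterior means all fall between about 0.30 and 0.36, close to the observed proportion 45/150 = 0.30. With 150 observations, the data outweigh even the most informative of these priors.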

Recap

Take the quiz

Now that we’ve updated our posteriors, let’s take the quiz and add our data to the dataset of trials and successes.

Each of you will take a quiz consisting of 15 questions. Each question has the same possible answers: CNN or The Onion. You can take the quiz through our Google Form: