Diarrheal Disease Transmission

Case Study: Social Networks, Geographic Configuration and Diarrheal Disease Transmission in Rural Ecuador.

Diarrheal diseases are primarily spread through the fecal-oral route, which involves the ingestion of pathogens originating from fecal contamination. Individuals may be exposed to disease through contaminated water, contaminated food, person-to-person transmission, environmental exposure, and animals and insects.

This case study examines the relationship between social networks, geographic configuration, and the prevalence of diarrheal disease using data collected between 2003 and 2005 from nine rural communities in northern coastal Ecuador. See the full paper in the American Journal of Epidemiology.

The dataset includes:

  • Demographic information: Community and household-level details such as population size, education, and residence duration.
  • Social network data: Measures of social connectedness within two types of networks: casual contact and food-sharing.
  • Geographic data: Household locations, road access, and spatial indices reflecting housing density.

In this lab, you will use Bayesian modelling techniques to analyze how geographic factors (such as road access and housing density) and social network metrics (such as contact degree) relate to the spread of disease. You will learn to:

  • Apply statistical and visualization techniques to examine relationships between variables.
  • Interpret multivariate models that quantify the role of social and geographic factors in disease prevalence.
  • Reflect on the practical implications of these findings for designing public health interventions.

Bayesian Statistics Learning Objectives

  • Build a Bayesian simple linear regression model of a response Y by a predictor X;
  • Interpret appropriate prior models for the regression parameters;
  • Simulate the posterior model of the regression parameters;
  • Use simulation results to build a posterior understanding of the relationship between Y and X and to build posterior predictive models of Y; and
  • Compare these models to a multiple linear regression model.

Libraries

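library(tidyverse)   # data cleaning and plotting
library(bayesrules)  # helper functions from the Bayes Rules! book
library(rstan)       # interface to Stan
library(rstanarm)    # Bayesian regression models, e.g. stan_glm()
library(bayesplot)   # plots of MCMC draws and diagnostics
library(tidybayes)   # tidy handling of posterior draws
library(janitor)     # data cleaning and tabulation helpers
library(broom.mixed) # tidy summaries of fitted models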

Introducing the data

ecData <- read_csv("data/ecDataCompID.csv")
Rows: 9 Columns: 6
── Column specification ────────────────────────────────────────────────────────
Delimiter: ","
dbl (6): city, pop, meandeg, meanind, lnmeanind, remoteness

ℹ Use `spec()` to retrieve the full column specification for this data.
ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
ecData
# A tibble: 9 × 6
   city   pop meandeg meanind lnmeanind remoteness
  <dbl> <dbl>   <dbl>   <dbl>     <dbl>      <dbl>
1     1   879    5.10   113.       4.73     -1.98 
2     2   153    6       31.7      3.46      0.472
3     3    99    5.7     43.4      3.77      0.065
4     4   230    7.90    40.4      3.70      3.58 
5     5    84    2.54   507.       6.23     -0.918
6     6   146    8.05    37.3      3.62     -0.263
7     7   285    2.54   153.       5.03     -2.47 
8     8   478    5.95   107.       4.67     -0.997
9     9   319    6.79    45.7      3.82      2.51 

The data contains information on:

  • city: community ID (one row per study community)
  • pop: population (number of people)
  • meandeg:
  • meanind:
  • lnmeanind: natural log of meanind
  • remoteness: measure of remoteness
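
The printed values suggest that lnmeanind is the natural logarithm of meanind. The sketch below checks this using the nine rows shown above, retyped by hand at their displayed (rounded) precision. The name ec_printed is introduced here for illustration only and is not part of the lab's data.

```r
# Values copied from the printed tibble above (rounded as displayed).
ec_printed <- data.frame(
  meanind   = c(113, 31.7, 43.4, 40.4, 507, 37.3, 153, 107, 45.7),
  lnmeanind = c(4.73, 3.46, 3.77, 3.70, 6.23, 3.62, 5.03, 4.67, 3.82)
)

# Largest absolute gap between log(meanind) and the stored column.
max_gap <- max(abs(log(ec_printed$meanind) - ec_printed$lnmeanind))

# TRUE when the two columns agree up to display rounding.
max_gap < 0.01
```

Running the same comparison on the full-precision ecData columns would be a stricter check.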

Specifying the data model
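
As a sketch of what this section sets out to specify, the usual Normal data model for simple linear regression can be written as below. Here Y_i and X_i stand for the response and predictor in community i (which columns play these roles is left open here), and beta_0, beta_1, and sigma are the intercept, slope, and residual standard deviation.

```latex
Y_i \mid \beta_0, \beta_1, \sigma \overset{\text{ind}}{\sim} N\left(\mu_i, \sigma^2\right)
\quad \text{with} \quad
\mu_i = \beta_0 + \beta_1 X_i, \qquad i = 1, \dots, 9
```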

Specifying the priors
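
One common choice, given here as an assumption rather than as the lab's prescribed priors, is to place independent Normal priors on the regression intercept beta_0 and slope beta_1, and an Exponential prior on the residual standard deviation sigma. The hyperparameters m_0, s_0, m_1, s_1, and l are placeholders that would need to be tuned to prior knowledge about the communities.

```latex
\beta_0 \sim N\left(m_0, s_0^2\right), \qquad
\beta_1 \sim N\left(m_1, s_1^2\right), \qquad
\sigma \sim \text{Exp}(l)
```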

Tuning prior models for regression parameters

Posterior simulation

Simulation via rstanarm
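
A minimal sketch of a posterior simulation with rstanarm's stan_glm() might look like the following. It rests on two assumptions that the lab text does not confirm: that meandeg is the response and remoteness the predictor, and that rstanarm's default weakly informative priors are acceptable as a starting point. The data frame ec_sub is a hand-typed copy of two columns from the printed tibble, so the block runs without the CSV file.

```r
library(rstanarm)

# Two columns copied from the printed tibble above (rounded as displayed).
ec_sub <- data.frame(
  meandeg    = c(5.10, 6, 5.7, 7.90, 2.54, 8.05, 2.54, 5.95, 6.79),
  remoteness = c(-1.98, 0.472, 0.065, 3.58, -0.918, -0.263, -2.47, -0.997, 2.51)
)

# ASSUMPTION: meandeg is the response and remoteness the predictor.
# No priors are passed, so rstanarm's default weakly informative priors apply.
fit <- stan_glm(
  meandeg ~ remoteness,
  data = ec_sub,
  family = gaussian,
  chains = 4, iter = 2000, seed = 84735,
  refresh = 0  # suppress sampler progress output
)

# Posterior medians and MAD-based spread for intercept, slope, and sigma.
print(fit, digits = 2)
```

With only nine communities the posterior will be wide, so any prior choice deserves a sensitivity check.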

Interpreting the posterior

Posterior prediction
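
The sketch below illustrates posterior prediction in two ways: by hand, simulating one outcome per posterior draw, and with rstanarm's posterior_predict() shortcut. As assumptions for illustration only, it treats meandeg as the response and remoteness as the predictor, uses rstanarm's default priors, and predicts for a hypothetical community with remoteness equal to 0. The data frame ec_sub is a hand-typed copy of two columns from the printed tibble.

```r
library(rstanarm)

# Two columns copied from the printed tibble above (rounded as displayed).
ec_sub <- data.frame(
  meandeg    = c(5.10, 6, 5.7, 7.90, 2.54, 8.05, 2.54, 5.95, 6.79),
  remoteness = c(-1.98, 0.472, 0.065, 3.58, -0.918, -0.263, -2.47, -0.997, 2.51)
)

# ASSUMPTION: meandeg ~ remoteness with rstanarm's default priors.
fit <- stan_glm(
  meandeg ~ remoteness, data = ec_sub, family = gaussian,
  chains = 4, iter = 2000, seed = 84735, refresh = 0
)

# Hypothetical new community (not in the data) with remoteness = 0.
x_new <- 0

# By hand: for each posterior draw, simulate one outcome from the data model.
draws <- as.data.frame(fit)
set.seed(84735)
y_manual <- rnorm(
  nrow(draws),
  mean = draws$`(Intercept)` + draws$remoteness * x_new,
  sd   = draws$sigma
)

# Shortcut: posterior_predict() returns a draws-by-observations matrix.
y_short <- posterior_predict(fit, newdata = data.frame(remoteness = x_new))

# 80% posterior prediction intervals from each approach.
quantile(y_manual, c(0.1, 0.9))
quantile(y_short[, 1], c(0.1, 0.9))
```

The two intervals should be close but not identical, since each relies on its own random draws from the data model.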