Plotting Categorical Interactions in Logistic Regression with Odds Ratio and 95%CI using R: A Step-by-Step Guide

Introduction

Logistic regression is a widely used statistical model for binary outcome variables. In many cases, the effect of one predictor on the outcome depends on the level of another predictor, and interaction terms capture this dependence. When both predictors are categorical, however, it can be hard to show in a single plot how the effect of one varies across the levels of the other. In this post, we will explore how to plot categorical interactions in logistic regression using R, including how to annotate the plot with odds, odds ratios (OR), and 95% confidence intervals (CI).

Background

The interactions package for R provides an easy-to-use interface for plotting interaction terms from fitted regression models, including logistic regression. Its plots are harder to customize, though. With categorical interactions we often want more control over the layout and want to add information such as odds ratios and confidence intervals.

Libraries Used

This post will use the following R libraries:

  • ggplot2 for data visualization
  • dplyr for data manipulation
  • interactions for cat_plot()
  • geomtextpath for adding text along a path in the plot
  • scales for formatting values such as p-values

Creating a Logistic Regression Model with Categorical Interactions

To begin, we will create a logistic regression model that includes an interaction term. We will use the glm() function from base R to fit the model.

# Load required libraries
library(ggplot2)
library(dplyr)

# Set seed for reproducibility
set.seed(154)

# Generate random data
outcome <- sample(c(0,1), 1000, replace=TRUE)
factor1 <- sample(c("A","B"), 1000, replace=TRUE)
factor2 <- sample(c("D","F"), 1000, replace=TRUE)

df <- data.frame(outcome, factor1, factor2)
df$outcome <- as.factor(df$outcome)

# Fit logistic regression model with interaction term
fit3 <- glm(outcome ~ factor1*factor2, data = df, family=binomial(link="logit"))
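
Before plotting, it can help to look at the fitted coefficients on the odds scale. The following is a minimal sketch, not part of the original workflow: it rebuilds the same simulated data, then exponentiates the coefficients and their limits. It assumes Wald intervals from confint.default() are acceptable, and the name or_table is only illustrative.

```r
# Rebuild the simulated data and model from above so this block runs on its own
set.seed(154)
df <- data.frame(
  outcome = factor(sample(c(0, 1), 1000, replace = TRUE)),
  factor1 = sample(c("A", "B"), 1000, replace = TRUE),
  factor2 = sample(c("D", "F"), 1000, replace = TRUE)
)
fit3 <- glm(outcome ~ factor1 * factor2, data = df,
            family = binomial(link = "logit"))

# Coefficients are on the log-odds scale; exponentiate them together with
# their 95% Wald limits (an assumption; profile-likelihood limits are another option)
or_table <- exp(cbind(OR = coef(fit3), confint.default(fit3)))
round(or_table, 3)
```

When reading this table, note that the intercept row is the baseline odds (factor1 = A, factor2 = D) rather than an odds ratio, and the interaction row is a ratio of two odds ratios.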

Plotting the Interaction Terms using cat_plot

We will use the cat_plot() function from the interactions package to create a plot of the interaction terms.

# Load interactions package
library(interactions)

# Create plot of interaction terms
p <- cat_plot(fit3, pred = factor1, modx = factor2, interval = TRUE)
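
To see roughly what kind of numbers sit behind such a plot, the predicted probabilities and their intervals can be computed by hand with predict(). This is a sketch under assumptions: the intervals are built on the log-odds scale and back-transformed, which may differ in detail from what cat_plot() does internally, and the names grid, prob, lower, and upper are illustrative.

```r
# Rebuild the simulated data and model from above so this block runs on its own
set.seed(154)
df <- data.frame(
  outcome = factor(sample(c(0, 1), 1000, replace = TRUE)),
  factor1 = sample(c("A", "B"), 1000, replace = TRUE),
  factor2 = sample(c("D", "F"), 1000, replace = TRUE)
)
fit3 <- glm(outcome ~ factor1 * factor2, data = df,
            family = binomial(link = "logit"))

# One row per combination of the two factors
grid <- expand.grid(factor1 = c("A", "B"), factor2 = c("D", "F"),
                    stringsAsFactors = FALSE)

# Predict on the log-odds scale, then back-transform with plogis()
# so the interval limits stay between 0 and 1
pred <- predict(fit3, newdata = grid, type = "link", se.fit = TRUE)
z <- qnorm(0.975)
grid$prob  <- plogis(pred$fit)
grid$lower <- plogis(pred$fit - z * pred$se.fit)
grid$upper <- plogis(pred$fit + z * pred$se.fit)
grid
```

Because the model contains both main effects and their interaction, each predicted probability matches the observed proportion of outcome = 1 in that cell.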

Customizing the Plot with Additional Information

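To customize the plot and include additional information such as odds and confidence intervals, we will add two layers: geom_textpath() from the geomtextpath package, which writes the interaction p-value along a bracket drawn above the points, and geom_text() from ggplot2, which labels each point.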

# Load geomtextpath library
library(geomtextpath)

# Convert a predicted probability to odds, p / (1 - p)
p2o <- function(x) round(x/(1 - x), 3)

# Create plot with additional information
p +
  # Bracket carrying the interaction p-value (row 4 of the coefficient table);
  # the y positions are hand-picked and may need adjusting for other data
  geom_textpath(data = data.frame(factor1 = c('A', 'A', 'B', 'B'),
                                  factor2 = 'D', 
                                  outcome = c(0.625, 0.64, 0.64, 0.625)),
                aes(label = paste('Interaction p =', 
                                  scales::pvalue(summary(fit3)$coef[4, 4]))),
                color = 'black') +
  # Label each point with its odds and the interval limits on the odds scale
  geom_text(aes(y = ymax, label = paste0("Odds ", p2o(outcome), '\n(',
                                         p2o(ymin), ' - ', p2o(ymax), ')'), 
                group = factor2), 
            color = 'black', position = position_dodge(width = 0.9),
            vjust = -0.5)
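
A quick check of what the p2o() helper returns may be useful here. It maps a probability to odds; an odds ratio only arises when two such odds are divided. The probabilities below are made-up values chosen for easy arithmetic.

```r
# Same helper as above: probability -> odds, rounded to 3 decimals
p2o <- function(x) round(x / (1 - x), 3)

p2o(0.5)   # 1: a 50% chance is even odds
p2o(0.75)  # 3: a 75% chance is odds of 3 to 1

# An odds ratio is the ratio of two odds, here 75% versus 50%
p2o(0.75) / p2o(0.5)  # 3
```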

Interpretation of the Plot

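The resulting plot shows the predicted probability of the outcome, with its 95% confidence interval, for each level of factor1 within each level of factor2. Each point is labelled with that probability converted to odds, p / (1 - p), and with the interval limits converted the same way. These are odds rather than odds ratios: an odds ratio is the ratio of two such odds, for example B versus A within one level of factor2. The bracket above the points carries the p-value of the interaction term.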
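
If odds ratios in the strict sense are wanted, one option is to compute the odds ratio of factor1 (B versus A) separately within each level of factor2 from the model coefficients. The sketch below is one way to do this, not the method used for the plot above: it assumes Wald intervals based on vcov(), and the names cw, or_df, and or_plot are illustrative.

```r
library(ggplot2)

# Rebuild the simulated data and model from above so this block runs on its own
set.seed(154)
df <- data.frame(
  outcome = factor(sample(c(0, 1), 1000, replace = TRUE)),
  factor1 = sample(c("A", "B"), 1000, replace = TRUE),
  factor2 = sample(c("D", "F"), 1000, replace = TRUE)
)
fit3 <- glm(outcome ~ factor1 * factor2, data = df,
            family = binomial(link = "logit"))

b <- coef(fit3)   # order: (Intercept), factor1B, factor2F, factor1B:factor2F
V <- vcov(fit3)

# Contrast weights giving the log odds ratio of B vs A within each factor2 level
cw <- rbind(
  D = c(0, 1, 0, 0),  # factor2 = D: factor1B alone
  F = c(0, 1, 0, 1)   # factor2 = F: factor1B plus the interaction
)

log_or <- drop(cw %*% b)
se     <- sqrt(diag(cw %*% V %*% t(cw)))
z      <- qnorm(0.975)

or_df <- data.frame(
  factor2 = rownames(cw),
  OR      = exp(log_or),
  lower   = exp(log_or - z * se),
  upper   = exp(log_or + z * se)
)
or_df

# Plot the odds ratios on a log scale with a reference line at OR = 1
or_plot <- ggplot(or_df, aes(x = factor2, y = OR, ymin = lower, ymax = upper)) +
  geom_hline(yintercept = 1, linetype = "dashed") +
  geom_pointrange() +
  geom_text(aes(label = sprintf("OR %.2f\n(%.2f - %.2f)", OR, lower, upper)),
            nudge_x = 0.25) +
  scale_y_log10() +
  labs(x = "factor2", y = "Odds ratio, factor1 B vs A (log scale)")
or_plot
```

The ratio of the two odds ratios in or_df equals the exponentiated interaction coefficient, which is why the interaction p-value tests whether the effect of factor1 differs between the levels of factor2.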

Conclusion

In this post, we explored how to plot categorical interactions in logistic regression using R. We combined glm() with the interactions, ggplot2, and geomtextpath packages to create an annotated plot that shows odds and confidence intervals alongside the interaction p-value. This type of plot can be useful for understanding how the effect of one categorical predictor changes across the levels of another in a logistic regression model.

Example Use Cases

  1. Epidemiology: In epidemiology, logistic regression is often used to model the risk of developing a disease based on various predictor variables such as age, sex, and exposure to a risk factor.
  2. Marketing Research: Logistic regression can be used in marketing research to predict customer behavior based on demographic and behavioral characteristics.

Future Work

Future work could include exploring other ways to customize plots of categorical interactions in logistic regression models. Additionally, incorporating machine learning techniques into these plots could provide further insights into the relationships between predictor variables.


Last modified on 2024-07-24