Delta Analysis in PhIP-seq

Overview

The Delta framework answers two related but distinct questions about peptide-level prevalence:

  1. Where do peptides shift?deltaplot() and deltaplot_interactive() place each peptide at its pooled prevalence (x) and prevalence shift Δ (y), providing a global view of which features move between groups.
  2. Is there a statistically significant global shift?compute_delta() aggregates per-peptide z-scores into a single Stouffer-type statistic and assesses significance via subject-label permutation. forestplot() and forestplot_interactive() then display the top and bottom features ranked by that statistic.

This vignette walks through the complete Delta workflow in phiper:

  1. Visualising raw prevalence shift with deltaplot() and deltaplot_interactive()
  2. Running the permutation test with compute_delta()
  3. Displaying results with forestplot() and forestplot_interactive()

Setup

library(phiper)

Load the bundled example dataset. It contains two patient groups (A, B) measured at two timepoints (T1, T2) across 1 000 simulated peptides.

pd <- load_example_data()
pd
#> ── <phip_data> ───────────────────────────────────────────────────────────────── 
#> 
#> counts (first 5 rows): 
#> # A tibble: 5 × 9
#>   sample_id subject_id group timepoint peptide_id exist counts_control
#>   <chr>     <chr>      <chr> <chr>     <chr>      <int>          <int>
#> 1 A_T1_1    1          A     T1        10003          1              5
#> 2 A_T1_1    1          A     T1        10017          1             37
#> 3 A_T1_1    1          A     T1        10023          1             11
#> 4 A_T1_1    1          A     T1        10062          1              0
#> 5 B_T1_1    1          B     T1        10087          1              1
#> # ℹ 2 more variables: counts_hits <int>, fold_change <dbl>
#> 
#> table size: 78,200 rows x 9 columns
#> 
#> peptide library preview (first 5 rows): 
#> # A tibble: 5 × 8
#>   peptide_id Fullname                    species genus family order class common
#>   <chr>      <chr>                       <chr>   <chr> <chr>  <chr> <chr> <chr> 
#> 1 agilent_1  Chromodomain-helicase-DNA-… Homo s… Homo  Homin… Prim… Mamm… Human 
#> 2 agilent_2  integral membrane protein   Mycoba… Myco… Mycob… Myco… Acti… <NA>  
#> 3 agilent_3  hypothetical protein (6/16… Mycoba… Myco… Mycob… Myco… Acti… <NA>  
#> 4 agilent_4  envelope protein (5/8) & a… Orthof… Orth… Flavi… Amar… Flas… JEV   
#> 5 agilent_5  Myosin-7 & beta-myosin hea… Homo s… Homo  Homin… Prim… Mamm… Human 
#> ... plus 36 more columns
#> 
#> library size: 357,190 rows x 44 columns
#> 
#> meta flags: 
#>   con:            <duckdb_connection>
#>   longitudinal:   TRUE
#>   exist:          TRUE
#>   fold_change:    TRUE
#>   raw_counts:     FALSE
#>   extra_cols:     group, counts_control, counts_hits
#>   peptide_con:    <duckdb_connection>
#>   materialise_table: TRUE
#>   finalizer_env:  <environment>
#>   full_cross:     FALSE

Note. The example data are entirely simulated and have no biological meaning. They exist solely to demonstrate the API.

The Delta plots consume prevalence tables produced by compute_pop() (see the POP vignette for details). Compute a group-level prevalence table first:

pop_group <- compute_pop(
  pd,
  rank_cols  = "peptide_id",
  group_cols = "group"
)
#> [08:06:38] INFO  compute_pop
#> [08:06:38] INFO  compute_pop
#>                    - ranks : peptide_id
#>                    - group_cols: group
#>                    - exist_col : exist
#>                    - pop_k_min : 1
#>                    - paired : FALSE
#> [08:06:38] INFO  ranks resolved
#>                    - available: peptide_id
#> [08:06:38] INFO  computing cohort sizes and validating binary group_cols
#> [08:06:38] INFO  computing presence per sample via k-of-n rule
#> [08:06:38] INFO  counting present samples per feature (pop, unpaired)
#> [08:06:39] INFO  building pairwise comparisons
#> [08:06:40] OK    materialized; computing Fisher p-values
#>                    - table: ph_pop_20260614_080639
#> [08:06:41] OK    done (compute_pop, unpaired)
#>                    - rows : 1950
#>                    - ranks : peptide_id
#>                    - k_min : 1
#> [08:06:41] OK    compute_pop - done
#>                  -> elapsed: 2.956s
head(pop_group)
#>         rank feature group_col group1 n1 N1     prop1 percent1 group2 n2 N2
#> 1 peptide_id   10003     group      A 31 38 0.8157895 81.57895      B  0 42
#> 2 peptide_id   10017     group      A  6 38 0.1578947 15.78947      B  0 42
#> 3 peptide_id   10023     group      A 29 38 0.7631579 76.31579      B  0 42
#> 4 peptide_id   10062     group      A 11 38 0.2894737 28.94737      B  0 42
#> 5 peptide_id   10087     group      A  0 38 0.0000000  0.00000      B  8 42
#> 6 peptide_id   10100     group      A  0 38 0.0000000  0.00000      B 10 42
#>       prop2 percent2       ratio delta_ratio        p_raw n_peptides
#> 1 0.0000000  0.00000 68.52631579   33.263158 8.819969e-16          1
#> 2 0.0000000  0.00000 13.26315789    5.631579 9.186952e-03          1
#> 3 0.0000000  0.00000 64.10526316   31.052632 3.123739e-14          1
#> 4 0.0000000  0.00000 24.31578947   11.157895 1.148463e-04          1
#> 5 0.1904762 19.04762  0.06907895   -6.238095 5.758809e-03          1
#> 6 0.2380952 23.80952  0.05526316   -8.047619 1.180799e-03          1

For the unpaired group comparison with compute_delta(), each subject must appear at most once per group. Because pd contains two timepoints per subject, restrict to a single timepoint first:

pd_t1 <- pd |> dplyr::filter(timepoint == "T1") |> dplyr::collect()

Visualising prevalence shift

deltaplot() — static ggplot2

deltaplot() plots each peptide at:

  • x-axis — pooled prevalence (prop1 + prop2) / 2
  • y-axis — prevalence shift prop2 − prop1

A dashed horizontal line marks Δ = 0. Arrows and labels indicate the direction of enrichment. A GAM smooth (controlled by add_smooth) summarises the trend across pooled prevalence.

The only required input is a data frame containing group1, group2, prop1, and prop2 — which is exactly what compute_pop() returns.

deltaplot(
  pop_group,
  group_pair_values = c("A", "B"),
  group_labels      = c("Group A", "Group B")
)
#> [08:06:41] INFO  Preparing delta prevalence plot.

Use group_pair_values to select a specific contrast when the table contains more than one pair. group_labels provides display names for the axis annotations.

Adjusting the smooth

The smooth uses a GAM with basis dimension smooth_k. Lower values produce smoother curves; set add_smooth = FALSE to suppress it.

deltaplot(
  pop_group,
  group_pair_values  = c("A", "B"),
  group_labels       = c("Group A", "Group B"),
  smooth_k           = 3,
  point_alpha        = 0.15,
  point_size         = 0.4
)
#> [08:06:42] INFO  Preparing delta prevalence plot.

Jitter and appearance

deltaplot(
  pop_group,
  group_pair_values   = c("A", "B"),
  group_labels        = c("Group A", "Group B"),
  point_jitter_width  = 0.01,
  point_jitter_height = 0.01,
  point_alpha         = 0.20,
  arrow_color         = "steelblue"
)
#> [08:06:42] INFO  Preparing delta prevalence plot.

deltaplot_interactive() — plotly

The interactive version mirrors the static API. Hovering over a point shows the feature identifier, group-level counts and percentages, and p-value.

deltaplot_interactive(
  pop_group,
  group_pair_values = c("A", "B"),
  group_labels      = c("Group A", "Group B")
)

Disable the smooth if you have few peptides or want a cleaner display:

deltaplot_interactive(
  pop_group,
  group_pair_values = c("A", "B"),
  group_labels      = c("Group A", "Group B"),
  add_smooth        = FALSE,
  point_size        = 5,
  point_alpha       = 0.5
)

Testing global shift: compute_delta()

compute_delta() performs a permutation test for a global (antigen-wide) shift in prevalence between two groups. For each (rank, feature) stratum it computes a per-peptide z-score, aggregates them via a Stouffer statistic, and assesses significance by permuting subject labels.

Unpaired design — group comparison

set.seed(42)
delta_group <- compute_delta(
  pd_t1,
  rank_cols      = "peptide_id",
  group_cols     = "group",
  B_permutations = 2000L
)
delta_group
#> # A tibble: 1,950 × 19
#>    rank       feature group_col group1 group2 design   n_subjects_paired
#>    <chr>      <chr>   <chr>     <chr>  <chr>  <chr>                <int>
#>  1 peptide_id 10003   group     A      B      unpaired                NA
#>  2 peptide_id 10017   group     A      B      unpaired                NA
#>  3 peptide_id 10023   group     A      B      unpaired                NA
#>  4 peptide_id 10062   group     A      B      unpaired                NA
#>  5 peptide_id 10087   group     A      B      unpaired                NA
#>  6 peptide_id 10100   group     A      B      unpaired                NA
#>  7 peptide_id 10108   group     A      B      unpaired                NA
#>  8 peptide_id 1011    group     A      B      unpaired                NA
#>  9 peptide_id 10121   group     A      B      unpaired                NA
#> 10 peptide_id 1013    group     A      B      unpaired                NA
#> # ℹ 1,940 more rows
#> # ℹ 12 more variables: n_peptides_used <int>, m_eff <dbl>, T_obs <dbl>,
#> #   T_null_mean <dbl>, T_null_sd <dbl>, T_obs_stand <dbl>, Z_from_p <dbl>,
#> #   p_perm <dbl>, b <int>, max_delta <dbl>, frac_delta_pos <dbl>,
#> #   frac_delta_pos_w <dbl>

What’s in the output?

Column Description
rank Rank at which the feature is defined
feature Feature identifier
group_col Name of the grouping column
group1, group2 The two groups compared
design "paired" or "unpaired"
n_subjects_paired Paired subjects used (or NA for unpaired)
n_peptides_used Peptides contributing to the test
m_eff Effective number of peptides after weighting
T_obs Observed Stouffer-type test statistic
p_perm Two-sided permutation p-value
b Number of permutations actually used
max_delta Maximum absolute peptide-level prevalence difference
frac_delta_pos Unweighted fraction of positive peptide-level deltas
frac_delta_pos_w Weighted fraction of positive peptide-level deltas

Choosing the per-peptide statistic

stat_mode controls how each peptide’s z-score is computed:

stat_mode Description
"srlr" (default) Signed root likelihood ratio
"diff" Wald-type z: Δ / SE
"asin" Arcsine-transformed z (variance-stabilised)
"score" Pooled score z (2×2 score test)
"mcnemar" Signed McNemar z — paired designs only
"srlr_paired" Signed root deviance — paired designs only
# Arcsine-transformed statistic
delta_asin <- compute_delta(
  pd,
  rank_cols      = "peptide_id",
  group_cols     = "group",
  stat_mode      = "asin",
  B_permutations = 2000L
)

Aggregation and stratification

aggregate_stat controls how peptide-level z-scores are combined:

  • "stouffer" (default) — weighted Stouffer combination
  • "maxmean" — mean of the dominant direction
  • "af" — absolute fraction

strat_bins bins peptides by pooled prevalence before aggregation. The default bins c(0.002, 0.005, ..., 0.50) stratify across the prevalence range so that low- and high-prevalence peptides are treated separately. Set strat_bins = 0 for a single global statistic without stratification.

# No stratification
delta_nostrat <- compute_delta(
  pd,
  rank_cols      = "peptide_id",
  group_cols     = "group",
  strat_bins     = 0,
  B_permutations = 2000L
)

# Custom bins
delta_custom <- compute_delta(
  pd,
  rank_cols      = "peptide_id",
  group_cols     = "group",
  strat_bins     = c(0.05, 0.25, 0.50),
  B_permutations = 2000L
)

Weighting

weight_mode controls peptide weights in the Stouffer statistic:

  • "equal" (default) — all peptides weight 1
  • "se_invvar" — inverse-SE weights (downweights noisy peptides)
  • "n_eff_sqrt" — square root of expected positives (emphasises common peptides)
delta_invvar <- compute_delta(
  pd,
  rank_cols      = "peptide_id",
  group_cols     = "group",
  weight_mode    = "se_invvar",
  B_permutations = 2000L
)

Paired design — timepoint comparison

When the same subjects are measured at two timepoints, set paired_by to the subject-linking column and stat_mode to a paired statistic. The function then performs sign-flipping permutations instead of label shuffling.

set.seed(42)
delta_paired <- compute_delta(
  pd,
  rank_cols      = "peptide_id",
  group_cols     = "timepoint",
  paired_by      = "subject_id",
  stat_mode      = "srlr_paired",
  B_permutations = 2000L
)
delta_paired
#> # A tibble: 1,946 × 19
#>    rank       feature group_col group1 group2 design n_subjects_paired
#>    <chr>      <chr>   <chr>     <chr>  <chr>  <chr>              <int>
#>  1 peptide_id 10003   timepoint T1     T2     paired                21
#>  2 peptide_id 10017   timepoint T1     T2     paired                21
#>  3 peptide_id 10023   timepoint T1     T2     paired                21
#>  4 peptide_id 10062   timepoint T1     T2     paired                21
#>  5 peptide_id 10087   timepoint T1     T2     paired                21
#>  6 peptide_id 10100   timepoint T1     T2     paired                21
#>  7 peptide_id 10108   timepoint T1     T2     paired                21
#>  8 peptide_id 1011    timepoint T1     T2     paired                21
#>  9 peptide_id 10121   timepoint T1     T2     paired                21
#> 10 peptide_id 1013    timepoint T1     T2     paired                21
#> # ℹ 1,936 more rows
#> # ℹ 12 more variables: n_peptides_used <int>, m_eff <dbl>, T_obs <dbl>,
#> #   T_null_mean <dbl>, T_null_sd <dbl>, T_obs_stand <dbl>, Z_from_p <dbl>,
#> #   p_perm <dbl>, b <int>, max_delta <dbl>, frac_delta_pos <dbl>,
#> #   frac_delta_pos_w <dbl>

Forest plots

Forest plots display the most extreme positive and negative features for a chosen rank, sorted by the selected test statistic.

forestplot() — static ggplot2

forestplot(
  delta_group,
  rank_of_interest  = "peptide_id",
  statistic_to_plot = "T",
  n_neg_each        = 10,
  n_pos_each        = 10,
  left_label        = "More in A",
  right_label       = "More in B"
)$plot

statistic_to_plot selects what is plotted on the x-axis:

Value Description
"T" Raw T_obs from compute_delta()
"T_stand" Permutation-standardised T (requires T_obs_stand column)
"Z_from_p" Signed Z derived from permutation p-values (requires Z_from_p)

Diverging colours

Set use_diverging_colors = TRUE to shade bars from blue (negative) to red (positive):

forestplot(
  delta_group,
  rank_of_interest     = "peptide_id",
  statistic_to_plot    = "T",
  n_neg_each           = 10,
  n_pos_each           = 10,
  left_label           = "More in A",
  right_label          = "More in B",
  use_diverging_colors = TRUE
)$plot

Filtering to significant features

Pass a column name to filter_significant to restrict the plot to features below a given significance threshold. With p_perm this retains only permutation-significant features.

forestplot(
  delta_group,
  rank_of_interest   = "peptide_id",
  statistic_to_plot  = "T",
  filter_significant = "p_perm",
  sig_level          = 0.05,
  left_label         = "More in A",
  right_label        = "More in B"
)$plot

forestplot_interactive() — plotly

The interactive version mirrors the static API and returns a plotly widget. Hovering over a bar shows the feature identifier and statistic value.

forestplot_interactive(
  delta_group,
  rank_of_interest     = "peptide_id",
  statistic_to_plot    = "T",
  n_neg_each           = 10,
  n_pos_each           = 10,
  left_label           = "More in A",
  right_label          = "More in B",
  use_diverging_colors = TRUE
)$plot

Putting it all together

A typical Delta analysis runs in four steps:

# 1. Compute prevalence (needed for deltaplot)
pop <- compute_pop(
  pd,
  rank_cols  = "peptide_id",
  group_cols = "group"
)

# 2. Visualise raw shift
deltaplot(pop, group_pair_values = c("A", "B"),
          group_labels = c("Group A", "Group B"))
deltaplot_interactive(pop, group_pair_values = c("A", "B"),
                      group_labels = c("Group A", "Group B"))

# 3. Permutation test (unpaired)
set.seed(42)
delta <- compute_delta(
  pd,
  rank_cols      = "peptide_id",
  group_cols     = "group",
  B_permutations = 2000L
)

# Optional: paired test across timepoints
delta_paired <- compute_delta(
  pd,
  rank_cols      = "peptide_id",
  group_cols     = "timepoint",
  paired_by      = "subject_id",
  stat_mode      = "srlr_paired",
  B_permutations = 2000L
)

# 4. Forest plots
forestplot(delta, rank_of_interest = "peptide_id",
           left_label = "More in A", right_label = "More in B")$plot
forestplot_interactive(delta, rank_of_interest = "peptide_id",
                       left_label = "More in A",
                       right_label = "More in B")$plot

Session info

sessionInfo()
#> R version 4.6.0 (2026-04-24)
#> Platform: x86_64-pc-linux-gnu
#> Running under: Ubuntu 24.04.4 LTS
#> 
#> Matrix products: default
#> BLAS:   /usr/lib/x86_64-linux-gnu/openblas-pthread/libblas.so.3 
#> LAPACK: /usr/lib/x86_64-linux-gnu/openblas-pthread/libopenblasp-r0.3.26.so;  LAPACK version 3.12.0
#> 
#> locale:
#>  [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C              
#>  [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8    
#>  [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8   
#>  [7] LC_PAPER=en_US.UTF-8       LC_NAME=C                 
#>  [9] LC_ADDRESS=C               LC_TELEPHONE=C            
#> [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C       
#> 
#> time zone: Etc/UTC
#> tzcode source: system (glibc)
#> 
#> attached base packages:
#> [1] stats     graphics  grDevices utils     datasets  methods   base     
#> 
#> other attached packages:
#> [1] dplyr_1.2.1    phiper_0.4.1   rmarkdown_2.31
#> 
#> loaded via a namespace (and not attached):
#>  [1] gtable_0.3.6          xfun_0.58             bslib_0.11.0         
#>  [4] ggplot2_4.0.3         parallelDist_0.2.7    lattice_0.22-9       
#>  [7] vctrs_0.7.3           tools_4.6.0           generics_0.1.4       
#> [10] parallel_4.6.0        tibble_3.3.1          cluster_2.1.8.2      
#> [13] blob_1.3.0            pkgconfig_2.0.3       Matrix_1.7-5         
#> [16] dbplyr_2.5.2          RColorBrewer_1.1-3    S7_0.2.2             
#> [19] RcppParallel_5.1.11-2 lifecycle_1.0.5       compiler_4.6.0       
#> [22] farver_2.1.2          codetools_0.2-20      permute_0.9-10       
#> [25] htmltools_0.5.9       sys_3.4.3             buildtools_1.0.0     
#> [28] sass_0.4.10           yaml_2.3.12           pillar_1.11.1        
#> [31] jquerylib_0.1.4       tidyr_1.3.2           MASS_7.3-65          
#> [34] cachem_1.1.0          vegan_2.7-5           nlme_3.1-169         
#> [37] parallelly_1.47.0     tidyselect_1.2.1      digest_0.6.39        
#> [40] Rtsne_0.17            future_1.70.0         duckdb_1.5.2         
#> [43] purrr_1.2.2           showtextdb_3.0        listenv_0.10.1       
#> [46] maketools_1.3.2       labeling_0.4.3        splines_4.6.0        
#> [49] fastmap_1.2.0         grid_4.6.0            cli_3.6.6            
#> [52] magrittr_2.0.5        utf8_1.2.6            future.apply_1.20.2  
#> [55] withr_3.0.2           scales_1.4.0          showtext_0.9-8       
#> [58] globals_0.19.1        sysfonts_0.8.9        otel_0.2.0           
#> [61] ggsignif_0.6.4        chk_0.10.0            evaluate_1.0.5       
#> [64] knitr_1.51            mgcv_1.9-4            phiperio_0.5.0       
#> [67] rlang_1.2.0           Rcpp_1.1.1-1.1        glue_1.8.1           
#> [70] DBI_1.3.0             jsonlite_2.0.0        R6_2.6.1