---
title: "Delta Analysis in PhIP-seq"
output:
  rmarkdown::html_vignette:
    toc: true
    toc_depth: 3
    number_sections: false
vignette: >
  %\VignetteIndexEntry{Delta Analysis in PhIP-seq}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}
---

```{r setup, include=FALSE}
knitr::opts_chunk$set(
  collapse   = TRUE,
  comment    = "#>",
  fig.width  = 7,
  fig.height = 5,
  warning    = FALSE,
  message    = FALSE
)
```

## Overview

The Delta framework answers two related but distinct questions about
peptide-level prevalence:

1. **Where do peptides shift?** — `deltaplot()` and `deltaplot_interactive()`
   place each peptide at its pooled prevalence (x) and prevalence shift Δ (y),
   providing a global view of which features move between groups.
2. **Is there a statistically significant global shift?** —
   `compute_delta()` aggregates per-peptide z-scores into a single
   Stouffer-type statistic and assesses significance via subject-label
   permutation. `forestplot()` and `forestplot_interactive()` then display
   the top and bottom features ranked by that statistic.

This vignette walks through the complete Delta workflow in **phiper**:

1. Visualising raw prevalence shift with `deltaplot()` and
   `deltaplot_interactive()`
2. Running the permutation test with `compute_delta()`
3. Displaying results with `forestplot()` and `forestplot_interactive()`

---

## Setup

```{r load}
library(phiper)
```

Load the bundled example dataset. It contains two patient groups (`A`, `B`)
measured at two timepoints (`T1`, `T2`) across 1 000 simulated peptides.

```{r data}
pd <- load_example_data()
pd
```

> **Note.** The example data are entirely simulated and have no biological
> meaning. They exist solely to demonstrate the API.

The Delta plots consume prevalence tables produced by `compute_pop()` (see
the POP vignette for details). Compute a group-level prevalence table first:

```{r pop}
pop_group <- compute_pop(
  pd,
  rank_cols  = "peptide_id",
  group_cols = "group"
)
head(pop_group)
```

For the unpaired group comparison with `compute_delta()`, each subject must
appear at most once per group. Because `pd` contains two timepoints per
subject, restrict to a single timepoint first:

```{r filter-t1}
pd_t1 <- pd |> dplyr::filter(timepoint == "T1") |> dplyr::collect()
```

---

## Visualising prevalence shift

### `deltaplot()` — static ggplot2

`deltaplot()` plots each peptide at:

- **x-axis** — pooled prevalence `(prop1 + prop2) / 2`
- **y-axis** — prevalence shift `prop2 − prop1`

A dashed horizontal line marks Δ = 0. Arrows and labels indicate the direction
of enrichment. A GAM smooth (controlled by `add_smooth`) summarises the trend
across pooled prevalence.

The only required input is a data frame containing `group1`, `group2`, `prop1`,
and `prop2` — which is exactly what `compute_pop()` returns.

```{r deltaplot-basic}
deltaplot(
  pop_group,
  group_pair_values = c("A", "B"),
  group_labels      = c("Group A", "Group B")
)
```

Use `group_pair_values` to select a specific contrast when the table contains
more than one pair. `group_labels` provides display names for the axis
annotations.

#### Adjusting the smooth

The smooth uses a GAM with basis dimension `smooth_k`. Lower values produce
smoother curves; set `add_smooth = FALSE` to suppress it.

```{r deltaplot-smooth}
deltaplot(
  pop_group,
  group_pair_values  = c("A", "B"),
  group_labels       = c("Group A", "Group B"),
  smooth_k           = 3,
  point_alpha        = 0.15,
  point_size         = 0.4
)
```

#### Jitter and appearance

```{r deltaplot-custom}
deltaplot(
  pop_group,
  group_pair_values   = c("A", "B"),
  group_labels        = c("Group A", "Group B"),
  point_jitter_width  = 0.01,
  point_jitter_height = 0.01,
  point_alpha         = 0.20,
  arrow_color         = "steelblue"
)
```

### `deltaplot_interactive()` — plotly

The interactive version mirrors the static API. Hovering over a point shows
the feature identifier, group-level counts and percentages, and p-value.

```{r deltaplot-interactive, eval=FALSE}
deltaplot_interactive(
  pop_group,
  group_pair_values = c("A", "B"),
  group_labels      = c("Group A", "Group B")
)
```

Disable the smooth if you have few peptides or want a cleaner display:

```{r deltaplot-interactive-nosmooth, eval=FALSE}
deltaplot_interactive(
  pop_group,
  group_pair_values = c("A", "B"),
  group_labels      = c("Group A", "Group B"),
  add_smooth        = FALSE,
  point_size        = 5,
  point_alpha       = 0.5
)
```

---

## Testing global shift: `compute_delta()`

`compute_delta()` performs a permutation test for a global (antigen-wide)
shift in prevalence between two groups. For each `(rank, feature)` stratum
it computes a per-peptide z-score, aggregates them via a Stouffer statistic,
and assesses significance by permuting subject labels.

### Unpaired design — group comparison

```{r compute-delta}
set.seed(42)
delta_group <- compute_delta(
  pd_t1,
  rank_cols      = "peptide_id",
  group_cols     = "group",
  B_permutations = 2000L
)
```

```{r delta-result}
delta_group
```

### What's in the output?

| Column | Description |
|---|---|
| `rank` | Rank at which the feature is defined |
| `feature` | Feature identifier |
| `group_col` | Name of the grouping column |
| `group1`, `group2` | The two groups compared |
| `design` | `"paired"` or `"unpaired"` |
| `n_subjects_paired` | Paired subjects used (or `NA` for unpaired) |
| `n_peptides_used` | Peptides contributing to the test |
| `m_eff` | Effective number of peptides after weighting |
| `T_obs` | Observed Stouffer-type test statistic |
| `p_perm` | Two-sided permutation p-value |
| `b` | Number of permutations actually used |
| `max_delta` | Maximum absolute peptide-level prevalence difference |
| `frac_delta_pos` | Unweighted fraction of positive peptide-level deltas |
| `frac_delta_pos_w` | Weighted fraction of positive peptide-level deltas |

### Choosing the per-peptide statistic

`stat_mode` controls how each peptide's z-score is computed:

| `stat_mode` | Description |
|---|---|
| `"srlr"` (default) | Signed root likelihood ratio |
| `"diff"` | Wald-type z: Δ / SE |
| `"asin"` | Arcsine-transformed z (variance-stabilised) |
| `"score"` | Pooled score z (2×2 score test) |
| `"mcnemar"` | Signed McNemar z — paired designs only |
| `"srlr_paired"` | Signed root deviance — paired designs only |

```{r delta-stat-mode, eval=FALSE}
# Arcsine-transformed statistic
delta_asin <- compute_delta(
  pd,
  rank_cols      = "peptide_id",
  group_cols     = "group",
  stat_mode      = "asin",
  B_permutations = 2000L
)
```

### Aggregation and stratification

`aggregate_stat` controls how peptide-level z-scores are combined:

- `"stouffer"` (default) — weighted Stouffer combination
- `"maxmean"` — mean of the dominant direction
- `"af"` — absolute fraction

`strat_bins` bins peptides by pooled prevalence before aggregation. The
default bins `c(0.002, 0.005, ..., 0.50)` stratify across the prevalence
range so that low- and high-prevalence peptides are treated separately.
Set `strat_bins = 0` for a single global statistic without stratification.

```{r delta-strat, eval=FALSE}
# No stratification
delta_nostrat <- compute_delta(
  pd,
  rank_cols      = "peptide_id",
  group_cols     = "group",
  strat_bins     = 0,
  B_permutations = 2000L
)

# Custom bins
delta_custom <- compute_delta(
  pd,
  rank_cols      = "peptide_id",
  group_cols     = "group",
  strat_bins     = c(0.05, 0.25, 0.50),
  B_permutations = 2000L
)
```

### Weighting

`weight_mode` controls peptide weights in the Stouffer statistic:

- `"equal"` (default) — all peptides weight 1
- `"se_invvar"` — inverse-SE weights (downweights noisy peptides)
- `"n_eff_sqrt"` — square root of expected positives (emphasises common
  peptides)

```{r delta-weight, eval=FALSE}
delta_invvar <- compute_delta(
  pd,
  rank_cols      = "peptide_id",
  group_cols     = "group",
  weight_mode    = "se_invvar",
  B_permutations = 2000L
)
```

### Paired design — timepoint comparison

When the same subjects are measured at two timepoints, set `paired_by` to
the subject-linking column and `stat_mode` to a paired statistic. The
function then performs sign-flipping permutations instead of label shuffling.

```{r compute-delta-paired}
set.seed(42)
delta_paired <- compute_delta(
  pd,
  rank_cols      = "peptide_id",
  group_cols     = "timepoint",
  paired_by      = "subject_id",
  stat_mode      = "srlr_paired",
  B_permutations = 2000L
)
delta_paired
```

---

## Forest plots

Forest plots display the most extreme positive and negative features for a
chosen rank, sorted by the selected test statistic.

### `forestplot()` — static ggplot2

```{r forestplot-basic}
forestplot(
  delta_group,
  rank_of_interest  = "peptide_id",
  statistic_to_plot = "T",
  n_neg_each        = 10,
  n_pos_each        = 10,
  left_label        = "More in A",
  right_label       = "More in B"
)$plot
```

`statistic_to_plot` selects what is plotted on the x-axis:

| Value | Description |
|---|---|
| `"T"` | Raw `T_obs` from `compute_delta()` |
| `"T_stand"` | Permutation-standardised T (requires `T_obs_stand` column) |
| `"Z_from_p"` | Signed Z derived from permutation p-values (requires `Z_from_p`) |

#### Diverging colours

Set `use_diverging_colors = TRUE` to shade bars from blue (negative) to red
(positive):

```{r forestplot-colors}
forestplot(
  delta_group,
  rank_of_interest     = "peptide_id",
  statistic_to_plot    = "T",
  n_neg_each           = 10,
  n_pos_each           = 10,
  left_label           = "More in A",
  right_label          = "More in B",
  use_diverging_colors = TRUE
)$plot
```

#### Filtering to significant features

Pass a column name to `filter_significant` to restrict the plot to features
below a given significance threshold. With `p_perm` this retains only
permutation-significant features.

```{r forestplot-filter, eval=FALSE}
forestplot(
  delta_group,
  rank_of_interest   = "peptide_id",
  statistic_to_plot  = "T",
  filter_significant = "p_perm",
  sig_level          = 0.05,
  left_label         = "More in A",
  right_label        = "More in B"
)$plot
```

### `forestplot_interactive()` — plotly

The interactive version mirrors the static API and returns a plotly widget.
Hovering over a bar shows the feature identifier and statistic value.

```{r forestplot-interactive, eval=FALSE}
forestplot_interactive(
  delta_group,
  rank_of_interest     = "peptide_id",
  statistic_to_plot    = "T",
  n_neg_each           = 10,
  n_pos_each           = 10,
  left_label           = "More in A",
  right_label          = "More in B",
  use_diverging_colors = TRUE
)$plot
```

---

## Putting it all together

A typical Delta analysis runs in four steps:

```{r full-pipeline, eval=FALSE}
# 1. Compute prevalence (needed for deltaplot)
pop <- compute_pop(
  pd,
  rank_cols  = "peptide_id",
  group_cols = "group"
)

# 2. Visualise raw shift
deltaplot(pop, group_pair_values = c("A", "B"),
          group_labels = c("Group A", "Group B"))
deltaplot_interactive(pop, group_pair_values = c("A", "B"),
                      group_labels = c("Group A", "Group B"))

# 3. Permutation test (unpaired)
set.seed(42)
delta <- compute_delta(
  pd,
  rank_cols      = "peptide_id",
  group_cols     = "group",
  B_permutations = 2000L
)

# Optional: paired test across timepoints
delta_paired <- compute_delta(
  pd,
  rank_cols      = "peptide_id",
  group_cols     = "timepoint",
  paired_by      = "subject_id",
  stat_mode      = "srlr_paired",
  B_permutations = 2000L
)

# 4. Forest plots
forestplot(delta, rank_of_interest = "peptide_id",
           left_label = "More in A", right_label = "More in B")$plot
forestplot_interactive(delta, rank_of_interest = "peptide_id",
                       left_label = "More in A",
                       right_label = "More in B")$plot
```

---

## Session info

```{r session-info}
sessionInfo()
```