| Title: | Automated PhIP-seq Analysis and Reporting |
|---|---|
| Description: | Provides an end-to-end toolkit for Phage ImmunoPrecipitation- sequencing (PhIP-seq) data. Functions import raw peptide-sample count matrices, apply quality control filters, normalise library sizes, compute enrichment statistics and diversity metrics, and identify differentially enriched peptides or motifs. Results can be explored through tidy data frames, visualised with publication-ready ggplot2 graphics, or rendered into fully fledged HTML reports via R Markdown. |
| Authors: | Mateusz Kolek [aut, cre, cph] (ORCID: <https://orcid.org/0000-0001-6470-4830>), Alon Alexander [ctb, cph], Thomas Vogl [cph, fnd] (ORCID: <https://orcid.org/0000-0002-3892-1740>) |
| Maintainer: | Mateusz Kolek <[email protected]> |
| License: | GPL (>= 3) |
| Version: | 0.4.1 |
| Built: | 2026-05-15 09:20:17 UTC |
| Source: | https://github.com/Polymerase3/phiper |
Computes richness, Shannon, Simpson, Pielou's evenness, and Berger-Parker dominance per sample and per grouping variable at one or more ranks (columns describing peptides).
compute_alpha( x, group_cols = NULL, ranks = "peptide_id", mode = c("binary", "threshold", "abundance"), abundance_col = NULL, threshold = NULL, abundance_agg = c("mean", "sum", "max"), metrics = .alpha_metric_names, shannon_base = c("ln", "log2", "log10"), carry_cols = NULL, group_interaction = FALSE, interaction_only = FALSE, interaction_sep = " * ", shannon_log = NULL )compute_alpha( x, group_cols = NULL, ranks = "peptide_id", mode = c("binary", "threshold", "abundance"), abundance_col = NULL, threshold = NULL, abundance_agg = c("mean", "sum", "max"), metrics = .alpha_metric_names, shannon_base = c("ln", "log2", "log10"), carry_cols = NULL, group_interaction = FALSE, interaction_only = FALSE, interaction_sep = " * ", shannon_log = NULL )
x |
A |
group_cols |
Character vector of grouping columns, or |
ranks |
Character vector of exact column names to aggregate by.
Typical values: |
mode |
One of |
abundance_col |
Character scalar naming the abundance column. Required
for |
threshold |
Numeric scalar. Required for |
abundance_agg |
One of |
metrics |
Character vector of diversity metrics to compute. Any subset
of: |
shannon_base |
One of |
carry_cols |
Optional character vector of extra columns to carry forward into the output if present (e.g. sample metadata in your table). |
group_interaction |
Logical; also compute the interaction of all
|
interaction_only |
Logical; if |
interaction_sep |
Separator used for the interaction label (default
|
shannon_log |
Deprecated. Use |
Ranks are the peptide identities or characteristics you aggregate by. They must be exact column names:
For <phip_data>: columns in the PHIPER peptide library (e.g. peptide_id,
lineage/taxa fields).
For data.frame: columns present on your long count table.
The mode parameter controls what n represents in every diversity formula:
"binary" (default): presence = exist > 0. n = count of distinct
enriched peptides per (sample, rank_val).
"threshold": presence = abundance_col > threshold. n = count of
peptides passing the threshold. threshold is required; abundance_col
defaults to "fold_change" when NULL.
"abundance": no binarisation. abundance_col is required. At
rank = "peptide_id", n = the raw abundance value per peptide. At
higher ranks, peptides are aggregated within each (sample, rank_val)
using abundance_agg ("mean", "sum", or "max").
group_cols can be a character vector; the return value is a named list
of data frames, one per group_col.
If group_cols = NULL, a single non-facetted table is returned under the
name "all_samples".
If group_interaction = TRUE (and you supplied >= 2 group_cols), an
additional element is computed for the interaction of all group columns,
with labels joined by interaction_sep.
If interaction_only = TRUE, you get only that interaction element
(requires group_interaction = TRUE and at least 2 group_cols).
A named list of data frames with S3 class
"phip_alpha_diversity". Each element (per group_col, plus optional
interaction or "all_samples") contains: rank, sample_id, the grouping
column (or group when group_cols = NULL), any carry_cols, and one
column per requested metric (see metrics). pielou_evenness is NA
when richness <= 1; berger_parker_dominance is NA when richness == 0.
pd <- load_example_data() # phip_data input: peptide-level diversity by group out <- compute_alpha( pd, group_cols = "group", ranks = "peptide_id" ) # include interaction of multiple grouping variables out2 <- compute_alpha( pd, group_cols = c("group", "timepoint"), ranks = c("peptide_id", "family", "genus"), group_interaction = TRUE ) # interaction only (returns a single element named "group * timepoint") out3 <- compute_alpha( pd, group_cols = c("group", "timepoint"), ranks = "peptide_id", group_interaction = TRUE, interaction_only = TRUE ) ## Not run: # data.frame input: ranks must be columns in the data out_df <- compute_alpha( df_long, group_cols = NULL, ranks = "peptide_id" ) ## End(Not run) # threshold mode: presence = fold_change > 1.5 out_thr <- compute_alpha( pd, group_cols = "group", ranks = "peptide_id", mode = "threshold", threshold = 1.5 )pd <- load_example_data() # phip_data input: peptide-level diversity by group out <- compute_alpha( pd, group_cols = "group", ranks = "peptide_id" ) # include interaction of multiple grouping variables out2 <- compute_alpha( pd, group_cols = c("group", "timepoint"), ranks = c("peptide_id", "family", "genus"), group_interaction = TRUE ) # interaction only (returns a single element named "group * timepoint") out3 <- compute_alpha( pd, group_cols = c("group", "timepoint"), ranks = "peptide_id", group_interaction = TRUE, interaction_only = TRUE ) ## Not run: # data.frame input: ranks must be columns in the data out_df <- compute_alpha( df_long, group_cols = NULL, ranks = "peptide_id" ) ## End(Not run) # threshold mode: presence = fold_change > 1.5 out_thr <- compute_alpha( pd, group_cols = "group", ranks = "peptide_id", mode = "threshold", threshold = 1.5 )
Runs global (Kruskal-Wallis or one-way ANOVA) and pairwise (Wilcoxon or
Tukey HSD) hypothesis tests on each (rank, metric) combination present in
the output of compute_alpha(). Pairwise p-values are adjusted with
p.adjust() and Cohen's d effect sizes are appended.
compute_alpha_significance( x, group_col = NULL, metric = NULL, global_test = c("kruskal", "anova"), pairwise_test = c("wilcoxon", "tukey"), p_adjust_method = "BH" )compute_alpha_significance( x, group_col = NULL, metric = NULL, global_test = c("kruskal", "anova"), pairwise_test = c("wilcoxon", "tukey"), p_adjust_method = "BH" )
x |
A |
group_col |
Character scalar; name of the grouping column. Inferred
from |
metric |
Character vector; subset of metrics to test. |
global_test |
One of |
pairwise_test |
One of |
p_adjust_method |
Method passed to |
A named list of class "phip_alpha_significance" with two tibbles:
$globalOne row per (rank, metric): rank, metric,
statistic, p_value, test.
$pairwiseOne row per (rank, metric, pair): rank, metric,
group1, group2, p_raw, p_adj, cohens_d, stars, test.
Attributes: group_col, global_test, pairwise_test,
p_adjust_method, metrics, ranks.
## Not run: alpha <- compute_alpha(phip_obj, group_cols = "group") sig <- compute_alpha_significance(alpha) sig$global sig$pairwise ## End(Not run)## Not run: alpha <- compute_alpha(phip_obj, group_cols = "group") sig <- compute_alpha_significance(alpha) sig$global sig$pairwise ## End(Not run)
Performs distance-based redundancy analysis (constrained pcoa,
a.k.a. cap) on a distance matrix using vegan::capscale, with
optional negative eigenvalue correction. Returns constrained sample scores,
eigenvalues, variance partitioning, and feature associations.
compute_capscale( dist_obj, ps, formula, neg_correction = c("none", "lingoes", "cailliez"), top_features = 30L, permutations = 999L, feature_assoc = c("weighted_average", "correlation", "regression", "none") )compute_capscale( dist_obj, ps, formula, neg_correction = c("none", "lingoes", "cailliez"), top_features = 30L, permutations = 999L, feature_assoc = c("weighted_average", "correlation", "regression", "none") )
dist_obj |
A |
ps |
A |
formula |
An R formula specifying the constraints (independent
variables) for the ordination, e.g. |
neg_correction |
One of |
top_features |
Integer scalar. Number of top features to return in associations (selected per constrained axis by absolute association, then unioned). Default is 30. |
permutations |
Integer scalar. Number of permutations for per-term
permutation tests via |
feature_assoc |
character scalar. Type of feature-axis association to
return. |
A list of class "beta_capscale" with elements:
sample_coords |
Tibble of sample scores on constrained axes
( |
eigenvalues |
Numeric vector of eigenvalues of the constrained axes. |
variance_partition |
Tibble with total inertia and inertia partitioned into constrained and unconstrained components, with their proportion of total. |
feature_associations |
Tibble of top feature-axis associations for
constrained axes (possibly empty if the |
r2 |
Numeric scalar. Unadjusted R-squared from
|
r2_adj |
Numeric scalar. Adjusted
R-squared from |
perm_terms |
Tibble of per-term permutation tests from
|
cap_model |
The full |
ps <- load_example_data("small_mixture") # compute distance matrix val_col <- "fold_change" dist_bc <- compute_distance( ps, value_col = val_col, method_normalization = "hellinger", distance = "bray", n_threads = 2L ) # pick a simple constraint that exists in the example data (fallback order) dat <- ps cand <- c("group", "big_group", "type_person", "sex", "age") rhs_var <- cand[cand %in% dplyr::tbl_vars(dat)][1] cap_res <- compute_capscale( dist_bc, ps = ps, formula = stats::as.formula(paste0("~ ", rhs_var)), neg_correction = "none", top_features = 30L ) cap_res$variance_partition cap_res$sample_coords cap_res$feature_associationsps <- load_example_data("small_mixture") # compute distance matrix val_col <- "fold_change" dist_bc <- compute_distance( ps, value_col = val_col, method_normalization = "hellinger", distance = "bray", n_threads = 2L ) # pick a simple constraint that exists in the example data (fallback order) dat <- ps cand <- c("group", "big_group", "type_person", "sex", "age") rhs_var <- cand[cand %in% dplyr::tbl_vars(dat)][1] cap_res <- compute_capscale( dist_bc, ps = ps, formula = stats::as.formula(paste0("~ ", rhs_var)), neg_correction = "none", top_features = 30L ) cap_res$variance_partition cap_res$sample_coords cap_res$feature_associations
Test for a global (antigen-wide) shift in peptide-level
prevalence between two groups within each (rank, feature, group_col)
stratum by aggregating per-peptide effects into a single Stouffer-type
statistic and assessing significance via label permutation.
compute_delta( x, rank_cols, group_cols, exist_col = "exist", paired_by = NULL, interaction = FALSE, combine_cols = NULL, interaction_sep = "::", B_permutations = 2000L, weight_mode = c("equal", "se_invvar", "n_eff_sqrt"), stat_mode = c("srlr", "diff", "asin", "score", "mcnemar", "srlr_paired"), perm_method = c("mid_p", "standard"), aggregate_stat = c("stouffer", "maxmean", "af"), strat_bins = c(0.002, 0.005, 0.01, 0.02, 0.05, 0.1, 0.2, 0.5), winsor_z = 4, rank_feature_keep = NULL, peptide_library = NULL, log = FALSE, log_file = "compute_delta.log" )compute_delta( x, rank_cols, group_cols, exist_col = "exist", paired_by = NULL, interaction = FALSE, combine_cols = NULL, interaction_sep = "::", B_permutations = 2000L, weight_mode = c("equal", "se_invvar", "n_eff_sqrt"), stat_mode = c("srlr", "diff", "asin", "score", "mcnemar", "srlr_paired"), perm_method = c("mid_p", "standard"), aggregate_stat = c("stouffer", "maxmean", "af"), strat_bins = c(0.002, 0.005, 0.01, 0.02, 0.05, 0.1, 0.2, 0.5), winsor_z = 4, rank_feature_keep = NULL, peptide_library = NULL, log = FALSE, log_file = "compute_delta.log" )
x |
A |
rank_cols |
Character vector of rank columns (e.g. |
group_cols |
Character vector of grouping columns that define
universes; pairwise group contrasts are constructed within each
|
exist_col |
Name of the 0/1 presence column. Default |
paired_by |
Optional single column name identifying paired samples (e.g. subject ID). When provided, paired tests are used where applicable. |
interaction |
Logical; reserved for future use (currently ignored). If
|
combine_cols |
Optional length-2 character vector; reserved for future
use. If non- |
interaction_sep |
Separator for interaction values. Default |
B_permutations |
Number of permutations |
weight_mode |
One of |
stat_mode |
One of |
perm_method |
One of |
aggregate_stat |
One of |
strat_bins |
Numeric vector of pooled prevalence cutpoints in (0, 1).
If |
winsor_z |
Winsorization threshold applied to peptide-level z-scores.
Values beyond |
rank_feature_keep |
Optional named list mapping |
peptide_library |
Optional data frame providing peptide annotations for
non-peptide ranks. Must at least contain |
log |
Logical; if |
log_file |
Path to a log file used by the logging helpers if |
For each stratum (rank, feature, group_col) that defines an ordered pair of
groups (g1, g2), the procedure is:
Peptide-level prevalences.
For each peptide, prevalences in g1 and g2 are
where x_k is the number of samples with presence (exist > 0) and n_k
is the number of distinct samples (or paired subjects).
Per-peptide effect and z-score (stat_mode).
"diff": effect with
and z-score (with a small floor on SE).
"asin": variance-stabilized difference of angles,
"score": pooled score z for equality of proportions (2×2 score test),
using raw proportions and pooled
:
"srlr": signed root likelihood ratio for :
where and
, .
"mcnemar" (paired-only): signed McNemar z based on discordant pairs
, :
If , z is defined as 0.
"srlr_paired" (paired-only): signed root deviance for the paired
binomial problem with :
Terms with are defined as 0; if , z is 0.
Each is then winsorized to winsor_z.
Weights (weight_mode).
"equal": all peptides get weight 1.
"se_invvar": weights proportional to the inverse standard error
(or the analogous denominator for "asin").
"n_eff_sqrt":
i.e. the square root of the expected number of positives across both groups.
Combine into a single test statistic (aggregate_stat).
Let be the weights and the peptide-level z-scores.
If aggregate_stat = "stouffer" and strat_bins = 0:
If aggregate_stat = "stouffer" and strat_bins is a numeric vector
of pooled prevalence cutpoints
(between 0 and 1), peptides are binned by pooled prevalence
a Stouffer statistic is computed within each bin. The
mean of the bin-level z-scores is reported as
. For example, if strat_bins includes 0.10
and 0.20, then peptides with pooled prevalence in (0.10, 0.20] share
a bin.
Multiplicity.
No multiple-testing correction is performed; p_perm is returned as
computed.
Input requirements.
exist_col is treated as 0/1 presence.
There must be at most one positive per
(subject_id, peptide_id, group_col, group_value); paired designs
can have up to two positives across the two group levels. Violations
trigger an error. Example (group levels A/B): for a single subject and
peptide, you may have A=1 and B=0 (or A=0 and B=1, or A=1 and B=1), but you
cannot have two rows both with A=1 (or two rows both with B=1).
Non-peptide ranks specified in rank_cols must be resolvable from a
peptide library (see peptide_library below).
Peptide library resolution.
When rank_cols includes non-"peptide_id" ranks, a peptide library is
required. It is resolved in the following order:
The explicit peptide_library argument.
x$peptide_library if x is a phip_data with an attached library.
get_peptide_library() from phiperio (always available as a dependency).
A tibble with one row per tested stratum:
rank, feature: rank and feature identifiers.
group_col, group1, group2: grouping column and the ordered pair of
groups compared.
design: "paired" or "unpaired".
n_subjects_paired: number of paired subjects used in a paired design
(or NA for unpaired).
n_peptides_used: number of peptides contributing to the test.
m_eff: effective number of peptides after weighting (as returned by the
C++ helper).
T_obs: observed Stouffer-type test statistic.
p_perm: two-sided permutation p-value.
b: number of permutations actually used (may be < B_permutations if
early stopping is implemented in the C++ helper).
max_delta, frac_delta_pos, frac_delta_pos_w:
maximum absolute peptide-level prevalence difference
and unweighted/weighted fractions of positive peptide-level deltas.
The permutation contrasts are evaluated either sequentially or in parallel
using the current future backend:
The function does not change the global future::plan().
If future is installed, the number of workers is inferred from
future::nbrOfWorkers(). To run in parallel, set up a future plan
(e.g. future::plan(multisession, workers = 8)) outside the function.
Internal C++ code is constrained to a single thread per R worker
(RcppParallel::setThreadOptions(numThreads = 1) and BLAS/OpenMP limits),
to avoid oversubscription when using multiple workers.
Reproducibility of permutations is controlled via the global RNG:
call set.seed() before compute_delta() to obtain reproducible
results; each contrast draws its own seed from the global RNG state.
# Load example PhIP-Seq data shipped with the package pd <- load_example_data() # Small unpaired subset with a mock peptide library pd_filt <- pd |> dplyr::filter( peptide_id %in% c("16627", "5243", "24799", "16196", "18003"), timepoint == "T1" ) |> dplyr::collect() mock_peplib <- data.frame( peptide_id = c("16627", "5243", "24799", "16196", "18003"), species = rep("mock_species", 5), stringsAsFactors = FALSE ) res <- compute_delta( x = pd_filt, exist_col = "exist", rank_cols = "species", group_cols = "group", peptide_library = mock_peplib, B_permutations = 500L, # smaller for speed weight_mode = "n_eff_sqrt", stat_mode = "asin", strat_bins = 0, winsor_z = Inf, rank_feature_keep = list(species = NULL), log = FALSE ) res# Load example PhIP-Seq data shipped with the package pd <- load_example_data() # Small unpaired subset with a mock peptide library pd_filt <- pd |> dplyr::filter( peptide_id %in% c("16627", "5243", "24799", "16196", "18003"), timepoint == "T1" ) |> dplyr::collect() mock_peplib <- data.frame( peptide_id = c("16627", "5243", "24799", "16196", "18003"), species = rep("mock_species", 5), stringsAsFactors = FALSE ) res <- compute_delta( x = pd_filt, exist_col = "exist", rank_cols = "species", group_cols = "group", peptide_library = mock_peplib, B_permutations = 500L, # smaller for speed weight_mode = "n_eff_sqrt", stat_mode = "asin", strat_bins = 0, winsor_z = Inf, rank_feature_keep = list(species = NULL), log = FALSE ) res
Computes distances of samples to group centroids (using
vegan::betadisper) and tests for differences in dispersion among
groups or time levels. Pairwise post-hoc tests are always performed when
group_col or time_col has >1 level.
compute_dispersion( dist_obj, ps, group_col = NULL, time_col = NULL, subject_col = "subject_id", permutations = 999, p_adjust = "none" )compute_dispersion( dist_obj, ps, group_col = NULL, time_col = NULL, subject_col = "subject_id", permutations = 999, p_adjust = "none" )
dist_obj |
A |
ps |
A |
group_col |
Name of the group factor column in |
time_col |
Name of the time factor column in |
subject_col |
Name of subject identifier column (for reference only;
not used directly in dispersion test calculations, but kept for API
consistency). Default |
permutations |
Number of permutations for significance testing in
|
p_adjust |
P-value adjustment method applied within each contrast scope.
Use |
A list of class "beta_dispersion" with:
distances |
Tibble of per-sample distances to centroid. Columns:
|
tests |
Tibble of dispersion test results. Columns: |
ps <- load_example_data("small_mixture") # compute distance matrix val_col <- "fold_change" dist_bc <- compute_distance( ps, value_col = val_col, distance = "jaccard", n_threads = 2L ) dispersion_res <- compute_dispersion( dist_bc, ps = ps, group_col = "group", time_col = "timepoint", p_adjust = "BH" ) dispersion_res$tests head(dispersion_res$distances)ps <- load_example_data("small_mixture") # compute distance matrix val_col <- "fold_change" dist_bc <- compute_distance( ps, value_col = val_col, distance = "jaccard", n_threads = 2L ) dispersion_res <- compute_dispersion( dist_bc, ps = ps, group_col = "group", time_col = "timepoint", p_adjust = "BH" ) dispersion_res$tests head(dispersion_res$distances)
This function builds a sample-by-feature abundance matrix from a
phip_data object (using ps$data_long), optionally normalizes
the matrix, and then computes pairwise distances between samples.
The normalized abundance matrix used for distance calculation is attached
to the returned dist object as attribute "abundances".
Note: this function pivots to a wide matrix in the database (via dbplyr) and then collects the result into memory. This can be large for big cohorts and/or large peptide sets.
compute_distance( ps, value_col = NULL, method_normalization = c("auto", "relative", "hellinger", "log", "none"), distance = "bray", n_threads = 1L )compute_distance( ps, value_col = NULL, method_normalization = c("auto", "relative", "hellinger", "log", "none"), distance = "bray", n_threads = 1L )
ps |
input data. either:
|
value_col |
character scalar. Name of the abundance column in
|
method_normalization |
character scalar. Normalization applied to the abundance matrix before distance computation. One of:
|
distance |
character scalar. Distance method name. The string is lowercased internally. Supported methods depend on which packages are installed:
If 'parallelDist' is installed but the requested method is not in the fast list above, the function falls back to vegan::vegdist(). |
n_threads |
integer scalar. Number of cpu threads passed to
|
a dist object of pairwise sample distances. The attribute
"abundances" contains the normalized abundance matrix used for the
calculation (rows are samples, columns are features).
# build an example <phip_data> object from the package example dataset ps <- load_example_data("small_mixture") # compute distances (needs either 'parallelDist' or 'vegan') val_col <- "fold_change" d <- compute_distance( ps, value_col = val_col, method_normalization = "hellinger", distance = "bray", n_threads = 2L ) a <- attr(d, "abundances") a[1:min(5, nrow(a)), 1:min(5, ncol(a)), drop = FALSE]# build an example <phip_data> object from the package example dataset ps <- load_example_data("small_mixture") # compute distances (needs either 'parallelDist' or 'vegan') val_col <- "fold_change" d <- compute_distance( ps, value_col = val_col, method_normalization = "hellinger", distance = "bray", n_threads = 2L ) a <- attr(d, "abundances") a[1:min(5, nrow(a)), 1:min(5, ncol(a)), drop = FALSE]
Performs PCoA on a distance matrix (typically from
compute_distance()), optionally correcting for negative eigenvalues,
and returns coordinates, eigenvalues, variance explained, and feature-axis
associations.
compute_pcoa( dist_obj, neg_correction = c("none", "lingoes", "cailliez"), n_axes = 5L )compute_pcoa( dist_obj, neg_correction = c("none", "lingoes", "cailliez"), n_axes = 5L )
dist_obj |
a |
neg_correction |
character scalar. Method for adjusting negative
eigenvalues (if any). One of |
n_axes |
integer scalar. Number of PCoA axes to return in the sample
scores. Must be > 0. Internally, |
Negative eigenvalues indicate that the distances are not perfectly euclidean.
If neg_correction is "lingoes" or "cailliez", a
correction is applied via vegan::wcmdscale(add = ...).
a list of class "beta_pcoa" with elements:
sample_coords: tibble with sample_id and columns
PCoA1, PCoA2, ... up to n_axes (or fewer if
n_samples - 1 is smaller).
eigenvalues: numeric vector of eigenvalues from the PCoA.
var_explained: one-row tibble with percent variance explained
by the returned axes and %Other. percentages are computed from the
sum of positive eigenvalues.
eigen_diagnostics: one-row tibble with eigenvalue diagnostics:
sum_negative, sum_positive, their ratio, the minimum
eigenvalue, and the number and fraction of negative eigenvalues.
correction_infos: one-row tibble describing any negative
eigenvalue correction applied, including corrected,
correction_method, correction_add, and
correction_note.
# compute a distance matrix with an attached abundance matrix # build an example <phip_data> object from the package example dataset ps <- load_example_data("small_mixture") # compute distances (needs either 'parallelDist' or 'vegan') val_col <- "fold_change" d <- compute_distance( ps, value_col = val_col, distance = "jaccard", n_threads = 2L ) pcoa_res <- compute_pcoa(d, neg_correction = "none", n_axes = 3L) pcoa_res$sample_coords pcoa_res$var_explained# compute a distance matrix with an attached abundance matrix # build an example <phip_data> object from the package example dataset ps <- load_example_data("small_mixture") # compute distances (needs either 'parallelDist' or 'vegan') val_col <- "fold_change" d <- compute_distance( ps, value_col = val_col, distance = "jaccard", n_threads = 2L ) pcoa_res <- compute_pcoa(d, neg_correction = "none", n_axes = 3L) pcoa_res$sample_coords pcoa_res$var_explained
Calculates feature-axis associations based on given PCoA
results (output of compute_pcoa())
compute_pcoa_feature_associations( dist_obj, pcoa_result, top_features = 30L, association_method = c("weighted_average", "correlation", "regression") )compute_pcoa_feature_associations( dist_obj, pcoa_result, top_features = 30L, association_method = c("weighted_average", "correlation", "regression") )
dist_obj |
a |
pcoa_result |
a list of class |
top_features |
integer scalar. Number of features to keep per axis when
reporting associations. Features are selected by taking the union of the
top |
association_method |
character scalar. Type of feature-axis association to
return. |
These feature associations are post-hoc summaries of how features relate to PCoA
axes. Weighted-average scores (association_method = "weighted_average") compute
t(X) %*% U / colSums(X), where X is the abundance matrix and
U are the sample coordinates. Correlation and regression associations
are computed between feature abundances and axis scores and are not "true"
PCA loadings unless distances are Euclidean and derived compatibly.
a tibble of feature-axis associations for the returned axes.
# compute a distance matrix with an attached abundance matrix # build an example <phip_data> object from the package example dataset ps <- load_example_data("small_mixture") # compute distances (needs either 'parallelDist' or 'vegan') val_col <- "fold_change" d <- compute_distance( ps, value_col = val_col, distance = "jaccard", n_threads = 2L ) # Compute PCoA vectors on these distances pcoa_res <- compute_pcoa(d, neg_correction = "none", n_axes = 3L) feature_associations <- compute_pcoa_feature_associations(d, pcoa_res) feature_associations# compute a distance matrix with an attached abundance matrix # build an example <phip_data> object from the package example dataset ps <- load_example_data("small_mixture") # compute distances (needs either 'parallelDist' or 'vegan') val_col <- "fold_change" d <- compute_distance( ps, value_col = val_col, distance = "jaccard", n_threads = 2L ) # Compute PCoA vectors on these distances pcoa_res <- compute_pcoa(d, neg_correction = "none", n_axes = 3L) feature_associations <- compute_pcoa_feature_associations(d, pcoa_res) feature_associations
Performs PERMANOVA (adonis2) on a distance matrix for overall group/time effects, and optionally conducts post-hoc pairwise or contrast tests (e.g., between each pair of groups, etc.). Supports stratified permutations for repeated measures.
compute_permanova( dist_obj, ps, group_col = NULL, time_col = NULL, subject_col = NULL, permutations = 999, p_adjust = "none" )compute_permanova( dist_obj, ps, group_col = NULL, time_col = NULL, subject_col = NULL, permutations = 999, p_adjust = "none" )
dist_obj |
A |
ps |
A |
group_col |
Name of the grouping column in |
time_col |
Name of the time factor column in |
subject_col |
Name of the subject identifier column in |
permutations |
Number of permutations for significance testing (default 999). |
p_adjust |
P-value adjustment method applied within each contrast scope.
Use |
Global PERMANOVA uses a model with main effects of group_col and
time_col and their interaction (if both are provided and have >1
level).
Samples with missing values in the constrained variables (group/time and,
when used for stratification, subject) are dropped before fitting the
model, and the distance matrix is subset to the remaining samples so that
distances and metadata are always aligned.
Pairwise post-hoc tests are always performed for group_col and
time_col (when present and with >1 level), using adonis2 with
appropriate subsetting and, where applicable, subject stratification.
Stratification via strata = subject is a simplified approach and does
not replace a full repeated-measures permutation design (e.g., via
permute::how()), especially for multi-level time with missing
timepoints.
P-values are optionally adjusted within each contrast scope (e.g.,
"group_pairwise", "time_pairwise") using
stats::p.adjust().
A tibble with columns:
scope |
The scope of the test (e.g., |
contrast |
Description of the contrast (e.g., |
term |
The term being tested. For global tests, this will be the factor
name (or interaction).
For post-hoc tests, it may be |
F_stat |
F statistic of the PERMANOVA test (for global tests and some contrasts where applicable). |
R2 |
R-squared (variance explained) for the term (global tests). |
p_value |
Permutation p-value for the test. |
p_adjust |
Adjusted p-value (within scope); equals |
n_perm |
Number of permutations used. |
ps <- load_example_data("small_mixture") # compute distance matrix val_col <- "fold_change" dist_bc <- compute_distance( ps, value_col = val_col, distance = "jaccard", n_threads = 2L ) permanova_res <- compute_permanova( dist_bc, ps = ps, group_col = "group", time_col = "timepoint" ) permanova_res2 <- compute_permanova( dist_bc, ps = ps, group_col = "group", time_col = "timepoint", p_adjust = "BH" )ps <- load_example_data("small_mixture") # compute distance matrix val_col <- "fold_change" dist_bc <- compute_distance( ps, value_col = val_col, distance = "jaccard", n_threads = 2L ) permanova_res <- compute_permanova( dist_bc, ps = ps, group_col = "group", time_col = "timepoint" ) permanova_res2 <- compute_permanova( dist_bc, ps = ps, group_col = "group", time_col = "timepoint", p_adjust = "BH" )
Computes feature-level prevalence across binary group columns and performs pairwise statistical tests:
Unpaired: Fisher's exact test (2x2) per (rank, feature, group pair).
Paired: McNemar's exact binomial test per subject.
Presence per sample is determined by a k-of-n rule: a feature is considered
present in a sample if at least pop_k_min of its contributing peptides are
positive. Each group_col must have exactly 2 non-missing levels in the data.
compute_pop( x, rank_cols, group_cols, exist_col = "exist", pop_k_min = 1L, paired = FALSE, peptide_library = NULL )compute_pop( x, rank_cols, group_cols, exist_col = "exist", pop_k_min = 1L, paired = FALSE, peptide_library = NULL )
x |
A |
rank_cols |
Character vector of rank columns, e.g. |
group_cols |
Character vector of binary grouping columns. Each column must have exactly 2 non-missing levels in the data. |
exist_col |
Name of the binary presence column (default |
pop_k_min |
Integer >= 1; k-of-n POP threshold per sample (default 1). |
paired |
|
peptide_library |
Optional peptide metadata table with |
A data.frame with one row per (rank, feature, group_col) comparison:
rank, feature, group_col, group1, group2,
n1, N1, prop1, percent1,
n2, N2, prop2, percent2,
ratio, delta_ratio (unpaired only), p_raw, n_peptides.
A view column is prepended when the input carries a view attribute.
## Not run: res <- compute_pop(pd, rank_cols = "species", group_cols = "group") res ## End(Not run)## Not run: res <- compute_pop(pd, rank_cols = "species", group_cols = "group") res ## End(Not run)
Performs t-distributed stochastic neighbor embedding (t-SNE) on a sample distance matrix to create low-dimensional embeddings for visualization. Returns t-SNE coordinates with optional sample metadata attached.
compute_tsne( ps, dist_obj, dims = 3L, perplexity = 30, theta = 0.5, max_iter = 1000L, meta_cols = NULL, seed = NULL, check_duplicates = FALSE, ... )compute_tsne( ps, dist_obj, dims = 3L, perplexity = 30, theta = 0.5, max_iter = 1000L, meta_cols = NULL, seed = NULL, check_duplicates = FALSE, ... )
ps |
A |
dist_obj |
A sample distance object. Either:
|
dims |
Integer scalar. Number of t-SNE dimensions to compute (2 or 3). Default is 3 to enable both 2D and 3D visualizations. |
perplexity |
Numeric scalar. Perplexity parameter for t-SNE. Must be smaller than the number of samples. If too large, it is automatically reduced with a warning. |
theta |
Numeric scalar. Speed/accuracy tradeoff for Barnes-Hut approximation. Default is 0.5. |
max_iter |
Integer scalar. Maximum number of t-SNE iterations. Default is 1000. |
meta_cols |
Character vector. Column names from |
seed |
Integer scalar. Random seed for reproducibility. If provided, the seed is temporarily set and restored after computation. |
check_duplicates |
Logical scalar. Whether to check for duplicate
points. Default is |
... |
Additional arguments passed to |
This function runs t-SNE in distance mode (is_distance = TRUE),
using the supplied distance matrix directly. The distance computation
should be performed separately using compute_distance().
t-SNE is a visualization method: it preserves local neighborhoods rather than global distances, axes are not directly interpretable, and embeddings can change with different seeds or perplexity settings.
Samples with missing values in metadata columns are retained in the t-SNE
result but will have NA values for those metadata columns.
A tibble of class c("phip_tsne", "tbl_df", "tbl",
"data.frame")
with columns:
sample_id: Sample identifier from distance labels.
tSNE1, tSNE2: First two t-SNE dimensions.
tSNE3: Third dimension if dims >= 3, otherwise
NA.
Additional metadata columns as specified in meta_cols.
Attributes:
"distance": The original dist_obj.
"tsne_params": List of t-SNE parameters and function call.
"meta_cols": Character vector of metadata columns used.
# Build example phip_data object ps <- load_example_data("small_mixture") # Compute distance matrix val_col <- "fold_change" d <- compute_distance( ps, value_col = val_col, method_normalization = "hellinger", distance = "bray", n_threads = 2L ) # Compute t-SNE embeddings tsne_res <- compute_tsne( ps = ps, dist_obj = d, dims = 3L, perplexity = 15, meta_cols = c("subject_id", "timepoint"), seed = 42 ) # View results head(tsne_res)# Build example phip_data object ps <- load_example_data("small_mixture") # Compute distance matrix val_col <- "fold_change" d <- compute_distance( ps, value_col = val_col, method_normalization = "hellinger", distance = "bray", n_threads = 2L ) # Compute t-SNE embeddings tsne_res <- compute_tsne( ps = ps, dist_obj = d, dims = 3L, perplexity = 15, meta_cols = c("subject_id", "timepoint"), seed = 42 ) # View results head(tsne_res)
Build a static ggplot showing the per-peptide shift in prevalence
() as a function of pooled prevalence
(). The input should be a tibble/data frame produced
by ph_compute_prevalence() or equivalent with columns
group1, group2, prop1, and prop2.
deltaplot( prev_tbl, group_pair_values = NULL, group_labels = NULL, point_jitter_width = 0.005, point_jitter_height = 0.005, point_alpha = 0.25, point_size = 0.6, add_smooth = TRUE, smooth_k = 5, arrow_color = "red", arrow_head_length_mm = 4, arrow_x_frac = 0.97, plot_title = NULL, plot_subtitle = NULL, x_label = NULL, y_label = NULL )deltaplot( prev_tbl, group_pair_values = NULL, group_labels = NULL, point_jitter_width = 0.005, point_jitter_height = 0.005, point_alpha = 0.25, point_size = 0.6, add_smooth = TRUE, smooth_k = 5, arrow_color = "red", arrow_head_length_mm = 4, arrow_x_frac = 0.97, plot_title = NULL, plot_subtitle = NULL, x_label = NULL, y_label = NULL )
prev_tbl |
Data frame with columns |
group_pair_values |
Optional length-2 character vector
|
group_labels |
Optional length-2 character vector of display labels
|
point_jitter_width, point_jitter_height
|
Jitter amounts for the points. Defaults 0.005. |
point_alpha |
Point transparency. Default 0.25. |
point_size |
Point size. Default 0.6. |
add_smooth |
Add a GAM smooth curve ( |
smooth_k |
Basis dimension |
arrow_color |
Color for the directional arrows and labels. Default
|
arrow_head_length_mm |
Arrow head length in mm. Default 4. |
arrow_x_frac |
Arrow X position as a fraction of the max pooled prevalence. Default 0.97 (near the right edge). |
plot_title, plot_subtitle
|
Optional plot labels for the title/subtitle. |
x_label, y_label
|
Optional axis labels. Defaults are generated from the group labels. |
The plot places each feature (peptide) as a point at:
x-axis: pooled prevalence (prop1 + prop2)/2
y-axis: prevalence shift (prop2 - prop1)
Points are optionally jittered for visibility. A dashed horizontal line marks
. Optional arrows and labels indicate the direction of
increased prevalence for group1 vs group2. If
add_smooth = TRUE, a GAM smooth is overlaid to summarize the trend
across pooled prevalence.
A ggplot object.
set.seed(1) n <- 40 prev_tbl <- data.frame( feature = paste0("pep", seq_len(n)), group1 = "A", group2 = "B", prop1 = runif(n), prop2 = runif(n) ) p <- deltaplot( prev_tbl, group_pair_values = c("A", "B"), group_labels = c("Group A", "Group B") ) print(p)set.seed(1) n <- 40 prev_tbl <- data.frame( feature = paste0("pep", seq_len(n)), group1 = "A", group2 = "B", prop1 = runif(n), prop2 = runif(n) ) p <- deltaplot( prev_tbl, group_pair_values = c("A", "B"), group_labels = c("Group A", "Group B") ) print(p)
Build an interactive plotly chart showing the per-peptide shift in prevalence
() as a function of pooled prevalence
(). The input should be a tibble/data frame produced
by ph_compute_prevalence() or equivalent with columns
group1, group2, prop1, and prop2.
deltaplot_interactive( prev_tbl, group_pair_values = NULL, group_labels = NULL, point_alpha = 0.6, point_size = 6, add_smooth = TRUE, smooth_k = 5, arrow_color = "red", arrow_x_frac = 0.97, arrow_length_frac = 0.3, label_x_gap_frac = 0.03, label_y_gap_frac = 0.02, plot_title = NULL, plot_subtitle = NULL, point_jitter_width = 0.005, point_jitter_height = 0.005 )deltaplot_interactive( prev_tbl, group_pair_values = NULL, group_labels = NULL, point_alpha = 0.6, point_size = 6, add_smooth = TRUE, smooth_k = 5, arrow_color = "red", arrow_x_frac = 0.97, arrow_length_frac = 0.3, label_x_gap_frac = 0.03, label_y_gap_frac = 0.02, plot_title = NULL, plot_subtitle = NULL, point_jitter_width = 0.005, point_jitter_height = 0.005 )
prev_tbl |
Data frame with columns |
group_pair_values |
Optional length-2 character vector
|
group_labels |
Optional length-2 character vector of display labels
|
point_alpha |
Point transparency. Default 0.6. |
point_size |
Point size. Default 6. |
add_smooth |
Add a GAM smooth curve ( |
smooth_k |
Basis dimension |
arrow_color |
Color for the directional arrows and labels. Default
|
arrow_x_frac |
Arrow X position as a fraction of the x-range. Default 0.97. |
arrow_length_frac |
Arrow length as a fraction of the y-range. Default 0.30. |
label_x_gap_frac |
Horizontal label offset as a fraction of the x-range. |
label_y_gap_frac |
Vertical label offset as a fraction of the y-range. |
plot_title, plot_subtitle
|
Optional plot labels for the title/subtitle. |
point_jitter_width, point_jitter_height
|
Jitter amounts. Defaults 0.005. |
The plot places each feature (peptide) as a point at:
x-axis: pooled prevalence (prop1 + prop2)/2
y-axis: prevalence shift (prop2 - prop1)
Points are optionally jittered for display, and hover text includes the
feature identifier plus prevalence (percent and proportion) and counts
(n1, N1, n2, N2) when available in the input
table. A dashed horizontal line marks . Optional arrows and
labels indicate the direction of increased prevalence for group1 vs
group2. If add_smooth = TRUE, a GAM smooth is overlaid to
summarize the trend.
A plotly object.
## Not run: set.seed(2) n <- 40 prev_tbl <- data.frame( feature = paste0("pep", seq_len(n)), group1 = "A", group2 = "B", prop1 = runif(n), prop2 = runif(n) ) p <- deltaplot_interactive( prev_tbl, group_pair_values = c("A", "B"), group_labels = c("Group A", "Group B"), add_smooth = FALSE ) p ## End(Not run)## Not run: set.seed(2) n <- 40 prev_tbl <- data.frame( feature = paste0("pep", seq_len(n)), group1 = "A", group2 = "B", prop1 = runif(n), prop2 = runif(n) ) p <- deltaplot_interactive( prev_tbl, group_pair_values = c("A", "B"), group_labels = c("Group A", "Group B"), add_smooth = FALSE ) p ## End(Not run)
Plot empirical cumulative distribution functions (ECDFs) of
per-peptide prevalence for two groups using a
ph_compute_prevalence()-style table. The plot compares the cumulative
distribution of prevalence values between the two groups and optionally
annotates median shifts and a Kolmogorov-Smirnov (KS) test summary.
ecdf_plot( prev_tbl, group_pair_values = NULL, group_labels = NULL, line_width_pt = 1, line_alpha = 1, group1_line_color = "#1f77b4", group2_line_color = "#d62728", show_median_lines = TRUE, show_ks_test = TRUE, plot_title = NULL, plot_subtitle = NULL, x_label = NULL, y_label = NULL )ecdf_plot( prev_tbl, group_pair_values = NULL, group_labels = NULL, line_width_pt = 1, line_alpha = 1, group1_line_color = "#1f77b4", group2_line_color = "#d62728", show_median_lines = TRUE, show_ks_test = TRUE, plot_title = NULL, plot_subtitle = NULL, x_label = NULL, y_label = NULL )
prev_tbl |
Data frame with columns |
group_pair_values |
Optional length-2 character vector |
group_labels |
Optional length-2 character vector of display labels
|
line_width_pt |
Line width for ECDF steps (ggplot units). Default 1.0. |
line_alpha |
Line alpha for ECDF steps. Default 1.0. |
group1_line_color, group2_line_color
|
Line colors for group1 and group2. |
show_median_lines |
Logical; add median lines. Default |
show_ks_test |
Logical; add KS test summary to subtitle. Default
|
plot_title, plot_subtitle
|
Optional plot labels. |
x_label, y_label
|
Optional axis labels. |
Each group is represented by a step function showing the fraction of features with prevalence less than or equal to a given value. Vertical median lines can be added for each group, and the subtitle can include the KS statistic and p-value along with the median difference.
A ggplot object.
set.seed(4) n <- 50 prev_tbl <- data.frame( feature = paste0("pep", seq_len(n)), group1 = "A", group2 = "B", prop1 = runif(n), prop2 = runif(n) ) p <- ecdf_plot( prev_tbl, group_pair_values = c("A", "B"), group_labels = c("Group A", "Group B"), show_ks_test = TRUE ) print(p)set.seed(4) n <- 50 prev_tbl <- data.frame( feature = paste0("pep", seq_len(n)), group1 = "A", group2 = "B", prop1 = runif(n), prop2 = runif(n) ) p <- ecdf_plot( prev_tbl, group_pair_values = c("A", "B"), group_labels = c("Group A", "Group B"), show_ks_test = TRUE ) print(p)
Plotly version of ecdf_prevalence(), showing ECDF curves for two
groups based on per-peptide prevalence values. The plot can annotate median
shifts and an optional KS test summary.
ecdf_plot_interactive( prev_tbl, group_pair_values = NULL, group_labels = NULL, line_width_px = 2, line_alpha = 1, group1_line_color = "#1f77b4", group2_line_color = "#d62728", show_median_lines = TRUE, show_ks_test = TRUE, plot_title = NULL, plot_subtitle = NULL )ecdf_plot_interactive( prev_tbl, group_pair_values = NULL, group_labels = NULL, line_width_px = 2, line_alpha = 1, group1_line_color = "#1f77b4", group2_line_color = "#d62728", show_median_lines = TRUE, show_ks_test = TRUE, plot_title = NULL, plot_subtitle = NULL )
prev_tbl |
Data frame with columns |
group_pair_values |
Optional length-2 character vector |
group_labels |
Optional length-2 character vector of display labels
|
line_width_px |
Line width for ECDF steps (plotly units). Default 2.0. |
line_alpha |
Line alpha for ECDF steps. Default 1.0. |
group1_line_color, group2_line_color
|
Line colors for group1 and group2. |
show_median_lines |
Logical; add median lines. Default |
show_ks_test |
Logical; add KS test summary to subtitle. Default
|
plot_title, plot_subtitle
|
Optional plot labels. |
Each group is represented by a step curve showing the cumulative fraction of features with prevalence less than or equal to a given value. Vertical median lines can be added for each group, and the subtitle can include the KS statistic and p-value with the median difference.
A plotly object.
## Not run: set.seed(5) n <- 50 prev_tbl <- data.frame( feature = paste0("pep", seq_len(n)), group1 = "A", group2 = "B", prop1 = runif(n), prop2 = runif(n) ) p <- ecdf_plot_interactive( prev_tbl, group_pair_values = c("A", "B"), group_labels = c("Group A", "Group B"), show_ks_test = TRUE ) p ## End(Not run)## Not run: set.seed(5) n <- 50 prev_tbl <- data.frame( feature = paste0("pep", seq_len(n)), group1 = "A", group2 = "B", prop1 = runif(n), prop2 = runif(n) ) p <- ecdf_plot_interactive( prev_tbl, group_pair_values = c("A", "B"), group_labels = c("Group A", "Group B"), show_ks_test = TRUE ) p ## End(Not run)
Builds a forest plot highlighting the most extreme positive and negative
DELTA/Stouffer statistics within a chosen rank. The plot shows the top
n_pos_each features on the positive side and the top n_neg_each
features on the negative side, ordered by the selected statistic
(statistic_to_plot). This provides a compact summary of which features
shift most strongly between the two groups for a given rank.
forestplot( results_tbl, rank_of_interest, statistic_to_plot = c("T", "T_stand", "Z_from_p"), n_neg_each = 15, n_pos_each = 15, filter_significant = "none", sig_level = 0.05, left_label = "More in group1", right_label = "More in group2", arrow_length_frac = 0.35, label_x_gap_frac = 0.06, y_pad = 0.6, label_y_offset = 0, label_vjust = -0.3, arrow_color = "red", arrow_linewidth = 0.6, arrow_head_length_mm = 3, use_diverging_colors = FALSE, base_text_pt = 12, font_family = "Montserrat", seg_width = 1.2, point_size = 3.6, show_grid = FALSE )forestplot( results_tbl, rank_of_interest, statistic_to_plot = c("T", "T_stand", "Z_from_p"), n_neg_each = 15, n_pos_each = 15, filter_significant = "none", sig_level = 0.05, left_label = "More in group1", right_label = "More in group2", arrow_length_frac = 0.35, label_x_gap_frac = 0.06, y_pad = 0.6, label_y_offset = 0, label_vjust = -0.3, arrow_color = "red", arrow_linewidth = 0.6, arrow_head_length_mm = 3, use_diverging_colors = FALSE, base_text_pt = 12, font_family = "Montserrat", seg_width = 1.2, point_size = 3.6, show_grid = FALSE )
results_tbl |
Data frame/tibble with at least: |
rank_of_interest |
Character scalar specifying the rank to plot (e.g.,
|
statistic_to_plot |
Which statistic to rank/plot: |
n_neg_each |
Number of most negative features to show. Default 15. |
n_pos_each |
Number of most positive features to show. Default 15. |
filter_significant |
Column name to filter on, or |
sig_level |
Significance threshold used when |
left_label |
Text for the left arrow/side label. |
right_label |
Text for the right arrow/side label. |
arrow_length_frac |
Fraction of max |
label_x_gap_frac |
Horizontal gap for arrow labels beyond arrow tips
(fraction of max |
y_pad |
Vertical padding above the top category for arrows/labels (y-axis units). |
label_y_offset |
Additional vertical offset for arrow-end labels (y-axis units). |
label_vjust |
Vertical justification for arrow labels (passed to
|
arrow_color |
Arrow/label color. |
arrow_linewidth |
Arrow line width. |
arrow_head_length_mm |
Arrow head length in mm. |
use_diverging_colors |
Logical; if |
base_text_pt |
Base text size in points for the plot. |
font_family |
Font family for plot text. |
seg_width |
Segment line width for the horizontal bars. |
point_size |
Point size for feature markers. |
show_grid |
Logical; show major/minor grid lines. |
The y-axis lists feature names (sorted by the chosen statistic), and the x-axis shows the signed effect size for each feature. Each feature is drawn as a horizontal segment from zero to its statistic value, with a point at the end of the segment. A dashed vertical line marks zero to separate negative from positive shifts. The subtitle reports the contrast (group1 vs group2) and how many negative/positive features are shown.
If use_diverging_colors = TRUE, segments/points are colored by
magnitude on a blue-to-red scale (negative to positive), otherwise a single
color is used. The arrow annotations at the top label the direction of
enrichment for each group and help interpret the sign of the statistic.
statistic_to_plot controls the statistic used for both ranking and
plotting: raw T_obs, permutation-standardized T_obs_stand, or
Z_from_p (signed Z from permutation p-values). For calibrated
inference or cross-figure comparability, prefer permutation Z-based results
or T_stand.
A list with:
data - tibble used for plotting (selected top/bottom items),
plot - ggplot object.
# in this example we mock the output of compute_delta with a simple # data.frame - it works the same set.seed(1) n <- 20 results_tbl <- data.frame( rank = rep("species", n), feature = paste0("feat_", seq_len(n)), group1 = "control", group2 = "treated", design = "case-control", T_obs = rnorm(n, sd = 2), p_perm = runif(n), T_obs_stand = rnorm(n), Z_from_p = qnorm(1 - runif(n) / 2) * sign(rnorm(n)) ) out <- forestplot( results_tbl, rank_of_interest = "species", statistic_to_plot = "T", n_neg_each = 5, n_pos_each = 5, left_label = "More in control", right_label = "More in treated", use_diverging_colors = TRUE, font_family = "sans" ) print(out$plot)# in this example we mock the output of compute_delta with a simple # data.frame - it works the same set.seed(1) n <- 20 results_tbl <- data.frame( rank = rep("species", n), feature = paste0("feat_", seq_len(n)), group1 = "control", group2 = "treated", design = "case-control", T_obs = rnorm(n, sd = 2), p_perm = runif(n), T_obs_stand = rnorm(n), Z_from_p = qnorm(1 - runif(n) / 2) * sign(rnorm(n)) ) out <- forestplot( results_tbl, rank_of_interest = "species", statistic_to_plot = "T", n_neg_each = 5, n_pos_each = 5, left_label = "More in control", right_label = "More in treated", use_diverging_colors = TRUE, font_family = "sans" ) print(out$plot)
Plotly version of forestplot() that highlights the most extreme
positive and negative features within a chosen rank. The plot shows the top
n_pos_each features on the positive side and the top n_neg_each
features on the negative side, ordered by the selected statistic
(statistic_to_plot).
forestplot_interactive( results_tbl, rank_of_interest, statistic_to_plot = c("T", "T_stand", "Z_from_p"), n_neg_each = 15, n_pos_each = 15, filter_significant = "none", sig_level = 0.05, left_label = "More in group1", right_label = "More in group2", arrow_length_frac = 0.35, label_x_gap_frac = 0.06, label_y_offset = 0, arrow_color = "red", arrow_linewidth = 0.6, arrow_head_length_mm = 3, use_diverging_colors = FALSE, show_grid = FALSE, base_text_pt = 12, font_family = "Montserrat", seg_width = 1.6, point_size = 11 )forestplot_interactive( results_tbl, rank_of_interest, statistic_to_plot = c("T", "T_stand", "Z_from_p"), n_neg_each = 15, n_pos_each = 15, filter_significant = "none", sig_level = 0.05, left_label = "More in group1", right_label = "More in group2", arrow_length_frac = 0.35, label_x_gap_frac = 0.06, label_y_offset = 0, arrow_color = "red", arrow_linewidth = 0.6, arrow_head_length_mm = 3, use_diverging_colors = FALSE, show_grid = FALSE, base_text_pt = 12, font_family = "Montserrat", seg_width = 1.6, point_size = 11 )
results_tbl |
Data frame/tibble with at least: |
rank_of_interest |
Character scalar specifying the rank to plot (e.g.,
|
statistic_to_plot |
Which statistic to rank/plot: |
n_neg_each |
Number of most negative features to show. Default 15. |
n_pos_each |
Number of most positive features to show. Default 15. |
filter_significant |
Column name to filter on, or |
sig_level |
Significance threshold used when |
left_label |
Text for the left arrow/side label. |
right_label |
Text for the right arrow/side label. |
arrow_length_frac |
Fraction of max |
label_x_gap_frac |
Horizontal label gap for arrow labels beyond arrow
tips (fraction of max |
label_y_offset |
Additional vertical offset for arrow-end labels (y-axis units). |
arrow_color |
Arrow/label color. |
arrow_linewidth |
Arrow line width (plotly units). |
arrow_head_length_mm |
Arrow head size (approximate, plotly units). |
use_diverging_colors |
Logical; if |
show_grid |
Logical; show grid lines. |
base_text_pt |
Base text size. |
font_family |
Font family name. |
seg_width |
Segment line width. |
point_size |
Point size. |
The y-axis lists feature names (sorted by the chosen statistic), and the x-axis shows the signed effect size for each feature. Each feature is drawn as a horizontal segment from zero to its statistic value, with a point at the end of the segment. A dashed vertical line marks zero to separate negative from positive shifts. The title and subtitle report the contrast and how many features are shown.
If use_diverging_colors = TRUE, segments/points are colored by
magnitude on a blue-to-red scale (negative to positive), otherwise a single
color is used. Arrow annotations label the direction of enrichment for each
group.
statistic_to_plot controls the statistic used for both ranking and
plotting: raw T_obs, permutation-standardized T_obs_stand, or
Z_from_p (signed Z from permutation p-values).
A list with data and plot (plotly object).
set.seed(1) n <- 20 results_tbl <- data.frame( rank = rep("species", n), feature = paste0("feat_", seq_len(n)), group1 = "control", group2 = "treated", design = "case-control", T_obs = rnorm(n, sd = 2), p_perm = runif(n), T_obs_stand = rnorm(n), Z_from_p = qnorm(1 - runif(n) / 2) * sign(rnorm(n)) ) out <- forestplot_interactive( results_tbl, rank_of_interest = "species", statistic_to_plot = "T", n_neg_each = 5, n_pos_each = 5, left_label = "More in control", right_label = "More in treated", use_diverging_colors = TRUE, label_x_gap_frac = 0, label_y_offset = -0.05 ) out$plotset.seed(1) n <- 20 results_tbl <- data.frame( rank = rep("species", n), feature = paste0("feat_", seq_len(n)), group1 = "control", group2 = "treated", design = "case-control", T_obs = rnorm(n, sd = 2), p_perm = runif(n), T_obs_stand = rnorm(n), Z_from_p = qnorm(1 - runif(n) / 2) * sign(rnorm(n)) ) out <- forestplot_interactive( results_tbl, rank_of_interest = "species", statistic_to_plot = "T", n_neg_each = 5, n_pos_each = 5, left_label = "More in control", right_label = "More in treated", use_diverging_colors = TRUE, label_x_gap_frac = 0, label_y_offset = -0.05 ) out$plot
Return the path to an example dataset shipped with phiperio,
suitable for use with load_example_data or
convert_standard.
get_example_path(name = c("phip_mixture"))get_example_path(name = c("phip_mixture"))
name |
Character scalar. Name of the example dataset.
Currently supported: |
A character scalar with an absolute path to the file.
sim_path <- get_example_path("phip_mixture") # phip_obj <- convert_standard(sim_path)sim_path <- get_example_path("phip_mixture") # phip_obj <- convert_standard(sim_path)
Convenience helper to quickly load a shipped example dataset
("phip_mixture") into a <phip_data> object, suitable for downstream
analysis and visualization. This function wraps
convert_standard, automatically supplying the
correct parameters for the included example data.
load_example_data(name = c("phip_mixture", "small_mixture"))load_example_data(name = c("phip_mixture", "small_mixture"))
name |
Character scalar. Name of the shipped example dataset.
Currently supported: |
A <phip_data> object created from the chosen example dataset.
# Load the example data shipped with the package: ex <- load_example_data() # ex is now a <phip_data> object ready for analysis # Specify the dataset name explicitly ex2 <- load_example_data("small_mixture")# Load the example data shipped with the package: ex <- load_example_data() # ex is now a <phip_data> object ready for analysis # Specify the dataset name explicitly ex2 <- load_example_data("small_mixture")
A concise, publication-ready qualitative palette tailored for PHIP figures. Colors are stable across sessions to keep styling reproducible.
phip_palettephip_palette
A character vector of hex colors.
Plot alpha diversity metrics from the precomputed output of
compute_alpha(). Supports filtering groups/ranks and optional
faceting by rank. If the input contains an interaction table, you can request
only that via interaction_only = TRUE.
plot_alpha( x, metric = c("richness", "shannon_diversity", "simpson_diversity", "pielou_evenness", "berger_parker_dominance"), group_col = "group", rank_col = "rank", filter_groups = NULL, filter_ranks = NULL, custom_colors = NULL, facet_by_rank = TRUE, ncol = 2, facet_scales = "fixed", interaction_only = FALSE, interaction_sep = NULL, jitter_width = 0.25, point_size = 1.8, point_alpha = 0.7, text_size = 12, font_family = "Montserrat", show_grids = TRUE, x_order = NULL, x_labels = NULL, y_range = NULL, x_tickangle = 0, significance = NULL, show_significance = FALSE, sig_p_threshold = 0.05, sig_step_increase = 0.05, sig_tip_length = 0.01, ... )plot_alpha( x, metric = c("richness", "shannon_diversity", "simpson_diversity", "pielou_evenness", "berger_parker_dominance"), group_col = "group", rank_col = "rank", filter_groups = NULL, filter_ranks = NULL, custom_colors = NULL, facet_by_rank = TRUE, ncol = 2, facet_scales = "fixed", interaction_only = FALSE, interaction_sep = NULL, jitter_width = 0.25, point_size = 1.8, point_alpha = 0.7, text_size = 12, font_family = "Montserrat", show_grids = TRUE, x_order = NULL, x_labels = NULL, y_range = NULL, x_tickangle = 0, significance = NULL, show_significance = FALSE, sig_p_threshold = 0.05, sig_step_increase = 0.05, sig_tip_length = 0.01, ... )
x |
a |
metric |
one of |
group_col |
name of the grouping column in the alpha table
(default |
rank_col |
name of the rank column (default |
filter_groups |
optional character vector; keep only these group levels. |
filter_ranks |
optional character vector; keep only these ranks. |
custom_colors |
optional named vector of colors for groups. |
facet_by_rank |
logical; facet by |
ncol |
integer; number of columns in facet wrap (default |
facet_scales |
|
interaction_only |
logical; if |
interaction_sep |
character; separator used to join interaction labels.
default is taken from |
jitter_width |
Numeric; horizontal jitter width for points. |
point_size |
Numeric; size of jittered points. |
point_alpha |
Numeric in (0,1); alpha for jittered points. |
text_size |
Numeric; base text size for plot labels. |
font_family |
Character; font family for plot text. |
show_grids |
Logical; whether to show panel grid lines. |
x_order |
Optional character vector specifying the x-axis order. |
x_labels |
Optional named character vector mapping x-axis labels. |
y_range |
Optional numeric length-2 vector for y-axis limits. |
x_tickangle |
Numeric; x-axis label angle in degrees. |
significance |
Optional |
show_significance |
Logical; if |
sig_p_threshold |
Numeric; only pairs with |
sig_step_increase |
Numeric; vertical step between stacked brackets.
Passed to |
sig_tip_length |
Numeric; length of bracket tips. Default |
... |
Reserved for future extensions; ignored. |
x can be:
the named list returned by compute_alpha() (class
"phip_alpha_diversity"), or
a single data frame taken from that list.
when interaction_only = TRUE, the function tries to select the element
named by paste(attr(x, "group_cols"), collapse = interaction_sep). if not
found, it falls back to the first element whose name contains the separator.
a ggplot object.
## Not run: # precomputed alpha (list) -> boxplot per group p <- plot_alpha(alpha_list, metric = "richness", group_col = "Cohort") # only the interaction table (if available) p_int <- plot_alpha( alpha_list, metric = "shannon_diversity", group_col = "Cohort * timepoint", interaction_only = TRUE ) ## End(Not run)## Not run: # precomputed alpha (list) -> boxplot per group p <- plot_alpha(alpha_list, metric = "richness", group_col = "Cohort") # only the interaction table (if available) p_int <- plot_alpha( alpha_list, metric = "shannon_diversity", group_col = "Cohort * timepoint", interaction_only = TRUE ) ## End(Not run)
native plotly version of plot_alpha(). mirrors the ggplot look:
per-group boxplots with jittered points and optional faceting by rank.
plot_alpha_interactive( x, metric = c("richness", "shannon_diversity", "simpson_diversity", "pielou_evenness", "berger_parker_dominance"), group_col = "group", rank_col = "rank", filter_groups = NULL, filter_ranks = NULL, custom_colors = NULL, facet_by_rank = TRUE, ncol = 2, facet_scales = "fixed", interaction_only = FALSE, interaction_sep = NULL, x_order = NULL, x_labels = NULL, y_range = NULL, x_tickangle = 0, quartile_method = c("exclusive", "inclusive", "linear"), jitter_width = 0.25, point_size = 6, point_alpha = 0.85, text_size = 12, font_family = "Montserrat", show_grids = TRUE, ... )plot_alpha_interactive( x, metric = c("richness", "shannon_diversity", "simpson_diversity", "pielou_evenness", "berger_parker_dominance"), group_col = "group", rank_col = "rank", filter_groups = NULL, filter_ranks = NULL, custom_colors = NULL, facet_by_rank = TRUE, ncol = 2, facet_scales = "fixed", interaction_only = FALSE, interaction_sep = NULL, x_order = NULL, x_labels = NULL, y_range = NULL, x_tickangle = 0, quartile_method = c("exclusive", "inclusive", "linear"), jitter_width = 0.25, point_size = 6, point_alpha = 0.85, text_size = 12, font_family = "Montserrat", show_grids = TRUE, ... )
x |
a |
metric |
one of |
group_col |
name of the grouping column in the alpha table
(default |
rank_col |
name of the rank column (default |
filter_groups |
optional character vector; keep only these group levels. |
filter_ranks |
optional character vector; keep only these ranks. |
custom_colors |
named character vector of hex colors for groups (like ggplot scale_fill_manual()). |
facet_by_rank |
logical; facet by |
ncol |
integer; number of columns in facet wrap (default |
facet_scales |
|
interaction_only |
logical; if |
interaction_sep |
character; separator used to join interaction labels.
default is taken from |
x_order |
optional character vector with desired group order (levels). |
x_labels |
optional named character vector mapping group -> label (used on x axis). |
y_range |
optional numeric length-2; y axis range (e.g., c(0, 2300)). |
x_tickangle |
numeric; tick label rotation in degrees (default 0; e.g., 25). |
quartile_method |
one of c("exclusive","inclusive","linear"); passed to plotly box (default "exclusive" ~ ggplot). |
jitter_width |
Numeric; horizontal jitter width for points. |
point_size |
Numeric; size of jittered points. |
point_alpha |
Numeric in (0,1); alpha for jittered points. |
text_size |
Numeric; base text size for plot labels. |
font_family |
Character; font family for plot text. |
show_grids |
Logical; whether to show panel grid lines. |
... |
Reserved for future extensions; ignored. |
A plotly htmlwidget.
## Not run: plot_alpha_interactive(df, metric = "richness", group_col = "group") ## End(Not run)## Not run: plot_alpha_interactive(df, metric = "richness", group_col = "group") ## End(Not run)
Summarises the pairwise significance table from compute_alpha_significance()
either as a filtered tibble (type = "table") or a Cohen's d heatmap with
significance annotations (type = "heatmap").
plot_alpha_significance( x, metric = NULL, rank = NULL, type = c("table", "heatmap"), p_threshold = 0.05, ... )plot_alpha_significance( x, metric = NULL, rank = NULL, type = c("table", "heatmap"), p_threshold = 0.05, ... )
x |
A |
metric |
Character scalar; metric to display. Defaults to the first
metric present in |
rank |
Character scalar; rank to display. Defaults to the first rank
present in |
type |
One of |
p_threshold |
Numeric; pairs with |
... |
Reserved; ignored. |
type = "table": a tibble::tibble of significant pairwise results.
type = "heatmap": a ggplot heatmap (Cohen's d fill, stars overlaid).
## Not run: sig <- compute_alpha_significance(alpha_list) plot_alpha_significance(sig, type = "table") plot_alpha_significance(sig, type = "heatmap") ## End(Not run)## Not run: sig <- compute_alpha_significance(alpha_list) plot_alpha_significance(sig, type = "table") plot_alpha_significance(sig, type = "heatmap") ## End(Not run)
Creates a 2D scatter plot from a "beta_capscale" object (output of
compute_capscale()), showing sample scores on CAP axes. Points can be
coloured by group and shaped by (categorical) time. Optional ellipses and
centroids can be overlaid for group/time/group*time summaries.
plot_cap( cap_res, axes = c(1, 2), group_col = NULL, time_col = NULL, show_centroids = TRUE, centroid_by = c("auto", "group", "time", "group_time"), connect_centroids = c("none", "group", "time"), show_ellipses = TRUE, ellipse_by = c("group"), ellipse_type = c("t", "norm", "euclid"), ellipse_level = 0.95, point_size = 1.5, point_alpha = 0.6, centroid_size = 3 )plot_cap( cap_res, axes = c(1, 2), group_col = NULL, time_col = NULL, show_centroids = TRUE, centroid_by = c("auto", "group", "time", "group_time"), connect_centroids = c("none", "group", "time"), show_ellipses = TRUE, ellipse_by = c("group"), ellipse_type = c("t", "norm", "euclid"), ellipse_level = 0.95, point_size = 1.5, point_alpha = 0.6, centroid_size = 3 )
cap_res |
A |
axes |
Integer vector of length 2 giving the CAP axes to plot
(e.g., |
group_col |
Optional name of a grouping column in
|
time_col |
Optional name of a categorical time column in
|
show_centroids |
Logical; if |
centroid_by |
How to define centroids. One of
|
connect_centroids |
How to connect centroids with lines. One of
|
show_ellipses |
Logical; if |
ellipse_by |
Character vector describing which ellipses to draw.
Any combination of |
ellipse_type |
Ellipse type passed to |
ellipse_level |
Numeric confidence level for ellipses (default |
point_size |
Numeric size of sample points. Default |
point_alpha |
Numeric opacity of sample points in |
centroid_size |
Numeric size for centroid points. Default |
Axis labels are annotated with the percentage of constrained variance
explained by each CAP axis, computed as
, where are the
constrained eigenvalues from cap_res$eigenvalues. Only positive parts
of eigenvalues are used when computing percentages.
Styled with theme_phip().
Typically you will want to join sample-level metadata into
cap_res$sample_coords before plotting, e.g.:
cap_res$sample_coords <- dplyr::left_join( cap_res$sample_coords, meta, # data frame with sample_id, group, time, etc. by = "sample_id" )
A ggplot2::ggplot object representing the CAP ordination.
## Not run: # cap_res <- compute_capscale(dist_bc, ps = ps, formula = ~ type_person + age) cap_res$sample_coords <- dplyr::left_join( cap_res$sample_coords, meta, # contains type_person, timepoint, etc. by = "sample_id" ) plot_cap( cap_res, axes = c(1, 2), group_col = "type_person", time_col = "timepoint", show_centroids = TRUE, centroid_by = "group_time", connect_centroids = "group", show_ellipses = TRUE, ellipse_by = c("group", "group_time") ) ## End(Not run)## Not run: # cap_res <- compute_capscale(dist_bc, ps = ps, formula = ~ type_person + age) cap_res$sample_coords <- dplyr::left_join( cap_res$sample_coords, meta, # contains type_person, timepoint, etc. by = "sample_id" ) plot_cap( cap_res, axes = c(1, 2), group_col = "type_person", time_col = "timepoint", show_centroids = TRUE, centroid_by = "group_time", connect_centroids = "group", show_ellipses = TRUE, ellipse_by = c("group", "group_time") ) ## End(Not run)
Plot distances to group/time centroids from a beta_dispersion object
(output of compute_dispersion()) as violins, hollow boxplots and/or
jittered points, by level on the x-axis.
plot_dispersion( x, scope = "group", contrast = "<global>", show_violin = TRUE, show_box = TRUE, show_points = TRUE, point_size = 1.5, point_alpha = 0.6, width_violin = 0.9, width_box = 0.5, jitter_width = 0.1, jitter_height = 0, add_pvalue = TRUE )plot_dispersion( x, scope = "group", contrast = "<global>", show_violin = TRUE, show_box = TRUE, show_points = TRUE, point_size = 1.5, point_alpha = 0.6, width_violin = 0.9, width_box = 0.5, jitter_width = 0.1, jitter_height = 0, add_pvalue = TRUE )
x |
A |
scope |
Character scalar indicating which dispersion scope to plot.
Typically one of |
contrast |
Character scalar indicating which contrast within
|
show_violin |
Logical; if |
show_box |
Logical; if |
show_points |
Logical; if |
point_size |
Numeric point size for jitter layer. Default 1.5. |
point_alpha |
Numeric alpha for jitter layer. Default 0.6. |
width_violin |
Width of violin layer. Default 0.9. |
width_box |
Width of boxplot layer. Default 0.5. |
jitter_width, jitter_height
|
Jitter width/height for points.
Defaults: |
add_pvalue |
Logical; if |
A ggplot object.
## Not run: disp_res <- compute_dispersion(dist_bc, ps, group_col = "group_char") # Global group dispersion: plot_dispersion(disp_res, scope = "group", contrast = "<global>") # Pairwise contrast, with only boxplots + points: plot_dispersion( disp_res, scope = "group", contrast = "dementia vs control", show_violin = FALSE, show_box = TRUE, show_points = TRUE ) ## End(Not run)## Not run: disp_res <- compute_dispersion(dist_bc, ps, group_col = "group_char") # Global group dispersion: plot_dispersion(disp_res, scope = "group", contrast = "<global>") # Pairwise contrast, with only boxplots + points: plot_dispersion( disp_res, scope = "group", contrast = "dementia vs control", show_violin = FALSE, show_box = TRUE, show_points = TRUE ) ## End(Not run)
Visualizes per-sample peptide enrichment counts across groups in a
<phip_data> object, with optional interaction of multiple grouping
variables. Presence is defined by exist > 0. The plot(s) are produced by
an internal helper .plot_enrichment_counts_one().
plot_enrichment_counts( phip_data, group_cols = NULL, prevalence_threshold = 0.05, custom_colors = NULL, binwidth = 1, group_interaction = FALSE, interaction_only = FALSE, interaction_sep = " * ", annotation_size = 4, ... )plot_enrichment_counts( phip_data, group_cols = NULL, prevalence_threshold = 0.05, custom_colors = NULL, binwidth = 1, group_interaction = FALSE, interaction_only = FALSE, interaction_sep = " * ", annotation_size = 4, ... )
phip_data |
A |
group_cols |
Character vector of grouping columns in |
prevalence_threshold |
Numeric in |
custom_colors |
Optional named vector for group colors passed through to
|
binwidth |
Numeric bin width for histograms (default |
group_interaction |
Logical; also compute a plot for the interaction of
all |
interaction_only |
Logical; if |
interaction_sep |
Character separator for interaction labels (default |
annotation_size |
Numeric; size of the in-plot threshold annotations
(passed to |
... |
Reserved for future extensions; ignored. |
If group_cols = NULL, a single plot is returned for all samples.
If group_cols is a character vector, a list of plots is returned (one per
grouping column) unless interaction_only = TRUE.
If group_interaction = TRUE and at least two group_cols are supplied,
an additional interaction plot is created whose label joins groups using
interaction_sep.
If interaction_only = TRUE, only the interaction plot is returned (this
requires group_interaction = TRUE and at least two group_cols).
A single plot object (when group_cols = NULL, or when only one plot is
produced), or
A named list of plot objects (when multiple plots are produced).
# per-group plots pd <- load_example_data() p <- plot_enrichment_counts(pd, group_cols = c("group","timepoint")) # add interaction plot p2 <- plot_enrichment_counts(pd, group_cols = c("group","timepoint"), group_interaction = TRUE ) # interaction only p3 <- plot_enrichment_counts(pd, group_cols = c("group","timepoint"), group_interaction = TRUE, interaction_only = TRUE )# per-group plots pd <- load_example_data() p <- plot_enrichment_counts(pd, group_cols = c("group","timepoint")) # add interaction plot p2 <- plot_enrichment_counts(pd, group_cols = c("group","timepoint"), group_interaction = TRUE ) # interaction only p3 <- plot_enrichment_counts(pd, group_cols = c("group","timepoint"), group_interaction = TRUE, interaction_only = TRUE )
Creates a 2D PCoA scatterplot from a "beta_pcoa" object (the output of
compute_pcoa()), with optional grouping by a categorical variable,
optional time factor, centroids, centroid trajectories and ellipses.
plot_pcoa( pcoa_res, axes = c(1, 2), group_col = NULL, time_col = NULL, show_centroids = TRUE, centroid_by = c("auto", "group", "time", "group_time"), connect_centroids = c("none", "group", "time"), show_ellipses = TRUE, ellipse_by = c("group"), ellipse_type = c("t", "norm", "euclid"), ellipse_level = 0.95, point_size = 1.5, point_alpha = 0.6, centroid_size = 3 )plot_pcoa( pcoa_res, axes = c(1, 2), group_col = NULL, time_col = NULL, show_centroids = TRUE, centroid_by = c("auto", "group", "time", "group_time"), connect_centroids = c("none", "group", "time"), show_ellipses = TRUE, ellipse_by = c("group"), ellipse_type = c("t", "norm", "euclid"), ellipse_level = 0.95, point_size = 1.5, point_alpha = 0.6, centroid_size = 3 )
pcoa_res |
A |
axes |
Integer vector of length 2 giving the PCoA axes to plot
(e.g., |
group_col |
Optional name of the grouping column in
|
time_col |
Optional name of a categorical time column in
|
show_centroids |
Logical; if |
centroid_by |
Granularity of centroids. One of:
|
connect_centroids |
Character; if centroids are shown and
For other |
show_ellipses |
Logical; if |
ellipse_by |
Character vector specifying which factor(s) to use for
ellipses. Possible values are |
ellipse_type |
Type of ellipse passed to
|
ellipse_level |
Numeric confidence level for ellipses (default
|
point_size |
Numeric size of points. Default |
point_alpha |
Numeric alpha (opacity) of points in |
centroid_size |
Numeric size of centroid points. Default |
The function expects that any grouping / time variables you wish to use
(e.g., "type_person", "age_group", "timepoint") have already been
merged into pcoa_res$sample_coords (typically by joining with a metadata
table via sample_id).
Axis labels are automatically annotated with the percentage of variance
explained, using the var_explained component of pcoa_res if available.
Styled with theme_phip().
A ggplot2::ggplot object representing the PCoA scatter plot.
## Not run: # pcoa_res <- compute_pcoa(dist_bc) # Suppose you have metadata with columns sample_id, type_person, timepoint: meta_sc <- dplyr::left_join( pcoa_res$sample_coords, meta, by = "sample_id" ) pcoa_res$sample_coords <- meta_sc # Basic PCoA coloured by type_person with group-level ellipses: plot_pcoa( pcoa_res, axes = c(1, 2), group_col = "type_person", time_col = "timepoint", ellipse_by = "group", show_centroids = TRUE, centroid_by = "group_time", connect_centroids = "group" ) ## End(Not run)## Not run: # pcoa_res <- compute_pcoa(dist_bc) # Suppose you have metadata with columns sample_id, type_person, timepoint: meta_sc <- dplyr::left_join( pcoa_res$sample_coords, meta, by = "sample_id" ) pcoa_res$sample_coords <- meta_sc # Basic PCoA coloured by type_person with group-level ellipses: plot_pcoa( pcoa_res, axes = c(1, 2), group_col = "type_person", time_col = "timepoint", ellipse_by = "group", show_centroids = TRUE, centroid_by = "group_time", connect_centroids = "group" ) ## End(Not run)
Creates a scree plot from a "beta_pcoa" object (output of
compute_pcoa()), showing the percentage of variance explained
for the first n_axes axes.
plot_scree(pcoa_res, n_axes = NULL, type = c("bar", "line"))plot_scree(pcoa_res, n_axes = NULL, type = c("bar", "line"))
pcoa_res |
A |
n_axes |
Integer giving the number of axes to display. If |
type |
Type of scree plot to draw: |
Percentages are computed from the positive part of the eigenvalues,
i.e. pmax(eigenvalues, 0), to handle possible small negative
eigenvalues from non-perfectly Euclidean distance matrices.
Styled with theme_phip().
A ggplot2::ggplot object representing the scree plot.
## Not run: # pcoa_res <- compute_pcoa(dist_bc) plot_scree(pcoa_res) # More axes as line plot: plot_scree(pcoa_res, n_axes = 15, type = "line") ## End(Not run)## Not run: # pcoa_res <- compute_pcoa(dist_bc) plot_scree(pcoa_res) # More axes as line plot: plot_scree(pcoa_res, n_axes = 15, type = "line") ## End(Not run)
Plot t-SNE embeddings computed by compute_tsne(). Supports both
2D (ggplot2) and 3D (plotly) views.
plot_tsne( tsne_res, view = c("2d", "3d"), colour = NULL, size = 1.5, alpha = 0.8, palette = NULL, ... )plot_tsne( tsne_res, view = c("2d", "3d"), colour = NULL, size = 1.5, alpha = 0.8, palette = NULL, ... )
tsne_res |
A |
view |
Character, either |
colour |
Optional name of a column in |
size |
Numeric point size for scatter plots. Defaults to |
alpha |
Numeric transparency for points (0-1). Defaults to |
palette |
Optional vector of colour values passed to
|
... |
Currently ignored; reserved for future extensions. |
For view = "2d", a ggplot object.
For view = "3d", a plotly object (htmlwidget).
## Not run: tsne_res <- compute_tsne(ps, compute_distance(ps)) # 2D plot p2d <- plot_tsne(tsne_res, view = "2d", colour = "type_person") # 3D interactive plot p3d <- plot_tsne(tsne_res, view = "3d", colour = "type_person") ## End(Not run)## Not run: tsne_res <- compute_tsne(ps, compute_distance(ps)) # 2D plot p2d <- plot_tsne(tsne_res, view = "2d", colour = "type_person") # 3D interactive plot p3d <- plot_tsne(tsne_res, view = "3d", colour = "type_person") ## End(Not run)
A thin wrapper around ggplot2::scale_colour_manual() that
applies phip_palette. Set as the session default on library(phiper).
scale_colour_phip(...) scale_color_phip(...)scale_colour_phip(...) scale_color_phip(...)
... |
Arguments passed on to
|
A ggplot2 scale.
Other phip-ggplot:
scale_fill_phip(),
theme_phip()
ggplot2::ggplot(iris, ggplot2::aes(Sepal.Length, Sepal.Width, colour = Species)) + ggplot2::geom_point() + scale_colour_phip()ggplot2::ggplot(iris, ggplot2::aes(Sepal.Length, Sepal.Width, colour = Species)) + ggplot2::geom_point() + scale_colour_phip()
A thin wrapper around ggplot2::scale_fill_manual() that
applies phip_palette. Set as the session default on library(phiper).
scale_fill_phip(...)scale_fill_phip(...)
... |
Arguments passed on to
|
A ggplot2 scale.
Other phip-ggplot:
scale_colour_phip(),
theme_phip()
ggplot2::ggplot(iris, ggplot2::aes(Species, Sepal.Length, fill = Species)) + ggplot2::geom_boxplot() + scale_fill_phip()ggplot2::ggplot(iris, ggplot2::aes(Species, Sepal.Length, fill = Species)) + ggplot2::geom_boxplot() + scale_fill_phip()
Creates an interactive scatter (plotly) comparing prevalence in group a
vs group b for per-feature results produced by compute_pop().
Accepts a plain data.frame with columns percent1, percent2, feature,
group1, group2, p_raw, etc.
When pair is given, subsetting uses .ph_filter_pairs().
Color mapping:
Default (color_by = NULL): BH-corrected p-values computed per-plot from
p_raw ("significant (BH)", "nominal only", "not significant").
color_by as a named vector highlights points matching the specified
peptide-library values:
color_by = c("is_flagellum" = TRUE)
color_by = c("species" = "Staphylococcus aureus")
color_by = c("is_flagellum" = TRUE, "species" = "Staphylococcus aureus")
scatter_interactive( df, pair = NULL, rank = NULL, xlab = NULL, ylab = NULL, alpha = 0.05, color_by = NULL, color_title = NULL, peplib = NULL, background_df = NULL, ... )scatter_interactive( df, pair = NULL, rank = NULL, xlab = NULL, ylab = NULL, alpha = 0.05, color_by = NULL, color_title = NULL, peplib = NULL, background_df = NULL, ... )
df |
A |
pair |
optional length-2 character, e.g. |
rank |
optional single rank (character) to keep. |
xlab, ylab
|
axis labels; defaults to |
alpha |
numeric in (0,1]; significance threshold; not the plotly alpha. |
color_by |
optional named vector identifying peptide-library values to
highlight, e.g. |
color_title |
optional legend title when |
peplib |
Optional peptide metadata table used to resolve |
background_df |
Optional data frame of background points. |
... |
graphical parameters: |
a plotly object.
## Not run: p <- scatter_interactive(scatters, pair = c("kid_serum::T2", "kid_serum::T8"), rank = "peptide_id", color_by = c("is_flagellum" = TRUE), color_title = "Flagellum" ) p ## End(Not run)## Not run: p <- scatter_interactive(scatters, pair = c("kid_serum::T2", "kid_serum::T8"), rank = "peptide_id", color_by = c("is_flagellum" = TRUE), color_title = "Flagellum" ) p ## End(Not run)
compute_pop()
A ggplot2 scatterplot comparing prevalence in group a vs group b.
Default coloring uses BH-corrected p-values computed per-plot from p_raw:
"significant (BH)", "nominal only", "not significant".
If only one category is present the plot falls back to p-value bins.
When color_by is supplied as a named vector, peptide metadata is joined and
points matching the specified values are highlighted. Multiple groups may be
given simultaneously:
color_by = c("is_flagellum" = TRUE)
color_by = c("species" = "Staphylococcus aureus")
color_by = c("is_flagellum" = TRUE, "species" = "Staphylococcus aureus")
scatter_static( df, pair = NULL, rank = NULL, xlab = NULL, ylab = NULL, alpha = 0.05, color_by = NULL, color_title = NULL, ... )scatter_static( df, pair = NULL, rank = NULL, xlab = NULL, ylab = NULL, alpha = 0.05, color_by = NULL, color_title = NULL, ... )
df |
A data frame with prevalence results. |
pair |
optional group pair (character length-2). |
rank |
optional single rank (character) to keep. |
xlab, ylab
|
axis labels; defaults to |
alpha |
numeric in (0,1]; significance threshold for category labels. |
color_by |
optional named vector identifying peptide-library values to
highlight, e.g. |
color_title |
optional legend title when |
... |
graphical parameters: |
A ggplot object.
set.seed(1) prev <- data.frame( rank = "peptide_id", feature = paste0("pep", 1:30), group1 = "A", group2 = "B", prop1 = runif(30), prop2 = runif(30), percent1 = runif(30, 0, 100), percent2 = runif(30, 0, 100), ratio = runif(30, 0.1, 10), p_raw = runif(30), n_peptides = 1L ) # basic plot scatter_static(prev) # filter to a specific pair and set axis labels scatter_static(prev, pair = c("A", "B"), xlab = "Group A (%)", ylab = "Group B (%)", alpha = 0.05 )set.seed(1) prev <- data.frame( rank = "peptide_id", feature = paste0("pep", 1:30), group1 = "A", group2 = "B", prop1 = runif(30), prop2 = runif(30), percent1 = runif(30, 0, 100), percent2 = runif(30, 0, 100), ratio = runif(30, 0.1, 10), p_raw = runif(30), n_peptides = 1L ) # basic plot scatter_static(prev) # filter to a specific pair and set axis labels scatter_static(prev, pair = c("A", "B"), xlab = "Group A (%)", ylab = "Group B (%)", alpha = 0.05 )
theme_phip
A clean, publication-ready ggplot2 theme tuned for facetted plots with the Montserrat font. The font is registered and showtext rendering is enabled automatically when the package loads — no setup required.
theme_phip(base_size = 14, base_family = "Montserrat")theme_phip(base_size = 14, base_family = "Montserrat")
base_size |
Base font size. |
base_family |
Base font family (default |
A ggplot2 theme object.
Other phip-ggplot:
scale_colour_phip(),
scale_fill_phip()
## Not run: ggplot2::ggplot(iris, ggplot2::aes(Sepal.Length, Sepal.Width, colour = Species)) + ggplot2::geom_point() + scale_colour_phip() + theme_phip() ## End(Not run)## Not run: ggplot2::ggplot(iris, ggplot2::aes(Sepal.Length, Sepal.Width, colour = Species)) + ggplot2::geom_point() + scale_colour_phip() + theme_phip() ## End(Not run)
Interactive volcano plot (log2 ratio vs -log10 p)
volcano_interactive( df, pair = NULL, rank = NULL, color_by = NULL, color_title = NULL, fc_cut = 1, p_cut = 0.05, p_mode = c("raw", "bh"), significant_colors = c(`not significant` = "#386cb0", `significant prior correction` = "#1b9e77", `significant post fdr correction` = "#e31a1c") )volcano_interactive( df, pair = NULL, rank = NULL, color_by = NULL, color_title = NULL, fc_cut = 1, p_cut = 0.05, p_mode = c("raw", "bh"), significant_colors = c(`not significant` = "#386cb0", `significant prior correction` = "#1b9e77", `significant post fdr correction` = "#e31a1c") )
df |
A data frame with prevalence results. |
pair |
optional group pair (character length-2). |
rank |
optional single rank (character) to keep. |
color_by |
optional named vector identifying peptide-library values to
highlight, e.g. |
color_title |
optional legend title when |
fc_cut |
Numeric; absolute log2 fold-change cutoff. |
p_cut |
Numeric; p-value cutoff. |
p_mode |
One of |
significant_colors |
Named vector of colors for significance categories. |
A plotly htmlwidget.
## Not run: set.seed(3) prev <- data.frame( rank = "peptide_id", feature = paste0("pep", 1:40), group1 = "A", group2 = "B", prop1 = runif(40), prop2 = runif(40), percent1 = runif(40, 0, 100), percent2 = runif(40, 0, 100), ratio = runif(40, 0.1, 10), p_raw = c(runif(10, 0, 0.01), runif(30, 0.1, 1)), n_peptides = 1L ) # interactive volcano — hover to inspect each peptide volcano_interactive(prev) # BH correction volcano_interactive(prev, p_mode = "bh", fc_cut = 1.5, p_cut = 0.01) ## End(Not run)## Not run: set.seed(3) prev <- data.frame( rank = "peptide_id", feature = paste0("pep", 1:40), group1 = "A", group2 = "B", prop1 = runif(40), prop2 = runif(40), percent1 = runif(40, 0, 100), percent2 = runif(40, 0, 100), ratio = runif(40, 0.1, 10), p_raw = c(runif(10, 0, 0.01), runif(30, 0.1, 1)), n_peptides = 1L ) # interactive volcano — hover to inspect each peptide volcano_interactive(prev) # BH correction volcano_interactive(prev, p_mode = "bh", fc_cut = 1.5, p_cut = 0.01) ## End(Not run)
Static volcano plot (log2 ratio vs -log10 p)
volcano_static( df, pair = NULL, rank = NULL, color_by = NULL, color_title = NULL, fc_cut = 1, p_cut = 0.05, p_mode = c("raw", "bh"), significant_colors = c(`not significant` = "#386cb0", `significant prior correction` = "#1b9e77", `significant post fdr correction` = "#e31a1c") )volcano_static( df, pair = NULL, rank = NULL, color_by = NULL, color_title = NULL, fc_cut = 1, p_cut = 0.05, p_mode = c("raw", "bh"), significant_colors = c(`not significant` = "#386cb0", `significant prior correction` = "#1b9e77", `significant post fdr correction` = "#e31a1c") )
df |
A data frame with prevalence results. |
pair |
optional group pair (character length-2). |
rank |
optional single rank (character) to keep. |
color_by |
optional named vector identifying peptide-library values to
highlight, e.g. |
color_title |
optional legend title when |
fc_cut |
Numeric; absolute log2 fold-change cutoff. |
p_cut |
Numeric; p-value cutoff. |
p_mode |
One of |
significant_colors |
Named vector of colors for significance categories. |
A ggplot object.
set.seed(2) prev <- data.frame( rank = "peptide_id", feature = paste0("pep", 1:40), group1 = "A", group2 = "B", prop1 = runif(40), prop2 = runif(40), percent1 = runif(40, 0, 100), percent2 = runif(40, 0, 100), ratio = runif(40, 0.1, 10), p_raw = c(runif(10, 0, 0.01), runif(30, 0.1, 1)), n_peptides = 1L ) # basic volcano volcano_static(prev) # BH correction, custom cutoffs volcano_static(prev, fc_cut = 1.5, p_cut = 0.01, p_mode = "bh") # filter to one pair volcano_static(prev, pair = c("A", "B"), rank = "peptide_id")set.seed(2) prev <- data.frame( rank = "peptide_id", feature = paste0("pep", 1:40), group1 = "A", group2 = "B", prop1 = runif(40), prop2 = runif(40), percent1 = runif(40, 0, 100), percent2 = runif(40, 0, 100), ratio = runif(40, 0.1, 10), p_raw = c(runif(10, 0, 0.01), runif(30, 0.1, 1)), n_peptides = 1L ) # basic volcano volcano_static(prev) # BH correction, custom cutoffs volcano_static(prev, fc_cut = 1.5, p_cut = 0.01, p_mode = "bh") # filter to one pair volcano_static(prev, pair = c("A", "B"), rank = "peptide_id")