Research Radar — 2026-06-24
Methods & AI
Computational
HTS-Oracle X: AI-Guided Prospective Discovery of Small Molecule Immune Checkpoint Binders
bioRxiv Published 2026-06-22 preprint DOI: 10.64898/2026.06.17.732853
deep learning drug discovery immune checkpoint small molecule protein-protein interaction multimodal AI high-throughput screening
Summary: Presents HTS-Oracle X, a multimodal deep learning platform that integrates bidirectional cross-attention fusion of ChemBERTa SMILES embeddings with extended RDKit descriptors for prospective discovery of small molecule immune checkpoint binders. Targeting immune checkpoint protein-protein interactions with small molecules has been limited by the shallow, featureless binding surfaces of co-stimulatory and co-inhibitory receptors and the low hit rates of conventional high-throughput screening. HTS-Oracle X trains on continuous biophysical binding signals rather than binary labels, employs Monte Carlo Dropout uncertainty quantification for uncertainty-adjusted compound selection, and was trained on 45,760 Dianthus TRIC-screened compounds per target under scaffold-aware cross-validation. The platform was applied prospectively and successfully discovered novel small molecule binders targeting immune checkpoint interfaces, demonstrating that AI-guided screening can overcome the limitations of conventional approaches against these challenging targets.
Why it matters: Small molecule immunomodulators that target immune checkpoints could overcome key limitations of antibody-based checkpoint blockade — including poor tumour penetration, lack of oral bioavailability, and high manufacturing costs — but discovering such molecules has proven exceptionally difficult because checkpoint receptor interfaces are large, flat, and featureless. HTS-Oracle X demonstrates that multimodal deep learning combining chemical language models with structural descriptors, trained on continuous binding signals with proper uncertainty quantification, can prospectively discover active compounds against these challenging targets. If this approach generalises to other immune checkpoint targets, it could open a new class of orally available, tumour-penetrating immunomodulators for cancer immunotherapy. The uncertainty quantification component is also critical: it enables compound selection decisions that account for model confidence, reducing the number of false leads that waste downstream screening resources.
Why for Yiru: Small molecule immune checkpoint inhibitors represent a frontier in TME pharmacology. Unlike antibodies, small molecules can penetrate deep into tumour tissue, reach immune synapses within the TME, and potentially be designed with short half-lives for controlled immunomodulation. HTS-Oracle X's multimodal architecture — fusing SMILES embeddings with molecular descriptors — is a design pattern applicable to any TME drug discovery problem where binding surfaces are challenging. From a computational perspective, the uncertainty quantification framework is particularly relevant: TME drug discovery often suffers from high false-positive rates in phenotypic screens, and Monte Carlo Dropout-based uncertainty selection could improve hit-to-lead success rates across TME target classes.
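Sketch: the uncertainty-adjusted selection described above can be illustrated in a few lines. This assumes each compound already has several predicted binding signals from repeated stochastic forward passes, as Monte Carlo Dropout would produce. The scoring rule (mean minus a penalty times the standard deviation), the penalty `kappa`, and the compound names are illustrative assumptions, not the preprint's actual selection criterion.

```python
from statistics import mean, stdev

def uncertainty_adjusted_ranking(mc_predictions, kappa=1.0):
    """Rank compounds by mean prediction minus kappa * std across MC passes."""
    scored = []
    for compound, preds in mc_predictions.items():
        mu = mean(preds)
        sigma = stdev(preds)  # spread across stochastic passes = uncertainty proxy
        scored.append((compound, mu, sigma, mu - kappa * sigma))
    return sorted(scored, key=lambda row: row[3], reverse=True)

# Hypothetical predictions from 5 stochastic forward passes per compound
mc_predictions = {
    "cmpd_A": [0.80, 0.82, 0.79, 0.81, 0.78],  # high mean, low spread
    "cmpd_B": [0.95, 0.40, 1.10, 0.55, 1.20],  # higher mean, high spread
    "cmpd_C": [0.30, 0.31, 0.29, 0.30, 0.30],  # low mean, low spread
}
ranking = uncertainty_adjusted_ranking(mc_predictions, kappa=1.0)
for compound, mu, sigma, score in ranking:
    print(f"{compound}: mean={mu:.3f} std={sigma:.3f} score={score:.3f}")
```

In this toy case cmpd_B has the highest mean prediction but drops below cmpd_A once its spread is penalised, which is the behaviour an uncertainty-aware selection is meant to produce.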
Drug-Prot: A query system for statistical inference of drug effects and interactions in dynamic proteomic networks
bioRxiv Published 2026-06-22 preprint DOI: 10.64898/2026.06.17.732914
proteomics drug effects drug-drug interaction statistical inference perturbation biology breast cancer combination therapy
Summary: Presents Drug-Prot, a computational framework that leverages large-scale perturbation proteomics to quantify causal drug effects, drug-drug interactions, and dynamic protein relationships. Understanding drug effects and interactions is essential for developing combination therapies, yet most computational approaches rely on transcriptomic or phenotypic data that miss the proteomic level of drug action. Using data from 63 single drugs and 59 drug combinations applied to 18 breast cancer cell lines at three time points (6, 24, and 48 hours), Drug-Prot estimates drug effects on protein expression and reconstructs directed temporal protein dependency networks. The publicly available software supports targeted analyses of user-defined protein sets, which substantially reduces the multiple-testing burden, and an interactive web application reports corrected p-values for single-drug and combination effects alongside directed temporal network visualisation.
Why it matters: Most drug effect studies measure transcriptomic changes, but proteins are the functional executors of drug action, and transcript-protein correlations are often poor — especially for drugs that affect protein stability, trafficking, or degradation. Drug-Prot fills this gap by providing a rigorous statistical framework for proteomic drug profiling across time, cell lines, and drug combinations. The ability to query user-defined protein sets with proper multiple-testing correction makes the resource practical for hypothesis-driven research. The temporal dimension (three timepoints) is particularly valuable: it enables reconstruction of directed protein networks that distinguish direct targets from downstream effects, providing mechanistic insight beyond static profiling.
Why for Yiru: Proteomic profiling of drug effects is underexplored in TME research compared to transcriptomics, yet many TME-relevant drugs — kinase inhibitors, epigenetic modulators, protein degraders — act primarily at the protein level. Drug-Prot's framework could be applied to TME-relevant drug combinations: for example, profiling how checkpoint blockade combined with targeted therapy alters the TME proteome over time. The directed temporal protein network reconstruction is directly applicable to understanding signalling rewiring in the TME upon drug treatment. From a methodological perspective, Drug-Prot's statistical framework for quantifying combination effects (synergy, additivity, antagonism) at the proteomic level provides a template for analysing perturbation proteomics data from TME model systems.
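Sketch: why a user-defined protein set reduces the multiple-testing burden. The example below uses the Benjamini-Hochberg procedure as an assumed correction (the digest does not state which correction Drug-Prot applies), and all p-values and protein choices are hypothetical.

```python
def benjamini_hochberg(pvalues):
    """Return BH-adjusted p-values in the original order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, pvalues[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted

# Hypothetical raw p-values for one drug's effect on three proteins of interest
raw = {"EGFR": 0.004, "AKT1": 0.012, "MTOR": 0.030}
# Pad with 997 unremarkable proteins to mimic a proteome-wide test
background = {f"prot_{i}": 0.5 for i in range(997)}

proteome_wide = {**raw, **background}
adj_all = dict(zip(proteome_wide, benjamini_hochberg(list(proteome_wide.values()))))
adj_set = dict(zip(raw, benjamini_hochberg(list(raw.values()))))

for protein in raw:
    print(f"{protein}: targeted={adj_set[protein]:.3f} proteome-wide={adj_all[protein]:.3f}")
```

With the same raw p-values, all three proteins pass a 0.05 threshold when tested as a targeted set of three, and none pass when corrected across 1,000 proteins.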
Multivariate Random Forests for Cross-Modal Multi-Omics Integration
bioRxiv Published 2026-06-22 preprint DOI: 10.64898/2026.06.17.732933
multi-omics integration random forests machine learning clustering cross-modal learning bioinformatics biomarker discovery
Summary: Introduces multiRF, a random-forest-based method for cross-modal multi-omics integration that handles complex data types and separates shared and modality-specific structure. Current multi-omics clustering methods typically merge all data types into a single representation, which can blur biology that is strong in one omics layer, or rely on linear structure that misses complex cross-modal relationships. multiRF learns sample similarities across omics layers from multivariate random forests, combines them across data types, and uses the resulting weights to estimate which part of each omics layer is predictive of the others, thereby decomposing each layer into shared and modality-specific components. The method outperforms existing approaches across diverse benchmarks, recovering known biology that is specific to individual omics layers while also extracting integrated signals.
Why it matters: Multi-omics data are increasingly generated in large cohort studies, but integrative analysis remains challenging because different omics layers capture different biological processes — some signals are shared, others are layer-specific. multiRF's explicit separation of shared and modality-specific structure is conceptually important: it recognises that not all biology needs to be integrated, and that layer-specific signals may be as informative as shared signals. The random forest foundation makes the method robust to non-linear relationships, mixed data types, and missing data, all of which are common in real-world multi-omics datasets. As multi-omics cohorts grow, methods that can flexibly integrate diverse data types while preserving interpretability become essential.
Why for Yiru: TME research increasingly generates multi-omics data — genomics, transcriptomics, proteomics, epigenomics — from the same tumour samples. multiRF's decomposition of shared versus modality-specific variation could reveal whether certain TME features (e.g., immune infiltration signatures) are consistently captured across omics layers or are specific to particular molecular measurements. For biomarker discovery, identifying features that are robustly shared across omics layers might yield more reproducible predictors of immunotherapy response than single-omics biomarkers. The method's ability to handle non-linear relationships is particularly relevant for TME data, where molecular measurements often exhibit complex, non-linear associations with clinical outcomes.
nanoASM: Long-Read Allele-Specific DNA Methylation Profiling Enables Functional Annotation of Regulatory Noncoding Variants in Human Prostate Tissues
bioRxiv Published 2026-06-22 preprint DOI: 10.64898/2026.06.17.732357
nanopore sequencing DNA methylation allele-specific methylation prostate cancer regulatory genomics long-read sequencing epigenomics
Summary: Performs whole-genome nanopore sequencing on normal and tumour prostate tissues to characterise differential methylation, methylation entropy, and allele-specific methylation (ASM) associated with noncoding genetic variants. Long-read nanopore sequencing enables simultaneous detection of germline variation and native DNA base modifications on individual DNA molecules, providing a unique opportunity to investigate allele-specific epigenetic regulation. Genome-wide analysis identified extensive cancer-associated differentially methylated regions, with hypermethylated DMRs significantly enriched near transcription start sites and regulatory regions. Integration with transcriptomic datasets revealed strong inverse relationships between promoter methylation and gene expression. The study further demonstrates that ASM at regulatory noncoding variants can help functionally annotate prostate cancer risk-associated variants that fall outside protein-coding regions.
Why it matters: Most GWAS-identified risk variants for common cancers fall in noncoding regions whose functional significance is difficult to determine. Allele-specific methylation provides a molecular readout of whether a noncoding variant affects local epigenetic regulation in cis, offering a path from statistical association to mechanistic understanding. The use of long-read nanopore sequencing is methodologically important: unlike short-read approaches, it can phase methylation and genetic variation on individual DNA molecules, directly revealing which allele of a heterozygous variant is methylated. This phased information is essential for distinguishing cis-acting regulatory variation from trans-acting effects.
Why for Yiru: Epigenetic regulation in the TME is increasingly recognised as a key determinant of immune cell function and tumour progression. nanoASM's approach to linking noncoding genetic variants with allele-specific methylation could be applied to TME-relevant regulatory variants — for example, identifying whether tumour-associated noncoding variants in immune checkpoint genes affect local methylation and thus expression. The phased methylation analysis is also relevant for understanding how tumour heterogeneity in methylation patterns contributes to immune evasion. Methodologically, the long-read approach could be extended to profile allele-specific methylation in TME immune cells, revealing how germline variation shapes the epigenetic landscape of anti-tumour immunity.
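Sketch: how phased reads turn into an allele-specific methylation call at a single heterozygous site. The read tuples are hypothetical, and the use of a two-sided Fisher exact test is an assumption for illustration; the preprint's actual ASM statistics are not described in this digest.

```python
from math import comb

def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher exact p-value for the 2x2 table [[a, b], [c, d]]."""
    row1, row2, col1, n = a + b, c + d, a + c, a + b + c + d

    def prob(x):  # hypergeometric probability of x in the top-left cell
        return comb(row1, x) * comb(row2, col1 - x) / comb(n, col1)

    observed = prob(a)
    lo, hi = max(0, col1 - row2), min(row1, col1)
    # Sum probabilities of all tables at least as extreme as the observed one
    return sum(p for p in (prob(x) for x in range(lo, hi + 1))
               if p <= observed * (1 + 1e-9))

# Hypothetical long reads: (allele at the heterozygous SNP, CpG methylated?)
reads = ([("ref", True)] * 9 + [("ref", False)] * 1
         + [("alt", True)] * 2 + [("alt", False)] * 8)

counts = {"ref": [0, 0], "alt": [0, 0]}  # [methylated, unmethylated]
for allele, methylated in reads:
    counts[allele][0 if methylated else 1] += 1

ref_frac = counts["ref"][0] / sum(counts["ref"])
alt_frac = counts["alt"][0] / sum(counts["alt"])
p_value = fisher_exact_two_sided(*counts["ref"], *counts["alt"])
print(f"ref={ref_frac:.2f} alt={alt_frac:.2f} p={p_value:.4f}")
```

The key point is that each read carries both the allele and the methylation state, so no statistical phasing step is needed before tallying.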
DLDN-Bench: A Benchmark Framework for Deep Learning De Novo Peptide Sequencing in Proteomics
bioRxiv Published 2026-06-22 preprint DOI: 10.64898/2026.06.10.728383
proteomics de novo peptide sequencing deep learning mass spectrometry benchmark bioinformatics
Summary: Introduces DLDN-Bench, a standardised benchmark framework for evaluating deep learning-based de novo peptide sequencing from mass spectrometry data. De novo peptide sequencing is essential for identifying novel peptides without relying on protein sequence databases, a capability critical for discovering tumour neoantigens, post-translational modifications, and peptides from non-model organisms. However, the rapid proliferation of deep learning models has led to heterogeneous evaluation practices and limited comparability across methods. DLDN-Bench provides benchmark datasets derived from human muscle biopsy mass spectrometry data, annotated through consensus across multiple database search engines, and systematically evaluates recent deep learning-based de novo sequencing tools alongside traditional approaches. The framework establishes consistent evaluation metrics and data splits to enable fair comparison and reproducible benchmarking.
Why it matters: De novo peptide sequencing identifies peptides without reference to a protein sequence database, making it essential for discovering tumour-specific neoantigens, peptides of microbial or viral origin, and post-translational modifications that alter peptide mass. As deep learning models for de novo sequencing proliferate, the lack of standardised benchmarking has made it difficult to assess real progress. DLDN-Bench addresses this by providing a reproducible evaluation framework with well-characterised ground truth data. The focus on human muscle tissue data provides a clinically relevant benchmark, and the multi-search-engine consensus annotation reduces the risk that errors from any single search engine contaminate the ground truth.
Why for Yiru: Personalised cancer vaccines and neoantigen discovery depend on accurate identification of tumour-specific peptides presented by MHC molecules. De novo sequencing from immunopeptidomics data (where peptides are eluted from MHC molecules and identified by mass spectrometry) is a critical computational step in this pipeline. DLDN-Bench provides a framework for comparing de novo sequencing tools before applying them to TME immunopeptidomics, although rankings obtained on muscle biopsy data may not transfer directly to MHC-eluted peptides. Benchmarking on consistently processed, well-annotated data is essential before applying these tools to clinically critical neoantigen discovery tasks.
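Sketch: the kind of metrics a de novo sequencing benchmark reports. This is a simplified version that compares residues position by position; published benchmarks typically match residues by mass within a tolerance, and the exact metrics used by DLDN-Bench are not specified in this digest. The peptide sequences are hypothetical. Isoleucine and leucine are treated as identical because they have the same mass.

```python
def normalise(peptide):
    """Treat isoleucine and leucine as identical, since they are isobaric."""
    return peptide.replace("I", "L")

def evaluate(predictions, references):
    """Return (peptide_recall, amino_acid_precision) for paired sequences."""
    exact = matched_aa = predicted_aa = 0
    for pred, ref in zip(predictions, references):
        pred, ref = normalise(pred), normalise(ref)
        exact += pred == ref
        predicted_aa += len(pred)
        # Simplified: count residues that agree at the same position
        matched_aa += sum(p == r for p, r in zip(pred, ref))
    return exact / len(references), matched_aa / predicted_aa

# Hypothetical predictions from a de novo tool vs consensus annotations
references = ["PEPTIDEK", "LGSAVR", "ELVISK"]
predictions = ["PEPTLDEK", "LGASVR", "ELVISK"]
peptide_recall, aa_precision = evaluate(predictions, references)
print(f"peptide recall={peptide_recall:.3f} amino-acid precision={aa_precision:.3f}")
```

Fixing the normalisation and matching rules in one shared function is what makes scores from different tools comparable.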
Biomedical discoveries
Biomedicine
Human CD8-iTreg are potent GVHD suppressors and tumoricidal effectors by release of Granzyme-K+ Supramolecular Attack Particles
bioRxiv Published 2026-06-23 preprint DOI: 10.64898/2026.06.18.731665
regulatory T cells CD8 T cells GVHD cell therapy immunotherapy tumour immunity supramolecular attack particles granzyme K
Summary: Characterises induced human CD8+ regulatory T cells (CD8-iTreg) generated from peripheral blood CD8+CD25neg T cells using anti-CD3e mAb-loaded artificial antigen presenting cells, IL-2, TGFbeta, and Rapamycin. These CD8-iTreg differentiate into a stable, highly proliferative bifunctional population with suppressive activity comparable to CD4-iTreg while retaining cytolytic capacity similar to conventional CD8+ cytotoxic T lymphocytes. Multi-parameter spectral flow cytometry and single-cell RNA-seq reveal a distinct immunoregulatory signature with tissue-residency marker CD103 and increased canonical Treg markers. Notably, CD8-iTreg suppress graft-versus-host disease in vivo with efficacy matching CD4-iTreg, yet simultaneously retain potent tumouricidal activity through release of Granzyme-K+ supramolecular attack particles (SMAPs), large membranolytic particles that kill target cells upon contact. This bifunctional capacity distinguishes CD8-iTreg from conventional CD4-iTreg, which suppress immune responses broadly, including anti-tumour immunity.
Why it matters: The simultaneous suppression of pathogenic immune responses and preservation of protective anti-tumour immunity has been a central challenge in cellular immunotherapy. Conventional CD4+ Tregs can suppress GVHD and autoimmunity but also dampen anti-tumour responses, a fundamental limitation. CD8-iTreg overcome this by retaining intrinsic tumouricidal activity through a distinct Granzyme-K+ SMAP mechanism while maintaining regulatory function. The finding that CD8-iTreg kill tumour cells via supramolecular attack particles — large, membranolytic complexes distinct from the granzyme B-perforin pathway used by conventional CTLs — opens new avenues for engineering T cells with selective cytotoxic specificity. The demonstration of in vivo GVHD suppression with preserved anti-lymphoma activity provides proof-of-concept for bifunctional Treg therapy.
Why for Yiru: The bifunctional CD8-iTreg concept is directly relevant to TME immunotherapy. Tumour microenvironments are characterised by both immunosuppressive barriers (which Treg suppressive function could help control) and the need for effective anti-tumour cytotoxicity. CD8-iTreg that simultaneously suppress pathologic inflammation and kill tumour cells could address a key limitation of current Treg-based therapies — the risk of dampening beneficial anti-tumour immunity. The Granzyme-K+ SMAP mechanism is particularly interesting: SMAPs are nanostructured membranolytic complexes whose composition and regulation differ from classical cytotoxic granules, potentially enabling selective targeting. Understanding whether TME-resident CD8+ Treg-like cells exist and whether their SMAP mechanisms can be therapeutically modulated is a new line of TME investigation.
Elastin-derived peptides suppress CCL20 expression and block ILC2 recruitment during lung inflammation
bioRxiv Published 2026-06-23 preprint DOI: 10.64898/2026.06.18.733133
elastin ILC2 lung inflammation CCL20 COPD asthma extracellular matrix innate lymphoid cells
Summary: Investigates how elastin degradation during chronic lung inflammation generates elastin peptides (EPs) with immunomodulatory properties that suppress group 2 innate lymphoid cell (ILC2) responses. In mouse models of EP-induced emphysema and house dust mite (HDM)-induced asthma, EP instillation reduced lung ILC2 numbers without affecting Th2 cells. In COPD patients, decreased CCL20 expression was observed in lung immune cells, with an inverse correlation between serum CCL20 levels and clinical indicators of elevated EP burden. EP instillation during HDM-induced lung inflammation also blocked CCL20-dependent ILC2 recruitment. The findings reveal that extracellular matrix breakdown products, long viewed as passive damage-associated molecules, can actively shape the tissue immune environment by regulating ILC2 trafficking through modulation of the CCL20-CCR6 axis.
Why it matters: Extracellular matrix remodelling is a hallmark of chronic inflammatory diseases, but its immunomodulatory consequences are often overlooked. This study demonstrates that elastin peptides — generated by matrix breakdown in COPD and asthma — actively suppress ILC2 recruitment by downregulating CCL20, providing a mechanistic link between matrix degradation and type 2 immune regulation. The finding challenges the simple view that matrix fragments are merely pro-inflammatory damage signals, showing instead that they can have context-dependent immunomodulatory effects. Understanding how ECM-derived peptides shape tissue immune environments could reveal new therapeutic targets for inflammatory diseases where matrix remodelling is prominent.
Why for Yiru: ECM remodelling is a defining feature of the tumour microenvironment, where matrix degradation, desmoplasia, and stiffness changes profoundly affect immune cell trafficking and function. The finding that ECM-derived peptides regulate immune cell recruitment through specific chemokine axes (CCL20-CCR6) raises the question of whether similar mechanisms operate in the TME. Tumour-associated ECM breakdown may generate peptide fragments that modulate ILC and T cell recruitment to tumours. The CCL20-CCR6 axis is particularly relevant: it mediates recruitment of Th17 cells and dendritic cells to tumours and has been implicated in both pro- and anti-tumour immunity depending on context. More broadly, this study exemplifies how matrix-immune crosstalk can be mediated through specific, identifiable molecular pathways rather than generic damage signals.
A microbial metabolite reduces alcohol-induced inflammation via dual modulation of NF-κB and Interferon pathway
bioRxiv Published 2026-06-23 preprint DOI: 10.64898/2026.06.18.733199
microbiome metabolite alcohol-associated hepatitis NF-κB interferon inflammation 4-hydroxyphenylacetic acid
Summary: Identifies 4-hydroxyphenylacetic acid (4-HPAA), a gut microbiome-derived metabolite, as a unique immunomodulator that selectively enhances interferon signalling while limiting NF-κB-mediated inflammation, thereby restoring immune balance in alcohol-associated hepatitis. Alcohol-associated hepatitis is characterised by excessive NF-κB-driven inflammation and blunted antiviral interferon responses, a dual defect that current therapies do not address. Screening a library of 152 gut microbiome-derived metabolites in human monocytic THP1-Dual cells — which secrete reporters for both NF-κB and interferon signalling — revealed 4-HPAA as a dual modulator. The metabolite suppressed NF-κB activation while simultaneously enhancing interferon pathway activity, representing a therapeutic strategy that targets both arms of the dysregulated immune response.
Why it matters: Alcohol-associated hepatitis has limited treatment options and high mortality, partly because therapeutic strategies that suppress inflammation often worsen the interferon deficiency that leaves patients vulnerable to infection. The identification of a single microbiome-derived metabolite that simultaneously addresses both defects — suppressing excessive NF-κB while boosting interferon — is therapeutically attractive. The screening strategy itself is noteworthy: using dual-reporter cell lines to specifically search for compounds with opposing effects on two signalling pathways, rather than simply looking for anti-inflammatory activity, addresses the specific immunological defect in the disease. This pathway-selective screening approach could be applied to other diseases where inflammatory and anti-viral signalling are imbalanced.
Why for Yiru: The NF-κB-interferon signalling balance is equally critical in the TME. Tumour-associated inflammation driven by NF-κB can promote tumour progression, while Type I interferon signalling is essential for anti-tumour immunity and is often suppressed in tumours. The concept of identifying metabolites or small molecules that rebalance these two pathways has direct TME therapeutic implications: a compound that suppresses tumour-promoting NF-κB inflammation while enhancing anti-tumour interferon signalling could complement checkpoint blockade. The dual-reporter screening strategy is also methodologically relevant for TME drug discovery, where pathway-selective rather than broadly suppressive immunomodulators are needed.
Human CD1c-autoreactive T cells recognise Mycobacterium tuberculosis-infected antigen-presenting cells and display cytotoxic effector programmes
bioRxiv Published 2026-01-20 preprint DOI: 10.64898/2026.01.16.700025
tuberculosis CD1c autoreactive T cells lipid antigens non-classical T cells cytotoxic T cells antigen presentation mycobacterium
Summary: Investigates how CD1c-autoreactive T cells — which recognise self-lipids presented by CD1c, an MHC class I-like antigen-presenting molecule — respond to Mycobacterium tuberculosis infection. CD1c-autoreactive T cells are frequent in human blood, but their role during infection has been unclear. Using engineered human antigen-presenting cell systems and single-cell transcriptomic profiling, the study reveals that CD1c is present within human TB granulomas, but Mtb infection down-modulates CD1c expression on infected APCs. CD1c-autoreactive T cells recognise Mtb-infected APCs in a CD1c-dependent manner and display cytotoxic effector programmes including granzyme B and perforin expression. The findings suggest that CD1c-autoreactive T cells contribute to anti-mycobacterial immunity through recognition of Mtb-infected cells, and that Mtb may evade this response by downregulating CD1c.
Why it matters: Tuberculosis remains the leading infectious cause of death worldwide, and the limited efficacy of BCG vaccination highlights the need to understand the full repertoire of immune responses that can recognise Mtb. CD1c-autoreactive T cells represent a non-classical T cell population that recognises lipid antigens rather than peptides, yet their anti-microbial function has been poorly defined. This study establishes that these cells are cytotoxic and can recognise Mtb-infected cells, providing a mechanism by which they could contribute to anti-TB immunity. The finding that Mtb downregulates CD1c suggests a specific immune evasion strategy targeting this pathway, analogous to MHC-I downregulation by viruses and tumours.
Why for Yiru: Non-classical T cells are increasingly recognised as important components of the TME immune landscape. CD1-restricted T cells recognise lipid antigens, which are abundant in the TME — tumour cells have altered lipid metabolism that could generate distinct lipid antigen profiles. Understanding how CD1c-autoreactive T cells recognise and kill target cells provides a mechanistic framework for investigating whether similar lipid-reactive T cell responses operate in anti-tumour immunity. The finding that pathogens evolve strategies to downregulate CD1c raises the question of whether tumours similarly modulate CD1 expression to evade lipid-reactive T cells. The single-cell transcriptomic profiling approach also provides a template for characterising CD1-restricted T cell states in human tissues.
The senescence-inhibitory p53 isoform Δ133p53α represses the proinflammatory chemokine CXCL10 in progeria model mice and naturally aged mice
bioRxiv Published 2026-04-02 preprint DOI: 10.64898/2026.03.31.715385
p53 senescence aging CXCL10 inflammation progeria cytokine chemokine
Summary: Profiles the anti-inflammatory effects of the senescence-inhibitory p53 isoform Δ133p53α, demonstrating that it represses the proinflammatory chemokine CXCL10 in both progeria model mice (LmnaG609G/+) and naturally aged mice. Δ133p53α is a naturally occurring p53 isoform that inhibits p53-mediated cellular senescence, and its transgenic expression counteracts aging-associated pathological changes. Using Luminex-based multiplex cytokine/chemokine profiling, quantitative RT-PCR, and RNA in situ hybridisation, the study identifies CXCL10 as a key target of Δ133p53α-mediated repression. CXCL10 is a chemoattractant for activated T cells and NK cells and is implicated in aging-associated chronic inflammation (inflammaging). The repression of CXCL10 provides a mechanistic link between the anti-senescence activity of Δ133p53α and its broader anti-inflammatory effects in aging.
Why it matters: Chronic low-grade inflammation (inflammaging) is a hallmark of aging that contributes to multiple age-related diseases, but the molecular mechanisms connecting cellular senescence to inflammatory cytokine production are incompletely understood. This study identifies a specific p53 isoform that represses CXCL10, linking the senescence programme to chemokine regulation. CXCL10 is not just a marker of inflammation — it recruits T cells and NK cells and has been implicated in autoimmune pathology and cancer immune surveillance. Understanding how p53 isoforms differentially regulate inflammatory mediators is important because p53 is mutated or dysregulated in most cancers, and its isoforms may differentially affect the TME.
Why for Yiru: CXCL10 is a key chemokine in the TME, recruiting CXCR3+ T cells and NK cells into tumours, and its expression levels correlate with immunotherapy response in multiple cancer types. The finding that a p53 isoform represses CXCL10 suggests that p53 status in tumour cells could directly influence T cell recruitment to the TME. Tumours with specific p53 isoform expression patterns may have dysregulated CXCL10 production, affecting immune infiltration and immunotherapy sensitivity. From a therapeutic perspective, understanding which p53 isoforms regulate CXCL10 could inform strategies to enhance T cell recruitment to checkpoint-blockade-treated tumours. More broadly, the concept that p53 isoforms have distinct and non-overlapping functions in inflammation regulation is important for understanding how p53 mutations reshape the TME.
Cross-disciplinary watchlist
Other Fields
replicateFest: An R Package and Shiny App for Analysis of T Cell Receptor Repertoire Data from the Functional Expansion of Specific T cell (FEST) Assays
bioRxiv Published 2026-06-23 preprint DOI: 10.64898/2026.06.18.733036
TCR repertoire FEST assay T cell response neoantigen bioinformatics R package immunotherapy reproducibility
Summary: Presents replicateFest, a computational framework implemented as an R package and Shiny web application for analysing T cell receptor repertoire data from Functional Expansion of Specific T cell (FEST) assays. FEST-based assays combine short-term peptide stimulation with TCR sequencing to identify clonotypes that expand in response to specific antigens, providing data for detecting neoantigen-specific T cell responses and assessing checkpoint blockade efficacy. However, variability between biological and technical replicates poses challenges for reproducibility. replicateFest applies Fisher's method to combine p-values across replicates and provides a user-friendly interface for identifying reproducible antigen-specific TCR clonotypes.
Why it matters: Identifying which T cell clonotypes recognise specific tumour neoantigens is a central challenge in cancer immunotherapy. FEST assays provide a direct functional readout of antigen specificity combined with TCR sequence information, but the lack of standardised analytical tools for handling replicate variability has limited reproducibility. replicateFest addresses this by providing rigorous statistical methods for combining replicate data and an accessible interface for non-bioinformatician users. As personalised neoantigen vaccines and TCR-based therapies advance, standardised tools for identifying antigen-specific T cells become critical for both clinical trials and translational research.
Why for Yiru: Neoantigen-specific T cell identification is central to personalised cancer vaccine development and TCR-engineered cell therapy — both frontier approaches for solid tumour immunotherapy. replicateFest provides a standardised pipeline for analysing FEST data that could be integrated into TME research workflows. The statistical framework for handling replicate variability is directly applicable to any T cell functional assay used in TME studies. The R package and Shiny app design makes it accessible to immunologists who may not have extensive computational expertise.
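Sketch: Fisher's method for combining per-replicate p-values, the statistical step named in the summary. This is a standalone Python illustration, not replicateFest's R implementation; the CDR3 sequences and p-values are hypothetical. The statistic X = -2 * sum(ln p_i) follows a chi-square distribution with 2k degrees of freedom under the null hypothesis when the k p-values are independent.

```python
from math import exp, log

def fisher_combined_pvalue(pvalues):
    """Combine independent p-values with Fisher's method.

    For even degrees of freedom (2k) the chi-square survival function has
    the closed form exp(-x/2) * sum_{i<k} (x/2)^i / i!, used below.
    """
    k = len(pvalues)
    half_x = -sum(log(p) for p in pvalues)  # X / 2
    term, total = 1.0, 1.0                  # i = 0 term of the series
    for i in range(1, k):
        term *= half_x / i
        total += term
    return exp(-half_x) * total

# Hypothetical per-replicate expansion p-values for two clonotypes
clonotypes = {
    "CASSLGQYEQYF": [0.04, 0.03, 0.06],     # modest evidence in every replicate
    "CASSPDRGNTIYF": [0.001, 0.70, 0.85],   # strong in one replicate only
}
combined = {cdr3: fisher_combined_pvalue(ps) for cdr3, ps in clonotypes.items()}
for cdr3, p in combined.items():
    print(f"{cdr3}: combined p = {p:.4f}")
```

In this toy case the clonotype with consistent modest evidence ends up with a smaller combined p-value than the one driven by a single replicate, though Fisher's method can still be dominated by one very small p-value, so per-replicate inspection remains useful.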
gamdid: generalized additive models for differential distributions in single cell experiments
bioRxiv Published 2026-06-23 preprint DOI: 10.64898/2026.06.18.733106
single-cell proteomics differential distribution generalized additive models statistical method cellular heterogeneity R package
Summary: Introduces gamdid (generalized additive models for differential distributions), a statistical framework and R package for differential distribution analysis in single-cell proteomics data. Single-cell proteomics generates protein abundance measurements across hundreds to thousands of individual cells, offering unprecedented resolution to study cellular heterogeneity. However, existing methods for identifying differences between conditions are limited to detecting shifts in mean expression, missing biologically relevant shape differences in the distribution of protein expression across cells. gamdid uses generalized additive models to flexibly model the full distribution of protein expression per condition and test for any type of distributional difference — including changes in variance, skewness, modality, and tail behaviour — not just mean shifts.
Why it matters: Single-cell technologies are rapidly advancing from transcriptomics to proteomics, but the analytical methods lag behind. Most single-cell differential analysis tools focus on mean expression changes, yet the power of single-cell resolution lies in capturing heterogeneity — differences in how expression is distributed across cells, not just average differences. gamdid addresses this gap by testing for any distributional change, enabling detection of phenomena like increased cell-to-cell variability upon drug treatment, bimodal expression patterns emerging in subpopulations, or altered tail behaviour reflecting rare but biologically important cell states. As single-cell proteomics matures, methods like gamdid that capture the full distributional impact of perturbations will be essential.
Why for Yiru: TME single-cell analysis fundamentally depends on capturing heterogeneity. The distribution of protein expression across immune cells within a tumour (for example, whether checkpoint molecule expression is uniformly low or bimodally distributed across exhausted and non-exhausted T cells) carries information that mean expression alone misses. gamdid could be applied to TME single-cell proteomics data to detect distributional changes in immune checkpoint expression, cytokine production, or metabolic enzyme levels between treatment groups. Whether the GAM-based framework holds up at the small sample sizes common in clinical TME studies, where the number of patients is limited even though the number of cells per patient is large, is worth checking in the preprint before relying on it.
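To make the gap between a mean-shift test and a distributional test concrete, here is a minimal Python sketch. It is not gamdid's GAM-based method: it swaps in a generic permutation test with a Kolmogorov-Smirnov statistic, and all data are simulated. The function names (`ks_statistic`, `permutation_pvalue`) and the simulated "control" and "treated" samples are assumptions made for illustration only.

```python
import random
import statistics


def mean_difference(a, b):
    """Absolute difference in sample means (a pure mean-shift statistic)."""
    return abs(statistics.fmean(a) - statistics.fmean(b))


def ks_statistic(a, b):
    """Largest vertical gap between the two empirical CDFs."""
    a, b = sorted(a), sorted(b)
    i = j = 0
    gap = 0.0
    while i < len(a) and j < len(b):
        x = min(a[i], b[j])
        while i < len(a) and a[i] <= x:
            i += 1
        while j < len(b) and b[j] <= x:
            j += 1
        gap = max(gap, abs(i / len(a) - j / len(b)))
    return gap


def permutation_pvalue(a, b, stat, n_perm=500, seed=0):
    """Permutation p-value: how often shuffled labels give a statistic >= observed."""
    rng = random.Random(seed)
    observed = stat(a, b)
    pooled = list(a) + list(b)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)
        if stat(pooled[:len(a)], pooled[len(a):]) >= observed:
            hits += 1
    return (hits + 1) / (n_perm + 1)


def centre(values):
    """Subtract the sample mean so both groups share the same mean by construction."""
    m = statistics.fmean(values)
    return [v - m for v in values]


# Simulated data (assumption): unimodal control vs. bimodal treated, same mean.
rng = random.Random(1)
control = centre([rng.gauss(0.0, 1.0) for _ in range(200)])
treated = centre([rng.gauss(rng.choice([-2.0, 2.0]), 0.5) for _ in range(200)])

p_mean = permutation_pvalue(control, treated, mean_difference)
p_ks = permutation_pvalue(control, treated, ks_statistic)
print(f"mean-shift test p = {p_mean:.3f}; distributional test p = {p_ks:.3f}")
```

Because both samples are centred to the same mean, the mean-shift statistic has nothing to detect, while the distributional statistic picks up the change in shape. That is the kind of difference the entry says gamdid is designed to detect.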
ComCat: Combating Covariate Effects in Brain Analysis
bioRxiv Published 2026-06-23 preprint DOI: 10.64898/2026.06.18.733200
neuroimaging batch effects covariate correction ComBat B-splines multi-site studies bioinformatics
Summary: Introduces ComCat, an extension of the ComBat harmonisation framework that can handle both categorical site indicators and continuous confounding variables in large-scale neuroimaging studies. As neuroimaging analysis shifts toward multi-site studies, managing unwanted variability from heterogeneous data sources has become critical. While ComBat and its extensions are widely used for harmonisation, they only model categorical site effects and cannot account for continuous sources of confounding such as image quality, head motion, and acquisition parameters. ComCat preserves biologically relevant covariates while removing the effects of categorical site indicators and continuous nuisance variables, the latter modelled as smooth nonlinear functions via B-spline basis expansion. The method is applicable across a broad range of brain imaging modalities and demonstrates improved preservation of biological signal while removing unwanted variation.
Why it matters: Batch effects and unwanted technical variation are pervasive in computational biology and biomedical imaging, and ComBat-family harmonisation methods have become standard tools for addressing them. However, the restriction to categorical covariates is a significant limitation: many important sources of confounding (motion, image quality, temperature, reagent lots) are continuous gradients rather than discrete categories. ComCat's extension to continuous covariates modelled via B-splines is a technically elegant solution that fills this gap. The ability to model continuous confounders nonlinearly is important because the relationship between a confounder's magnitude and the biological measurement is often nonlinear.
Why for Yiru: Batch effect correction is equally important in TME multi-omics and imaging studies, where samples are often collected across multiple centres, processed with different reagent lots, and imaged on different platforms. ComCat's ability to model continuous covariates such as tissue preservation quality, sequencing depth, or staining intensity as smooth nonlinear functions is directly applicable to TME data harmonisation. The B-spline approach could be extended to harmonise spatial transcriptomics data where continuous covariates like tissue section thickness or RNA degradation scores need to be regressed out while preserving biological spatial patterns.
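The core idea described in this entry, removing a categorical site effect plus a smooth nonlinear function of a continuous nuisance variable while keeping a biological covariate, can be sketched in Python with NumPy. This is a simplified illustration, not ComCat itself: it uses ordinary least squares on a single feature and omits the empirical Bayes shrinkage and scale adjustment of the ComBat family. The function names (`bspline_basis`, `harmonise`), the knot placement, and the simulated "age" and "motion" variables are all assumptions.

```python
import numpy as np


def bspline_basis(x, n_interior=4, degree=3):
    """Clamped B-spline basis evaluated at x via the Cox-de Boor recursion."""
    x = np.asarray(x, dtype=float)
    lo, hi = x.min(), x.max()
    interior = np.linspace(lo, hi, n_interior + 2)[1:-1]
    knots = np.concatenate([[lo] * (degree + 1), interior, [hi] * (degree + 1)])
    n0 = len(knots) - 1
    # Degree 0: indicator of each half-open knot interval.
    basis = np.zeros((len(x), n0))
    for i in range(n0):
        basis[:, i] = (knots[i] <= x) & (x < knots[i + 1])
    basis[x == hi, n0 - degree - 1] = 1.0  # right endpoint joins the last non-empty interval
    for d in range(1, degree + 1):
        new = np.zeros((len(x), n0 - d))
        for i in range(n0 - d):
            left = knots[i + d] - knots[i]
            right = knots[i + d + 1] - knots[i + 1]
            if left > 0:
                new[:, i] += (x - knots[i]) / left * basis[:, i]
            if right > 0:
                new[:, i] += (knots[i + d + 1] - x) / right * basis[:, i + 1]
        basis = new
    return basis


def harmonise(y, site, nuisance, bio):
    """Subtract fitted site and smooth nuisance effects; keep intercept and bio covariate."""
    y = np.asarray(y, dtype=float)
    site = np.asarray(site)
    levels = np.unique(site)
    # Drop one site level and one spline column to avoid collinearity with the intercept.
    site_dummies = (site[:, None] == levels[None, 1:]).astype(float)
    spline = bspline_basis(nuisance)[:, 1:]
    # Centre the unwanted columns so removing them leaves the overall mean unchanged.
    site_dummies -= site_dummies.mean(axis=0)
    spline -= spline.mean(axis=0)
    design = np.column_stack([np.ones_like(y), bio, site_dummies, spline])
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    unwanted = design[:, 2:] @ coef[2:]
    return y - unwanted


# Simulated single feature (assumption): age effect + site shift + nonlinear motion effect.
rng = np.random.default_rng(0)
n = 600
site = rng.integers(0, 3, n)
motion = rng.uniform(0.0, 1.0, n)
age = rng.normal(0.0, 1.0, n)
y = 2.0 * age + np.array([0.0, 1.5, -1.0])[site] + np.sin(2 * np.pi * motion) + rng.normal(0.0, 0.3, n)

y_h = harmonise(y, site, motion, age)
print("age slope after harmonisation:", np.polyfit(age, y_h, 1)[0])
```

The design choice worth noting is that the biological covariate is fitted jointly with the nuisance terms but never subtracted, so its effect survives harmonisation. A linear term for `motion` would miss the sinusoidal confound that the spline basis can absorb.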
Comorbidity structure as an inductive bias: Comparing output-head designs for multi-label prediction of diabetes and myocardial infarction complications
bioRxiv Published 2026-06-23 preprint DOI: 10.64898/2026.06.18.733068
multi-label learning comorbidity inductive bias deep learning clinical prediction diabetes myocardial infarction
Summary: Investigates whether incorporating comorbidity structure as an inductive bias in output-head design improves multi-label prediction of clinical complications. The central premise is that label-dependence mechanisms (how complications co-occur) are explicit hypotheses about disease biology, not generic modelling additions. The study compares six output-head architectures (independent baseline, linear additive, multiplicative, symmetric conditional random field, residual MLP, and combined additive-multiplicative) on two clinically distinct multi-label prediction tasks: Type 2 diabetes complications (nephropathy, neuropathy, retinopathy) and myocardial infarction complications. It finds that explicitly modelling label dependencies through structured output heads consistently outperforms independent prediction, with the optimal architecture depending on the disease context.
Why it matters: Clinical prediction models typically treat each complication as an independent binary outcome, ignoring the rich structure of comorbidity — that certain complications tend to co-occur because they share pathophysiological mechanisms. This study formalises comorbidity structure as an inductive bias in neural network architecture design, showing that modelling label dependencies improves prediction. The finding that different label dependency architectures suit different diseases suggests that output-head design should be informed by disease biology rather than chosen by convention. This principle — using domain knowledge about label relationships to guide architecture design — generalises beyond clinical prediction to any multi-label problem where labels have known dependencies.
Why for Yiru: Multi-label prediction with structured label dependencies is relevant to TME computational problems where multiple correlated outcomes need to be predicted simultaneously — for example, predicting which immune checkpoint molecules are upregulated in a tumour, which immune cell subtypes are enriched, or which clinical outcomes are likely. The principle that label-dependency structure should inform model architecture is applicable to TME biomarker panels where individual markers are correlated. The comparison of different output-head architectures provides practical guidance for designing TME prediction models that capture correlated outcomes.
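As a rough illustration of what separates an independent output head from one that models label dependence, the Python sketch below contrasts per-label sigmoids with a symmetric pairwise model over three binary labels, normalised by enumerating all label combinations. The preprint's exact parameterisations are not given in this entry, so this is a generic pairwise model, and the unary scores and coupling weight are invented for illustration. The function names (`pairwise_joint`, `marginals`) are hypothetical.

```python
import itertools
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def pairwise_joint(unary, pairwise):
    """Joint distribution over binary label vectors.

    score(y) = sum_i unary[i] * y[i] + sum_{i<j} pairwise[i][j] * y[i] * y[j],
    normalised with a softmax over all 2^k label combinations.
    """
    k = len(unary)
    configs = list(itertools.product([0, 1], repeat=k))
    scores = []
    for y in configs:
        s = sum(u * yi for u, yi in zip(unary, y))
        s += sum(pairwise[i][j] * y[i] * y[j] for i in range(k) for j in range(i + 1, k))
        scores.append(s)
    top = max(scores)  # subtract the max for numerical stability
    weights = [math.exp(s - top) for s in scores]
    total = sum(weights)
    return {y: w / total for y, w in zip(configs, weights)}


def marginals(joint, k):
    """Per-label probability P(y_i = 1) under the joint distribution."""
    return [sum(p for y, p in joint.items() if y[i] == 1) for i in range(k)]


labels = ["nephropathy", "neuropathy", "retinopathy"]
unary = [0.2, -1.0, 0.5]  # made-up per-label scores from some shared encoder

# Independent head: each label gets its own sigmoid, no interaction.
independent = [sigmoid(u) for u in unary]

# Structured head: a positive coupling between the first two labels (made-up value).
coupling = [[0.0, 2.0, 0.0], [0.0, 0.0, 0.0], [0.0, 0.0, 0.0]]
joint = pairwise_joint(unary, coupling)
structured = marginals(joint, len(unary))

for name, p_ind, p_str in zip(labels, independent, structured):
    print(f"{name}: independent={p_ind:.3f} structured={p_str:.3f}")
```

With all couplings set to zero the structured head reduces exactly to the independent sigmoids, which makes the coupling terms readable as explicit hypotheses about co-occurrence. Exhaustive enumeration is only practical for a handful of labels.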
Natural selection on synonymous genetic variation in the major histocompatibility complex
bioRxiv Published 2026-02-23 preprint DOI: 10.64898/2026.02.23.707394
MHC synonymous variation natural selection codon usage immunogenetics codon bias translational selection
Summary: Presents evidence of natural selection on synonymous genetic variation in the major histocompatibility complex (MHC), the most polymorphic region of the vertebrate genome. Synonymous nucleotide variation, which does not alter the amino acid sequence, can still influence phenotypes through effects on mRNA stability, splicing, translation efficiency, and protein folding. Using data from a wild population of great reed warblers, the study shows that codon usage in exon 3 of MHC class I genes is under strong purifying selection at 56 of 87 codon sites. This selective constraint suggests that synonymous variation in MHC genes is not evolutionarily neutral but is shaped by selection, likely because it affects the efficiency and accuracy of MHC molecule expression, which in turn influences antigen presentation capacity and pathogen resistance.
Why it matters: The textbook view of synonymous mutations as evolutionarily neutral has been progressively overturned, but evidence for selection on synonymous sites in immune genes — which must balance rapid evolution for pathogen recognition with structural constraint — has been limited. This study provides clear evidence that synonymous variation in MHC genes is subject to purifying selection, implying functional importance. The finding has implications for our understanding of MHC evolution, disease association studies (where synonymous MHC variants are typically ignored), and the interpretation of synonymous mutations in immune genes in general. If synonymous variation affects MHC expression levels, it could influence the threshold for T cell activation and thus susceptibility to infectious and autoimmune diseases.
Why for Yiru: MHC expression levels are a critical determinant of anti-tumour immune recognition — tumours frequently downregulate MHC-I to evade CD8+ T cell killing. The finding that synonymous MHC variation is under selection suggests that even synonymous germline variants in MHC genes could influence antigen presentation capacity and thus cancer immune surveillance. In TME research, understanding why some individuals mount stronger anti-tumour immune responses than others has focused on non-synonymous MHC variation (HLA type), but this study suggests synonymous variation might also contribute. Computational TME studies correlating germline MHC variation with tumour immune infiltration could test whether synonymous MHC variants influence TME composition and immunotherapy outcomes.
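For readers less familiar with the synonymous versus non-synonymous distinction this entry relies on, here is a small Python sketch that classifies codon-level differences between two aligned, in-frame coding sequences using the standard genetic code. It illustrates the definition only; it is not the preprint's selection analysis. The function names (`classify_codon_change`, `compare_alleles`) and the two short sequences are invented for illustration and are not real MHC alleles.

```python
from itertools import product

# Standard genetic code, codons ordered with bases T, C, A, G at each position.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def classify_codon_change(ref, alt):
    """Label a codon pair as identical, synonymous, or non-synonymous."""
    if ref == alt:
        return "identical"
    if CODON_TABLE[ref] == CODON_TABLE[alt]:
        return "synonymous"
    return "non-synonymous"


def compare_alleles(seq_a, seq_b):
    """Classify every codon site of two aligned, in-frame coding sequences."""
    if len(seq_a) != len(seq_b) or len(seq_a) % 3 != 0:
        raise ValueError("sequences must be aligned, equal length, and in frame")
    return [
        classify_codon_change(seq_a[i:i + 3], seq_b[i:i + 3])
        for i in range(0, len(seq_a), 3)
    ]


# Hypothetical 3-codon fragments (made up for illustration).
allele_a = "GCTCTGAAA"
allele_b = "GCCCTGAGA"
print(compare_alleles(allele_a, allele_b))
```

In the invented example the first codon pair (GCT and GCC) encodes alanine in both alleles, so the difference is synonymous. The third pair (AAA and AGA) changes lysine to arginine, so it is non-synonymous.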