Research Radar — 2026-05-07
Methods & AI
Computational
scHG: A supercell framework with high-order graph learning enables scalable multi-omics analysis
PLOS Computational Biology Published 2026-05-06 research article DOI: 10.1371/journal.pcbi.1013851
multi-omics graph neural networks single-cell clustering spatial biology
Summary: Introduces the supercell paradigm for multi-omics clustering, grouping expression-coherent cells into intermediate units using angle-aware similarity and second-order co-occurrence neighbors. scHG, a high-order graph learning framework with an omics-weighted optimizer, outperforms state-of-the-art methods across six benchmark datasets (up to 30,672 cells), improving mean adjusted Rand index (ARI) by 3.97% and reducing runtime by 26.40%. Notably, it resolves rare populations, including dendritic cells and NK-like B cells, that are hidden by standard pipelines.
Why it matters: The supercell approach bridges the gap between single-cell resolution and computational tractability for large-scale multi-omics integration, with direct relevance to rare cell detection in tumor microenvironments.
Why for Yiru: Multi-omics integration with graph learning is directly applicable to spatial transcriptomics and tumor microenvironment analysis.
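For readers less familiar with the clustering metric quoted in the summary above, here is a minimal sketch of how an adjusted Rand index can be computed from two label vectors. It uses the standard pair-counting formula; the function name and the toy labels are illustrative assumptions and are not taken from the scHG paper or its code.

```python
from collections import Counter
from math import comb


def adjusted_rand_index(labels_true, labels_pred):
    """Pair-counting ARI between two flat clusterings (illustrative helper)."""
    if len(labels_true) != len(labels_pred):
        raise ValueError("label vectors must have the same length")
    n = len(labels_true)
    total_pairs = comb(n, 2)
    if total_pairs == 0:
        return 1.0  # fewer than two items: partitions trivially agree

    # Contingency counts and their marginals.
    cells = Counter(zip(labels_true, labels_pred))
    rows = Counter(labels_true)
    cols = Counter(labels_pred)

    # Number of item pairs that share a cluster in both / in each partition.
    sum_cells = sum(comb(c, 2) for c in cells.values())
    sum_rows = sum(comb(c, 2) for c in rows.values())
    sum_cols = sum(comb(c, 2) for c in cols.values())

    expected = sum_rows * sum_cols / total_pairs  # chance-level agreement
    max_index = (sum_rows + sum_cols) / 2
    if max_index == expected:
        return 1.0  # degenerate case, e.g. both partitions are one cluster
    return (sum_cells - expected) / (max_index - expected)


# Toy example (made-up labels): same grouping, different label names.
toy_true = [0, 0, 1, 1]
toy_pred = [1, 1, 0, 0]
ari_same = adjusted_rand_index(toy_true, toy_pred)
ari_crossed = adjusted_rand_index(toy_true, [0, 1, 0, 1])
```

Because ARI is corrected for chance, a relabelled but otherwise identical partition scores 1, while a partition that cuts across the reference clusters can score below 0.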
Learning the Language of the Microbiome with Transformers
bioRxiv Published 2026-05-06 preprint DOI: 10.64898/2026.05.02.722381
foundation models transformers microbiome self-supervised learning benchmarking
Summary: Presents Atlas, a pretraining dataset of over 539,000 microbiome datapoints, and the Waypoint family of GPT-2-style causal language models (6M–170M parameters) for microbiome analysis. Introduces Compass, a curated benchmark of eight predictive tasks including biome classification, drug-microbiome interactions, and infant gut development. Pretrained transformers begin to reliably outperform classical methods once training data exceeds ~10,000 examples.
Why it matters: Establishes the first comprehensive foundation model framework for microbiome data, demonstrating that self-supervised pretraining at scale yields significant improvements across diverse downstream tasks.
Why for Yiru: Foundation model approaches applied to biological sequence data are directly relevant to building analogous models for single-cell and spatial omics.
Bridging LLM Reasoning and Chemical Knowledge via an Evolutionary Multi-Agent Framework for Molecular Synthesis
bioRxiv Published 2026-05-06 preprint DOI: 10.64898/2026.05.02.722342
large language models drug discovery multi-agent systems reinforcement learning molecular design
Summary: Proposes EvoSyn, an evolutionary multi-agent framework that synergizes LLM reasoning with domain experts for molecular synthesis. Uses a dual-process evolutionary paradigm: co-evolving linguistic capabilities with multi-objective constraints and self-evolving through a Markov Game formulation. Domain feedback penalizes invalid proposals and grounds generation in feasible reaction pathways. Significantly outperforms state-of-the-art baselines on comprehensive benchmarks.
Why it matters: Demonstrates how LLM-guided evolution with rigorous domain validation can overcome hallucination problems in generative molecular design, producing molecules that are both bioactive and synthetically actionable.
Why for Yiru: Multi-agent LLM frameworks and evolutionary optimization strategies are applicable to biological sequence design, including peptide and antibody engineering.
UNKAI: A protein functional identity prediction model based on ESM-C latent representations and the attention mechanism
bioRxiv Published 2026-05-06 preprint DOI: 10.64898/2026.05.02.722384
protein language models ESM attention mechanism enzyme function deep learning
Summary: Develops a deep learning method to predict whether two proteins catalyze the same enzymatic reaction, using ESM Cambrian (ESM C) latent representations processed through an attention-based neural network. Outperforms existing methods including sequence similarity and AlphaFold-based approaches. Attention weight analysis reveals autonomous highlighting of catalytic and binding residues, eliminating the need for manual feature engineering.
Why it matters: Shows that protein language model embeddings combined with attention mechanisms can achieve interpretable enzyme function prediction without structural information, democratizing functional annotation.
Why for Yiru: Attention-based interpretation of protein language models provides a blueprint for interpretable deep learning in biological sequence analysis.
Tumor cell specific total mRNA expression informed neural networks predicts cancer progression
bioRxiv Published 2026-05-06 preprint DOI: 10.64898/2026.05.01.722212
deep learning cancer genomics multi-omics prognosis transcriptomics
Summary: Presents TmSNet, a deep learning framework that predicts tumor cell-specific total mRNA expression (TmS) from mRNA, DNA methylation, miRNA, and immune cell proportions. Integrates structured feature selection (gradient boosting, LASSO, elastic net) with specialized neural architectures. Achieves cross-validated CCC up to 0.93 across 12 TCGA cancer types and generalizes to external cohorts. Predicted TmS effectively stratifies patients by risk.
Why it matters: Provides a scalable, alignment-free method for inferring tumor transcriptional activity without matched DNA sequencing, enabling analysis of large heterogeneous cohorts.
Why for Yiru: Multi-omic feature integration with neural networks for cancer prognosis is directly relevant to building clinically applicable models from spatial and single-cell data.
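The summary above reports agreement as "CCC". Assuming this refers to Lin's concordance correlation coefficient (the source does not expand the abbreviation), the following minimal sketch shows how that statistic differs from a plain correlation: it penalizes systematic offsets between predicted and observed values. The function name and toy numbers are hypothetical and unrelated to TmSNet's data.

```python
from statistics import fmean


def concordance_ccc(observed, predicted):
    """Lin's concordance correlation coefficient (population-variance form)."""
    n = len(observed)
    if n != len(predicted) or n < 2:
        raise ValueError("need two equal-length sequences with at least 2 values")
    mean_o, mean_p = fmean(observed), fmean(predicted)
    var_o = sum((o - mean_o) ** 2 for o in observed) / n
    var_p = sum((p - mean_p) ** 2 for p in predicted) / n
    cov = sum((o - mean_o) * (p - mean_p) for o, p in zip(observed, predicted)) / n
    denom = var_o + var_p + (mean_o - mean_p) ** 2
    if denom == 0:
        return 1.0  # both sequences are constant and identical
    return 2 * cov / denom


# Toy values (made up): a perfect prediction vs. one shifted by a constant.
toy_observed = [1.0, 2.0, 3.0, 4.0]
ccc_perfect = concordance_ccc(toy_observed, toy_observed)
ccc_shifted = concordance_ccc(toy_observed, [v + 1.0 for v in toy_observed])
```

The shifted prediction is perfectly correlated with the observations in the Pearson sense, yet its CCC drops below 1 because of the constant bias.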
STAT: A multi-agent framework for integrated and interactive spatial transcriptomics analysis
bioRxiv Published 2026-05-05 preprint DOI: 10.64898/2026.05.01.722244
spatial transcriptomics multi-agent systems large language models interactive analysis benchmarking
Summary: Introduces STAT, a multi-agent framework making spatial transcriptomics analysis conversational and interactive. Features a persistent session, shared tissue viewer, and staged skill-aware pipeline. Outperforms baseline LLMs and existing autonomous agents across 11 analytical task categories on three spatial platforms. Successfully reproduces published Visium HD colorectal cancer findings from natural language prompts alone.
Why it matters: Represents a practical integration of LLM agents with spatial biology workflows, maintaining transparency and user control while dramatically reducing analysis overhead.
Why for Yiru: Multi-agent LLM frameworks for spatial transcriptomics analysis are directly relevant to building intelligent analysis pipelines for spatial omics data.
Biomedical discoveries
Biomedicine
Multispecific nanobody degraders co-deplete membrane receptors and enable targeted delivery of diverse payloads
bioRxiv Published 2026-05-06 preprint DOI: 10.64898/2026.05.02.722401
targeted protein degradation nanobody ADC PROTAC cancer therapy EGFR cMET
Summary: Develops MINDS (Multivalent Interchangeable Nanobody Degradation System), a modular nanobody-Fc chassis co-engaging EGFR, cMET, and TfR1 for lysosomal co-depletion and intracellular payload delivery. Tritazumab achieves picomolar degradation potency with near-maximal depletion within ~1.5 hours. A BRD4 molecular glue conjugate improved the selectivity window by >100-fold; an EZH2 PROTAC conjugate achieved a ~1,000-fold increase in intracellular degradation potency versus free PROTAC.
Why it matters: A platform technology integrating multispecific receptor degradation with diverse payload delivery (cytotoxic, molecular glue, PROTAC) that could transform targeted cancer therapy by addressing receptor heterogeneity and compensatory signaling.
Why for Yiru: Multispecific targeted degradation and payload delivery is highly relevant to immunotherapy and tumor microenvironment engineering.
Local translation drives glioblastoma heterogeneity and tumor invasion
bioRxiv Published 2026-05-06 preprint DOI: 10.64898/2026.05.02.722387
glioblastoma tumor invasion spatial transcriptomics subcellular translation tumor microenvironment
Summary: Establishes local protein translation as a fundamental driver of tumor microtube (TM) dynamics and invasive cell states in glioblastoma. Using subcellular transcriptomics integrating organelle organization with spatially resolved transcriptomics, reveals that TM gene expression drives cell state identity. Targeted disruption of TM-localized translation via photoswitchable puromycin and knockdown of TM-enriched proteins GPM6A and GAP43 impairs invasion and reduces tumor growth.
Why it matters: Identifies a previously unappreciated subcellular mechanism driving glioblastoma invasion, with direct therapeutic implications for targeting local translation to block brain colonization.
Why for Yiru: The subcellular spatial transcriptomics approach and the link between local translation and tumor cell state plasticity are highly innovative and relevant to spatial multi-omics methods development.
Glutamine-dependent downregulation of FLT3-ITD is a mechanism of FLT3 inhibitor resistance in FLT3-ITD AML in hypoxia
bioRxiv Published 2026-05-06 preprint DOI: 10.64898/2026.05.02.722336
AML FLT3-ITD drug resistance tumor microenvironment hypoxia glutamine metabolism
Summary: Reveals that hypoxia, characteristic of the bone marrow niche, causes a 3–5-fold increase in FLT3 inhibitor IC50 through glutamine-dependent upregulation of the ubiquitin ligase c-CBL, which accelerates FLT3-ITD proteasomal degradation (half-life 1.0 vs 2.5 hours). The glutaminase inhibitor telaglenastat abrogates c-CBL upregulation, preserves FLT3-ITD expression, and synergizes with FLT3 inhibitors in hypoxia.
Why it matters: Explains the clinical observation that FLT3 inhibitors clear blood blasts but not bone marrow blasts, and identifies a metabolic intervention (glutaminase inhibition) that restores sensitivity.
Why for Yiru: Metabolic microenvironment-driven drug resistance is a critical theme in tumor immunology and directly relevant to understanding spatial heterogeneity in treatment response.
Retroelement Hypomethylation Links Hypoxia Signaling, Immune Phenotypes, and Survival in Clear Cell Renal Cell Carcinoma
bioRxiv Published 2026-05-06 preprint DOI: 10.64898/2026.05.01.722263
ccRCC epigenetics retrotransposons tumor microenvironment immune infiltration cGAS-STING
Summary: Identifies three reproducible retroelement (RE) methylation subtypes (Repressed, Transient, Active) in ccRCC through genome-wide prediction across Alu, LINE-1, and LTR elements. The Active subtype shows significantly worse survival, reduced EPAS1/HIF2A expression, increased immune infiltration, elevated PD-1, and heightened cGAS-STING/interferon signaling, indicating an immune-inflamed yet immunosuppressed state. Findings were validated in CPTAC and independently replicated in an institutional cohort.
Why it matters: Establishes retroelement methylation as a novel molecular classifier in ccRCC linking epigenetic dysregulation to immune phenotypes, with potential for improving risk stratification.
Why for Yiru: Integration of epigenomic features with tumor immune microenvironment phenotypes using multi-omic computational approaches is a core interest.
Integrated Multi-Omics Identifies Lineage-Dependent Myeloid Cells Recruitment and the APP-CD74 Axis as an Immunoregulatory Target in Pediatric High-Grade Glioma
bioRxiv Published 2026-05-06 preprint DOI: 10.64898/2026.05.01.722277
DIPG pediatric glioma tumor-associated macrophages tumor microenvironment immunotherapy spatial biology
Summary: Uses bulk RNA-seq (26 DIPG autopsy specimens), scRNA-seq (8 patients), and CellChat analysis to reveal that DIPG tumors actively recruit monocytes through chemokine-mediated mechanisms driven by a mesenchymal-like lineage state. Identifies APP-CD74 signaling as a prominent tumor-TAM interaction pathway. APP suppression in tumors attenuates proinflammatory TAM activity. Protein docking identifies the APP-CD74 binding interface for therapeutic targeting.
Why it matters: Identifies a druggable tumor-myeloid communication axis in a devastating pediatric brain cancer with no effective treatments, providing a structural basis for therapeutic development.
Why for Yiru: Multi-omics deconvolution of tumor-immune communication with structural follow-up is an exemplary workflow for translational computational immunology.
Spatial transcriptomics identifies a translayer architecture of pyroptosis-related transcription in systemic sclerosis skin
bioRxiv Published 2026-05-06 preprint DOI: 10.64898/2026.05.03.722547
spatial transcriptomics pyroptosis systemic sclerosis autoimmunity skin inflammasome
Summary: Reanalyzes public Visium skin sections (4 healthy, 9 systemic sclerosis), revealing a conserved translayer pyroptosis architecture: epidermal NLRP1/PYCARD/CASP4 bias vs. dermal GSDMD bias. This spatial separation is detectable in healthy skin and enhanced in SSc. Spatial deconvolution shows that dermal GSDMD is associated with endothelial abundance. Findings were replicated in an independent cohort (10 SSc sections).
Why it matters: Demonstrates that inflammatory programs in autoimmune disease have a reproducible spatial architecture that may require compartment-specific therapeutic targeting rather than systemic inhibition.
Why for Yiru: Spatial transcriptomics deconvolution of immune programs in inflammatory disease is directly relevant to building spatial analysis pipelines for tumor and autoimmune microenvironments.
Cross-disciplinary watchlist
Other Fields
ArchaicSeeker 3.0: A deep-learning framework for scalable, haplotype-resolved inference of archaic introgression
bioRxiv Published 2026-05-06 preprint DOI: 10.64898/2026.05.05.722798
deep learning population genetics human evolution local ancestry inference genomics
Summary: Introduces ArchaicSeeker 3.0, a deep-learning framework for haplotype-resolved detection of archaic introgression. Integrates tract-scale sequence modeling with overlap-aware reassembly and boundary refinement. Simulation-trained model avoids inference-time recalibration, outperforming existing methods in precision, recall, and F1 score. Applied to 3,453 genomes from 209 populations, identifies novel introgressed regions with locus-level phylogenetic support.
Why it matters: Advances deep learning in population genetics by providing a scalable, assumption-free framework for detecting archaic ancestry that generalizes across diverse demographic scenarios.
Why for Yiru: Deep learning frameworks for sequence-based inference that achieve robustness without demographic assumptions are methodologically relevant to building generalizable models for biological sequence analysis.
Simple baselines rival protein language models in mutation-dense design tasks
bioRxiv Published 2026-05-06 preprint DOI: 10.64898/2026.05.01.722313
protein language models benchmarking protein design machine learning methodology
Summary: Benchmarks widely used protein language models against conventional baselines in dense, experimentally validated multi-mutant landscapes. Finds that regardless of architecture and parameter count, pLMs are statistically similar to one another and none consistently outperforms conventional methods. Zero-shot functional variant discrimination is comparable to homology-based methods. Suggests pLMs may need biophysical/structural priors for protein function design.
Why it matters: A sobering reality check for the protein language model field, demonstrating that simpler methods match or exceed pLMs on the hardest design tasks — important for directing future ML research efforts.
Why for Yiru: Critical methodological benchmarking of foundation models against simple baselines is an essential practice that should be applied to spatial and single-cell foundation models as well.
Uncertainty-aware localization microscopy by variational diffusion
bioRxiv Published 2026-05-05 preprint DOI: 10.64898/2026.05.01.722206
diffusion models variational inference super-resolution microscopy uncertainty quantification computer vision
Summary: Proposes a conditional variational diffusion model (CVDM) for kernel density estimation in single-molecule localization microscopy. Models a probability distribution over high-resolution solutions to the ill-posed inverse problem of localizing fluorescent molecules in dense images. Enables uncertainty quantification of reconstructed images, a capability absent from existing deep models for localization microscopy.
Why it matters: Introduces uncertainty quantification to deep learning-based super-resolution microscopy, enabling researchers to assess confidence in reconstructed molecular localizations, which is critical for biological interpretation.
Why for Yiru: Uncertainty-aware generative models for imaging are methodologically relevant to spatial transcriptomics analysis, where confidence in spatial feature detection is essential.
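As a generic illustration of what sample-based uncertainty quantification can look like (not the CVDM method itself, whose details are not given in this digest), the sketch below summarizes several hypothetical reconstructions of the same image by their pixelwise mean and standard deviation. The function name, the list-of-lists image format, and the toy samples are all assumptions made for this example.

```python
from statistics import fmean, pstdev


def pixelwise_mean_std(samples):
    """Summarize a stack of same-shaped 2D images by per-pixel mean and std.

    `samples` is a list of images, each a list of rows of floats. In a
    generative setting these would be independent draws from the model.
    """
    if not samples:
        raise ValueError("need at least one sample")
    height, width = len(samples[0]), len(samples[0][0])
    mean_img = [
        [fmean([s[i][j] for s in samples]) for j in range(width)]
        for i in range(height)
    ]
    # Population std across samples; high values flag uncertain pixels.
    std_img = [
        [pstdev([s[i][j] for s in samples]) for j in range(width)]
        for i in range(height)
    ]
    return mean_img, std_img


# Two made-up 1x2 "reconstructions": they disagree on the first pixel only.
toy_samples = [
    [[0.0, 1.0]],
    [[2.0, 1.0]],
]
mean_img, std_img = pixelwise_mean_std(toy_samples)
```

Pixels where the samples agree get a standard deviation of zero, while pixels where they disagree stand out in the standard-deviation map, giving a simple per-pixel confidence readout.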