Research Radar — 2026-06-03
Methods & AI
Computational
FLOWR: flow matching for structure-aware de novo, interaction- and fragment-based ligand generation
Nature Computational Science Published 2026-05-28 research article DOI: 10.1038/s43588-026-00998-8
flow matching, ligand generation, drug discovery, structure-based design, generative model, protein-ligand, equivariant, computational chemistry
Summary: Introduces FLOWR, a generative model for structure-based drug design that combines continuous and categorical flow matching with equivariant optimal transport to generate drug-like ligands conditioned on protein binding pockets. Structure-based de novo ligand generation — designing drug candidates directly from a protein 3D structure — has been dominated by diffusion models, which iteratively denoise random coordinates into valid molecular geometries. While powerful, diffusion models are inherently slow because they require hundreds of denoising steps. Flow matching offers an alternative generative paradigm that learns a continuous transformation (flow) from a simple prior distribution to the target data distribution, typically requiring far fewer sampling steps. FLOWR adapts this framework to the challenging domain of 3D molecular generation by combining continuous flow matching for atomic coordinates with categorical flow matching for discrete atom and bond types, all within an SE(3)-equivariant architecture that respects the symmetries of 3D space. Crucially, FLOWR unifies three ligand design modes in a single model: de novo generation from scratch given only a protein pocket, fragment-based generation that grows molecules from a starting core, and interaction-conditional generation that optimizes ligands for specific protein-ligand contacts. The model achieves up to 70-fold faster inference compared to diffusion-based methods while improving the physical validity (bond geometry, valency) and interaction accuracy (hydrogen bonds, hydrophobic contacts) of generated compounds.
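To make the "few sampling steps" idea concrete, here is a minimal toy sketch of continuous flow matching on a straight-line path. It is not FLOWR's architecture: the function names are hypothetical, there is no protein pocket, no categorical (atom or bond type) flow, and no equivariant network. In place of a trained velocity network, the sketch uses the exact velocity field for a single known target point, which is an assumption made purely so the example runs on its own.

```python
def linear_path(x0, x1, t):
    """Point at time t on the straight line from prior sample x0 to data sample x1."""
    return [(1.0 - t) * a + t * b for a, b in zip(x0, x1)]


def target_velocity(x0, x1):
    """Regression target a velocity network would be trained on for this path: x1 - x0."""
    return [b - a for a, b in zip(x0, x1)]


def conditional_velocity(x, x1, t):
    """Exact velocity that carries a point x at time t to x1 in a straight line.

    Stand-in (assumption) for a learned network v_theta(x, t, pocket).
    """
    return [(b - a) / (1.0 - t) for a, b in zip(x, x1)]


def euler_sample(x0, x1, n_steps):
    """Integrate dx/dt = v(x, t) from t = 0 to t = 1 with n_steps Euler updates."""
    x, dt = list(x0), 1.0 / n_steps
    for k in range(n_steps):
        t = k * dt  # t stays strictly below 1, so (1 - t) is never zero
        v = conditional_velocity(x, x1, t)
        x = [xi + dt * vi for xi, vi in zip(x, v)]
    return x


prior_sample = [0.0, 2.0, -1.0]   # hypothetical noise coordinates
data_sample = [1.0, 0.0, 3.0]     # hypothetical target coordinates
print(euler_sample(prior_sample, data_sample, n_steps=5))
```

Because the path is a straight line, a handful of Euler steps is enough in this toy case; a learned velocity field is only approximately straight, so real samplers trade step count against accuracy.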
Why it matters: The speed improvement alone (up to 70-fold faster inference than diffusion-based methods) could be transformative for practical drug discovery workflows where millions of compounds may need to be generated and scored. But the deeper advance is conceptual: flow matching provides a simpler, more efficient alternative to diffusion models that may become the default generative paradigm for molecular design, much as it has gained ground on diffusion in image and video generation. The unified framework that handles de novo, fragment-based, and interaction-conditional generation in a single model eliminates the need for separate tools for different design tasks, streamlining the computational chemistry pipeline. Faster, more accurate structure-based ligand generation has direct implications for tackling targets currently considered undruggable and for accelerating hit-to-lead optimization.
Why for Yiru: Computational drug discovery methods are increasingly relevant to TME-targeted therapeutics, for example designing small molecules that disrupt immunosuppressive protein-protein interactions (PD-L1/PD-1, CD47/SIRPα, chemokine-receptor axes) or that reprogramme macrophage polarization. FLOWR's interaction-conditional generation mode could be used to design ligands that specifically target TME-relevant protein interfaces identified from structural biology or AlphaFold predictions. More broadly, the flow matching framework could be adapted to other biological generation tasks, such as designing guide RNA sequences for CRISPR, generating promoter sequences for cell-type-specific expression, or designing peptide neoantigens for personalised cancer vaccines.
Estimation of direct and indirect polygenic effects and gene–environment interactions using polygenic scores in case–parent trio studies
Nature Genetics Published 2026-06-02 research article DOI: 10.1038/s41588-026-02601-2
polygenic score, PGS, trio study, gene-environment interaction, statistical genetics, family-based, heritability, GWAS
Summary: Introduces PGS-TRI, a statistical framework designed for analyzing polygenic scores (PGS) in case-parent trio studies that disentangles direct genetic effects (the causal effect of an individual's own genotype) from indirect genetic effects (effects mediated through the family environment, also called genetic nurture) and gene-environment interactions. Polygenic scores, aggregate measures of genetic risk computed from genome-wide association study (GWAS) summary statistics, are increasingly used to predict disease risk, but their interpretation is complicated by confounding from family environment and population structure. In trio studies where both parents and an affected child are genotyped, the parental genotypes that are not transmitted to the child provide a natural control for family-level confounding: any association between non-transmitted parental alleles and child outcomes must reflect indirect genetic effects or population stratification rather than direct genetic effects. PGS-TRI formalizes this intuition into a rigorous regression framework that jointly models the transmitted and non-transmitted PGS to estimate direct effects, indirect effects (genetic nurture), and their interactions with environmental exposures. The method accounts for the correlation between transmitted and non-transmitted alleles due to assortative mating and can incorporate measured environmental variables to test for gene-environment interaction. Applied to large trio datasets, PGS-TRI reveals that indirect genetic effects contribute substantially to the observed PGS-outcome associations for several complex traits, highlighting the importance of family-based designs for accurate genetic risk prediction.
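The within-family logic can be illustrated with a small sketch. This is not the PGS-TRI estimator itself; it is a simplified illustration under explicit assumptions: (i) given the parents, a child's PGS is normally distributed around the mid-parent PGS with a common within-family variance, and (ii) disease risk is log-linear in the PGS with a rare outcome. Under those assumptions the PGS of affected children is shifted away from the mid-parent value by beta times the within-family variance, so the ratio of the mean shift to that variance estimates the direct effect beta. The function name and the toy numbers are hypothetical.

```python
from statistics import mean, variance


def direct_effect_estimate(child_pgs, mother_pgs, father_pgs):
    """Estimate a direct PGS effect (log relative risk per PGS unit) from case-parent trios.

    Assumes a log-linear risk model and a normal within-family PGS distribution
    with the same variance in every family (illustrative assumptions only).
    """
    # Deviation of each affected child's PGS from the mid-parent expectation.
    # Conditioning on the parents removes between-family confounding.
    deviations = [c - 0.5 * (m + f)
                  for c, m, f in zip(child_pgs, mother_pgs, father_pgs)]
    shift = mean(deviations)            # over-transmission of risk alleles to cases
    within_var = variance(deviations)   # sample variance of the within-family deviation
    return shift / within_var


# Hypothetical standardized PGS values for five case-parent trios.
mothers = [0.0, 1.0, 2.0, -1.0, 0.0]
fathers = [0.0, 1.0, 0.0, 1.0, 2.0]
children = [0.5, 2.5, 2.0, 0.0, 3.0]
print(direct_effect_estimate(children, mothers, fathers))
```

A mean deviation near zero would indicate no over-transmission of risk alleles to affected children, and hence no evidence for a direct effect in this simplified model.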
Why it matters: As polygenic scores move toward clinical implementation (for risk stratification, screening recommendations, and reproductive decision-making), understanding what they actually measure becomes critical. The finding that indirect genetic effects (the family environment shaped by parental genotypes) account for a substantial portion of PGS associations means that PGS-based risk predictions may be less about an individual's own biology and more about their family context than commonly assumed. PGS-TRI provides a principled framework for decomposing these effects, which is essential both for accurate risk communication and for designing interventions that target the appropriate level (individual vs. family). The gene-environment interaction component is also important because it allows testing whether genetic risk is modifiable by environmental factors, a question central to precision public health.
Why for Yiru: The methodological framework of decomposing direct vs. indirect effects is conceptually transferable to other contexts. In the TME, one could ask analogous questions: does a tumour mutation (direct effect) drive immune evasion directly, or does it reshape the microenvironment (indirect effect) that then suppresses immunity? Family-based study designs are increasingly used in cancer genomics, for example to study germline variants that affect TME composition or immunotherapy response in related individuals. The statistical machinery developed in PGS-TRI for handling correlated genetic effects could inform methods for analyzing multi-omic data from related samples in cancer studies. More broadly, the gene-environment interaction framework connects to Yiru's interest in how genetic background (germline or somatic) interacts with TME context to determine disease outcomes.
Histology-informed spatial domain identification through multi-view graph convolutional networks
PLOS Computational Biology Published 2026-06-01 research article DOI: 10.1371/journal.pcbi.1014281
spatial transcriptomics, histology, graph neural network, spatial domain, multi-view clustering, tissue architecture, computational biology
Summary: Presents STESH, a spatial transcriptomics clustering method that integrates gene Expression, Spatial location, and Histology images through a multi-view graph convolutional network with attention mechanisms. Identifying spatial domains (regions of tissue with coherent gene expression and cellular composition) is a fundamental task in spatial transcriptomics analysis. Most existing methods rely primarily on gene expression and spatial coordinates, but histology images (H&E-stained tissue sections routinely collected alongside spatial transcriptomics data) contain rich morphological information about tissue architecture, cell density, and extracellular matrix organization that is not captured by transcriptomics alone. STESH addresses this gap by extracting histological features through a convolutional neural network and constructing multiple graph views: an expression view (gene expression similarity), a spatial view (physical proximity), a histology view (morphological similarity), and a collaborative convolution view that captures cross-modal interactions. These views are integrated through a multi-view graph convolutional network with a learnable attention mechanism that weights each view's contribution per spatial domain. A decoder reconstructs the input features to ensure that the learned representations retain biologically meaningful information. Evaluated across multiple tissue types (brain, kidney, skin, tumour) and technology platforms (10x Visium, Slide-seq, MERFISH), STESH consistently outperformed ten state-of-the-art methods in clustering accuracy as measured by adjusted Rand index, normalized mutual information, and Fowlkes-Mallows index.
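The attention-based integration of views described above can be sketched generically as a softmax-weighted sum of per-view embeddings. This is an illustration of the general technique, not STESH's implementation: the function names, the two-dimensional embeddings, and the fixed attention scores are all hypothetical, and in a trained model the scores would be produced by learned parameters.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of raw attention scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def fuse_views(view_embeddings, view_scores):
    """Attention-weighted sum of per-view embeddings for a single spot.

    view_embeddings: one embedding (list of floats) per view, all the same length.
    view_scores: one raw attention score per view (learned in a real model).
    """
    weights = softmax(view_scores)
    dim = len(view_embeddings[0])
    fused = [sum(w * emb[d] for w, emb in zip(weights, view_embeddings))
             for d in range(dim)]
    return fused, weights


# Hypothetical embeddings of one spot in the expression, spatial, and histology views.
views = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
fused, weights = fuse_views(views, view_scores=[2.0, 0.5, 0.5])
print(weights, fused)
```

Inspecting the weights per spot or per domain is what would let a user ask which modality contributes most to a given region.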
Why it matters: Spatial domain identification is the first analytical step in virtually every spatial transcriptomics study: it defines the tissue regions that downstream analyses (differential expression, cell-cell communication, niche analysis) operate on. Histology images are a natural and underexploited source of information: pathologists have used H&E morphology to define tissue regions for over a century, and STESH essentially brings this histological expertise into computational spatial analysis. The multi-view graph framework is also methodologically sound: it respects the fact that different data modalities (expression, space, morphology) capture different aspects of tissue organization and should be integrated adaptively rather than simply concatenated. The strong performance across diverse tissue types and platforms suggests that STESH could become a standard preprocessing step in spatial transcriptomics pipelines.
Why for Yiru: Spatial domain identification in tumour samples is critical for TME analysis — the TME is not homogeneous but organized into distinct niches (invasive front, tumour core, immune-infiltrated regions, stromal compartments, perivascular niches) with different cellular compositions and functions. STESH could improve the resolution and accuracy of TME niche identification by leveraging histology — for example, distinguishing desmoplastic stroma from immune-infiltrated stroma based on H&E texture, or identifying necrotic regions that confound transcriptomic-based clustering. The multi-view attention mechanism could also reveal which modality (expression, spatial proximity, or histology) is most informative for defining specific TME niches, providing biological insight into what defines each microenvironment.
Supervised deep learning with gene functional annotation for cell classification
PLOS Computational Biology Published 2026-06-01 research article DOI: 10.1371/journal.pcbi.1014327
single-cell, deep learning, gene annotation, graph neural network, cell classification, functional genomics, protein-protein interaction, computational biology
Summary: Develops SDAN (Supervised Deep learning with gene functional ANnotation), a method that integrates gene functional annotation information — specifically protein-protein interaction networks — with single-cell gene expression profiles through a graph neural network for interpretable cell classification. Single-cell RNA sequencing (scRNA-seq) differential expression analysis routinely identifies hundreds or thousands of statistically significant genes, but with extremely small p-values and negligible effect sizes in large datasets, making biological interpretation challenging. The core problem is that standard gene-by-gene testing treats genes as independent entities, ignoring the functional relationships encoded in protein-protein interaction networks, pathways, and gene ontology annotations. SDAN addresses this by constructing a gene functional graph where nodes are genes and edges represent known functional relationships (physical interactions, shared pathways, co-expression), then using a graph neural network to learn which functionally coherent gene sets best discriminate between cell types or conditions of interest. The model outputs both cell-level classification scores and gene-set-level importance scores, enabling identification of the specific biological processes that define each cell population. The authors demonstrate SDAN on three real-world applications: identifying gene sets associated with severe COVID-19, dementia, and cancer immunotherapy response. Across all three, SDAN consistently outperformed existing methods in both classification accuracy and interpretability of the selected gene sets.
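The core idea of letting functionally related genes share information can be sketched as a single round of message passing over a gene graph. This is a generic mean-aggregation step, not SDAN's architecture: the function name, the toy edges, and the expression values are hypothetical, and a real graph neural network would add learned weights and non-linearities.

```python
def propagate(expression, edges):
    """One round of mean aggregation over an undirected gene graph with self-loops.

    expression: dict mapping gene name -> expression value in one cell.
    edges: list of (gene_a, gene_b) pairs; both genes must be keys of `expression`.
    """
    # Every gene is its own neighbour (self-loop), so isolated genes keep their value.
    neighbours = {gene: {gene} for gene in expression}
    for a, b in edges:
        neighbours[a].add(b)
        neighbours[b].add(a)
    return {gene: sum(expression[n] for n in nbrs) / len(nbrs)
            for gene, nbrs in neighbours.items()}


# Hypothetical expression values and a toy functional graph (edges are illustrative).
cell = {"HLA-A": 4.0, "B2M": 2.0, "TAP1": 0.0, "ACTB": 5.0}
toy_edges = [("HLA-A", "B2M"), ("B2M", "TAP1")]
print(propagate(cell, toy_edges))
```

After propagation, a gene's value reflects its functional neighbourhood, which is what allows a downstream classifier to score coherent gene sets rather than isolated genes.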
Why it matters: The gap between statistical significance and biological significance is one of the most frustrating challenges in single-cell genomics — getting a list of 2,000 differentially expressed genes with p < 10⁻³⁰⁰ is not biologically informative. SDAN bridges this gap by incorporating prior biological knowledge (gene functional relationships) directly into the model architecture, effectively constraining the solution space to functionally interpretable gene sets. This approach is conceptually similar to pathway-based and network-based methods but with the added power of deep learning to capture non-linear relationships and interactions. The consistent performance across diverse disease contexts (infectious disease, neurodegeneration, cancer immunotherapy) suggests that the framework is generalizable and could become a standard tool for interpretable single-cell analysis.
Why for Yiru: Identifying the functionally coherent gene programs that define TME cell states — exhausted vs. effector T cells, M1 vs. M2 macrophages, inflammatory vs. myofibroblastic CAFs — is a central challenge in TME computational biology. SDAN could be applied to TME single-cell atlases to identify the gene sets (pathways, complexes, functional modules) that most accurately classify immunologically relevant cell states and predict clinical outcomes. The gene-set-level interpretability is particularly valuable for generating testable hypotheses: rather than saying "Gene X is differentially expressed in immunotherapy responders," SDAN can say "the antigen presentation machinery module is the key discriminator." This shifts interpretation from single-gene to pathway-level thinking, which is more aligned with how biological function actually operates.
Challenges and progress in RNA velocity: Comparative analysis across multiple biological contexts
PLOS Computational Biology Published 2026-06-01 research article DOI: 10.1371/journal.pcbi.1014303
RNA velocity, benchmarking, single-cell, trajectory inference, scRNA-seq, method comparison, computational biology, transcriptomics
Summary: Presents a systematic comparative analysis of five RNA velocity methods across three diverse biological datasets, evaluating performance on local consistency, method agreement, identification of driver genes, and robustness to sequencing depth. RNA velocity leverages the ratio of unspliced to spliced mRNA reads in single-cell RNA-seq data to predict the future transcriptional state of individual cells — whether a cell is transitioning toward a different state (inducing genes) or maintaining its current identity (steady state). Since its introduction, RNA velocity has become one of the most widely used tools for inferring cellular trajectories and developmental dynamics, with multiple methodological variants (steady-state model, dynamical model, deep learning-based approaches) but no systematic comparison of their relative performance. This benchmark evaluates five representative RNA velocity methods on three datasets spanning different biological contexts: hematopoiesis (well-characterized differentiation hierarchy), reprogramming (large-scale state transition), and developmental timecourse. The authors find substantial variability across methods in terms of velocity vector consistency, agreement on driver genes, and robustness to downsampling. Key findings include: no single method dominates across all metrics and datasets; deep learning-based methods show improved robustness to sequencing depth but reduced interpretability; and local consistency (agreement of velocity vectors among neighboring cells) varies dramatically depending on dataset complexity and method choice.
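For readers less familiar with the quantities being benchmarked, the sketch below shows the steady-state velocity model for one gene and a simple local consistency score. It is a simplified illustration, not the code of any benchmarked method: the function names are hypothetical, the degradation-to-splicing ratio gamma is fitted on all cells (the original steady-state approach fits it on cells at the extremes of expression), and local consistency is taken here to be the mean cosine similarity between a cell's velocity vector and those of its neighbours, which is an assumption about the metric's exact definition.

```python
import math


def fit_gamma(unspliced, spliced):
    """Least-squares slope through the origin of unspliced on spliced counts for one gene."""
    num = sum(u * s for u, s in zip(unspliced, spliced))
    den = sum(s * s for s in spliced)
    return num / den


def steady_state_velocity(unspliced, spliced):
    """Per-cell velocity u - gamma * s: positive suggests induction, negative repression."""
    gamma = fit_gamma(unspliced, spliced)
    return [u - gamma * s for u, s in zip(unspliced, spliced)]


def cosine(a, b):
    """Cosine similarity between two vectors; returns 0.0 if either has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm > 0 else 0.0


def local_consistency(cell_velocity, neighbour_velocities):
    """Mean cosine similarity between one cell's velocity and its neighbours' velocities."""
    sims = [cosine(cell_velocity, v) for v in neighbour_velocities]
    return sum(sims) / len(sims)


# Hypothetical counts for one gene across three cells.
print(steady_state_velocity(unspliced=[1.0, 1.0, 2.0], spliced=[1.0, 2.0, 4.0]))
# Hypothetical 2-gene velocity vectors for a cell and two neighbours.
print(local_consistency([1.0, 0.0], [[1.0, 0.0], [0.0, 1.0]]))
```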
Why it matters: RNA velocity has become a default analysis in nearly every single-cell study that examines dynamic processes, but its underlying assumptions (constant transcriptional rates, steady-state approximation) are frequently violated in real biological systems. This benchmark provides the community with much-needed guidance on method selection and highlights the conditions under which RNA velocity is reliable versus misleading. The finding that method performance is highly context-dependent means that researchers cannot simply apply a default method and trust the results — they need to validate velocity inferences with multiple methods and orthogonal approaches. This is also a cautionary tale about the rapid adoption of computational methods before thorough benchmarking, a pattern that has repeated across genomics (differential expression, clustering, trajectory inference).
Why for Yiru: RNA velocity is commonly applied to TME single-cell data to infer dynamic processes such as T cell activation, macrophage polarization, and epithelial-mesenchymal transition in tumour cells. The benchmarking results are directly actionable — they inform which velocity methods are most appropriate for TME datasets, which typically have complex differentiation hierarchies, high transcriptional noise, and variable sequencing depth. The finding that deep learning-based methods are more robust to sequencing depth is particularly relevant for clinical TME samples where sequencing depth is often limited. More broadly, this benchmark exemplifies the type of rigorous method comparison that the computational biology field needs more of — and serves as a template for benchmarking other trajectory inference and dynamic modeling methods.
Biomedical discoveries
Biomedicine
Targeting tumor-intrinsic STK40 induces immune vulnerability and drives T cell reinvigoration
Cancer Cell Published 2026-05-28 research article DOI: 10.1016/j.ccell.2026.05.001
STK40, immune evasion, hepatocellular carcinoma, tumour-intrinsic, IFN-gamma, cDC1, T cell, immunotherapy
Summary: Identifies the serine/threonine kinase STK40 as a central regulator of immune evasion in hepatocellular carcinoma (HCC) that operates through a dual mechanism: suppressing tumour-intrinsic interferon-gamma (IFN-γ) responsiveness while simultaneously inhibiting tumour-extrinsic type 1 conventional dendritic cell (cDC1)-mediated T cell activation. Tumour immune evasion — the ability of cancer cells to avoid destruction by the immune system — is typically studied through the lens of checkpoint molecules (PD-L1, CTLA-4) or immunosuppressive cell recruitment. However, tumour-intrinsic signalling pathways that actively suppress immune recognition are less well characterized but represent attractive therapeutic targets because they are less susceptible to the adaptive resistance mechanisms that limit checkpoint inhibitor efficacy. Through a functional genomics screen in HCC models, the authors identify STK40 as a kinase whose inhibition restores IFN-γ signalling in tumour cells — including upregulation of MHC class I, antigen presentation machinery, and chemokines that recruit T cells — while simultaneously enhancing the activation and tumour-infiltration of cDC1 cells, the dendritic cell subset most critical for priming anti-tumour CD8+ T cell responses. Mechanistically, STK40 acts through two parallel pathways: in tumour cells, it phosphorylates and inactivates STAT1, the master transcription factor downstream of IFN-γ receptor signalling; in the tumour microenvironment, it suppresses cDC1 maturation through a paracrine mechanism involving reduced production of DC-activating cytokines. Pharmacological inhibition or genetic ablation of STK40 in mouse HCC models leads to robust tumour regression that depends on both CD8+ T cells and cDC1 cells, and STK40 inhibition synergizes with anti-PD-1 therapy.
Why it matters: This study identifies a single molecular target (STK40) that simultaneously dismantles two major barriers to anti-tumour immunity — tumour-intrinsic IFN-γ insensitivity and defective dendritic cell activation — making it an exceptionally attractive therapeutic target. HCC is the most common primary liver cancer and the third leading cause of cancer death worldwide, with limited response rates to current immunotherapies. The finding that STK40 is a druggable kinase (kinases are among the most successfully targeted protein classes in oncology) opens a direct path to clinical development. More broadly, the dual tumour-intrinsic and tumour-extrinsic mechanism of STK40 illustrates a general principle: the most potent immune evasion strategies operate at multiple levels simultaneously, and the most effective therapies will need to counteract this multi-layered evasion. The synergy with anti-PD-1 also suggests a rational combination strategy for clinical testing.
Why for Yiru: This study is directly relevant to TME computational biology at multiple levels. First, the STK40-driven immune evasion program could be computationally profiled across cancer types using public single-cell and bulk transcriptomic data — is STK40 activity a general mechanism of immune evasion beyond HCC? Second, the dual cell-intrinsic and cell-extrinsic mechanism illustrates the importance of multi-cellular computational models that capture both tumour-autonomous signalling and TME cell-cell communication. Third, the paracrine mechanism by which tumour STK40 activity suppresses DC maturation could be modeled using spatial transcriptomics data to identify the signalling molecules mediating this cross-talk. Identifying additional tumour-intrinsic immune evasion kinases through computational analysis of kinase activity signatures in immunotherapy-resistant tumours is a promising direction.
Human haematopoietic stem cells remember inflammatory stress
Nature Published 2026-05-27 research article DOI: 10.1038/s41586-026-10522-7
haematopoietic stem cell, HSC, inflammatory memory, trained immunity, single-cell multiomics, epigenetics, inflammation, stem cell biology
Summary: Demonstrates that human haematopoietic stem cells (HSCs) retain a durable memory of inflammatory stress that persists after the resolution of inflammation, identified through xenograft inflammation-recovery models and single-cell multiomics profiling. The concept of trained immunity — that innate immune cells such as monocytes and macrophages can develop a form of memory, responding more robustly to secondary challenges after an initial inflammatory exposure — has been well established in mature immune cells. However, whether the most primitive haematopoietic cells, HSCs, can also retain inflammatory memory has been unclear and controversial. This study addresses this question using a powerful experimental system: human HSCs are transplanted into immunodeficient mice (xenografts), the mice are subjected to an acute inflammatory challenge (polyinosinic:polycytidylic acid, a viral mimetic), allowed to recover, and then the HSCs are re-isolated and profiled at single-cell resolution using combined transcriptomics, chromatin accessibility (ATAC-seq), and DNA methylation analysis. The authors identify a distinct HSC subpopulation that emerges after inflammatory recovery and persists long-term, characterized by durable chromatin accessibility changes at myeloid transcription factor binding sites, altered DNA methylation patterns at inflammatory gene loci, and a biased differentiation output toward the myeloid lineage upon secondary transplantation. This memory is functionally consequential: HSCs from inflammation-exposed mice generate more myeloid cells and fewer lymphoid cells when transplanted into secondary recipients, recapitulating the myeloid bias observed in aged and chronically inflamed individuals.
Why it matters: This study establishes that inflammatory memory extends to the very apex of the haematopoietic hierarchy (the HSC), with profound implications for understanding how acute infections, chronic inflammation, and ageing shape the immune system. The finding that a single inflammatory episode can durably alter HSC output toward myelopoiesis at the expense of lymphopoiesis provides a mechanistic explanation for the myeloid skewing observed in ageing (inflammageing), chronic inflammatory diseases (autoimmunity, obesity), and after severe infections (sepsis, COVID-19). It also raises important clinical considerations: therapies that mobilize HSCs (G-CSF for stem cell donation) or transplant HSCs (bone marrow transplantation) may be affected by the donor's inflammatory history, and chemotherapy or radiotherapy, which cause massive inflammation, may leave lasting epigenetic scars on surviving HSCs that affect haematopoietic recovery and long-term immune function.
Why for Yiru: The concept of inflammatory memory in stem cells is directly relevant to the TME for several reasons. First, haematopoietic stem and progenitor cells can be mobilized to the tumour site where they differentiate into tumour-associated macrophages and neutrophils — if these HSCs carry inflammatory memory, it could shape the composition and function of the TME myeloid compartment. Second, cancer therapies (chemotherapy, radiotherapy, immunotherapy) cause systemic inflammation that could imprint HSCs and alter subsequent anti-tumour immune responses. Third, the single-cell multiomics approach used to identify the memory HSC population — combined transcriptomic, epigenomic, and functional assays — provides a template for studying how TME-derived signals imprint immune cell states in tumour-draining lymph nodes and bone marrow. Computational methods for detecting epigenetic memory from single-cell multiomics data are an active and relevant area of development.
15-strain live biotherapeutic product or same donor fecal microbiota transplant for recurrent Clostridioides difficile infection: a randomized phase 1b trial
Nature Medicine Published 2026-06-02 clinical trial (phase 1b) DOI: 10.1038/s41591-026-04442-2
microbiome, live biotherapeutic, fecal microbiota transplant, Clostridioides difficile, clinical trial, defined consortium, infectious disease, gastroenterology
Summary: Reports results from a randomized, single-blind, parallel-group phase 1b clinical trial comparing a defined 15-strain live biotherapeutic product (MTC01) with conventional faecal microbiota transplantation (FMT) from the same donor for the treatment of recurrent Clostridioides difficile infection (rCDI). C. difficile is a leading cause of healthcare-associated infectious diarrhoea, and recurrent infections — which occur in 20-30 percent of patients after initial antibiotic treatment — are notoriously difficult to treat. Faecal microbiota transplantation, which involves transferring stool from a healthy donor to restore a disrupted gut microbiome, is highly effective (~90 percent efficacy) but suffers from practical limitations: donor screening is laborious, product standardization is impossible, and there are safety concerns about transferring undefined microbial communities (including potential pathogens). Defined live biotherapeutic products — consortia of characterized, manufactured bacterial strains — aim to replicate FMT efficacy with the standardization and safety of a pharmaceutical product. In this trial, 72 patients with rCDI were randomized to receive either MTC01 (15 bacterial strains derived from a single healthy donor stool) or conventional FMT from the same donor. At 8 weeks, both treatments showed high and comparable efficacy (approximately 85 percent cure rate) with similar engraftment of donor strains. MTC01 demonstrated a favorable safety profile, with no treatment-related serious adverse events.
Why it matters: This trial demonstrates that a defined, manufactured bacterial consortium can match the efficacy of FMT while offering the advantages of a pharmaceutical-grade product: consistent composition, scalable manufacturing, rigorous safety testing, and freedom from the logistical and ethical challenges of stool banking. If confirmed in larger phase 2/3 trials, defined live biotherapeutic products could replace FMT as the standard of care for rCDI and expand access to microbiome-based therapies globally. More broadly, this represents an important proof of concept for the defined consortium approach to microbiome therapeutics, a strategy being pursued for indications ranging from inflammatory bowel disease to cancer immunotherapy enhancement to metabolic disorders. The finding that 15 strains are sufficient to recapitulate the therapeutic effect of whole stool suggests that the active ingredients of FMT may be simpler than the full microbial ecosystem.
Why for Yiru: The gut microbiome is increasingly recognized as a modulator of cancer immunotherapy response — specific bacterial species have been associated with improved responses to anti-PD-1 therapy in melanoma, lung cancer, and other tumours. The defined consortium approach validated in this trial could be applied to develop microbiome-based adjuncts to cancer immunotherapy: rather than performing FMT from immunotherapy responders (which has shown promise but faces the same standardization challenges), one could develop defined consortia of the specific bacterial strains associated with response. Computational analysis of microbiome sequencing data from immunotherapy trials — identifying the minimal set of bacterial species and functional pathways associated with response — would directly inform the design of such consortia. The engraftment and strain-tracking methods used in this trial are also relevant to studying microbiome dynamics in the TME context.
Decoding the origins of cellular self-organization for engineered biology
Nature Biotechnology Published 2026-06-01 perspective DOI: 10.1038/s41587-026-03161-w
self-organization, stem cell, embryo model, synthetic biology, tissue engineering, morphogenesis, developmental biology, multicellularity
Summary: Presents a Perspective that positions cellular self-organization — the ability of cells to spontaneously generate complex, patterned structures without external instruction — as a foundational principle for the origin of multicellular life and discusses how decoding this principle with stem-cell-based embryo models will advance biological engineering. The emergence of multicellular organisms from single-celled ancestors required the evolution of mechanisms for cells to coordinate their behaviours — division, differentiation, migration, adhesion — to produce functional tissues and organs. The authors argue that self-organization, rather than a rigid genetic blueprint, is the key principle: cells follow simple local rules (responding to chemical gradients, mechanical forces, and neighbour interactions) that collectively produce complex emergent structures. Recent advances in stem-cell-based embryo models — blastoids, gastruloids, and other organoids that recapitulate early developmental events from pluripotent stem cells in vitro — have provided unprecedented experimental access to these self-organizing processes. The Perspective reviews the principles of self-organization across scales (from subcellular cytoskeletal dynamics to tissue-level morphogenesis), the experimental models that have revealed these principles, and the engineering opportunities they enable: designing synthetic tissues with predictable self-organizing behaviour, creating more physiologically relevant organoids for drug testing, and ultimately building functional replacement tissues and organs.
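The claim that simple local rules can produce ordered global structure can be illustrated with a deliberately minimal toy model. It is not taken from the Perspective and does not model any real tissue: cells sit on a ring, each has a binary state, and at every step each cell adopts the majority state among itself and its two neighbours. All names and the starting pattern are hypothetical.

```python
def step(cells):
    """Apply the local majority rule once to every cell on a ring."""
    n = len(cells)
    # cells[i - 1] wraps around to the last cell when i == 0.
    return [1 if cells[i - 1] + cells[i] + cells[(i + 1) % n] >= 2 else 0
            for i in range(n)]


def boundaries(cells):
    """Count neighbouring pairs on the ring whose states differ."""
    n = len(cells)
    return sum(cells[i] != cells[(i + 1) % n] for i in range(n))


def run(cells, n_steps):
    """Iterate the majority rule n_steps times and return the final pattern."""
    for _ in range(n_steps):
        cells = step(cells)
    return cells


start = [1, 0, 1, 1, 0, 0, 1, 0]   # mixed, salt-and-pepper pattern
end = run(start, 3)
print(start, boundaries(start))
print(end, boundaries(end))
```

No cell has access to the global pattern, yet the mixed starting state sorts itself into two contiguous domains, which is the sense in which order emerges without an external blueprint.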
Why it matters: Understanding how cells self-organize is arguably the central unsolved problem in developmental biology and is increasingly recognized as essential for regenerative medicine and tissue engineering. The current paradigm of tissue engineering — seeding cells onto scaffolds with predefined geometry and hoping they form the desired tissue — has had limited success precisely because it ignores self-organization. This Perspective makes the case that the future of biological engineering lies in harnessing self-organization: providing cells with the right initial conditions and letting them build the tissue themselves, as they do during development. The practical implications span organoid technology (more reproducible and complex organoids), cell therapy (engineering cells that self-organize into functional tissue after transplantation), and synthetic biology (designing minimal self-organizing systems from the bottom up).
Why for Yiru: Self-organization principles are directly relevant to understanding TME structure and heterogeneity. Tumours are self-organizing systems — tumour cells, immune cells, and stromal cells follow local rules (gradients of oxygen, nutrients, cytokines, and mechanical cues) that collectively produce the complex spatial architecture of the TME. Computational models of self-organization could predict how TME architecture emerges from these local rules and how therapeutic interventions disrupt or restore normal tissue organization. Stem-cell-based models (tumour organoids, immune-tumour co-cultures) are increasingly used to study TME biology, and the self-organization perspective provides a framework for interpreting and engineering these models. Understanding how immune cells self-organize within tissues — forming tertiary lymphoid structures, immune exclusion zones, and infiltration fronts — is a frontier at the intersection of immunology, spatial biology, and computational modeling.
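A toy sketch of the "local rules produce global structure" idea discussed above. It is not a model from the Perspective. Cells of two hypothetical types sit on a line, and a randomly chosen adjacent pair swaps whenever the swap does not increase the number of unlike-neighbour contacts. This is an assumed, heavily simplified stand-in for adhesion-driven sorting; the function names, step count, and seed are illustrative.

```python
import random


def unlike_contacts(cells):
    """Count adjacent pairs whose two cells are of different types."""
    return sum(1 for a, b in zip(cells, cells[1:]) if a != b)


def sort_by_adhesion(cells, steps=2000, seed=0):
    """Toy 1D cell sorting driven by a local swap rule (illustrative only).

    At each step one adjacent pair is picked at random and swapped if the
    swap does not increase the number of unlike-neighbour contacts.
    Only contacts touching the swapped pair can change, so the rule is
    local even though the whole line is recounted here for simplicity.
    """
    rng = random.Random(seed)  # fixed seed so the run is reproducible
    cells = list(cells)
    for _ in range(steps):
        i = rng.randrange(len(cells) - 1)
        trial = cells[:]
        trial[i], trial[i + 1] = trial[i + 1], trial[i]
        if unlike_contacts(trial) <= unlike_contacts(cells):
            cells = trial
    return cells


mixed = "ABABABAB"
result = sort_by_adhesion(mixed)
print("".join(result), unlike_contacts(mixed), "->", unlike_contacts(result))
```

No cell is told where to go, yet the two types segregate into larger blocks. The same pattern (simple local update, emergent global order) is what more realistic agent-based models of tissue architecture build on.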
A Hormone Cell Atlas maps the human endocrine system at cellular resolution
Science Published 2026-05-28 research article DOI: 10.1126/science.aeb2672
hormone cell atlas endocrine single-cell transcriptomics systems biology receptor resource
Summary: Presents the Hormone Cell Atlas, a comprehensive resource mapping the expression of 379 hormone and receptor genes across 14 million single cells from multiple human tissues, providing the first systematic cellular-resolution view of the human endocrine system. Hormones coordinate physiological functions across distant tissues and organs, but our understanding of which cells produce which hormones and which cells express the corresponding receptors has been fragmented — pieced together from decades of individual studies using different techniques, species, and resolution levels. Drawing inspiration from the Human Cell Atlas initiative, the authors integrate and harmonize single-cell RNA-seq datasets spanning major endocrine organs (pituitary, thyroid, adrenal, pancreas, gonads) and hormone-responsive tissues (liver, adipose, muscle, bone, immune cells, brain regions) to systematically map the cellular sources and targets of every major human hormone. Key findings include: previously unappreciated extra-glandular hormone production (immune cells producing thyroid-stimulating hormone, adipocytes producing sex steroids); extensive hormone receptor co-expression patterns that define distinct cellular response modules; and identification of cell types that are central hubs in the endocrine network, receiving signals from and sending signals to multiple tissues. The atlas is made available as an interactive web resource enabling researchers to query hormone-receptor expression patterns across cell types and tissues.
Why it matters: The endocrine system has been studied for over a century, yet this is the first time its cellular architecture has been mapped systematically and comprehensively at single-cell resolution. The atlas transforms endocrinology from a gland-centric view (the thyroid produces thyroid hormone, the pancreas produces insulin) to a network view where hormone production and reception are distributed across virtually all cell types. This has immediate implications for understanding endocrine disorders — for example, identifying which cell types beyond the classical endocrine glands contribute to hormone excess or deficiency — and for predicting the side effects of hormone-based therapies by revealing which cell types in which tissues express the target receptor. The atlas is also an invaluable reference for the growing field of endocrine-immune interactions, which are increasingly recognized as important in cancer, autoimmunity, and metabolic disease.
Why for Yiru: Hormones are increasingly recognized as important modulators of the TME and anti-tumour immunity. Glucocorticoids are potent immunosuppressants used to manage immunotherapy side effects but may antagonize anti-tumour immunity. Sex hormones (oestrogen, testosterone) influence cancer risk and immune function in sex-specific ways. Metabolic hormones (insulin, leptin, adiponectin) link obesity to cancer risk and may affect TME metabolism and immune cell function. The Hormone Cell Atlas provides a systematic reference for identifying which hormone-receptor axes are active in specific TME cell types. For example, one could ask whether exhausted T cells express glucocorticoid receptors at higher levels than effector T cells; if they do, this might help explain a selective vulnerability to stress-induced immunosuppression. Computational integration of the Hormone Cell Atlas with TME single-cell atlases could reveal underappreciated endocrine-immune axes relevant to cancer biology and therapy.
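A minimal sketch of the kind of source/target query such an atlas enables, assuming a table of mean expression per cell type. The expression values, the threshold, and the function name are invented for illustration; they are not taken from the Hormone Cell Atlas or its web interface.

```python
def sources_and_targets(expression, hormone_gene, receptor_gene, threshold=1.0):
    """Return (source cell types, target cell types) for one hormone-receptor pair.

    A cell type counts as a source if it expresses the hormone gene at or
    above the threshold, and as a target if it expresses the receptor gene
    at or above the threshold. Missing genes are treated as not expressed.
    """
    sources = sorted(
        ct for ct, genes in expression.items()
        if genes.get(hormone_gene, 0.0) >= threshold
    )
    targets = sorted(
        ct for ct, genes in expression.items()
        if genes.get(receptor_gene, 0.0) >= threshold
    )
    return sources, targets


# Hypothetical mean expression values, invented for illustration only.
toy_expression = {
    "beta cell": {"INS": 9.5, "INSR": 0.4},
    "hepatocyte": {"INS": 0.0, "INSR": 3.2},
    "adipocyte": {"INS": 0.0, "INSR": 2.7},
    "T cell": {"INS": 0.0, "INSR": 0.2},
}

sources, targets = sources_and_targets(toy_expression, "INS", "INSR")
print("sources:", sources)
print("targets:", targets)
```

Running the same query over TME cell types from a tumour single-cell dataset would give a first-pass list of candidate endocrine-immune axes to examine more carefully.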
Cross-disciplinary watchlist
Other Fields
Protein language models for structural biology
Nature Computational Science Published 2026-05-28 review DOI: 10.1038/s43588-026-00993-z
protein language model structural biology deep learning protein structure prediction protein design evolutionary scale computational biology AI
Summary: Provides a comprehensive Review of how protein language models — transformer-based deep learning models trained on hundreds of millions of protein sequences — are transforming structural biology by decoding the evolutionary grammar encoded in protein sequences and enabling scalable structure prediction and design. Protein language models (pLMs) such as ESM-2, ProtTrans, and ProGen2 are trained using self-supervised learning objectives (masked language modeling, autoregressive generation) on massive protein sequence databases spanning the tree of life. Through this training, they learn rich representations that capture evolutionary, structural, and functional properties of proteins without ever being explicitly shown a protein structure. The Review covers three major application domains. First, structure prediction: pLMs achieve near-experimental accuracy in predicting protein 3D structures from sequence alone, with ESMFold processing sequences orders of magnitude faster than AlphaFold2 by embedding structural information directly in the language model rather than requiring expensive multiple sequence alignments. Second, variant effect prediction: pLMs can predict whether amino acid mutations will destabilize protein structure or disrupt function, enabling high-throughput computational mutagenesis scanning of entire proteomes. Third, protein design: generative pLMs can create novel protein sequences with desired structural and functional properties, including enzymes with enhanced catalytic activity, binding proteins with tailored specificity, and self-assembling protein nanomaterials. The authors also discuss emerging frontiers including multimodal models that integrate sequence with experimental structural data, and the application of pLMs to understanding protein dynamics, conformational ensembles, and interactions.
Why it matters: Protein language models represent one of the most impactful applications of AI to biology, on par with AlphaFold in their transformative potential. Unlike AlphaFold, which requires multiple sequence alignments and substantial compute, pLMs can make structure predictions from single sequences in seconds, democratizing access to structural biology for researchers studying orphan proteins, metagenomic sequences, and synthetic constructs. The ability to computationally scan all possible mutations in a protein has immediate clinical applications: interpreting variants of unknown significance in genetic testing, predicting drug resistance mutations in pathogens and cancers, and designing proteins with enhanced stability for industrial and therapeutic use. The protein design capabilities open a new paradigm (computational generation followed by experimental validation) that could dramatically accelerate enzyme engineering, therapeutic antibody design, and biomaterials development.
Why for Yiru: Protein language models are directly applicable to TME biology at multiple levels. First, pLM-based variant effect prediction could be used to systematically assess the functional impact of tumour mutations on the proteins they encode — distinguishing driver mutations that alter protein function from passenger mutations, and predicting which mutations create neoepitopes for immune recognition. Second, pLM-based protein design could be applied to engineer TME-targeted therapeutics: designing high-affinity binders to block immunosuppressive ligand-receptor interactions, engineering cytokines with tailored receptor selectivity for TME delivery, or creating conditionally active enzymes that are activated by TME-specific signals (low pH, hypoxia, tumour proteases). Third, the computational efficiency of pLMs makes them suitable for analysing the mutational landscapes of large tumour cohorts, connecting genomic variation to protein-level consequences at scale.
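One common way to turn a pLM into a variant effect score is a log-odds ratio: compare the probability the model assigns to the mutant residue with the probability it assigns to the wild-type residue at the same position. The sketch below shows only that arithmetic. The probabilities are hypothetical stand-ins for a model's output at one masked position, and no real model or library API is called.

```python
import math


def variant_log_odds(position_probs, wild_type, mutant):
    """Score a substitution as log p(mutant) - log p(wild type) at one position.

    Negative scores mean the model finds the mutant residue less plausible
    than the wild type in this sequence context.
    """
    return math.log(position_probs[mutant]) - math.log(position_probs[wild_type])


# Hypothetical per-residue probabilities at one masked position
# (invented for illustration; a real model outputs all 20 amino acids).
toy_probs = {"L": 0.60, "I": 0.20, "V": 0.15, "G": 0.04, "P": 0.01}

conservative = variant_log_odds(toy_probs, "L", "I")  # leucine -> isoleucine
disruptive = variant_log_odds(toy_probs, "L", "P")    # leucine -> proline

print(f"L->I: {conservative:.3f}")
print(f"L->P: {disruptive:.3f}")
```

Applied across every position and substitution in a protein, scores like these give the mutational scan described above; for tumour cohorts, they could serve as one input among several when ranking candidate driver mutations.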
Programmable, multiplexed and orthogonal gene control in bacteria with attenuated Cas13d systems
Nature Biotechnology Published 2026-06-02 research article DOI: 10.1038/s41587-026-03160-x
CRISPR Cas13d gene regulation synthetic biology RNA targeting multiplexed bacteria genetic circuit
Summary: Develops an attenuated Cas13d-based RNA-targeting system that enables programmable, multiplexed, and orthogonal gene control in bacteria, functioning as a highly versatile genetic switch. CRISPR-Cas13 enzymes target RNA rather than DNA, cleaving transcripts in a sequence-specific manner without permanently altering the genome. This makes them attractive for applications requiring transient, reversible, and tunable gene regulation — but wild-type Cas13 enzymes have two major limitations: their strong collateral cleavage activity (non-specific RNA degradation after target recognition) causes cellular toxicity, and their large size limits delivery and multiplexing. The authors addressed both limitations by engineering attenuated Cas13d variants with substantially reduced collateral activity while retaining on-target RNA cleavage efficiency. These attenuated variants function as efficient gene switches: when guided by a target-specific CRISPR RNA (crRNA), they degrade the target mRNA and silence gene expression; when the crRNA is not expressed or targets a different sequence, gene expression proceeds normally. The system supports multiplexing — simultaneously controlling multiple genes using different crRNAs — and orthogonality, where different Cas13d variants with distinct crRNA specificities can independently regulate different sets of genes without cross-talk. The authors demonstrate applications including: metabolic pathway optimization by tuning expression of multiple enzymes simultaneously, bacterial kill switches for biocontainment, and inducible expression systems for dynamic control of gene expression in response to environmental signals.
Why it matters: Programmable gene regulation tools that are reversible, tunable, and multiplexable are essential for synthetic biology — for building genetic circuits, optimizing metabolic pathways, and engineering living therapeutics. DNA-targeting CRISPR systems (Cas9, Cas12a) make permanent genomic changes, which is useful for genome editing but limiting for applications requiring dynamic control. The attenuated Cas13d system fills this gap with a purely RNA-level tool that can be turned on and off without genomic scarring. The orthogonal variants are particularly important because they enable independent control of multiple genes — a requirement for building complex synthetic gene circuits that go beyond simple on/off switches. Engineered bacteria are being developed as living therapeutics for conditions ranging from inflammatory bowel disease to cancer (tumour-homing bacteria that deliver therapeutic payloads), and programmable gene regulation tools are essential for controlling their behaviour in vivo.
Why for Yiru: Engineered bacteria are an emerging modality for cancer therapy — certain bacterial strains naturally colonize tumours and can be engineered to deliver immunomodulatory payloads (cytokines, checkpoint inhibitors, enzymes that activate prodrugs) selectively within the TME. The Cas13d gene switch system could be used to program tumour-homing bacteria with sophisticated behaviours: expressing therapeutic payloads only upon sensing TME-specific signals (low oxygen, specific metabolites), turning off payload expression if bacteria disseminate to non-tumour tissues (safety switch), and coordinating expression of multiple payloads in sequence (first recruit immune cells, then activate them). The multiplexing capability is particularly relevant because effective TME remodeling likely requires coordinated delivery of multiple immunomodulatory signals — for example, simultaneously expressing a chemokine to recruit T cells and a checkpoint inhibitor to prevent their inactivation.
Accurate quantification in proteomics with QuantUMS
Nature Biotechnology Published 2026-05-27 research article DOI: 10.1038/s41587-026-03131-2
proteomics mass spectrometry quantification uncertainty estimation data-independent acquisition DIA computational biology bioinformatics
Summary: Introduces QuantUMS, a computational method that implements rigorous uncertainty estimation for protein quantification in mass spectrometry-based proteomics, addressing a long-standing limitation in the field. Mass spectrometry-based proteomics has advanced dramatically in throughput and coverage, with data-independent acquisition (DIA) methods now routinely quantifying thousands of proteins across hundreds of samples. However, protein quantification values are typically reported as point estimates — a single number representing the abundance of each protein in each sample — without any measure of uncertainty. This means that downstream statistical analyses (differential expression, clustering, machine learning) treat all quantification values as equally reliable, when in reality the precision of quantification varies dramatically across proteins depending on factors such as peptide detectability, spectral interference, signal-to-noise ratio, and missing data patterns. QuantUMS addresses this by propagating measurement uncertainty through the entire quantification pipeline: from raw fragment ion intensities, through peptide-level summarization, to protein-level abundance estimates, producing both a point estimate and a credible interval for each protein in each sample. The method uses a Bayesian hierarchical model that naturally handles missing data (peptides not detected in some samples) and accounts for both technical and biological sources of variation. The authors demonstrate that incorporating QuantUMS uncertainty estimates improves the sensitivity and specificity of differential expression analysis, reduces false discoveries in biomarker discovery, and enables more accurate integration of proteomics with other data modalities where measurement error is explicitly modeled.
Why it matters: The absence of uncertainty quantification in proteomics has been a methodological blind spot that affects virtually every downstream analysis. Differential expression analyses that treat all proteins as equally well-measured can produce misleading results: proteins with noisy measurements may appear differentially expressed simply because their abundance estimates are imprecise, while genuinely changing proteins with moderate effect sizes may be missed because measurement noise that is never modeled obscures the change. QuantUMS brings proteomics in line with other quantitative fields (transcriptomics with count-based models, genomics with genotype likelihoods) where uncertainty quantification is standard practice. As proteomics moves toward clinical applications, such as protein biomarkers for early cancer detection, drug target quantification, and personalized therapy selection, rigorous uncertainty quantification becomes essential for regulatory approval and clinical decision-making.
Why for Yiru: Spatial proteomics is an increasingly important modality for TME characterization because proteins — the direct targets of most therapeutics — provide functional information complementary to transcriptomics. QuantUMS uncertainty estimates could be incorporated into TME spatial proteomics analyses to avoid overinterpreting noisy protein measurements and to appropriately weight proteins in integrative analyses with spatial transcriptomics or imaging data. More generally, the Bayesian hierarchical framework used by QuantUMS — propagating uncertainty from raw measurements through summarization to biological conclusions — is a template for how TME computational methods should handle the multi-level measurement error inherent in single-cell and spatial technologies. The clinical translation of TME proteomics for immunotherapy response prediction would directly benefit from the rigorous uncertainty quantification that QuantUMS provides.
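To make the idea of propagating uncertainty from peptide-level measurements to a protein-level estimate concrete, here is a minimal sketch using inverse-variance weighting. This is a deliberately simple stand-in, not the Bayesian hierarchical model the summary attributes to QuantUMS. It assumes each peptide comes with a known standard deviation on the log2 scale and that peptide errors are independent; all names and numbers are illustrative.

```python
import math


def summarize_protein(peptide_log2, peptide_sd):
    """Combine peptide log2 intensities into a protein-level estimate.

    Uses inverse-variance weighting, so precisely measured peptides
    contribute more than noisy ones. Returns the weighted estimate, its
    standard error, and an approximate 95% interval (normal assumption).
    """
    if not peptide_log2 or len(peptide_log2) != len(peptide_sd):
        raise ValueError("need matching, non-empty intensity and sd lists")
    if any(sd <= 0 for sd in peptide_sd):
        raise ValueError("standard deviations must be positive")

    weights = [1.0 / (sd * sd) for sd in peptide_sd]
    total = sum(weights)
    estimate = sum(w * x for w, x in zip(weights, peptide_log2)) / total
    std_err = math.sqrt(1.0 / total)
    interval = (estimate - 1.96 * std_err, estimate + 1.96 * std_err)
    return estimate, std_err, interval


# Two hypothetical peptides from one protein: one precise, one noisy.
est, se, ci = summarize_protein([10.0, 12.0], [1.0, 3.0])
print(f"estimate={est:.2f} se={se:.3f} interval=({ci[0]:.2f}, {ci[1]:.2f})")
```

The noisy peptide pulls the estimate only slightly away from the precise one, and the interval travels with the estimate so that downstream analyses can weight this protein appropriately rather than treating every value as equally reliable.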
Immune-competent new approach methodologies for a hybrid future
Nature Immunology Published 2026-05-29 perspective DOI: 10.1038/s41590-026-02539-x
new approach methodology NAM organ-on-chip immune-competent 3R animal alternatives tissue engineering immunology
Summary: Argues that in a shifting regulatory landscape increasingly favouring organ-on-chip and human cell-based systems over animal testing, embedding immune complexity into New Approach Methodologies (NAMs) is essential to advance beyond reductionist models toward a hybrid ecosystem that strategically integrates animal studies where necessary. Traditional drug development relies heavily on animal models — primarily mice — for efficacy and safety testing, but animal models frequently fail to predict human responses. The US FDA Modernization Act 2.0 (2022) and similar regulatory changes globally now permit drug developers to use NAMs — organoids, organ-on-chip devices, computational models — as alternatives to animal testing for certain applications. However, most current NAMs are reductionist: they model a single tissue or cell type in isolation, completely omitting the immune system, which is a critical mediator of both drug efficacy (immunotherapy, vaccines) and toxicity (cytokine release syndrome, hypersensitivity). This Perspective reviews the state of immune-competent NAMs — systems that incorporate immune cells into tissue models, including lymphoid-tissue-on-chip for vaccine testing, tumour-immune-on-chip for immunotherapy development, and multi-organ systems with circulating immune cells for systemic toxicity assessment. The authors propose a hybrid framework where immune-competent NAMs are used for early-stage screening and mechanistic studies, with animal studies reserved for late-stage confirmation of findings that cannot be adequately modeled in vitro — a strategic integration rather than wholesale replacement.
Why it matters: The regulatory shift toward accepting NAMs for drug approval is one of the most consequential changes in biomedical research in decades, with implications for drug development speed, cost, and ethical considerations around animal use. However, the current generation of NAMs risks repeating the failures of previous reductionist in vitro models if they do not incorporate the immune system — arguably the single most important mediator of drug response and toxicity. This Perspective provides a timely and balanced roadmap for the field: not naively claiming that NAMs will completely replace animals, but thoughtfully identifying where immune-competent NAMs can add the most value and where animal studies remain indispensable. The framework is also relevant to academic researchers developing disease models — incorporating immune components from the start rather than as an afterthought.
Why for Yiru: Tumour-immune-on-chip models are directly relevant to TME research — they enable controlled perturbation of TME components (specific immune cell types, cytokines, oxygen levels) and measurement of outcomes (tumour killing, immune cell infiltration, cytokine production) that are difficult or impossible to perform in vivo. Immune-competent NAMs could be used to test computational predictions about TME dynamics — for example, if a computational model predicts that blocking a specific chemokine will redirect macrophage polarization from M2 to M1, this can be tested in a tumour-immune-on-chip before proceeding to animal experiments. The hybrid framework also aligns with how computational TME research should interface with experiments: computational models generate hypotheses that are screened in NAMs, with the most promising validated in animal models, creating an efficient pipeline from computation to therapy.