TIR-NBS-LRR Domain Architectures: Evolutionary Patterns, Computational Identification, and Functional Validation in Plant Immunity

Jeremiah Kelly Nov 26, 2025



Abstract

This comprehensive review explores the diversity, evolution, and function of TIR-NBS-LRR (TNL) domain architectures in plant disease resistance. Covering foundational concepts to advanced applications, we examine the evolutionary distribution of TNL genes across plant lineages, their absence in monocots, and structural variations. The article details computational methods for genome-wide identification, troubleshooting for accurate annotation, and validation through expression profiling and functional studies. Synthesizing recent genomic findings, this resource provides researchers and drug development professionals with methodological frameworks and future directions for leveraging TNL genes in crop improvement and disease resistance breeding.

Evolutionary Origins and Structural Diversity of TIR-NBS-LRR Proteins

Toll/Interleukin-1 Receptor Nucleotide-Binding Site Leucine-Rich Repeat (TNL) proteins represent a crucial class of intracellular immune receptors in plants, serving as specialized surveillance machinery that detects pathogen effector molecules and initiates robust defense signaling cascades. These proteins belong to the broader nucleotide-binding site leucine-rich repeat (NBS-LRR) family, which constitutes the largest and most functionally diverse group of plant disease resistance (R) genes [1]. TNL proteins are characterized by a distinctive tripartite domain architecture that facilitates their role in pathogen perception and immune activation. Understanding the precise molecular organization of these domains and their conserved motifs is fundamental to deciphering the mechanisms of plant innate immunity and engineering disease-resistant crops. This guide provides a comprehensive comparison of TNL domain architectures, detailing their structural components, conserved motifs, and the experimental methodologies employed in their characterization, thereby offering an essential resource for researchers investigating plant-pathogen interactions.

TNL Domain Architecture: A Tripartite Structure

The canonical TNL protein structure comprises three fundamental domains that work in concert to fulfill its immune receptor function. The N-terminal Toll/Interleukin-1 Receptor (TIR) domain is responsible for initiating downstream signaling, the central Nucleotide-Binding Site (NBS) domain acts as a molecular switch for activation, and the C-terminal Leucine-Rich Repeat (LRR) domain facilitates pathogen recognition and autoinhibition [1] [2]. This modular organization enables TNL proteins to perceive specific pathogen effectors and transduce this recognition into effective defense responses, often culminating in a hypersensitive response (HR) that limits pathogen spread at the infection site.

Table 1: Core Domains of TNL Proteins

Domain	Position	Primary Function	Key Characteristics
TIR	N-terminal	Signaling initiation	Shares homology with Drosophila Toll and mammalian IL-1 receptors; forms homodimers
NBS (NB-ARC)	Central	Molecular switch & nucleotide binding	Binds and hydrolyzes ATP; contains conserved kinase motifs; regulates activation state
LRR	C-terminal	Pathogen recognition & autoinhibition	Highly variable; mediates protein-protein interactions; determines recognition specificity

Beyond the typical TNL structure, genomic studies have identified related variants with distinct domain compositions. For instance, in Nicotiana benthamiana, researchers have characterized not only full-length TNLs but also truncated forms classified as TN-type (TIR-NBS), which lack the LRR domain [3]. These irregular-type NBS-LRR proteins are hypothesized to function as adaptors or regulators for their typical counterparts, adding complexity to the plant immune network [3].

Conserved Motifs and Signature Sequences

Within each major domain of TNL proteins, highly conserved sequence motifs mediate critical biochemical functions, particularly within the NBS domain where nucleotide binding and hydrolysis occur. These motifs serve as signatures for identifying TNL genes and distinguishing them from their CNL (CC-NBS-LRR) counterparts through bioinformatic analyses [2] [4].

Table 2: Conserved Motifs in TNL NBS Domains

Motif Name	Consensus Sequence (TNL-specific)	Functional Role	Subfamily Specificity
P-loop/Kinase 1a	GxGKT/S	ATP/GTP binding	Common to both TNL and CNL
RNBS-A	FLENIRExSKKHGLEHLQKKLLSKLL	Structural stability	Diagnostic for TNL [5]
Kinase-2	LLVLDDVD	ATP hydrolysis	Diagnostic (final Asp for TNL) [5]
RNBS-C	Not specified	Unknown function	Distinct in TNL vs. CNL [1]
RNBS-D	FLHIACFF	Structural role	Diagnostic for TNL [5]
GLPL	CxGLPLA/GLK	Protein interaction	Common to both TNL and CNL

The kinase-2 motif deserves special attention because its final residue is a key diagnostic feature for distinguishing TNL from CNL proteins. TNL sequences typically contain an aspartic acid (D) at this position, forming the "LLVLDDVD" signature, whereas CNL proteins typically feature a tryptophan (W), giving "LLVLDDVW" [5]. This subtle but consistent difference allows most NBS-LRR proteins to be classified from sequence analysis alone.
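As a rough illustration of how the kinase-2 criterion can be applied, the Python sketch below finds a P-loop (GxGKT/S), looks downstream for a kinase-2-like stretch, and reads its final residue. The regular expressions, the 150-residue search window, and the function name classify_by_kinase2 are assumptions made for this example, not a published classifier; real analyses work on curated NB-ARC alignments.

```python
import re

# Simplified motif patterns assumed for this sketch (not curated Pfam/PROSITE models)
P_LOOP = re.compile(r"G.GK[TS]")               # kinase-1a / P-loop: GxGKT/S
KINASE2 = re.compile(r"[LIVMF]{4}DD.([A-Z])")   # e.g. LLVLDDVD; group 1 = final residue


def classify_by_kinase2(protein: str, max_gap: int = 150) -> str:
    """Label an NBS sequence as TIR-type or non-TIR-type from its kinase-2 motif.

    Kinase-2 is searched only within `max_gap` residues downstream of the first
    P-loop, an arbitrary window that reduces spurious hydrophobic-run matches.
    """
    seq = protein.upper()
    ploop = P_LOOP.search(seq)
    if ploop is None:
        return "no P-loop found"
    window = seq[ploop.end():ploop.end() + max_gap]
    kinase2 = KINASE2.search(window)
    if kinase2 is None:
        return "no kinase-2 found"
    final = kinase2.group(1)
    if final == "D":
        return "TIR-type (TNL-like)"
    if final == "W":
        return "non-TIR-type (CNL-like)"
    return f"unresolved (final residue {final})"


# Toy sequences (not real proteins) differing only in the last kinase-2 residue
print(classify_by_kinase2("MAGMGGIGKTTLAKAVYNKLSSEFRRRKVLLVLDDVDNLEQ"))  # TIR-type (TNL-like)
print(classify_by_kinase2("MAGMGGIGKTTLAKAVYNKLSSEFRRRKVLLVLDDVWNLEQ"))  # non-TIR-type (CNL-like)
```

Restricting the kinase-2 search to a window after the P-loop is a design choice: hydrophobic runs followed by "DD" can occur elsewhere in a protein, and anchoring on the P-loop keeps the search inside the NBS region.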

Comparative Genomic Distribution of TNL Genes

TNL genes demonstrate remarkable variation in their representation across plant lineages, reflecting distinct evolutionary paths in different taxonomic groups. Comprehensive genomic analyses reveal that TNLs are present in bryophytes, gymnosperms, eudicots, and basal angiosperms such as Amborella trichopoda, but are conspicuously absent from monocot genomes [5] [6]. This distribution pattern suggests that TNL sequences were present in early land plants but were substantially reduced or lost in the monocot and magnoliid lineages [5].

Recent genome-wide studies illustrate this variation in specific species:

Nicotiana benthamiana: 5 TNL-type genes identified among 156 NBS-LRR homologs [3]
Capsicum annuum (pepper): Only 4 TNL genes identified among 252 NBS-LRR resistance genes [2]
Gossypium hirsutum (cotton): 122 TNL genes identified from 437 NBS-LRR genes [4]
Fragaria species (wild strawberries): TNLs present but outnumbered by non-TNL types in all eight diploid species examined [6]

This uneven distribution highlights the dynamic evolution of TNL genes and suggests that different plant families have employed distinct strategies for pathogen recognition, with some lineages expanding their TNL repertoires while others have preferentially amplified CNL-type receptors.

Experimental Protocols for TNL Characterization

Genome-Wide Identification Pipeline

The standard workflow for identifying and characterizing TNL genes combines bioinformatic predictions with experimental validation:

HMMER Search: Run hmmsearch with the NB-ARC domain model (PF00931) from the Pfam database against the predicted proteome of the target genome, retaining hits with E-value < 1 × 10⁻²⁰ [3] [7].
Domain Verification: Confirm candidate sequences with the SMART tool and the Conserved Domain Database (CDD) to verify the presence of TIR, NBS, and LRR domains [3].
Motif Analysis: Identify conserved motifs with the MEME suite, setting the maximum number of motifs to 10 and the motif width to 6-50 amino acids [3] [2].
Subcellular Localization: Predict localization using CELLO v.2.5 and Plant-mPLoc tools [3].
Gene Structure Analysis: Determine exon-intron organization from GFF3 annotation files and visualize it with TBtools [3].
Cis-Element Analysis: Identify regulatory elements in promoter regions (1500-2000 bp upstream of ATG) using PlantCARE database [3] [8].
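Extracting the promoter region for PlantCARE is mostly coordinate bookkeeping, and strand handling is the usual pitfall. The sketch below assumes 1-based, inclusive CDS coordinates taken from a GFF3 file and a chromosome sequence already loaded as a string; the function name upstream_region and the 2000 bp default are illustrative choices within the 1500-2000 bp range mentioned above.

```python
# Complement table for DNA (upper and lower case; N stays N)
COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def revcomp(seq: str) -> str:
    """Reverse complement of a DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def upstream_region(chrom_seq: str, cds_start: int, cds_end: int,
                    strand: str, length: int = 2000) -> str:
    """Return up to `length` bp upstream of the start codon, written 5'->3'
    on the gene's own strand. Coordinates are 1-based and inclusive (GFF3-style).
    """
    if strand == "+":
        # The ATG begins at cds_start, so the promoter lies to its left.
        left = max(0, cds_start - 1 - length)
        return chrom_seq[left:cds_start - 1]
    if strand == "-":
        # The ATG is at cds_end on the reverse strand, so the promoter lies to
        # its right on the forward strand and must be reverse-complemented.
        right = min(len(chrom_seq), cds_end + length)
        return revcomp(chrom_seq[cds_end:right])
    raise ValueError("strand must be '+' or '-'")


# Toy chromosome: a '+' gene whose ATG starts at position 9
print(upstream_region("AAAACCCCATGGGGTTTT", 9, 14, "+", length=4))  # CCCC
```

For a minus-strand gene the region is taken to the right of the CDS end and reverse-complemented, so the returned sequence always reads 5' to 3' toward the ATG; near chromosome ends the region is simply truncated.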

Functional Characterization Approaches

Several experimental methods enable functional analysis of TNL proteins:

Heterologous Expression: Express candidate genes in susceptible genotypes to validate function; for NBS-LRR genes more broadly, this approach has shown improved resistance to Pseudomonas syringae in Arabidopsis thaliana expressing maize NBS-LRR genes [7].
Virus-Induced Gene Silencing (VIGS): Knock down TNL expression to test whether the gene is required for resistance, as shown in cotton, where silencing reduced resistance to Verticillium dahliae [7].
Allelic Mutagenesis: Introduce mutations in conserved motifs to determine their functional significance, as illustrated by the premature senescence observed in wheat carrying mutated NBS-LRR genes [9].
In vitro Assays: Perform leaf inoculation assays with pathogens such as Botrytis cinerea to correlate NLR repertoire composition, including TNL content, with resistance levels across genotypes [6].

TNL Activation and Characterization Pathway

The Scientist's Toolkit: Essential Research Reagents

Table 3: Key Research Reagents for TNL Studies

Reagent/Resource	Primary Function	Application Example	Source/Reference
Pfam PF00931	NB-ARC domain HMM profile	Identification of NBS-containing genes	Pfam Database [3]
Pfam PF01582	TIR domain HMM profile	Verification of TIR domain presence	Pfam Database [2]
MEME Suite	Conserved motif discovery	Identification of P-loop, kinase-2, GLPL motifs	[3] [2]
PlantCARE	Cis-element prediction	Analysis of promoter regulatory elements	[3] [8]
CELLO v.2.5	Subcellular localization prediction	Determining cytoplasmic/nuclear localization	[3]
MCScanX	Gene duplication analysis	Identifying tandem and segmental duplications	[7] [6]
OrthoFinder	Orthogroup analysis	Comparing NLR genes across species	[8]

The comprehensive analysis of TNL architecture reveals a sophisticated immune receptor system whose functionality emerges from the precise arrangement and interaction of its core domains and conserved motifs. The integrated approach combining bioinformatic identification, phylogenetic analysis, motif characterization, and functional validation provides a powerful framework for deciphering TNL structure-function relationships. As genomic resources continue to expand across diverse plant species, comparative analyses of TNL genes will further illuminate their evolutionary dynamics and functional specialization. The research tools and methodologies outlined in this guide offer a foundation for systematic investigation of TNL proteins, accelerating discoveries in plant immunity and facilitating the development of novel disease control strategies in agriculture. Future research focusing on the structural basis of TNL activation and signaling will undoubtedly yield new insights into the molecular mechanisms governing plant-pathogen interactions.

The Toll/Interleukin-1 Receptor-Nucleotide-Binding Site-Leucine-Rich Repeat (TIR-NBS-LRR or TNL) class of plant disease resistance (R) genes represents a crucial component of the plant immune system, enabling recognition of diverse pathogens and triggering robust defense responses [10] [1]. Despite their functional importance, these genes exhibit a strikingly uneven distribution across the plant kingdom. A well-documented pattern in plant evolutionary biology is the predominant presence of TNL genes in dicotyledonous plants (dicots) and their conspicuous absence or extreme rarity in monocotyledonous plants (monocots) [5] [11] [1]. This comparative guide objectively analyzes the experimental evidence underpinning this phylogenetic distribution, providing researchers and drug development professionals with a synthesized overview of supporting data, methodologies, and implications for plant immunity research.

Comparative Genomic Analysis of TNL Distribution

Table 1: Genomic Distribution of TNL Genes Across Plant Species

Plant Species	Classification	Total NBS-LRR Genes Identified	TNL Genes Identified	Key Study Findings	Citation
Arabidopsis thaliana	Dicot (Eudicot)	~150	62 (of 150 NBS-LRRs)	One of two major NBS-LRR subfamilies; forms distinct clade from CNLs.	[1]
Chinese Cabbage (Brassica rapa)	Dicot (Eudicot)	Not Specified	90	Genes physically mapped to chromosomes; expansion due to whole-genome triplication.	[12]
Tung Tree (Vernicia montana)	Dicot (Eudicot)	149	12 (3 TNL, 7 TN, 2 CC-TIR-NBS)	TIR domains present, confirming retention in eudicots.	[13]
Cassava (Manihot esculenta)	Dicot (Eudicot)	228	34	TIR-containing genes identified among NBS-LRR repertoire.	[14]
Wild Strawberry (Fragaria spp.)	Dicot (Eudicot)	Varies by species	Present (Proportion < Non-TNLs)	Non-TNLs constitute >50% of NLRs, but TNLs are consistently present.	[6]
Rice (Oryza sativa)	Monocot (Cereal)	>600	0 (or nearly 0)	TIR-domain coding genes are present but have diverged from NBS-LRR genes.	[11]
Vernicia fordii	Dicot (Eudicot)	90	0	A rare documented case of TNL loss within a eudicot species.	[13]
Various Monocots (Poales, Zingiberales, etc.)	Monocot	Not Specified	0	PCR and database searches across five monocot orders failed to find TNL sequences.	[5]

The data in Table 1 demonstrate a clear phylogenetic trend: TNL genes are a standard, often expanded, component of the immune repertoire in dicots, whereas they are consistently missing from the genomes of monocots, particularly cereals. An exceptional case is the susceptible tung tree Vernicia fordii, which has lost its TNL genes, unlike its resistant relative Vernicia montana [13]. This loss correlates with susceptibility to Fusarium wilt; one hypothesis is that a fitness cost of maintaining TNLs, or functional redundancy with other NLRs, relaxed selection for their retention.

Key Experimental Methodologies for Investigating TNL Phylogeny

Research into the distribution of TNL genes relies on a combination of bioinformatic and molecular biology techniques. Below are the detailed protocols for the key methodologies cited in the comparative studies.

Genome-Wide Identification and Domain Analysis

This bioinformatic approach is the standard for comprehensively cataloging NBS-LRR genes in sequenced genomes [13] [14] [6].

Data Retrieval: Obtain the complete proteome and genome annotation file (GFF/GTF) for the target species from public databases (e.g., Phytozome, NCBI, BRAD).
HMMER Search: Use the HMMER software suite (e.g., hmmsearch) with a pre-built Hidden Markov Model (HMM) for the NB-ARC (NBS) domain (Pfam: PF00931) to scan the proteome. An E-value cutoff (e.g., < 0.01 or < 1 × 10⁻²⁰) is applied for initial candidate selection [14] [6].
Domain Annotation: Subject the candidate sequences to further domain analysis with tools such as PfamScan, SMART, and NCBI's CD-Search to identify associated domains (TIR: PF01582; LRR: several Pfam families; CC where detectable) [14] [6].
Coiled-Coil Prediction: Since CC domains are not always identified by Pfam, use tools like COILS or Paircoil2 with a specific probability cutoff (e.g., 0.03) to predict their presence [14] [6].
Classification and Curation: Classify genes into subgroups (TNL, CNL, NL, etc.) based on their domain architecture. Manual curation is essential to remove false positives, such as genes with partial kinase domains.
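The E-value filtering step can be scripted directly on HMMER's tabular output (for example, from hmmsearch --tblout nbarc_hits.tbl PF00931.hmm proteome.fa). The parser below is a minimal sketch that assumes HMMER3's per-sequence table layout, in which the full-sequence E-value is the fifth whitespace-separated column; the file names, the sample lines, and the function name parse_tblout are hypothetical.

```python
def parse_tblout(text: str, max_evalue: float = 1e-20) -> dict[str, float]:
    """Return {target_name: full-sequence E-value} for hits below max_evalue.

    Expects HMMER3 `--tblout` per-sequence output: whitespace-separated columns,
    comment lines starting with '#', full-sequence E-value in the 5th column.
    """
    hits: dict[str, float] = {}
    for line in text.splitlines():
        if not line.strip() or line.startswith("#"):
            continue
        # 18 fixed columns, then a free-text description that may contain spaces
        fields = line.split(maxsplit=18)
        target, evalue = fields[0], float(fields[4])
        if evalue < max_evalue:
            # keep the best (lowest) E-value if a target appears more than once
            hits[target] = min(evalue, hits.get(target, evalue))
    return hits


# Hypothetical two-line table in the tblout layout
SAMPLE = "\n".join([
    "# target name  accession  query name  accession  E-value  score ...",
    "geneA  -  NB-ARC  PF00931  3.2e-45  150.1  0.1  4.1e-45  149.8  0.1  1.1  1  0  0  1  1  1  1  -",
    "geneB  -  NB-ARC  PF00931  2.0e-05  20.3  0.0  3.0e-05  19.7  0.0  1.2  1  0  0  1  1  1  1  putative protein",
])
print(parse_tblout(SAMPLE))  # {'geneA': 3.2e-45}
```

With a looser cutoff such as parse_tblout(SAMPLE, max_evalue=0.01), both genes pass; the resulting IDs would then go forward to domain annotation and manual curation.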

Degenerate PCR and Sequence Analysis

This molecular method is used to survey species without a sequenced genome or to validate genomic findings [5] [15].

Primer Design: Design degenerate primers targeting conserved motifs within the NBS domain, such as the P-loop (kinase-1a) and the GLPL or MHD motifs [5] [11].
PCR Amplification: Perform PCR on genomic DNA using touchdown or standard cycling protocols to allow for primer degeneracy.
Cloning and Sequencing: Clone the resulting PCR products (~500-1000 bp) into a plasmid vector, transform bacteria, and sequence multiple clones to capture diversity.
Sequence Classification:
- Translate DNA sequences into amino acid sequences.
- Perform a BLAST search against databases (e.g., GenBank non-redundant) and a conserved domain search to confirm NBS identity.
- Classify sequences as TIR- or non-TIR-type based on key residues in conserved motifs, particularly the final amino acid of the kinase-2 motif (TIR-type: LLVLDDVD; non-TIR-type: LLVLDDVW) [5].
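When designing degenerate primers, it helps to know how many distinct sequences a primer actually represents, because high degeneracy lowers the effective concentration of each individual variant. The sketch below counts and enumerates the variants encoded by IUPAC ambiguity codes; treating inosine (I) as matching all four bases is a simplifying assumption, and the primer in the usage line is hypothetical rather than taken from the cited studies.

```python
from itertools import product
from math import prod

# IUPAC nucleotide ambiguity codes; inosine (I) is treated as N, a simplification
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT", "I": "ACGT",
}


def degeneracy(primer: str) -> int:
    """Number of distinct sequences encoded by a degenerate primer."""
    return prod(len(IUPAC[base]) for base in primer.upper())


def expand(primer: str) -> list[str]:
    """All concrete sequences encoded by a degenerate primer (use on short primers)."""
    return ["".join(bases) for bases in product(*(IUPAC[b] for b in primer.upper()))]


primer = "GGNGGNGTNGGNAARAC"  # hypothetical primer
print(degeneracy(primer))      # 512
print(expand("AR"))            # ['AA', 'AG']
```

The count is a product over positions (here four N positions and one R, 4^4 × 2), so it grows quickly; enumeration with expand is only practical for short or mildly degenerate primers.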

Phylogenetic and Evolutionary Analysis

This process determines the evolutionary relationships between resistance genes [5] [6].

Sequence Alignment: Extract the NBS domain region from full-length protein sequences. Perform a multiple sequence alignment using tools like MAFFT or ClustalW.
Tree Construction: Construct a phylogenetic tree using Maximum-Likelihood (e.g., with IQ-TREE or MEGA6) or Parsimony methods. Include sequences from known dicot TNLs and CNLs as references.
Evolutionary Rate Analysis: For orthologous gene pairs, calculate the ratio of non-synonymous to synonymous substitutions (Ka/Ks) using tools like KaKs_Calculator. A Ka/Ks > 1 indicates positive selection.
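For orientation, the final step of a Nei-Gojobori-style calculation converts the proportions of nonsynonymous and synonymous differences into Jukes-Cantor-corrected distances, Ka and Ks, whose ratio is ω. The sketch assumes that synonymous and nonsynonymous sites and differences have already been counted from a codon alignment, and the counts in the example are hypothetical. KaKs_Calculator and codeml implement this and more elaborate models, and a ratio above 1 should be supported by a statistical test before it is read as positive selection.

```python
import math


def jukes_cantor(p: float) -> float:
    """Jukes-Cantor corrected distance: d = -3/4 * ln(1 - 4p/3)."""
    if not 0 <= p < 0.75:
        raise ValueError("p-distance must be in [0, 0.75) for this correction")
    return -0.75 * math.log(1 - 4 * p / 3)


def ka_ks(nonsyn_diffs: float, nonsyn_sites: float,
          syn_diffs: float, syn_sites: float) -> tuple[float, float, float]:
    """Return (Ka, Ks, omega) from pre-counted sites and differences."""
    ka = jukes_cantor(nonsyn_diffs / nonsyn_sites)
    ks = jukes_cantor(syn_diffs / syn_sites)
    # omega is undefined when no synonymous change is observed
    omega = ka / ks if ks > 0 else float("nan")
    return ka, ks, omega


# Hypothetical counts for one gene pair: 6 of 300 nonsynonymous and
# 10 of 100 synonymous sites differ
ka, ks, omega = ka_ks(6, 300, 10, 100)
print(f"Ka={ka:.4f} Ks={ks:.4f} omega={omega:.3f}")  # Ka=0.0203 Ks=0.1073 omega=0.189
```

An ω well below 1, as here, would be read as purifying selection on the gene pair as a whole.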

Visualizing Experimental and Evolutionary Pathways

Workflow for TNL Phylogenetic Analysis

The following diagram illustrates the logical workflow for a typical study investigating the presence and evolution of TNL genes, integrating the methodologies described above.

Evolutionary History of TNL and Non-TNL Genes

This diagram summarizes the current understanding of the evolutionary trajectory of NBS-LRR genes in land plants, explaining the observed distribution.

Table 2: Essential Materials for TNL Phylogenetic and Functional Studies

Reagent/Resource	Function/Application	Example Use Case
HMMER Software Suite	Scans protein sequences for NB-ARC and other domains using profile hidden Markov models.	Initial identification of NBS-encoding genes from a whole proteome [14].
Pfam Database	Repository of protein family HMMs (e.g., NB-ARC PF00931, TIR PF01582).	Curated models for domain annotation and gene classification [10] [6].
Degenerate Primers	Amplifies diverse NBS-LRR gene fragments from genomic DNA where sequence info is limited.	Surveying TNL presence/absence across diverse monocot orders [5].
Virus-Induced Gene Silencing (VIGS)	Functional validation tool to knock down candidate gene expression in plants.	Demonstrating the role of a specific NBS gene (GaNBS) in virus resistance [10].
OrthoFinder	Infers orthogroups and gene families from whole proteome data.	Evolutionary analysis of NBS genes across multiple species to identify core and lineage-specific groups [10].
RNA-seq Data	Profiling gene expression under different conditions (tissue, stress).	Identifying NBS-LRR genes upregulated in response to pathogen infection [10] [12].

The TIR-NBS-LRR (TNL) gene family, one of the largest plant disease resistance gene families, exhibits remarkable evolutionary dynamism across plant lineages. Through comparative genomic analyses, researchers have uncovered that independent duplication and loss events are the primary drivers of the diverse evolutionary patterns observed in this gene family. This guide synthesizes experimental data and bioinformatics methodologies to objectively compare the expansion and contraction of TNL genes across multiple plant species, particularly within the economically important Rosaceae family. The findings reveal that lineage-specific evolutionary pressures have shaped distinct TNL repertoires, influencing species' adaptive immune capacities against rapidly evolving pathogens.

Plant nucleotide-binding site leucine-rich repeat (NBS-LRR) genes constitute one of the largest and most variable gene families in plants, playing crucial roles in pathogen recognition and defense activation [1]. These genes are categorized into subfamilies based on their N-terminal domains, with TIR-NBS-LRR (TNL) and CC-NBS-LRR (CNL) representing the two major classes [1] [16]. TNL genes are characterized by the presence of a Toll/interleukin-1 receptor (TIR) domain at the N-terminus, which is involved in signal transduction during immune responses [17] [1].

The evolution of NBS-LRR genes follows a birth-and-death model characterized by frequent gene duplications and losses, resulting in significant variation in gene number and composition across species [1]. This dynamic evolutionary process generates the diversity needed for plants to recognize rapidly evolving pathogens. Lineage-specific expansions and contractions of TNL genes reflect adaptation to distinct pathogenic environments and contribute to species-specific resistance mechanisms [18] [19].

This guide provides a comprehensive comparison of TNL gene family evolution across plant lineages, with emphasis on methodological approaches, quantitative expansion/contraction patterns, and functional implications for disease resistance breeding.

Methodological Framework: Analyzing Gene Family Evolution

Core Bioinformatics Pipeline

Genome-wide identification of TNL genes follows a standardized bioinformatics workflow combining multiple complementary approaches:

Hidden Markov Model (HMM) Searches: The NB-ARC domain (PF00931) from Pfam database serves as the primary query to identify candidate NBS-LRR genes using HMMER software with expectation values (E-value) typically set at < 1.0 or more stringent thresholds (< 1e-20) [18] [3]. Additional searches employ TIR (PF01582), CC, and LRR domain models.
Domain Verification and Classification: Candidate genes undergo further validation using PfamScan, NCBI-CDD, and SMART tools to confirm domain architecture [17] [18] [6]. TNL classification requires presence of TIR, NBS, and LRR domains. Genes are categorized based on domain combinations into TNL, TN, CNL, CN, NL, and N types [3].
Manual Curation and Redundancy Removal: Redundant hits from different search methods are consolidated, and sequences are manually verified to ensure complete domain architecture and remove fragments [6].
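The classification rule can be written as a small lookup over the domains detected for each gene. In the sketch below, the domain labels ("TIR", "CC", "RPW8", "NBS", "LRR") are assumed to come from an upstream PfamScan/CDD/coiled-coil step, the RPW8 branch for RNL-type genes is included for completeness, and genes with more than one N-terminal domain are flagged for manual curation rather than forced into a class. The function name is illustrative.

```python
def classify_architecture(domains: set[str]) -> str:
    """Assign an NBS-LRR subclass (TNL, TN, CNL, CN, RNL, RN, NL, N) from detected domains."""
    if "NBS" not in domains:
        return "not NBS-encoding"
    n_terminal = [d for d in ("TIR", "CC", "RPW8") if d in domains]
    if len(n_terminal) > 1:
        # e.g. CC-TIR-NBS genes; left for manual curation
        return "mixed N-terminus (" + "+".join(n_terminal) + ")"
    prefix = {"TIR": "T", "CC": "C", "RPW8": "R"}[n_terminal[0]] if n_terminal else ""
    return prefix + ("NL" if "LRR" in domains else "N")


examples = {
    "gene1": {"TIR", "NBS", "LRR"},
    "gene2": {"TIR", "NBS"},
    "gene3": {"CC", "NBS", "LRR"},
    "gene4": {"NBS"},
}
for name, doms in examples.items():
    print(name, classify_architecture(doms))  # TNL, TN, CNL, N in turn
```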

Table 1: Key Bioinformatics Tools for TNL Identification and Analysis

Tool Category	Specific Tools	Primary Function	Key Parameters
Domain Search	HMMER v3.1, PfamScan	Identify conserved domains	E-value < 1.0 to < 1e-20
Domain Verification	SMART, NCBI-CDD, Pfam	Confirm domain architecture	E-value < 0.01
Motif Identification	MEME Suite	Discover conserved motifs	Maximum motifs: 10-20
Phylogenetic Analysis	IQ-TREE, MEGA7, OrthoFinder	Construct evolutionary trees	Bootstrap replicates: 1000
Gene Cluster Analysis	MCScanX, TBtools	Identify tandem duplications	Window size: 100-200 kb

Evolutionary Analysis Methods

Several computational approaches enable quantitative assessment of TNL gene family evolution:

Phylogenetic Reconstruction: Multiple sequence alignment of NBS domains using MAFFT followed by phylogenetic tree construction with IQ-TREE or MEGA7 using maximum likelihood method with 1000 bootstrap replicates [6] [3].
Orthogroup Analysis: OrthoFinder implementation using DIAMOND for sequence similarity searches and MCL clustering algorithm to identify groups of orthologous genes across species [10].
Synonymous (Ks) and Non-synonymous (Ka) Substitution Analysis: Calculation of Ka/Ks ratios (ω) using codeml (PAML) or similar methods to detect selection pressures, with ω < 1 indicating purifying selection, ω = 1 indicating neutral evolution, and ω > 1 indicating positive selection [19] [6].
Gene Cluster Identification: Physical clustering defined as at least two NLR genes located within 200 kb region and separated by no more than eight non-NLR genes [6].
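The cluster definition above translates into a single pass over the genes of one chromosome sorted by position. The sketch assumes that the 200 kb criterion applies between consecutive NLR genes (end of one to start of the next) rather than to the total span of a cluster; the definition can be read either way, so this choice, the Gene class, and the function name nlr_clusters are assumptions for illustration.

```python
from dataclasses import dataclass


@dataclass
class Gene:
    name: str
    start: int    # bp, forward-strand coordinate
    end: int      # bp
    is_nlr: bool


def nlr_clusters(genes: list[Gene], max_dist: int = 200_000,
                 max_between: int = 8) -> list[list[str]]:
    """Group NLR genes on one chromosome into physical clusters.

    Consecutive NLRs join the same cluster if the gap between them is
    <= max_dist and no more than max_between non-NLR genes lie in between.
    Only groups of at least two NLRs are reported.
    """
    clusters: list[list[str]] = []
    current: list[str] = []
    last_nlr = None
    non_nlr_since = 0
    for g in sorted(genes, key=lambda g: g.start):
        if not g.is_nlr:
            non_nlr_since += 1
            continue
        linked = (last_nlr is not None
                  and g.start - last_nlr.end <= max_dist
                  and non_nlr_since <= max_between)
        if linked:
            current.append(g.name)
        else:
            if len(current) >= 2:
                clusters.append(current)
            current = [g.name]
        last_nlr, non_nlr_since = g, 0
    if len(current) >= 2:
        clusters.append(current)
    return clusters


genes = [Gene("NLR1", 1_000, 4_000, True), Gene("g1", 6_000, 8_000, False),
         Gene("NLR2", 12_000, 15_000, True), Gene("NLR3", 900_000, 903_000, True)]
print(nlr_clusters(genes))  # [['NLR1', 'NLR2']]
```

Because linkage is evaluated between neighbours, a long chain of closely spaced NLRs can form one cluster longer than 200 kb; a window-based reading would split such chains.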

The following diagram illustrates the core bioinformatics workflow for TNL gene identification and evolutionary analysis:

Comparative Evolutionary Patterns Across Plant Lineages

TNL Distribution Across Major Plant Groups

The presence and abundance of TNL genes varies dramatically across plant lineages, reflecting distinct evolutionary trajectories:

Monocots vs. Dicots: Comprehensive analyses across multiple monocot orders (Poales, Zingiberales, Arecales, Asparagales, and Alismatales) reveal a conspicuous absence of TNL genes in monocots, while they are prevalent in dicots and gymnosperms [5]. This suggests significant loss of TNLs in the monocot lineage, with retention of only non-TNL types.
Basal Angiosperms: TNL sequences are present in basal angiosperms such as Amborella trichopoda and Nuphar advena, indicating that TNL genes predate the divergence of the major angiosperm lineages and were later substantially reduced in monocots and magnoliids [5].
Species-Specific Patterns: Within dicot families, substantial variation in TNL abundance exists. For example, pepper (Capsicum annuum) contains only 4 TNL genes among 252 NBS-LRR genes [16], while apple possesses 219 TNL genes out of 748 NBS-LRR genes [19].

Table 2: Evolutionary Patterns of NBS-LRR Genes Across Plant Lineages

Plant Group/Species	Total NLR Genes	TNL Count (%)	Non-TNL Count (%)	Evolutionary Pattern
Monocots (general)	Variable	0 (0%)	All	TNL gene loss
Basal Angiosperms	Limited data	Present	Present	Ancestral retention
Rosaceae (family)	2188	26 ancestral TNLs	69 ancestral CNLs	Independent duplication/loss
Apple (M. domestica)	748	219 (29.3%)	529 (70.7%)	"Early sharp expansion, abrupt shrinking"
Strawberry (F. vesca)	144	23 (16.0%)	121 (84.0%)	"Expansion, contraction, re-expansion"
Peach (P. persica)	354	128 (36.2%)	226 (63.8%)	"Early expansion, abrupt shrinking"
Pepper (C. annuum)	252	4 (1.6%)	248 (98.4%)	Strong TNL contraction
Tobacco (N. benthamiana)	156	5 (3.2%)	151 (96.8%)	TNL contraction

Expansion and Contraction Patterns in Rosaceae

The Rosaceae family provides an excellent model for studying TNL evolution due to available genomes from diverse species and varying life histories (herbaceous vs. woody perennial). Research encompassing 12 Rosaceae genomes identified 2188 NBS-LRR genes, with evolutionary analysis revealing 26 ancestral TNL genes and 69 ancestral CNL genes that underwent independent duplication and loss events during Rosaceae diversification [18].

Distinct evolutionary patterns have been characterized across Rosaceae species:

Rosa chinensis exhibits a "continuous expansion" pattern, with recent duplications significantly contributing to TNL gene numbers [18].
Fragaria vesca (woodland strawberry) shows an "expansion followed by contraction, then a further expansion" pattern [18]. Strawberry contains relatively few TNL genes (23 out of 144 NBS-LRR genes, or 16%) compared to other Rosaceae species [19].
Three Prunus species (peach, mei, apricot) and three Maleae species (including apple and pear) share an "early sharp expanding to abrupt shrinking" pattern [18].
Rubus occidentalis, Potentilla micrantha, Fragaria iinumae and Gillenia trifoliata displayed a "first expansion and then contraction" evolutionary pattern [18].

A comparative analysis of five Rosaceae fruit species (F. vesca, M. domestica, P. bretschneideri, P. persica, and P. mume) found that species-specific duplication has mainly contributed to NBS-LRR gene expansion, with 61.81% of strawberry, 66.04% of apple, 48.61% of pear, 37.01% of peach, and 40.05% of mei NBS-LRR genes derived from species-specific duplication [19].

The following diagram illustrates the evolutionary relationships and expansion patterns of TNL genes across major plant lineages:

Molecular Evolutionary Dynamics

Evolutionary Rates and Selection Pressures

Comparative analyses of TNL and non-TNL genes reveal distinct evolutionary dynamics:

Faster evolution of TNLs: In four of five Rosaceae species studied, TNLs exhibited significantly greater Ks values and Ka/Ks ratios compared to non-TNLs, suggesting more rapid evolution and stronger selective pressures [19]. Most NBS-LRR genes show Ka/Ks ratios less than 1, indicating evolution primarily under purifying selection [19].
Differential selection between subfamilies: Analysis of eight diploid wild strawberry species revealed a significantly higher number of non-TNLs under positive selection compared to TNLs, indicating their rapid diversification [6]. Non-TNLs also demonstrated shorter gene structures and higher expression levels than TNLs [6].
Domain-specific selection: The LRR domain shows evidence of diversifying selection, with elevated ratios of non-synonymous to synonymous nucleotide substitutions, particularly at solvent-exposed residues of β-sheets, consistent with adaptation for pathogen recognition [1]. In contrast, the NBS domain is subject to purifying selection but does not show frequent gene-conversion events [1].
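To compare substitution patterns between solvent-exposed and other LRR positions, the exposed sites first have to be marked. A common approximation treats the x residues of the xxLxLxx β-strand/β-turn core of each repeat as solvent-exposed; the sketch below scans for that pattern, relaxing L to other aliphatic or hydrophobic residues (an assumption), and returns 0-based exposed positions. It is a crude motif scan, not a structural prediction, and the names are illustrative.

```python
import re

# xxLxLxx core of an LRR unit; the lookahead allows overlapping matches.
# Accepting I/V/M/F in place of L is an assumption of this sketch.
XXLXLXX = re.compile(r"(?=(..[LIVMF].[LIVMF]..))")
EXPOSED_OFFSETS = (0, 1, 3, 5, 6)  # the 'x' positions within the 7-residue core


def exposed_positions(lrr_seq: str) -> list[int]:
    """0-based positions predicted solvent-exposed by the xxLxLxx rule."""
    seq = lrr_seq.upper()
    positions: set[int] = set()
    for m in XXLXLXX.finditer(seq):
        positions.update(m.start() + off for off in EXPOSED_OFFSETS)
    return sorted(positions)


print(exposed_positions("AKLSLNN"))  # [0, 1, 3, 5, 6]
```

The resulting positions could then be used to split codon sites before computing Ka/Ks separately for exposed and remaining residues.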

Genomic Distribution and Cluster Analysis

TNL genes display non-random genomic distribution patterns that influence their evolution:

Gene clustering: In pepper, 54% of NBS-LRR genes form 47 physical clusters distributed across all chromosomes, with the highest density on chromosome 3 [16]. Similar clustering patterns are observed in apple, with clusters often containing members from the same gene subfamily, though some clusters contain genes from different subfamilies [16].
Tandem duplications: In Rosaceae species, tandem duplications represent a major mechanism for NBS-LRR gene expansion. Apple possesses the highest number of gene families (107) while strawberry has the fewest (12) [19]. The proportion of multi-gene families correlates with species-specific duplication rates.
Chromosomal distribution: Analysis of Perilla citriodora 'Jeju17' revealed 535 NBS-LRR genes with clusters on chromosomes 2, 4, and 10, while a unique RPW8-type R-gene was located on chromosome 7 [20]. This uneven distribution reflects the localized nature of gene duplication events.

Functional Correlations and Experimental Validation

Expression Profiling and Disease Resistance

Several functional studies link TNL evolution to disease resistance outcomes:

In Rosa chinensis, transcriptome analysis revealed that RcTNL genes were dominantly expressed in leaves and responded to hormones (gibberellin, jasmonic acid, salicylic acid) and fungal pathogens (Botrytis cinerea, Podosphaera pannosa, and Marssonina rosae) [17]. RcTNL23 showed significant upregulation in response to three hormones and three pathogens, suggesting its importance in disease resistance [17].
In wild strawberries, species with higher proportions of non-TNLs (Fragaria pentaphylla and Fragaria nilgerrensis) exhibited significantly greater resistance to Botrytis cinerea compared to Fragaria vesca, which has the lowest proportion of non-TNLs [6]. This correlation suggests non-TNLs contribute substantially to pathogen defense despite the emphasis on TNL evolution in many studies.
Virus-induced gene silencing (VIGS) of GaNBS (OG2) in resistant cotton supported a putative role for this gene in regulating virus titer, providing experimental evidence for the functional importance of specific NBS genes in disease resistance [10].

The Scientist's Toolkit: Essential Research Reagents

Table 3: Key Research Reagents and Resources for TNL Evolutionary Studies

Reagent/Resource	Specific Example	Application in TNL Research
Genome Databases	Genome Database for Rosaceae (GDR), Phytozome, NCBI	Source of genome sequences and annotations for comparative analysis
Domain Databases	Pfam, SMART, NCBI-CDD	Identification and verification of TIR, NBS, LRR domains
HMM Profiles	NB-ARC (PF00931), TIR (PF01582)	Hidden Markov Models for domain identification
Sequence Alignment	MAFFT, ClustalW	Multiple sequence alignment for phylogenetic analysis
Phylogenetic Software	IQ-TREE, MEGA7, OrthoFinder	Evolutionary relationship reconstruction
Motif Discovery	MEME Suite	Identification of conserved protein motifs
Gene Cluster Analysis	MCScanX, TBtools	Identification of tandem duplications and syntenic regions
Expression Databases	IPF Database, CottonFGD	Tissue-specific and stress-responsive expression patterns
Functional Validation	VIGS (Virus-Induced Gene Silencing)	Experimental verification of gene function in disease resistance

The evolutionary patterns of TNL gene families demonstrate remarkable lineage-specificity, driven primarily by species-specific duplication and loss events. The comparative analysis presented here reveals that:

Evolutionary trajectories are highly lineage-dependent, with some species exhibiting continuous expansion (Rosa chinensis), while others show patterns of expansion and contraction (Fragaria vesca) or early expansion followed by abrupt shrinking (Prunus species).
Differential evolution between TNL and CNL subfamilies is evident across multiple plant families, with TNLs generally evolving faster in Rosaceae species but being completely lost in monocot lineages.
Functional correlations exist between evolutionary patterns and disease resistance, with species-specific TNL expansions potentially enhancing adaptive immunity to localized pathogen pressures.

Future research directions should include more comprehensive functional characterization of lineage-specific TNL clusters, investigation of the mechanisms driving TNL loss in monocots, and exploration of how evolutionary patterns translate to functional diversity in pathogen recognition. The integration of pan-genomic approaches will further refine our understanding of TNL gene family evolution and its implications for developing disease-resistant crops through informed breeding strategies.

Structural variations (SVs) represent a class of genomic alterations involving segments of DNA that are 50 base pairs or larger, including insertions, deletions, duplications, inversions, and translocations [21] [22] [23]. In plant genomes, these large-scale genomic rearrangements are now recognized as a major driver of genetic diversity, influencing phenotypes ranging from disease resistance to environmental adaptation [22] [23]. Among the most significant functional outcomes of structural variation in plants is the creation of diverse domain architectures within nucleotide-binding site leucine-rich repeat (NBS-LRR) genes, which constitute the largest family of plant disease resistance genes [10] [24].

The NBS-LRR genes (also called NLR genes) encode modular proteins typically composed of three fundamental domains: a variable N-terminal domain, a central nucleotide-binding adaptor (NBS or NB-ARC) domain, and a C-terminal leucine-rich repeat (LRR) region [10] [6]. These genes are categorized into distinct subfamilies based on their N-terminal domains: TIR-NBS-LRR (TNL) containing a Toll/interleukin-1 receptor domain, CC-NBS-LRR (CNL) containing a coiled-coil domain, and RPW8-NBS-LRR (RNL) containing a Resistance to Powdery Mildew 8 domain [10] [6] [25]. The structural variation affecting these genes creates remarkable diversity in domain arrangements, encompassing both classical architectures that are widely conserved across plant lineages and species-specific configurations that may confer specialized resistance capabilities [10].

Recent studies have revealed that structural variations affecting NBS-LRR genes can substantially alter gene function through several mechanisms: changing gene dosage via copy number variations, creating novel chimeric genes through fusion events, interrupting functional domains, or modifying regulatory sequences that control gene expression [22]. This comprehensive analysis examines the spectrum of classical and species-specific domain arrangements resulting from structural variation, their distribution across plant lineages, functional implications for disease resistance, and the experimental approaches used to characterize them.

Classical Domain Architectures and Evolutionary Patterns

Classical NBS-LRR domain architectures represent the conserved structural patterns observed across multiple plant families. Large-scale comparative genomic analyses have identified several such architectures that form the core of the plant immune receptor repertoire. A recent pan-species investigation identified 12,820 NBS-domain-containing genes across 34 plant species ranging from mosses to monocots and dicots, classifying them into 168 distinct architectural classes [10]. Among these, several classical patterns emerged, including NBS, NBS-LRR, TIR-NBS, TIR-NBS-LRR, CC-NBS, and CC-NBS-LRR [10].

The evolutionary distribution of these classical architectures reveals significant patterns across plant lineages. TNL-type genes are present in bryophytes, gymnosperms, and eudicots, but are notably rare or absent in most monocots [5]. Research examining five monocot orders (Poales, Zingiberales, Arecales, Asparagales, and Alismatales) found no TIR-NBS-LRR sequences, suggesting that although these sequences were present in early land plants, they have been significantly reduced in monocots and magnoliids [5]. In contrast, CNL-type genes appear across all major plant lineages, including monocots, suggesting their fundamental conservation in plant immunity [5] [6].

Table 1: Distribution of Classical NBS-LRR Domain Architectures Across Major Plant Lineages

Domain Architecture	Bryophytes	Gymnosperms	Monocots	Eudicots	Key Features
TIR-NBS-LRR (TNL)	Present [5]	Present [5]	Rare/Absent [5]	Present [5] [6]	TIR domain mediates signaling; homogeneous sequences [5]
CC-NBS-LRR (CNL)	Present	Present	Present [5] [6]	Present [6] [24]	CC domain; heterogeneous sequences form multiple clades [5]
RPW8-NBS-LRR (RNL)	Information Limited	Information Limited	Present [25]	Present [6]	RPW8 domain; helper function in immunity [6]
NBS-LRR (NL)	Present	Present	Present	Present	Lacks distinctive N-terminal domain [24]

The structural conservation within these classical architectures is maintained by specific functional constraints. The central NBS domain contains highly conserved motifs including the P-loop, GLPL, MHD, and Kinase-2 motifs, which are critical for nucleotide binding and hydrolysis [10] [25]. The Kinase-2 motif is particularly noteworthy as its final amino acid residue serves as a diagnostic feature for classifying NBS sequences as TIR-type (typically ending with aspartic acid) or non-TIR-type (typically ending with tryptophan) [5]. The LRR domains, while more variable, provide specificity in pathogen recognition through protein-ligand and protein-protein interactions [24] [26].

Table 2: Conserved Motifs in Classical NBS Domain Architectures

Motif Name	Consensus Sequence	Functional Role	Location in NBS Domain
P-loop	Not specified in sources	Nucleotide binding	N-terminal region
Kinase-2	TIR: LLVLDDVD; non-TIR: LLVLDDVW [5]	Hydrolytic function	Central region
RNBS-A	TIR: FLENIRExSKKHGLEHLQKKLLSKLL; non-TIR: FDLxAWVCVSQxF [5]	Structural stability	Between P-loop and Kinase-2
RNBS-D	TIR: FLHIACFF; non-TIR: CFLYCALFPED [5]	Structural stability	Between Kinase-2 and MHD
MHD	Not specified in sources	Regulation of nucleotide state	C-terminal region
GLPL	Not specified in sources	Structural role	C-terminal region
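As a minimal sketch of the Kinase-2 diagnostic described above, the Python function below classifies an NBS sequence by the residue that ends its first Kinase-2 match. The regular expression is an assumption that loosely generalizes the Table 2 consensus (LLVLDDVD / LLVLDDVW); published pipelines locate the motif with HMMs or MEME-derived profiles, so this is illustrative only.

```python
import re

# Loose Kinase-2 pattern (an assumption): four hydrophobic residues, "DD",
# one hydrophobic residue, then the diagnostic final residue (group 1).
# It generalizes the Table 2 consensus LLVLDDVD / LLVLDDVW.
KINASE2 = re.compile(r"[LIVMF]{4}DD[LIVMF]([A-Z])")


def classify_nbs(nbs_seq: str) -> str:
    """Classify an NBS domain by the last residue of its first Kinase-2 match."""
    match = KINASE2.search(nbs_seq.upper())
    if match is None:
        return "unclassified"
    final_residue = match.group(1)
    if final_residue == "D":  # aspartic acid: typical of TIR-type NBS
        return "TIR-type"
    if final_residue == "W":  # tryptophan: typical of non-TIR-type NBS
        return "non-TIR-type"
    return "unclassified"


print(classify_nbs("GGMGKTTLLVLDDVDQ"))  # TIR-type
print(classify_nbs("GGVGKTTLLVLDDVWQ"))  # non-TIR-type
```

Because the rule relies on a single residue, it is best used as a quick cross-check of domain-based TIR/non-TIR assignments rather than as a replacement for them.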

Species-Specific Domain Arrangements and Novel Architectures

Beyond the classical architectures, numerous species-specific and novel domain arrangements have emerged through lineage-specific structural variations, expanding the functional repertoire of plant immune receptors. These unusual configurations often arise from domain shuffling, fusion events, and the gain or loss of protein domains [10].

In cultivated peanut (Arachis hypogaea cv. Tifrunner), researchers identified an unusual TIR-CC-NBS-LRR architecture where both TIR and CC domains coexist in 26 NBS-LRR proteins [26]. This configuration is particularly noteworthy because TNL and CNL genes were previously thought to have distinct evolutionary origins, and no sequences containing both TIR and CC domains were found in the diploid ancestors (A. duranensis and A. ipaensis) of cultivated peanut [26]. This suggests that genetic exchange or gene rearrangement following tetraploidization facilitated the fusion of these typically distinct domains. Additionally, three sequences were found to encode NBS-WRKY fusion proteins, in which an NBS domain is combined with a WRKY transcription factor domain, potentially creating direct pathways from pathogen recognition to transcriptional regulation [26].

The comprehensive analysis across 34 plant species revealed several striking species-specific domain patterns, including TIR-NBS-TIR-Cupin1-Cupin1, TIR-NBS-Prenyltransf, and Sugar_tr-NBS architectures [10]. These unusual configurations demonstrate how structural variation can create novel gene fusions that potentially connect pathogen recognition with diverse biochemical functions. For instance, the fusion of NBS domains with Cupin1 domains (associated with metabolic enzymes) or Prenyltransf domains (involved in prenylation reactions) may represent mechanisms for directly linking pathogen detection with metabolic responses [10].

In the tung tree (Vernicia species), comparative analysis between susceptible V. fordii and resistant V. montana revealed significant species-specific differences in NBS-LRR domain architectures [24]. While V. fordii completely lacked TIR domains in its NBS-LRR genes, V. montana contained 12 VmNBS-LRRs with TIR domains (8.1% of its total NBS-LRR repertoire), including three TIR-NBS-LRR genes and two CC-TIR-NBS genes with both CC and TIR domains [24]. This discrepancy suggests that lineage-specific domain loss events may contribute to differences in disease susceptibility between related species.

Table 3: Notable Species-Specific Domain Arrangements in Plant NBS-LRR Genes

Species	Novel Domain Architecture	Potential Functional Significance	Reference
Multiple species	TIR-NBS-TIR-Cupin1-Cupin1	Links pathogen recognition with metabolic functions via Cupin domain [10]	[10]
Multiple species	TIR-NBS-Prenyltransf	Connects pathogen sensing with prenylation pathways [10]	[10]
Multiple species	Sugar_tr-NBS	Fuses sugar transporter domain with NBS domain [10]	[10]
Arachis hypogaea (peanut)	TIR-CC-NBS-LRR	Fusion of two normally distinct N-terminal domains [26]	[26]
Arachis hypogaea (peanut)	NBS-WRKY	Direct coupling of pathogen recognition and transcriptional regulation [26]	[26]
Vernicia montana (tung tree)	CC-TIR-NBS	Combination of CC and TIR domains in resistant species [24]	[24]

The functional implications of these novel architectures remain largely unexplored, but they represent fascinating evolutionary experiments in plant immunity. The fusion of NBS domains with various functional domains may create receptors with integrated recognition and response capabilities, potentially enabling more rapid or specialized defense reactions against pathogens.

Comparative Genomic Analyses and Detection Methodologies

The identification and characterization of structural variations in NBS-LRR genes relies on sophisticated bioinformatic pipelines and comparative genomic approaches. This section outlines the key methodological frameworks and analytical techniques used to detect and classify classical and species-specific domain arrangements.

Genome-Wide Identification of NBS-LRR Genes

The standard pipeline for comprehensive identification of NBS-LRR genes combines multiple complementary approaches to ensure sensitive detection while minimizing false positives [10] [6] [25]. The typical workflow begins with Hidden Markov Model (HMM) searches using the conserved NB-ARC domain (Pfam: PF00931) as a query against proteome or genome datasets, often with an E-value cutoff of < 1e-5 [10] [6] [25]. This is complemented by BLAST-based searches using reference NLR protein sequences from well-characterized species such as Arabidopsis thaliana, Oryza sativa, or related taxa, applying stringent E-value cutoffs (typically 1e-10) [6] [25]. Candidate sequences identified through these methods are then subjected to domain architecture validation using tools like InterProScan, NCBI's Batch CD-Search, or SMART to confirm the presence and arrangement of NBS, TIR, CC, RPW8, and LRR domains [6] [24] [25]. Additional domains are identified through similar domain-based searches against Pfam and related databases [10].
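The first stage of this workflow, pooling HMM and BLAST candidates before domain validation, might be sketched as follows. It assumes hmmsearch was run with --tblout against the NB-ARC profile and BLASTP was run with -outfmt 6 using the candidate proteome as the query, so that qseqid identifies the candidate; the cutoffs default to the values quoted above.

```python
def hmm_hits(tblout_text: str, max_evalue: float = 1e-5) -> set:
    """Target IDs whose full-sequence E-value in hmmsearch --tblout output is below the cutoff."""
    hits = set()
    for line in tblout_text.splitlines():
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and comment/header lines
        fields = line.split()
        # Column 1 = target name, column 5 = full-sequence E-value.
        if float(fields[4]) < max_evalue:
            hits.add(fields[0])
    return hits


def blast_hits(outfmt6_text: str, max_evalue: float = 1e-10) -> set:
    """Query IDs with at least one hit at or below the cutoff in BLAST -outfmt 6 output."""
    hits = set()
    for line in outfmt6_text.splitlines():
        if not line.strip():
            continue
        fields = line.split("\t")
        # Default outfmt 6 columns: qseqid sseqid ... evalue (11th) bitscore (12th).
        if float(fields[10]) <= max_evalue:
            hits.add(fields[0])
    return hits


def candidate_nlrs(tblout_text: str, outfmt6_text: str) -> set:
    """Union of HMM and BLAST candidates, to be passed on for domain validation."""
    return hmm_hits(tblout_text) | blast_hits(outfmt6_text)
```

Taking the union favors sensitivity; the subsequent InterProScan, CD-Search, or SMART validation step is what removes false positives.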

Orthogroup Analysis and Evolutionary Comparisons

To understand the evolutionary relationships of NBS-LRR genes across species, researchers employ orthogroup analysis using tools such as OrthoFinder [10] [25]. This approach clusters genes into orthogroups (OGs) representing groups of genes descended from a single gene in the last common ancestor. A comprehensive study identified 603 orthogroups across 34 plant species, with some core orthogroups (e.g., OG0, OG1, OG2) being widely distributed across multiple species, while unique orthogroups (e.g., OG80, OG82) were highly specific to particular lineages [10]. This analysis helps distinguish evolutionarily conserved NBS-LRR genes from those that have undergone lineage-specific expansion or diversification.
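A rough illustration of separating widely shared from lineage-restricted orthogroups is given below. It assumes an OrthoFinder-style Orthogroups.tsv layout (orthogroup ID, then one column per species with comma-separated gene IDs, empty when the species is absent), and the "core" presence threshold is a hypothetical parameter rather than one used in the cited study.

```python
import csv
import io


def classify_orthogroups(tsv_text: str, core_fraction: float = 0.5) -> dict:
    """Label each orthogroup as core, species-specific, or intermediate.

    core_fraction (hypothetical): minimum fraction of species that must
    contribute at least one gene for an orthogroup to count as core.
    """
    reader = csv.reader(io.StringIO(tsv_text), delimiter="\t")
    species = next(reader)[1:]  # header: Orthogroup, species1, species2, ...
    labels = {}
    for row in reader:
        if not row:
            continue
        # A non-empty cell means that species has genes in this orthogroup.
        present = {sp for sp, cell in zip(species, row[1:]) if cell.strip()}
        if len(present) >= core_fraction * len(species):
            labels[row[0]] = "core"
        elif len(present) == 1:
            labels[row[0]] = "species-specific"
        else:
            labels[row[0]] = "intermediate"
    return labels
```

Restricting "species-specific" to a single species is a simplification; a lineage-aware version would test presence against clades of a species tree instead.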

Identification of Gene Clusters and Tandem Duplications

NBS-LRR genes frequently exhibit clustered genomic arrangements, often resulting from tandem duplication events [6] [25]. Computational identification of these clusters typically defines them as genomic regions where at least two NLR genes are located within 200 kilobases of each other and separated by no more than eight non-NLR genes [6]. The MCScanX algorithm is commonly used to identify tandem and segmental duplications, with visualization tools like TBtools enabling chromosomal mapping of these arrangements [6] [25]. These analyses have revealed that different plant species exhibit substantial variation in their cluster organizations, with some species showing extensive tandem arrays of related NBS-LRR genes while others display more dispersed genomic distributions [10] [24].
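The cluster definition above can be expressed as a short chaining procedure. This sketch assumes gene coordinates taken from an annotation file and interprets "within 200 kilobases" as the gap between consecutive NLR genes; the Gene class and nlr_clusters function are hypothetical names, not part of MCScanX or TBtools.

```python
from dataclasses import dataclass


@dataclass
class Gene:
    gene_id: str
    chrom: str
    start: int  # gene start (bp)
    end: int    # gene end (bp)
    is_nlr: bool


def nlr_clusters(genes, max_distance=200_000, max_intervening=8):
    """Return clusters (lists of gene IDs) containing at least two NLR genes.

    Consecutive NLR genes on one chromosome are linked when the gap between
    them is <= max_distance bp and at most max_intervening non-NLR genes lie
    between them (single-linkage chaining, an interpretive assumption).
    """
    by_chrom = {}
    for gene in genes:
        by_chrom.setdefault(gene.chrom, []).append(gene)

    clusters = []
    for chrom_genes in by_chrom.values():
        chrom_genes.sort(key=lambda g: g.start)
        current = []      # NLR genes in the cluster being built
        intervening = 0   # non-NLR genes seen since the last NLR gene
        for gene in chrom_genes:
            if not gene.is_nlr:
                intervening += 1
                continue
            linked = (
                current
                and gene.start - current[-1].end <= max_distance
                and intervening <= max_intervening
            )
            if linked:
                current.append(gene)
            else:
                if len(current) >= 2:
                    clusters.append([g.gene_id for g in current])
                current = [gene]
            intervening = 0
        if len(current) >= 2:
            clusters.append([g.gene_id for g in current])
    return clusters
```

Chaining means a cluster can span more than 200 kb in total as long as each neighboring pair satisfies both criteria; studies that instead cap the total span would need an extra check.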

Structural Variation Detection Methods

Advanced sequencing technologies and specialized computational approaches are required to detect the full spectrum of structural variations affecting NBS-LRR genes [22] [23]. Long-read sequencing technologies (such as PacBio HiFi sequencing) generate reads of 10-20 kb with high accuracy (Q30+), enabling the resolution of complex genomic regions that are often enriched for NBS-LRR genes [23]. Read-depth methods identify copy number variations (deletions and duplications) by detecting deviations from expected coverage distributions [22] [23]. Split-read approaches identify breakpoints of structural variations by detecting reads that split across rearrangement junctions [22]. Assembly-based methods construct complete genomes or genomic regions de novo and compare them to reference sequences to identify structural differences [22] [23]. For validation, PCR-based methods including quantitative PCR (for copy number validation) and breakpoint-specific PCR (for junction validation) provide orthogonal confirmation of predicted structural variants [22].
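To make the read-depth idea concrete, the sketch below labels fixed-size windows by their ratio to a baseline depth and merges adjacent windows into segments. The 0.5 and 1.5 ratio cutoffs (roughly a one-copy loss or gain in a diploid) and the median baseline are assumptions; dedicated callers add GC correction, mappability filters, and statistical segmentation.

```python
from statistics import median


def read_depth_calls(window_depths, expected=None, del_ratio=0.5, dup_ratio=1.5):
    """Label fixed-size windows as DEL, DUP, or normal from depth / expected depth."""
    if expected is None:
        expected = median(window_depths)  # baseline depth (assumption)
    if expected <= 0:
        raise ValueError("expected depth must be positive")
    calls = []
    for depth in window_depths:
        ratio = depth / expected
        if ratio <= del_ratio:
            calls.append("DEL")
        elif ratio >= dup_ratio:
            calls.append("DUP")
        else:
            calls.append("normal")
    return calls


def merge_windows(calls, window_size):
    """Merge runs of identical non-normal calls into (label, start_bp, end_bp) segments."""
    segments = []
    for i, label in enumerate(calls):
        if label == "normal":
            continue
        start = i * window_size
        if segments and segments[-1][0] == label and segments[-1][2] == start:
            segments[-1] = (label, segments[-1][1], start + window_size)
        else:
            segments.append((label, start, start + window_size))
    return segments


depths = [30, 31, 29, 14, 15, 30, 60, 62, 30]
print(merge_windows(read_depth_calls(depths), 1000))
# [('DEL', 3000, 5000), ('DUP', 6000, 8000)]
```

Depth alone gives copy number but not breakpoints or orientation, which is why it is usually combined with split-read or assembly-based evidence.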

Experimental Validation and Functional Characterization

Beyond computational identification, experimental approaches are essential for validating the functional significance of structural variations in NBS-LRR genes. Several well-established methodologies enable researchers to connect genomic variations with phenotypic outcomes in disease resistance.

Expression Profiling Under Stress Conditions

Transcriptomic analyses through RNA sequencing provide critical insights into the functional roles of NBS-LRR genes with different domain architectures. Standard approaches involve treating plants with various biotic (fungal, bacterial, or viral pathogens) and abiotic (drought, salt, temperature) stresses, then extracting RNA from different tissues at multiple time points for sequencing [10]. The resulting data are processed through transcriptomic pipelines to calculate expression values (typically FPKM or TPM), which are then visualized as heatmaps to identify differentially expressed NBS-LRR genes [10]. For example, expression profiling in cotton identified putative upregulation of specific orthogroups (OG2, OG6, and OG15) in different tissues under various biotic and abiotic stresses in plants with varying susceptibility to cotton leaf curl disease [10].
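For reference, the two normalizations mentioned above differ in the order of operations: TPM length-normalizes each gene before scaling the total to one million, whereas FPKM divides by library size and gene length together. The gene names and counts below are hypothetical, and the FPKM function assumes the counts cover every gene in the library.

```python
def tpm(counts, lengths_bp):
    """Transcripts per million: normalize by length first, then scale to sum to 1e6."""
    rates = {g: counts[g] / (lengths_bp[g] / 1_000) for g in counts}  # reads per kb
    total = sum(rates.values())
    return {g: rate / total * 1e6 for g, rate in rates.items()}


def fpkm(counts, lengths_bp):
    """Fragments per kb of gene per million mapped fragments."""
    per_million = sum(counts.values()) / 1e6  # assumes counts span the whole library
    return {g: counts[g] / (lengths_bp[g] / 1_000) / per_million for g in counts}


counts = {"NLR_A": 100, "NLR_B": 200, "ref_gene": 700}       # hypothetical
lengths = {"NLR_A": 1_000, "NLR_B": 2_000, "ref_gene": 1_400}
print(tpm(counts, lengths))
print(fpkm(counts, lengths))
```

Because TPM values always sum to one million per sample, they are easier to compare across samples, which is one reason many heatmap-based NLR surveys prefer them.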

Virus-Induced Gene Silencing (VIGS)

VIGS has emerged as a powerful tool for functional characterization of NBS-LRR genes. This approach uses modified viruses to deliver gene-specific sequences that trigger RNA interference and silence target genes [10] [24]. The standard protocol involves: (1) Target Selection - identifying a unique gene segment (typically 200-500 bp) specific to the NBS-LRR gene of interest; (2) Vector Construction - cloning the target segment into a VIGS vector (such as TRV-based vectors); (3) Plant Inoculation - introducing the vector into plants through Agrobacterium-mediated infiltration or in vitro transcription; and (4) Phenotypic Assessment - challenging silenced plants with pathogens and evaluating disease symptoms compared to controls [10] [24]. For instance, silencing of GaNBS (from orthogroup OG2) in resistant cotton demonstrated its putative role in reducing virus titers [10]. Similarly, VIGS of Vm019719 in resistant Vernicia montana compromised its resistance to Fusarium wilt, confirming this NBS-LRR gene's critical role in disease resistance [24].
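Target selection in step (1) amounts to finding a window of the gene that shares no siRNA-length subsequence with other transcripts. The sketch below assumes 21-nt k-mers as a proxy for siRNA off-target potential and a default 300-bp window within the 200-500 bp range given above; dedicated design tools also tolerate mismatches, so this is a first-pass filter only.

```python
COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def reverse_complement(seq: str) -> str:
    return seq.translate(COMPLEMENT)[::-1]


def kmer_set(seq: str, k: int) -> set:
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def pick_vigs_fragment(target, off_targets, length=300, k=21, step=10):
    """Return (start, fragment) for the first target window that shares no
    k-mer with any off-target sequence on either strand, or None."""
    target = target.upper()
    banned = set()
    for seq in off_targets:
        seq = seq.upper()
        # siRNAs derive from both strands of the dsRNA, so ban both orientations.
        banned |= kmer_set(seq, k) | kmer_set(reverse_complement(seq), k)
    for start in range(0, len(target) - length + 1, step):
        window = target[start:start + length]
        if kmer_set(window, k).isdisjoint(banned):
            return start, window
    return None
```

In practice the off-target set would be the rest of the transcriptome, with closely related NBS-LRR paralogs being the most likely source of shared k-mers.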

Genetic Variation Analysis Between Resistant and Susceptible Genotypes

Comparing genetic sequences between resistant and susceptible varieties can identify structural variations correlated with disease resistance phenotypes. This typically involves whole-genome sequencing of multiple accessions with contrasting resistance phenotypes, followed by variant calling to identify polymorphisms (SNPs, indels, and structural variants) specifically associated with resistance [10] [24]. For example, comparison between the susceptible (Coker 312) and tolerant (Mac7) Gossypium hirsutum accessions identified accession-specific variants in NBS genes: 6,583 in Mac7 and 5,173 in Coker 312 [10]. Further analysis can reveal how these variations affect functional domains, gene expression, or protein function.
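At its simplest, identifying accession-specific variants is a set difference on normalized variant keys, followed by intersection with gene models. The sketch below assumes variants have already been parsed (for example from VCF files) into (chromosome, position, ref, alt) tuples and that gene intervals use 1-based inclusive coordinates.

```python
def private_variants(variants_a, variants_b):
    """Variants found in only one of two accessions; each is a (chrom, pos, ref, alt) tuple."""
    set_a, set_b = set(variants_a), set(variants_b)
    return set_a - set_b, set_b - set_a


def count_in_genes(variants, gene_intervals):
    """Count variants inside each gene.

    gene_intervals maps gene ID -> (chrom, start, end), 1-based inclusive.
    A linear scan keeps the sketch short; genome-scale data would use an
    interval index instead.
    """
    counts = {gene: 0 for gene in gene_intervals}
    for chrom, pos, _ref, _alt in variants:
        for gene, (g_chrom, start, end) in gene_intervals.items():
            if chrom == g_chrom and start <= pos <= end:
                counts[gene] += 1
    return counts
```

Set differences are only meaningful if both accessions were called against the same reference with consistently normalized indels; otherwise the same variant can appear under two different keys.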

Protein-Ligand and Protein-Protein Interaction Studies

Understanding how different domain architectures influence molecular interactions is crucial for deciphering NBS-LRR function. Protein-ligand interaction studies examine how NBS domains bind nucleotides (ADP/ATP) and how structural variations affect nucleotide binding and hydrolysis [10]. Protein-protein interaction assays (such as yeast two-hybrid, co-immunoprecipitation, or surface plasmon resonance) investigate how LRR domains interact with pathogen effectors or host proteins, and how alternative domain arrangements affect these interactions [10] [24]. For example, interaction studies in cotton showed strong binding of certain NBS proteins with ADP/ATP and different core proteins of the cotton leaf curl disease virus [10].

Table 4: Key Experimental Approaches for Validating NBS-LRR Gene Function

Method	Key Applications	Typical Workflow	Interpretative Considerations
Expression Profiling	Identify stress-responsive NLR genes; Compare expression in resistant vs. susceptible varieties [10]	RNA extraction from stressed tissues → RNA-seq library preparation → Sequencing → Differential expression analysis [10]	Expression changes may be tissue-specific or temporal; Correlation ≠ causation
VIGS	Functional validation of specific NLR genes; Assess role in disease resistance [10] [24]	Target selection → Vector construction → Plant inoculation → Pathogen challenge → Phenotyping [10] [24]	Silencing efficiency varies; Potential off-target effects; Developmental impacts
Genetic Variation Analysis	Identify polymorphisms associated with resistance; Detect presence/absence variations [10] [24]	WGS of multiple accessions → Variant calling → Association with phenotypes [10] [24]	Requires adequate sample size; Population structure can confound associations
Interaction Studies	Characterize binding partners; Understand signaling mechanisms [10]	Recombinant protein expression → Interaction assays (Y2H, Co-IP, SPR) → Data analysis [10]	In vitro conditions may not reflect in vivo context; Transient vs. stable interactions

Research on structural variations in NBS-LRR genes relies on specialized bioinformatic tools, experimental reagents, and genomic resources. The following table summarizes key solutions that enable comprehensive analysis in this field.

Table 5: Essential Research Resources for Analyzing Structural Variations in NBS-LRR Genes

Resource Category	Specific Tools/Reagents	Primary Function	Application Notes
Bioinformatic Tools	HMMER [10] [6] [25]; OrthoFinder [10] [25]; MCScanX [6]	Domain identification; Orthogroup analysis; Gene duplication detection	HMMER uses Pfam models (e.g., NB-ARC: PF00931); OrthoFinder uses DIAMOND for sequence similarity [10]
Domain Databases	Pfam [10] [6]; InterPro [25]; SMART [6]	Protein domain annotation and classification	Pfam provides HMM profiles; CD-search verifies domain presence [10] [6]
Genomic Resources	Plaza Genome Database [10]; Phytozome [10]; NCBI Genome [10]	Source of genome assemblies and annotations	Multi-species comparisons require standardized annotations [10]
VIGS Vectors	TRV-based vectors [10] [24]	Functional gene silencing in plants	TRV1 and TRV2 systems; Agrobacterium delivery [10] [24]
Expression Analysis	IPF Database [10]; CottonFGD [10]; PlantCARE [6]	Tissue-specific expression data; Promoter element analysis	PlantCARE identifies cis-elements in promoters [6]
Population Genomics	DGV [22]; gnomAD-SV [22]; dbVar [22]	Structural variation frequency in populations	Distinguish pathogenic SVs from polymorphisms [22]

The comprehensive analysis of structural variations in NBS-LRR genes reveals a complex landscape of both highly conserved classical architectures and evolutionarily dynamic species-specific arrangements. The classical TNL, CNL, and RNL configurations represent the core immune receptors maintained across broad evolutionary timescales, while novel domain arrangements resulting from recent structural variations provide raw material for evolutionary innovation in pathogen recognition [10] [6] [24].

This duality has important implications for both basic plant immunity research and applied crop improvement strategies. From a fundamental perspective, the conservation of classical architectures across diverse plant lineages underscores their essential role in core immune signaling mechanisms. Meanwhile, the discovery of species-specific arrangements highlights the remarkable plasticity of plant genomes in generating structural diversity to confront evolving pathogen populations [10] [26]. The functional characterization of these varied architectures through integrated computational and experimental approaches continues to reveal new mechanisms of pathogen recognition and defense signaling.

For crop improvement, understanding structural variations in NBS-LRR genes provides valuable insights for marker-assisted breeding and genetic engineering strategies. The identification of specific domain arrangements associated with disease resistance in crop wild relatives offers potential targets for introgression into cultivated varieties [24] [25]. Furthermore, documenting the erosion of NBS-LRR diversity during domestication (as observed in asparagus, where the NLR gene count decreased from 63 in wild A. setaceus to just 27 in cultivated A. officinalis) informs conservation strategies for maintaining genetic diversity in breeding programs [25].

As sequencing technologies continue to advance, particularly with the widespread adoption of long-read sequencing that effectively resolves complex repetitive regions, our understanding of structural variations in NBS-LRR genes will undoubtedly expand [22] [23]. Future research integrating pangenome references, multi-omics data, and advanced functional characterization will further illuminate how classical and species-specific domain architectures collectively contribute to plant disease resistance in natural and agricultural ecosystems.

TIR-NBS-LRR (TNL) proteins constitute a major class of intracellular immune receptors that enable plants to detect pathogen effectors and initiate robust defense responses. Understanding the diversity, distribution, and evolution of these genes across the plant kingdom is fundamental to plant pathology and resistance breeding. This guide provides a comparative analysis of TNL genes, synthesizing genomic data from diverse species to elucidate patterns of expansion, contraction, and structural variation that define this critical component of the plant immune system.

Comparative Distribution of TNL Genes Across Plant Lineages

Genomic analyses reveal a striking pattern of TNL distribution across plant phylogeny. TNL genes are widespread in dicotyledonous plants but are completely absent from cereal genomes, suggesting lineage-specific loss in monocots [1]. The evolutionary trajectory of TNL genes shows deep origins, with homologs present in non-vascular plants and gymnosperms, though substantial gene expansion occurred primarily in flowering plants [10] [1].

Table 1: Distribution of NBS-LRR Genes Across Representative Plant Species

Species	Total NBS/NBS-LRR Genes	TNL Genes	CNL/Non-TNL Genes	Key Evolutionary Notes
Arabidopsis thaliana	149-167 [27]	~62 [1]	~87	Representative dicot model with both major subfamilies
Brassica oleracea	157 [27]	Not specified	Not specified	Retained TNLs post-divergence from Arabidopsis
Brassica rapa	206 [27]	Not specified	Not specified	Retained TNLs post-divergence from Arabidopsis
Fragaria species (diploid strawberries)	133-325 [28] [6]	Less than non-TNLs (under 50%) [6]	Over 50% of NLR family [6]	Non-TNLs dominate in all eight diploid species studied
Oryza sativa (rice)	~400 [1]	0 [1]	~400	Complete absence of TNLs characteristic of cereals
Nicotiana benthamiana	156 NBS-LRR homologs [3]	5 TNL-type [3]	25 CNL-type [3]	Model plant for virology with limited TNL representation
Physcomitrella patens (moss)	~25 [10]	Present [1]	Present [1]	Represents ancestral NLR repertoire in non-vascular plants

The evolutionary dynamics between TNL and non-TNL genes show notable patterns. In wild strawberries, non-TNLs constitute over 50% of the NLR gene family in all eight diploid species examined, surpassing TNLs in proportion [6]. Expression analyses further indicate that non-TNLs show dominant expression under both normal and infected conditions, with RNLs exhibiting particularly high expression levels [6].

Domain Architecture and Structural Diversity

TNL proteins exhibit a characteristic tripartite domain structure consisting of an N-terminal Toll/Interleukin-1 receptor (TIR) domain, a central nucleotide-binding site (NBS) domain, and C-terminal leucine-rich repeats (LRRs) [1]. The TIR domain is involved in signaling, the NBS domain functions as a molecular switch for ATP/GTP binding and hydrolysis, and the LRR domain is responsible for protein-protein interactions and ligand binding [28] [1].

Comparative genomics has uncovered significant diversity in domain architecture beyond the classical TNL structure. A comprehensive study analyzing 12,820 NBS-domain-containing genes across 34 plant species identified 168 distinct classes with several novel domain architecture patterns [10] [29]. These include:

Classical architectures: TIR-NBS, TIR-NBS-LRR [10] [29]
Species-specific structural patterns: TIR-NBS-TIR-Cupin1-Cupin1, TIR-NBS-Prenyltransf, and Sugar_tr-NBS [10]
Truncated variants: TIR-NBS (TN) proteins that lack LRR domains [1]

Table 2: TNL Domain Architecture Variants and Their Functional Implications

Architecture Type	Domain Composition	Predicted Functional Role	Conservation Across Species
Full-length TNL	TIR-NBS-LRR	Canonical pathogen recognition and signaling	Broadly distributed across dicots
TN-type	TIR-NBS	Potential adaptors or regulators of signaling	Limited distribution
TIR-X	TIR with other domains	Specialized functional adaptations	Often species-specific
TNL with integrated domains	TIR-NBS-LRR with additional C-terminal domains	Expanded recognition capabilities	Emerging through lineage-specific evolution

Structural variations significantly impact function. The LRR domain contains 14 repeats on average, with 5-10 sequence variants for each repeat, creating immense potential for functional variation: more than 9×10¹¹ possible variants are estimated in Arabidopsis alone [1]. This diversity generates the putative binding surface responsible for pathogen recognition specificity.
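The scale of this estimate follows from simple combinatorics: if each of n repeats independently adopts one of v variants, the number of distinct LRR arrays is v^n. The sketch below applies this idealized model (an assumption that ignores linkage between repeats and variable repeat number) to the figures quoted above and shows that 9×10¹¹ corresponds to roughly seven variants per repeat across 14 repeats.

```python
def lrr_combinations(n_repeats: int, variants_per_repeat: int) -> int:
    """Distinct LRR arrays if each repeat independently adopts one of v variants."""
    return variants_per_repeat ** n_repeats


def variants_per_repeat_for(total: float, n_repeats: int) -> float:
    """Per-repeat variant count v such that v ** n_repeats == total."""
    return total ** (1 / n_repeats)


low = lrr_combinations(14, 5)           # 6,103,515,625, about 6.1e9
high = lrr_combinations(14, 10)         # 10**14
v = variants_per_repeat_for(9e11, 14)   # about 7.1 variants per repeat
print(low, high, round(v, 2))
```

The 5-10 variant range therefore brackets the estimate by several orders of magnitude, which is why the exact figure is sensitive to the assumed number of variants per repeat.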

Evolution and Genomic Organization

TNL genes evolve through diverse mechanisms that drive their diversification. Phylogenetic analyses reveal that plant NBS-LRR genes are numerous and ancient in origin, with orthologous relationships difficult to determine due to lineage-specific gene duplications and losses [1]. The evolution of TNL genes follows a "birth-and-death" model characterized by several key processes:

Gene duplication: Both tandem and segmental duplications generate new genetic material for evolution [30]
Unequal crossing-over: Creates variation in copy number within clusters [1]
Sequence exchange: Gene conversion and ectopic recombination reshape sequences [30]
Diversifying selection: Maintains variation in solvent-exposed residues of LRR domains [1]

Genomic organization of TNL genes shows distinct patterns across species. These genes are frequently clustered in plant genomes as a result of both segmental and tandem duplications [1] [30]. In Arabidopsis, NBS-LRR genes are distributed as singletons and clusters, with approximately 40 clusters identified [30]. These clusters can be homogeneous (containing genes from the same phylogenetic lineage) or heterogeneous (containing genes from different lineages) [30].

Selective pressures differ significantly between TNL and CNL gene types. Comparative analysis of Fragaria species demonstrated that Ks and Ka/Ks values of TNLs were significantly greater than those of non-TNLs, suggesting TNLs are more rapidly evolving and driven by stronger diversifying selective pressures [28]. However, in diploid wild strawberries, a significantly higher number of non-TNLs were under positive selection compared to TNLs, indicating their rapid diversification in these specific lineages [6].
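Ka/Ks compares nonsynonymous and synonymous divergence after each is corrected for multiple substitutions at the same site; values above 1 are conventionally read as diversifying (positive) selection and values below 1 as purifying selection. The sketch below shows only the final step of a Nei-Gojobori-style calculation, applying the Jukes-Cantor correction to pre-counted differences and sites; counting synonymous and nonsynonymous sites along codon pathways is assumed to have been done elsewhere.

```python
import math


def jukes_cantor(p: float) -> float:
    """Jukes-Cantor corrected distance for an uncorrected proportion of differences p."""
    if not 0 <= p < 0.75:
        raise ValueError("Jukes-Cantor correction requires 0 <= p < 0.75")
    return -0.75 * math.log(1 - 4 * p / 3)


def ka_ks(nonsyn_diffs, nonsyn_sites, syn_diffs, syn_sites):
    """Return (Ka, Ks, Ka/Ks) from pre-counted differences and sites."""
    ka = jukes_cantor(nonsyn_diffs / nonsyn_sites)
    ks = jukes_cantor(syn_diffs / syn_sites)
    if ks == 0:
        raise ValueError("Ks is zero; Ka/Ks is undefined")
    return ka, ks, ka / ks


# Hypothetical gene pair: 30 nonsynonymous differences over 300 sites,
# 5 synonymous differences over 100 sites.
print(ka_ks(30, 300, 5, 100))  # Ka/Ks is about 2.07
```

Gene-wide ratios average over the whole coding sequence, so positive selection confined to solvent-exposed LRR residues is often detected only with site-level tests.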

Expression Regulation and miRNA Interactions

TNL gene expression is tightly regulated through multiple mechanisms, with microRNAs playing a particularly important role. At least eight families of miRNAs have been described that target NBS-LRRs in plants, with most targeting highly duplicated NBS-LRRs [31]. These miRNAs typically target conserved regions of NBS-LRR genes, allowing one miRNA to regulate multiple lineage members.
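Because these miRNAs act through base-pairing with conserved regions of NBS-LRR transcripts, a crude target scan compares each mRNA window with the miRNA's perfectly complementary site. The sketch below counts mismatches only; it treats G:U wobble pairs as mismatches and ignores the position-dependent penalties used by dedicated plant target predictors, and the sequences in it are hypothetical.

```python
RNA_COMPLEMENT = str.maketrans("ACGU", "UGCA")


def perfect_target_site(mirna: str) -> str:
    """mRNA site (5'->3') perfectly complementary to a miRNA given 5'->3'."""
    return mirna.upper().replace("T", "U").translate(RNA_COMPLEMENT)[::-1]


def find_target_sites(mirna, mrna, max_mismatches=3):
    """(position, mismatches) for every mRNA window within max_mismatches of the perfect site."""
    site = perfect_target_site(mirna)
    mrna = mrna.upper().replace("T", "U")  # accept DNA or RNA input
    hits = []
    for i in range(len(mrna) - len(site) + 1):
        mismatches = sum(a != b for a, b in zip(site, mrna[i:i + len(site)]))
        if mismatches <= max_mismatches:
            hits.append((i, mismatches))
    return hits


mirna = "AACCGGUUAC"  # hypothetical 10-nt sequence, not a real miRNA
mrna = "GGG" + "GUAACCGGUU" + "AAA" + "GUAACCGGAU" + "CC"
print(find_target_sites(mirna, mrna, max_mismatches=0))  # [(3, 0)]
```

Scanning many paralogs with one miRNA in this way illustrates why a single miRNA targeting a conserved, P-loop-encoding region can regulate multiple members of an NBS-LRR lineage.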

Key regulatory patterns include:

miR482/2118 family: Targets the encoded P-loop region of NBS-LRR genes and is conserved from gymnosperms to dicots [31]
PhasiRNA production: 22-nt miRNAs trigger phased secondary siRNA production from their target NBS-LRR mRNAs [31]
Expression plasticity: In cotton, expression profiling revealed upregulation of specific orthogroups (OG2, OG6, OG15) in different tissues under various biotic and abiotic stresses in plants with varying susceptibility to cotton leaf curl disease [10]

The co-evolutionary relationship between miRNAs and NBS-LRRs represents an important regulatory balance. Nucleotide diversity in the wobble position of the codons in the target site drives the diversification of miRNAs, creating a dynamic evolutionary arms race between regulators and their targets [31]. This system may enable plants to maintain extensive NLR repertoires without exhausting functional NLR loci, potentially offsetting fitness costs associated with NLR maintenance [10].

Functional Validation and Disease Resistance Associations

Functional studies provide critical evidence linking TNL diversity to disease resistance phenotypes. Virus-induced gene silencing (VIGS) of GaNBS (OG2) in resistant cotton suggested a role for this gene in reducing virus titers, providing functional evidence for specific NBS orthogroups in pathogen defense [10] [29]. Protein-ligand and protein-protein interaction analyses further showed strong interactions of putative NBS proteins with ADP/ATP and different core proteins of the cotton leaf curl disease virus [10].

Resistance correlations are evident across plant species:

In wild strawberries, Fragaria pentaphylla and Fragaria nilgerrensis with the highest proportion of non-TNLs exhibited significantly greater resistance to Botrytis cinerea compared to Fragaria vesca with the lowest proportion of non-TNLs [6]
Genetic variation analysis between susceptible (Coker 312) and tolerant (Mac7) Gossypium hirsutum accessions identified accession-specific variants in NBS genes: 6,583 in Mac7 and 5,173 in Coker 312 [10]
Expression profiling of NBS-LRR genes in Fragaria species revealed that the same gene was expressed differently in different genetic backgrounds in response to pathogens [28]

Research Reagent Solutions and Methodologies

Genome-wide identification of TNL genes relies on established bioinformatic protocols and experimental reagents. The following toolkit represents essential resources for TNL gene family analysis:

Table 3: Essential Research Reagents and Resources for TNL Gene Analysis

Reagent/Resource	Specific Application	Function/Utility	Example Sources/References
HMMER Suite	Domain identification	Identifies NB-ARC domains (PF00931) using hidden Markov models	[10] [28] [27]
Pfam Database	Domain verification	Curated database of protein domains and families	[28] [27] [3]
OrthoFinder	Orthogroup analysis	Determines orthologous groups across species	[10]
MEME Suite	Motif discovery	Identifies conserved protein motifs	[6] [3]
SMART/CDD	Domain validation	Verifies domain predictions and boundaries	[28] [6]
Virus-Induced Gene Silencing (VIGS)	Functional validation	Assesses gene function through silencing	[10] [29] [3]
DIAMOND/MCL	Sequence similarity/clustering	Fast sequence similarity and clustering algorithms	[10]
RNA-seq Expression Profiling	Expression analysis	Determines differential expression under stress	[10] [6]

Standardized methodologies have emerged for comprehensive TNL analysis:

Sequence Identification: HMMER searches with NB-ARC domain (PF00931) followed by manual curation [10] [27]
Domain Architecture Classification: PfamScan, SMART, and COILS analyses for TIR, CC, and LRR domains [10] [28]
Phylogenetic Analysis: Multiple sequence alignment with MAFFT or ClustalW followed by Maximum Likelihood tree construction [10] [6]
Evolutionary Analysis: Orthogroup clustering, Ka/Ks calculations, and duplication pattern identification [10] [6]
Expression Profiling: RNA-seq data analysis across tissues and stress conditions [10] [6]

Comparative genomics of TNL genes across plant kingdoms reveals a dynamic evolutionary landscape shaped by lineage-specific expansions, contractions, and diversifying selection. The distribution of TNL genes demonstrates profound lineage-specific patterns, with complete absence in cereals contrasting with substantial diversity in dicots. Structural analyses uncover both conserved architectures and innovative domain combinations that expand functional capabilities. The regulation of TNLs through miRNA interactions represents a critical layer of control that balances defense efficacy with fitness costs. Functional studies continue to validate the role of specific TNL orthogroups in pathogen recognition and defense signaling. These insights provide a foundation for leveraging TNL diversity in crop improvement programs and understanding the fundamental principles of plant immunity evolution.

Computational Approaches for TNL Identification and Classification

Genome-wide screening for protein domains is a fundamental methodology in bioinformatics, enabling researchers to annotate gene function and understand evolutionary relationships across species. Among the most powerful techniques for this purpose are profile Hidden Markov Models (HMMs), which provide a probabilistic framework for modeling multiple sequence alignments of protein families and detecting remote homologies that simpler methods might miss [32] [33]. The HMMER software package, developed by Sean Eddy, has emerged as a de facto standard for this type of analysis, serving as the computational engine for major protein domain databases including Pfam, TIGRFAMs, and SMART [32] [33]. The critical importance of these tools is particularly evident in specialized research domains such as the study of TIR-NBS-LRR domain architectures, where accurate identification of these disease resistance genes in plants provides crucial insights into innate immune mechanisms and potential applications in crop improvement [13] [34] [6].

This comparison guide objectively evaluates HMMER's performance against alternative profile HMM implementations, with particular focus on its application in plant genomics research. We examine experimental data from comparative studies, analyze critical algorithmic differences that impact performance, and provide detailed protocols for conducting genome-wide screens for TIR-NBS-LRR genes and other important protein domains. The guidance presented here will equip researchers with the necessary knowledge to select appropriate tools and methodologies for their specific domain analysis requirements, with special consideration for the challenges inherent in large-scale genomic studies.

HMMER Versus Alternative Profile HMM Tools

Performance Comparison and Benchmarking

The landscape of profile HMM tools has been dominated by two main packages: HMMER and SAM (Sequence Alignment and Modeling System). Multiple independent studies have systematically compared their performance using standardized datasets and metrics, with results consistently highlighting a fundamental trade-off between sensitivity and accuracy in their default configurations.

Table 1: Comprehensive Performance Comparison Between HMMER and SAM

Performance Metric	HMMER	SAM	Experimental Context
Overall Sensitivity	Lower	Superior	SCOP/Pfam-based test set with local and global HMM scoring [32]
Model Estimation	Inferior	Superior	Built from identical multiple sequence alignments [32] [33]
Model Scoring Accuracy	More accurate	Less accurate	Evaluation of scoring algorithms against known structures [32]
Alignment Quality Dependency	High	High	Quality of input multiple alignment is the most critical performance factor [33]
Automated Alignment Generation	Lacks equivalent	SAM T99 script available	Iterative database search similar to PSI-BLAST [33]
Execution Speed	1-3x faster on databases >2000 sequences	Faster on smaller databases	Benchmarking tests with varying database sizes [33]

Comparative analyses reveal that SAM's model estimation capabilities generally produce more sensitive models, while HMMER's scoring algorithms provide more accurate E-values and better discrimination between true and false positives [32]. This performance difference stems primarily from how each package handles the balance between observed sequence counts and prior probabilities during model construction. SAM's implementation gives more weight to prior probabilities, which proves particularly advantageous when working with limited sequence data, whereas HMMER places greater emphasis on the actual sequence counts in the input alignment [32].
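The trade-off can be made concrete with the posterior-mean estimate of a match-state emission probability. The form below uses a single Dirichlet component for readability; both packages actually use Dirichlet mixtures, and the notation (f_a, N_eff, alpha_a) is ours rather than either package's.

```latex
\hat{p}_a \;=\; \frac{N_{\mathrm{eff}}\, f_a + \alpha_a}{N_{\mathrm{eff}} + \sum_{b=1}^{20} \alpha_b}
```

Here f_a is the weighted frequency of amino acid a in an alignment column, N_eff is the effective (total) sequence weight, and alpha_a are the prior pseudocounts. A large N_eff pulls the estimate toward the observed frequencies, while a small N_eff lets the prior dominate; this is the lever controlled by the total-weight calculation, which is why the two packages can build different models from the same alignment.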

In practical applications, researchers have successfully employed HMMER for genome-wide identification of NBS-LRR genes across numerous plant species. For example, studies in Nicotiana benthamiana identified 156 NBS-LRR homologs using HMMER with an E-value cutoff of < 1×10⁻²⁰ [3], while investigations of Arachis hypogaea cv. Tifrunner discovered 713 full-length NBS-LRRs using similar HMMER-based approaches [26]. These implementations demonstrate HMMER's robustness for large-scale genomic surveys, particularly when appropriate domain thresholds and verification steps are implemented.

Algorithmic and Implementation Differences

The performance disparities between HMMER and SAM originate from fundamental differences in their underlying algorithms and architectural decisions:

HMM Architecture: HMMER utilizes a 7-transition model that forbids transitions from insert to delete states and vice versa, while SAM maintains the original 9-transition architecture that allows all possible transitions between states [32]. This architectural variation impacts how each model handles indels and affects the overall model flexibility.
Prior Probabilities: Both packages employ Dirichlet mixtures for modeling emission prior probabilities, but SAM defaults to a 20-component mixture compared to HMMER's 9-component mixture, providing potentially more nuanced handling of amino acid conservation patterns [32]. For transition priors, SAM assigns higher probabilities to insertions and deletions, which may contribute to its increased sensitivity in detecting remote homologs.
Sequence Weighting: The two packages employ different algorithms for calculating relative sequence weights (HMMER uses tree-based weighting, while SAM implements an unpublished relative entropy-based method), though studies have shown that their relative weighting schemes perform equivalently [32]. However, they differ significantly in how they calculate the total weight (effective sequence number), which governs the balance between observed sequence counts and prior probabilities.
Technical Implementation: HMMER is open-source and operates under the GNU General Public License, while SAM is free for academic use but not open source [33]. This distinction has practical implications for customization and integration into larger analysis pipelines. More recently, PyHMMER has emerged as a Python binding to HMMER, providing greater flexibility for integration with modern bioinformatics workflows and enabling direct manipulation of HMM objects within Python scripts [35].

Experimental Protocols for TIR-NBS-LRR Gene Identification

Standard Workflow for Genome-Wide Domain Screening

The following protocol outlines a comprehensive workflow for identifying and characterizing TIR-NBS-LRR genes using HMMER and Pfam domain models, synthesized from multiple published studies [13] [3] [34]:

Step 1: Domain Model Acquisition

Retrieve the NBS (NB-ARC) HMM profile (PF00931) from the Pfam database (http://pfam.xfam.org/) [3] [34]. This conserved domain serves as the foundational model for identifying NBS-LRR genes.
Additionally, obtain HMM profiles for associated domains: LRR (multiple accessions including PF00560, PF07723, PF12799, PF13516, PF13855, PF14580), TIR (PF01582), and RPW8 (PF05659) for comprehensive classification [6].

Step 2: Initial HMMER Search

Execute an HMMER search (hmmsearch or jackhmmer) against the target organism's proteome using the NB-ARC domain profile. Studies typically employ E-value cutoffs ranging from < 1×10⁻²⁰ for high stringency [3] to < 1×10⁻² for broader identification [6].
Alternatively, perform a BLASTP search using NB-ARC seed sequences from Pfam as queries with an E-value threshold of ≤ 1×10⁻² as a complementary approach [6].
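Whichever cutoff is chosen, the filtering step itself is simple. The following is a minimal sketch, assuming the search was run as `hmmsearch --domtblout hits.domtbl PF00931.hmm proteome.fa`; it parses the per-domain table with the Python standard library and keeps targets whose full-sequence E-value (column 7) falls below the cutoff. The function name, file names, and sample rows are hypothetical.

```python
from typing import Dict


def parse_domtblout(text: str, max_evalue: float) -> Dict[str, float]:
    """Return {target_name: full-sequence E-value} for targets passing the cutoff.

    Assumes the whitespace-delimited layout of `hmmsearch --domtblout`:
    column 1 is the target name and column 7 the full-sequence E-value.
    """
    hits: Dict[str, float] = {}
    for line in text.splitlines():
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and comment/header lines
        # The 23rd field (target description) may contain spaces, so cap the split.
        fields = line.split(maxsplit=22)
        target, evalue = fields[0], float(fields[6])
        if evalue < max_evalue:
            hits[target] = evalue  # one target may span several domain rows
    return hits


# Made-up rows in domtblout layout (22 columns plus a free-text description).
SAMPLE = """\
# target name accession tlen query name accession qlen E-value ...
Gene1 - 910 NB-ARC PF00931.25 252 1.2e-45 150.3 0.1 1 1 3.0e-48 2.0e-45 149.9 0.1 2 250 160 410 158 412 0.97 putative TNL
Gene2 - 640 NB-ARC PF00931.25 252 5.0e-05 22.1 0.0 1 1 8.0e-08 5.0e-05 21.8 0.0 40 120 90 170 85 175 0.88 partial NBS
Gene3 - 300 NB-ARC PF00931.25 252 0.5 3.2 0.2 1 1 0.9 0.5 3.0 0.2 10 30 5 25 1 40 0.70 unrelated protein
"""
```

Running the same table at both cutoffs mirrors the discovery-versus-stringency choice described above: Gene2 survives the 1×10⁻² cutoff but not the 1×10⁻²⁰ one.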

Step 3: Candidate Sequence Verification

Extract all candidate sequences identified in the initial search and subject them to domain verification using the Pfam database, SMART tool (http://smart.embl-heidelberg.de/), and NCBI's Conserved Domain Database (CDD) [3] [34].
Remove duplicate entries and filter for sequences containing complete NBS domains to ensure analysis of functionally relevant genes [26].
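A minimal sketch of this filter follows. It assumes that a "duplicate" is an identical protein sequence (for example, alternative transcripts encoding the same protein) and that a "complete" NBS domain is one whose alignment covers at least 80% of the NB-ARC profile; both definitions and the `NbsHit` record are illustrative assumptions, not thresholds taken from the cited studies. The `hmm_from`/`hmm_to` values correspond to the "hmm from"/"hmm to" columns of hmmsearch domain tables.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class NbsHit:
    gene_id: str
    protein: str   # full protein sequence
    hmm_from: int  # first NB-ARC profile position covered by the alignment
    hmm_to: int    # last NB-ARC profile position covered by the alignment
    hmm_len: int   # length of the NB-ARC profile


def filter_candidates(hits: List[NbsHit], min_coverage: float = 0.8) -> List[NbsHit]:
    """Drop fragmentary NBS domains, then collapse identical protein sequences."""
    unique: Dict[str, NbsHit] = {}
    for hit in hits:
        coverage = (hit.hmm_to - hit.hmm_from + 1) / hit.hmm_len
        if coverage < min_coverage:
            continue  # treated as an incomplete NBS domain
        unique.setdefault(hit.protein, hit)  # keep the first ID seen per protein
    return list(unique.values())
```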

Step 4: Classification and Subfamily Determination

Classify verified NBS-LRR genes into subfamilies based on their domain architecture:
- TNL: TIR-NBS-LRR
- CNL: CC-NBS-LRR
- RNL: RPW8-NBS-LRR
- NL: NBS-LRR (no TIR or CC domain)
- TN: TIR-NBS (no LRR)
- CN: CC-NBS (no LRR)
- N: NBS-only [3] [6]
Use COILS server (with threshold 0.1) or similar tools to identify coiled-coil domains that may not be detected by HMM profiles alone [6].
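The decision rules above can be written as a small function over the set of domains detected for each protein. This is a sketch under stated assumptions: the domain labels are hypothetical strings produced by earlier steps, TIR is given precedence over RPW8 and CC when several N-terminal domains are reported, and the "RN" label for RPW8-NBS proteins lacking an LRR is not part of the scheme listed above.

```python
from typing import Set


def classify_nlr(domains: Set[str]) -> str:
    """Map detected domain labels to an NBS-LRR subfamily code."""
    if "NB-ARC" not in domains:
        return "non-NLR"  # no NBS domain, so not an NBS-LRR gene
    has_lrr = "LRR" in domains
    if "TIR" in domains:  # assumed precedence: TIR > RPW8 > CC
        return "TNL" if has_lrr else "TN"
    if "RPW8" in domains:
        return "RNL" if has_lrr else "RN"  # "RN" is an assumed label
    if "CC" in domains:
        return "CNL" if has_lrr else "CN"
    return "NL" if has_lrr else "N"
```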

Step 5: Advanced Characterization

Conduct phylogenetic analysis using maximum likelihood methods (e.g., IQ-TREE, MEGA) on aligned NB-ARC domain sequences to elucidate evolutionary relationships [3] [6].
Perform gene structure analysis examining exon-intron organization using genomic DNA sequences and annotation files [3].
Analyze cis-regulatory elements in promoter regions (typically 1.5 kb upstream of start codons) using databases such as PlantCARE [3].
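Extracting the promoter window depends on strand: for a minus-strand gene the upstream region lies at higher coordinates and must be reverse-complemented. The sketch below assumes 0-based, half-open gene coordinates (a GFF feature spanning s..e becomes start = s - 1, end = e) and simply truncates windows at the sequence ends; the function name is hypothetical.

```python
_COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def upstream_region(chrom_seq: str, start: int, end: int, strand: str,
                    length: int = 1500) -> str:
    """Return up to `length` bp immediately upstream of a gene, read 5'->3'."""
    if strand == "+":
        # The start codon sits at `start`; the promoter lies to its left.
        return chrom_seq[max(0, start - length):start]
    if strand == "-":
        # The start codon sits at the right end; take the sequence to its right
        # and reverse-complement it so it reads in the gene's orientation.
        return chrom_seq[end:end + length].translate(_COMPLEMENT)[::-1]
    raise ValueError(f"unknown strand: {strand!r}")
```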

Research Reagent Solutions for Domain Analysis

Table 2: Essential Bioinformatics Tools and Resources for NBS-LRR Gene Identification

Tool/Resource	Function	Application in NBS-LRR Research
HMMER Suite	Profile HMM construction and searching	Primary tool for identifying NBS-encoding genes using PF00931 model [13] [3] [34]
Pfam Database	Curated collection of protein domain models	Source of NB-ARC (PF00931) and related domain HMM profiles [3] [34]
SMART	Protein domain annotation	Validation of identified domains and detection of additional structural features [34] [6]
NCBI CDD	Conserved domain identification	Independent verification of NBS and associated domains [3]
COILS Server	Coiled-coil domain prediction	Detection of CC domains in non-TIR-NBS-LRR genes [6]
MEME Suite	Conserved motif discovery	Identification of novel motifs beyond canonical domains [3]
PlantCARE	cis-regulatory element analysis	Detection of regulatory elements in promoter regions of NBS-LRR genes [3]

Application in TIR-NBS-LRR Research: Case Studies

The application of HMMER in TIR-NBS-LRR research has yielded significant insights into the evolution and distribution of these important disease resistance genes across plant species. Comparative genomic studies using these tools have revealed remarkable variation in the size and composition of NBS-LRR gene families, reflecting different evolutionary strategies for pathogen resistance.

In the tung tree species (Vernicia fordii and Vernicia montana), HMMER-based analysis identified 90 and 149 NBS-LRR genes respectively, with complete absence of TIR-domain containing NBS-LRRs in the Fusarium wilt-susceptible V. fordii compared to 12 TNLs in the resistant V. montana [13]. This striking difference in domain architecture distribution between closely related species provides compelling evidence for the role of specific NBS-LRR subtypes in disease resistance. Similarly, research across six Fragaria species identified 1,134 NBS-LRR genes comprising 184 gene families, with phylogenetic analyses revealing that lineage-specific duplications occurred before species divergence [34].

These large-scale comparative analyses consistently demonstrate the value of HMMER-based approaches for elucidating evolutionary patterns in disease resistance gene families. The ability to accurately identify and classify TIR-NBS-LRR genes has proven particularly valuable for understanding plant immunity mechanisms and identifying candidate genes for marker-assisted breeding programs aimed at enhancing disease resistance in crop species [13] [6].

Technical Implementation and Best Practices

HMMER Implementation and Parallelization

Recent advancements in HMMER implementation have significantly improved its utility for large-scale genomic analyses. The development of PyHMMER, a Python library binding to HMMER via Cython, provides enhanced flexibility for integration with modern bioinformatics workflows [35]. This implementation allows researchers to create queries directly from Python code, launch searches, and access results without file I/O bottlenecks, while also providing access to previously unavailable statistics such as uncorrected P-values.

A critical improvement in PyHMMER concerns its parallelization model, which performs substantially better than that of native HMMER. Benchmarking tests reveal that PyHMMER achieves approximately 96% parallelization efficiency compared with only 35% for native HMMER, resulting in dramatic reductions in processing time [35]. For example, when annotating a large protein set on a six-core machine, PyHMMER completed the task in 27 hours compared with the 97 hours required by native HMMER, a 72% reduction in runtime [35]. This enhanced efficiency makes large-scale comparative genomics projects substantially more feasible.
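The runtime figures can be checked directly. The efficiency expression is the conventional definition of parallel efficiency (single-core time T_1 divided by p times the p-core time T_p), assumed here because the benchmark's exact definition is not restated in this section.

```latex
\text{reduction} = \frac{T_{\mathrm{HMMER}} - T_{\mathrm{PyHMMER}}}{T_{\mathrm{HMMER}}} = \frac{97 - 27}{97} \approx 0.72,
\qquad
\text{speedup} = \frac{97}{27} \approx 3.6,
\qquad
E_p = \frac{T_1}{p\,T_p}
```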

Optimization Strategies for Domain Screening

Based on published methodologies and performance characteristics, researchers can implement several strategies to optimize genome-wide domain screens:

Parameter Selection: For initial discovery phases, use less stringent E-value cutoffs (e.g., < 1×10⁻²) followed by progressive filtering, while conservative thresholds (e.g., < 1×10⁻²⁰) are more appropriate for validation studies [3] [6].
Domain Verification: Always employ multiple complementary tools (Pfam, SMART, CDD) for domain verification to minimize false positives and negatives resulting from the limitations of any single method [3] [34].
Classification Rigor: Implement both sequence-based (HMMER) and structure-based (COILS) approaches for classifying NBS-LRR subfamilies, as CC domains may not always be detected by profile HMMs alone [6].
Pipeline Integration: Consider utilizing PyHMMER rather than command-line HMMER for large-scale analyses or when integrating domain searches into custom bioinformatics pipelines, taking advantage of its improved parallelization and programmability [35].
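The parameter-selection and domain-verification strategies can be combined into a simple tiering rule. The sketch below is hypothetical: it assumes each candidate has a best NB-ARC E-value and a set of tools (for example Pfam, SMART, CDD) that independently confirmed the domain, and the two-tool minimum is an illustrative choice rather than a published standard.

```python
from typing import Dict, Set


def tier_candidates(
    evalues: Dict[str, float],
    confirmations: Dict[str, Set[str]],
    discovery_cutoff: float = 1e-2,
    strict_cutoff: float = 1e-20,
    min_tools: int = 2,
) -> Dict[str, str]:
    """Label each candidate by E-value stringency and multi-tool domain support."""
    tiers: Dict[str, str] = {}
    for gene, evalue in evalues.items():
        if evalue >= discovery_cutoff:
            continue  # fails even the permissive discovery cutoff
        if len(confirmations.get(gene, set())) < min_tools:
            tiers[gene] = "unverified"  # too few independent domain calls
        elif evalue < strict_cutoff:
            tiers[gene] = "high-confidence"
        else:
            tiers[gene] = "candidate"
    return tiers
```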

These optimization strategies, combined with the robust experimental protocols outlined in this guide, will enable researchers to conduct comprehensive and accurate genome-wide screens for TIR-NBS-LRR genes and other important protein domains across diverse biological systems.

This guide provides a comparative analysis of two prominent bioinformatics tools, NLR-Annotator and RGAugury, for identifying nucleotide-binding leucine-rich repeat (NLR) and broader Resistance Gene Analog (RGA) families in plant genomes. While both tools are essential for mining plant immune receptors, they differ fundamentally in scope, methodology, and application. NLR-Annotator specializes in de novo identification of NLR genes directly from genomic sequences, independent of pre-existing gene annotations, making it ideal for discovering novel or unannotated NLRs. In contrast, RGAugury offers a comprehensive pipeline for predicting multiple RGA families, including not only NLRs but also membrane-associated receptor-like kinases (RLKs) and receptor-like proteins (RLPs), providing a broader systems-level view of plant immunity components. Performance benchmarking against the curated RefPlantNLR dataset reveals that NLR-Annotator demonstrates high sensitivity for TNL and CNL subfamilies, whereas RGAugury provides a more versatile platform for holistic resistance gene annotation. Tool selection should therefore be guided by research objectives: NLR-Annotator for deep, annotation-independent NLR discovery, and RGAugury for complete RGA cataloging and classification.

NLR-Annotator: De Novo NLR Discovery

NLR-Annotator is designed for de novo genome-wide identification of NLR-encoding genes without relying on pre-annotated gene models, which often miss or fragment these genes due to their low, stress-induced expression and complex genomic architecture [36] [37]. Its core methodology involves dissecting genomic sequences into 20-kb fragments, translating them in all six reading frames, and screening for NB-ARC-associated motifs. Detected motifs serve as seeds to explore flanking sequences for additional NLR-associated domains (e.g., TIR, CC, LRR), finally assembling complete NLR loci [36]. This approach effectively annotates both functional genes and pseudogenized NLR traces, providing a complete repertoire of NLR loci in a genome [37].
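The fragment-and-translate strategy can be illustrated in a few lines. This is a sketch, not NLR-Annotator's code (the tool scores MEME-derived NLR motifs); here a regular expression for the Walker A/P-loop consensus G-x(4)-G-K-[ST] stands in for those motif models, and the 5-kb overlap between 20-kb fragments is an assumption. Overlapping windows ensure a motif spanning a fragment boundary is still seen, and the set removes hits reported twice from the overlap.

```python
import re
from typing import Iterator, List, Tuple

# Standard genetic code (NCBI table 1), codons enumerated in TCAG order.
_BASES = "TCAG"
_AAS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: _AAS[16 * i + 4 * j + k]
    for i, a in enumerate(_BASES)
    for j, b in enumerate(_BASES)
    for k, c in enumerate(_BASES)
}
_COMPLEMENT = str.maketrans("ACGTN", "TGCAN")

# Walker A / P-loop consensus; a stand-in for NLR-Annotator's motif models.
P_LOOP = re.compile(r"G.{4}GK[ST]")


def reverse_complement(dna: str) -> str:
    return dna.upper().translate(_COMPLEMENT)[::-1]


def translate(dna: str) -> str:
    """Translate codon by codon; codons with ambiguous bases become 'X'."""
    dna = dna.upper()
    return "".join(CODON_TABLE.get(dna[i:i + 3], "X") for i in range(0, len(dna) - 2, 3))


def fragments(genome: str, size: int = 20_000, overlap: int = 5_000) -> Iterator[Tuple[int, str]]:
    """Yield (start, fragment) windows of `size` bp overlapping by `overlap` bp."""
    step = size - overlap
    for start in range(0, max(len(genome) - overlap, 1), step):
        yield start, genome[start:start + size]


def scan_for_ploop(genome: str, size: int = 20_000, overlap: int = 5_000) -> List[Tuple[int, str, str]]:
    """Return sorted (genomic_start, strand, motif) hits from six-frame translations."""
    hits = set()
    for frag_start, frag in fragments(genome, size, overlap):
        for strand, seq in (("+", frag.upper()), ("-", reverse_complement(frag))):
            for offset in range(3):
                for m in P_LOOP.finditer(translate(seq[offset:])):
                    nt_start = offset + 3 * m.start()  # position within `seq`
                    nt_len = 3 * len(m.group())
                    if strand == "+":
                        genomic = frag_start + nt_start
                    else:  # map back from the reverse complement to forward coordinates
                        genomic = frag_start + len(frag) - nt_start - nt_len
                    hits.add((genomic, strand, m.group()))  # set drops overlap duplicates
    return sorted(hits)
```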

RGAugury: Comprehensive RGA Prediction

RGAugury is an integrative pipeline for large-scale, genome-wide prediction and classification of various RGA families [38]. It automates the identification of RGA-related domains and motifs (NB-ARC, LRR, TM, STTK, LysM, CC, and TIR) and classifies candidates into four major families: NBS-encoding proteins, TM-CC proteins, RLKs, and RLPs [38]. A key feature is its initial filtering step, which uses BLASTP against a custom RGA database (RGAdb) to remove a significant portion of non-RGA genes, streamlining downstream domain detection and improving computational efficiency [38].

Performance Comparison and Benchmarking Data

Independent benchmarking against the RefPlantNLR dataset, a comprehensive collection of 481 experimentally validated NLRs from 31 genera of flowering plants, provides critical performance insights [39]. The following table summarizes key comparative metrics for NLR identification.

Table 1: Performance Benchmarking of NLR-Annotator and RGAugury

Feature	NLR-Annotator	RGAugury
Primary Scope	NLR genes (TNL, CNL, RNL, NL) [36]	Multiple RGA families (NLR, RLK, RLP, TM-CC) [38]
Input Data	Genomic sequence, Transcript sequences [40]	Protein sequences, Genomic sequence [38] [40]
Annotation Method	De novo motif-based (independent of gene models) [36]	Domain-based, often relies on pre-annotated protein sequences [38] [39]
Key Strength	Identifies non-canonical, unannotated, or pseudogenized NLRs [37]	Comprehensive identification of the entire RGA repertoire [38]
Reported Limitation	May produce inconsistent domain architectures compared to curated references [39]	Performance can be affected by the quality of initial gene annotation [39]
Typical Output	NLR loci (genomic coordinates), GFF annotation [40]	RGA classification, genome position, GFF annotation [38] [40]

Further analysis of benchmarking results reveals that while both tools can retrieve a majority of known NLRs, they often produce domain architectures inconsistent with the manually curated RefPlantNLR annotation [39]. This highlights the importance of manual curation when precise domain architecture is critical. NLR-Annotator has demonstrated high sensitivity in identifying NLRs in well-assembled genomes, discovering 3,400 NLR loci and 1,560 complete NLRs in the wheat cultivar Chinese Spring [36] [37]. RGAugury has been validated on the Arabidopsis genome, successfully identifying 98.5% of reported NBS-encoding genes, 85.2% of RLPs, and 100% of RLKs [38].

Experimental Protocols for Tool Validation

Benchmarking Against RefPlantNLR

The RefPlantNLR dataset serves as a gold standard for validating NLR prediction tools [39]. The typical validation workflow involves:

Dataset Acquisition: Obtain the RefPlantNLR dataset (v.20210712_481), which includes 481 experimentally validated NLR sequences from 31 plant genera [39].
Tool Execution: Run the target tool (e.g., NLR-Annotator or RGAugury) on the genome sequences from which the RefPlantNLR entries were originally cloned.
Result Comparison: Compare the tool's output against the known RefPlantNLR entries. Metrics calculated include:
- Sensitivity: The proportion of known RefPlantNLRs correctly identified by the tool.
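- Precision: The proportion of the tool's predictions that are correct (true positives rather than false positives). This is sometimes loosely reported as specificity, although true specificity requires a defined set of true negatives.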
- Architectural Accuracy: The rate at which the tool correctly predicts the full domain architecture (e.g., TIR-NBS-LRR vs. CC-NBS-LRR) [39].
Curation and Analysis: Manually inspect discrepancies to understand the sources of error, such as fragmented gene models, unusual intron sizes, or divergent domain sequences [39].
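The three metrics can be computed from two mappings of gene identifiers to predicted architectures. This sketch assumes predictions have already been matched to RefPlantNLR entries by a shared identifier (in practice matching usually relies on sequence or genomic overlap) and, as an assumption, computes architectural accuracy over true positives only; it reports precision because no defined set of true negatives is available. The function name is hypothetical.

```python
from typing import Dict


def benchmark(reference: Dict[str, str], predicted: Dict[str, str]) -> Dict[str, float]:
    """Compare predictions with a reference; values are architectures such as 'TNL'."""
    true_pos = reference.keys() & predicted.keys()  # entries found by the tool
    sensitivity = len(true_pos) / len(reference) if reference else 0.0
    precision = len(true_pos) / len(predicted) if predicted else 0.0
    arch_ok = sum(reference[g] == predicted[g] for g in true_pos)
    arch_acc = arch_ok / len(true_pos) if true_pos else 0.0
    return {"sensitivity": sensitivity, "precision": precision,
            "architecture_accuracy": arch_acc}
```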

Genome-Wide NLR Identification in Wheat using NLR-Annotator

Steuernagel et al. (2020) detailed the application of NLR-Annotator for a comprehensive analysis of the wheat NLR repertoire [36]:

Input: The high-quality genome assembly of hexaploid wheat (Triticum aestivum) cultivar Chinese Spring (IWGSC RefSeq v1.0) [36].
Processing: The entire genome sequence was processed through the NLR-Annotator pipeline, which dissected the sequence and identified fragments containing NLR-associated motifs.
Locus Definition: The tool integrated motif data to define NLR loci, precisely mapping the boundaries of NB-ARC domains and associated LRR, TIR, or CC regions [36].
Downstream Analysis: The resulting 3,400 NLR loci were analyzed for genomic distribution (noting telomeric clustering), phylogenetic relationships, presence of integrated domains, and expression profiles under biotic stress [36] [37].

Diagram 1: NLR-Annotator workflow for wheat genome analysis.

Validation of RGAugury on the Arabidopsis Genome

The Arabidopsis thaliana genome, with its well-annotated NLR, RLP, and RLK genes, provides a robust system for validating RGAugury's performance [38]:

Input Preparation: The annotated protein sequences of Arabidopsis were used as input for the RGAugury pipeline [38].
Pipeline Execution: The pipeline executed its two main steps:
- Initial Filtering: Protein sequences were aligned via BLASTP against the custom RGAdb to remove non-RGA candidates.
- Domain/Motif Detection: The remaining candidates were analyzed for specific domains and motifs using integrated tools like nCoils (CC), Phobius (TM), and Pfam scans (NB-ARC, TIR, LRR) [38].
Classification and Validation: Candidates were classified into NBS-encoding, RLK, RLP, or TM-CC families based on their domain composition. The predictions were compared against the known Arabidopsis RGA complement, achieving high validation rates of 98.5% for NBS-encoding genes, 100% for RLKs, and 85.2% for RLPs [38].

Diagram 2: RGAugury validation workflow on Arabidopsis.
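The classification step can be approximated with a few rules over the detected domains. These simplified rules are assumptions for illustration and not RGAugury's exact decision logic; STTK denotes a serine/threonine/tyrosine kinase domain and TM a transmembrane segment, as in the domain list above, and the function name is hypothetical.

```python
from typing import Set


def classify_rga(domains: Set[str]) -> str:
    """Assign a coarse RGA family from detected domain labels (simplified rules)."""
    if "NB-ARC" in domains:
        return "NBS-encoding"  # intracellular NLR-type receptor
    if "TM" not in domains:
        return "unclassified"  # the remaining families are membrane-anchored
    if "STTK" in domains:
        return "RLK"  # transmembrane receptor with a kinase domain
    if domains & {"LRR", "LysM"}:
        return "RLP"  # extracellular receptor domain but no kinase
    if "CC" in domains:
        return "TM-CC"
    return "unclassified"
```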

Table 2: Key Resources for NLR and RGA Research

Resource Name	Type	Function in Research	Relevance to Tool Operation
RefPlantNLR [39]	Reference Dataset	A curated collection of 481 experimentally validated plant NLRs; used for benchmarking and defining canonical features.	Essential for validating and comparing the prediction accuracy of NLR-Annotator, RGAugury, and other tools.
Pfam [3] [7]	Domain Database	Provides Hidden Markov Models (HMMs) for protein domains (e.g., NB-ARC PF00931, TIR PF01582).	Used by RGAugury and HMM-based searches for core domain identification.
NCBI CDD [7] [6]	Domain Database	The Conserved Domain Database used for verifying the presence and completeness of specific domains.	Often used as a secondary verification step in genome-wide studies.
InterProScan [39] [41]	Integrated Tool	Scans protein sequences against multiple databases to predict domains and functional sites.	Used by pipelines like NLGenomeSweeper and NLRtracker for comprehensive domain annotation.
RGAdb [38]	Custom Database	A database of known disease resistance-related sequences used for initial BLAST filtering.	Core component of the RGAugury pipeline for efficiently reducing the search space.
nCoils [38]	Prediction Tool	Predicts the presence of coiled-coil (CC) domains in protein sequences.	Integrated into RGAugury for identifying CC domains in CNL-type NLRs and other RGAs.
Phobius [38]	Prediction Tool	Predicts transmembrane (TM) topology and signal peptides.	Integrated into RGAugury for identifying TM domains in RLKs, RLPs, and TM-CC proteins.

NLR-Annotator and RGAugury represent two powerful but philosophically distinct approaches to mining plant immune system genes. The choice between them depends heavily on the specific research goals. For projects focused exclusively on the intracellular NLR repertoire, particularly in poorly annotated genomes or when searching for non-canonical NLRs, NLR-Annotator is the better fit because of its sensitive, annotation-independent approach. For studies aiming to characterize the entire spectrum of cell-surface and intracellular immune receptors, RGAugury offers an integrated solution that covers NBS-encoding proteins, RLKs, RLPs, and TM-CC proteins in a single pipeline.

The field continues to evolve rapidly. The recent development of the RefPlantNLR dataset has been instrumental in standardizing tool assessments [39]. Newer tools like NLRtracker (which uses RefPlantNLR features for annotation) and Resistify (which uses optimized HMMs and machine learning to avoid reliance on InterProScan) are emerging, promising even greater accuracy and ease of use [39] [40]. As long-read sequencing improves the quality of genome assemblies, these robust annotation pipelines will become increasingly critical for unlocking the genetic basis of disease resistance across the plant kingdom.

Orthogroup analysis represents a fundamental methodology in comparative genomics, enabling the classification of gene families into monophyletic groups descended from a single gene in the last common ancestor of the species being studied. This approach has proven particularly valuable for investigating the evolution of large, diverse gene families such as those encoding TIR-NBS-LRR (TNL) and non-TIR-NBS-LRR (nTNL) plant immune receptors. By clustering genes into orthogroups, researchers can trace evolutionary trajectories, identify lineage-specific expansions, and delineate functional conservation across taxa. The application of orthogroup analysis to NBS-LRR genes has revealed fundamental insights into plant immunity evolution, from ancestral lineages to modern angiosperms, providing a framework for understanding how plants maintain diverse repertoires of resistance genes to counter rapidly evolving pathogens.

The NBS-LRR gene family constitutes one of the largest and most variable plant protein families, with significant implications for disease resistance breeding. Recent studies have documented extensive variation in NBS-LRR gene counts across plant species, ranging from just 2 in the lycophyte Selaginella moellendorffii to over 2,000 in hexaploid wheat (Triticum aestivum) [10]. This dramatic expansion in flowering plants reflects continuous evolutionary innovation driven by host-pathogen coevolution. Orthogroup analysis has been instrumental in deciphering these complex evolutionary patterns, revealing both conserved core orthogroups maintained across diverse species and lineage-specific innovations that underlie adaptation to distinct pathogenic challenges.

Comparative Genomic Distribution of NBS-LRR Genes

Diversity of NBS-LRR Gene Architectures

The NBS-LRR gene family exhibits remarkable structural diversity, with distinct domain architectures defining major functional classes. Based on conserved N-terminal domains, NBS-LRR genes are primarily classified into TIR-NBS-LRR (TNL), CC-NBS-LRR (CNL), and RPW8-NBS-LRR (RNL) subfamilies. Genome-wide studies across multiple plant species have revealed significant variation in the representation of these subfamilies, with important implications for disease resistance mechanisms.

Table 1: Distribution of NBS-LRR Gene Subfamilies Across Plant Species

Plant Species	Total NBS-LRR Genes	TNL Genes	CNL/nTNL Genes	RNL Genes	Reference
Capsicum annuum (pepper)	252	4 (1.6%)	248 (98.4%)	Not specified	[2]
Vernicia fordii (tung tree)	90	0 (0%)	90 (100%)	0 (0%)	[24]
Vernicia montana (tung tree)	149	12 (8.1%)	137 (91.9%)	0 (0%)	[24]
Nicotiana tabacum (tobacco)	603	73 (12.1%)	530 (87.9%)	Not specified	[7]
Wild strawberry species (Fragaria spp.)	143-287 per species	~30-40%	~60-70%	Included in nTNL	[6]

The distribution of NBS-LRR subfamilies follows distinct evolutionary patterns. TNL genes are completely absent in monocotyledons and have been lost independently in some eudicot lineages, including Vernicia fordii and Sesamum indicum [24]. In contrast, nTNL genes (primarily CNLs) appear to represent the dominant NBS-LRR class across most angiosperms, comprising over 50% of NLR genes in all eight wild strawberry species examined and reaching 98.4% in pepper [6] [2]. This skewed distribution suggests distinct evolutionary pressures acting on different NBS-LRR subfamilies, potentially reflecting their specialized roles in plant immunity.

Genomic Organization and Gene Clustering

NBS-LRR genes typically display non-random genomic distributions, often forming clusters of tandemly duplicated genes. These clusters represent hotspots of rapid evolution and generate significant diversity through unequal crossing over and gene conversion. Comparative genomics has revealed that gene clustering is a predominant feature of NBS-LRR genomic organization across plant species.

Table 2: Cluster Analysis of NBS-LRR Genes in Plant Genomes

Plant Species	Total NBS-LRR Genes	Genes in Clusters	Percentage in Clusters	Number of Clusters	Reference
Capsicum annuum (pepper)	252	136	54%	47	[2]
Vernicia fordii (tung tree)	90	Not specified	Non-random distribution	Not specified	[24]
Vernicia montana (tung tree)	149	Not specified	Non-random distribution	Not specified	[24]
Nicotiana tabacum (tobacco)	603	Not specified	Expanded via WGD and tandem duplication	Not specified	[7]

In pepper, 54% of NBS-LRR genes are organized into 47 gene clusters, driven primarily by tandem duplications and genomic rearrangements [2]. Similarly, synteny analysis between resistant (Vernicia montana) and susceptible (Vernicia fordii) tung tree species revealed non-random distributions of NBS-LRR genes across chromosomes, with both species showing enrichment in specific genomic regions [24]. This clustered organization facilitates the generation of diversity through mechanisms such as ectopic recombination and domain swapping, enabling plants to rapidly evolve new pathogen recognition specificities.

Orthogroup Analysis Methodologies

Identification and Classification of NBS-LRR Genes

Comprehensive identification of NBS-LRR genes represents the foundational step in orthogroup analysis. The standard workflow employs a combination of homology-based searches and domain prediction algorithms to identify candidate genes and classify them into subfamilies based on domain architecture.

Hidden Markov Model Searches: The initial identification typically begins with HMMER searches of proteome sequences using the NB-ARC domain profile (PF00931) from the Pfam database. This approach, employed in studies of wild strawberries, pepper, and Nicotiana species, aims at comprehensive identification of NBS-containing genes [6] [2] [7]. The permissive e-value cutoff (typically < 1.0) favors sensitivity; the false positives it admits are removed during the domain-architecture verification that follows.
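As a minimal sketch of the filtering step, the Python function below reads the per-domain table that `hmmsearch --domtblout` writes and keeps proteins whose full-sequence E-value falls below the cutoff. It assumes the standard whitespace-delimited domtblout layout (target name in column 1, full-sequence E-value in column 7); the function name `parse_domtblout` is hypothetical and not part of HMMER.

```python
from typing import Dict, Iterable


def parse_domtblout(lines: Iterable[str], max_evalue: float = 1.0) -> Dict[str, float]:
    """Return {protein_id: full-sequence E-value} for hits below max_evalue.

    Assumes output of `hmmsearch --domtblout` run with the NB-ARC profile
    (PF00931) against a proteome; the default mirrors the permissive
    e-value < 1.0 cutoff described in the text.
    """
    hits: Dict[str, float] = {}
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and HMMER comment/header lines
        fields = line.split()
        target = fields[0]               # column 1: target (protein) name
        full_evalue = float(fields[6])   # column 7: full-sequence E-value
        if full_evalue < max_evalue:
            # a protein with several NB-ARC hits appears on several rows;
            # keep the smallest E-value seen for it
            hits[target] = min(full_evalue, hits.get(target, float("inf")))
    return hits
```

The surviving IDs are only candidates; they still need the TIR/CC/RPW8 and LRR checks described next.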

Domain Architecture Analysis: Following initial identification, candidate genes undergo comprehensive domain architecture characterization using multiple resources:

Pfam domains for TIR (PF01582), LRR (multiple accessions including PF00560, PF07723, PF07725, PF12799, PF13306), and RPW8 (PF05659) domains
NCBI Conserved Domain Database (CDD) for additional validation
COILS program or similar tools with a threshold of 0.1 for predicting coiled-coil domains
SMART database for additional domain verification

This multi-step domain analysis enables precise classification of NBS-LRR genes into subfamilies (TNL, CNL, RNL) and structural variants (N, NL, NLN, etc.) based on their domain compositions [2] [7].
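The classification rule can be sketched as follows, assuming domain hits have already been merged into an ordered, N- to C-terminal list of labels (`TIR`, `CC`, `RPW8`, `NBS`, `LRR`). The one-letter codes, the `classify` helper, and the precedence RPW8 > TIR > CC are illustrative assumptions rather than a published pipeline.

```python
from itertools import groupby
from typing import List, Tuple

# Hypothetical one-letter codes for upstream domain labels
CODES = {"TIR": "T", "CC": "C", "RPW8": "R", "NBS": "N", "LRR": "L"}


def classify(domains: List[str]) -> Tuple[str, str]:
    """Return (subfamily, architecture) for an N- to C-terminal domain list."""
    letters = [CODES[d] for d in domains if d in CODES]
    # collapse runs, e.g. several adjacent LRR hits become a single "L"
    arch = "".join(k for k, _ in groupby(letters))
    if "N" not in arch:
        return ("non-NLR", arch)
    if "R" in arch:
        subfamily = "RNL"
    elif "T" in arch:
        subfamily = "TNL"
    elif "C" in arch:
        subfamily = "CNL"
    else:
        subfamily = "unclassified"  # NBS without a recognised N-terminal domain
    return (subfamily, arch)
```

Because only adjacent repeats are collapsed, a protein annotated as NBS-LRR-NBS is reported with the NLN architecture, while several consecutive LRR hits still yield a single L.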

Chromosomal Mapping and Cluster Definition: Genes are mapped to chromosomes based on genomic coordinates, and clusters are typically defined as genomic regions where at least two NBS-LRR genes are located within 200 kb and separated by no more than eight non-NLR genes [6]. This operational definition facilitates comparative analysis of cluster organization across species.
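This cluster definition translates into a single chaining pass over genes sorted by position. The sketch assumes a list of all annotated genes with coordinates and an NLR flag, and measures the gap from the end of one NLR to the start of the next; studies may measure distance differently, and `Gene` and `call_clusters` are hypothetical names.

```python
from typing import List, NamedTuple, Optional


class Gene(NamedTuple):
    chrom: str
    start: int
    end: int
    name: str
    is_nlr: bool


def call_clusters(genes: List[Gene], max_gap: int = 200_000,
                  max_between: int = 8) -> List[List[str]]:
    """Chain neighbouring NLRs into clusters; keep clusters with >= 2 NLRs."""
    clusters: List[List[str]] = []
    current: List[str] = []
    last_nlr: Optional[Gene] = None   # most recent NLR on this chromosome
    non_nlr_since = 0                 # non-NLR genes seen since last_nlr
    for g in sorted(genes, key=lambda x: (x.chrom, x.start)):
        if last_nlr is not None and g.chrom != last_nlr.chrom:
            # moved to a new chromosome: close any open cluster
            if len(current) >= 2:
                clusters.append(current)
            current, last_nlr, non_nlr_since = [], None, 0
        if not g.is_nlr:
            non_nlr_since += 1
            continue
        linked = (last_nlr is not None
                  and g.start - last_nlr.end <= max_gap
                  and non_nlr_since <= max_between)
        if linked:
            current.append(g.name)
        else:
            if len(current) >= 2:
                clusters.append(current)
            current = [g.name]
        last_nlr, non_nlr_since = g, 0
    if len(current) >= 2:
        clusters.append(current)
    return clusters
```

Chaining is single-linkage: each NLR only needs to satisfy both criteria relative to its nearest upstream NLR, so a cluster can span more than 200 kb in total.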

Figure 1: Orthogroup analysis workflow for NBS-LRR genes, integrating identification, classification, phylogenetic clustering, and evolutionary analysis.

Orthogroup Construction and Phylogenetic Analysis

Orthogroup construction employs phylogenetic clustering algorithms to group genes into families descended from a single ancestral gene, enabling comparative analysis across species.

Multiple Sequence Alignment: Orthogroup analysis typically begins with multiple sequence alignment of NBS domain sequences using tools such as MAFFT v7 with default parameters [6]. The resulting alignments are often trimmed using TrimAl to remove poorly aligned regions and improve phylogenetic signal [6].
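To illustrate what column trimming does, the function below drops alignment columns whose gap fraction exceeds a threshold. It is a simplified stand-in, not trimAl's algorithm (trimAl's automated modes choose thresholds heuristically), and the 0.2 default is an arbitrary assumption.

```python
from typing import Dict


def trim_gappy_columns(aln: Dict[str, str], max_gap_fraction: float = 0.2) -> Dict[str, str]:
    """Drop columns in which more than max_gap_fraction of sequences have a gap ('-')."""
    seqs = list(aln.values())
    length = len(seqs[0])
    if any(len(s) != length for s in seqs):
        raise ValueError("aligned sequences must have equal length")
    keep = [i for i in range(length)
            if sum(s[i] == "-" for s in seqs) / len(seqs) <= max_gap_fraction]
    return {name: "".join(s[i] for i in keep) for name, s in aln.items()}
```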

Phylogenetic Reconstruction: Maximum likelihood phylogenies are inferred with programs such as IQ-TREE v1.6.12, with branch support assessed from 1000 ultrafast bootstrap replicates, or FastTreeMP [6] [10]. ModelFinder within IQ-TREE selects the optimal substitution model according to the Bayesian Information Criterion (BIC) [6]. The resulting trees resolve evolutionary relationships between NBS-LRR subfamilies and guide orthogroup assignment.
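The selection criterion can be illustrated with the standard formula BIC = k ln(n) - 2 ln L, where k is the number of free model parameters, n the number of alignment sites, and ln L the maximized log-likelihood; the model with the lowest BIC is preferred. The log-likelihoods and parameter counts in the test values are made up for illustration and exclude branch lengths, which are shared across the compared models.

```python
import math
from typing import Dict, Tuple


def bic(log_likelihood: float, n_params: int, n_sites: int) -> float:
    """Bayesian Information Criterion: k*ln(n) - 2*lnL (lower is better)."""
    return n_params * math.log(n_sites) - 2.0 * log_likelihood


def best_model(fits: Dict[str, Tuple[float, int]], n_sites: int) -> str:
    """fits maps model name -> (lnL, free parameters); return the lowest-BIC model."""
    return min(fits, key=lambda m: bic(fits[m][0], fits[m][1], n_sites))
```

The k ln(n) term penalizes extra parameters, so a richer model (for example one with empirical amino-acid frequencies) wins only if it improves ln L enough to offset that penalty.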

Orthogroup Clustering: OrthoFinder v2.5.1 implements a comprehensive pipeline for orthogroup inference, employing DIAMOND for fast sequence similarity searches and the MCL algorithm for clustering [10]. This approach identifies groups of orthologous and paralogous genes across multiple species, distinguishing core orthogroups (widely conserved across species) from lineage-specific orthogroups.
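OrthoFinder writes its orthogroup table (Orthogroups.tsv) with one column per species and comma-separated gene IDs in each cell. Assuming that layout, the sketch below labels an orthogroup core if every species contributes at least one gene and lineage-specific if only one does. These cut-offs are one simple choice; published studies define core sets in other ways (for example, presence in a set fraction of species).

```python
import csv
from typing import Dict, Iterable


def categorize_orthogroups(tsv_lines: Iterable[str]) -> Dict[str, str]:
    """Label orthogroups 'core', 'lineage-specific' or 'shared' by species occupancy."""
    reader = csv.reader(tsv_lines, delimiter="\t")
    header = next(reader)
    n_species = len(header) - 1   # first column holds the orthogroup ID
    labels: Dict[str, str] = {}
    for row in reader:
        og, cells = row[0], row[1:]
        present = sum(1 for c in cells if c.strip())  # species with >= 1 gene
        if present == n_species:
            labels[og] = "core"
        elif present == 1:
            labels[og] = "lineage-specific"
        else:
            labels[og] = "shared"
    return labels
```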

Evolutionary Analysis: Reconciliation of gene trees with species trees using software such as Notung enables inference of duplication and loss events [6]. MCScanX identifies syntenic blocks and categorizes duplication events into whole-genome duplication, tandem duplication, and segmental duplication [7]. Selection pressure analysis through Ka/Ks calculation differentiates between purifying selection (Ka/Ks < 1), neutral evolution (Ka/Ks ≈ 1), and positive selection (Ka/Ks > 1).
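The Ka/Ks categories can be expressed as a small classifier. In practice, departure from 1 is judged with a statistical test rather than a fixed band; the `tolerance` parameter here is a hypothetical simplification.

```python
def selection_regime(ka: float, ks: float, tolerance: float = 0.05) -> str:
    """Classify a gene pair by Ka/Ks; tolerance is an arbitrary band around 1."""
    if ks <= 0:
        return "undefined"  # Ka/Ks cannot be computed when Ks is zero
    ratio = ka / ks
    if ratio < 1 - tolerance:
        return "purifying"
    if ratio > 1 + tolerance:
        return "positive"
    return "neutral"
```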

Experimental Validation of Orthogroup Functions

Functional Characterization of NBS-LRR Genes

Orthogroup predictions require experimental validation to establish biological significance. Several approaches have been successfully employed to functionally characterize NBS-LRR genes identified through orthogroup analysis.

Expression Profiling: RNA-seq analysis under pathogen infection and stress conditions provides evidence for the involvement of specific orthogroups in defense responses. Studies in tung tree identified distinct expression patterns between resistant (Vernicia montana) and susceptible (Vernicia fordii) species, with the orthologous pair Vf11G0978-Vm019719 showing contrasting expression patterns correlated with resistance to Fusarium wilt [24]. Similarly, analysis of cotton NBS-LRR genes identified differential expression of specific orthogroups (OG2, OG6, OG15) in response to cotton leaf curl disease [10].

Virus-Induced Gene Silencing (VIGS): VIGS provides direct functional validation of NBS-LRR genes in disease resistance. Silencing of GaNBS (OG2) in resistant cotton demonstrated its essential role in limiting virus accumulation [10]. In tung tree, VIGS of Vm019719 compromised resistance to Fusarium wilt in the otherwise resistant Vernicia montana, confirming its functional role in disease resistance [24].

Genetic Variation Analysis: Comparison of NBS-LRR genes between resistant and susceptible genotypes identifies sequence variations potentially underlying phenotypic differences. Analysis of Gossypium hirsutum accessions identified 6,583 unique variants in the tolerant Mac7 line compared to 5,173 in the susceptible Coker312, with variations concentrated in specific NBS genes [10].

Protein Interaction Studies: Protein-ligand and protein-protein interaction assays demonstrate physical interactions between NBS-LRR proteins and pathogen molecules. Studies have confirmed strong interactions between specific NBS proteins and ADP/ATP as well as viral proteins [10]. The direct interaction between certain NBS-LRR proteins and pathogen effectors supports their role as pathogen sensors [42].

Case Study: Orthogroup Analysis in Tung Tree Fusarium Wilt Resistance

A comprehensive study of Vernicia fordii (susceptible) and Vernicia montana (resistant) provides a compelling case study in orthogroup analysis [24]. Researchers identified 90 and 149 NBS-LRR genes in the two species, respectively, with complete absence of TNL genes in V. fordii contrasting with 12 TNLs in V. montana. Orthologous gene analysis identified 43 orthologous pairs between the species, with one pair (Vf11G0978-Vm019719) showing divergent expression patterns correlated with resistance differences.

Functional analysis revealed that Vm019719 in V. montana is activated by VmWRKY64 and confers resistance to Fusarium wilt. In contrast, its orthologous counterpart in V. fordii (Vf11G0978) carries a deletion in the W-box element of its promoter, rendering it unresponsive to WRKY activation and explaining the susceptibility phenotype. This case demonstrates how orthogroup analysis can pinpoint functionally significant genetic differences underlying variation in disease resistance.

Table 3: Essential Research Reagents and Computational Tools for Orthogroup Analysis

Category	Resource/Tool	Specific Application	Function	Reference
Domain Databases	Pfam (PF00931)	NBS domain identification	Hidden Markov Models for domain detection	[6] [7]
	NCBI Conserved Domain Database	Domain validation	Comprehensive domain annotation	[7]
	SMART	Domain architecture analysis	Additional domain verification	[6]
Analysis Tools	HMMER v3.1b2	Initial gene identification	Profile HMM searches for NBS domains	[6] [7]
	OrthoFinder v2.5.1	Orthogroup clustering	Inference of orthogroups across species	[10]
	MCScanX	Synteny and duplication analysis	Identification of WGD, tandem, and segmental duplications	[6] [7]
	KaKs_Calculator 2.0	Selection pressure analysis	Calculation of Ka/Ks ratios	[7]
Phylogenetic Software	MAFFT v7	Multiple sequence alignment	Alignment of NBS domain sequences	[6] [10]
	IQ-TREE v1.6.12	Phylogenetic reconstruction	Maximum likelihood tree building with model selection	[6]
	FastTreeMP	Phylogenetic analysis	Fast maximum likelihood approximation	[10]
Functional Validation	Virus-Induced Gene Silencing (VIGS)	Functional characterization	Knockdown of candidate genes in planta	[24] [10]
	RNA-seq Analysis	Expression profiling	Differential expression under pathogen challenge	[24] [43]

Discussion and Future Perspectives

Orthogroup analysis has emerged as a powerful framework for deciphering the complex evolutionary history of NBS-LRR genes and their role in plant immunity. The consistent finding of nTNL dominance across angiosperms, with TNLs showing more restricted distribution, suggests distinct evolutionary trajectories for these two major subfamilies [6] [2] [24]. The prevalence of gene clustering and tandem duplication as mechanisms for NBS-LRR expansion highlights the importance of localized recombination events in generating diversity for pathogen recognition [2].

The integration of orthogroup analysis with functional validation approaches has enabled researchers to move beyond cataloging gene families to identifying specific genetic determinants of disease resistance. The case of Vm019719 in tung tree demonstrates how orthogroup analysis can pinpoint causal genes underlying resistance differences [24]. Similarly, the identification of OG2, OG6, and OG15 as responsive to cotton leaf curl disease provides targets for marker-assisted breeding [10].

Future directions in orthogroup analysis will likely involve more comprehensive sampling across plant lineages, integration with pan-genome analyses, and application to breeding programs. The development of the Angiosperm NLR Atlas (ANNA) containing over 90,000 NLR genes from 304 angiosperm genomes represents a significant step toward comprehensive comparative analysis [10]. As long-read sequencing technologies improve haplotype-resolved assembly of complex cluster regions, as demonstrated in grapevine Rpv3 analysis [44], our understanding of NBS-LRR evolution and function will continue to deepen.

Orthogroup analysis provides both evolutionary insights and practical tools for crop improvement. By identifying core orthogroups conserved across species and lineage-specific innovations, researchers can prioritize targets for functional studies and breeding applications. The continued refinement of orthogroup methodologies will enhance our ability to harness plant immune system diversity for sustainable agricultural production.

Chromosomal Distribution and Gene Cluster Identification

The chromosomal distribution and organization of TIR-NBS-LRR (TNL) genes into gene clusters are fundamental characteristics that enhance our understanding of the evolution of plant disease resistance. TNL genes form one of the largest families of plant resistance (R) genes, encoding intracellular immune receptors that initiate effector-triggered immunity upon pathogen recognition [17] [45]. Comparative genomics across diverse plant species has revealed that TNL genes are frequently distributed non-randomly across chromosomes, often forming dense clusters at specific chromosomal loci [46] [24]. This clustered arrangement facilitates the rapid evolution of new resistance specificities through mechanisms such as tandem duplication, gene conversion, and unequal crossing-over [47] [10]. This guide provides a systematic comparison of TNL chromosomal distribution patterns and cluster identification methodologies across major plant families, offering experimental protocols and analytical frameworks for researchers investigating plant immunity genetics.

Comparative Genomic Distribution of TNL Genes Across Plant Species

Chromosomal Hotspots and Distribution Patterns

Table 1: Comparative Chromosomal Distribution of TNL Genes in Various Plant Species

Plant Species	Family	Total TNL Genes Identified	Primary Chromosomal Locations	Distribution Characteristics	Clustering Threshold	Reference
Rosa chinensis	Rosaceae	96	Multiple chromosomes	Dominantly expressed in leaves; responsive to multiple pathogens	Not specified	[17]
Solanum tuberosum (Potato)	Solanaceae	44 (60 transcripts)	Prominent clusters on Chr1 & Chr11	Differential expression under pathogen attack	<200 kb between genes, ≤8 non-NLR genes intervening	[46] [6]
Vernicia montana (Tung tree)	Euphorbiaceae	149 NBS-LRR, of which 12 contain TIR domains	Vmchr2, Vmchr7, Vmchr11	Non-random, clustered distribution	Not specified	[24]
Capsicum annuum (Pepper)	Solanaceae	78 CaRGAs total (TIR & non-TIR)	Not specified	Grouped into 7 subfamilies (CaRGAs I-VII)	Not specified	[47]
Nine Solanaceae species	Solanaceae	182 TNL total	Predominantly chromosomal termini	Strong conservation of NBS motifs; scattered distribution	Not specified	[45]
Wild strawberry species	Rosaceae	Varies by species (non-TNLs >50%)	Across all 7 chromosomes	Non-TNLs exceed TNLs in all species; clustered organization	<200 kb separation, max 8 non-NLR intervening genes	[6]

The distribution of TNL genes across plant chromosomes shows organizational patterns that are conserved within plant families. In Solanaceae species, TNL genes frequently localize to chromosomal termini, with prominent clusters on specific chromosomes [46] [45]. Potato (Solanum tuberosum) has concentrated TNL clusters on chromosomes 1 and 11, with 44 genes encoding 60 different transcripts [46]. A similarly uneven distribution is observed in Rosaceae: Rosa chinensis possesses 96 intact TNL genes distributed across multiple chromosomes, expressed predominantly in leaf tissues [17].

The non-random distribution pattern extends beyond these families: in Vernicia montana, NBS-LRR genes are enriched on chromosomes 2, 7, and 11 [24]. Comparative analysis across nine Solanaceae species revealed that whole-genome duplication (WGD) events have played a significant role in the expansion and distribution of NBS-LRR genes, with the most recent whole-genome triplication (WGT) particularly affecting the TNL family [45]. These distribution patterns reflect evolutionary mechanisms that maintain diversity in the plant immune repertoire.

Gene Cluster Identification and Definition

Table 2: Gene Cluster Classification Criteria Across Species

Study System	Cluster Definition	Maximum Intergenic Distance	Maximum Non-NLR Intervening Genes	Number of NLRs Required	Identified Clusters	Reference
Potato & Wild Strawberries	Tandem/segmental duplication clusters	200 kb	8	≥2	Multiple clusters detected	[46] [6]
Solanaceae family	Rearrangement-induced clustering	Not specified	Not specified	Not specified	Contribute to scattered chromosomal distribution	[45]
Vernicia species	Syntenic relationship clusters	Not specified	Not specified	Not specified	Enriched in corresponding genomic regions	[24]

Gene cluster identification employs standardized criteria to ensure comparative analyses across studies. The predominant definition, applied consistently in potato and wild strawberry studies, requires at least two NLR genes located within 200 kilobases of each other, separated by no more than eight non-NLR genes [46] [6]. This clustering pattern results primarily from tandem duplication events, though segmental duplications also contribute to the expansion and distribution of TNL gene families [6].

In Solanaceae species, gene clustering and rearrangement events within the NBS-LRR family contribute significantly to their scattered chromosomal distribution [45]. Similarly, synteny analysis between susceptible Vernicia fordii and resistant V. montana revealed enrichment of NBS-LRR genes in corresponding genomic regions, suggesting that tandem duplications of linked gene families drive resistance gene evolution [24]. Clustering thus places related genes close enough for unequal crossing-over and gene conversion to generate new resistance specificities.

Experimental Protocols for TNL Identification and Characterization

Genome-Wide Identification Pipeline

Diagram 1: TNL Gene Identification Workflow

The standard workflow for genome-wide TNL identification employs a sequential domain validation approach. The process begins with Hidden Markov Model (HMM) searches using the NB-ARC domain (PF00931) as the initial filter, typically with an e-value cutoff of <1.0 [17] [6] [24]. Candidate sequences then undergo comprehensive domain analysis to verify the presence of all three characteristic domains: TIR (PF01582), NBS (NB-ARC, PF00931), and LRR (multiple PFAM IDs) [17] [46].

Critical to TNL identification is the exclusion of non-TIR NLRs through complementary methods such as MARCOIL with a threshold of 90 to identify and exclude genes containing coiled-coil (CC) domains [46]. The final step involves manual curation and LRR motif verification using consensus sequences (LxxLxLxxN/CxL or LxxLxL, where x denotes any amino acid and L signifies leucine) to ensure domain integrity [46]. This multi-step approach balances sensitivity and specificity in TNL annotation.
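The LRR consensus patterns translate directly into regular expressions, with `.` standing for any residue x and `[NC]` for the N/C position. This sketch uses a zero-width lookahead so that overlapping matches are all reported. It matches leucine strictly, as the consensus is written here, whereas real LRRs often tolerate other hydrophobic residues at the L positions, so treat hits as candidates for manual review. The helper name `find_lrr_motifs` is hypothetical.

```python
import re
from typing import List

# LxxLxLxxN/CxL and LxxLxL from the text; lookahead allows overlapping matches
LONG_MOTIF = re.compile(r"(?=(L..L.L..[NC].L))")
SHORT_MOTIF = re.compile(r"(?=(L..L.L))")


def find_lrr_motifs(seq: str, long_form: bool = True) -> List[int]:
    """Return 0-based start positions of LRR consensus matches in a protein sequence."""
    pattern = LONG_MOTIF if long_form else SHORT_MOTIF
    return [m.start() for m in pattern.finditer(seq.upper())]
```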

Chromosomal Mapping and Cluster Analysis

Diagram 2: Chromosomal Mapping & Cluster Analysis

Chromosomal mapping and cluster analysis employ both automated algorithms and manual validation. Physical mapping begins with extracting positional information from General Feature Format (GFF) files and graphically portraying TNL genes on chromosomes using tools such as PhenoGram or TBtools [46]. Cluster identification applies standardized parameters, where genes located within 200 kb and separated by no more than eight non-NLR genes are classified as clustered [46] [6].

Duplication analysis uses MCScanX on all-vs-all BLASTP results (E-value 1e-10) to identify the tandem and segmental duplication events that drive cluster formation [46] [6]. For cross-species comparisons, synteny analysis identifies orthologous chromosomal regions, with orthologs inferred by tools such as OrthoFinder using BLAST searches (E-value 1e-3) [48]. Together, these approaches enable researchers to distinguish evolutionarily conserved clusters from species-specific arrangements and to infer their evolutionary history.
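A simplified version of gene-order-based duplicate labelling is sketched below: homologous pairs passing the BLASTP E-value filter are called tandem when adjacent on the same chromosome and proximal when separated by only a few genes. This is not MCScanX's implementation. Segmental/WGD pairs require collinearity detection, which is omitted here, so they fall into the `other` class; the `proximal_window` of 10 genes and the `classify_pairs` name are assumptions.

```python
from typing import Dict, List, Tuple


def classify_pairs(pairs: List[Tuple[str, str, float]],
                   order: Dict[str, Tuple[str, int]],
                   max_evalue: float = 1e-10,
                   proximal_window: int = 10) -> Dict[Tuple[str, str], str]:
    """Label BLASTP homolog pairs as tandem, proximal or other from gene order.

    order maps gene ID -> (chromosome, index of the gene along that chromosome).
    """
    labels: Dict[Tuple[str, str], str] = {}
    for a, b, evalue in pairs:
        if evalue > max_evalue or a == b:
            continue  # weak hit or self-hit
        (chrom_a, idx_a), (chrom_b, idx_b) = order[a], order[b]
        if chrom_a != chrom_b:
            label = "other"  # could be segmental/WGD; needs collinearity analysis
        else:
            gap = abs(idx_a - idx_b)
            if gap == 1:
                label = "tandem"
            elif gap <= proximal_window:
                label = "proximal"
            else:
                label = "other"
        key = (a, b) if a < b else (b, a)
        labels[key] = label
    return labels
```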

Table 3: Essential Research Reagents and Computational Tools for TNL Studies

Category	Specific Tool/Reagent	Application Purpose	Key Features/Parameters	Reference
Domain Identification	HMMER v3.1+	NB-ARC domain identification	e-value cutoff <1.0, PF00931 model	[17] [6] [24]
Domain Verification	Batch CD-Search (NCBI)	Conserved domain verification	Default parameters, CDD database	[17] [46]
CC Domain Exclusion	MARCOIL	Coiled-coil domain prediction	Threshold: 90	[46]
LRR Validation	Manual curation	LRR motif verification	LxxLxLxxN/CxL consensus	[46]
Cluster Analysis	MCScanX	Gene duplication identification	E-value: 1e-10, BLASTP parameters	[46] [6] [48]
Phylogenetics	IQ-TREE v1.6.12	Phylogenetic tree construction	Ultrafast Bootstrap: 1000 replicates	[17] [6]
Visualization	PhenoGram/TBtools	Chromosomal mapping	Graphical gene positioning	[46]
Expression Validation	qRT-PCR	Expression profiling	Pathogen/inoculation time courses	[17] [46]

This toolkit encompasses the essential bioinformatic and experimental resources required for comprehensive TNL characterization. The computational pipeline relies heavily on domain identification tools (HMMER, CD-Search) coupled with specialized algorithms for distinguishing NLR subtypes (MARCOIL) [17] [46]. For evolutionary analyses, MCScanX and IQ-TREE provide robust solutions for duplication detection and phylogenetic reconstruction [46] [6]. Experimental validation typically employs qRT-PCR with carefully designed time courses post-pathogen inoculation to assess expression dynamics of clustered TNL genes [17] [46]. The integration of these tools enables researchers to move from genome annotation to functional characterization with consistent methodological standards.

The comparative analysis of TNL chromosomal distribution and cluster organization reveals conserved evolutionary patterns across plant families. TNL genes consistently display non-random distribution with strong tendencies toward clustering at specific chromosomal loci, particularly telomeric regions in Solanaceae species [46] [45]. These arrangements are maintained through tandem and segmental duplications, with clustering parameters (200 kb maximum separation, ≤8 intervening non-NLR genes) providing a standardized framework for cross-species comparisons [46] [6]. The experimental pipelines and analytical tools presented herein offer a systematic approach for investigating these genomic patterns, enabling researchers to elucidate the complex relationship between genome organization and disease resistance functionality. Future studies integrating pan-genomic analyses with functional validation will further refine our understanding of how chromosomal architecture shapes plant immune system evolution.

In research on TIR-NBS-LRR domain architectures, understanding the duplication mechanisms that shape these genes is fundamental. Tandem and segmental duplications represent two distinct evolutionary pathways that expand and diversify gene families, including disease-resistant NBS-LRR genes in plants. These duplication patterns produce fundamentally different genomic architectures: tandem duplications create localized clustered gene arrays, while segmental duplications generate interspersed genomic copies across chromosomes or genomes [49]. This guide provides an objective comparison of these mechanisms, supported by current experimental data and analytical methodologies, to inform research in genomics and drug development.

Defining Characteristics and Genomic Architecture

Tandem Duplications

Tandem duplications (TDs) involve the head-to-tail duplication of a chromosomal segment within the same chromosome, leading to a quantitative increase in copy number of the affected segment [50]. The breakpoint junction represents the sole qualitative difference from the parent chromosome, joining the downstream edge of the duplicated element to its upstream edge. In cancer genomes, these events typically exhibit non-homologous breakpoint junctions with minimal sequence complementarity (often <3 nucleotide microhomology) [50].

Segmental Duplications

Segmental duplications (SDs), also termed low-copy repeats, are blocks of DNA ranging from 1 to 400 kilobases in length that occur at multiple genomic sites with >90% sequence identity [49] [51]. These duplications can be intrachromosomal (within the same chromosome) or interchromosomal (between different chromosomes), and they collectively constitute approximately 5-7% of the human genome [52] [49] [53]. SDs are significantly enriched in pericentromeric and subtelomeric regions and are major catalysts of genomic structural variation [53] [51].

Table 1: Fundamental Characteristics of Tandem and Segmental Duplications

Characteristic	Tandem Duplications	Segmental Duplications
Genomic Organization	Head-to-tail, adjacent copies	Interspersed, non-adjacent copies
Typical Size Range	~10 kb to >1 Mb [50]	1 kb - 400 kb [49] [51]
Sequence Identity	Not a defining criterion (copies are defined by adjacency)	>90% between copies [52] [49]
Breakpoint Features	Non-homologous, microhomology [50]	Flanked by large homologous repeats [51]
Primary Formation Mechanism	Replication-based mechanisms [50]	Non-allelic homologous recombination [53]

Quantitative Comparison and Functional Impact

Distribution and Frequency

Analysis of 170 human genome assemblies reveals that intrachromosomal segmental duplications demonstrate remarkable diversity, with 173.2 Mb of duplicated sequence identified, including 47.4 Mb not present in the telomere-to-telomere reference [52]. The accumulation of novel SDs follows an asymptotic relationship with increasing sample size, with African genomes harboring significantly more intrachromosomal SDs, a pattern consistent with greater genetic diversity [52].

In cancer genomes, tandem duplications display distinct size distribution patterns categorized into three groups: Group 1 (modal size ~11 kb) associated with BRCA1 loss, Group 2 (modal size ~231 kb) linked to CCNE1 amplification, and Group 2/3mix (bimodal, 231 kb and 1.7 Mb) associated with CDK12 loss [50]. This trimodal distribution suggests distinct biological drivers for each TD category.

Functional Consequences in Gene Families

The functional impact of these duplication mechanisms is particularly evident in the expansion of disease-resistant gene families. In the Nicotiana benthamiana genome, researchers identified 156 NBS-LRR homologs through HMMsearch analysis, classifying them into TNL-type (5), CNL-type (25), NL-type (23), TN-type (2), CN-type (41), and N-type (60) proteins [3]. This diversity arises from both tandem and segmental duplication events followed by divergent evolution.

In humans, approximately 50% of all copy number polymorphisms >1 kb map to segmental duplications, representing a tenfold enrichment [52]. Nearly all copy number polymorphic genes in humans localize to these regions, with important implications for human disease. Genes embedded within SDs show strong signatures of positive selection and are 5-10 times more likely to display interspecies and intraspecies structural variation [51].

Table 2: Functional Impact on Gene Families and Organismal Biology

Functional Aspect	Tandem Duplications	Segmental Duplications
Role in Gene Family Expansion	Creates homogeneous arrays; common in NBS-LRR genes [3]	Generates heterogeneous families; enables neofunctionalization [51]
Association with Disease	Oncogene amplification; tumor suppressor disruption in cancer [50]	Genomic disorders via non-allelic homologous recombination [49] [53]
Selection Signature	Frequently under positive selection for rapid adaptation [3]	Strong signatures of positive selection; adaptive evolution [51]
Impact on Gene Content	Can duplicate exons or entire genes [50]	Enriched for genes; creates new genes with novel functions [51]
Example in NBS-LRR Research	N gene cluster expansion in tobacco [3]	Interspersed R-gene distribution across genomic regions

Experimental Detection and Analysis Methodologies

Computational Detection with TARDIS

The TARDIS (Toolkit for Automated and Rapid DIscovery of Structural variants) tool employs integrated algorithms to characterize tandem, direct, and inverted interspersed segmental duplications using short-read whole genome sequencing datasets [54]. TARDIS utilizes multiple sequence signatures including read pair, read depth, and split read information to achieve comprehensive detection.

Experimental Protocol: TARDIS Workflow

Input Data: Process short-read whole genome sequencing data (30x coverage recommended)
Read Alignment: Map sequencing reads to reference genome
Signature Extraction:
- Identify discordant read pairs suggesting structural variants
- Analyze read depth for copy number variations
- Detect split reads indicating breakpoint junctions
Cluster Formation: Generate maximal valid clusters of supporting evidence
Variant Calling: Apply likelihood scoring to classify duplication type and orientation
Output: Report precise breakpoints and classify duplication events [54]
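The read-pair signal listed above can be illustrated by classifying mate orientation for a standard forward-reverse (FR) Illumina library: outward-facing (RF) pairs are the classic tandem-duplication signature, same-strand pairs suggest an inversion, and FR pairs spanning more than the expected insert suggest a deletion. This is a simplified sketch, not TARDIS's code; the `MappedPair` type, the `read_pair_signature` function, and the 600 bp maximum insert are assumptions, and interspersed duplications need the combined signals TARDIS uses.

```python
from typing import NamedTuple


class MappedPair(NamedTuple):
    chrom1: str
    pos1: int        # leftmost mapping position of mate 1
    reverse1: bool   # True if mate 1 maps to the reverse strand
    chrom2: str
    pos2: int
    reverse2: bool


def read_pair_signature(p: MappedPair, max_insert: int = 600) -> str:
    """Classify an FR-library read pair by mate orientation and span (simplified)."""
    if p.chrom1 != p.chrom2:
        return "interchromosomal"
    # order the mates by coordinate: (position, is_reverse)
    (left_pos, left_rev), (right_pos, right_rev) = sorted(
        [(p.pos1, p.reverse1), (p.pos2, p.reverse2)])
    if left_rev == right_rev:
        return "inversion-like"      # FF or RR orientation
    if left_rev:
        return "tandem-dup-like"     # RF: mates point away from each other
    span = right_pos - left_pos
    return "deletion-like" if span > max_insert else "concordant"
```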

In simulation experiments, TARDIS achieved 96% sensitivity with only 4% false discovery rate. Validation using real datasets from CHM1 and CHM13 haploid genomes showed higher accuracy than state-of-the-art methods when compared to orthogonal PacBio call sets [54].
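For reference, the two reported metrics are defined as sensitivity = TP/(TP + FN) and false discovery rate = FP/(TP + FP). The counts below are hypothetical, chosen only to reproduce the 96% and 4% figures, and are not data from the TARDIS evaluation.

```python
def sensitivity(tp: int, fn: int) -> float:
    """Fraction of true events recovered: TP / (TP + FN)."""
    return tp / (tp + fn)


def false_discovery_rate(tp: int, fp: int) -> float:
    """Fraction of reported calls that are wrong: FP / (TP + FP)."""
    return fp / (tp + fp)


# Hypothetical counts, not the study's data
print(sensitivity(tp=96, fn=4), false_discovery_rate(tp=96, fp=4))
```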

Array-Based Detection of Copy Number Variants

Array comparative genomic hybridization (array CGH) using targeted bacterial artificial chromosome (BAC) microarrays specifically designed for segmental duplication regions enables comprehensive copy-number variation assessment [53].

Experimental Protocol: Segmental Duplication BAC Microarray

Array Design: Select 2,194 BAC clones encompassing 130 predefined rearrangement hotspot regions
Sample Preparation: Extract DNA from target samples and reference source
Labeling and Hybridization:
- Label test and reference DNAs with Cy3 and Cy5 fluorochromes
- Perform duplicate hybridizations with dye-swap to control for dye bias
- Include Cot-1 DNA to suppress hybridization of repetitive sequences
Data Acquisition: Scan arrays and extract fluorescence intensity ratios
Analysis: Identify copy-number polymorphisms (CNPs) as deviations from expected 1:1 ratio [53]

This approach identified 119 regions of copy-number polymorphism in a panel of 47 normal individuals from diverse populations, showing a 4-fold enrichment of CNPs within segmental duplication hotspot regions compared to control regions [53].
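The dye-swap design can be made concrete: averaging the test/reference log2 ratio from the two hybridizations cancels a constant dye bias. The sketch below assumes intensities are given as (Cy3, Cy5) pairs with the test sample in Cy3 for the first hybridization and in Cy5 for the second; the ±0.3 log2 threshold in `call_cnp` is hypothetical, not the cutoff used in the cited study.

```python
import math
from typing import Tuple


def dye_swap_log2(hyb1: Tuple[float, float], hyb2: Tuple[float, float]) -> float:
    """Average test/reference log2 ratio over a dye-swap pair.

    hyb1 = (Cy3, Cy5) with test in Cy3 and reference in Cy5;
    hyb2 = (Cy3, Cy5) with the dyes swapped (test in Cy5).
    """
    r1 = math.log2(hyb1[0] / hyb1[1])  # test/ref in the first hybridization
    r2 = math.log2(hyb2[1] / hyb2[0])  # test/ref after the dye swap
    return (r1 + r2) / 2               # a constant dye bias cancels in the mean


def call_cnp(log2_ratio: float, threshold: float = 0.3) -> str:
    """Label a clone as gain, loss or normal; the threshold is hypothetical."""
    if log2_ratio >= threshold:
        return "gain"
    if log2_ratio <= -threshold:
        return "loss"
    return "normal"
```

For example, a clone present at 1.5 copies relative to the reference, measured with a 1.2-fold Cy3 bias, gives raw test/reference ratios of 1.8 and 1.25, whose log2 average recovers log2(1.5).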

Diagram 1: Tandem duplication formation process, often initiated by replication stress.

Diagram 2: Segmental duplication through non-allelic homologous recombination (NAHR).

Table 3: Key Research Reagents and Computational Tools for Duplication Analysis

Resource	Type	Primary Function	Application Context
TARDIS [54]	Computational Tool	Detects various SVs using multiple sequence signatures	Tandem and segmental duplication discovery in WGS data
Segmental Duplication BAC Microarray [53]	Experimental Platform	Array CGH for copy-number polymorphism detection	High-throughput CNP screening in segmental duplication hotspots
PacBio HiFi Sequencing [52]	Sequencing Technology	Long-read sequencing for resolving complex regions	Phasing and assembly of high-identity segmental duplications
HMMsearch (Pfam DB) [3]	Bioinformatics Tool	Protein domain identification and classification	NBS-LRR gene identification and classification
MEME Suite [3]	Bioinformatics Tool	Conserved motif discovery in sequences	Analysis of conserved domains in duplicated NBS-LRR genes

Tandem and segmental duplications represent distinct evolutionary mechanisms with characteristic genomic signatures and functional consequences. Tandem duplications create localized copy-number changes through replication-based mechanisms, while segmental duplications generate interspersed copies via homologous recombination. In TIR-NBS-LRR research, both mechanisms contribute significantly to the expansion and diversification of disease-resistant gene families. The choice of detection methodology (computational tools such as TARDIS for sequencing data, or array-based approaches for copy-number assessment) depends on the specific research questions and available resources. Understanding these duplication patterns provides crucial insights into genome evolution, disease mechanisms, and the molecular basis of resistance in plants and immunity in humans.

Overcoming Challenges in TNL Annotation and Functional Prediction

Distinguishing True TNLs from Partial Domains and Pseudogenes

TIR-NBS-LRR (TNL) genes form a major class of intracellular immune receptors in plants that confer specific disease resistance against diverse pathogens. These genes are characterized by a tripartite domain architecture: an N-terminal Toll/Interleukin-1 receptor (TIR) domain, a central nucleotide-binding site (NBS) domain, and C-terminal leucine-rich repeats (LRRs). However, the accurate identification of complete, functional TNL genes is complicated by the presence of partial domains, pseudogenes, and unusual domain integrations in plant genomes. The TNL gene family exhibits remarkable structural diversity, with several atypical architectures identified across plant species, including TIR-NBS (TN) proteins that lack LRR domains and complex integrations of additional domains such as WRKY or heavy metal-associated domains [1] [55]. This guide provides a comprehensive comparison of experimental approaches and diagnostic criteria for distinguishing true, functional TNL genes from partial domains and pseudogenes, supported by current genomic and functional evidence.

Comparative Analysis of TNL Domain Architectures

Canonical and Non-Canonical TNL Structures

Table 1: Classification of TNL and Related Domain Architectures

Architecture Type	Domain Composition	Prevalence	Functional Status
True TNL (Full-length)	TIR-NBS-LRR	Most eudicots; absent in monocots and lost in some eudicot lineages	Functional immune receptor
TIR-NBS (TN)	TIR-NBS	Arabidopsis (21 genes)	Potential adaptor/regulator
TNL with Integrated Domains	TIR-NBS-LRR-X (e.g., X=WRKY)	Multiple angiosperms	Functional with expanded recognition
TNL Pseudogenes	Disrupted ORF, missing domains	All species	Non-functional
Species-specific Architectures	TIR-NBS-TIR-Cupin1-Cupin1, TIR-NBS-Prenyltransf	Limited distribution	Potentially functional

The classical TNL architecture consists of three intact domains: the TIR domain involved in signaling, the NBS domain responsible for nucleotide binding and activation, and the LRR domain that mediates pathogen recognition [1]. However, recent genomic studies have revealed significant structural diversity beyond this canonical arrangement. In Arabidopsis, approximately 21 TIR-NBS (TN) proteins lack the LRR domain entirely, potentially functioning as adaptors or regulators of full-length TNL proteins rather than conventional receptors [1]. Additionally, integrated domain architectures (NLR-IDs), where additional protein domains are fused to TNL proteins, have been identified across multiple plant lineages. These integrated domains often mimic authentic pathogen targets and serve as "baits" for pathogen effectors, expanding the recognition capacity of the immune receptor [55].

Species-specific domain architectures have also been documented, including unusual patterns such as TIR-NBS-TIR-Cupin1-Cupin1 and TIR-NBS-Prenyltransf discovered in comprehensive comparative genomic analyses [10]. These atypical structures highlight the evolutionary innovation in plant immune receptors while complicating the distinction between functional genes and genetic artifacts.

Diagnostic Features for Identifying True TNL Genes

Table 2: Key Diagnostic Criteria for Distinguishing True TNLs

Diagnostic Feature	True TNL Signature	Pseudogene/Partial Signature
Open Reading Frame	Continuous, full-length	Premature stop codons, frameshifts
NBS Conserved Motifs	Intact P-loop, RNBS, Kinase-2, GLPL, MHDV	Disrupted/degenerate motifs
TIR Domain	~175 amino acids with conserved motifs	Truncations, critical residue losses
LRR Domain	Multiple repeats (typically 10-20)	Severely reduced repeat number
Selection Pressure	Purifying selection on NBS, diversifying selection on LRR	Neutral evolution or relaxed selection
Expression Evidence	Detectable transcript levels	No expression or aberrant splicing

True TNL genes maintain several diagnostic structural and evolutionary characteristics. The NBS domain contains strictly ordered conserved motifs including the P-loop (phosphate-binding loop), RNBS (resistance nucleotide binding site) motifs, kinase-2, GLPL, and MHDV motifs, all of which are intact in functional genes [17] [14]. The TIR domain typically spans approximately 175 amino acids with conserved structural motifs, while the LRR domain consists of multiple repeats (often 10-20 units) that form a solvent-exposed surface for molecular interactions [1]. Evolutionary analyses reveal that different domains experience distinct selection pressures: the NBS domain is typically under purifying selection to maintain structural integrity, while the LRR domain shows signatures of diversifying selection consistent with its role in pathogen recognition [1] [6].

In contrast, pseudogenes and partial domains exhibit characteristic disruptions including premature stop codons, frameshift mutations, severely truncated domains, and degenerate conserved motifs. Recent studies in peanut demonstrated that pseudogenization of NBS-LRR genes often involves preferential loss of LRR domains, significantly reducing the receptor's recognition capacity [26]. Expression evidence from transcriptome datasets provides critical functional validation, as true TNL genes typically show detectable expression across multiple tissues or specific induction upon pathogen challenge.

Experimental Approaches for TNL Validation

Genomic Identification and Domain Annotation Protocols

Step 1: Comprehensive Sequence Identification

Begin with genome-wide identification of candidate TNL genes using hidden Markov model (HMM) searches with the Pfam models for TIR (PF01582), NB-ARC (PF00931), and LRR (PF00560, PF07723, PF07725, PF12799) domains. The HMMER software suite (v3.0+) provides a robust implementation, with typical E-value cutoffs of < 1×10⁻¹⁰ for domain detection [10] [14]. For species without dedicated HMMs, iteratively build species-specific custom HMMs from initial high-confidence hits (E-value < 1×10⁻²⁰) to improve detection sensitivity.
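The E-value filtering in Step 1 can be sketched in Python as follows. This is a minimal sketch that assumes hmmsearch was run with --domtblout (HMMER3's per-domain tabular format, in which the independent E-value is the 13th column and the envelope coordinates are the 20th and 21st). The function name parse_domtblout is hypothetical, and its default threshold simply mirrors the 1×10⁻¹⁰ cutoff cited above.

```python
from collections import defaultdict


def parse_domtblout(path, max_ievalue=1e-10):
    """Group domain hits by protein from an hmmsearch --domtblout file.

    A domain is kept only if its independent E-value (13th column) is below
    max_ievalue. Returns {protein_id: [(model, env_from, env_to, i_evalue)]}.
    """
    hits = defaultdict(list)
    with open(path) as handle:
        for line in handle:
            if line.startswith("#") or not line.strip():
                continue  # skip comment/header lines and blank lines
            # 22 whitespace-delimited columns are followed by a free-text
            # description that may contain spaces, so split at most 22 times.
            fields = line.split(maxsplit=22)
            protein, model = fields[0], fields[3]
            i_evalue = float(fields[12])
            env_from, env_to = int(fields[19]), int(fields[20])
            if i_evalue < max_ievalue:
                hits[protein].append((model, env_from, env_to, i_evalue))
    return dict(hits)
```

Filtering on the per-domain independent E-value, rather than the full-sequence E-value, requires each reported domain to be significant on its own. For repeat families such as LRRs, where the full-sequence score sums many weak repeats, this can be noticeably stricter.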

Step 2: Domain Architecture Validation

Apply multiple domain prediction tools to confirm architectural integrity: NCBI's Conserved Domain Database (CDD) for initial domain boundaries, SMART for validating domain organization, and COILS (threshold 0.1) for detecting coiled-coil regions that may indicate misclassified CNL genes [6] [14]. MEME Suite analysis with the maximum number of motifs set to 20 helps identify conserved motif patterns within each domain [6].
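Once domain boundaries are confirmed, each protein's architecture can be summarized as an ordered string of domain labels. This makes truncated (TIR-NBS) or fused (TIR-NBS-LRR-X) models easy to spot. The sketch below assumes that hits are available as (Pfam name, start, end) tuples, for example from an hmmsearch --domtblout or CDD export, with COILS predictions added as ("CC", start, end). The short-name mapping is a hypothetical convention.

```python
def architecture(domains):
    """Build an ordered architecture string such as 'TIR-NBS-LRR'.

    domains: iterable of (pfam_name, start, end) tuples for one protein.
    All LRR models (Pfam names beginning 'LRR') map to one 'LRR' label and
    consecutive LRR hits are merged; other domains are kept individually so
    that repeated non-LRR domains (e.g. two Cupin_1 hits) remain visible.
    """
    short_names = {"TIR": "TIR", "NB-ARC": "NBS"}  # illustrative mapping
    labels = []
    for name, _start, _end in sorted(domains, key=lambda d: d[1]):
        label = "LRR" if name.startswith("LRR") else short_names.get(name, name)
        if label == "LRR" and labels and labels[-1] == "LRR":
            continue  # collapse a run of LRR repeats into one block
        labels.append(label)
    return "-".join(labels)
```

For example, hits for TIR, NB-ARC and several LRR_8 repeats yield "TIR-NBS-LRR", while a protein lacking any LRR hit yields "TIR-NBS" and can be routed to the partial-gene checks in Table 2.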

Step 3: Phylogenetic Classification

Construct phylogenetic trees from NB-ARC domain sequences (the 250 amino acids following the P-loop) using maximum likelihood methods in IQ-TREE or MEGA6. Include reference TNL sequences from related species to establish orthologous relationships and identify atypical lineages that may represent pseudogenes or unusual architectures [14]. This step helps distinguish true TNL clades from non-TNL sequences that might contain partial NBS domains.
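Extracting a fixed-length NB-ARC window for alignment can be sketched as follows. The sketch makes three assumptions: the window starts at the first residue of the P-loop, the P-loop is matched with a loose reading of the GxGKTT/S consensus, and proteins too short to fill the window are dropped so that all sequences entering the alignment span the same region. The function name nbarc_window is hypothetical.

```python
import re

# Loose P-loop (kinase-1a) consensus GxGKTT/S; real motifs vary more than this.
P_LOOP = re.compile(r"G.GKT[TS]")


def nbarc_window(protein, length=250):
    """Return `length` residues starting at the first P-loop match, or None
    if no P-loop is found or the protein ends before the window does."""
    seq = protein.upper()
    match = P_LOOP.search(seq)
    if match is None:
        return None
    window = seq[match.start():match.start() + length]
    return window if len(window) == length else None
```

Sequences that return None are worth checking by hand, because a missing or degenerate P-loop is itself one of the pseudogene signatures listed in Table 2.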

Figure 1: Experimental workflow for comprehensive TNL identification and validation, integrating bioinformatic and functional approaches.

Molecular Validation and Functional Assays

Transcriptional Validation Methods

RNA-seq analysis across multiple tissues and stress conditions provides critical evidence for functional TNL genes. Calculate FPKM values to quantify expression levels, with particular attention to genes showing specific induction upon pathogen challenge or hormone treatment [10] [17]. For candidate genes with low expression, conduct reverse transcription PCR (RT-PCR) with primers spanning exon-exon junctions to confirm splicing fidelity and detect aberrant transcripts characteristic of pseudogenes.
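FPKM normalizes fragment counts by transcript length (in kilobases) and sequencing depth (in millions of mapped fragments): FPKM = fragments × 10⁹ / (length in bp × total mapped fragments). The sketch below shows that calculation together with a simple log2 fold change for flagging induction. The pseudocount of 1 is an illustrative assumption, and a fold change alone is descriptive. Replicate-aware differential expression tools run on raw counts are the usual basis for significance claims.

```python
import math


def fpkm(fragments, gene_length_bp, total_mapped_fragments):
    """Fragments Per Kilobase of transcript per Million mapped fragments."""
    return fragments * 1e9 / (gene_length_bp * total_mapped_fragments)


def log2_fold_change(fpkm_treated, fpkm_control, pseudocount=1.0):
    """log2 ratio of treated vs control FPKM; the pseudocount (an assumption)
    avoids division by zero for genes silent in the control."""
    return math.log2((fpkm_treated + pseudocount) / (fpkm_control + pseudocount))
```

For instance, 500 fragments on a 3,000 bp transcript in a library of 20 million mapped fragments gives an FPKM of about 8.33.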

Functional Verification Approaches

Virus-induced gene silencing (VIGS) provides efficient functional validation, as demonstrated in cotton, where silencing of GaNBS (OG2) increased susceptibility to cotton leaf curl disease, confirming its role in disease resistance [10]. For conclusive validation, perform transgenic complementation assays in susceptible genotypes, expressing candidate TNL genes under their native promoters and testing whether disease resistance is restored. Protein-ligand interaction studies using recombinant TNL proteins can verify nucleotide-binding capacity (ADP/ATP), while yeast two-hybrid or co-immunoprecipitation assays test interaction specificity with pathogen effectors or host guardee proteins [10].

The Scientist's Toolkit: Essential Research Reagents

Table 3: Key Research Reagents for TNL Gene Characterization

Reagent/Resource	Specifications	Application in TNL Research
HMMER Suite	v3.0+ with Pfam models	Domain-based identification of TNL genes
Pfam Domain Models	PF01582 (TIR), PF00931 (NB-ARC), PF00560 (LRR)	Specific domain annotation
MEME Suite	v5.0+ with maximum motifs=20	Conserved motif discovery within domains
IQ-TREE	v1.6.12+ with ModelFinder	Phylogenetic analysis and evolutionary relationships
RNA-seq Datasets	FPKM values from multiple tissues/stresses	Expression validation and functional clues
VIGS Vectors	TRV-based systems for specific plant species	Functional validation through gene silencing
Co-immunoprecipitation Kits	Commercial kits with compatible antibodies	Protein interaction studies

Evolutionary Perspectives and Technical Challenges

A significant evolutionary consideration in TNL research is their restricted distribution among plant lineages. While TNL genes are present in bryophytes, gymnosperms, and dicots, they are notably absent from most monocots, with exceptions being limited to basal monocot orders [5]. This phylogenetic distribution must be considered when designing identification strategies across different plant families.

Technical challenges in TNL annotation include distinguishing true TNL genes from non-TIR-type NBS-LRR genes (CNLs), which represent a separate evolutionary lineage with distinct signaling pathways [1]. The kinase-2 motif provides a key diagnostic residue for this distinction: TNL sequences typically contain "LLVLDDVD", whereas CNLs feature "LLVLDDVW". The terminal residue, aspartate (D) versus tryptophan (W), is particularly informative [5]. Additionally, the RNBS-A and RNBS-D motifs show distinct consensus patterns between these two classes.
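The kinase-2 rule can be expressed as a simple pattern match. The sketch below is a heuristic that rests on one assumption: it uses a loose, hypothetical regular expression (four hydrophobic residues, DD, one hydrophobic residue, then the diagnostic terminal residue). It should be combined with domain-based evidence, because divergent NBS sequences may not match the pattern at all.

```python
import re

# Loose kinase-2 pattern; group 1 captures the diagnostic terminal residue.
KINASE2 = re.compile(r"[LIVMF]{4}DD[LIVM]([A-Z])")


def kinase2_class(nbs_seq):
    """Classify an NBS sequence as TNL-like or CNL-like from the kinase-2 motif."""
    match = KINASE2.search(nbs_seq.upper())
    if match is None:
        return "no kinase-2 match"
    terminal = match.group(1)
    if terminal == "D":
        return "TNL-like"
    if terminal == "W":
        return "CNL-like"
    return "ambiguous"
```

A disagreement between this call and the N-terminal domain assignment is a useful flag for the TIR-CC hybrids and recombinant architectures discussed below.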

Recent studies have revealed that some plant species contain genes with both TIR and CC domains, challenging the traditional binary classification [26]. These unusual architectures likely result from genetic recombination events and represent natural exceptions to standard domain boundaries. Furthermore, pseudogenization patterns differ among species, with some lineages showing preferential loss of LRR domains while others accumulate frameshift mutations throughout the coding sequence [26].

Figure 2: Evolutionary and mutagenic relationships between TNL genes and related sequences, illustrating pathways to pseudogenization and structural diversification.

Distinguishing true TNL genes from partial domains and pseudogenes requires an integrated approach combining bioinformatic prediction, evolutionary analysis, transcriptional evidence, and functional validation. Canonical TNL architectures maintain intact TIR, NBS, and LRR domains with characteristic conserved motifs, while pseudogenes show disruptive mutations and degenerate sequences. The expanding diversity of integrated domain architectures and species-specific innovations necessitates flexible classification frameworks. The experimental protocols and diagnostic criteria presented here provide a systematic foundation for accurate TNL annotation, supporting future efforts in plant immunity research and disease resistance breeding. As genomic resources expand across diverse plant lineages, these approaches will enable more comprehensive understanding of TNL evolution and function in plant-pathogen interactions.

Handling N-Terminal Domain Diversity and Structural Variability

Comparative Genomic Distribution and Diversity of TNL and nTNL Genes

Table 1: Genomic Distribution of NBS-LRR Genes Across Plant Species

Plant Species	Total NBS-LRR Genes	TNL Genes	nTNL (CNL/RNL/NL) Genes	Key Genomic Features	Citation
Capsicum annuum (Pepper)	252	4 (1.6%)	248 (98.4%)	54% of genes form 47 clusters; uneven chromosomal distribution.	[2]
Nicotiana benthamiana (Tobacco)	156	5 (3.2%)	151 (96.8%)	0.25% of all annotated genes; classified into 6 structural types.	[3]
Vernicia montana (Tung tree)	149	12 (8.1%)	137 (91.9%)	Genes clustered on chromosomes 2, 7, and 11; contains unique CC-TIR-NBS class.	[24]
Vernicia fordii (Tung tree)	90	0 (0%)	90 (100%)	Complete absence of TNL class; LRR1 and LRR4 domains lost.	[24]
Fragaria spp. (Wild Strawberries)	Varies by species	<50% in all species	>50% in all species	Non-TNLs show dominant expression and are under positive selection.	[6]
Solanum tuberosum (Potato)	60 TNL transcripts	60 (TNL only)	Not specified	TNLs clustered on chromosomes 1 and 11.	[56]

The genomic distribution of NBS-LRR genes reveals significant diversity in the composition of TNL and nTNL subfamilies across plant species. A prominent feature is the dominance of the nTNL subfamily over TNLs in many eudicots. For instance, in pepper, nTNLs constitute 98.4% of the identified NBS-LRR genes, while TNLs represent a mere 1.6% [2]. A similar disparity is observed in tobacco, where nTNLs make up 96.8% of the family [3]. This trend extends to wild strawberries, where non-TNLs constitute over 50% of the NLR gene family in all eight diploid species studied [6].
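The percentages in Table 1 follow directly from the gene counts. Recomputing them is a quick sanity check when a table is compiled from several studies. The counts below are those listed in the table, and the function name is illustrative.

```python
# (TNL count, total NBS-LRR count) as listed in Table 1
COUNTS = {
    "Capsicum annuum": (4, 252),
    "Nicotiana benthamiana": (5, 156),
    "Vernicia montana": (12, 149),
    "Vernicia fordii": (0, 90),
}


def percent(part, total):
    """Share of the NBS-LRR family, rounded to one decimal place."""
    return round(100 * part / total, 1)


for species, (tnl, total) in COUNTS.items():
    print(f"{species}: TNL {percent(tnl, total)}%, nTNL {percent(total - tnl, total)}%")
```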

A key genomic mechanism driving this diversity is the formation of gene clusters via tandem duplications. In pepper, 54% of the 252 NBS-LRR genes are organized into 47 such clusters, which are considered hotspots for the evolution of new resistance specificities [2]. These clusters, along with genomic rearrangements, underscore the dynamic evolution of resistance genes and contribute to their uneven distribution across chromosomes, as also seen in potato and tung tree [2] [56] [24].
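Cluster definitions vary between studies. Some use a maximum physical distance between neighboring genes, others a maximum number of intervening genes. The sketch below assumes a simple distance rule: two or more NBS-LRR genes on the same chromosome, each within 200 kb of the previous one. That threshold is an illustrative assumption, not the criterion used in the cited pepper study, and the function name is hypothetical.

```python
from itertools import groupby


def find_clusters(genes, max_gap=200_000, min_size=2):
    """Group NBS-LRR genes into physical clusters.

    genes: list of (gene_id, chromosome, start, end). Consecutive genes on
    the same chromosome are joined while the gap to the furthest end seen
    so far is at most max_gap. Returns a list of gene-ID lists.
    """
    clusters = []
    ordered = sorted(genes, key=lambda g: (g[1], g[2]))
    for _chrom, group in groupby(ordered, key=lambda g: g[1]):
        current, cur_end = [], 0
        for gene_id, _c, start, end in group:
            if current and start - cur_end > max_gap:
                if len(current) >= min_size:
                    clusters.append(current)
                current = []
            cur_end = end if not current else max(cur_end, end)
            current.append(gene_id)
        if len(current) >= min_size:
            clusters.append(current)
    return clusters
```

Tracking the furthest end seen so far, rather than only the previous gene's end, keeps nested or overlapping gene models from splitting a cluster.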

Furthermore, comparative analysis of the resistant Vernicia montana and the susceptible Vernicia fordii tung tree species suggests that the complete loss of TNL genes in V. fordii may contribute to the difference in disease resistance between the two species [24].

Structural Diversity and Conserved Motifs in NBS-LRR Proteins

Table 2: Conserved Motifs and Functional Domains in NBS-LRR Proteins

Protein Domain / Motif	Consensus Sequence / Key Feature	Primary Function	Subfamily Specificity	Citation
N-terminal TIR	Less than 40% identity among domains in a genome	Enzyme producing immune signals; initiates defense signaling	Specific to TNLs	[57]
N-terminal CC	Coiled-coil structure predicted by COILS	Protein-protein interactions	Specific to CNLs (a class of nTNLs)	[2] [6]
NBS / NB-ARC	Central nucleotide-binding domain	ATP/GTP binding and hydrolysis; energy provision for signaling	Universal in NBS-LRRs	[2] [24]
P-loop (kin1)	GxGKTT/S (e.g., GIGKTT)	Phosphate binding during nucleotide hydrolysis	Universal; slight sequence variation	[2]
RNBS-A	V/LxxVxxV/C... (non-TIR), RWKK... (TIR)	Structural stability and function	Divergent between TNL and nTNL	[2]
Kinase-2	K/RGPRxLVLVLDDVW...	Catalytic function	Universal; highly conserved	[2]
RNBS-C	LxLxTRxELxY...	Structural stability	Universal	[2]
GLPL	CxGLPLA	Structural stability; membrane association	Universal	[2]
C-terminal LRR	LxxLxLxxN/CxL consensus	Pathogen recognition specificity; protein interactions	Universal; highly variable	[2] [56] [24]

The NBS-LRR proteins are defined by their modular domain architecture, which correlates with their distinct functions in pathogen sensing and immune signaling. The major subfamilies are defined by their N-terminal domains: the Toll/Interleukin-1 receptor (TIR) domain in TNLs and the coiled-coil (CC) domain in a major class of nTNLs known as CNLs [58] [24]. The TIR domain itself is highly diverse, sharing less than 40% identity among members within the Arabidopsis thaliana genome, and functions as an enzyme to produce diverse small molecule immune signals [57].

The central Nucleotide-Binding Site (NBS or NB-ARC) domain acts as the protein's nucleotide-dependent molecular switch. It contains several highly conserved motifs, including the P-loop (involved in phosphate binding), Kinase-2, and GLPL motifs, which are essential for ATP/GTP binding, hydrolysis, and resistance signaling [2]. While these motifs are universal, subfamily-specific differences exist, such as in the RNBS-A motif, which has distinct consensus sequences in TNL and nTNL proteins [2].

The C-terminal Leucine-Rich Repeat (LRR) domain is the most variable region and is crucial for determining pathogen recognition specificity through protein-ligand and protein-protein interactions [2] [24]. The loss of specific LRR domains (e.g., LRR1 and LRR4 in susceptible tung trees) can be a critical evolutionary event affecting resistance profiles [24].

Evolutionary Dynamics and Selective Pressure

The evolutionary trajectories of TNL and nTNL genes are shaped by different selective pressures, leading to their distinct patterns of diversification. A study on eight diploid wild strawberry species revealed that a significantly higher number of non-TNLs were under positive selection compared to TNLs, indicating their rapid diversification [6]. This rapid evolution is likely a response to changing pathogenic pressures.
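Claims about positive versus purifying selection rest on the ratio of nonsynonymous to synonymous substitution rates (dN/dS, often written ω). Values of ω below 1 indicate purifying selection, and values above 1 indicate positive selection. Published analyses commonly use likelihood-based codon models (for example, PAML's codeml). The sketch below is only a simplified pairwise Nei-Gojobori-style estimate with a Jukes-Cantor correction. It assumes two aligned, gap-free coding sequences, counts mutations to stop codons as nonsynonymous sites, and skips ambiguous or stop codons. The function names are hypothetical.

```python
import itertools
import math

BASES = "TCAG"
# Standard genetic code in TCAG order; '*' marks stop codons.
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODE = {"".join(c): aa for c, aa in zip(itertools.product(BASES, repeat=3), AMINO)}


def synonymous_sites(codon):
    """Sum over positions of the fraction of the 3 point mutations that are synonymous."""
    aa, sites = CODE[codon], 0.0
    for pos in range(3):
        for base in BASES:
            if base != codon[pos]:
                mutant = codon[:pos] + base + codon[pos + 1:]
                sites += (CODE[mutant] == aa) / 3
    return sites


def observed_differences(c1, c2):
    """Average (syn, nonsyn) differences over mutational pathways from c1 to c2
    that avoid intermediate stop codons."""
    positions = [i for i in range(3) if c1[i] != c2[i]]
    syn = non = 0.0
    n_paths = 0
    for order in itertools.permutations(positions):
        current, s, n, ok = c1, 0, 0, True
        for pos in order:
            nxt = current[:pos] + c2[pos] + current[pos + 1:]
            if CODE[nxt] == "*":
                ok = False
                break
            if CODE[nxt] == CODE[current]:
                s += 1
            else:
                n += 1
            current = nxt
        if ok:
            syn, non, n_paths = syn + s, non + n, n_paths + 1
    if n_paths == 0:  # every pathway passes a stop: treat all as nonsynonymous
        return 0.0, float(len(positions))
    return syn / n_paths, non / n_paths


def dn_ds(seq1, seq2):
    """Pairwise dN/dS (inf if dS is 0 but dN is not; nan if both are 0)."""
    syn_sites = non_sites = syn_diffs = non_diffs = 0.0
    for i in range(0, len(seq1) - len(seq1) % 3, 3):
        c1, c2 = seq1[i:i + 3].upper(), seq2[i:i + 3].upper()
        if c1 not in CODE or c2 not in CODE or "*" in (CODE[c1], CODE[c2]):
            continue  # skip ambiguous bases and stop codons
        s = (synonymous_sites(c1) + synonymous_sites(c2)) / 2
        syn_sites += s
        non_sites += 3 - s
        sd, nd = observed_differences(c1, c2)
        syn_diffs += sd
        non_diffs += nd

    def jukes_cantor(p):
        return -0.75 * math.log(1 - 4 * p / 3)  # undefined when p >= 0.75

    d_n = jukes_cantor(non_diffs / non_sites)
    d_s = jukes_cantor(syn_diffs / syn_sites)
    if d_s == 0:
        return float("nan") if d_n == 0 else float("inf")
    return d_n / d_s
```

Applied separately to NBS and LRR regions of paralog pairs, such estimates give a rough first view of the contrasting selection pressures described above. Site-level inference requires codon-model methods.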

Gene duplication events, particularly tandem duplications, are a primary force for the expansion and creation of new resistance specificities. A large-scale comparative analysis identified 603 orthogroups of NBS-domain genes across 34 plant species, with evidence of tandem duplications creating core and unique evolutionary lineages [29]. These duplications often lead to the formation of gene clusters, as seen in pepper and potato [2] [56].

Another key evolutionary phenomenon is the lineage-specific loss of TNL genes. While TNLs are generally present in dicots and absent in monocots, losses have been documented in some eudicot species. For example, the susceptible tung tree Vernicia fordii has completely lost its TNL genes, whereas its resistant counterpart, V. montana, has retained 12 [24]. This finding aligns with broader comparative analyses that identified the loss of TNLs not only in the Poaceae family of monocots but also in the dicot Mimulus guttatus, suggesting species-specific TNL loss occurs across flowering plants [59].

Experimental Protocols for Functional Characterization

Genome-Wide Identification and Annotation of NBS-LRR Genes

Protocol 1: Identification and Classification Pipeline

Data Retrieval: Obtain the complete proteome and genome annotation (GFF/GFF3 file) for the target species from a dedicated database (e.g., Genome Database for Rosaceae for strawberries, PGSC for potato) [6] [56].
Initial HMM Search: Use HMMER software (e.g., hmmsearch) with the NB-ARC (PF00931) Hidden Markov Model (HMM) from the Pfam database against the proteome. An E-value cutoff of < 1*10^-20 is typically used for high-confidence identification [6] [3] [24].
Domain Verification: Subject all candidate sequences to further domain analysis using:
- Pfam / SMART / CDD (NCBI): To confirm the presence of NBS, TIR (PF01582), LRR (multiple Pfam IDs), and RPW8 (PF05659) domains [6] [3].
- COILS program: To predict the presence of coiled-coil (CC) domains with a threshold of 0.1 [6].
Classification: Classify genes into subfamilies (TNL, CNL, RNL, NL, TN, CN, N) based on the presence or absence of TIR, CC, and LRR domains [3] [24].
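The classification rule in the last step can be written as a small function. This sketch assumes that the domain-presence flags come from the Pfam/SMART/CDD and COILS results above. When more than one N-terminal domain is detected, it applies a hypothetical precedence order (TIR, then RPW8, then CC), so TIR-CC hybrids are reported as TNL and should be flagged separately.

```python
def classify(has_tir, has_cc, has_rpw8, has_lrr):
    """Assign an NBS-containing protein to a structural subfamily
    (TNL, CNL, RNL, NL, TN, CN, RN or N) from detected domains."""
    if has_tir:
        prefix = "T"
    elif has_rpw8:
        prefix = "R"
    elif has_cc:
        prefix = "C"
    else:
        prefix = ""
    return prefix + "N" + ("L" if has_lrr else "")
```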

Functional Validation Using Virus-Induced Gene Silencing (VIGS)

Protocol 2: Functional Analysis via VIGS

Candidate Gene Selection: Select target NBS-LRR genes based on expression profiling (e.g., RNA-seq under pathogen stress) or phylogenetic analysis [29] [24].
VIGS Construct Preparation: Clone a 200-300 bp fragment specific to the candidate gene into a VIGS vector (e.g., TRV-based pYL280) [24].
Agrobacterium Transformation: Introduce the constructed plasmid into an Agrobacterium tumefaciens strain (e.g., GV3101).
Plant Infiltration: Grow plants (e.g., N. benthamiana, resistant tung tree) to the 4-6 leaf stage. Infiltrate the leaves with the Agrobacterium suspension carrying the VIGS construct. A control group should be infiltrated with an empty vector [24].
Pathogen Challenge: After a period of silencing (e.g., 2-3 weeks), inoculate the silenced plants with the target pathogen (e.g., the Fusarium wilt pathogen, Alternaria solani). Use mock-inoculated plants as a control [56] [24].
Phenotypic and Molecular Assessment:
- Monitor and score disease symptoms over time.
- Measure pathogen biomass in plant tissue using pathogen-specific primers via qPCR.
- Verify silencing efficiency of the target gene via qRT-PCR.
- Assess downstream defense markers, such as Reactive Oxygen Species (ROS) production [56] [24].
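Two small calculations in this protocol can be sketched in Python: choosing a gene-specific VIGS fragment and quantifying silencing from qRT-PCR data. The fragment picker assumes that specificity can be approximated by counting 21-nt k-mers (roughly siRNA length) shared with other family members, compared on the sense strand only. The silencing estimate uses the Livak 2^-ΔΔCt method, which assumes near-100% primer efficiency. All function names are hypothetical.

```python
def kmer_set(seq, k):
    """All length-k substrings of seq."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def pick_vigs_fragment(target, off_targets, length=250, k=21, step=10):
    """Slide a window along target; return (start, fragment, shared_kmers)
    for the window sharing the fewest k-mers with any off-target sequence."""
    target = target.upper()
    off = set()
    for seq in off_targets:
        off |= kmer_set(seq.upper(), k)
    best = None
    for start in range(0, len(target) - length + 1, step):
        fragment = target[start:start + length]
        shared = len(kmer_set(fragment, k) & off)
        if best is None or shared < best[2]:
            best = (start, fragment, shared)
    return best


def relative_expression(ct_target, ct_ref, ct_target_ctrl, ct_ref_ctrl):
    """2^-ddCt expression of the target in silenced plants relative to
    empty-vector controls, each normalized to a reference gene."""
    ddct = (ct_target - ct_ref) - (ct_target_ctrl - ct_ref_ctrl)
    return 2 ** (-ddct)


def silencing_efficiency(ct_target, ct_ref, ct_target_ctrl, ct_ref_ctrl):
    """Percentage reduction of the target transcript relative to controls."""
    return 100 * (1 - relative_expression(ct_target, ct_ref, ct_target_ctrl, ct_ref_ctrl))
```

For example, a target Ct of 26 against a reference Ct of 20 in silenced plants, versus 24 and 20 in empty-vector controls, gives ΔΔCt = 2, relative expression 0.25 and a silencing efficiency of 75%.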

Key Research Reagent Solutions for NBS-LRR Studies

Table 3: Essential Research Reagents and Resources

Reagent / Resource	Specifications / Example Sources	Primary Application in Research	Citation
HMM Profiles	NB-ARC (PF00931), TIR (PF01582), LRR from Pfam Database	In-silico identification and classification of NBS-LRR genes.	[6] [3]
Full Genome & Annotation	GFF/GFF3 files from species-specific databases (e.g., GDR, PGSC, Sol Genomics Network)	Genomic localization, gene structure analysis, and synteny mapping.	[6] [3] [56]
Pathogen Strains	Alternaria solani (e.g., MTCC-10690), Fusarium wilt pathogen, Dickeya dadantii (Ech36)	Functional challenge experiments to study resistance response.	[43] [56] [24]
VIGS Vectors	Tobacco Rattle Virus (TRV)-based system (e.g., pYL280)	Functional characterization through transient gene silencing.	[29] [24]
Agrobacterium Strains	A. tumefaciens GV3101	Delivery of VIGS constructs or heterologous gene expression in plants.	[24]
qRT-PCR Assays	Species-specific primers, SYBR Green chemistry, reference genes (e.g., Actin, Ubiquitin)	Gene expression profiling and silencing efficiency validation.	[56] [24]

Addressing Species-Specific Annotations in Non-Model Plants

Plant nucleotide-binding site leucine-rich repeat (NBS-LRR) genes represent the largest family of disease resistance (R) genes in plants, playing crucial roles in pathogen detection and defense activation [42] [60]. These genes are characterized by a conserved NBS domain and variable LRR domains, with classification primarily based on N-terminal domains: Toll/interleukin-1 receptor (TIR), coiled-coil (CC), or RESISTANCE TO POWDERY MILDEW 8 (RPW8) [3]. The TIR-NBS-LRR (TNL) subclass is particularly important for effector-triggered immunity but exhibits remarkable species-specific distribution patterns across plant lineages [5] [61].

Accurate annotation of these genes in non-model plants presents significant challenges due to their dramatic diversification, lineage-specific expansions and losses, and substantial structural variation [62] [60]. This guide provides a comprehensive comparison of TNL domain architectures across species and details experimental approaches for their characterization in non-model systems, addressing the critical need for standardized methodologies in this rapidly evolving field.

Comparative Analysis of TNL Distribution and Domain Architectures

Taxonomic Distribution Patterns

Table 1: Distribution of TNL Genes Across Major Plant Lineages

Plant Category	Representative Species	TNL Presence	Key Characteristics	Supporting Evidence
Bryophytes	Physcomitrella patens	Limited (~25 NLRs total)	Small NLR repertoires	[62]
Gymnosperms	Cycas revoluta	Present	Both TIR and non-TIR sequences	[5] [61]
Basal Angiosperms	Amborella trichopoda, Nuphar advena	Present	TIR-type sequences confirmed	[5] [61]
Eudicots	Arabidopsis thaliana, wild strawberries, peanut (Arachis hypogaea)	Abundant	229 TNLs in peanut; varying proportions in strawberries	[63] [6]
Monocots	Grasses (Poaceae), Musa spp.	Absent or rare	Only non-TIR sequences detected	[5] [61]
Magnoliids	Persea americana	Absent	Only non-TIR sequences found	[5]

The distribution of TNL genes across plant lineages reveals significant evolutionary patterns. TNLs were not detected in any of the monocot species examined, which span five orders (Poales, Zingiberales, Arecales, Asparagales, and Alismatales) [5] [61]. This absence contrasts with their presence in basal angiosperms such as Amborella trichopoda, suggesting substantial gene loss in the monocot and magnoliid lineages [5] [61].

In contrast, dicot species typically maintain substantial TNL complements. Wild strawberries (Fragaria spp.) show significant variation in TNL proportions between species, with F. vesca possessing the lowest proportion among eight diploid wild species studied [6]. Cultivated peanut (Arachis hypogaea) contains 229 TNL genes, representing a substantial portion of its NBS-LRR repertoire [63].

Domain Architecture Diversity

Table 2: Domain Architecture Variants in NBS-LRR Genes

Architecture Type	Domain Structure	Representative Species	Frequency	Remarks
Typical TNL	TIR-NBS-LRR	Most dicots	Common	Classical structure
Typical CNL	CC-NBS-LRR	All angiosperms	Common	Classical structure
Truncated TN	TIR-NBS	Arabidopsis thaliana	Less common	21 TN proteins in Arabidopsis
Truncated CN	CC-NBS	Arabidopsis thaliana	Less common	5 CN proteins in Arabidopsis
Atypical Fusion	TIR-NBS-TIR-Cupin1-Cupin1	Across 34 species	Rare	Species-specific pattern
Atypical Fusion	TIR-NBS-Prenyltransf	Across 34 species	Rare	Species-specific pattern
Atypical Fusion	NBS-WRKY	Arachis hypogaea	Rare	Potential role in stress response
Dual Domain	TIR-CC-NBS-LRR	Arachis hypogaea	Rare	26 sequences in cultivated peanut

Comprehensive analyses across 34 plant species have identified 168 distinct classes of NBS domain architectures, revealing both classical patterns and numerous species-specific structural variations [62]. These include not only standard TNL and CNL configurations but also unconventional domain combinations such as TIR-NBS-TIR-Cupin1-Cupin1 and TIR-NBS-Prenyltransf [62].

Notably, some species exhibit unusual fusion proteins that may confer specialized functions. Cultivated peanut possesses 26 NBS-LRR sequences containing both TIR and CC domains, a combination not observed in its diploid ancestors (A. duranensis and A. ipaensis), suggesting these fusions arose after tetraploidization [63]. Similarly, NBS-WRKY fusion proteins, potentially involved in response to biotic stress, have been identified in A. hypogaea and other legumes [63].

Experimental Protocols for TNL Characterization

Genome-Wide Identification Pipeline

Figure 1: Workflow for genome-wide identification of NBS-LRR genes.

Protocol 1: Identification of NBS-LRR Genes

Data Acquisition: Obtain complete genome sequences and annotation files from appropriate databases (NCBI, Phytozome, Plaza, or species-specific resources) [62] [6].
HMMER Search: Conduct domain searches using HMMER v3.1 with the NB-ARC domain model (PF00931) as query, applying an e-value cutoff of < 1e-20 for initial identification [62] [3] [6].
- Command example: hmmsearch --domtblout output.txt PF00931.hmm protein_fasta.fa
Domain Validation: Verify identified sequences through multiple domain databases:
- Pfam database (http://pfam.sanger.ac.uk/) [3]
- SMART tool (http://smart.embl-heidelberg.de/) [3]
- Conserved Domain Database (https://www.ncbi.nlm.nih.gov/Structure/cdd/) [3] [6]
Classification: Categorize validated genes into subfamilies based on presence of specific domains:
- TIR domain (PF01582) [6]
- CC domain (predicted by COILS with threshold 0.1) [6]
- LRR domains (multiple Pfam models) [6]
- RPW8 domain (PF05659) [3] [6]
Manual Curation: Remove redundant entries and verify domain architecture through manual inspection [3].
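One common way to remove redundant entries before manual curation is to keep only the longest predicted protein isoform per gene. A minimal sketch follows. It assumes protein IDs end in an isoform suffix such as ".1" or ".2". Other annotations use different naming conventions, so the suffix pattern is a parameter, and the function name is hypothetical.

```python
import re


def longest_isoforms(proteins, isoform_suffix=r"\.\d+$"):
    """Keep one protein per gene: the longest isoform.

    proteins: {protein_id: sequence}. Gene IDs are derived by stripping a
    trailing isoform suffix (an assumed naming convention).
    """
    best = {}
    for pid, seq in proteins.items():
        gene = re.sub(isoform_suffix, "", pid)
        if gene not in best or len(seq) > len(proteins[best[gene]]):
            best[gene] = pid
    return {pid: proteins[pid] for pid in best.values()}
```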

Evolutionary and Expression Analysis

Protocol 2: Evolutionary and Functional Characterization

Phylogenetic Analysis:
- Perform multiple sequence alignment using MAFFT v7 or Clustal W with default parameters [62] [3]
- Construct maximum likelihood trees with IQ-TREE v1.6.12 or MEGA7 [3] [6]
- Assess branch support with 1000 bootstrap replicates [3] [6]
Orthogroup Analysis:
- Identify orthogroups using OrthoFinder v2.5.1 with DIAMOND for sequence similarity and MCL for clustering [62]
- Analyze gene duplication patterns (tandem, segmental) using MCScanX [6]
Expression Profiling:
- Retrieve RNA-seq data from relevant databases (IPF, CottonFGD, NCBI BioProjects) [62]
- Process data through transcriptomic pipelines to obtain FPKM values [62]
- Categorize expression patterns into tissue-specific, abiotic stress, and biotic stress responses [62]
Genetic Variation Analysis:
- Identify variants between susceptible and tolerant accessions [62]
- Correlate specific variants with resistance phenotypes [62]
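To correlate a variant with phenotype across a panel of accessions, the variant presence and phenotype counts can be arranged in a 2×2 table (variant present/absent by tolerant/susceptible) and tested with Fisher's exact test. Below is a standard-library sketch. If SciPy is available, scipy.stats.fisher_exact performs the same two-sided test. The function name is hypothetical, and when many variants are tested the p-values need multiple-testing correction.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test for the 2x2 table [[a, b], [c, d]]."""
    row1, row2, col1 = a + b, c + d, a + c
    total = row1 + row2

    def prob(x):
        # Hypergeometric probability of x in the top-left cell, margins fixed.
        return comb(row1, x) * comb(row2, col1 - x) / comb(total, col1)

    p_obs = prob(a)
    low, high = max(0, col1 - row2), min(row1, col1)
    # Sum all tables no more probable than the observed one; the small
    # tolerance guards against floating-point ties.
    return sum(prob(x) for x in range(low, high + 1) if prob(x) <= p_obs * (1 + 1e-7))
```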

Figure 2: Comprehensive characterization workflow for NBS-LRR genes.

Table 3: Key Research Reagent Solutions for TNL Studies

Category	Specific Tool/Reagent	Application	Technical Notes
Domain Databases	Pfam (PF00931, PF01582, PF05659)	Domain identification & verification	Curated protein family database [3]
HMM Tools	HMMER v3.1	Initial gene identification	Use e-value cutoff 1e-20 [3] [6]
Classification Tools	COILS program	CC domain prediction	Threshold 0.1 [6]
Multiple Alignments	MAFFT v7, Clustal W	Sequence alignment for phylogenetics	Default parameters [62] [3]
Phylogenetics	IQ-TREE v1.6.12, MEGA7	Phylogenetic tree construction	1000 bootstrap replicates [3] [6]
Orthology Analysis	OrthoFinder v2.5.1	Orthogroup identification	Uses DIAMOND + MCL [62]
Expression Analysis	RNA-seq pipelines	Expression profiling	FPKM normalization [62]
Functional Validation	VIGS (Virus-Induced Gene Silencing)	Functional characterization	Used for validating NBS gene function [62]
Genomic Databases	NCBI, Phytozome, Plaza, GDR	Data retrieval	Species-specific databases recommended [62] [6]

Discussion and Future Perspectives

The comparative analysis of TNL genes across plant species reveals both conserved features and remarkable lineage-specific adaptations. The apparent absence of TNL genes from monocots, despite their presence in basal angiosperms, represents one of the most significant patterns in the evolution of plant immune genes [5] [61]. This distribution suggests either independent losses in multiple lineages or rapid diversification in dicot lineages.

The extensive diversity in domain architectures, particularly the species-specific fusion proteins observed across multiple taxa, highlights the dynamic nature of these genes and their continuous evolution in response to pathogen pressure [62] [63]. The discovery of TIR-CC-NBS-LRR fusion proteins in cultivated peanut that are absent from its diploid progenitors demonstrates how polyploidization can generate novel domain combinations with potential functional significance [63].

Standardized annotation protocols are particularly crucial for non-model plants, where automated annotation pipelines frequently misannotate complex NBS-LRR genes due to their size, complexity, and sequence diversity. The integration of multiple complementary approaches (HMM-based identification, phylogenetic analysis, orthogroup clustering, and expression profiling) provides a robust framework for accurate gene characterization across diverse species [62] [3] [6].

Future research directions should include more comprehensive sampling of basal angiosperms and gymnosperms to better resolve the evolutionary history of TNL genes, functional characterization of unconventional domain architectures, and investigation of the regulatory mechanisms controlling TNL expression in different phylogenetic contexts. The continued development of specialized databases and annotation tools will be essential for addressing the challenges of species-specific annotations in non-model plants.

Optimizing Parameters for Domain Prediction and Motif Detection

In plant genomics, accurately identifying resistance (R) genes is crucial for understanding plant immunity and developing disease-resistant crops. Among these, nucleotide-binding site leucine-rich repeat (NBS-LRR) genes constitute the largest and most functionally important class, with their Toll/interleukin-1 receptor (TIR) variants playing specialized roles in pathogen recognition and defense signaling. The detection of these genes relies heavily on optimized bioinformatic parameters for domain prediction and motif discovery, yet researchers face significant challenges in selecting appropriate tools and configuration settings. This comparison guide provides an objective evaluation of current methodologies, computational tools, and experimental protocols to establish best practices for reliable identification and characterization of TIR-NBS-LRR domain architectures, enabling more efficient discovery of plant resistance genes.

Comparative Analysis of Prediction Tools and Methods

Domain Prediction Tools and Performance Metrics

Table 1: Comparison of Domain Prediction Tools for NBS-LRR Gene Identification

Tool Name	Methodology	Key Parameters	Reported Accuracy	Strengths	Limitations
PRGminer	Deep learning (CNN)	Dipeptide composition; Two-phase classification	98.75% (training), 95.72% (independent testing) [64]	High accuracy with MCC 0.98; Classifies into 8 R-gene classes [64]	Black box nature limits interpretability
HMMER3	Hidden Markov Models	E-value cutoff (<1*10^{-20}); PF00931 (NB-ARC) model [10] [3]	Varies by dataset and parameters	Statistical rigor; Well-established benchmarks	Performance drops with low homology [64]
PfamScan	HMM-based search	Default e-value (1.1e-50); Pfam-A_hmm model [10]	Dependent on domain library completeness	Comprehensive domain database	Limited to known domain architectures
NCBI CDD	Conservation-based	Default parameters; Domain validation [65]	High specificity for known domains	Integrates multiple domain resources	May miss novel domain combinations

Motif Detection and Structural Analysis Tools

Table 2: Motif Detection and Structural Analysis Tools

Tool	Function	Key Parameters	Typical Output
MEME Suite	Motif discovery	motif count: 10; width: 6-50 amino acids [3]	Conserved motif patterns
COILS	Coiled-coil prediction	Threshold: 0.1 [6]	CC domain probability
SMART	Domain architecture	E-value < 0.01; Domain validation [3]	Comprehensive domain maps
InterProScan	Integrated search	Default parameters; Multiple databases [64]	Combined domain signatures

Experimental Protocols for Domain Identification

Standardized Workflow for NBS-LRR Gene Identification

The following experimental protocol synthesizes methodologies from multiple recent studies to provide a robust pipeline for identifying and characterizing NBS-LRR genes, with emphasis on parameter optimization for domain prediction and motif detection.

Step 1: Initial Sequence Identification

Retrieve protein sequences from databases (Phytozome, Ensembl Plants, NCBI) [64]
Perform HMMER searches using the NB-ARC domain (PF00931) with E-value cutoffs ranging from < 1e-20 to < 1e-2, depending on the required stringency [10] [3] [6]
For deep learning approaches, use PRGminer with dipeptide composition encoding [64]
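The E-value filtering in Step 1 can be sketched in Python. This is a minimal illustration that assumes hits were written with hmmsearch's `--tblout` option, whose per-target table is whitespace-delimited, uses `#` for comment lines, and places the target name in the first column and the full-sequence E-value in the fifth. The gene IDs and the truncated example rows are made up.

```python
from typing import Dict

def parse_tblout(text: str, max_evalue: float) -> Dict[str, float]:
    """Return {target: full-sequence E-value} for hits with E-value < max_evalue."""
    hits: Dict[str, float] = {}
    for line in text.splitlines():
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and HMMER comment/header lines
        fields = line.split()
        target, evalue = fields[0], float(fields[4])
        if evalue < max_evalue:
            hits[target] = evalue
    return hits

# Toy table: columns after "bias" are truncated for brevity; gene IDs are hypothetical.
example = """\
# target name  accession  query name  accession   E-value  score  bias
geneA          -          NB-ARC      PF00931.24  3.2e-45  150.1  0.2
geneB          -          NB-ARC      PF00931.24  4.0e-05  20.3   0.1
"""

strict = parse_tblout(example, 1e-20)   # stringent NBS identification
relaxed = parse_tblout(example, 1e-2)   # broader evolutionary survey
print(strict)
print(relaxed)
```

Relaxing the cutoff to 1e-2 admits the weaker geneB hit, which is the sensitivity/specificity trade-off behind choosing between the two thresholds.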

Step 2: Domain Architecture Classification

Confirm N-terminal domains (TIR, CC, RPW8) using PfamScan (TIR: PF01582) and COILS (threshold 0.1) [6]
Validate LRR domains using multiple PFAM models (PF00560, PF07723, PF07725, PF12799, PF13306, PF13516, PF13855, PF14580) [6]
Classify genes into structural types (TNL, CNL, RNL, TN, CN, N, NL) based on domain composition [3] [65]
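Once domains have been called, the classification rule in Step 2 reduces to a small lookup. The sketch below assumes an upstream step (for example PfamScan plus COILS) has already produced a set of domain labels per protein. The label strings, and the precedence given to TIR over CC and RPW8 when more than one N-terminal domain is detected, are assumptions made for illustration.

```python
from typing import Set

# Order sets precedence when several N-terminal domains are called (an assumption).
N_TERMINAL = (("TIR", "T"), ("CC", "C"), ("RPW8", "R"))

def classify_nlr(domains: Set[str]) -> str:
    """Map detected domain labels to a structural class such as TNL, CN, RNL, N or NL."""
    if "NBS" not in domains:
        return "non-NLR"  # the NB-ARC/NBS domain defines the family
    prefix = next((code for label, code in N_TERMINAL if label in domains), "")
    suffix = "L" if "LRR" in domains else ""
    return prefix + "N" + suffix

for arch in ({"TIR", "NBS", "LRR"}, {"CC", "NBS"}, {"RPW8", "NBS", "LRR"}, {"NBS", "LRR"}):
    print(sorted(arch), "->", classify_nlr(arch))
```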

Step 3: Motif Discovery and Validation

Extract NBS domain sequences and submit to MEME Suite for motif discovery
Set motif count to 10 with width range of 6-50 amino acids [3]
Validate conserved motifs (P-loop, RNBS-A, kinase-2, RNBS-B, RNBS-C, GLPL) against known profiles [16]
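As a rough first-pass check before comparing MEME output with known profiles, some of the classical NBS motifs can be approximated with regular expressions. The patterns below are simplified consensus approximations and an assumption of this sketch, not the published motif profiles: the P-loop is taken as the Walker A consensus GxxxxGK(T/S), kinase-2 as a hydrophobic stretch ending in DD, and GLPL as a literal match. The protein fragment is made up.

```python
import re
from typing import Dict, List

# Simplified consensus approximations (assumptions), not published motif profiles.
NBS_MOTIFS: Dict[str, str] = {
    "P-loop": r"G.{4}GK[TS]",     # Walker A consensus GxxxxGK(T/S)
    "kinase-2": r"[LIVMF]{4}DD",  # hydrophobic stretch followed by DD
    "GLPL": r"GLPL",
}

def scan_motifs(protein: str) -> Dict[str, List[int]]:
    """0-based start positions of non-overlapping matches for each motif pattern."""
    return {name: [m.start() for m in re.finditer(pattern, protein)]
            for name, pattern in NBS_MOTIFS.items()}

toy_nbs = "MAGMGGIGKTTLAKRLLVLDDVWGLPLAL"  # made-up fragment, not a real protein
print(scan_motifs(toy_nbs))
```

Regexes like these only flag candidates; the motif boundaries and variants reported by MEME remain the reference for validation.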

Step 4: Evolutionary and Structural Analysis

Perform multiple sequence alignment using MAFFT or ClustalW [10] [3]
Construct phylogenetic trees using Maximum Likelihood method with 1000 bootstrap replicates [3]
Identify gene clusters using physical proximity criteria (genes within 200 kb separated by ≤8 non-NLR genes) [6]
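The clustering criterion in Step 4 can be sketched as a single pass over genes sorted by position. This minimal version makes three assumptions: the 200 kb limit applies to the gap between consecutive NLR genes (some studies instead bound the span of the whole cluster), one chromosome is processed at a time, and only groups of two or more NLRs are reported. Gene names and coordinates are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Optional

@dataclass
class Gene:
    name: str
    start: int    # bp
    end: int      # bp
    is_nlr: bool

def find_nlr_clusters(genes: List[Gene], max_gap_bp: int = 200_000,
                      max_intervening: int = 8) -> List[List[str]]:
    """Cluster NLR genes on one chromosome; returns clusters of >= 2 NLRs."""
    clusters: List[List[str]] = []
    current: List[str] = []
    last_nlr: Optional[Gene] = None
    between = 0  # non-NLR genes seen since the last NLR
    for g in sorted(genes, key=lambda x: x.start):
        if not g.is_nlr:
            between += 1
            continue
        joined = (last_nlr is not None
                  and g.start - last_nlr.end <= max_gap_bp
                  and between <= max_intervening)
        if joined:
            current.append(g.name)
        else:
            if len(current) >= 2:
                clusters.append(current)
            current = [g.name]  # start a new candidate cluster
        last_nlr, between = g, 0
    if len(current) >= 2:
        clusters.append(current)
    return clusters

genes = [
    Gene("nlr1", 10_000, 14_000, True),
    Gene("gene1", 20_000, 22_000, False),
    Gene("nlr2", 30_000, 34_000, True),
    Gene("nlr3", 500_000, 504_000, True),  # > 200 kb from nlr2: new cluster
    Gene("nlr4", 520_000, 524_000, True),
]
print(find_nlr_clusters(genes))
```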

Figure 1: Workflow for comprehensive NBS-LRR gene identification and classification, integrating domain prediction and motif detection steps with optimized parameters.

Parameter Optimization Strategies

Based on comparative analysis of multiple studies, optimal parameters for domain prediction vary by taxonomic group and research goal. For strict identification of NBS domains, HMMER with an E-value < 1e-20 provides high specificity [3], while broader searches for evolutionary studies may use an E-value < 1e-2 [6]. For motif detection, setting the motif count to 10 with a variable width of 6-50 amino acids captures conserved regions without excessive redundancy [3].
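One way to keep these stringency choices explicit and reproducible is to generate the tool invocations from named presets. The sketch below only builds argument lists and does not run anything. It uses hmmsearch's `--tblout` and `-E` (per-target reporting E-value) options and MEME's `-protein`, `-oc`, `-nmotifs`, `-minw`, and `-maxw` options. The preset names and file names are hypothetical.

```python
from typing import List

# Hypothetical preset names mapped to the two thresholds discussed above.
EVALUE_PRESETS = {"strict": 1e-20, "broad": 1e-2}

def hmmsearch_cmd(hmm: str, proteome: str, table: str, preset: str = "strict") -> List[str]:
    """Argument list for an NB-ARC profile search with a preset E-value threshold."""
    return ["hmmsearch", "--tblout", table, "-E", str(EVALUE_PRESETS[preset]), hmm, proteome]

def meme_cmd(nbs_fasta: str, outdir: str, nmotifs: int = 10,
             minw: int = 6, maxw: int = 50) -> List[str]:
    """Argument list for protein motif discovery on extracted NBS domains."""
    return ["meme", nbs_fasta, "-protein", "-oc", outdir,
            "-nmotifs", str(nmotifs), "-minw", str(minw), "-maxw", str(maxw)]

print(hmmsearch_cmd("PF00931.hmm", "proteome.fa", "nbarc.tbl"))
print(meme_cmd("nbs_domains.fa", "meme_out"))
```

On a system with HMMER and the MEME Suite installed, each list could be passed to `subprocess.run(cmd, check=True)`.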

PRGminer, a deep learning approach, performed best with dipeptide composition encoding, reaching a Matthews correlation coefficient (MCC) of 0.98 in training and 0.91 in independent testing [64]. For coiled-coil domain prediction, a COILS threshold of 0.1 has been used to balance sensitivity and specificity [6].
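The MCC combines all four cells of a confusion matrix. This makes it more informative than accuracy when positive and negative classes are imbalanced, as is typical when screening a proteome for R-genes. The function below is the standard binary form; multi-class tools may report a generalized variant. The confusion-matrix counts in the example are invented.

```python
import math

def mcc(tp: int, tn: int, fp: int, fn: int) -> float:
    """Binary Matthews correlation coefficient; 0.0 if any marginal total is zero."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return 0.0 if denom == 0 else (tp * tn - fp * fn) / denom

def accuracy(tp: int, tn: int, fp: int, fn: int) -> float:
    """Fraction of correct predictions."""
    return (tp + tn) / (tp + tn + fp + fn)

# Invented, heavily imbalanced screen: 10 true R-genes among 1,000 proteins.
tp, tn, fp, fn = 5, 990, 0, 5
print(accuracy(tp, tn, fp, fn), mcc(tp, tn, fp, fn))
```

Here accuracy exceeds 0.99 although half of the true R-genes are missed, whereas the MCC of roughly 0.7 reflects the missed genes.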

Table 3: Key Research Reagent Solutions for NBS-LRR Studies

Category	Specific Resource	Function/Application	Key Features
Database Resources	Pfam (PF00931)	NBS domain model	Curated HMM profiles for NB-ARC domain [10]
	ANNA: Angiosperm NLR Atlas	Comparative genomics	90,000+ NLR genes from 304 angiosperm genomes [10]
Software Tools	PRGminer webserver	R-gene prediction/classification	Deep learning-based; 8-class categorization [64]
	OrthoFinder v2.5.1	Evolutionary analysis	Orthogroup inference; Gene duplication analysis [10]
Experimental Validation	VIGS	Functional characterization	Virus-induced gene silencing for gene function testing [10]
	qRT-PCR	Expression validation	Confirm differential expression of candidate NLR genes [66]

Data Interpretation and Analysis Frameworks

Classification Systems for NBS-LRR Genes

The domain architecture of NBS-LRR genes follows specific classification schemes based on domain composition. Current systems categorize these genes into eight main classes: CC-NBS (CN), CC-NBS-LRR (CNL), NBS (N), NBS-LRR (NL), RPW8-NBS (RN), RPW8-NBS-LRR (RNL), TIR-NBS (TN), and TIR-NBS-LRR (TNL) [65]. The distribution of these classes varies significantly across plant species, with CN-type and N-type generally more prevalent than TNL-type genes [66] [65].

Studies across multiple species reveal consistent patterns in genomic distribution. NBS-LRR genes frequently organize in clusters, with reported clustering percentages ranging from 54% in pepper [16] to over 83% in sweet potato [66]. These clusters predominantly form through tandem duplication events, facilitating rapid evolution and functional diversification in response to pathogen pressure.

Figure 2: Hierarchical classification system for plant NBS-LRR resistance genes based on domain architecture, showing main categories and subtypes.

Evolutionary Analysis Parameters

Selective pressure analysis using Ka/Ks ratios provides insights into evolutionary dynamics. The ratio of non-synonymous (Ka) to synonymous (Ks) substitution rates indicates the type of selection acting on a gene: Ka/Ks > 1 suggests positive selection, Ka/Ks < 1 purifying selection, and Ka/Ks ≈ 1 neutral evolution. Studies in wild strawberries found significantly more non-TNLs than TNLs under positive selection, indicating rapid diversification of non-TNLs [6]. These rates are typically calculated with KaKs_Calculator 2.0 using evolutionary models such as Nei-Gojobori (NG) [65].
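The final arithmetic behind a Nei-Gojobori Ka/Ks estimate is short enough to show directly. This sketch assumes that the synonymous and non-synonymous site counts and observed differences have already been tallied. It applies the Jukes-Cantor correction to each proportion; the codon-by-codon counting and the alternative models offered by KaKs_Calculator 2.0 are not shown. The counts in the example are invented.

```python
import math

def jukes_cantor(p: float) -> float:
    """Jukes-Cantor corrected distance from an observed proportion of differences."""
    if p >= 0.75:
        raise ValueError("proportion >= 0.75: correction undefined (saturation)")
    return -0.75 * math.log(1.0 - 4.0 * p / 3.0)

def ka_ks(nonsyn_diffs: float, nonsyn_sites: float,
          syn_diffs: float, syn_sites: float) -> float:
    """Ka/Ks from pre-computed Nei-Gojobori site and difference counts."""
    ka = jukes_cantor(nonsyn_diffs / nonsyn_sites)
    ks = jukes_cantor(syn_diffs / syn_sites)
    if ks == 0.0:
        raise ValueError("Ks is zero: ratio undefined")
    return ka / ks

def selection_mode(ratio: float) -> str:
    """Conventional interpretation of a Ka/Ks ratio."""
    if ratio > 1.0:
        return "positive selection"
    if ratio < 1.0:
        return "purifying selection"
    return "neutral"

# Invented counts for one hypothetical gene pair.
ratio = ka_ks(nonsyn_diffs=45, nonsyn_sites=900, syn_diffs=9, syn_sites=300)
print(round(ratio, 2), selection_mode(ratio))
```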

Gene duplication analysis requires specific parameters for identifying duplication types. Tandem duplications are defined as closely related genes located within 200 kb regions [6], while segmental duplications are identified through synteny analysis using tools such as MCScanX [66] [65]. These analyses reveal lineage-specific expansion patterns, with the predominant duplication mode (tandem or segmental) varying among species.

Optimizing parameters for domain prediction and motif detection in TIR-NBS-LRR research requires careful consideration of taxonomic context and research objectives. Integration of traditional HMM-based approaches with emerging deep learning methods like PRGminer provides complementary advantages for comprehensive gene identification. Standardized workflows incorporating optimized e-value thresholds, motif detection parameters, and evolutionary analysis frameworks enable more accurate and reproducible characterization of plant resistance gene architectures. As genomic data continues to expand, these parameter optimization strategies will play an increasingly critical role in elucidating the complex evolutionary dynamics and functional diversity of plant immune receptors.

Integrating Multi-Omics Data for Improved Functional Annotation

Plant innate immunity frequently relies on a sophisticated surveillance system governed by intracellular nucleotide-binding site leucine-rich repeat (NLR) proteins. Among these, TIR-NBS-LRR (TNL) proteins represent a major subclass characterized by an N-terminal Toll/Interleukin-1 receptor (TIR) domain; among angiosperms, TNL genes occur in dicots but are absent from monocots [6] [1]. These proteins function as essential immune receptors that detect pathogen effectors and activate effector-triggered immunity (ETI), often culminating in a hypersensitive response to restrict pathogen spread [17] [1]. The accurate functional annotation of TNL genes is paramount for understanding plant defense mechanisms and advancing molecular breeding strategies for disease-resistant crops.

Traditional genome annotation methods often struggle with the complex genomic architecture of NLR genes, which are frequently clustered, exhibit high sequence diversity, and undergo rapid evolution [1]. Multi-omics approaches, which integrate genomic, transcriptomic, proteomic, and metabolomic data, are improving functional annotation by providing complementary layers of evidence that resolve gene models, verify expression, characterize protein functions, and elucidate the metabolic consequences of immune activation [10] [67]. This guide compares the performance of various multi-omics integration strategies for TNL functional annotation, providing experimental data and methodologies to inform research decisions.

Comparative Analysis of Multi-Omics Approaches for TNL Characterization

Genomic and Phylogenetic Frameworks

Table 1: Genomic Identification and Phylogenetic Analysis of TNL Genes Across Plant Species

Plant Species	NLR Genes Identified	Genome-Wide Identification Method	Phylogenetic Grouping	Key Conserved Domains	Reference
Rosa chinensis (Rose)	96	BLAST + HMMER (TIR: PF01582, NB-ARC: PF00931)	Not specified	TIR, NBS, LRR	[17]
Wild Strawberry (Fragaria spp.)	Varies across 8 diploid species	HMMER v3.1 (NB-ARC: PF00931) + CD-search	TNLs diverged into two subclades	TIR, NBS, LRR	[6]
Arabidopsis thaliana	~150 total NLR genes	Orthology-based clustering	8 TNL subfamilies	TIR, NBS, LRR	[1]
Sugarcane	TIR-only and TPK genes identified	DaapNLRSeek pipeline	Paired NLRs identified	TIR, NBS, LRR	[68]
Passion fruit (Passiflora edulis Sims.)	25 CNL genes	BLASTp + domain verification	3 phylogenetic groups	CC, NBS, LRR	[69]

Experimental Protocol for Genomic Identification:

Sequence Acquisition: Obtain complete genome sequences and annotation files from relevant databases (e.g., Genome Database for Rosaceae, Ensembl Plants) [6] [17].
Domain Searching: Perform HMMER searches with NB-ARC (PF00931) and TIR (PF01582) domain profiles against proteomes using an e-value cutoff of <1.0 [6] [17].
Complementary BLAST: Conduct BLASTP searches with known TNL sequences as queries (expectation value ≤1e-2) [6].
Domain Verification: Verify domain architecture using CD-search tool and SMART database [6].
Coiled-Coil Prediction: Predict CC domains using COILS with threshold 0.1 [6].
Phylogenetic Construction: Perform multiple sequence alignment with MAFFT, trim with TrimAl, and construct Maximum Likelihood trees using IQ-TREE with 1000 ultrafast bootstraps [6].

Transcriptomic and Expression Profiling

Table 2: Transcriptomic Approaches for TNL Functional Annotation

Plant System	Experimental Conditions	Technology Platform	Key TNL Expression Findings	Regulatory Elements Identified	Reference
Rosa chinensis	Hormones (GA, JA, SA), Pathogens (B. cinerea, P. pannosa, M. rosae)	RNA-seq	RcTNL23 significantly upregulated under all treatments	Promoter cis-elements for hormones and stress	[17]
Sweetpotato (Ipomoea batatas)	Dickeya dadantii infection at four time points	RNA-seq	Identification of R and transcription factor genes	Not specified	[43]
Potato (Solanum tuberosum)	BABA-induced resistance to Phytophthora infestans	Microarray + proteomics	PR proteins accumulation, sesquiterpene phytoalexin biosynthesis	GO terms for hormone processes	[70]
Cotton (Gossypium hirsutum)	Cotton leaf curl disease (CLCuD)	RNA-seq (FPKM analysis)	OG2, OG6, OG15 upregulated in resistant accession	Not specified	[10]
Passion fruit	Cucumber mosaic virus and cold stress	RNA-seq	PeCNL3, PeCNL13, PeCNL14 differentially expressed	cis-elements for stress response	[69]

Experimental Protocol for Transcriptomic Analysis:

Treatment Design: Apply biotic/abiotic stresses with appropriate controls and biological replicates (typically ≥3) [17] [43].
RNA Extraction: Isolate RNA using commercial kits (e.g., RNeasy Mini Kit), verify concentration and purity (NanoDrop A260/A280 > 1.8), and check integrity [70].
Library Preparation and Sequencing: Prepare stranded mRNA-seq libraries and sequence on Illumina platforms [43].
Bioinformatic Analysis:
- Quality control (FastQC) and adapter trimming (Trimmomatic)
- Read alignment to reference genome (HISAT2/STAR)
- Transcript assembly and quantification (StringTie)
- Differential expression analysis (DESeq2/edgeR) [43]
Validation: Confirm key expression patterns via qRT-PCR with housekeeping genes for normalization [17].
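The qRT-PCR validation step is commonly quantified with the 2^-ΔΔCt (Livak) method. This minimal sketch assumes near-100% amplification efficiency for both the target and reference primers, which should be verified with a standard curve, and uses invented Ct values for a hypothetical TNL target and one reference gene. When several reference genes are used, averaging their Ct values per sample is equivalent to taking the geometric mean of their relative quantities.

```python
from statistics import mean
from typing import Sequence

def fold_change_ddct(target_treated: Sequence[float], ref_treated: Sequence[float],
                     target_control: Sequence[float], ref_control: Sequence[float]) -> float:
    """Relative expression (treated vs control) by the 2^-ΔΔCt method."""
    d_ct_treated = mean(target_treated) - mean(ref_treated)   # ΔCt, treated
    d_ct_control = mean(target_control) - mean(ref_control)   # ΔCt, control
    dd_ct = d_ct_treated - d_ct_control                       # ΔΔCt
    return 2 ** (-dd_ct)

# Invented Ct values (three biological replicates each) for a hypothetical TNL.
fc = fold_change_ddct([22.0, 22.5, 21.5], [18.0, 18.25, 17.75],
                      [25.0, 25.5, 24.5], [18.0, 18.0, 18.0])
print(fc)
```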

Proteomic and Metabolomic Integration

Experimental Protocol for Apoplastic Proteomics:

Apoplast Fluid Extraction: Infiltrate leaves with extraction buffer, centrifuge, and collect apoplastic washing fluid [70].
Protein Preparation: Concentrate proteins, quantify, and digest with trypsin.
LC-MS/MS Analysis: Perform liquid chromatography-tandem mass spectrometry with label-free quantification.
Protein Identification: Search spectra against protein databases using Sequest or similar algorithms.
Functional Annotation: Conduct GO enrichment analysis and pathway mapping [70].
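GO enrichment in the final step is often framed as a one-sided hypergeometric (Fisher-type) test per term: are proteins carrying the term over-represented among the differentially abundant proteins, relative to all detected proteins? The sketch below computes that p-value for a single term using only the standard library. The protein identifiers are hypothetical. Across many GO terms, the p-values would still need multiple-testing correction (for example Benjamini-Hochberg).

```python
from math import comb

def hypergeom_sf(k: int, M: int, n: int, N: int) -> float:
    """P(X >= k) for hypergeometric X: population M, n annotated, N drawn."""
    top = min(n, N)
    return sum(comb(n, i) * comb(M - n, N - i) for i in range(k, top + 1)) / comb(M, N)

def go_term_pvalue(study: set, population: set, annotated: set) -> float:
    """One-sided over-representation p-value for a single GO term."""
    study = study & population          # restrict to the detected background
    annotated = annotated & population
    k = len(study & annotated)          # study proteins carrying the term
    return hypergeom_sf(k, len(population), len(annotated), len(study))

# Hypothetical data: 10 detected proteins, 4 annotated with the term, 3 in the study set.
population = {f"P{i:02d}" for i in range(10)}
annotated = {"P00", "P01", "P02", "P03"}
study = {"P00", "P01", "P02"}
print(go_term_pvalue(study, population, annotated))
```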

Experimental Protocol for Metabolomic Analysis:

Metabolite Extraction: Use methanol/water or acetonitrile-based extraction from frozen tissue.
LC-MS Profiling: Analyze using reversed-phase chromatography coupled to high-resolution mass spectrometry.
Data Processing: Perform peak detection, alignment, and annotation using XCMS, CAMERA, and in-house databases.
Statistical Analysis: Apply multivariate statistics (PCA, PLS-DA) to identify differentially accumulated metabolites [67].
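The unsupervised part of this step (PCA) can be sketched with NumPy's SVD. The sketch assumes a samples-by-metabolites intensity matrix that has already been peak-picked and aligned. It applies log transformation and autoscaling first; both are common but not universal preprocessing choices. The data are simulated, and PLS-DA, which is supervised, is not shown.

```python
import numpy as np

def pca_scores(X: np.ndarray, n_components: int = 2):
    """PCA of a samples x metabolites matrix via SVD; returns (scores, explained ratio)."""
    Xl = np.log2(X + 1.0)                    # compress the dynamic range of peak areas
    Xc = Xl - Xl.mean(axis=0)                # mean-centre each metabolite
    sd = Xc.std(axis=0, ddof=1)
    Xs = Xc / np.where(sd == 0, 1.0, sd)     # autoscale; constant features stay at 0
    U, S, Vt = np.linalg.svd(Xs, full_matrices=False)
    scores = U[:, :n_components] * S[:n_components]
    explained = S**2 / np.sum(S**2)
    return scores, explained[:n_components]

# Simulated data: 4 mock and 4 infected samples, 6 metabolites; 5 rise on infection.
rng = np.random.default_rng(0)
mock = rng.normal(100.0, 5.0, size=(4, 6))
infected = rng.normal(100.0, 5.0, size=(4, 6)) + np.array([300.0] * 5 + [0.0])
scores, explained = pca_scores(np.vstack([mock, infected]))
print(np.round(scores[:, 0], 2), np.round(explained, 3))
```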

Visualization of Multi-Omics Integration for TNL Annotation

Figure: Multi-omics workflow for TNL functional annotation.

Figure: NLR-mediated immune signaling network.

Table 3: Key Research Reagent Solutions for TNL Functional Studies

Reagent/Resource Category	Specific Examples	Function/Application	Experimental Validation
Bioinformatics Tools	HMMER v3.1, OrthoFinder, MCScanX, MEME Suite	Domain prediction, orthogroup analysis, gene duplication, motif discovery	Accurate TNL identification in strawberry and passion fruit [6] [69]
Genomic Databases	Genome Database for Rosaceae (GDR), PLAZA, Phytozome, NCBI	Reference genomes, comparative genomics, gene family analysis	Multi-species NLR evolutionary studies [6] [10]
Expression Databases	IPF Database, CottonFGD, NCBI BioProjects	RNA-seq data retrieval, expression profiling across conditions	Identification of stress-responsive TNLs [10]
Domain Databases	Pfam, CDD, InterPro, SMART	Domain architecture verification, conserved motif identification	Validation of TIR, NBS, LRR domains [6] [17]
Pathogen Inoculation Systems	Botrytis cinerea, Dickeya dadantii, Marssonina rosae, CMV	Phenotypic resistance assays, functional validation	TNL response characterization in rose and sweetpotato [6] [17] [43]
Hormone Treatments	Salicylic acid, Jasmonic acid, Gibberellin	Defense signaling pathway activation	RcTNL23 response profiling in rose [17]
Machine Learning Algorithms	Random Forest classifier	Multi-stress responsive gene prediction	Identification of passion fruit PeCNL stress responders [69]

The integration of multi-omics data provides a powerful framework for advancing functional annotation of TNL genes beyond what any single approach can achieve. Genomic analyses establish evolutionary relationships and conserved domains; transcriptomics reveals dynamic expression patterns under various stresses; proteomics validates protein production and interactions; and metabolomics connects TNL activation to downstream physiological changes. The comparative analysis presented herein demonstrates that species-specific TNL expansions require customized annotation strategies, with emerging machine learning approaches offering promising avenues for predicting multi-stress responsive NLR genes. As omics technologies continue to evolve, their integration will progressively unravel the complex functional landscape of plant immune receptors, accelerating the development of disease-resistant crop varieties through molecular breeding.

Functional Validation and Expression Profiling of TNL Genes

Differential Expression Analysis Under Biotic and Abiotic Stresses

Plant survival in natural environments depends on sophisticated immune systems to counteract diverse biotic and abiotic stresses. Effector-triggered immunity (ETI), a robust defense mechanism often culminating in programmed cell death, is primarily mediated by intracellular nucleotide-binding site and leucine-rich repeat receptors (NLRs) [71]. Among these, TIR-NBS-LRR (TNL) proteins constitute a major subclass characterized by an N-terminal Toll/Interleukin-1 receptor (TIR) domain, a central nucleotide-binding site (NBS), and a C-terminal leucine-rich repeat (LRR) region [72]. The TIR domain is pivotal in signal transduction, often initiating immune signaling cascades [73]. This guide provides a comparative analysis of TNL research methodologies, expression profiles under stress conditions, and genomic distribution across species, offering experimental protocols and resources to advance this dynamic field.

Genomic Distribution and Evolutionary Analysis of TNL Genes

The genomic architecture of TNL genes reveals significant diversity and specialization across plant species. Comparative analysis demonstrates that TNL presence varies markedly among evolutionary lineages. Gymnosperms such as Pinus taeda show notable TNL expansion (TNLs constitute 89.3% of its typical NBS-LRRs), while TNLs are completely lost in monocots such as rice, wheat, and maize [72]. Among dicots, Salvia species show marked TNL degeneration; only two TNL proteins have been identified in the Salvia miltiorrhiza genome [72].

Chromosomal Organization and Gene Clustering

TNL genes frequently reside in complex clusters that function as genomic hotspots for diversification. Tomato (Solanum lycopersicum) exemplifies this organization, with approximately 65% of its NB-LRR genes clustered within small genomic regions spanning 200 kb or less [71]. The largest tomato cluster contains 14 CNL genes within a ~110-kb region on chromosome 4, sharing high sequence similarity with resistance genes from wild potato [71]. Chromosome 1 hosts the largest tomato TNL concentration (43%), while chromosomes 3, 6, and 10 completely lack TNL genes [71]. This non-random genomic distribution underscores the adaptive evolution of TNL loci in response to species-specific pathogen pressures.

Table 1: Comparative Genomic Distribution of NBS-LRR Subfamilies Across Plant Species

Plant Species	Total NBS-LRR Genes	TNL Count	CNL Count	RNL Count	Notable Genomic Features
Arabidopsis thaliana	~207 [72]	101 [72]	Information missing	Information missing	Reference model species
Oryza sativa (Rice)	~505 [72]	0 [72]	Information missing	Information missing	Complete TNL absence
Salvia miltiorrhiza	196 [72]	2 [72]	75 [72]	1 [72]	Severe TNL reduction
Solanum lycopersicum (Tomato)	~320 [71]	Information missing	Information missing	Information missing	20 clusters; Chr1 TNL-rich
Secale cereale (Rye)	582 [74]	0 [74]	581 [74]	1 [74]	TNL absence; High CNL
Pinus taeda (Loblolly Pine)	311 (typical) [72]	89.3% of typical [72]	Information missing	Information missing	Significant TNL expansion

Differential Expression Analysis of TNL Genes Under Stress Conditions

TNL gene expression undergoes complex regulation during plant-pathogen interactions, with distinct transcriptional patterns emerging between resistant and susceptible cultivars. RNA-seq analysis of sweetpotato responding to Dickeya dadantii infection revealed that resistant cultivars activate more defense genes, including NLR receptors and transcription factors [43]. In cowpea, whole-genome sequencing identified 2,188 R-genes, including numerous TNLs, which are proposed to respond to environmental challenges through transcriptional and translational reprogramming [75].

Expression Profiling Methodologies

Protocol 1: RNA Sequencing for TNL Expression Analysis

Sample Preparation and Stress Induction: Inoculate plant tissues with pathogen suspensions (e.g., D. dadantii on sweetpotato) or apply abiotic stress treatments. Include mock-treated controls [43].
RNA Extraction: Use TRIzol-based methods or kits (e.g., Qiagen RNeasy) to extract high-quality RNA. Verify integrity with an Agilent 2200 TapeStation or a similar system [43].
Library Preparation and Sequencing: Construct Illumina-compatible libraries (the cowpea study used the NEXTFLEX Rapid DNA-seq kit for genomic libraries [75]; RNA-seq requires an mRNA library kit). Perform paired-end sequencing (150 bp) on Illumina platforms (HiSeq X Ten) [75].
Bioinformatic Analysis:
- Trim raw reads with Trimmomatic or similar tools
- Align reads to reference genome using HISAT2/STAR
- Assemble transcripts and identify differentially expressed genes with DESeq2 or edgeR [43]
- Annotate TNL genes using NLR-parser or domain-based HMM searches [74]
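Because FPKM values were used in the cotton expression analysis [10], it may help to show how FPKM and the closely related TPM are computed from fragment counts. The sketch assumes counts and effective transcript lengths are already available. For simplicity it uses the sum of the supplied counts as the library total, whereas a real pipeline would use all mapped fragments. Gene IDs and numbers are invented. DESeq2 and edgeR expect raw counts, not FPKM or TPM, as input.

```python
from typing import Dict

def fpkm(counts: Dict[str, int], lengths_bp: Dict[str, int]) -> Dict[str, float]:
    """FPKM: fragments per kilobase of transcript per million fragments.

    The library total is the sum of the supplied counts (a simplification).
    """
    total = sum(counts.values())
    return {g: c * 1e9 / (lengths_bp[g] * total) for g, c in counts.items()}

def tpm(counts: Dict[str, int], lengths_bp: Dict[str, int]) -> Dict[str, float]:
    """TPM: length-normalise first, then rescale so values sum to one million."""
    rate = {g: c / (lengths_bp[g] / 1e3) for g, c in counts.items()}
    scale = sum(rate.values())
    return {g: r * 1e6 / scale for g, r in rate.items()}

counts = {"TNL_a": 500, "TNL_b": 1500, "ref": 8000}     # hypothetical genes
lengths = {"TNL_a": 3000, "TNL_b": 3500, "ref": 1500}   # effective lengths in bp
print(fpkm(counts, lengths))
print(tpm(counts, lengths))
```

TPM values always sum to one million per sample, which makes within-sample proportions easier to compare across libraries than FPKM.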

Protocol 2: Genome-Wide TNL Identification and Characterization

Sequence Data Acquisition: Perform whole-genome sequencing using hybrid approaches (Illumina and Nanopore) for comprehensive coverage [75].
Domain Identification: Search for TNL genes using Hidden Markov Models (HMM) of NB-ARC domain (PF00931) and TIR domain (PF01582) with HMMER suite [74].
Phylogenetic Analysis: Align NB-ARC domains with ClustalW, construct maximum-likelihood trees with IQ-TREE, and visualize clades with iTOL [74].
Expression Validation: Corroborate in silico findings with RNA-seq data and qRT-PCR under stress conditions [72].

TNL-Mediated Signaling Pathways in Plant Immunity

The TNL signaling cascade involves a complex network of interactions and downstream components that ultimately establish disease resistance. The following diagram illustrates the core TNL-mediated immune signaling pathway:

TNL proteins function as intracellular immune receptors that directly or indirectly recognize pathogen effector proteins, initiating ETI [71]. This recognition often occurs through the LRR domain, which exhibits high variability suited to diverse effector detection [74]. Upon effector binding, conformational changes in the TNL protein activate the NBS domain for ATP/GTP binding and hydrolysis, enabling signal transduction [72]. The TIR domain contributes to signaling through putative NADase activity or interaction with downstream components [73]. Successful TNL activation triggers a hypersensitive response (HR) and programmed cell death (PCD) at infection sites, restricting pathogen spread [71]. This signaling cascade synergizes with pattern-triggered immunity (PTI) for amplified defense responses [43]. Recent evidence identifies helper NLRs (RNLs like NRG1 and ADR1) that support TNL signaling, increasing system robustness against rapidly evolving pathogens [71].

Research Reagent Solutions for TNL Studies

Table 2: Essential Research Reagents and Resources for TNL Characterization

Reagent/Resource	Function/Application	Example Specifications
SNP Genotyping Arrays	High-density genotyping for gene mapping	48K 'Axiom_Arachis-v2' array (5,706 polymorphic SNPs in peanut) [76]
Long-Read Sequencing	Genome assembly and structural variation	GridION X5 (Oxford Nanopore); ~20x coverage [75]
Hybrid Assembly Tools	Integration of sequencing data for quality genomes	MaSuRCA v3.4.2 [75]
Domain Databases	Identification and annotation of TNL domains	Pfam (NB-ARC: PF00931; TIR: PF01582) [74]
HMMER Suite	Domain searches and gene family identification	HMMER-3.0 with E-value 1.0 [74]
Phylogenetic Software	Evolutionary analysis and subclass classification	IQ-TREE with ModelFinder [74]

Comparative Analysis of TNL Function Across Plant Species

Functional characterization of TNL genes reveals diverse recognition specificities and resistance mechanisms across plant species. In tomato, the Bs4 TNL gene confers resistance against Xanthomonas campestris pv. vesicatoria [71]. Arabidopsis also provides well-characterized reference NLRs outside the TNL class, including the CNL RPS2 (resistance against Pseudomonas syringae) and RPW8-domain helper NLRs (RNLs) that mediate immune signaling [72] [71]. Peanut research identified Arahy.1PK53M, a TNL candidate within the PSWDR-1 locus, contributing to Tomato spotted wilt virus resistance [76].

Expression Dynamics Under Combined Stresses

TNL regulation involves complex hormonal crosstalk, particularly between the jasmonic acid (JA) and salicylic acid (SA) pathways [43]. Sweetpotato studies show that JA accumulates faster than SA after pathogen challenge and may negatively regulate resistance against D. dadantii [43]. Reactive oxygen species (ROS) and antioxidant enzymes such as superoxide dismutase (SOD) also contribute to these resistance responses [43].

Table 3: Experimentally Validated TNL Genes, Related NLRs, and Their Functions

TNL Gene	Plant Species	Pathogen Stress	Function/Mechanism	Reference
RPS2	Arabidopsis thaliana	Pseudomonas syringae	CNL (included for comparison); among the first cloned plant NBS-LRR genes; recognizes the AvrRpt2 effector	[72]
Bs4	Solanum lycopersicum	Xanthomonas campestris	Confers resistance against bacterial spot disease	[71]
Arahy.1PK53M	Arachis hypogaea	Tomato spotted wilt virus	Candidate resistance gene within PSWDR-1 locus	[76]
RPW8-NLR	Arabidopsis thaliana	Multiple pathogens	"Helper" NLR mediating immune signaling	[71]
Pita	Oryza sativa	Magnaporthe oryzae	CNL protein recognizing AVR-Pita effector via LRR domain	[72]

This comparison guide demonstrates that TNL genes exhibit remarkable diversity in genomic organization, expression patterns, and functional mechanisms across plant species. While complete TNL absence characterizes monocots, functional TNLs in dicots and gymnosperms play crucial roles in pathogen recognition and immunity activation. Advanced genomic technologies, including high-density SNP arrays, long-read sequencing, and sophisticated bioinformatic tools, enable increasingly precise TNL characterization. These resources empower researchers to dissect the intricate regulatory networks governing TNL expression under biotic and abiotic stresses, ultimately facilitating the development of crops with enhanced, durable disease resistance.

Virus-Induced Gene Silencing (VIGS) is a powerful reverse genetics tool that leverages the plant's natural antiviral defense mechanism to achieve transient silencing of endogenous genes. This approach is grounded in the RNA-mediated defense mechanism of Post-Transcriptional Gene Silencing (PTGS), where plants recognize and degrade double-stranded RNA (dsRNA) and homologous mRNA sequences. The significance of VIGS has grown substantially with the advent of high-throughput sequencing, which rapidly generates lists of candidate genes requiring functional validation. While traditional methods for validating gene function often require the generation of stable transgenic plants, a process that can take considerable time, VIGS provides a faster alternative for characterizing gene function, particularly in challenging species such as cereals [77] [78].

The application of VIGS is particularly relevant for the study of plant Nucleotide-Binding Site Leucine-Rich Repeat (NBS-LRR) genes, which constitute one of the largest families of plant disease resistance (R) genes. These genes are central to the plant immune system, encoding proteins that recognize pathogen effectors and initiate defense responses. The functional characterization of specific NBS-LRR domain architectures, including TIR-NBS-LRR (TNL) and CC-NBS-LRR (CNL), is crucial for understanding plant-pathogen interactions and for developing durable disease-resistant crops [10] [13]. VIGS enables researchers to rapidly link these specific gene structures to their immune functions by observing the phenotypic consequences of their silencing.

Molecular Mechanisms of VIGS

The VIGS process is initiated when a recombinant viral vector, carrying a fragment of a plant gene of interest, is introduced into the plant. The underlying mechanism can be broken down into several key stages, illustrated in the diagram below:

Diagram: The Core VIGS Mechanism. This figure illustrates the key steps of Virus-Induced Gene Silencing, from initial viral infection and double-stranded RNA formation to the final phenotypic outcome.

Viral Infection and dsRNA Formation: The process begins with the inoculation of the plant using a recombinant viral vector that has been engineered to carry a fragment (typically 200-500 base pairs) of the plant's endogenous gene that is targeted for silencing [78]. As the virus replicates and spreads systemically through the plant, it produces double-stranded RNA (dsRNA), a common intermediate during viral replication [79].
siRNA Biogenesis: The plant's innate antiviral defense system recognizes this dsRNA. Dicer-like (DCL) enzymes, which are RNase III-type nucleases, process the long dsRNA into short fragments called small interfering RNAs (siRNAs), which are 21 to 24 nucleotides in length [79] [78].
RISC Assembly and Target Silencing: These siRNAs are then incorporated into an RNA-induced silencing complex (RISC). Within RISC, the siRNA acts as a guide, enabling the complex, whose cleavage activity is provided by an Argonaute (AGO) protein, to seek out and cleave complementary mRNA sequences. This leads to the sequence-specific degradation of the target endogenous mRNA before it can be translated into a functional protein [80] [79]. The process can be amplified by host RNA-directed RNA polymerases (RDRPs), which use the cleaved mRNA as a template to generate more dsRNA, leading to the production of secondary siRNAs and a stronger, systemic silencing signal [79].

In some cases, the silencing signal can also lead to Transcriptional Gene Silencing (TGS) via RNA-directed DNA methylation (RdDM) if the siRNA is complementary to a gene's promoter region, resulting in stable, heritable epigenetic modifications [79].

VIGS Workflow and Key Experimental Considerations

A generalized VIGS experiment follows a sequence of critical steps, from vector construction to phenotypic analysis. The workflow and its key decision points are summarized below:

Diagram: Generalized VIGS Experimental Workflow. This chart outlines the key stages of a VIGS experiment, highlighting critical decision points like vector selection.

Critical Factors for Experimental Success

Vector Selection: The choice of viral vector is paramount and depends on the host plant species. Tobacco Rattle Virus (TRV) is one of the most versatile and widely used vectors, especially for dicots like Nicotiana benthamiana and tomato, due to its broad host range, efficient systemic movement, and mild symptoms [81] [78]. For monocot plants like barley, the Barley Stripe Mosaic Virus (BSMV) vector has been successfully optimized and is a powerful tool [77] [82]. Other vectors include Bean pod mottle virus (BPMV) for soybean and Cotton leaf crumple virus (CLCrV) for cotton [10] [78].
Insert Design and Agroinfiltration: The fragment of the target gene inserted into the vector is typically 200-500 nucleotides long and should be unique to the gene of interest to avoid off-target silencing. The constructed vector is then introduced into Agrobacterium tumefaciens, and the bacterial culture is infiltrated into the leaves of young plants, often using a needleless syringe [78] [83]. The Agrobacterium concentration (OD600 typically ~0.8-1.5) and the developmental stage of the plant are critical factors that influence silencing efficiency [78] [83].
Validation of Silencing: The success of gene knockdown must be confirmed using molecular techniques. Reverse-Transcriptase Quantitative PCR (RT-qPCR) is the standard method. Accurate normalization using stably expressed reference genes (e.g., GhACT7 and GhPP2A1 in cotton) is essential for reliable quantification, especially under biotic stress conditions or viral infection [83]. A positive control, such as silencing the Phytoene Desaturase (PDS) gene which causes a visible white photobleaching phenotype, is routinely used to confirm that the VIGS system is working effectively in the plant [82] [78].
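The uniqueness requirement for VIGS inserts can be checked by counting 21-nt k-mers, matching the length of the siRNAs the insert gives rise to, that the insert shares with other transcripts. This is a minimal sketch under stated assumptions. It requires exact 21-mer identity, whereas real off-target silencing can tolerate mismatches. It considers both strands of the insert, because dsRNA yields siRNAs from each. All sequences and IDs are toy data.

```python
from typing import Dict, Set

_COMPLEMENT = str.maketrans("ACGT", "TGCA")

def revcomp(seq: str) -> str:
    """Reverse complement of a DNA sequence (A/C/G/T only)."""
    return seq.upper().translate(_COMPLEMENT)[::-1]

def kmers(seq: str, k: int = 21) -> Set[str]:
    """All length-k substrings of seq."""
    seq = seq.upper()
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}

def off_target_hits(insert: str, transcripts: Dict[str, str],
                    target_id: str, k: int = 21) -> Dict[str, int]:
    """Count k-mers shared between a VIGS insert (both strands) and each non-target transcript."""
    probe = kmers(insert, k) | kmers(revcomp(insert), k)  # siRNAs arise from both strands
    hits: Dict[str, int] = {}
    for tid, seq in transcripts.items():
        if tid == target_id:
            continue
        shared = len(probe & kmers(seq, k))
        if shared:
            hits[tid] = shared
    return hits

# Toy data: the insert shares a 25-nt stretch with a hypothetical paralog.
common = "GATTACAGATTACAGATTACAGATT"
insert = "C" * 10 + common + "G" * 10
transcripts = {
    "TNL_target": insert,
    "TNL_paralog": "T" * 10 + common + "A" * 10,
    "unrelated": "ACGT" * 20,
}
print(off_target_hits(insert, transcripts, "TNL_target"))
```

Any non-zero count flags a transcript that the construct might co-silence, suggesting that a different region of the target gene be chosen for the insert.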

Application of VIGS in NBS-LRR Gene Characterization

VIGS has proven to be an indispensable tool for functionally characterizing members of the large NBS-LRR gene family. The table below summarizes key experimental data from recent studies using VIGS to investigate NBS-LRR genes and their roles in disease resistance.

Table 1: VIGS-Mediated Functional Analysis of NBS-LRR and Associated Genes

Plant Species	Gene Silenced (Orthogroup/Name)	Gene Type / Domain Architecture	Pathogen / Stress Assayed	Key Phenotypic Outcome Post-Silencing	Experimental Validation Method
Gossypium arboreum (Cotton)	GaNBS (OG2) [10]	NBS domain gene	Cotton leaf curl disease (CLCuD)	Increased viral titer, demonstrating putative role in virus resistance	Virus-induced gene silencing & viral DNA quantification
Vernicia montana (Tung tree)	Vm019719 [13]	NBS-LRR gene (Upregulated in resistant species)	Fusarium wilt	Loss of resistance, increased disease susceptibility	VIGS, RT-qPCR, fungal inoculation
Barley (Hordeum vulgare)	Rar1, Sgt1, Hsp90 [82]	Chaperone complex (Co-factors for NBS-LRR)	Blumeria graminis (Powdery mildew)	Resistance-breaking phenotype, successful fungal penetration & haustoria formation	RT-PCR, protein level detection, fungal development scoring
Gossypium hirsutum (Upland Cotton)	NBS genes in Mac7 vs Coker 312 [10]	NBS domain genes	Cotton leaf curl disease (CLCuD)	6583 unique variants in tolerant (Mac7) vs 5173 in susceptible (Coker 312) accessions	Genetic variation analysis, expression profiling

The data in Table 1 demonstrate the power of VIGS in validating gene function across diverse plant species. For instance, in tung trees, silencing a specific NBS-LRR gene (Vm019719) in the resistant Vernicia montana compromised its resistance to Fusarium wilt, confirming the gene's essential role in the defense response [13]. Similarly, in barley, VIGS was used to demonstrate that the co-chaperone proteins Rar1, Sgt1, and Hsp90 are required for the function of the Mla13 NBS-LRR resistance gene, as their silencing led to a breakdown of resistance against powdery mildew [82].

Comparative studies have also leveraged VIGS to understand the genetic basis of resistance. Research in cotton used VIGS to link the expression of specific NBS gene orthogroups (e.g., OG2, OG6, OG15) to tolerance against cotton leaf curl disease, and further identified significant genetic variation in NBS genes between resistant and susceptible cotton accessions [10].

Essential Research Reagents and Protocols

A successful VIGS experiment relies on a suite of specialized reagents and standardized protocols. The table below lists key materials and their functions.

Table 2: Research Reagent Solutions for VIGS Experiments

Reagent / Material	Function in VIGS Workflow	Examples & Key Details
Viral Vectors	To deliver the plant gene insert, replicate, and spread systemically, triggering silencing.	TRV (TRV1 + TRV2 plasmids for dicots), BSMV (for monocots like barley), CLCrV (for cotton) [77] [10] [78].
Agrobacterium tumefaciens Strain	A biological delivery vehicle to introduce the viral vector DNA into plant cells.	GV3101 is a commonly used disarmed strain for agroinfiltration [83].
Induction Buffer Components	Prepares agrobacteria for efficient plant cell transformation.	MES buffer (pH stabilizer), MgCl₂ (for membrane stability), Acetosyringone (induces virulence genes) [83].
Reference Genes for RT-qPCR	Essential internal controls for accurate measurement of target gene knockdown.	GhACT7 & GhPP2A1 (stable in cotton under VIGS & herbivory) [83]. Avoid less stable genes like GhUBQ7 and GhUBQ14 in these conditions.
Positive Control Silencing Construct	Visual confirmation that VIGS is working systemically.	TRV:PDS or BSMV:PDS - Silencing Phytoene desaturase causes photobleaching [82] [78].
Empty Vector / Null Construct	Critical negative control to distinguish silencing effects from viral infection symptoms.	e.g., TRV:00 or BSMV:GFP (targeting a non-endogenous gene) [83].
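The inoculum density mentioned earlier (OD600 typically ~0.8-1.5) is usually reached by pelleting the culture and resuspending it in induction buffer. The following is a minimal sketch of that arithmetic, assuming OD600 is linear in cell density over the measured range (high readings are normally diluted before measurement) and that the pellet is fully recovered. The function names are hypothetical.

```python
"""Volume arithmetic for preparing an Agrobacterium inoculum at a target OD600.

Assumes OD600 scales linearly with cell density and that pelleting loses no cells.
"""


def resuspension_volume_ml(od_culture: float, culture_ml: float, od_target: float) -> float:
    """Induction-buffer volume (mL) that brings a pelleted culture to od_target."""
    if min(od_culture, culture_ml, od_target) <= 0:
        raise ValueError("OD and volume values must be positive")
    return od_culture * culture_ml / od_target


def stock_volume_ml(od_stock: float, od_target: float, final_ml: float) -> float:
    """Volume of an existing suspension to dilute to final_ml at od_target (C1V1 = C2V2)."""
    if min(od_stock, od_target, final_ml) <= 0:
        raise ValueError("OD and volume values must be positive")
    if od_target > od_stock:
        raise ValueError("target OD exceeds stock OD; concentrate instead of diluting")
    return od_target * final_ml / od_stock


if __name__ == "__main__":
    # 10 mL culture read at OD600 = 2.4, resuspended for a target of 1.0
    print(round(resuspension_volume_ml(2.4, 10.0, 1.0), 2))  # 24.0 mL
    # Dilute a suspension at OD600 = 1.5 down to 0.8 in a final 20 mL
    print(round(stock_volume_ml(1.5, 0.8, 20.0), 2))  # 10.67 mL
```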

Detailed Protocol: BSMV-VIGS in Barley

This protocol, adapted from studies in barley, outlines the key steps for functional characterization of disease resistance genes [77] [82]:

Vector Preparation: The BSMV vector is used in a tripartite genome system (α, β, γ). The target gene fragment (e.g., ~300 bp) is cloned into the γ-BSMV vector in an inverted repeat orientation to enhance silencing efficiency. The recombinant vectors are then transformed into Agrobacterium tumefaciens strain GV3101.
Plant Growth and Selection: Barley cultivars are screened for their ability to support BSMV replication without exhibiting excessive viral symptoms. Cultivars like 'Clansman' harboring the Mla13 resistance gene have been identified as suitable hosts [82].
Inoculum Preparation and Inoculation: Agrobacterium cultures harboring the BSMV vectors are grown to an OD600 of ~0.8, pelleted, and resuspended in an induction buffer containing acetosyringone. For barley, the second leaves of 7-10 day-old seedlings are mechanically inoculated by gently rubbing the leaf surface with a mixture of the BSMV constructs using a gloved finger or carborundum as an abrasive [82].
Phenotypic Assessment: After 2-3 weeks, silenced plants are challenged with the pathogen of interest. For barley powdery mildew, this involves inoculation with a Blumeria graminis f. sp. hordei isolate carrying the corresponding AvrMla13 avirulence gene. The interaction phenotype is scored 7 days post-inoculation; successful silencing of a required R gene or co-factor converts an incompatible (resistant) interaction into a compatible (susceptible) one, characterized by fungal colonization and sporulation [82].
Molecular Verification: Silencing efficiency is confirmed by:
- RT-qPCR: Total RNA is extracted from leaf tissue, reverse-transcribed, and the abundance of the target gene's mRNA is quantified. It is crucial to use validated reference genes (e.g., ubi3, EF-1α in barley) for normalization [81] [82]. A successful experiment typically shows a 70-90% reduction in target transcript levels.
- Protein Analysis: In some cases, western blotting is performed to confirm the reduction of the corresponding protein, as demonstrated for Rar1, Sgt1, and Hsp90 in barley [82].
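The RT-qPCR verification step can be summarized with the 2^-ΔΔCt method, normalizing to the mean Ct of two reference genes (for example GhACT7/GhPP2A1 in cotton or ubi3/EF-1α in barley). The following is a minimal sketch, assuming roughly 100% amplification efficiency for all primer pairs, in which case averaging Ct values is equivalent to taking the geometric mean of reference-gene quantities. The Ct values are invented for illustration and the function names are hypothetical.

```python
"""Percent knockdown of a VIGS target from RT-qPCR Ct values (2^-ddCt method).

Assumes ~100% amplification efficiency for target and reference primers.
"""
from statistics import mean

Sample = tuple[float, list[float]]  # (target Ct, [reference-gene Cts])


def delta_ct(target_ct: float, reference_cts: list[float]) -> float:
    """Target Ct minus the mean reference Ct (geometric-mean normalization in log space)."""
    return target_ct - mean(reference_cts)


def knockdown_percent(silenced: list[Sample], control: list[Sample]) -> float:
    """Percent reduction of target transcript in silenced vs. control (e.g. empty-vector) plants."""
    dct_silenced = mean(delta_ct(t, refs) for t, refs in silenced)
    dct_control = mean(delta_ct(t, refs) for t, refs in control)
    relative = 2 ** -(dct_silenced - dct_control)  # fold change, silenced / control
    return (1 - relative) * 100


if __name__ == "__main__":
    control = [(24.0, [20.0, 22.0]), (24.2, [20.1, 22.1])]   # hypothetical empty-vector plants
    silenced = [(26.0, [20.0, 22.0]), (26.3, [20.2, 22.0])]  # hypothetical target-silenced plants
    pct = knockdown_percent(silenced, control)
    print(f"knockdown: {pct:.1f}%  within 70-90%: {70 <= pct <= 90}")
```

If primer efficiencies differ noticeably from 100%, an efficiency-corrected model (for example, the Pfaffl method) is the more appropriate choice.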

VIGS stands as a robust, rapid, and versatile technique for the functional characterization of genes, particularly within the complex and expansive NBS-LRR family. Its ability to provide transient loss-of-function phenotypes without the need for stable transformation makes it an invaluable tool for validating genes identified through comparative genomics and sequencing studies. As research continues to unravel the intricacies of plant immune receptors, VIGS will remain a cornerstone methodology for linking specific gene domain architectures, such as TIR-NBS-LRR, to their biological functions in disease resistance, ultimately accelerating the development of improved crop varieties.

Intracellular immune signaling in plants is predominantly mediated by nucleotide-binding site leucine-rich repeat (NBS-LRR) proteins, which function as molecular switches for pathogen detection [42]. These proteins, categorized into Toll/interleukin-1 receptor (TIR-NBS-LRR or TNL) and coiled-coil (CC-NBS-LRR or CNL) subfamilies based on their N-terminal domains, share a conserved nucleotide-binding architecture that controls their activation state [1]. The central nucleotide-binding site (NBS) domain, also referred to as the NB-ARC (nucleotide-binding adaptor shared by APAF-1, R proteins, and CED-4) domain, contains characteristic motifs, including the phosphate-binding loop (P-loop), kinase-2, kinase-3a, and GLPL motifs, that facilitate nucleotide binding and hydrolysis [84] [47]. This review compares the ADP/ATP binding specificity of TNL proteins across plant species, examining how this molecular switching mechanism enables pathogen recognition and defense activation.

Structural Organization of TNL Proteins and Conserved Motifs

Domain Architecture and Classification

TNL proteins exhibit a characteristic tripartite domain structure beginning with an N-terminal TIR domain, followed by the central NBS domain, and terminating with C-terminal LRR regions [17] [46]. The TIR domain is primarily involved in protein-protein interactions and downstream signaling, while the LRR domain is responsible for pathogen recognition specificity [42]. The NBS domain serves as the regulatory core, housing the nucleotide-binding pocket that alternates between ADP-bound (inactive) and ATP-bound (active) states [1]. Beyond typical TNLs, plants also encode truncated forms, including TIR-NBS (TN) proteins that lack LRR domains and may function as adaptors or regulators in immune signaling networks [3].

Table 1: Conserved Motifs in the NBS Domain of TNL Proteins

Motif Name	Consensus Sequence	Functional Role	Structural Location
P-loop	GxPPSGKTT	Phosphate binding	N-terminal subdomain
RNBS-A	GxPLLFGD	Nucleotide binding	N-terminal subdomain
Kinase-2	LVLDDVW/D	Mg²⁺ coordination	Central subdomain
RNBS-D	CFLYCALF/Y	Structural stability	C-terminal subdomain
GLPL	GMGLPLA	Domain rearrangement	ARC2 subdomain
MHD	MHDIV	Nucleotide state regulation	C-terminal subdomain

Motif Conservation and Structural Considerations

The NBS domain contains several highly conserved motifs critical for nucleotide binding and hydrolysis [47]. The P-loop (phosphate-binding loop) facilitates phosphate binding, while the kinase-2 motif contains an aspartate residue that coordinates Mg²⁺ ions essential for catalytic activity [1]. The MHD (Met-His-Asp) motif at the C-terminal end of the ARC subdomain serves as a critical sensor for monitoring nucleotide state and facilitating conformational changes [84]. Sequence alignment of TNL proteins across species reveals that these motifs exhibit remarkable conservation, though the RNBS-A and RNBS-D motifs display distinct sequence features that differentiate TNLs from CNLs [47]. Structural modeling based on the APAF-1 protein suggests these motifs assemble into a compact nucleotide-binding fold that undergoes significant conformational rearrangement during nucleotide exchange [1].

Comparative Analysis of ADP/ATP Binding Specificity

Molecular Switching Mechanism

The NBS domain functions as a molecular switch through controlled nucleotide exchange and hydrolysis, transitioning between an ADP-bound "off" state and an ATP-bound "on" state [42]. In the absence of pathogen effectors, TNL proteins maintain an autoinhibited conformation with ADP tightly bound to the NBS domain [1]. Upon pathogen recognition, whether through direct binding of an effector or indirect detection of effector-induced modifications, ADP is exchanged for ATP, triggering a conformational change that activates downstream signaling [42]. The activated receptor initiates defense responses, including the hypersensitive response (HR), a localized programmed cell death at infection sites that limits pathogen spread, and systemic acquired resistance (SAR) [46].
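The switch logic described above can be expressed as a two-state toy model. This is a conceptual sketch only, not a kinetic or structural simulation. The state names, the `ploop_intact` flag (a crude stand-in for mutations such as K255R that prevent nucleotide loading), and the assumption that ATP hydrolysis returns the receptor to the ADP-bound state are simplifications introduced here.

```python
"""Toy two-state model of the NBS nucleotide switch in a TNL receptor (conceptual only)."""
from enum import Enum


class NbsState(Enum):
    ADP_OFF = "ADP-bound, autoinhibited"
    ATP_ON = "ATP-bound, signaling-competent"


def next_state(state: NbsState, effector_detected: bool,
               hydrolysis: bool = False, ploop_intact: bool = True) -> NbsState:
    """Return the next switch state under simplified rules.

    - Effector recognition drives ADP-to-ATP exchange, but only if the P-loop can bind nucleotide.
    - ATP hydrolysis (assumed here) resets the receptor to the ADP-bound state.
    """
    if state is NbsState.ADP_OFF and effector_detected and ploop_intact:
        return NbsState.ATP_ON
    if state is NbsState.ATP_ON and hydrolysis:
        return NbsState.ADP_OFF
    return state


if __name__ == "__main__":
    print(next_state(NbsState.ADP_OFF, effector_detected=True).value)  # ATP-bound, signaling-competent
    print(next_state(NbsState.ADP_OFF, True, ploop_intact=False).value)  # ADP-bound, autoinhibited
```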

Table 2: Experimental Evidence for Nucleotide Binding in Plant NBS-LRR Proteins

Protein	Species	Experimental Method	Nucleotide Specificity	Functional Outcome
Rx (CNL)	Potato	Site-directed mutagenesis	ADP/ATP	P-loop mutation (K255R) disrupts function [84]
I2 (CNL)	Tomato	ATP binding/hydrolysis assays	ATP	Binds and hydrolyzes ATP [1]
Mi (CNL)	Tomato	ATP binding/hydrolysis assays	ATP	Binds and hydrolyzes ATP [1]
N (TNL)	Tobacco	Oligomerization assay	ADP/ATP	Nucleotide-dependent oligomerization [1]
StTNLC7G2	Potato	Functional validation	ADP/ATP	Reactive oxygen species generation [46]

Determinants of Binding Specificity

Several conserved residues within the NBS domain directly determine nucleotide binding specificity and affinity. The lysine residue within the P-loop motif interacts with the β- and γ-phosphates of ATP, while aspartate residues in the kinase-2 motif coordinate Mg²⁺ ions that stabilize ATP binding [84] [1]. The MHD motif appears to function as a nucleotide state sensor, with mutations in this region often leading to constitutive activation or complete loss of function [84]. In the potato Rx protein, a single point mutation (K255R) in the P-loop motif disrupts both nucleotide binding and the interaction between the separately expressed CC and NBS-LRR domains, highlighting the essential nature of these residues [84]. This suggests that nucleotide binding is a prerequisite for proper protein interactions and immune signaling.

Experimental Approaches for Studying Nucleotide Binding

Site-Directed Mutagenesis of Conserved Motifs

Protocol: Site-directed mutagenesis of conserved NBS motifs followed by functional complementation assays provides compelling evidence for nucleotide binding requirements [84].

Mutagenesis Design: Introduce specific point mutations in conserved motifs (e.g., P-loop lysine, kinase-2 aspartate, MHD histidine) using PCR-based methods
Functional Testing: Express mutant constructs in planta via transient expression or stable transformation
Phenotypic Analysis: Assess ability to trigger hypersensitive response (HR) and confer disease resistance
Interaction Studies: Evaluate impact on protein-protein interactions using co-immunoprecipitation
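For the mutagenesis design step, the conserved P-loop lysine can be located computationally before primers are designed. The following is a minimal sketch that uses the generic Walker A pattern GxxxxGK[T/S] as an approximation of the P-loop (it is broader than the TNL-specific consensus in the motif table) and reports substitutions in the same notation as Rx K255R. The toy sequence and function names are hypothetical; residue numbers for a real protein should be checked against its annotated sequence.

```python
"""Locate P-loop (Walker A) lysines and name point substitutions such as K->R.

The regex G.{4}GK[ST] is the generic Walker A pattern, used here as an approximation.
"""
import re

WALKER_A = re.compile(r"G.{4}GK[ST]")


def ploop_lysine_positions(protein: str) -> list[int]:
    """1-based positions of the lysine in each non-overlapping Walker A match."""
    # The lysine is the 7th residue of the 8-residue match (0-based offset 6).
    return [m.start() + 7 for m in WALKER_A.finditer(protein.upper())]


def substitution_label(protein: str, position: int, new_residue: str) -> str:
    """Standard substitution notation, e.g. 'K255R', for a 1-based position."""
    return f"{protein[position - 1].upper()}{position}{new_residue.upper()}"


if __name__ == "__main__":
    toy = "MSEVLGMGGIGKTTLAR"  # toy fragment, not a real TNL sequence
    for pos in ploop_lysine_positions(toy):
        print(substitution_label(toy, pos, "R"))  # K12R
```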

Key Findings: Studies of the potato Rx protein demonstrated that a K255R mutation in the P-loop disrupts physical interaction between CC and NBS-LRR domains, indicating nucleotide binding is essential for proper conformational dynamics [84]. Similar mutagenesis approaches in tobacco N protein revealed the necessity of intact nucleotide-binding motifs for oligomerization and defense activation [1].

Biochemical Analysis of Nucleotide Binding and Hydrolysis

Protocol: Direct measurement of nucleotide binding and hydrolysis kinetics provides quantitative assessment of binding specificity [1].

Protein Purification: Express and purify recombinant NBS domains using E. coli or insect cell systems
Radiolabeled Binding Assays: Incubate protein with [α-³²P]ATP or [α-³²P]GTP, separate bound/free nucleotide via filter binding or gel filtration
Hydrolysis Measurements: Use thin-layer chromatography to monitor phosphate release from γ-³²P-labeled nucleotides
Specificity Competition: Perform cold nucleotide competition assays to determine binding preferences
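Quantifying the TLC and competition read-outs comes down to ratio arithmetic. The following is a minimal sketch, assuming phosphorimager signals for the free-phosphate and intact-nucleotide spots of each lane, and a no-protein control lane to correct for spontaneous hydrolysis. The function names and example numbers are hypothetical.

```python
"""Ratio arithmetic for TLC hydrolysis and cold-competition binding assays (illustrative)."""


def percent_hydrolysis(pi_signal: float, ntp_signal: float, background: float = 0.0) -> float:
    """Percent of gamma-32P nucleotide converted to free phosphate in one TLC lane."""
    pi = max(pi_signal - background, 0.0)
    ntp = max(ntp_signal - background, 0.0)
    if pi + ntp == 0:
        raise ValueError("no signal above background")
    return 100.0 * pi / (pi + ntp)


def net_hydrolysis(sample_pct: float, no_protein_pct: float) -> float:
    """Protein-dependent hydrolysis after subtracting the no-protein control lane."""
    return max(sample_pct - no_protein_pct, 0.0)


def percent_bound_remaining(bound_with_cold: float, bound_without_cold: float) -> float:
    """Labeled nucleotide still bound with excess cold competitor, relative to no competitor."""
    if bound_without_cold <= 0:
        raise ValueError("reference binding signal must be positive")
    return 100.0 * bound_with_cold / bound_without_cold


if __name__ == "__main__":
    sample = percent_hydrolysis(pi_signal=300.0, ntp_signal=700.0)  # 30.0
    control = percent_hydrolysis(pi_signal=40.0, ntp_signal=960.0)  # 4.0
    print(net_hydrolysis(sample, control))
    # Lower remaining binding with cold ATP than with cold GTP would point to an ATP preference.
    print(percent_bound_remaining(150.0, 1000.0), percent_bound_remaining(800.0, 1000.0))
```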

Key Findings: Biochemical studies of tomato I2 and Mi proteins demonstrated specific ATP binding and hydrolysis activities, with mutation of conserved kinase-2 and P-loop residues abolishing both binding and enzymatic function [1]. These findings established the NBS domain as a functional STAND family ATPase capable of nucleotide-dependent conformational regulation.

Diagram 1: Nucleotide-Dependent Activation Cycle of TNL Proteins. The transition from ADP-bound to ATP-bound states triggers immune signaling.

Comparative Functional Validation Across Plant Species

Expression Profiling Under Pathogen Challenge

Comprehensive expression analyses across multiple plant species reveal that TNL genes are frequently upregulated in response to pathogen infection, supporting their crucial role in immunity. In roses (Rosa chinensis), systematic identification of 96 TNL genes showed that many respond significantly to fungal pathogens, including Marssonina rosae (black spot), Podosphaera pannosa (powdery mildew), and Botrytis cinerea (gray mold) [17]. In particular, RcTNL23 was strongly upregulated in response to three hormones (gibberellin, jasmonic acid, salicylic acid) and all three tested pathogens, suggesting that it functions as a central component of defense signaling networks [17]. A similar study in potato identified 44 TNL genes, and expression profiling after Alternaria solani infection revealed dynamic induction patterns, particularly in disease-tolerant varieties [46].

Functional Characterization Through Silencing and Overexpression

Virus-Induced Gene Silencing (VIGS): VIGS has emerged as a powerful tool for functional characterization of TNL genes. In cotton, silencing of GaNBS (orthogroup OG2) demonstrated its essential role in virus tolerance, with silenced plants showing increased viral titers and susceptibility to cotton leaf curl disease [10]. Similarly, silencing of GbaNA1 in cotton reduced resistance to Verticillium dahliae, further supporting the critical function of NBS-LRR proteins in fungal defense [85].

Heterologous Expression: Conversely, overexpression of specific TNL genes frequently enhances disease resistance. When overexpressed in tobacco, the grape TNL gene VaRGA1 enhanced resistance to multiple pathogens and improved drought and salt tolerance [85]. Similarly, overexpression of soybean GmKR3 conferred resistance to multiple viruses without affecting yield or quality traits [10]. These gain-of-function approaches provide direct evidence for the protective function of TNL proteins.

The Scientist's Toolkit: Essential Research Reagents

Table 3: Key Research Reagents for Studying TNL Nucleotide Binding

Reagent/Category	Specific Examples	Function/Application	Experimental Context
Expression Vectors	Gateway-compatible vectors, pCambia series	Protein expression in planta	Heterologous expression, subcellular localization [46]
Antibodies	Anti-HA, Anti-MYC, Anti-GFP	Protein detection, immunoprecipitation	Co-IP, Western blot, protein interaction studies [84]
Nucleotide Analogs	ATPγS, AMP-PNP, ADP-BeF₃	Nucleotide binding specificity	Biochemical assays, conformational stabilization [1]
Pathogen Cultures	Alternaria solani, Marssonina rosae	Pathogen challenge assays	Expression profiling, functional validation [17] [46]
qRT-PCR Primers	Gene-specific primers	Expression analysis	Transcript quantification, pathogen response [46]

The ADP/ATP binding specificity of TNL proteins represents a conserved molecular switching mechanism that has been maintained across diverse plant species despite extensive sequence divergence. Comparative analyses reveal that while the fundamental nucleotide-dependent activation mechanism is shared, different plant families have evolved distinct TNL repertoires with specialized functions [17] [10] [46]. The essential nucleotide-binding motifs (P-loop, kinase-2, GLPL, MHD) remain highly conserved, indicating strong purifying selection on these functional elements [47]. Future research should focus on obtaining high-resolution structures of TNL proteins in different nucleotide states, developing more specific nucleotide analogs to modulate immune signaling, and engineering nucleotide-binding domains for expanded disease resistance in crop species. Continued integration of comparative genomics, structural biology, and protein engineering is likely to yield new insights into this fundamental aspect of plant immunity and novel strategies for crop protection.

Comparative Expression Patterns in Resistant versus Susceptible Varieties

Plant immunity relies heavily on a diverse family of disease resistance (R) genes, with the TIR-NBS-LRR (TNL) subclass playing a particularly vital role in effector-triggered immunity [17] [60]. These genes encode intracellular proteins that detect pathogen effectors, activating robust defense responses [47] [24]. A critical strategy in plant pathology involves comparing the expression and structural characteristics of these genes between disease-resistant and susceptible varieties. Understanding these differential patterns provides fundamental insights into resistance mechanisms and informs the development of disease-resistant crops through molecular breeding [10] [24]. This guide synthesizes experimental data and methodologies from recent studies to objectively compare TNL gene expression and architecture across a range of plant species and pathogenic challenges.

Domain Architecture and Genomic Distribution of TNL Genes

Structural Classification and Conserved Motifs

TNL genes belong to the larger NBS-LRR superfamily, characterized by a tripartite domain structure. The TIR (Toll/Interleukin-1 Receptor) domain at the N-terminus is involved in signal transduction, the central NBS (Nucleotide-Binding Site) domain functions as a molecular switch for ATP/GTP binding and hydrolysis, and the C-terminal LRR (Leucine-Rich Repeat) domain is responsible for pathogen recognition specificity [2] [60]. The NBS domain contains several conserved motifs, including the P-loop, kinase-2, RNBS, and GLPL motifs, which are crucial for nucleotide binding and protein function [47] [2].

Table 1: Prevalence of TNL Genes Across Plant Species

Plant Species	Total NBS-LRR Genes Identified	TNL Genes Identified	Key Structural Features	Reference
Rosa chinensis (Rose)	Not Specified	96	Intact TIR, NBS, and LRR domains; 8 conserved NBS motifs	[17]
Capsicum annuum (Pepper)	252	4	Classified into TN subclass (TIR + NB-ARC domains)	[2]
Nicotiana benthamiana (Tobacco)	156	5	Full-length TIR-NBS-LRR architecture	[3]
Vernicia montana (Tung Tree)	149	3	TIR-NBS-LRR and TIR-NBS architectures	[24]
Fragaria spp. (Wild Strawberry)	Varies by species	Minority of NLRs	TIR domain at N-terminus; phylogenetically distinct from CNLs/RNLs	[6]

Beyond these typical TNLs, many plant genomes encode numerous NBS-LRR-related genes that lack the full complement of domains. These include TIR-NBS (TN) and CC-NBS (CN) proteins that may function as adaptors or regulators of full-length TNL and CNL proteins [60].

Genomic Organization and Evolutionary Patterns

TNL genes are frequently organized in clusters within plant genomes, a result of both segmental and tandem duplications [2] [60]. In pepper, 54% of the 252 identified NBS-LRR genes form 47 gene clusters, driven by tandem duplications and genomic rearrangements [2]. This clustered organization facilitates the generation of diversity through unequal crossing-over and gene conversion, creating variation in the LRR domain that alters pathogen recognition specificities [60].
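Cluster counts such as the 47 pepper clusters depend on the distance rule used to define a cluster. The following is a minimal sketch of one common rule, assuming that NBS-LRR genes on the same chromosome separated by gaps of at most 200 kb belong to one cluster. The 200 kb threshold is an assumption here (criteria differ between studies, and some also cap the number of intervening non-NBS genes), and the gene coordinates are hypothetical.

```python
"""Group NBS-LRR genes into physical clusters by a maximum inter-gene gap (illustrative)."""
from collections import defaultdict

Gene = tuple[str, str, int, int]  # (gene_id, chromosome, start, end)


def find_clusters(genes: list[Gene], max_gap: int = 200_000,
                  min_size: int = 2) -> list[tuple[str, list[str]]]:
    """Return (chromosome, [gene_ids]) for each run of genes separated by <= max_gap bp."""
    by_chrom: dict[str, list[tuple[int, int, str]]] = defaultdict(list)
    for gene_id, chrom, start, end in genes:
        by_chrom[chrom].append((start, end, gene_id))

    clusters = []
    for chrom in sorted(by_chrom):
        loci = sorted(by_chrom[chrom])
        current, current_end = [loci[0][2]], loci[0][1]
        for start, end, gene_id in loci[1:]:
            if start - current_end <= max_gap:
                current.append(gene_id)
                current_end = max(current_end, end)
            else:
                if len(current) >= min_size:
                    clusters.append((chrom, current))
                current, current_end = [gene_id], end
        if len(current) >= min_size:
            clusters.append((chrom, current))
    return clusters


if __name__ == "__main__":
    genes = [("g1", "chr1", 1_000, 5_000), ("g2", "chr1", 100_000, 104_000),
             ("g3", "chr1", 500_000, 505_000), ("g4", "chr2", 10_000, 14_000)]
    clusters = find_clusters(genes)
    clustered = sum(len(ids) for _, ids in clusters)
    print(clusters, f"{100 * clustered / len(genes):.0f}% clustered")
```

Re-running with a different `max_gap` shows how sensitive cluster counts are to this parameter, which is worth keeping in mind when comparing cluster statistics across studies.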

A significant evolutionary distinction exists between monocots and dicots regarding TNL distribution. TNL genes are completely absent from cereal genomes, suggesting their loss in the cereal lineage after the divergence of monocots and dicots [6] [60]. Across dicot species, the proportion of TNLs within the total NBS-LRR repertoire varies substantially. In wild strawberries, non-TNLs constitute over 50% of the NLR gene family, outnumbering TNLs [6], while in pepper, TNLs represent a very small minority (4 of 252, about 1.6%) [2].

Comparative Expression Analysis Under Pathogen Stress

Case Study: Fungal Disease Response in Roses

A comprehensive study of Rosa chinensis investigated the expression of 96 intact TNL genes in response to three fungal pathogens: Botrytis cinerea, Podosphaera pannosa, and Marssonina rosae (black spot pathogen) [17]. Transcriptome analysis revealed that TNL genes were predominantly expressed in leaves, the primary site of pathogen attack. Several RcTNL genes showed significant responses to pathogen infection, with RcTNL23 demonstrating particularly strong upregulation to all three pathogens and three defense hormones (gibberellin, jasmonic acid, and salicylic acid) [17]. Expression pattern analysis after inoculation with the black spot pathogen indicated that different TNL members are activated during different periods of pathogen infection, suggesting a coordinated temporal defense response [17].

Table 2: TNL Gene Expression in Resistant vs. Susceptible Varieties

Plant System	Pathogen Challenge	Resistant Variety Response	Susceptible Variety Response	Key Differentially Expressed Gene	Reference
Tung Tree (Vernicia)	Fusarium wilt	V. montana: Strong upregulation of defense genes	V. fordii: Downregulation or weak response	Vm019719 (upregulated in V. montana) vs. Vf11G0978 (downregulated in V. fordii)	[24]
Rose (Rosa chinensis)	Black Spot (M. rosae)	Temporal expression pattern changes; specific TNLs activated	Not explicitly compared	RcTNL23 (significant upregulation)	[17]
Wild Strawberry (Fragaria)	Botrytis cinerea	Higher proportion of non-TNLs correlated with resistance	Lower proportion of non-TNLs in susceptible F. vesca	Non-TNLs showed dominant expression under infection	[6]
Bottle Gourd (Lagenaria siceraria)	Powdery Mildew	RNL gene Lsi04g015960 identified as candidate	Not specified	Lsi04g015960 (RPW8 domain)	[86]

Case Study: Fusarium Wilt Resistance in Tung Trees

A compelling comparative analysis between resistant Vernicia montana and susceptible V. fordii revealed distinct expression patterns of NBS-LRR genes in response to Fusarium wilt [24]. The orthologous gene pair Vf11G0978-Vm019719 exhibited markedly different expression patterns: Vm019719 was upregulated in the resistant V. montana, while its counterpart Vf11G0978 was downregulated in the susceptible V. fordii [24]. Functional validation through virus-induced gene silencing (VIGS) confirmed that Vm019719 mediates resistance against Fusarium wilt in V. montana. The differential expression was attributed to a deletion of a W-box element in the Vf11G0978 promoter of susceptible V. fordii, which prevented activation by the transcription factor VmWRKY64 [24].
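Promoter differences of this kind can be screened for by scanning promoter sequences for the W-box core recognized by WRKY transcription factors, commonly written (T)TGAC(C/T). The following is a minimal sketch that searches both strands for TTGACY; comparing hit lists between orthologous promoters would flag candidate deletions like the one reported here. The core-motif definition and function names are assumptions for illustration, and the toy sequences are not the Vernicia promoters.

```python
"""Scan a promoter for WRKY W-box cores (TTGACY) on both strands (illustrative)."""
import re

W_BOX = re.compile(r"(?=(TTGAC[CT]))")  # lookahead reports overlapping matches
MOTIF_LEN = 6


def revcomp(seq: str) -> str:
    """Reverse complement of a DNA sequence (A/C/G/T/N)."""
    return seq.upper().translate(str.maketrans("ACGTN", "TGCAN"))[::-1]


def find_w_boxes(promoter: str) -> list[tuple[int, str]]:
    """Return sorted (1-based forward start, strand) pairs for every W-box core."""
    seq = promoter.upper()
    n = len(seq)
    hits = [(m.start() + 1, "+") for m in W_BOX.finditer(seq)]
    for m in W_BOX.finditer(revcomp(seq)):
        # A match at reverse-strand offset s covers forward bases n-s-6 .. n-s-1 (0-based).
        hits.append((n - m.start() - MOTIF_LEN + 1, "-"))
    return sorted(hits)


if __name__ == "__main__":
    resistant = "CCAAATTGACCAAAGGGTCAACC"  # toy promoter with one W-box on each strand
    susceptible = "CCAAAAAAGGGTCAACC"      # toy promoter lacking the forward-strand W-box
    print(find_w_boxes(resistant))   # [(6, '+'), (16, '-')]
    print(find_w_boxes(susceptible))  # [(10, '-')]
```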

Hormonal Regulation of TNL Expression

Beyond direct pathogen recognition, TNL gene expression is modulated by defense-related hormones. In pepper, quantitative RT-PCR analysis demonstrated that both salicylic acid (SA) and abscisic acid (ABA) induce the expression of TNL genes (CaRGAs), suggesting their involvement in defense-associated signaling pathways [47]. Similarly, in roses, RcTNL genes responded to gibberellin, jasmonic acid, and salicylic acid treatments, with RcTNL23 showing significant upregulation in response to all three hormones [17]. This hormonal induction highlights the complex regulatory networks controlling TNL-mediated defense responses.

Experimental Protocols for Expression Analysis

Genome-Wide Identification of TNL Genes

Objective: To systematically identify all TNL gene family members within a plant genome.

Materials & Reagents:

High-quality genome assembly and annotation files
Reference TIR (PF01582) and NB-ARC (PF00931) HMM profiles from Pfam database
Bioinformatics tools: HMMER software, Batch CD-Search tool, SMART domain analysis
Sequence alignment software: MAFFT, Clustal W
Phylogenetic analysis tools: IQ-TREE, MEGA7

Methodology:

Sequence Retrieval: Obtain reference TNL protein sequences from related species or databases like TAIR (The Arabidopsis Information Resource).
HMM Search: Use HMMER software with TIR (PF01582) and NB-ARC (PF00931) Hidden Markov Models (HMMs) to search the target proteome (E-value < 1×10⁻²⁰) [17] [3].
Domain Verification: Confirm the presence of TIR, NBS, and LRR domains using Pfam, SMART, and CD-search tools [6] [3].
Classification: Categorize identified genes into subfamilies (TNL, CNL, RNL) based on N-terminal domains and architecture [2].
Phylogenetic Analysis: Perform multiple sequence alignment of NBS domains and construct a phylogenetic tree using maximum likelihood methods [17] [2].
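The HMM search and classification steps can be chained with a short script. The following is a minimal sketch, assuming the per-sequence table written by `hmmsearch --tblout` (whitespace-delimited, comment lines starting with `#`, target name in column 1, profile name in column 3, full-sequence E-value in column 5). Applying the 1×10⁻²⁰ cutoff to the full-sequence E-value is an interpretation of the threshold above. LRR evidence is assumed to come from a separate search, and CNL/RNL calls would need coiled-coil or RPW8 evidence that this sketch does not model; function names are hypothetical.

```python
"""Collect Pfam domain hits from hmmsearch --tblout output and assign a coarse NBS class."""
from collections import defaultdict
from typing import Iterable, Optional


def parse_tblout(lines: Iterable[str], max_evalue: float = 1e-20) -> dict[str, set[str]]:
    """Map each target protein to the profile names it hits below max_evalue."""
    hits: dict[str, set[str]] = defaultdict(set)
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue
        fields = line.split()
        target, profile, full_evalue = fields[0], fields[2], float(fields[4])
        if full_evalue < max_evalue:
            hits[target].add(profile)
    return dict(hits)


def classify(domains: set[str]) -> Optional[str]:
    """Coarse class from domain labels; CC/RPW8 detection is outside this sketch."""
    if "NB-ARC" not in domains:
        return None
    has_lrr = "LRR" in domains
    if "TIR" in domains:
        return "TNL" if has_lrr else "TN"
    return "NL" if has_lrr else "N"


if __name__ == "__main__":
    tblout = [  # truncated toy lines; real --tblout rows carry more columns
        "# target  acc  query  acc  E-value ...",
        "prot1 - TIR PF01582.23 3.2e-35 120.1 0.1",
        "prot1 - NB-ARC PF00931.25 1.0e-60 200.0 0.0",
        "prot2 - NB-ARC PF00931.25 5.0e-10 40.0 0.0",
    ]
    domains = parse_tblout(tblout)
    domains.setdefault("prot1", set()).add("LRR")  # e.g. merged from a separate LRR search
    for protein, doms in sorted(domains.items()):
        print(protein, sorted(doms), classify(doms))
```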

Expression Profiling Using RNA-seq and qRT-PCR

Objective: To quantify and compare TNL gene expression patterns in resistant and susceptible varieties under pathogen stress.

Materials & Reagents:

Plant materials: Resistant and susceptible varieties
Pathogen isolates or hormone elicitors (e.g., SA, JA, ABA)
RNA extraction kit (e.g., TRIzol reagent)
cDNA synthesis kit
qRT-PCR system with SYBR Green chemistry
RNA-seq library preparation kit and sequencing platform

Methodology:

Treatment and Sampling: Inoculate leaves with pathogen suspensions (e.g., M. rosae for black spot) or apply hormone solutions. Collect tissue samples at multiple time points post-inoculation [17].
RNA Extraction: Isolate total RNA from treated tissues, ensuring high purity (A260/A280 ratio ~2.0) and integrity (RIN > 7.0).
Transcriptome Sequencing: Prepare RNA-seq libraries and sequence on an Illumina platform. Map reads to the reference genome and calculate gene expression values (FPKM or TPM) [17] [48].
qRT-PCR Validation: Design gene-specific primers for candidate TNLs. Perform qRT-PCR using standard protocols with housekeeping genes (e.g., Actin, Ubiquitin) for normalization [17] [24].
Differential Expression Analysis: Identify significantly differentially expressed genes (e.g., |log2FC| > 1, adjusted p-value < 0.05) between resistant and susceptible lines using tools like DESeq2 or edgeR.
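The thresholds in the differential-expression step (|log2FC| > 1, adjusted p < 0.05) can be illustrated with a small filter. DESeq2 and edgeR already report Benjamini-Hochberg-adjusted p-values, so the adjustment is re-implemented here only to make the logic explicit. This is a minimal sketch; the gene names and values are hypothetical.

```python
"""Benjamini-Hochberg adjustment and fold-change/FDR filtering (illustrative)."""


def benjamini_hochberg(pvalues: list[float]) -> list[float]:
    """BH-adjusted p-values, returned in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity and the cap at 1.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


def select_degs(results: list[tuple[str, float, float]],
                lfc_cutoff: float = 1.0, padj_cutoff: float = 0.05) -> list[str]:
    """Genes with |log2FC| > lfc_cutoff and BH-adjusted p < padj_cutoff.

    results holds (gene_id, log2_fold_change, raw_pvalue) tuples.
    """
    padj = benjamini_hochberg([p for _, _, p in results])
    return [gene for (gene, lfc, _), q in zip(results, padj)
            if abs(lfc) > lfc_cutoff and q < padj_cutoff]


if __name__ == "__main__":
    toy = [("TNL_a", 2.5, 0.01), ("TNL_b", -1.8, 0.04), ("TNL_c", 0.4, 0.03), ("TNL_d", 3.0, 0.5)]
    print(select_degs(toy))  # ['TNL_a']
```

Note that TNL_b passes the fold-change cutoff and has a raw p-value below 0.05, yet is dropped after multiple-testing correction.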

Functional Validation Through VIGS

Objective: To confirm the functional role of candidate TNL genes in disease resistance.

Materials & Reagents:

VIGS vector (e.g., TRV-based system)
Agrobacterium tumefaciens strain GV3101
Gene-specific fragment (300-500 bp) for silencing
Plant growth facilities
Pathogen inoculation materials

Methodology:

Vector Construction: Clone a unique fragment of the target TNL gene into a VIGS vector (e.g., pTRV2) [24].
Agrobacterium Transformation: Introduce the constructed vector into Agrobacterium.
Plant Infiltration: Infiltrate young leaves of resistant plants with the Agrobacterium suspension containing the VIGS construct.
Pathogen Challenge: Inoculate silenced plants with the target pathogen after silencing confirmation (typically 2-3 weeks post-infiltration).
Phenotypic Assessment: Evaluate disease symptoms and measure pathogen biomass compared to control plants (e.g., empty vector or non-silenced) [24].
Molecular Verification: Confirm reduced expression of the target gene via qRT-PCR and correlate with enhanced disease susceptibility.

Signaling Pathways and Molecular Mechanisms

The following diagram illustrates the central signaling pathway involving TNL genes in plant immunity, particularly highlighting the differences between resistant and susceptible varieties:

Diagram 1: TNL-Mediated Immunity Pathway in Resistant vs. Susceptible Varieties. Resistant varieties (green background) maintain functional TNL genes with intact promoters, enabling pathogen perception and defense activation. Susceptible varieties (red background) often possess compromised TNL genes or regulatory elements, leading to disease progression.

The Scientist's Toolkit: Essential Research Reagents

Table 3: Key Research Reagents for TNL Gene Expression Studies

Reagent / Solution	Function / Application	Example Specifications
HMMER Software	Identification of TNL gene family members using profile hidden Markov models	E-value cutoff < 1×10⁻²⁰; Pfam domains: TIR (PF01582), NB-ARC (PF00931)
Pfam Database	Repository of protein families and domain architectures	Source for TIR, NBS, and LRR domain HMM profiles
RNA Extraction Kit	Isolation of high-quality total RNA from plant tissues	Capable of handling polyphenol-rich tissues; DNase I treatment included
qRT-PCR System	Quantitative measurement of gene expression	SYBR Green or TaqMan chemistry; requires gene-specific primers
VIGS Vector System	Functional validation through transient gene silencing	TRV-based vectors (pTRV1, pTRV2); Agrobacterium-delivered
Illumina Sequencing Platform	Transcriptome profiling of resistant vs. susceptible varieties	Minimum recommended depth: 30 million reads per sample; paired-end
MAFFT / IQ-TREE	Multiple sequence alignment and phylogenetic analysis	Default parameters; maximum likelihood method with 1000 bootstraps

Comparative analyses of TNL gene expression between resistant and susceptible varieties consistently reveal that functional, highly expressed TNL genes are fundamental to effective disease resistance. Key patterns emerge across plant-pathogen systems: resistant varieties typically exhibit strong, timely upregulation of specific TNL genes upon pathogen challenge [17] [24], often controlled by transcriptional regulators binding to intact promoter elements [24]. The expression of these genes is frequently modulated by defense hormones like salicylic acid [17] [47], and their protein products may function in interconnected networks rather than in isolation.

The experimental framework presented here, which combines genome-wide identification, expression profiling, and functional validation, provides a robust methodology for identifying candidate resistance genes across diverse crop species. These approaches facilitate the development of molecular markers for breeding programs and potential genetic engineering strategies to enhance crop resistance, ultimately contributing to more sustainable agricultural practices with reduced dependence on chemical pesticides.

Syntenic Analysis and Orthologous Gene Conservation

In plant genomics, disease resistance (R) genes encoding nucleotide-binding site and leucine-rich repeat (NBS-LRR) proteins constitute one of the largest and most critical gene families for plant immunity. Among these, TIR-NBS-LRR (TNL) genes play a vital role in effector-triggered immunity by recognizing pathogen effectors and activating defense responses. Understanding the evolutionary mechanisms that shape this gene family requires sophisticated analytical approaches, with syntenic analysis serving as a powerful method for tracing orthologous gene conservation across related species. This conservation provides insights into evolutionary relationships and functional preservation of disease resistance mechanisms.

The comparative analysis of syntenic relationships has revealed that NBS-LRR genes exhibit dynamic evolutionary patterns across plant lineages, with significant expansion and contraction events influencing resistance gene repertoires. These patterns are driven by various molecular mechanisms, including tandem duplications, segmental duplications, and gene loss events, which collectively contribute to the species-specific adaptation against pathogens. This guide objectively compares experimental approaches and their applications in syntenic analysis of TNL genes across diverse plant species, providing researchers with methodological frameworks for conducting such analyses in their systems.

Comparative Genomic Distribution of TNL Genes

Table 1: Comparative Genomic Distribution of TNL Genes Across Plant Species

Plant Species	Family	Total NBS Genes	TNL Genes	Distribution Pattern	Study Reference
Rosa chinensis	Rosaceae	Not specified	96	Dominant expression in leaves	[17]
Fragaria pentaphylla	Rosaceae	Not specified	Lower proportion than non-TNL	Clustered distribution	[6]
Fragaria nilgerrensis	Rosaceae	Not specified	Lower proportion than non-TNL	Clustered distribution	[6]
Fragaria vesca	Rosaceae	Not specified	Lowest proportion among wild strawberries	Clustered distribution	[6]
Ipomoea batatas (sweet potato)	Convolvulaceae	889	Present (exact count not specified)	83.13% in clusters	[66]
Ipomoea trifida	Convolvulaceae	554	Present (exact count not specified)	76.71% in clusters	[66]
Ipomoea triloba	Convolvulaceae	571	Present (exact count not specified)	90.37% in clusters	[66]
Ipomoea nil	Convolvulaceae	757	Present (exact count not specified)	86.39% in clusters	[66]
Arachis duranensis	Fabaceae	393	Present (exact count not specified)	Tandem duplication prevalent	[87]
Arachis ipaënsis	Fabaceae	437	Present (exact count not specified)	More clusters than A. duranensis	[87]
Vernicia montana	Euphorbiaceae	149	3 TNL, 7 TIR-NBS, 2 CC-TIR-NBS	Non-random chromosomal distribution	[24]
Vernicia fordii	Euphorbiaceae	90	0	Non-random chromosomal distribution	[24]

The distribution of TNL genes across plant genomes varies considerably, and most species show clustered chromosomal arrangements. In Rosaceae species, independent analyses have confirmed that NBS-LRR genes are distributed non-randomly across all chromosomes, typically in clusters [88]. Clustering is also evident in wild strawberry species. There, comparative studies have further revealed that species with higher proportions of non-TNL genes, such as Fragaria pentaphylla and F. nilgerrensis, are more resistant to pathogens such as Botrytis cinerea than F. vesca, which has the lowest proportion of non-TNL genes [6].
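
The clustered-distribution statistics in Table 1, such as the percentage of Ipomoea NBS genes located in clusters, depend on how a cluster is defined. As a minimal sketch, the Python function below treats genes as clustered when neighboring NBS genes on the same chromosome lie within a fixed distance of each other. The 200 kb window is an assumption chosen for illustration and is not necessarily the criterion used in the cited studies.

```python
from collections import defaultdict


def fraction_in_clusters(genes, max_gap_bp=200_000):
    """Fraction of NBS genes that fall in physical clusters.

    genes: iterable of (gene_id, chromosome, start_bp).
    Two genes adjacent in the sorted list for a chromosome are linked when their
    start positions differ by at most max_gap_bp (an assumed, adjustable criterion);
    every linked gene counts as clustered.
    """
    by_chrom = defaultdict(list)
    for gene_id, chrom, start in genes:
        by_chrom[chrom].append((start, gene_id))

    clustered, total = set(), 0
    for positions in by_chrom.values():
        positions.sort()  # order genes along the chromosome
        total += len(positions)
        for (s1, g1), (s2, g2) in zip(positions, positions[1:]):
            if s2 - s1 <= max_gap_bp:
                clustered.update((g1, g2))
    return len(clustered) / total if total else 0.0
```

Multiplying the result by 100 gives a percentage in the same form as the Ipomoea values in Table 1. Changing `max_gap_bp`, or switching to a count of intervening non-NBS genes, can shift that percentage noticeably, so the definition is worth reporting alongside the number.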

The syntenic analysis of NBS-LRR genes across 12 Rosaceae species revealed 102 ancestral genes (7 RNLs, 26 TNLs, and 69 CNLs). These genes underwent independent duplication and loss events during the divergence of the Rosaceae family [88]. Such dynamic evolutionary patterns account for the differences in NBS-LRR gene number among Rosaceae species. The species exhibit distinct trajectories, including "first expansion and then contraction," "continuous expansion," and "early sharp expanding to abrupt shrinking" [88].

Methodological Framework for Syntenic Analysis

Genomic Identification of TNL Genes

Experimental Protocol 1: Genome-Wide Identification of TNL Genes

Data Collection: Obtain complete genome sequences and annotation files from relevant databases such as the Genome Database for Rosaceae (GDR), Phytozome, NCBI, or PLAZA [10] [6].
Sequence Retrieval: Identify candidate NBS-LRR genes using:
- BLAST searches with an expectation-value threshold of 1.0
- HMMER searches using hidden Markov models of NB-ARC domain (PF00931), TIR domain (PF01582), and LRR domains with default parameters [17] [6]
Domain Verification: Confirm domain architecture using:
- Batch CD-Search tool from NCBI
- Pfam database analysis with an E-value cutoff of 10⁻⁴
- SMART database for additional domain verification
- COILS program with a threshold of 0.1 for predicting CC domains [17] [6]
Classification: Categorize verified genes into TNL, CNL, and RNL subclasses based on N-terminal domains.
Validation: Remove redundant hits and manually curate the final gene set.
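
The classification step can be expressed as a simple rule over the verified domain coordinates. The sketch below is a hypothetical helper, not part of any of the cited pipelines. It assumes each protein's domains have already been called, with TIR, RPW8, NB-ARC, and LRR from Pfam or CD-Search and CC from COILS, together with their start positions. The helper labels the N-terminal domains that precede NB-ARC and appends "L" only when an LRR follows NB-ARC. This keeps truncated forms such as TIR-NBS (TN) and CC-TIR-NBS (CTN), listed for Vernicia montana in Table 1, distinct from full TNLs.

```python
# Hypothetical helper: classify an NBS-LRR protein from its domain coordinates.
# Assumes domains were already predicted; names and rules here are illustrative.

N_TERMINAL_LETTER = {"CC": "C", "TIR": "T", "RPW8": "R"}


def classify_nlr(domains):
    """domains: list of (domain_name, start_position).

    Returns e.g. 'TNL', 'CNL', 'RNL', 'TN', 'CTN', 'NL', or None when no NB-ARC
    domain is present.
    """
    nb_starts = [start for name, start in domains if name == "NB-ARC"]
    if not nb_starts:
        return None  # not an NBS-encoding gene
    nb_start = min(nb_starts)
    # N-terminal domains, ordered by position, that precede the NB-ARC domain
    n_terminal = sorted((start, name) for name, start in domains
                        if start < nb_start and name in N_TERMINAL_LETTER)
    # dict.fromkeys drops repeated letters while preserving order
    prefix = "".join(dict.fromkeys(N_TERMINAL_LETTER[name] for _, name in n_terminal))
    has_lrr = any(name == "LRR" and start > nb_start for name, start in domains)
    return prefix + ("NL" if has_lrr else "N")
```

Keying the rule on position rather than on domain presence alone means an LRR upstream of NB-ARC, for example, does not turn a TN into a TNL.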

Synteny and Orthologous Gene Analysis

Experimental Protocol 2: Syntenic Analysis of Orthologous TNL Genes

Orthogroup Construction: Use OrthoFinder v2.5.1 with DIAMOND tool for sequence similarity searches and MCL clustering algorithm to identify orthogroups [10].
Multiple Sequence Alignment: Perform alignment using MAFFT v7.0 with default parameters, followed by trimming with TrimAl [10] [6].
Phylogenetic Analysis: Construct maximum likelihood trees using:
- IQ-TREE v1.6.12 with 1000 ultrafast bootstraps
- Model selection via ModelFinder within IQ-TREE
- Visualization using iTOL v6 [6]
Synteny Mapping: Identify syntenic blocks using:
- MCScanX with default parameters
- TBtools for visualization of collinear relationships [66]
Evolutionary Analysis: Calculate selective pressure using:
- PAML 4.0 or similar tools for Ka/Ks analysis
- Notung software for reconciliation of gene trees and species trees [6] [87]
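
To show what the synteny-mapping step does conceptually, the Python sketch below chains homologous gene pairs into collinear blocks. It is a deliberately simplified stand-in for MCScanX, not its algorithm, and makes three simplifications:

- it only finds same-orientation blocks;
- it scores a block by its number of anchors rather than by MCScanX's match-score and gap-penalty scheme;
- it assumes gene positions are given as ranks along each chromosome.

The defaults (at least 5 anchors, rank steps of at most 25 genes) mirror MCScanX's MATCH_SIZE and MAX_GAPS defaults but should be treated as adjustable assumptions.

```python
from collections import defaultdict

# Simplified collinear-block detection (inspired by, but not identical to, MCScanX).
# pairs: (chrom_a, rank_a, chrom_b, rank_b) for homologous genes; ranks are gene order.


def collinear_blocks(pairs, max_gap=25, min_size=5):
    groups = defaultdict(list)
    for chrom_a, rank_a, chrom_b, rank_b in pairs:
        groups[(chrom_a, chrom_b)].append((rank_a, rank_b))

    blocks = []
    for (chrom_a, chrom_b), anchors in groups.items():
        remaining = sorted(set(anchors))  # sorted by rank on chromosome A
        while remaining:
            n = len(remaining)
            best = [1] * n    # length of the longest chain ending at anchor i
            prev = [-1] * n   # predecessor of anchor i in that chain
            for i, (ai, bi) in enumerate(remaining):
                for j in range(i):
                    aj, bj = remaining[j]
                    # both ranks must increase, each by no more than max_gap
                    if (0 < ai - aj <= max_gap and 0 < bi - bj <= max_gap
                            and best[j] + 1 > best[i]):
                        best[i], prev[i] = best[j] + 1, j
            top = max(best)
            if top < min_size:
                break  # no further block long enough to report
            i = best.index(top)
            chain = []
            while i != -1:  # walk back from the chain's last anchor
                chain.append(remaining[i])
                i = prev[i]
            chain.reverse()
            blocks.append([(chrom_a, a, chrom_b, b) for a, b in chain])
            used = set(chain)
            # each anchor is assigned to at most one block
            remaining = [x for x in remaining if x not in used]
    return blocks
```

Extracting the best chain, removing its anchors, and repeating is a simple way to report several non-overlapping blocks per chromosome pair. Production tools add reverse-orientation chaining and statistical filtering on top of this idea.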

The following diagram illustrates the complete workflow for syntenic analysis and ortholog identification:

Expression and Functional Validation

Experimental Protocol 3: Expression and Functional Analysis of Syntenic TNL Genes

Expression Profiling:
- Retrieve RNA-seq data from relevant databases (IPF database, CottonFGD, NCBI BioProjects)
- Analyze FPKM values across tissues and stress conditions
- Identify differentially expressed genes (DEGs) using transcriptomic pipelines [10]
qRT-PCR Validation:
- Design primers using tools like Beacon Designer 8.0
- Perform quantitative PCR with reference genes (e.g., actin)
- Analyze expression patterns under pathogen infection [17] [87]
Functional Validation:
- Implement Virus-Induced Gene Silencing (VIGS) to knock down candidate genes
- Assess changes in disease susceptibility
- Perform protein-ligand and protein-protein interaction studies [10] [24]
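
For the qRT-PCR step, relative expression is commonly summarized with the 2^-ΔΔCt method. The protocol above does not specify the quantification method, so this is an assumption. The minimal Python sketch below also assumes near-100% amplification efficiency for both the target and the reference gene (e.g., actin). The function name and the Ct values in the usage line are hypothetical.

```python
from statistics import mean


def fold_change_ddct(target_infected, ref_infected, target_mock, ref_mock):
    """Relative expression (infected vs. mock) by the 2^-ddCt method.

    Each argument is a list of replicate Ct values.
    Assumes ~100% amplification efficiency for both target and reference primers.
    """
    d_ct_infected = mean(target_infected) - mean(ref_infected)  # normalize to reference gene
    d_ct_mock = mean(target_mock) - mean(ref_mock)
    dd_ct = d_ct_infected - d_ct_mock
    return 2.0 ** (-dd_ct)


# Hypothetical Ct values: after normalization, the TNL target crosses threshold
# 4 cycles earlier in infected tissue, i.e. roughly a 16-fold induction.
print(fold_change_ddct([22.1, 21.9], [18.0, 18.0], [26.0, 26.0], [18.0, 18.0]))
```

When primer efficiencies differ markedly from 100%, an efficiency-corrected model is more appropriate than the plain 2^-ΔΔCt formula.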

Key Research Reagent Solutions

Table 2: Essential Research Reagents and Tools for Syntenic Analysis

Category	Specific Tool/Reagent	Function/Application	Example Use Case
Bioinformatics Tools	OrthoFinder v2.5.1	Orthogroup construction and orthology inference	Identifying orthogroups across 34 plant species [10]
	MCScanX	Synteny detection and visualization	Identifying collinear blocks in Ipomoea species [66]
	DIAMOND	Sequence similarity searches	Fast alignment for large-scale orthogroup analysis [10]
	HMMER v3.1	Hidden Markov Model searches	Identifying NB-ARC domains in protein sequences [6]
Databases	Pfam Database	Protein family annotation	Verifying TIR, NBS, and LRR domains [17]
	Genome Database for Rosaceae (GDR)	Genomic data repository	Accessing genome sequences for 12 Rosaceae species [88]
	NCBI CDD	Conserved domain detection	Confirming domain architecture of NBS-LRR genes [17]
Experimental Methods	Virus-Induced Gene Silencing (VIGS)	Functional validation of candidate genes	Silencing GaNBS (OG2) in resistant cotton [10]
	qRT-PCR	Expression validation	Verifying NBS-LRR gene expression after pathogen infection [17] [87]
Primer Sets	Degenerate PCR primers	Amplification of NBS domains	Isolating NBS fragments from Asteraceae species [15] [89]

Case Studies in Syntenic Analysis

Asteraceae Family Analysis

A comparative analysis of NBS domain sequences from sunflower, lettuce, and chicory revealed that Asteraceae species share distinct families of R-genes comprising both CC- and TIR-domain-containing NBS-LRR genes [15] [89]. In the two most closely related species, lettuce and chicory, the composition of the CC subfamily was strikingly similar, whereas sunflower was less similar in structure. Compared with Arabidopsis thaliana, Asteraceae NBS gene subfamilies appeared distinct from Arabidopsis gene clades. This suggests that NBS families in the Asteraceae are ancient and that gene duplication and loss events have changed the composition of these subfamilies over time [89].

The following diagram illustrates the syntenic relationships and evolutionary events in TNL genes:

Ipomoea Species Comparative Genomics

A comprehensive syntenic analysis of NBS-encoding genes across four Ipomoea species (sweet potato, I. trifida, I. triloba, and I. nil) identified 201 NBS-encoding orthologous genes that formed syntenic gene pairs between any two of the four species, suggesting that each syntenic pair was derived from a common ancestor [66]. The distribution of NBS-encoding genes among chromosomes was non-random and uneven, with 83.13%, 76.71%, 90.37%, and 86.39% of the genes occurring in clusters in sweet potato, I. trifida, I. triloba, and I. nil, respectively. Duplication pattern analysis showed that sweet potato had more segmentally than tandemly duplicated genes, whereas the opposite trend was found in the other three species [66].
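
The tandem-versus-segmental comparison rests on classifying each duplicated gene pair. The sketch below is a simplified, hypothetical classifier, not the procedure used in the Ipomoea study. It assumes three inputs are already available: paralog pairs, gene positions (chromosome and gene rank), and the pairs that lie in collinear blocks. A pair is called segmental if it sits in a collinear block and tandem if both genes are on the same chromosome within a small rank window (adjacent genes by default). All other pairs are called dispersed.

```python
from collections import Counter


def classify_duplicates(paralog_pairs, gene_pos, syntenic_pairs, tandem_window=1):
    """Label each paralog pair 'segmental', 'tandem' or 'dispersed'.

    gene_pos: {gene_id: (chromosome, rank)}; syntenic_pairs: pairs found in collinear
    blocks. Checking segmental before tandem is a priority choice made for this sketch.
    """
    syntenic = {frozenset(pair) for pair in syntenic_pairs}  # order-independent lookup
    labels = {}
    for g1, g2 in paralog_pairs:
        (chrom1, rank1), (chrom2, rank2) = gene_pos[g1], gene_pos[g2]
        if frozenset((g1, g2)) in syntenic:
            labels[(g1, g2)] = "segmental"
        elif chrom1 == chrom2 and abs(rank1 - rank2) <= tandem_window:
            labels[(g1, g2)] = "tandem"
        else:
            labels[(g1, g2)] = "dispersed"
    return labels, Counter(labels.values())
```

Comparing the "segmental" and "tandem" counts from the returned Counter gives the kind of contrast reported above. The result is sensitive to `tandem_window` and to how the collinear pairs were obtained.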

Vernicia Species Functional Divergence

A comparative analysis of NBS-LRR genes between the Fusarium wilt-susceptible Vernicia fordii and its resistant counterpart Vernicia montana identified 43 orthologous gene pairs [24]. The orthologous pair Vf11G0978-Vm019719 showed contrasting expression patterns: Vf11G0978 was downregulated in V. fordii, whereas its ortholog Vm019719 was upregulated in V. montana, suggesting that this pair may underlie resistance to Fusarium wilt. Functional characterization showed that Vm019719 from V. montana, activated by VmWRKY64, conferred resistance to Fusarium wilt. In the susceptible V. fordii, by contrast, its orthologous counterpart Vf11G0978 mounted an ineffective defense response owing to a deletion in the W-box element of its promoter [24].
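
The Vernicia case turns on a W-box, the cis-element bound by WRKY transcription factors such as VmWRKY64. As a minimal sketch, the Python function below scans a promoter sequence on both strands for the commonly cited W-box consensus (T)TGAC(C/T). It searches for TTGACC or TTGACT and for their reverse complement. The sequences in the usage line are invented for illustration and are not the Vm019719 or Vf11G0978 promoters.

```python
import re

# W-box core consensus TTGAC(C/T) and its reverse complement (A/G)GTCAA.
# Zero-width lookaheads allow overlapping matches to be reported.
W_BOX_FWD = re.compile(r"(?=TTGAC[CT])")
W_BOX_REV = re.compile(r"(?=[AG]GTCAA)")


def find_w_boxes(promoter):
    """Return sorted (0-based start, strand) tuples for W-box matches in a DNA string."""
    seq = promoter.upper()
    hits = [(m.start(), "+") for m in W_BOX_FWD.finditer(seq)]
    hits += [(m.start(), "-") for m in W_BOX_REV.finditer(seq)]
    return sorted(hits)


# Invented sequences: an intact element versus the same region with it deleted.
intact = "GCAAATTGACCATGCA"
deleted = "GCAAAATGCA"
print(find_w_boxes(intact), find_w_boxes(deleted))
```

A motif scan like this only flags candidate sites. Showing that a deletion abolishes WRKY binding and activation still requires experimental evidence such as the functional work described for Vm019719.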

Syntenic analysis has proven to be a powerful approach for identifying orthologous TNL genes and understanding their conservation patterns across related species. The methodological frameworks presented in this guide provide researchers with standardized protocols for conducting such analyses across diverse plant systems. The case studies demonstrate that while syntenic conservation of TNL genes is common across related species, the evolutionary trajectories of these genes can vary significantly due to species-specific duplication and loss events.

The functional significance of syntenically conserved orthologs is particularly evident in disease resistance, where orthologous gene pairs often maintain similar functions, though regulatory differences can lead to varying resistance capabilities, as observed in the Vernicia species comparison. These insights highlight the value of syntenic analysis not only for evolutionary studies but also for practical applications in crop improvement and disease resistance breeding.

Conclusion

The comprehensive analysis of TIR-NBS-LRR domain architectures reveals their crucial role in plant immunity, characterized by significant evolutionary diversity and structural specialization. Key findings confirm the lineage-specific distribution of TNL genes, which are absent from monocots but conserved in dicots and basal angiosperms. Computational methods for their accurate identification and annotation also continue to improve. Future research should focus on structural characterization of non-canonical TNL architectures, developing machine learning approaches for improved prediction, and functional validation through genome editing in crop species. The integration of TNL gene discovery with molecular breeding programs holds significant promise for developing durable disease resistance in agricultural crops, potentially reducing pesticide dependence and enhancing global food security.