The human genome contains roughly 20,000 protein-coding genes. The human proteome, the complete set of proteins, contains well over 100,000 distinct protein forms. How? The genome doesn't encode 100,000 genes. The discrepancy is resolved largely by alternative splicing: the ability of a single pre-mRNA to be spliced in multiple ways, producing different combinations of exons and therefore different protein isoforms.
Alternative splicing is not an exception or an edge case. It's the rule: approximately 95% of human multi-exon genes undergo alternative splicing. It's one of the primary mechanisms through which eukaryotic complexity arises from a surprisingly small gene count.
Splicing Recap
As established in an earlier chapter, after transcription the pre-mRNA still contains all of its introns. The spliceosome, a large complex of snRNAs (U1, U2, U4, U5, U6) and ~150 proteins, identifies exon-intron boundaries by recognizing consensus sequences (5' splice site GU, branch point, polypyrimidine tract, 3' splice site AG) and catalyzes intron removal.
Constitutive splicing removes every intron and joins all exons: the same outcome every time. Alternative splicing uses different combinations of splice sites to produce distinct mRNA isoforms.
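The recap above can be sketched in a few lines of Python. This is a toy model under stated assumptions: the sequence is hypothetical, intron positions are supplied by hand, and only the GU...AG boundary dinucleotides are checked (the branch point and polypyrimidine tract are ignored). The function names `looks_like_intron` and `splice` are made up for illustration.

```python
def looks_like_intron(seq: str) -> bool:
    """Toy check: canonical introns begin with GU and end with AG (RNA alphabet)."""
    rna = seq.upper().replace("T", "U")
    return len(rna) >= 4 and rna.startswith("GU") and rna.endswith("AG")


def splice(pre_mrna: str, introns: list[tuple[int, int]]) -> str:
    """Remove the given half-open [start, end) intron intervals and join the exons."""
    mature, cursor = [], 0
    for start, end in sorted(introns):
        if not looks_like_intron(pre_mrna[start:end]):
            raise ValueError(f"interval {start}-{end} lacks GU...AG boundaries")
        mature.append(pre_mrna[cursor:start])  # exon sequence before this intron
        cursor = end
    mature.append(pre_mrna[cursor:])  # final exon
    return "".join(mature)


# Hypothetical pre-mRNA: exon + intron + exon
pre = "AUGGCC" + "GUAAGUCCUUUCAG" + "GGAUAA"
constitutive = splice(pre, [(6, 20)])
print(constitutive)
```

Constitutive splicing corresponds to always passing the same intron list; alternative splicing corresponds to passing different lists for the same `pre`.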
Modes of Alternative Splicing
There are five main patterns:
Exon Skipping
The most common mode (~40% of events). An exon is included in some transcripts and excluded from others. Inclusion or skipping is controlled by the relative strength of the splice sites flanking the exon and by regulatory proteins.
Pre-mRNA: [Exon 1]—[Exon 2]—[Exon 3]—[Exon 4]
Isoform A: [Exon 1]—[Exon 2]—[Exon 3]—[Exon 4] (include exon 3)
Isoform B: [Exon 1]—[Exon 2]—[Exon 4] (skip exon 3)
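The diagram above maps directly onto code. The following is a minimal sketch, assuming made-up exon sequences and a hypothetical helper `build_isoform`; it only shows that both isoforms come from the same exon set.

```python
# Hypothetical exon sequences, keyed by exon number.
EXONS = {1: "AUGGCU", 2: "GAUCCA", 3: "UUGGAA", 4: "CGUUAA"}


def build_isoform(exons: dict[int, str], skip: frozenset[int] = frozenset()) -> str:
    """Join exons in order, leaving out any exon number listed in `skip`."""
    return "".join(seq for number, seq in sorted(exons.items()) if number not in skip)


isoform_a = build_isoform(EXONS)                        # exon 3 included
isoform_b = build_isoform(EXONS, skip=frozenset({3}))   # exon 3 skipped
print(isoform_a)
print(isoform_b)
```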
Alternative 5' Splice Site
Different 5' splice sites are used, changing the 5' boundary of an intron and thus including or excluding a portion of the upstream exon.
Alternative 3' Splice Site
Different 3' splice sites are used, changing the 3' boundary of an intron and thus including or excluding a portion of the downstream exon.
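Both alternative splice site modes amount to moving one end of the removed interval. The sketch below assumes hypothetical sequences and hand-picked coordinates; `splice_out` is an illustrative helper, and the GU...AG assertion is a toy sanity check rather than a model of splice site recognition.

```python
def splice_out(pre_mrna: str, donor: int, acceptor_end: int) -> str:
    """Remove pre_mrna[donor:acceptor_end] (the intron) and join the flanks."""
    intron = pre_mrna[donor:acceptor_end]
    # Toy sanity check on the canonical GU...AG intron boundaries.
    assert intron.startswith("GU") and intron.endswith("AG"), intron
    return pre_mrna[:donor] + pre_mrna[acceptor_end:]


# Alternative 5' splice site: two candidate donors (positions 6 and 12), one acceptor.
pre_a = "AUGGCA" + "GUCACA" + "GUAAGUCUUUUCAG" + "GGAUAA"
short_5p = splice_out(pre_a, donor=6, acceptor_end=26)   # upstream exon ends early
long_5p = splice_out(pre_a, donor=12, acceptor_end=26)   # upstream exon keeps GUCACA

# Alternative 3' splice site: one donor, two candidate acceptors (intron ends at 20 or 26).
pre_b = "AUGGCA" + "GUAAGUCUUUUCAG" + "CCUCAG" + "GGAUAA"
long_3p = splice_out(pre_b, donor=6, acceptor_end=20)    # downstream exon keeps CCUCAG
short_3p = splice_out(pre_b, donor=6, acceptor_end=26)   # downstream exon starts later
```

In both cases the longer isoform differs from the shorter one by six nucleotides, so this particular toy change stays in frame; real events need not.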
Intron Retention
An intron is retained in the mature mRNA rather than being spliced out. This often produces a non-functional transcript (with a premature stop codon that triggers nonsense-mediated decay, NMD), but it can also produce functional isoforms. Intron retention is more common in plants and less common in animals, though it is prevalent in some contexts.
Mutually Exclusive Exons
Two or more exons are never included in the same transcript. The transcript always includes exactly one of them.
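The "exactly one" rule is what separates this mode from ordinary exon skipping, and it is easy to state as a check. A minimal sketch, assuming hypothetical exon names and a made-up helper `uses_exactly_one`:

```python
def uses_exactly_one(transcript_exons: set[str], cluster: set[str]) -> bool:
    """True if the transcript contains exactly one exon from the mutually exclusive cluster."""
    return len(transcript_exons & cluster) == 1


CLUSTER = {"exon_3a", "exon_3b"}  # hypothetical mutually exclusive pair

valid_a = uses_exactly_one({"exon_1", "exon_2", "exon_3a", "exon_4"}, CLUSTER)
valid_b = uses_exactly_one({"exon_1", "exon_2", "exon_3b", "exon_4"}, CLUSTER)
both = uses_exactly_one({"exon_1", "exon_3a", "exon_3b", "exon_4"}, CLUSTER)
neither = uses_exactly_one({"exon_1", "exon_2", "exon_4"}, CLUSTER)
```

A transcript containing both exons or neither would fail the check; the "neither" case is what plain exon skipping would look like.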
Imagine a codebase where certain modules can be compiled in or out depending on build flags. The source is the same; the compiled binary differs. Alternative splicing is this mechanism operating at the RNA level. The gene (source) is fixed. Different cells, at different times or in different conditions, produce different "builds" by including or excluding exons.
The downstream consequence: two cells with identical genomes can express functionally distinct proteins from the same gene.
Regulation: Splicing Enhancers and Silencers
Splice sites alone don't fully determine which pattern occurs. Local sequences in the pre-mRNA regulate spliceosome assembly:
- Exonic Splicing Enhancers (ESEs): sequences within exons that promote inclusion
- Exonic Splicing Silencers (ESSs): sequences within exons that promote skipping
- Intronic Splicing Enhancers (ISEs): promote inclusion when in adjacent introns
- Intronic Splicing Silencers (ISSs): promote skipping
These sequences are bound by RNA-binding proteins (RBPs), especially SR proteins (serine/arginine-rich; typically activators) and hnRNPs (heterogeneous nuclear ribonucleoproteins; often repressors). The balance of these RBPs determines which isoform is produced.
Key RBPs:
- SRSF1 (ASF/SF2): canonical SR activator
- hnRNP A1: often antagonizes SR proteins; promotes skipping
- NOVA1/NOVA2: neuron-specific regulators; control splicing of many genes
- PTBP1: represses inclusion of neuron-specific exons in non-neural cells; PTBP1 downregulation during neuronal differentiation allows inclusion of neural-specific exons
Functional Consequences of Isoforms
Alternative splicing can change:
- Protein domain composition: including or excluding a domain changes the protein's interactions and functions
- Subcellular localization: a localization signal in an alternatively spliced exon can redirect the protein
- Stability: some isoforms are more stable; others have shorter half-lives
- Enzymatic activity: active site residues can be affected
- Dimerization: isoforms can differ in their ability to form homo- or heterodimers
Classic examples:
BRCA1 produces multiple isoforms through alternative splicing. Isoforms lacking functional BRCT domains can have altered DNA repair and tumor suppressor activity.
BCL-X (BCL2L1 gene): the long isoform BCL-XL is anti-apoptotic (prevents programmed cell death); the short isoform BCL-XS is pro-apoptotic. These are produced from the same gene by alternative 5' splice site usage. The balance between the two isoforms helps determine whether a cell lives or dies.
VEGF-A (vascular endothelial growth factor) has multiple isoforms with different binding affinities and diffusion properties — controlling whether the angiogenic signal is local or diffuses widely.
Tau (MAPT gene): multiple exons are alternatively spliced, producing 6 isoforms with different microtubule-binding properties. An imbalance in tau isoforms is implicated in tauopathies including Alzheimer's disease.
Disease-Causing Splicing Mutations
Mutations that disrupt splice sites are a major class of pathogenic variants. They can cause:
- Exon skipping: loss of the affected exon → truncated protein
- Intron retention: retained intron → premature stop codon → NMD
- Cryptic splice site activation: a nearby sequence with partial homology to a splice site gets activated → aberrant isoform
Estimates suggest that ~15–50% of pathogenic single-nucleotide variants affect splicing, either at canonical splice sites or in ESEs/ESSs. Many "missense" variants in coding sequence actually disrupt splicing by eliminating an ESE rather than (only) changing the protein.
This has important implications for variant interpretation: a variant in the middle of an exon, with no predicted protein change, can still be pathogenic if it destroys an ESE. Standard annotation pipelines that only consider protein effects miss this class.
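A toy annotator makes the blind spot concrete. This is a sketch under loud assumptions: `ESE_MOTIF` is a hypothetical motif chosen for illustration, the sequences are invented, the codon table is only the small subset of the standard genetic code needed here, and real ESE prediction is far less clear-cut than exact string matching.

```python
# Subset of the standard genetic code (RNA codons), enough for this example.
CODON_TABLE = {"GAA": "E", "GAG": "E", "AAA": "K", "AAG": "K", "CUG": "L", "CUU": "L"}
ESE_MOTIF = "GAAGAA"  # hypothetical enhancer motif, for illustration only


def translate(rna: str) -> str:
    """Translate an in-frame RNA string codon by codon."""
    return "".join(CODON_TABLE[rna[i:i + 3]] for i in range(0, len(rna), 3))


def annotate(ref: str, alt: str, motif: str = ESE_MOTIF) -> dict[str, bool]:
    """Report the protein-level effect and the (toy) splicing-level effect separately."""
    return {
        "protein_changed": translate(ref) != translate(alt),
        "ese_lost": motif in ref and motif not in alt,
    }


ref = "CUGGAAGAAAAG"  # Leu-Glu-Glu-Lys, contains GAAGAA
alt = "CUGGAAGAGAAG"  # GAA -> GAG in the third codon: still Glu
result = annotate(ref, alt)
print(result)
```

A pipeline that looked only at `protein_changed` would label this variant synonymous and move on; the `ese_lost` flag is the class of effect the paragraph above describes.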
The ability to manipulate splicing has become therapeutically useful. Antisense oligonucleotides (ASOs) can be designed to bind pre-mRNA sequences and block or expose splice sites:
- Nusinersen (Spinraza): treats spinal muscular atrophy (SMA) by blocking an ISS in the SMN2 pre-mRNA, forcing inclusion of exon 7 and producing functional SMN protein
- Eteplirsen (Exondys 51): treats Duchenne muscular dystrophy by skipping exon 51, restoring the reading frame of the dystrophin mRNA
Splice-switching is now a validated therapeutic mechanism, with multiple approved drugs.
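Why skipping an extra exon can restore a reading frame comes down to arithmetic modulo 3. The sketch below uses invented exon names and lengths (they are not the actual dystrophin exon lengths), and `frame_preserved` is a hypothetical helper.

```python
# Illustrative exon lengths in nucleotides; NOT real dystrophin values.
EXON_LENGTHS = {"exon_a": 100, "exon_b": 112, "exon_c": 232}


def frame_preserved(exon_lengths: dict[str, int], removed: set[str]) -> bool:
    """Downstream codons stay in frame only if the removed length is a multiple of 3."""
    removed_length = sum(exon_lengths[name] for name in removed)
    return removed_length % 3 == 0


# A genomic deletion removes exon_a and exon_b: 212 nt, which shifts the frame.
deletion_only = frame_preserved(EXON_LENGTHS, {"exon_a", "exon_b"})

# An ASO additionally forces skipping of exon_c: 444 nt removed in total, back in frame.
deletion_plus_skip = frame_preserved(EXON_LENGTHS, {"exon_a", "exon_b", "exon_c"})
```

The resulting protein is shorter, lacking everything the removed exons encoded, but the sequence downstream is read in the correct frame again.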
Measuring Alternative Splicing with RNA-seq
Standard RNA-seq workflows count reads per gene, averaging over all isoforms. Detecting alternative splicing requires:
Isoform-level quantification tools: Kallisto, Salmon, and RSEM directly quantify transcript isoforms (not just genes) by probabilistically assigning reads to known transcripts. Output: TPM/estimated counts per transcript.
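The difference between gene-level and isoform-level views can be shown with a few numbers. The values below are invented for illustration and are not output from any of the tools named above; the transcript names and helper functions are hypothetical.

```python
# Hypothetical per-transcript TPM values for one gene in two conditions.
tpm = {
    "control": {"tx_long": 80.0, "tx_short": 20.0},
    "treated": {"tx_long": 20.0, "tx_short": 80.0},
}


def gene_level(sample: dict[str, float]) -> float:
    """Gene-level abundance: the sum over all of the gene's transcripts."""
    return sum(sample.values())


def isoform_fractions(sample: dict[str, float]) -> dict[str, float]:
    """Each transcript's share of the gene's total abundance."""
    total = sum(sample.values())
    return {tx: value / total for tx, value in sample.items()}


for condition, sample in tpm.items():
    print(condition, gene_level(sample), isoform_fractions(sample))
```

Gene-level abundance is identical in both conditions, so a gene-level differential expression test would report no change, while the isoform fractions show a complete switch in the dominant transcript.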
Differential splicing analysis: rMATS, SUPPA2, and DEXSeq identify which splicing events change between conditions. rMATS and SUPPA2 quantify Percent Spliced In (PSI, ψ), the fraction of transcripts that include a given exon, and test for differences; DEXSeq tests for differential exon usage.
PSI ranges from 0 (exon always skipped) to 1 (exon always included). A PSI change of 0.2 between conditions means the exon shifts from, say, 40% to 60% included, which is often biologically meaningful.
Long-read sequencing (Oxford Nanopore, PacBio) sequences full-length transcripts, directly revealing isoform structures without inference from short reads. It is increasingly used for isoform discovery in tissues with complex splicing patterns (especially brain).
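In its simplest form, PSI is a ratio of read counts. The sketch below is a simplified estimate with invented counts and a hypothetical `psi` helper; real tools additionally normalize the counts (for example for effective length) and model variability across replicates, which is omitted here.

```python
def psi(inclusion_reads: int, skipping_reads: int) -> float:
    """Simplified PSI: the share of informative reads that support exon inclusion."""
    total = inclusion_reads + skipping_reads
    if total == 0:
        raise ValueError("no reads support either isoform; PSI is undefined")
    return inclusion_reads / total


# Hypothetical read counts for one exon in two conditions.
psi_control = psi(inclusion_reads=40, skipping_reads=60)
psi_treated = psi(inclusion_reads=60, skipping_reads=40)
delta_psi = psi_treated - psi_control
print(round(psi_control, 2), round(psi_treated, 2), round(delta_psi, 2))
```

Raising an error when no reads cover the event is a deliberate choice: reporting 0 would be indistinguishable from "always skipped".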
Alternative Splicing and the Proteome
Alternative splicing dramatically expands protein diversity beyond the ~20,000 gene count:
- Multiple protein isoforms per gene
- Isoforms with distinct interaction partners, localization, stability
- Isoform ratios that change during differentiation, disease, and aging
This means that the same variant can have different effects depending on which isoforms are expressed in a given cell type. A variant might affect a domain present in the ubiquitous isoform but absent in the brain-specific isoform, so the resulting phenotype is not brain-related.
For anyone working in genomics, alternative splicing is not an advanced topic. It's part of the baseline: every variant call, every expression analysis, every annotation query involves decisions about which isoforms count and how to handle them.
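An isoform-aware lookup captures this reasoning. The following is a minimal sketch with invented coordinates, exon names, and isoform definitions; `affected_isoforms` is a hypothetical helper, not part of any annotation tool.

```python
# Hypothetical exon coordinates (half-open intervals) and isoform exon sets.
EXON_COORDS = {"exon1": (100, 200), "exon2": (300, 400), "exon3": (500, 600)}
ISOFORMS = {
    "ubiquitous": {"exon1", "exon2", "exon3"},
    "brain_specific": {"exon1", "exon3"},  # exon2 is skipped in this isoform
}


def affected_isoforms(position: int) -> list[str]:
    """Return the isoforms whose exons overlap the given genomic position."""
    hit_exons = {
        name for name, (start, end) in EXON_COORDS.items() if start <= position < end
    }
    return sorted(name for name, exons in ISOFORMS.items() if exons & hit_exons)


print(affected_isoforms(350))  # falls in exon2
print(affected_isoforms(150))  # falls in exon1
print(affected_isoforms(250))  # intronic in both isoforms
```

A variant at position 350 touches only the ubiquitous isoform, which is the situation described above; combining this lookup with per-tissue isoform expression would be the next step in a real analysis.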