Part 6·6.3·14 min read

Evolution as Optimization

Evolution is a blind optimization process operating on heritable variation — the same logic as stochastic gradient descent, but running for billions of years on a fitness landscape we're still mapping.

evolution · natural selection · phylogenetics · population genetics

Evolution is the explanation for why every biological system we've examined in this book exists. The cell cycle has checkpoints because cells that didn't have them accumulated mutations and died. Proteins fold because ones that didn't fold were degraded. The immune system recognizes pathogens because organisms without immune systems didn't survive long enough to reproduce.

Every feature of molecular biology is the output of 4 billion years of optimization — not by a designer, but by the relentless differential reproduction of entities with heritable variation. Understanding this process mathematically is the domain of population genetics; understanding its products in sequence space is phylogenetics. Both are foundational to modern bioinformatics.

The Logic of Natural Selection

Three conditions are necessary and sufficient for natural selection to occur:

  1. Heritable variation: individuals differ, and those differences are transmitted to offspring
  2. Differential fitness: some variants reproduce more than others in a given environment
  3. Sufficient population size: beneficial variants must persist long enough to spread

Given these conditions, selection will increase the frequency of fitter variants and decrease that of less fit ones. It's not a force; it's a logical consequence of differential reproduction.
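The logic above can be sketched as a deterministic recursion. This is a minimal illustration that assumes a haploid population with two variants, where the favored variant has relative fitness 1 + s; the function names and the parameter values are hypothetical, chosen only for the example.

```python
def next_frequency(p, s):
    """One generation of haploid selection.

    p: current frequency of the favored variant
    s: selection coefficient (favored variant has fitness 1 + s, the other 1)
    """
    mean_fitness = p * (1 + s) + (1 - p)
    return p * (1 + s) / mean_fitness


def trajectory(p0, s, generations):
    """Frequency of the favored variant over time, starting from p0."""
    freqs = [p0]
    for _ in range(generations):
        freqs.append(next_frequency(freqs[-1], s))
    return freqs


freqs = trajectory(p0=0.01, s=0.05, generations=200)
print(f"start={freqs[0]:.3f}  gen 100={freqs[100]:.3f}  gen 200={freqs[200]:.3f}")
```

No step in the loop "pushes" the variant: its frequency rises only because it leaves proportionally more copies each generation.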

Evolution as distributed gradient descent

Natural selection is remarkably similar to gradient descent in machine learning, with some key differences:

  • Population = batch of candidate solutions
  • Fitness = objective function (but usually multi-dimensional and non-stationary)
  • Reproduction = update step
  • Mutation = noise added to parameters
  • Genetic drift = stochastic gradient (especially in small populations)

The optimizer has no gradient information — it's zero-order optimization, learning only from whether the current parameters work. Selection has no foresight — it cannot move through a fitness valley to reach a higher peak. This is a known limitation: evolution can get stuck in local optima just like stochastic hill-climbing.

But it has run for roughly 4 × 10⁹ years, with parallelism across an astronomically large number of organisms at any given time (prokaryotes alone are estimated at around 10³⁰ cells). The compute budget is extraordinary.
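The analogy can be made concrete with a toy genetic algorithm. The sketch below is illustrative only: it assumes a bit-string "genome", a made-up fitness function (the number of 1 bits), tournament selection, and no recombination. None of these choices come from biology; they are picked to keep the example short.

```python
import random


def evolve(genome_length=40, pop_size=60, mutation_rate=0.01,
           generations=80, seed=1):
    """Toy zero-order optimizer: selection plus mutation, no gradient used."""
    rng = random.Random(seed)
    fitness = sum  # toy objective: count of 1 bits in the genome
    pop = [[rng.randint(0, 1) for _ in range(genome_length)]
           for _ in range(pop_size)]
    best_per_generation = [max(fitness(g) for g in pop)]
    for _ in range(generations):
        # Selection: each parent is the fittest of 3 randomly drawn individuals.
        parents = [max(rng.sample(pop, 3), key=fitness) for _ in range(pop_size)]
        # Mutation: flip each bit independently with probability mutation_rate.
        pop = [[bit ^ (rng.random() < mutation_rate) for bit in genome]
               for genome in parents]
        best_per_generation.append(max(fitness(g) for g in pop))
    return best_per_generation


history = evolve()
print("best fitness at start:", history[0])
print("best fitness at end:  ", history[-1])
```

The optimizer only ever asks "which of these candidates scored higher?", which is the zero-order property described above.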

Genetic Drift: Random Walks in Frequency Space

Natural selection is not the only evolutionary force. Genetic drift, the random fluctuation of allele frequencies due to finite population size, can fix or eliminate alleles regardless of fitness.

The key relationship: the strength of genetic drift relative to selection is determined by effective population size (Ne) and selection coefficient (s):

  • When |s| >> 1/Ne: selection dominates (allele fate determined by fitness)
  • When |s| << 1/Ne: drift dominates (allele fate is essentially random)
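The two regimes above can be explored with a small Wright-Fisher simulation. This is a sketch under simplifying assumptions: a haploid-style model with a fixed number of gene copies, binomial resampling each generation, and illustrative parameter values. The function names are hypothetical.

```python
import random


def allele_fixes(n_copies, s, p0, rng):
    """Simulate one population until the allele is fixed or lost.

    Returns True if the allele reaches frequency 1, False if it is lost.
    """
    count = round(p0 * n_copies)
    while 0 < count < n_copies:
        p = count / n_copies
        # Expected frequency after selection (allele fitness 1 + s vs. 1).
        p_sel = p * (1 + s) / (p * (1 + s) + (1 - p))
        # Drift: binomial sampling of n_copies gene copies for the next generation.
        count = sum(rng.random() < p_sel for _ in range(n_copies))
    return count == n_copies


def fixation_fraction(n_copies, s, p0=0.5, replicates=500, seed=42):
    """Fraction of replicate populations in which the allele fixes."""
    rng = random.Random(seed)
    fixed = sum(allele_fixes(n_copies, s, p0, rng) for _ in range(replicates))
    return fixed / replicates


neutral = fixation_fraction(n_copies=40, s=0.0)
selected = fixation_fraction(n_copies=40, s=0.2)
print(f"neutral allele fixed in {neutral:.2f} of runs")
print(f"favored allele fixed in {selected:.2f} of runs")
```

With s = 0 the outcome depends only on the starting frequency, while a selection coefficient well above 1/Ne makes fixation close to certain in this small population.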

Most changes in DNA are nearly neutral: their fitness effects are so small that drift dominates, especially in populations with small Ne (humans, large mammals). This is the basis of neutral theory (Kimura, 1968): most observed sequence variation is selectively neutral, maintained or eliminated by drift.

Practical implication: when comparing homologous genes between species, synonymous substitution rate (dS) reflects drift (neutral), while nonsynonymous substitution rate (dN) reflects selection + drift. The ratio dN/dS (ω):

  • ω < 1: purifying selection (most changes are deleterious; removed by selection)
  • ω ≈ 1: neutral evolution
  • ω > 1: positive selection (amino acid changes are beneficial; actively accumulated)
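As a rough illustration of the distinction behind dN and dS, the sketch below classifies codon differences between two aligned coding sequences as synonymous or nonsynonymous using the standard genetic code. It is deliberately simplified: it returns raw counts, not rates normalized by the number of synonymous and nonsynonymous sites, so it is not a dN/dS estimator. The function names and the tolerance parameter are assumptions made for the example.

```python
from itertools import product

BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
# Standard genetic code, codons enumerated in TCAG order.
CODON_TABLE = {"".join(codon): aa
               for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}


def count_codon_differences(seq1, seq2):
    """Return (synonymous, nonsynonymous) counts of differing codons."""
    if len(seq1) != len(seq2) or len(seq1) % 3 != 0:
        raise ValueError("sequences must be aligned, in frame, and equal length")
    syn = nonsyn = 0
    for i in range(0, len(seq1), 3):
        c1, c2 = seq1[i:i + 3], seq2[i:i + 3]
        if c1 == c2:
            continue
        if CODON_TABLE[c1] == CODON_TABLE[c2]:
            syn += 1      # same amino acid: synonymous
        else:
            nonsyn += 1   # different amino acid: nonsynonymous
    return syn, nonsyn


def interpret_omega(omega, tolerance=0.1):
    """Map an omega value onto the three regimes listed above."""
    if omega < 1 - tolerance:
        return "purifying selection"
    if omega > 1 + tolerance:
        return "positive selection"
    return "neutral evolution"


print(count_codon_differences("ATGGCTAAA", "ATGGCCAGA"))
print(interpret_omega(0.2))
```

In the example pair, GCT to GCC leaves alanine unchanged, while AAA to AGA replaces lysine with arginine.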

dN/dS analysis is used to identify genes under positive selection (rapidly evolving under adaptive pressure) and to test whether specific codons in an alignment are under adaptive evolution.

Coalescent Theory: Looking Backward in Time

Instead of asking "where will this allele go?" we can ask "when did all copies of this allele share a common ancestor?" Coalescent theory models the genealogy of gene copies backward in time.

Key result: in a population of Ne diploid individuals, two randomly chosen gene copies share a common ancestor, on average, 2Ne generations ago. The deeper implication:

  • Humans have Ne ≈ 10,000–20,000 (bottleneck from ancient population history)
  • Most pairs of human gene copies coalesce ~500,000–1,000,000 years ago (2Ne generations at roughly 25 years per generation)
  • This defines the time depth of human sequence variation
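The 2Ne result can be checked numerically. In the sketch below, two lineages in a population with 2Ne gene copies coalesce in any given generation with probability 1/(2Ne), so the waiting time is geometric. The population size used is an arbitrary illustrative value and the function name is hypothetical.

```python
import math
import random


def pairwise_coalescence_time(ne, rng):
    """Generations back until two gene copies share an ancestor.

    Draws from a geometric distribution with success probability 1 / (2 * ne).
    """
    p = 1.0 / (2 * ne)
    u = 1.0 - rng.random()  # uniform on (0, 1], avoids log(0)
    return int(math.log(u) / math.log(1.0 - p)) + 1


rng = random.Random(7)
ne = 100
times = [pairwise_coalescence_time(ne, rng) for _ in range(20000)]
mean_time = sum(times) / len(times)
print(f"simulated mean: {mean_time:.1f} generations (theory: {2 * ne})")
```

Individual coalescence times vary widely around the mean, which is one reason different regions of the same genome can have very different genealogical depths.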

Coalescent-based methods underlie:

  • Divergence time estimation in phylogenetics
  • Demographic inference from population genomic data (detecting ancient bottlenecks, expansions, admixture)
  • Genealogical reasoning in forensic genetics

Molecular Clocks: Sequences as Time Records

Neutral mutations accumulate at a roughly constant rate per site per generation. This lets sequences function as a molecular clock: the more two sequences differ, the longer ago they diverged.

Molecular clock applications:

  • Dating phylogenetic divergences: when did humans and chimpanzees last share a common ancestor? (5–7 Ma, calibrated by the molecular clock from genomic divergence)
  • Dating outbreaks: Bayesian phylogenetic methods (BEAST, TreeTime) use sampling dates and substitution rates to date the origin of outbreaks (SARS-CoV-2 origin estimated at November–December 2019)
  • Dating cancer mutations: with tumor mutation rates, it's possible to estimate when the first driver mutation occurred (some cancers begin 10–20 years before diagnosis)

The molecular clock is not perfectly constant: it varies with mutation rate, generation time, and selection pressure. Relaxed clock models account for rate variation across lineages.
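Under a strict clock, the arithmetic is short: two lineages each accumulate substitutions since the split, so divergence time is the pairwise distance divided by twice the rate. The sketch below assumes a strict clock and uses round illustrative numbers (a 1.2% pairwise divergence and a rate of 1e-9 substitutions per site per year); real analyses calibrate the rate and model its variation.

```python
def divergence_time(distance_per_site, rate_per_site_per_year):
    """Strict-clock estimate of time since two sequences diverged, in years.

    The factor of 2 accounts for substitutions accumulating on both lineages.
    """
    return distance_per_site / (2 * rate_per_site_per_year)


# Illustrative inputs only, not measured values.
years = divergence_time(distance_per_site=0.012, rate_per_site_per_year=1e-9)
print(f"estimated divergence: {years / 1e6:.1f} million years ago")
```

Halving the assumed rate doubles the estimated age, so the result is only as good as the rate calibration.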

Phylogenetics: Reading Evolutionary History from Sequences

Phylogenetics reconstructs the evolutionary relationships between sequences (and the organisms that carry them). The output is a phylogenetic tree: a branching diagram showing relationships and divergence times.

Distance-Based Methods

Compute pairwise sequence distances (percent divergence, Jukes-Cantor corrected), then build a tree using clustering algorithms:

  • UPGMA: assumes a molecular clock (all lineages evolve at the same rate)
  • Neighbor-joining (NJ): does not assume a clock; the standard fast method for large datasets

NJ is used in preliminary analyses and large-scale phylogenetics where parsimony and likelihood methods are too slow.
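The distance step can be sketched as follows. The code computes the raw proportion of differing sites (the p-distance) and applies the Jukes-Cantor correction, d = -(3/4) ln(1 - 4p/3), which accounts for multiple substitutions at the same site. Function names are hypothetical, and skipping gap columns is one simple convention among several.

```python
import math


def p_distance(seq1, seq2):
    """Proportion of aligned, ungapped sites at which two sequences differ."""
    if len(seq1) != len(seq2):
        raise ValueError("sequences must be aligned to the same length")
    pairs = [(a, b) for a, b in zip(seq1, seq2) if a != "-" and b != "-"]
    if not pairs:
        raise ValueError("no ungapped sites to compare")
    return sum(a != b for a, b in pairs) / len(pairs)


def jc69_distance(p):
    """Jukes-Cantor corrected distance in expected substitutions per site."""
    if p >= 0.75:
        raise ValueError("sequences are saturated; JC69 distance is undefined")
    return -0.75 * math.log(1 - 4 * p / 3)


p = p_distance("ACGTACGTAC", "ACGTACGTAA")
print(f"p-distance = {p:.3f}, JC69 distance = {jc69_distance(p):.4f}")
```

The corrected distance is always at least as large as the p-distance, and the gap between them grows as sequences become more divergent.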

Maximum Parsimony

Selects the tree that minimizes the total number of evolutionary events (mutations) required to explain the observed sequences. Computationally hard (NP-hard for large trees). Used for closely related sequences.
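Scoring a single, fixed tree is the easy part of parsimony; the hard part is searching over trees. The sketch below scores one alignment column on a given rooted binary tree using Fitch's algorithm. The nested-tuple tree encoding is an assumption made for this example.

```python
def fitch(node):
    """Return (candidate ancestral states, minimum number of changes).

    A leaf is a single-character string (the observed state);
    an internal node is a 2-tuple of child nodes.
    """
    if isinstance(node, str):
        return {node}, 0
    left, right = node
    left_states, left_cost = fitch(left)
    right_states, right_cost = fitch(right)
    shared = left_states & right_states
    if shared:
        # Children agree on at least one state: no extra change needed here.
        return shared, left_cost + right_cost
    # Children disagree: one change is required on a branch below this node.
    return left_states | right_states, left_cost + right_cost + 1


tree = (("A", "C"), ("A", "A"))
states, changes = fitch(tree)
print(f"root states: {sorted(states)}, minimum changes: {changes}")
```

A full parsimony analysis would sum this score over all columns and compare the totals across candidate tree topologies.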

Maximum Likelihood (ML)

Selects the tree and model parameters that maximize the probability of observing the alignment under an explicit substitution model. The gold standard for phylogenetic accuracy. Tools: RAxML, IQ-TREE (both implement fast ML heuristics; IQ-TREE is now preferred for most analyses).

Substitution models describe the rates at which nucleotides or amino acids change:

  • JC69 (Jukes-Cantor): simplest; all substitutions equally probable
  • GTR+G+I: General Time Reversible with Gamma-distributed rates and invariant sites; most flexible and commonly used

ModelTest or IQ-TREE's built-in model selection identifies the best-fit model for a given dataset.
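For the simplest case, maximum likelihood under a substitution model fits in a few lines. The sketch below assumes just two aligned sequences and the JC69 model, and finds the distance d (expected substitutions per site) that maximizes the likelihood by brute-force grid search. Real tools optimize over tree topologies, many branch lengths, and richer models; the function names here are hypothetical.

```python
import math


def jc69_log_likelihood(d, n_sites, n_diff):
    """Log-likelihood of a pairwise alignment under JC69 at distance d."""
    e = math.exp(-4.0 * d / 3.0)
    p_same = 0.25 + 0.75 * e   # probability a site shows the same base
    p_diff = 0.25 - 0.25 * e   # probability of one specific different base
    # Each site also carries a 1/4 equilibrium frequency for its starting base.
    return ((n_sites - n_diff) * math.log(0.25 * p_same)
            + n_diff * math.log(0.25 * p_diff))


def ml_distance(n_sites, n_diff, step=0.001, max_d=2.0):
    """Grid-search estimate of the distance that maximizes the likelihood."""
    grid = [step * i for i in range(1, int(max_d / step) + 1)]
    return max(grid, key=lambda d: jc69_log_likelihood(d, n_sites, n_diff))


best = ml_distance(n_sites=100, n_diff=10)
print(f"ML distance for 10 differences in 100 sites: {best:.3f}")
```

For this two-sequence case the ML estimate agrees with the closed-form Jukes-Cantor correction, which is a useful sanity check on the likelihood function.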

Bayesian Methods

BEAST, MrBayes: incorporate prior distributions on parameters and sample from the posterior using MCMC. Can estimate divergence times, population sizes, and migration rates simultaneously. Computationally intensive but powerful — the standard for dated phylogenies and demographic inference.
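The core idea of MCMC sampling can be shown on a one-parameter problem. The sketch below is a plain Metropolis sampler for the JC69 distance between two sequences, with an exponential prior on the distance. It is not how BEAST or MrBayes are implemented; the prior, the proposal width, and all names are assumptions chosen for illustration.

```python
import math
import random


def log_posterior(d, n_sites, n_diff, prior_rate=1.0):
    """Unnormalized log posterior: JC69 likelihood plus Exponential prior on d."""
    if d <= 0:
        return -math.inf
    one_minus_e = -math.expm1(-4.0 * d / 3.0)   # 1 - exp(-4d/3), stable near 0
    p_same = 1.0 - 0.75 * one_minus_e
    p_diff = 0.25 * one_minus_e
    log_lik = (n_sites - n_diff) * math.log(p_same) + n_diff * math.log(p_diff)
    return log_lik - prior_rate * d


def metropolis(n_sites, n_diff, steps=20000, step_size=0.05, seed=0):
    """Random-walk Metropolis sampler; returns samples after 20% burn-in."""
    rng = random.Random(seed)
    d = 0.5
    current = log_posterior(d, n_sites, n_diff)
    samples = []
    for _ in range(steps):
        proposal = d + rng.gauss(0, step_size)
        candidate = log_posterior(proposal, n_sites, n_diff)
        # Accept with probability min(1, posterior ratio).
        if math.log(1.0 - rng.random()) < candidate - current:
            d, current = proposal, candidate
        samples.append(d)
    return samples[steps // 5:]


samples = metropolis(n_sites=100, n_diff=10)
posterior_mean = sum(samples) / len(samples)
print(f"posterior mean distance: {posterior_mean:.3f}")
```

Unlike the single best value returned by maximum likelihood, the samples describe a whole distribution, so credible intervals come directly from their quantiles.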

NextStrain and real-time phylogenetics

NextStrain (nextstrain.org) maintains real-time phylogenetic analyses of influenza, SARS-CoV-2, Ebola, Zika, and dozens of other pathogens. It updates automatically as new sequences are deposited.

The pipeline (Augur + Auspice) runs MAFFT for alignment, IQ-TREE for phylogenetic inference, and TreeTime for molecular clock dating, then renders an interactive visualization. During COVID-19, this pipeline was how the global scientific community tracked the spread of variants and the emergence of new lineages in near-real-time.

Positive Selection in the Human Genome

Not all human evolution is neutral drift. Regions of the genome under recent positive selection show characteristic signatures:

  • Selective sweeps: when a beneficial allele rapidly rises to fixation, it carries surrounding variants with it (hitchhiking). The result: a region of reduced diversity and extended haplotype homozygosity around the selected locus. Detected by extended haplotype homozygosity (EHH) statistics and iHS (integrated haplotype score).

  • Population-specific sweeps: alleles at high frequency in one population but rare in others suggest recent local adaptation. Classic examples: LCT (lactase persistence in dairy-farming populations), HbS (sickle cell allele in malaria-endemic regions), EPAS1 (altitude adaptation in Tibetans).

  • Balancing selection: some alleles are maintained at intermediate frequencies by selection that favors heterozygotes or alternates over time. HLA genes show extreme balancing selection: diversity is maintained because a diverse MHC repertoire protects against a diverse pathogen landscape.
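The EHH statistic mentioned in the list above has a simple definition: among haplotypes carrying a given core allele, it is the probability that two randomly chosen haplotypes are identical from the core out to a given marker. The sketch below assumes phased haplotypes encoded as equal-length strings of 0/1 alleles, and the toy data are invented for the example.

```python
from collections import Counter
from math import comb


def ehh(haplotypes, core_index, core_allele, end_index):
    """Extended haplotype homozygosity from the core site out to end_index."""
    carriers = [h for h in haplotypes if h[core_index] == core_allele]
    n = len(carriers)
    if n < 2:
        raise ValueError("need at least two haplotypes carrying the core allele")
    lo, hi = sorted((core_index, end_index))
    # Group carriers by their exact sequence across the interval.
    classes = Counter(h[lo:hi + 1] for h in carriers)
    identical_pairs = sum(comb(count, 2) for count in classes.values())
    return identical_pairs / comb(n, 2)


haps = ["0100", "0100", "0101", "0110"]   # toy phased haplotypes
for end in (1, 2, 3):
    print(f"EHH out to marker {end}: {ehh(haps, 1, '1', end):.3f}")
```

EHH starts at 1 at the core and decays with distance; after a recent sweep the decay around the selected allele is unusually slow, which is what iHS-style statistics summarize.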

Population Genomics: Mapping Human History

Modern population genomics uses genome-wide SNP data from thousands of individuals to infer:

  • Population structure: PCA (principal component analysis) and ADMIXTURE reveal clusters corresponding to ancestral populations and admixture proportions
  • Migration patterns: F-statistics and D-statistics test for gene flow between populations
  • Bottlenecks and expansions: effective population size trajectories inferred from the distribution of pairwise coalescence times (PSMC, SMC++ methods)
  • Archaic admixture: Neanderthal and Denisovan sequences introgressed into modern humans at levels of 1–4% in non-African populations — detectable from ancient introgressed haplotypes
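The D-statistic mentioned in the list above (the ABBA-BABA test) reduces to counting two site patterns. The sketch below assumes one sampled allele per population at each site, written as a four-character string in the order P1, P2, P3, outgroup, with the outgroup allele treated as ancestral. The input encoding and the toy data are assumptions for illustration, and no significance test is included.

```python
def patterson_d(sites):
    """Patterson's D from site patterns ordered (P1, P2, P3, outgroup).

    ABBA: P2 and P3 share the derived allele; BABA: P1 and P3 share it.
    D near 0 is expected without gene flow; D > 0 suggests P2-P3 gene flow.
    """
    abba = baba = 0
    for p1, p2, p3, outgroup in sites:
        if p1 == outgroup and p2 == p3 and p2 != outgroup:
            abba += 1
        elif p2 == outgroup and p1 == p3 and p1 != outgroup:
            baba += 1
    if abba + baba == 0:
        raise ValueError("no informative ABBA or BABA sites")
    return (abba - baba) / (abba + baba)


toy_sites = ["ABBA", "ABBA", "ABBA", "BABA", "AAAA", "BBAA"]
print(f"D = {patterson_d(toy_sites):.2f}")
```

In practice D is computed over very many sites and its uncertainty is estimated by resampling genomic blocks, so a value from six toy sites carries no statistical weight.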

The 1000 Genomes Project, gnomAD, and the UK Biobank provide population genomic reference panels used routinely for allele frequency lookup, ancestry estimation in clinical genetics, and GWAS interpretation.

Evolutionary Thinking in Bioinformatics Practice

Evolutionary concepts pervade bioinformatics:

  • Ortholog vs. paralog: genes related by speciation (orthologs) vs. duplication (paralogs). Ortholog identification is essential for comparative genomics and function prediction.
  • Synteny: conserved gene order between genomes of different species, reflecting ancestral genome organization
  • Conservation scores: evolutionarily conserved positions in multiple sequence alignments are likely to be functionally important, which is the basis of GERP, PhastCons, and PhyloP scores used in pathogenicity prediction
  • Ancestral sequence reconstruction: inferring the sequence of an ancestral protein to study the evolution of function

Phylogenetic conservation is one of the strongest lines of evidence for pathogenicity: an amino acid perfectly conserved in 100 vertebrate species is very likely to be functionally important, and a mutation disrupting it is likely to be damaging.
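A crude version of a conservation score can be computed directly from an alignment. The sketch below reports, for each column, the fraction of sequences sharing the most common residue. It is a stand-in for illustration only: GERP, PhastCons, and PhyloP use explicit phylogenetic models, which this does not. The toy alignment is invented.

```python
from collections import Counter


def column_conservation(alignment):
    """Per-column fraction of ungapped residues matching the most common one."""
    scores = []
    for column in zip(*alignment):
        residues = [r for r in column if r != "-"]
        if not residues:
            scores.append(0.0)   # all-gap column carries no information
            continue
        most_common_count = Counter(residues).most_common(1)[0][1]
        scores.append(most_common_count / len(residues))
    return scores


toy_alignment = ["MKV", "MKI", "MRV"]   # three aligned toy protein sequences
for position, score in enumerate(column_conservation(toy_alignment), start=1):
    print(f"column {position}: {score:.2f}")
```

A score of 1.0 across only three sequences means little; the evidential weight comes from conservation across many species separated by long evolutionary distances.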

DECODER
Biology

Evolution by natural selection is the accumulation of heritable variation filtered by differential reproductive success. Mutations generate variation; selection retains beneficial variants; genetic drift introduces randomness. Evolution has no foresight — it is a greedy local search over fitness landscapes.

For Developers

Evolution resembles a stochastic, gradient-free search running in parallel across a population. Each organism is a candidate solution; reproduction with mutation is the perturbation step; fitness is the objective being maximized (the negative of a loss function). Sexual recombination is crossover: mixing two high-fitness genomes to explore new regions of the solution space. Genetic drift is sampling noise that can sometimes help a population escape a local optimum. The algorithm has been running for roughly 4 billion years with no termination condition.

LAB · Genetic Algorithm: Evolution as Optimization
Python · Pyodide