The Central Dogma | Bio for Devs

In 1958, Francis Crick articulated what he called the "central dogma of molecular biology": information flows from DNA to RNA to protein, and not in reverse. This framework is as fundamental to biology as the OSI model is to networking — a layered abstraction that clarifies how information moves through a system.

But like the OSI model, the central dogma is a simplification that becomes more nuanced the deeper you go. Understanding both the rule and its exceptions is essential for making sense of modern genomics, epigenetics, and the RNA biology revolution.

The Core Flow

The canonical flow is:

DNA → RNA → Protein

This encodes two processes:

Transcription: DNA is copied into RNA
Translation: RNA is decoded into protein

These are the two steps every molecular biologist learns first, and they underlie essentially all of gene expression analysis.

The logic is straightforward: DNA is the stable, heritable store. It's precious — errors are permanent. So the cell doesn't expose DNA to the translation machinery directly. Instead, it makes a temporary working copy (mRNA) and translates that. The mRNA can be adjusted, destroyed, or regulated without touching the source.
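The two steps can be sketched in a few lines of Python. This is a minimal illustration rather than a bioinformatics tool. It assumes the input is the coding (sense) strand, so transcription reduces to swapping T for U, and it assumes the standard genetic code with translation starting at the first base. The names `transcribe` and `translate` are illustrative.

```python
from itertools import product

BASES = "UCAG"
# Standard genetic code, amino acids listed in UCAG codon order; '*' marks a stop codon.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): amino_acid
    for codon, amino_acid in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def transcribe(coding_dna: str) -> str:
    """Return the mRNA for a coding-strand DNA sequence (T becomes U)."""
    return coding_dna.upper().replace("T", "U")


def translate(mrna: str) -> str:
    """Decode mRNA codon by codon, stopping at the first stop codon."""
    protein = []
    for i in range(0, len(mrna) - 2, 3):
        amino_acid = CODON_TABLE[mrna[i:i + 3]]
        if amino_acid == "*":
            break
        protein.append(amino_acid)
    return "".join(protein)


mrna = transcribe("ATGGCCTAA")   # "AUGGCCUAA"
protein = translate(mrna)        # "MA" (Met, Ala, then a stop codon)
```

Note that the DNA string is never modified: each step produces a new value, which mirrors the working-copy logic described above.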

Central dogma as the read-only master branch policy

The central dogma enforces a read-only policy on the source of truth. DNA is the master branch — you don't execute from it directly. You check out a working copy (mRNA), build from that, and let the build artifact (protein) do the actual work. If a build is bad, you delete the artifact. The master branch stays intact.

This design decouples transcription rate (how many mRNA copies are made) from translation rate (how many proteins are made per mRNA) from protein stability (how long each protein lasts). Three independent dials, compounding into enormous regulatory range.
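To see how the dials compound, here is a rough sketch using a common simplified model: constant synthesis rates and first-order decay. That model is an assumption on top of the text, as are the function and parameter names. It also needs an mRNA decay rate, which the paragraph above does not list as a dial. At steady state, mRNA per cell is the transcription rate divided by mRNA decay, and protein per cell is that mRNA level times the translation rate divided by protein decay.

```python
def steady_state_protein(transcription_rate: float,
                         translation_rate: float,
                         mrna_decay: float,
                         protein_decay: float) -> float:
    """Steady-state protein copies per cell under a constant-rate model.

    transcription_rate: mRNA copies made per unit time
    translation_rate:   proteins made per mRNA per unit time
    mrna_decay, protein_decay: first-order decay constants (1 / time)
    """
    mrna = transcription_rate / mrna_decay
    return mrna * translation_rate / protein_decay


# Hypothetical numbers, chosen only to make the arithmetic easy to follow.
baseline = steady_state_protein(2.0, 10.0, 0.5, 0.25)

# Turn each of the three dials by 2x: double transcription, double
# translation, and halve protein decay (doubling protein lifetime).
tuned = steady_state_protein(4.0, 20.0, 0.5, 0.125)

fold_change = tuned / baseline   # 2 * 2 * 2 = 8
```

Because the dials multiply, three modest 2x adjustments give an 8x change in protein level.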

What Crick Actually Said

Crick's 1970 restatement of the dogma sorted the possible information transfers into three groups: "general" transfers (which occur in all cells), "special" transfers (which occur only in particular circumstances and require unusual enzymes), and transfers that have never been observed:

General (occur in all cells):

DNA → DNA (replication)
DNA → RNA (transcription)
RNA → Protein (translation)

Special (require unusual enzymes):

RNA → DNA (reverse transcription)
RNA → RNA (RNA replication)

Unknown (never observed):

Protein → DNA
Protein → RNA

The key constraint is the last one: protein sequence information does not flow back to nucleic acids. Once translated, the sequence of a protein cannot feed back to modify the DNA that encoded it. This is one molecular reason why changes a protein acquires during an organism's lifetime are not inherited through the germline: no known mechanism reads a protein's sequence and writes it back into DNA.
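One way to hold this in your head is as a small directed graph. The sketch below encodes only the five transfers listed above as occurring in nature. The names `OBSERVED_TRANSFERS` and `path_is_allowed` are illustrative, not from any library.

```python
# Edges of the information-flow graph: (source, target) -> process name.
OBSERVED_TRANSFERS = {
    ("DNA", "DNA"): "replication",
    ("DNA", "RNA"): "transcription",
    ("RNA", "protein"): "translation",
    ("RNA", "DNA"): "reverse transcription",
    ("RNA", "RNA"): "RNA replication",
}


def path_is_allowed(path: list[str]) -> bool:
    """True if every consecutive step in the path is an observed transfer."""
    return all(step in OBSERVED_TRANSFERS for step in zip(path, path[1:]))


path_is_allowed(["DNA", "RNA", "protein"])          # True: the canonical flow
path_is_allowed(["RNA", "DNA", "RNA", "protein"])   # True: a retrovirus-style route
path_is_allowed(["protein", "DNA"])                 # False: no edge leaves protein
```

In graph terms, protein is a sink for sequence information: edges point into it, and none lead back to a nucleic acid.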

The Exceptions: Where the Dogma Gets Interesting

The "special" transfers are real and biologically important:

Reverse Transcriptase: RNA → DNA

Retroviruses (HIV, HTLV) carry RNA genomes and use reverse transcriptase — an RNA-dependent DNA polymerase — to convert their RNA genome into DNA after infecting a cell. This DNA integrates into the host genome as a provirus, where it can persist indefinitely.

Reverse transcriptase is also responsible for retrotransposons — transposable elements that amplify themselves through an RNA intermediate. About 40% of the human genome consists of retrotransposon-derived sequences. Many of our "junk" sequences are fossil retrotransposons.

In the lab, reverse transcriptase is essential for RNA-seq: because sequencers read DNA, mRNA is first converted to cDNA (complementary DNA) using reverse transcriptase, then sequenced.
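At the sequence level, the first cDNA strand is simply the reverse complement of the mRNA, written with T instead of U. The sketch below shows only that string operation. The function name is illustrative, and real library preparation involves primers, second-strand synthesis, and adapters that are not modeled here.

```python
# Map each RNA base to its complementary DNA base.
RNA_TO_DNA_COMPLEMENT = str.maketrans("AUGC", "TACG")


def first_strand_cdna(mrna: str) -> str:
    """Return the first-strand cDNA for an mRNA, written 5' to 3'."""
    complement = mrna.upper().translate(RNA_TO_DNA_COMPLEMENT)
    return complement[::-1]   # reverse so the result reads 5' to 3'


first_strand_cdna("AUGGCC")   # "GGCCAT"
```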

RNA-dependent RNA Polymerase: RNA → RNA

RNA viruses (influenza, SARS-CoV-2, polio) replicate their genomes using RNA-dependent RNA polymerases (RdRp). No such enzyme exists in normal human cells — which is why RdRp inhibitors (like remdesivir) are selective antivirals.

The lack of an inherent proofreading mechanism in most RdRps means RNA viruses mutate rapidly — orders of magnitude faster than DNA-based organisms. This high mutation rate enables rapid viral evolution and immune evasion but also produces many defective variants.
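A back-of-the-envelope calculation shows why the error rate matters so much. The sketch assumes each base is copied independently with the same error probability. The rates and genome length below are hypothetical placeholders chosen for illustration, not measured values for any particular virus or organism.

```python
def expected_mutations(rate_per_base: float, genome_length: int) -> float:
    """Average number of copying errors per genome copy."""
    return rate_per_base * genome_length


def fraction_error_free(rate_per_base: float, genome_length: int) -> float:
    """Probability that one genome copy contains no errors at all."""
    return (1.0 - rate_per_base) ** genome_length


GENOME_LENGTH = 10_000       # hypothetical genome size in bases
SLOPPY_RATE = 1e-4           # hypothetical polymerase without proofreading
CAREFUL_RATE = 1e-8          # hypothetical polymerase with proofreading

expected_mutations(SLOPPY_RATE, GENOME_LENGTH)     # about 1 error per copy
fraction_error_free(SLOPPY_RATE, GENOME_LENGTH)    # roughly 0.37
fraction_error_free(CAREFUL_RATE, GENOME_LENGTH)   # very close to 1
```

Under these assumed numbers, most copies made by the sloppy polymerase carry at least one change, which is the raw material for both rapid adaptation and defective variants.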

Prions: Protein → Protein (Structural)

This one is the most philosophically uncomfortable. Prions are misfolded proteins that can induce normal copies of the same protein to misfold. The misfolded form is self-propagating without any nucleic acid template.

The prion protein PrP^Sc (found in Creutzfeldt-Jakob disease, kuru, and bovine spongiform encephalopathy) converts normal PrP^C to the pathological form through direct protein-protein contact. This is not a sequence change — same amino acids, different fold. Information (the abnormal fold) propagates from protein to protein.

Strictly speaking, this doesn't violate the central dogma because no sequence information is flowing backward. But it does mean that heritable phenotypic information can be transmitted without nucleic acids — a deep exception to the intuitive picture.

Gene Expression: Reading the Dogma Dynamically

The central dogma describes potential information flow. Gene expression describes which flows are active at any moment in a given cell.

Nearly every cell in your body carries the same genome (~20,000 protein-coding genes). But different cell types express different subsets of those genes. A liver cell expresses albumin and coagulation factors. A pancreatic β-cell expresses insulin. A retinal photoreceptor expresses opsins. The same DNA, radically different outputs.

This cell-type specificity is controlled at multiple levels:

Level	Mechanism	Chapter
Transcriptional	Transcription factors, enhancers, chromatin state	3.1
Epigenetic	DNA methylation, histone modification	3.2
Post-transcriptional	Alternative splicing, RNA stability	3.3
Translational	miRNA regulation, ribosome occupancy	3.1, 2.4
Post-translational	Phosphorylation, ubiquitination	2.5

Why mRNA abundance ≠ protein abundance

RNA-seq measures mRNA levels, but protein levels are what drive cell behavior. The correlation is real but imperfect, typically r ≈ 0.4–0.6 in matched samples. Genes can have high mRNA but low protein (for example, through translational repression by miRNAs), or low mRNA but abundant protein (a stable protein with a long half-life). For clinical biomarkers and drug target validation, protein measurements (proteomics, immunoassays) are often more relevant.
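The r quoted above is a Pearson correlation coefficient computed across genes. Here is a minimal standard-library sketch of that calculation. The abundance values are made-up toy numbers, not real measurements, and real analyses usually work on log-transformed abundances across thousands of genes.

```python
from math import sqrt


def pearson_r(xs: list[float], ys: list[float]) -> float:
    """Pearson correlation between two equal-length lists of numbers."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    variance_x = sum((x - mean_x) ** 2 for x in xs)
    variance_y = sum((y - mean_y) ** 2 for y in ys)
    return covariance / sqrt(variance_x * variance_y)


# Hypothetical per-gene abundances for five genes (arbitrary units).
mrna_levels = [1.0, 2.0, 3.0, 4.0, 5.0]
protein_levels = [2.0, 1.0, 4.0, 3.0, 5.0]

r = pearson_r(mrna_levels, protein_levels)
```

An r below 1 means that ranking genes by mRNA level does not reproduce their ranking by protein level, which is the practical point of the callout.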

Measuring Gene Expression

The central dogma suggests four ways to measure what a cell is doing:

Genomics: reads the source code. Variant calling, structural variants, copy number. Mostly stable across time and cell types. Doesn't tell you what's active.

Transcriptomics (RNA-seq): reads the active mRNA. Dynamic, varies by cell type and condition. Tells you which genes are being transcribed. The dominant approach in molecular biology today.

Proteomics: reads the executing programs. Mass spectrometry-based. More directly functional, but technically harder, and it typically detects fewer gene products per sample than RNA-seq.

Epigenomics (ATAC-seq, ChIP-seq, bisulfite sequencing): reads the regulatory state. Which regions of DNA are accessible? Which histones are modified? This layer controls which genes can be transcribed.

Modern multi-omics integrates all four layers to get a full picture of cell state. Single-cell technologies (scRNA-seq, single-cell ATAC-seq) apply these measurements to individual cells, revealing heterogeneity that bulk measurements average away.

The Central Dogma as a Framework

The central dogma is most useful as a framework for asking questions:

If a protein is overexpressed in cancer, where in the dogma did it go wrong? More gene copies (genomic)? Promoter mutation (transcriptional)? mRNA stabilization (post-transcriptional)? Reduced protein degradation (post-translational)?
If a drug targets a protein, what happens when cells become resistant? They can mutate the gene so the protein no longer binds the drug (genomic, acting at the protein level), amplify the gene (genomic), upregulate a bypass pathway (transcriptional), or activate post-translational modifications that block drug binding (post-translational).
If you're designing a diagnostic test, which layer are you measuring? cfDNA (genomic), circulating RNA (transcriptomic), or protein biomarkers (proteomic)?

The power of the central dogma is not that it's a complete description — it isn't. The power is that it gives you a map of where to look when something goes wrong, and where to intervene when you want to change what a cell does.