Part 5 · 5.1

What is a Virus

A virus is minimal self-replicating code — a nucleic acid genome wrapped in a protein shell, dependent on host cell machinery for every step of its lifecycle.

virology · virus structure · genome types

A virus is, in the most literal sense, a piece of code looking for a machine to run on. It carries a genome (a complete program for making copies of itself) but possesses none of the cellular machinery needed to execute that program. It must find a host cell and co-opt that cell's ribosomes, polymerases, and energy supply to replicate.

This dependency defines the virus. It is not alive in the usual sense (no metabolism, no homeostasis, no independent reproduction), but it is the most successful self-replicating entity on Earth by copy number. There are an estimated 10³¹ individual virus particles in the biosphere, more than all other biological entities combined.

Understanding viruses is essential for bioinformatics because viral sequences appear everywhere: in sequencing datasets as contaminants or co-infections, as integrated retroelements in every vertebrate genome, and as tools (viral vectors) for gene delivery in research and therapy.

What a Virus Actually Is

At minimum, a virus requires:

  1. A genome: nucleic acid containing the information to encode viral proteins and direct replication
  2. A capsid: a protein shell that protects the genome during transmission
  3. A mechanism for entering host cells

Many also have:

  4. An envelope: a lipid bilayer (derived from host membranes) that surrounds the capsid in some viruses
  5. Accessory proteins: regulatory, immune-evasion, or structural proteins encoded in the genome

Viruses as executable code without a runtime

A virus is like a compiled binary on a USB drive. The binary contains valid instructions, but it can't do anything without a computer to run on. When it finds a computer (a host cell), it takes over the host's resources (CPU = ribosomes, RAM = cytoplasm, I/O = transport machinery) to execute its program: replicate itself and prepare new copies for distribution.

The host immune system is the antivirus software: it scans incoming files, flags suspicious patterns, and attempts to quarantine or delete the binary before it can execute.

Genome Types: More Diversity Than Cellular Life

Unlike cellular life, which uses only double-stranded DNA as its genome, viruses use virtually every possible nucleic acid configuration:

| Genome type | Example viruses | Notes |
|---|---|---|
| dsDNA | Herpesviruses, poxviruses, adenoviruses | Most similar to cellular genomes; can be very large (poxviruses ~200 kb) |
| ssDNA | Parvoviruses, circoviruses | Small, simple genomes |
| dsRNA | Reoviruses, rotaviruses | Replicate in the cytoplasm via RNA-dependent RNA polymerase |
| +ssRNA | Coronaviruses (SARS-CoV-2), flaviviruses (dengue, Zika), picornaviruses (polio) | Genome directly functions as mRNA; can be immediately translated |
| −ssRNA | Influenza, rabies, Ebola | Genome is complement of mRNA; must be transcribed first |
| ssRNA-RT | HIV, HTLV | RNA genome, but replicates through DNA intermediate via reverse transcriptase |
| dsDNA-RT | Hepatitis B | DNA genome, replicates via RNA intermediate |

The "+" and "−" designations for RNA viruses refer to strand polarity relative to mRNA: positive-sense genomes (+ssRNA) can be translated directly by ribosomes; negative-sense genomes (−ssRNA) must first be copied into complementary mRNA.
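A minimal Python sketch of that polarity rule, assuming a hypothetical toy sequence rather than a real viral genome: with both strands written 5'→3', the mRNA of a negative-sense virus is the reverse complement of its genome. `reverse_complement_rna` is an illustrative helper, not a library function.

```python
# RNA base-pairing rules (U pairs with A, G pairs with C).
RNA_COMPLEMENT = str.maketrans("ACGU", "UGCA")


def reverse_complement_rna(seq: str) -> str:
    """Return the complementary RNA strand, also written 5'->3'."""
    return seq.upper().translate(RNA_COMPLEMENT)[::-1]


# Hypothetical negative-sense genome fragment, written 5'->3'.
genome_minus = "CUAUUUAGCCAU"
mrna = reverse_complement_rna(genome_minus)
print(mrna)  # AUGGCUAAAUAG: start codon AUG ... stop codon UAG

# Copying the mRNA again regenerates the genome sense, as in genome replication.
assert reverse_complement_rna(mrna) == genome_minus
```

A +ssRNA genome needs no such step: its sequence already reads as mRNA.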

Baltimore Classification

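The Baltimore classification system, proposed by David Baltimore in 1971, sorts all viruses into 7 classes by genome type and by the route each uses to produce mRNA. (Baltimore shared the 1975 Nobel Prize in Physiology or Medicine for work on tumor viruses, including the discovery of reverse transcriptase, not for the classification itself.) It remains a foundational framework in virology and directly predicts which host enzymes a virus can hijack vs. which it must bring itself. For example, negative-sense RNA viruses must carry their own RNA-dependent RNA polymerase in the virion because host cells have no such enzyme.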
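That prediction can be sketched as a lookup table. This is a simplified, illustrative model (an assumption, not a standard data structure): for each genome type it records the first step the incoming genome needs before viral mRNA can be made, and whether an uninfected host cell can already perform it; if not, the virion has to package the enzyme. Keys use ASCII "-ssRNA" for "−ssRNA", and known exceptions are only noted in comments.

```python
# Simplified Baltimore lookup (illustrative): genome type ->
# (class, first step needed before viral mRNA can be made,
#  whether an uninfected host cell can already perform that step).
BALTIMORE = {
    "dsDNA":    ("I",   "transcription by DNA-dependent RNA polymerase", True),
    "ssDNA":    ("II",  "second-strand synthesis by host DNA polymerase", True),
    "dsRNA":    ("III", "transcription by RNA-dependent RNA polymerase", False),
    "+ssRNA":   ("IV",  "translation of the genome itself by host ribosomes", True),
    "-ssRNA":   ("V",   "transcription by RNA-dependent RNA polymerase", False),
    "ssRNA-RT": ("VI",  "reverse transcription into DNA", False),
    "dsDNA-RT": ("VII", "conversion to cccDNA, then host transcription", True),
}


def must_package_polymerase(genome_type: str) -> bool:
    """True if the host cannot perform the first step, so the virion must carry the enzyme.

    Ignores exceptions: poxviruses (dsDNA) replicate in the cytoplasm and carry
    their own RNA polymerase, and hepatitis B virions carry their polymerase
    for later replication steps.
    """
    _cls, _step, host_can_do_it = BALTIMORE[genome_type]
    return not host_can_do_it


for genome_type, (cls, step, _) in BALTIMORE.items():
    print(f"{cls:>4} {genome_type:<9} packages polymerase: {must_package_polymerase(genome_type)}")
```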

Capsid Symmetry: Geometry Matters

Viral capsids self-assemble from repeated copies of one or a few proteins. Two fundamental geometries have evolved:

Icosahedral symmetry: Used by most non-enveloped animal viruses. An icosahedron has 20 equilateral triangular faces. Efficient packing: the shape is close to a sphere, maximizing enclosed volume for a given surface area. Adenoviruses, poliovirus, HPV, and hepatitis B virus use icosahedral capsids.

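Helical symmetry: Capsid proteins spiral around the genome. Used by tobacco mosaic virus (a positive-sense RNA plant virus) and by the nucleocapsids of negative-sense RNA viruses such as rabies and influenza. Length is flexible, so the capsid accommodates genomes of variable size.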

Complex capsids: Some viruses (poxviruses, many bacteriophages) have more complex structures that don't fit either category; tailed phages, for example, combine an icosahedral head with a helical tail.

Self-assembly from symmetric, repeated units is elegant: the genome only needs to encode a small number of capsid protein sequences rather than a unique protein for every position in the shell. It's the same principle as building complex 3D structures from identical Lego bricks.
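The "identical bricks" arithmetic can be sketched with the Caspar–Klug triangulation number, a standard result from structural virology that goes slightly beyond the text above: an ideal icosahedral capsid is built from 60T copies of a subunit, where T = h² + hk + k² for non-negative integers h and k (not both zero). The smallest case, T = 1, is 60 subunits (3 per face × 20 faces). This is a geometric sketch only; real capsids sometimes deviate from the ideal lattice, for example by building it from several distinct proteins.

```python
def triangulation_number(h: int, k: int) -> int:
    """Caspar-Klug triangulation number T = h^2 + hk + k^2."""
    if h < 0 or k < 0 or (h == 0 and k == 0):
        raise ValueError("h and k must be non-negative and not both zero")
    return h * h + h * k + k * k


def capsid_subunits(h: int, k: int) -> int:
    """Identical subunits in an ideal T-lattice icosahedral capsid: 60 * T."""
    return 60 * triangulation_number(h, k)


def allowed_t_numbers(max_t: int) -> list[int]:
    """All geometrically allowed T values up to max_t, in ascending order."""
    values = set()
    for h in range(max_t + 1):
        for k in range(max_t + 1):
            if h == 0 and k == 0:
                continue
            t = triangulation_number(h, k)
            if t <= max_t:
                values.add(t)
    return sorted(values)


for h, k in [(1, 0), (1, 1), (2, 0)]:
    print(f"T={triangulation_number(h, k)}: {capsid_subunits(h, k)} subunits")
# T=1: 60 subunits
# T=3: 180 subunits
# T=4: 240 subunits
```

Larger capsids reuse the same small set of proteins in more copies rather than encoding new ones, which is the economy the paragraph describes.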

Enveloped vs. Non-Enveloped: Implications for Transmission

The presence or absence of a lipid envelope has major consequences:

Enveloped (HIV, SARS-CoV-2, influenza, herpesviruses):

  • Acquire their envelope by budding through host membranes during exit
  • Envelope contains host-derived lipids and viral glycoproteins (spike protein, etc.)
  • More sensitive to detergents, heat, and drying — which disrupt the lipid bilayer
  • Spread more effectively through direct contact or droplets; less durable on surfaces

Non-enveloped (adenoviruses, rotaviruses, poliovirus, norovirus):

  • Naked capsid — resistant to detergents, acid, and drying
  • Can survive on surfaces for hours to days
  • Typically transmitted via the fecal-oral route or contaminated surfaces
  • This is why alcohol-based hand sanitizers are less effective against non-enveloped viruses (alcohol disrupts lipid envelopes but is much less effective against bare protein capsids)

Viral Genome Size: The Minimal Program

Viral genomes span an enormous range:

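  • Among the smallest: hepatitis D virus (~1.7 kb, encodes only 1 protein; depends on hepatitis B virus to supply its envelope, so it spreads only alongside HBV)
  • Among the largest: Mimivirus, a giant virus of amoebae (~1.2 Mb, larger than some bacterial genomes)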
  • Typical human pathogens: Influenza ~13 kb, SARS-CoV-2 ~30 kb, HIV ~9.7 kb, HSV-1 ~152 kb

The pressure to minimize size drives extreme coding density in small genomes:

  • Overlapping reading frames: the same sequence encodes two different proteins in different frames
  • Polyproteins: one large protein is translated and then cleaved by proteases into multiple functional proteins
  • Multifunctional proteins: a single protein may serve as capsid component, replicase, and immune antagonist
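Polyprotein processing can be sketched as splitting one sequence at cleavage sites. In this minimal example the sites are supplied directly as 0-based positions; real viral proteases instead recognize specific sequence motifs, and the polyprotein string and positions below are hypothetical.

```python
def cleave_polyprotein(polyprotein: str, sites: list[int]) -> list[str]:
    """Split a polyprotein into mature products.

    `sites` are 0-based indices at which a new product begins, i.e. the
    protease cuts the peptide bond immediately before each index.
    """
    bounds = [0, *sorted(sites), len(polyprotein)]
    return [polyprotein[start:end] for start, end in zip(bounds, bounds[1:])]


# Hypothetical polyprotein and cut positions, chosen only for illustration.
products = cleave_polyprotein("MKTAYIAKQRGSGSDEFLLNPQ", [8, 15])
print(products)  # ['MKTAYIAK', 'QRGSGSD', 'EFLLNPQ']
```

One open reading frame and one ribosome-binding event yield three proteins, which is the space saving the bullet describes.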

This coding compactness is one reason viral genome analysis is particularly interesting computationally: a single mutation can affect multiple proteins simultaneously if it falls in an overlapping region.
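The effect of a mutation in an overlapping region can be demonstrated by translating one sequence in two frames before and after a single point mutation. A minimal Python sketch, assuming the standard genetic code and a hypothetical 12-nt DNA stretch (not a real viral gene); `translate` and `point_mutation` are illustrative helpers.

```python
from itertools import product

# Standard genetic code in the conventional TCAG ordering:
# codon index = 16*first + 4*second + third, with T=0, C=1, A=2, G=3.
BASES = "TCAG"
AMINO_ACIDS = (
    "FFLLSSSSYY**CC*W"  # first base T
    "LLLLPPPPHHQQRRRR"  # first base C
    "IIIMTTTTNNKKSSRR"  # first base A
    "VVVVAAAADDEEGGGG"  # first base G
)
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}


def translate(seq: str, frame: int) -> str:
    """Translate DNA in frame 0, 1, or 2; '*' marks stop codons.

    Trailing bases that do not fill a whole codon are ignored.
    """
    seq = seq.upper()
    return "".join(CODON_TABLE[seq[i:i + 3]] for i in range(frame, len(seq) - 2, 3))


def point_mutation(seq: str, pos: int, base: str) -> str:
    """Return a copy of seq with the base at 0-based pos replaced."""
    return seq[:pos] + base + seq[pos + 1:]


# Hypothetical 12-nt stretch read in two overlapping frames.
original = "ATGGCTTCAGGA"
mutant = point_mutation(original, 4, "A")  # C -> A at position 4

for frame in (0, 1):
    print(frame, translate(original, frame), "->", translate(mutant, frame))
# 0 MASG -> MDSG
# 1 WLQ -> WIQ
```

Position 4 falls in the second codon of both frames (its middle base in frame 0, its first base in frame 1), so one substitution changes an amino acid in each protein: A→D in frame 0 and L→I in frame 1.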

Viral Protein Functions

The proteins encoded by even small viral genomes cover a complete set of functions:

Structural proteins (capsid, envelope, matrix): package and protect the genome during transmission

Replication proteins (polymerases, helicases, proteases): carry out genome replication and, in negative-sense RNA viruses, initial transcription

Entry proteins: surface proteins that recognize host receptors and mediate membrane fusion or endocytosis

Immune evasion proteins: found in virtually all successful viruses; mechanisms include blocking interferon signaling, hiding from MHC presentation, and expressing host-like proteins to avoid detection

Accessory/regulatory proteins: control the timing and rate of viral gene expression; can determine whether infection is acute or latent

Viral Diversity in the Human Genome

About 8% of the human genome consists of recognizable retrovirus-derived sequences: endogenous retroviruses (ERVs) that integrated into the germline of ancestral species millions of years ago. Most are degenerate and non-functional, but some ERV-derived sequences have been co-opted:

  • Syncytin-1 and Syncytin-2 (derived from ERV envelope genes) are essential for placental development in primates: the trophoblast cells that form the placenta fuse together using a repurposed viral fusion protein
  • Some ERV promoters drive host gene expression in tissues where the original gene was not expressed
  • ERV sequences contribute to regulatory elements (enhancers, CTCF binding sites)

This makes the human genome literally a palimpsest of ancient viral integrations. It also makes viral "contamination" in genomic data non-trivial to detect, since reads from ERV sequences can map to both viral and human reference genomes.
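The detection problem can be illustrated with a toy exact k-mer classifier. This is a hypothetical sketch, not how production tools work (they rely on read alignment or dedicated classifiers over full reference databases): a read that shares k-mers with both a viral reference and a host reference containing an ERV-like copy cannot be confidently called viral. All sequences and the value `K = 8` are made up for illustration.

```python
def kmers(seq: str, k: int) -> set[str]:
    """All overlapping substrings of length k."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def classify_read(read: str, viral_kmers: set[str], host_kmers: set[str], k: int) -> str:
    """Label a read by which reference(s) share at least one exact k-mer with it."""
    read_kmers = kmers(read, k)
    in_viral = bool(read_kmers & viral_kmers)
    in_host = bool(read_kmers & host_kmers)
    if in_viral and in_host:
        return "ambiguous"  # e.g. a read from an ERV-like insertion
    if in_viral:
        return "viral"
    if in_host:
        return "host"
    return "unclassified"


K = 8
# Toy references (hypothetical). The host sequence embeds a 10-nt copy of the
# viral sequence, standing in for an endogenous retrovirus.
viral_ref = "ACGTTGCAAGGCTTACGA"
host_ref = "GGGCCCTTTAAA" + "TTGCAAGGCT" + "CCCGGGAAATTT"
viral_index = kmers(viral_ref, K)
host_index = kmers(host_ref, K)

for read in ["TTGCAAGGCT", "GGGCCCTTTAAA", "GGCTTACGA", "ATATATATAT"]:
    print(read, classify_read(read, viral_index, host_index, K))
```

Only the read drawn from the shared, ERV-like stretch comes back "ambiguous"; reads from virus-specific or host-specific sequence are labeled cleanly.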

Viruses as Bioinformatics Tools

Beyond their role as pathogens, viruses are essential tools in molecular biology and medicine:

Viral vectors: Adeno-associated virus (AAV), lentivirus, and adenovirus are engineered as delivery vehicles for gene therapy. Understanding the natural biology of each vector type is necessary to understand its tropism (which cell types it infects), capacity (how much genetic cargo it can carry), and immunogenicity.

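CRISPR delivery: Viral vectors, especially AAV and lentivirus, are one major route for delivering Cas9 and its guide RNA; non-viral methods such as lipid nanoparticles and electroporation are also widely used in the clinic. Vector choice determines delivery efficiency, immune response, and editing duration.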
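The capacity constraint can be sketched as a simple size check. The capacities below are rough, commonly quoted figures and should be treated as assumptions rather than hard limits; the 1.0 kb allowance for promoter, guide RNA cassette, and polyA signal is likewise hypothetical. The SpCas9 figure follows from its length of 1,368 amino acids.

```python
# ASSUMPTION: rough, commonly quoted packaging limits in kilobases. Real
# limits depend on the specific vector design and are not hard cutoffs.
APPROX_CAPACITY_KB = {
    "AAV": 4.7,
    "lentivirus": 8.0,
    "helper-dependent adenovirus": 36.0,
}


def fits(vector: str, cargo_kb: float) -> bool:
    """True if the cargo is within the vector's approximate capacity."""
    return cargo_kb <= APPROX_CAPACITY_KB[vector]


# SpCas9 is 1,368 amino acids: 1,368 codons plus a stop codon.
spcas9_cds_kb = (1368 * 3 + 3) / 1000  # about 4.1 kb
# Hypothetical allowance for promoter, guide RNA cassette, and polyA signal.
other_elements_kb = 1.0
cargo_kb = spcas9_cds_kb + other_elements_kb

for vector in APPROX_CAPACITY_KB:
    print(f"{vector}: {'fits' if fits(vector, cargo_kb) else 'too large'}")
```

Under these assumptions an all-in-one SpCas9 cassette overshoots AAV, which is one reason smaller Cas9 orthologs and split-vector designs are used with AAV.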

Research tools: Bacteriophages (viruses that infect bacteria) are used for phage display, library screening, and as model systems. Lambda phage was one of the first genomes to be fully sequenced and became one of the earliest cloning vectors in recombinant DNA technology.

Biology

A virus is a minimal genetic parasite: a nucleic acid genome (DNA or RNA) enclosed in a protein capsid, sometimes with a lipid envelope. Viruses cannot replicate independently — they must hijack host cell machinery to copy their genome and produce new viral particles.

For Developers

A virus is a self-replicating exploit payload. Its genome is shellcode that, once delivered into the host cell, redirects the cell's ribosomes (compute), membranes (infrastructure), and polymerases (build system) to manufacture copies of itself. The capsid is the delivery vehicle — stripped off after successful injection. Retroviruses (HIV) additionally integrate into the host genome: persistence via source code modification.

In the next chapter, we'll examine how viruses actually get into cells and replicate: the infection mechanism.