A virus is, in the most literal sense, a piece of code looking for a machine to run on. It carries a genome (a complete program for making copies of itself) but possesses none of the cellular machinery needed to execute that program. It must find a host cell and co-opt that cell's ribosomes, polymerases, and energy supply to replicate.
This dependency defines the virus. It is not alive in the usual sense (no metabolism, no homeostasis, no independent reproduction), but it is the most successful self-replicating entity on Earth by copy number. There are an estimated 10³¹ individual virus particles in the biosphere, more than all other biological entities combined.
Understanding viruses is essential for bioinformatics because viral sequences appear everywhere: in sequencing datasets as contaminants or co-infections, as integrated retroelements in every vertebrate genome, and as tools (viral vectors) for gene delivery in research and therapy.
What a Virus Actually Is
At minimum, a virus requires:
- A genome: nucleic acid containing the information to encode proteins and direct replication
- A capsid: a protein shell that protects the genome during transmission
- A mechanism for entering host cells
Many viruses also have:
- An envelope: a lipid bilayer (derived from host membranes) that surrounds the capsid in some viruses
- Accessory proteins: regulatory, immune evasion, or structural proteins encoded in the genome
A virus is like a compiled binary on a USB drive. The binary contains valid instructions, but it can't do anything without a computer to run on. When it finds a computer (a host cell), it takes over the host's resources, CPU (ribosomes), RAM (cytoplasm), and I/O (transport machinery), to execute its program: replicate itself and prepare new copies for distribution.
The host immune system is the antivirus software: it scans incoming files, flags suspicious patterns, and attempts to quarantine or delete the binary before it can execute.
Genome Types: More Diversity Than Cellular Life
Unlike cellular life, which uses only double-stranded DNA as its genome, viruses use virtually every possible nucleic acid configuration:
| Genome type | Example viruses | Notes |
|---|---|---|
| dsDNA | Herpesviruses, poxviruses, adenoviruses | Most similar to cellular genomes; can be very large (poxviruses ~200 kb) |
| ssDNA | Parvoviruses, circoviruses | Small, simple genomes |
| dsRNA | Reoviruses, rotaviruses | Replicate in the cytoplasm via RNA-dependent RNA polymerase |
| +ssRNA | Coronaviruses (SARS-CoV-2), flaviviruses (dengue, Zika), picornaviruses (polio) | Genome directly functions as mRNA; can be immediately translated |
| −ssRNA | Influenza, rabies, Ebola | Genome is complement of mRNA; must be transcribed first |
| ssRNA-RT | HIV, HTLV | RNA genome, but replicates through DNA intermediate via reverse transcriptase |
| dsDNA-RT | Hepatitis B | DNA genome, replicates via RNA intermediate |
The "+" and "−" designations for ssRNA viruses refer to strand polarity relative to mRNA: positive-sense RNA (+ssRNA) can be directly translated by ribosomes; negative-sense RNA (−ssRNA) must first be copied into mRNA.
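The polarity rule can be illustrated with a small Python sketch. The function names and the 9-nucleotide toy sequence are hypothetical and not taken from any real virus; the sketch only shows that the mRNA for a negative-sense genome is its reverse complement, while a positive-sense genome already reads as mRNA.

```python
# Complement table for RNA bases.
RNA_COMPLEMENT = str.maketrans("ACGU", "UGCA")


def reverse_complement_rna(seq: str) -> str:
    """Return the reverse complement of an RNA sequence written 5'->3'."""
    return seq.upper().translate(RNA_COMPLEMENT)[::-1]


def to_mrna(genome: str, sense: str) -> str:
    """Return the mRNA-sense sequence for a single-stranded RNA genome.

    sense is "+" (the genome already reads as mRNA) or "-" (the genome is
    the complement of mRNA and must be copied by a polymerase first).
    """
    if sense == "+":
        return genome.upper()
    if sense == "-":
        return reverse_complement_rna(genome)
    raise ValueError("sense must be '+' or '-'")


# Toy sequences for illustration only.
plus_genome = "AUGGCUUAA"                          # reads directly as mRNA
minus_genome = reverse_complement_rna(plus_genome)  # what a -ssRNA virion would carry

print(minus_genome)
print(to_mrna(minus_genome, "-"))
```

In a cell the copying step is done by a polymerase; the string reversal here only models the resulting sequence, not the biochemistry.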
The Baltimore classification system, proposed by David Baltimore in 1971 (he shared the 1975 Nobel Prize for the discovery of reverse transcriptase), categorizes all viruses by genome type and replication strategy into 7 classes. It remains a foundational classification scheme in virology and directly predicts which host enzymes the virus can hijack and which it must bring itself. For example, negative-sense RNA viruses must carry their own RNA-dependent RNA polymerase in the virion because host cells have no such enzyme.
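As a rough sketch, the seven classes can be encoded as a lookup table in Python. The class names, fields, and the `describe` helper are hypothetical; the examples come from the genome-type table above, and the code uses an ASCII hyphen in `-ssRNA` for easier typing. The `virion_rdrp` flag follows the text for class V and marks class III as well, since dsRNA viruses also package their polymerase. Reverse transcriptases (classes VI and VII) are deliberately not modeled here.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BaltimoreClass:
    numeral: str          # Baltimore class, I to VII
    genome: str           # genome label (ASCII version of the table label)
    examples: tuple       # example viruses from the genome-type table
    genome_is_mrna: bool  # True if ribosomes can translate the genome directly
    virion_rdrp: bool     # True if the particle carries an RNA-dependent RNA polymerase


BALTIMORE = (
    BaltimoreClass("I", "dsDNA", ("herpesviruses", "poxviruses", "adenoviruses"), False, False),
    BaltimoreClass("II", "ssDNA", ("parvoviruses", "circoviruses"), False, False),
    BaltimoreClass("III", "dsRNA", ("reoviruses", "rotaviruses"), False, True),
    BaltimoreClass("IV", "+ssRNA", ("coronaviruses", "flaviviruses", "picornaviruses"), True, False),
    BaltimoreClass("V", "-ssRNA", ("influenza", "rabies", "Ebola"), False, True),
    BaltimoreClass("VI", "ssRNA-RT", ("HIV", "HTLV"), False, False),
    BaltimoreClass("VII", "dsDNA-RT", ("hepatitis B",), False, False),
)

# Index by genome label for quick lookup.
BY_GENOME = {c.genome: c for c in BALTIMORE}


def describe(genome: str) -> str:
    """Summarize what this sketch records about one genome type."""
    c = BY_GENOME[genome]
    notes = []
    if c.genome_is_mrna:
        notes.append("genome can be translated directly")
    if c.virion_rdrp:
        notes.append("virion carries its own RNA-dependent RNA polymerase")
    detail = "; ".join(notes) if notes else "no flag set in this sketch"
    return f"Class {c.numeral} ({c.genome}): {detail}"


for label in ("+ssRNA", "-ssRNA"):
    print(describe(label))
```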
Capsid Symmetry: Geometry Matters
Viral capsids self-assemble from repeated copies of one or a few proteins. Two fundamental geometries have evolved:
Icosahedral symmetry: Used by most non-enveloped animal viruses. The capsid has 20 equilateral triangular faces and is close to a sphere, which makes for efficient packing by maximizing the volume-to-surface ratio. Adenoviruses, poliovirus, HPV, and hepatitis B virus use icosahedral capsids.
Helical symmetry: Capsid proteins spiral around the genome. Used by tobacco mosaic virus (a positive-sense RNA virus) and by negative-sense RNA viruses such as rabies and influenza. The length is flexible, so the capsid accommodates variable genome sizes.
Complex capsids: Some viruses (poxviruses, tailed bacteriophages) have more complex structures that don't fit cleanly into either category.
Self-assembly from symmetric, repeated units is elegant: the genome only needs to encode a small number of capsid protein sequences rather than a unique capsid structure. It's the same principle as building complex 3D structures from identical Lego bricks.
Enveloped vs. Non-Enveloped: Implications for Transmission
The presence or absence of a lipid envelope has major consequences:
Enveloped viruses (HIV, SARS-CoV-2, influenza, herpesviruses):
- Acquire their envelope by budding through host membranes during exit
- Envelope contains host proteins and viral proteins (spike protein, etc.)
- More sensitive to detergents, heat, and drying — which disrupt the lipid bilayer
- Spread more effectively through direct contact or droplets; less durable on surfaces
Non-enveloped viruses (adenoviruses, rotaviruses, poliovirus, norovirus):
- Naked capsid — resistant to detergents, acid, and drying
- Can survive on surfaces for hours to days
- Typically transmitted via the fecal-oral route or contaminated surfaces
- This is why alcohol-based hand sanitizers are less effective against non-enveloped viruses (alcohol disrupts lipid envelopes but is less effective at disrupting bare capsids)
Viral Genome Size: The Minimal Program
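Viral genomes span an enormous range:
- Among the smallest: Hepatitis D virus (1.7 kb, encodes only 1 protein; depends on hepatitis B virus for its envelope and transmission)
- Among the largest: Mimivirus, a giant virus that infects amoebae (~1.2 Mb, larger than some bacterial genomes)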
- Typical human pathogens: Influenza ~13 kb, SARS-CoV-2 ~30 kb, HIV ~9.7 kb, HSV-1 ~152 kb
The pressure to minimize genome size drives extreme coding density in small viruses:
- Overlapping reading frames: the same nucleotide sequence encodes two different proteins in different frames
- Polyproteins: one large protein is translated and then cleaved by proteases into multiple functional proteins
- Multifunctional proteins: a single protein serves more than one role (for example, capsid component and immune antagonist)
This coding compactness is one reason viral genome analysis is particularly interesting computationally: a single mutation can affect multiple proteins simultaneously if it falls in an overlapping region.
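The effect of a mutation in an overlapping region can be sketched in a few lines of Python. The 12-nucleotide sequence is a made-up toy, not a real viral gene, and `translate` and `mutate` are hypothetical helpers; the codon table is the standard genetic code, built in the usual TCAG order.

```python
from itertools import product

# Standard genetic code: first, second, and third base each cycle through TCAG.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}


def translate(seq: str, frame: int = 0) -> str:
    """Translate a sequence in forward frame 0, 1, or 2 ('*' marks a stop codon)."""
    seq = seq.upper().replace("U", "T")
    codons = (seq[i:i + 3] for i in range(frame, len(seq) - 2, 3))
    return "".join(CODON_TABLE[c] for c in codons)


def mutate(seq: str, pos: int, base: str) -> str:
    """Return seq with the base at 0-based position pos replaced."""
    return seq[:pos] + base + seq[pos + 1:]


original = "ATGGCACTCACC"          # toy sequence read in two overlapping frames
mutant = mutate(original, 4, "A")  # single substitution, C -> A

for name, seq in (("original", original), ("mutant", mutant)):
    print(name, "frame 0:", translate(seq, 0), "| frame 1:", translate(seq, 1))
```

The one substitution changes a residue in both frames, which is why variant annotation for compact viral genomes usually has to be evaluated per reading frame rather than per position.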
Viral Protein Functions
The genes in even small viruses encode a complete set of functions:
Structural proteins (capsid, envelope, matrix): pack and protect the genome during transmission
Replication proteins (polymerases, helicases, proteases): carry out genome replication and, in negative-sense RNA viruses, initial transcription
Entry proteins: surface proteins that recognize host receptors and mediate fusion or endocytosis
Immune evasion proteins: found in all successful viruses; mechanisms include blocking interferon signaling, interfering with MHC presentation, and expressing host-like proteins to avoid detection
Accessory/regulatory proteins: control the timing and rate of gene expression; determine whether infection is acute or latent
Viral Diversity in the Human Genome
About 8% of the human genome consists of recognizable retrovirus-derived sequences: endogenous retroviruses (ERVs) that integrated into ancestral germ cells millions of years ago. Most are degenerate and non-functional, but some ERV-derived sequences have been co-opted:
- Syncytin-1 and Syncytin-2 (derived from ERV envelope genes) are essential for placental development in primates: the cells that form the placenta fuse together using a repurposed viral fusion protein
- Some ERV promoters drive gene expression in tissues where the gene was not originally expressed
- ERV sequences contribute to regulatory elements (enhancers, CTCF binding sites)
This makes the human genome a palimpsest of ancient viral integrations, and it makes viral "contamination" in genomic data non-trivial to detect, since ERV sequences can align to viral reference genomes.
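To make the detection problem concrete, here is a toy Python sketch that labels a read by shared k-mers. Everything in it is an assumption for illustration: the sequences are invented, the function names and thresholds are hypothetical, and real pipelines would use a proper aligner against full reference genomes rather than exact k-mer matching.

```python
def kmers(seq: str, k: int) -> set:
    """Return the set of all k-length substrings of seq."""
    seq = seq.upper()
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def classify_read(read: str, viral_ref: str, human_erv_ref: str,
                  k: int = 8, min_shared: int = 3) -> str:
    """Label a read 'viral', 'endogenous', 'ambiguous', or 'unmatched'."""
    read_kmers = kmers(read, k)
    hits_viral = len(read_kmers & kmers(viral_ref, k)) >= min_shared
    hits_erv = len(read_kmers & kmers(human_erv_ref, k)) >= min_shared
    if hits_viral and hits_erv:
        return "ambiguous"   # could be an ERV locus or an exogenous virus
    if hits_viral:
        return "viral"
    if hits_erv:
        return "endogenous"
    return "unmatched"


# Invented toy references that share a 16-nt retrovirus-like segment.
shared = "ACGTTGCAAGGCTTAC"
viral_ref = shared + "GGGATCCCATAT"
human_erv_ref = shared + "TTTTAGAGAGAC"

reads = {
    "r1": "ACGTTGCAAGGC",   # falls in the shared segment
    "r2": "GGGATCCCATAT",   # specific to the viral reference
    "r3": "TTTTAGAGAGAC",   # specific to the human ERV locus
    "r4": "CCCCCCCCCCCC",   # matches neither reference
}
labels = {name: classify_read(seq, viral_ref, human_erv_ref) for name, seq in reads.items()}
print(labels)
```

The point of the "ambiguous" label is that a hit to a viral reference alone does not establish contamination or infection; reads should also be checked against the host genome.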
Viruses as Bioinformatics Tools
Beyond their role as pathogens, viruses are essential tools in molecular biology and medicine:
Viral vectors: Adeno-associated virus (AAV), lentivirus, and adenovirus are engineered as delivery vehicles for gene therapy. Understanding the natural biology of each vector type is necessary to understand its tropism (which cells it infects), packaging capacity (how much genetic material it can carry), and immunogenicity.
CRISPR delivery: Many CRISPR therapies in development use viral vectors to deliver the guide RNA and Cas9. Vector choice determines delivery efficiency, immune response, and editing duration.
Research tools: Bacteriophages (viruses that infect bacteria) are used for phage display, library screening, and as model systems. Lambda phage was one of the first genomes sequenced and was central to the development of recombinant DNA technology.
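The packaging-capacity constraint mentioned for viral vectors can be sketched as a simple size check in Python. The capacity figures below are rough, commonly quoted ballpark values and should be treated as assumptions, not specifications; they vary with vector design, so consult the documentation for the specific system. The cassette component sizes and the `vectors_that_fit` helper are likewise hypothetical.

```python
# Approximate packaging limits in kilobases. Illustrative assumptions only.
APPROX_CAPACITY_KB = {
    "AAV": 4.7,
    "lentivirus": 8.0,
    "adenovirus": 8.0,   # first-generation vectors; other designs differ
}


def vectors_that_fit(cargo_kb: float, capacities: dict = APPROX_CAPACITY_KB) -> list:
    """Return the vector names whose assumed capacity can hold the cargo."""
    return sorted(name for name, cap in capacities.items() if cargo_kb <= cap)


# Hypothetical all-in-one CRISPR cassette; component sizes are rough guesses.
cassette_kb = {
    "promoter": 0.6,
    "Cas9 coding sequence": 4.1,
    "polyA signal": 0.2,
    "guide RNA cassette": 0.4,
}
total_kb = round(sum(cassette_kb.values()), 1)

print(total_kb, vectors_that_fit(total_kb))
print(vectors_that_fit(4.0))
```

Under these assumed numbers the combined cassette exceeds the AAV limit, which illustrates why vector choice and cargo design have to be considered together.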
A virus is a minimal genetic parasite: a nucleic acid genome (DNA or RNA) enclosed in a protein capsid, sometimes with a lipid envelope. Viruses cannot replicate independently — they must hijack host cell machinery to copy their genome and produce new viral particles.
A virus is a self-replicating exploit payload. Its genome is shellcode that, once delivered into the host cell, redirects the cell's ribosomes (compute), membranes (infrastructure), and polymerases (build system) to manufacture copies of itself. The capsid is the delivery vehicle — stripped off after successful injection. Retroviruses (HIV) additionally integrate into the host genome: persistence via source code modification.
In the next chapter, we'll examine how viruses actually get into cells and replicate: the infection mechanism.