Endogenous retroviruses (ERVs) are viral sequences permanently integrated into germline DNA and inherited across generations. They constitute approximately 8% of the human genome1 and serve as molecular fossils of ancient infections spanning tens of millions of years.
How ERVs Form
Retroviruses replicate by inserting a DNA copy of their RNA genome into host chromosomes. This occurs through a multi-step process:
1. Reverse transcription: The viral enzyme reverse transcriptase converts viral RNA into double-stranded DNA in the cytoplasm.
2. Nuclear import: The pre-integration complex enters the nucleus.
3. Integration: Viral integrase inserts the proviral DNA into chromosomal DNA, creating 4-6 base pair target site duplications (TSDs) flanking the insertion.2
4. Germline transmission: If integration occurs in a germ cell (sperm or egg), the provirus becomes heritable.
Once integrated, ERVs evolve at the neutral mutation rate of the host genome (~2 × 10⁻⁹ substitutions per site per year).3
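The target site duplications created at integration leave a recognizable signature: the same short host sequence appears immediately before and immediately after the provirus. A minimal Python sketch of that check follows. The function name `find_tsd` and the example flanks are hypothetical, and the exact-match comparison is a simplifying assumption, since TSDs around old insertions may themselves have mutated.

```python
def find_tsd(upstream, downstream, min_len=4, max_len=6):
    """Look for a target site duplication flanking a provirus.

    upstream:   host sequence ending immediately before the 5' LTR
    downstream: host sequence starting immediately after the 3' LTR
    Returns the longest exact direct repeat of min_len..max_len bases,
    or None if the flanks do not match.
    """
    upstream = upstream.upper()
    downstream = downstream.upper()
    # Try the longest candidate first so a 6 bp TSD is not reported as 4 bp.
    for k in range(max_len, min_len - 1, -1):
        if len(upstream) >= k and len(downstream) >= k:
            if upstream[-k:] == downstream[:k]:
                return upstream[-k:]
    return None


if __name__ == "__main__":
    # Hypothetical flanks sharing the 5 bp repeat ACGTA.
    print(find_tsd("GGTTACGTA", "ACGTACCTT"))
```

An intact TSD on both sides of an element is one line of evidence that the element arrived by integration rather than by some other rearrangement.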
ERV Structure
A complete ERV contains:
LTRs (Long Terminal Repeats): Flanking sequences containing promoter and regulatory elements. Identical at integration, they accumulate mutations independently over time.
Most ERVs have accumulated mutations rendering them non-functional. Many exist as "solo LTRs" — remnants left after recombination between the two LTRs deleted the internal genes.4
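Because the two LTRs of a provirus are identical at integration and then diverge independently, their pairwise divergence can be turned into a rough insertion age: each LTR accumulates changes at about the neutral rate r, so a divergence d corresponds to roughly T = d / (2r). The sketch below assumes the two LTRs are already aligned to equal length, uses an uncorrected proportion of differences, and takes the ~2 × 10⁻⁹ rate quoted earlier. The function name `ltr_insertion_age` is hypothetical.

```python
def ltr_insertion_age(ltr5, ltr3, rate=2e-9):
    """Estimate years since integration from two aligned LTR sequences.

    rate is substitutions per site per year (assumed neutral host rate).
    Alignment columns containing a gap ('-') in either LTR are skipped.
    """
    if len(ltr5) != len(ltr3):
        raise ValueError("LTRs must be aligned to the same length")
    compared = 0
    mismatches = 0
    for a, b in zip(ltr5.upper(), ltr3.upper()):
        if a == "-" or b == "-":
            continue
        compared += 1
        if a != b:
            mismatches += 1
    if compared == 0:
        raise ValueError("no comparable positions")
    divergence = mismatches / compared
    # Both LTRs mutate independently, hence the factor of 2.
    return divergence / (2 * rate)


if __name__ == "__main__":
    five_prime = "ACGT" * 25                                    # toy 100 bp LTR
    three_prime = "T" + five_prime[1:50] + "A" + five_prime[51:]  # 2 differences
    print(f"{ltr_insertion_age(five_prime, three_prime):.3g} years")
```

Estimates of this kind are approximate: gene conversion between LTRs and rate variation across the genome can both distort them.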
Class I (related to gammaretroviruses): HERV-H, HERV-W, HERV-E, ERV-9
Class II (related to betaretroviruses): HERV-K (HML-1 through HML-10)
Class III (related to spumaviruses): HERV-L
Evidence for Common Ancestry
1. Orthologous Insertions
Humans and chimpanzees share ERV insertions at identical genomic positions. The 2005 chimpanzee genome project identified 336 orthologous ERV-containing sequences between humans and chimps across syntenic chromosomes.6
For any single ERV, the probability of independent insertion at the exact same location in two species is approximately:
P ≈ 1 / (3 × 10⁹ × 0.01) ≈ 1 in 30 million
For independent insertion of hundreds of thousands of ERVs at identical positions, the probability becomes vanishingly small. The observed pattern is consistent with inheritance from a common ancestor.
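The arithmetic behind this argument can be reproduced in a few lines. The sketch below takes the genome size and the 0.01 factor exactly as they appear in the estimate above, and additionally assumes that insertions are independent events. Joint probabilities are handled as base-10 logarithms because they underflow ordinary floating point. The function names are hypothetical.

```python
import math

GENOME_SIZE_BP = 3e9   # approximate human genome size used in the text
TARGET_FACTOR = 0.01   # factor taken as-is from the estimate in the text


def single_site_probability(genome_size=GENOME_SIZE_BP, factor=TARGET_FACTOR):
    """Probability that one independent insertion lands on a given site."""
    return 1.0 / (genome_size * factor)


def log10_joint_probability(n_insertions, p_single=None):
    """log10 of the chance that n independent insertions all coincide."""
    if p_single is None:
        p_single = single_site_probability()
    return n_insertions * math.log10(p_single)


if __name__ == "__main__":
    p = single_site_probability()
    print(f"single insertion: about 1 in {1 / p:,.0f}")
    print(f"336 insertions: about 10^{log10_joint_probability(336):.0f}")
```

Under these assumptions, the 336 shared insertions cited above would coincide by chance with a probability on the order of 10^-2512. Retroviral integration is not perfectly uniform across the genome, so the figure illustrates scale rather than giving a precise estimate.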
2. Phylogenetic Congruence
ERV phylogenies match established species phylogenies. Benveniste (1999) demonstrated that trees constructed from HERV sequences are consistent with primate evolutionary relationships derived from morphology and other genetic data.7
The distribution of ERVs reflects proviral age: older insertions appear in widely divergent species, while younger insertions are limited to closely related species.
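This nesting logic can be expressed as a simple check: find the smallest clade that contains every species carrying an insertion, then ask whether all members of that clade carry it. The sketch below uses a hypothetical, simplified primate topology encoded as nested tuples and treats presence or absence as known without error. It is an illustration of the reasoning, not a phylogenetic method.

```python
# Hypothetical simplified topology; leaves are species names.
PRIMATE_TREE = (
    ((((("human", "chimpanzee"), "gorilla"), "orangutan"), "gibbon"), "macaque")
)


def leaves(node):
    """Return the set of species names under a node."""
    if isinstance(node, str):
        return {node}
    return set().union(*(leaves(child) for child in node))


def smallest_clade(node, carriers):
    """Return the smallest subtree whose leaves include every carrier."""
    if not carriers:
        raise ValueError("carriers must not be empty")
    if not isinstance(node, str):
        for child in node:
            if carriers <= leaves(child):
                return smallest_clade(child, carriers)
    return node


def single_insertion_consistent(tree, carriers):
    """True if the carriers form a complete clade (one insertion, no losses)."""
    return leaves(smallest_clade(tree, carriers)) == carriers


if __name__ == "__main__":
    old = {"human", "chimpanzee", "gorilla", "orangutan"}
    patchy = {"chimpanzee", "gorilla", "macaque"}
    print(single_insertion_consistent(PRIMATE_TREE, old))     # complete clade
    print(single_insertion_consistent(PRIMATE_TREE, patchy))  # patchy pattern
```

An insertion found in every member of a clade fits a single integration in that clade's ancestor. A patchy pattern, such as carriers in chimpanzee, gorilla and macaque but not human, needs another explanation, for example independent infections or later loss.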
3. Species-Specific Insertions
Some ERV insertions occurred after lineages diverged:
Twelve of the 29 human-specific HERV-K elements are polymorphic in modern human populations, indicating recent insertion.8
4. The PtERV1 Example
PtERV1 (Pan troglodytes endogenous retrovirus 1) has over 200 copies in the chimpanzee genome, more than half still full-length. It is present in:
Chimpanzees
Gorillas
Rhesus macaques
Olive baboons
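PtERV1 is absent in humans, orangutans, and gibbons.6 Because humans fall inside the clade that contains chimpanzees and gorillas, this patchy distribution cannot reflect a single insertion inherited from a common ancestor of the African great apes. It indicates instead that the virus invaded the chimpanzee and gorilla germlines after those lineages had diverged from the human lineage, with separate invasions in the Old World monkey lineages.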
Functional Co-option
Natural selection has repurposed some ERV elements for host functions.
Syncytins: Placental Development
In 2000, Mi et al. discovered that Syncytin-1, derived from the HERV-W envelope gene, is essential for human placental development.9 The protein mediates fusion of cytotrophoblasts into the syncytiotrophoblast layer.
| Syncytin | Origin | Species | Integration Time |
| --- | --- | --- | --- |
| Syncytin-1 | HERV-W env | Catarrhine primates | >25 mya |
| Syncytin-2 | HERV-FRD env | Catarrhine primates | >40 mya |
| Syncytin-A, -B | Murine ERVs | Muridae (mice, rats) | ~20 mya |
| Syncytin-Car1 | Carnivore ERV | Carnivora (dogs, cats) | ~85 mya |
| Syncytin-Rum1 | Ruminant ERV | Ruminantia | ~30 mya |
Syncytins have been independently captured at least 10 times across mammalian evolution — a striking example of convergent molecular evolution.10
Arc: Neuronal Communication
The Arc (Activity-regulated cytoskeleton-associated) gene derives from a Ty3/gypsy retrotransposon Gag protein. Arc self-assembles into virus-like capsids that encapsulate RNA and mediate intercellular transfer between neurons.11
Arc is essential for synaptic plasticity, long-term potentiation, and memory consolidation. It is present across all tetrapods, indicating co-option occurred hundreds of millions of years ago.
Regulatory Elements
ERV LTRs contain transcription factor binding sites and have been co-opted as promoters and enhancers:
HERV-H: Accounts for ~2% of all polyadenylated RNA in human embryonic stem cells and correlates strongly with pluripotency markers (OCT4, NANOG, SOX2).12
MER21A LTR: Drives placental expression of the CYP19A1 aromatase gene.
~110,000 ERV/LTR elements contain transcription factor binding sites marked by epigenetic enhancer signatures.13
Contemporary Endogenization: Koala Retrovirus
The koala retrovirus (KoRV) provides a real-time example of endogenization. KoRV-A became endogenous between 50,000 and 120 years ago — the youngest known endogenizing retrovirus.14
Geographic Distribution
| Population | KoRV Status | Copy Number |
| --- | --- | --- |
| Northern Australia (Queensland) | Endogenous in all individuals | ~70 copies/genome |
| Southern Australia (Victoria) | 25.8% positive | <1 copy average |
A 2024 study analyzing 111 pedigreed koalas documented both elimination of ERV insertions from the population (714 integrations lost) and de novo germline integrations (21 new insertions absent in parents).15 This demonstrates ERV dynamics occurring on observable timescales.
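The pedigree logic behind calling an insertion de novo is a set comparison: an integration site seen in an offspring but in neither parent is a candidate new germline insertion. The sketch below illustrates that comparison only. It is not the study's pipeline, the site identifiers are hypothetical, and it assumes every insertion in each animal is detected without error.

```python
def classify_insertions(offspring, mother, father):
    """Split integration sites by their pattern within one parent-offspring trio.

    Each argument is an iterable of hashable site identifiers, for example
    (chromosome, position) tuples.
    """
    child = set(offspring)
    parental = set(mother) | set(father)
    return {
        "inherited": child & parental,        # present in child and a parent
        "de_novo": child - parental,          # present in child only
        "not_transmitted": parental - child,  # present in a parent only
    }


if __name__ == "__main__":
    # Hypothetical sites for one trio.
    mother = {("chr1", 1200), ("chr2", 560)}
    father = {("chr1", 1200), ("chr5", 9100)}
    offspring = {("chr1", 1200), ("chr5", 9100), ("chr7", 42)}
    for label, sites in classify_insertions(offspring, mother, father).items():
        print(label, sorted(sites))
```

In real data, a site missing from the parents may simply have been missed by sequencing, so candidate de novo calls need independent validation.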
ERVs and Disease
Aberrant ERV expression has been implicated in several conditions:
Tarlinton RE, et al. (2006). Real-time reverse transcriptase PCR for the endogenous koala retrovirus reveals an association between plasma viral load and neoplastic disease. Journal of Virology 80:3401-3407.