Overview
- The 2005 Chimpanzee Sequencing and Analysis Consortium found that human and chimpanzee genomes are 98.77% identical at aligned nucleotide positions, confirming chimpanzees as our closest living relatives.
- When insertions and deletions (indels) are included, total genomic divergence rises to approximately 4–5%, but the 98.7% figure for single-nucleotide differences in aligned regions is well established and not in scientific dispute.
- The small fraction of DNA that does differ between the species includes genes involved in speech, brain development, and immune function—and much of what makes humans distinct may stem from changes in gene regulation rather than in protein-coding sequences themselves.
The idea that humans and chimpanzees share the vast majority of their DNA has been a cornerstone of molecular biology since Mary-Claire King and Allan Wilson first reported in 1975 that the two species are more than 99% identical at the protein level.1 Three decades later, the completion of the chimpanzee genome sequence allowed a direct, genome-wide comparison that confirmed and refined this finding: across 2.4 billion bases of aligned sequence, humans and chimpanzees differ at only 1.23% of nucleotide positions.2 This figure—often rounded to 98.7% or 98.8% identity—is one of the most replicated results in comparative genomics and provides powerful evidence that humans and chimpanzees share a recent common ancestor.2, 3
The chimpanzee genome project
In September 2005, the Chimpanzee Sequencing and Analysis Consortium published the initial draft sequence of the chimpanzee genome in Nature. The genome was sequenced from a single captive-born male chimpanzee named Clint (Pan troglodytes verus) using a whole-genome shotgun approach that covered approximately 94% of the chimpanzee genome.2 By aligning this sequence against the human reference genome, the consortium generated what was at the time the most comprehensive catalog of genetic differences that have accumulated since the two lineages diverged from their common ancestor.
The headline finding was straightforward: in regions of the genome where human and chimpanzee sequences could be reliably aligned one-to-one, the two species differed at approximately 1.23% of nucleotide positions. This figure represents single-nucleotide substitutions only: places where one base has been replaced by another in one lineage or the other since the two species last shared a common ancestor.2 In absolute terms, the consortium cataloged approximately 35 million single-nucleotide changes between the two genomes, along with roughly 5 million insertion and deletion events (indels) and various larger-scale chromosomal rearrangements.2
The 1.23% substitution rate was not surprising. Earlier studies comparing smaller stretches of DNA had produced similar estimates, and the protein-level comparisons dating back to King and Wilson's pioneering work had long suggested a figure in this range.1, 4 What the genome project added was comprehensiveness: rather than extrapolating from a handful of genes, researchers could now examine the full breadth of the two genomes side by side.2
What the number means and does not mean
The "98.7% identical" figure is both accurate and incomplete, and understanding what it represents is essential to interpreting it correctly. The number refers specifically to single-nucleotide identity in aligned regions—that is, stretches of DNA where the human and chimpanzee sequences can be matched up base by base. In these regions, for every 100 nucleotides, roughly 99 are the same and roughly 1 differs.2
This figure does not, however, capture the full extent of genomic differences between the species. In addition to single-nucleotide substitutions, the two genomes differ through insertions and deletions (indels)—stretches of DNA that are present in one genome but absent from the other. These range from single bases to segments tens of thousands of bases long and collectively affect approximately 3% of the genome.2, 5 In 2002, Roy Britten of the California Institute of Technology published an analysis in PNAS showing that when indels are counted alongside substitutions, the total divergence between human and chimpanzee DNA rises to approximately 5% in his sample of 779 kilobases.5 The Chimpanzee Sequencing Consortium's own analysis confirmed that indels account for roughly 3% additional divergence on top of the 1.23% substitution rate, bringing the total to approximately 4%.2
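The distinction between the two measures can be made concrete with a toy calculation. The sketch below is illustrative only: the two short sequences are invented, the function name `divergence` is hypothetical, and counting each gapped column as one difference is just one possible convention (published analyses differ in how they weight indels).

```python
def divergence(seq_a: str, seq_b: str) -> dict:
    """Compare two pre-aligned sequences of equal length ('-' marks a gap)."""
    if len(seq_a) != len(seq_b):
        raise ValueError("aligned sequences must have equal length")
    matches = substitutions = gap_columns = 0
    for a, b in zip(seq_a, seq_b):
        if a == "-" or b == "-":
            gap_columns += 1      # base present in one sequence only (indel)
        elif a == b:
            matches += 1          # identical base in both sequences
        else:
            substitutions += 1    # one base replaced by another
    aligned = matches + substitutions   # columns with a base in both sequences
    total = aligned + gap_columns       # all alignment columns
    return {
        # Analogous to the 1.23% figure: substitutions among aligned bases only
        "substitution_divergence": substitutions / aligned,
        # Analogous to the ~4-5% figures: gapped columns also count as differences
        "divergence_with_indels": (substitutions + gap_columns) / total,
    }

# Invented toy alignment (assumption, not real genomic data)
human = "ACGTACGTAC--GTACGTACGT"
chimp = "ACGTACTTACGGGTAC-TACGT"
result = divergence(human, chimp)
print(result)
```

On this toy alignment the substitution-only figure is about 5%, while the gap-inclusive figure is about 18%. The same alignment yields different numbers depending on what is counted, which is the relationship between the 1.23% figure and the roughly 4–5% figures.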
Furthermore, some regions of the genome simply cannot be aligned between the two species at all. These include areas where large-scale duplications, deletions, or rearrangements have occurred in one lineage but not the other. Approximately 600 megabases of human sequence could not be aligned to the chimpanzee genome, and a similar amount of chimpanzee sequence could not be aligned to the human genome.2 These non-alignable regions are excluded from the 98.77% identity calculation entirely.
None of this undermines the significance of the 98.7% figure. It simply means that the number describes one specific and well-defined measurement—single-nucleotide identity in aligned sequences—rather than a vague claim about overall similarity. The figure is robust, reproducible, and not in scientific dispute.2, 3
Figure: Sources of genomic divergence between humans and chimpanzees.2, 5
Comparative genomics across primates
The human-chimpanzee comparison gains additional meaning when placed in the context of other primate genomes. In 2012, Aylwyn Scally, Richard Durbin, and an international consortium published the genome sequence of the western lowland gorilla (Gorilla gorilla gorilla) in Nature. Their analysis confirmed that humans and chimpanzees are more closely related to each other than either is to gorillas.6 The average sequence divergence between humans and gorillas is approximately 1.62%, compared with 1.23% between humans and chimpanzees and 1.63% between chimpanzees and gorillas.6 This pattern—with humans and chimpanzees forming a clade to the exclusion of gorillas—is supported by the majority of the genome, though approximately 15% of the human genome is actually closer to the gorilla genome than to the chimpanzee genome, a phenomenon explained by incomplete lineage sorting.6
The broader primate phylogeny tells a consistent story. The orangutan genome, published in 2011, shows approximately 3.1% sequence divergence from the human genome.7 Rhesus macaque divergence from humans is approximately 6.5%.8 These increasing levels of divergence correspond precisely to the branching order predicted by the fossil record and by earlier molecular studies: orangutans diverged from the African ape lineage approximately 12–16 million years ago, gorillas split off approximately 9–10 million years ago, and humans and chimpanzees diverged most recently, approximately 6–7 million years ago.6, 7, 9
Sequence divergence among the great apes and rhesus macaque2, 6, 7, 8
| Species pair | Nucleotide divergence | Estimated divergence time |
|---|---|---|
| Human – Chimpanzee | ~1.23% | ~6–7 Ma |
| Human – Gorilla | ~1.62% | ~9–10 Ma |
| Chimpanzee – Gorilla | ~1.63% | ~9–10 Ma |
| Human – Orangutan | ~3.1% | ~12–16 Ma |
| Human – Rhesus macaque | ~6.5% | ~25–30 Ma |
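One way to read the table is to ask whether divergence grows roughly in step with time. The sketch below divides each divergence by twice the midpoint of its time range, since differences accumulate along both lineages. Using the midpoint is an assumption made here for illustration, the function name `per_lineage_rate` is hypothetical, and the dates in the table are not fully independent of molecular data, so this is a consistency check rather than a test.

```python
# (species pair, nucleotide divergence in percent, (min, max) divergence time in Ma),
# copied from the table above
TABLE = [
    ("Human-Chimpanzee", 1.23, (6, 7)),
    ("Human-Gorilla", 1.62, (9, 10)),
    ("Chimpanzee-Gorilla", 1.63, (9, 10)),
    ("Human-Orangutan", 3.1, (12, 16)),
    ("Human-Rhesus macaque", 6.5, (25, 30)),
]

def per_lineage_rate(divergence_pct, time_range_ma):
    """Percent divergence per lineage per million years, using the range midpoint."""
    midpoint = sum(time_range_ma) / 2          # assumption: midpoint of the quoted range
    return divergence_pct / (2 * midpoint)     # factor 2: both lineages accumulate changes

rates = {pair: per_lineage_rate(d, t) for pair, d, t in TABLE}
for pair, rate in rates.items():
    print(f"{pair:22s} {rate:.3f} % per lineage per Myr")
```

With these midpoints, every pair falls between about 0.085% and 0.12% per lineage per million years, so the figures in the table are at least roughly compatible with a steadily ticking clock.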
Genes that differ: FOXP2, HAR1, and beyond
While 98.7% identity is the dominant pattern, the 1.23% that differs includes some of the most biologically consequential changes in human evolution. Several genes and genomic regions have been identified where natural selection appears to have driven rapid change on the human lineage, producing functional differences that may underlie distinctively human traits.
Perhaps the most celebrated example is FOXP2, a transcription factor gene involved in the neural circuits underlying speech and language. In 2002, Wolfgang Enard and colleagues at the Max Planck Institute for Evolutionary Anthropology showed that the FOXP2 protein differs at just two amino acid positions between humans and chimpanzees, but that these changes show the molecular signature of positive selection—they spread through the human population far more rapidly than would be expected under neutral evolution.10 Mutations in FOXP2 cause severe speech and language disorders in humans, suggesting that the human-specific amino acid changes may have been important in the evolution of our capacity for spoken language.10
An even more striking example comes from the human accelerated regions (HARs)—segments of the genome that were highly conserved across mammals for hundreds of millions of years but then changed rapidly on the human lineage after the split from chimpanzees. In 2006, Katherine Pollard and colleagues identified 49 such regions, the most dramatic of which, HAR1, is a 118-base-pair sequence that accumulated 18 human-specific substitutions despite having changed at only two positions between chickens and chimpanzees over approximately 300 million years of evolution.11 HAR1 is part of a novel RNA gene expressed in Cajal-Retzius neurons during a critical period of cortical development (weeks 7 to 19 of gestation), suggesting a role in the formation of the characteristically expanded human neocortex.11
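A back-of-envelope calculation gives a sense of how unusual 18 substitutions in 118 base pairs is. The sketch below is not the statistical method of the original study. It assumes, purely for illustration, that substitutions follow a Poisson distribution, that the genome-wide 1.23% divergence applies to this region, and that half of it arose on the human lineage. Because HAR1 was highly conserved before the human lineage, the true expectation would be lower still.

```python
import math

HAR1_LENGTH_BP = 118            # from the text
OBSERVED_SUBSTITUTIONS = 18     # human-specific substitutions, from the text
GENOME_WIDE_DIVERGENCE = 0.0123 # human-chimpanzee substitution divergence
HUMAN_LINEAGE_SHARE = 0.5       # assumption: half the divergence is on the human branch

# Expected human-lineage substitutions if HAR1 evolved at the genome-wide rate
expected = HAR1_LENGTH_BP * GENOME_WIDE_DIVERGENCE * HUMAN_LINEAGE_SHARE

def poisson_tail(k_min, mean, terms=200):
    """P(X >= k_min) for X ~ Poisson(mean), summed term by term.

    Summing the upper tail directly avoids the precision loss of 1 - CDF
    when the probability is extremely small.
    """
    total = 0.0
    for k in range(k_min, k_min + terms):
        total += math.exp(-mean + k * math.log(mean) - math.lgamma(k + 1))
    return total

p_value = poisson_tail(OBSERVED_SUBSTITUTIONS, expected)
print(f"expected substitutions: {expected:.2f}")
print(f"P(at least {OBSERVED_SUBSTITUTIONS}): {p_value:.2e}")
```

Under these assumptions fewer than one substitution would be expected, and the chance of seeing 18 or more is vanishingly small, which is why regions like HAR1 are described as accelerated.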
Other genes showing accelerated evolution on the human lineage include ASPM and microcephalin (MCPH1), both of which regulate brain size and show evidence of positive selection throughout hominin evolution.12, 13 The Chimpanzee Sequencing Consortium identified 585 genes with evidence of accelerated evolution on the human lineage, many of them involved in sensory perception, immune defense, and reproduction.2
Gene regulation versus coding differences
One of the most important insights from the human-chimpanzee genomic comparison is that much of what distinguishes the two species may lie not in the protein-coding sequences themselves but in the regulatory machinery that controls when, where, and how much each gene is expressed. This idea was first articulated by King and Wilson in their 1975 paper, where they proposed that "the organismal differences between chimpanzees and humans are caused more by changes in gene regulation than by changes in structural genes."1 Forty years of subsequent research have largely vindicated this hypothesis.
In 2004 and 2005, Philipp Khaitovich and colleagues at the Max Planck Institute published comprehensive studies of gene expression across multiple tissues in humans and chimpanzees. They found that approximately 10.6% of reliably detected genes showed differential expression between the species, with the brain showing the fewest differences and the liver the most.14 Intriguingly, although brain genes have changed less overall, genes expressed in the brain have accumulated more expression changes on the human lineage than on the chimpanzee lineage, suggesting that the human brain has been subject to particularly intense regulatory evolution.14
The human accelerated regions provide a concrete mechanism for regulatory change. Most HARs do not code for proteins; instead, they appear to function as enhancers—regulatory sequences that control the expression of nearby genes. Pollard and colleagues found that many HARs lie near genes involved in brain development and transcriptional regulation, suggesting that the rapid evolution of these non-coding sequences may have altered the developmental programs that produce the human brain without changing the proteins themselves.11, 15
This regulatory perspective helps explain a puzzle that has long fascinated biologists: how can two species that share 98.7% of their DNA be so different in anatomy, behavior, and cognition? The answer, increasingly, is that small changes in when and where genes are turned on and off can have cascading effects on development, producing large phenotypic differences from modest genetic ones. A gene expressed for an extra week during fetal brain development, or at twice the normal level in a particular cell type, can produce a substantially different organ even if the protein it encodes is identical in both species.1, 14
The molecular clock and divergence time
The degree of genetic divergence between humans and chimpanzees can be used to estimate when the two lineages last shared a common ancestor, a method known as the molecular clock. The logic is straightforward: if mutations accumulate at a roughly constant rate over time, the number of differences between two species is proportional to the time since they diverged.9
Early molecular clock estimates, calibrated using the fossil record of other primate divergences, placed the human-chimpanzee split at approximately 5–7 million years ago.16 More recent analyses have tended to push this date somewhat older. In 2012, Kevin Langergraber and colleagues used genetic parentage data from wild chimpanzees and mountain gorillas to directly measure average generation times in these species, finding them substantially longer than previously assumed—more than 24 years for chimpanzees and more than 19 years for gorillas. Because longer generation times mean fewer opportunities for mutations to occur per unit of calendar time, these findings imply a slower mutation rate and therefore an older divergence date. Langergraber's analysis placed the human-chimpanzee split at 7–8 million years ago, and possibly as early as 13 million years ago under some assumptions.17
The Chimpanzee Sequencing Consortium's own analysis estimated the human-chimpanzee divergence at approximately 6.3 million years ago based on the observed substitution rate, while noting that this estimate is sensitive to assumptions about generation time and mutation rate.2 The gorilla genome project placed the human-chimpanzee speciation event at approximately 6 million years ago and the human-chimpanzee-gorilla speciation event at approximately 10 million years ago.6 A consensus range of approximately 6–7 million years for the human-chimpanzee split is broadly accepted, though the precise date remains an active area of research.9, 17
Crucially, this molecular estimate is consistent with the fossil record. The oldest known potential hominins—Sahelanthropus tchadensis (approximately 7 million years ago) and Orrorin tugenensis (approximately 6 million years ago)—fall precisely within the window predicted by the molecular clock, providing independent corroboration from two entirely different lines of evidence.18, 19
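The molecular clock logic in this section reduces to a short formula: divergence equals twice the per-year substitution rate multiplied by the time since the split, because both lineages accumulate changes. The sketch below is a simplified illustration, not the method of any study cited here. The function name is hypothetical, and the per-generation rate and the two generation times are illustrative assumptions chosen to show the direction of the effect. It also ignores genetic variation already present in the ancestral population, which makes sequence divergence somewhat older than the species split itself.

```python
def divergence_time_years(divergence, rate_per_site_per_generation, generation_years):
    """Solve divergence = 2 * rate_per_year * time for time."""
    rate_per_year = rate_per_site_per_generation / generation_years
    return divergence / (2 * rate_per_year)

DIVERGENCE = 0.0123          # 1.23% substitution divergence, from the text
RATE_PER_GENERATION = 1.5e-8 # assumption: illustrative per-site, per-generation rate

# Same per-generation rate, two assumed generation times (years)
short_generations = divergence_time_years(DIVERGENCE, RATE_PER_GENERATION, 15)
long_generations = divergence_time_years(DIVERGENCE, RATE_PER_GENERATION, 25)

print(f"15-year generations: {short_generations / 1e6:.2f} million years")
print(f"25-year generations: {long_generations / 1e6:.2f} million years")
```

With the per-generation rate held fixed, lengthening the generation time from 15 to 25 years moves the estimate from about 6 million to about 10 million years, which illustrates why longer measured generation times push divergence dates older.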
Addressing common objections
Critics of evolutionary biology, particularly within the young-earth creationist movement, have raised several objections to the significance of human-chimpanzee DNA similarity. These objections deserve careful examination, because some of them touch on genuine complexities in the data, even if the conclusions drawn from them are unwarranted.
The most common objection is that the true figure is not 98.7% but something lower: 95%, or 85%, or even 70%. These alternative figures typically arise from including indels in the calculation, from counting non-alignable regions as differences, or from methodological choices that inflate the apparent divergence.5 As discussed above, when indels are included alongside substitutions, total divergence does rise to approximately 4–5%.2, 5 This is a legitimate measurement, but it does not invalidate the 98.77% figure; the two numbers simply describe different things. The substitution divergence of 1.23% measures base-by-base identity in alignable regions; the 4–5% figure includes additional sources of variation. Both are real, both are well documented, and both confirm that humans and chimpanzees are extraordinarily similar at the DNA level by any comparative standard. For context, two individual humans typically differ at approximately 0.1% of nucleotide positions, so the human-chimpanzee difference is roughly twelve times greater than the variation within our own species.2
A second objection is that "similarity does not prove common ancestry"—that a shared designer could have used similar DNA sequences in different organisms just as an engineer might reuse components across different products. This argument confuses overall similarity with the specific pattern of similarities and differences. If two genomes were independently designed, one would expect the differences to be distributed in whatever pattern best serves each organism's functional needs. Instead, the differences between human and chimpanzee genomes follow a precise nested hierarchical pattern that matches the branching order of the primate phylogeny: humans are more similar to chimpanzees than to gorillas, more similar to gorillas than to orangutans, and more similar to orangutans than to macaques.2, 6, 7 This nested pattern is exactly what common descent predicts and would be a remarkable coincidence under any other hypothesis.
Moreover, the two genomes share not only functional sequences but also pseudogenes (broken, nonfunctional remnants of former genes), endogenous retroviruses (viral DNA inserted into the genome millions of years ago), and other "molecular fossils" in the same locations. These shared imperfections have no functional explanation but are readily explained by inheritance from a common ancestor in whose genome the original insertion or mutation occurred.2, 20
A third objection holds that even a 1.23% difference represents approximately 35 million nucleotide changes, which critics argue is far too many to have accumulated in 6–7 million years. This objection reflects a misunderstanding of mutation rates. The human germline mutation rate is approximately 1–1.5 × 10⁻⁸ per base pair per generation.21 With a genome of 3 billion base pairs, this yields approximately 30–45 new mutations per haploid genome copy per generation. With a generation time of roughly 25 years, 6–7 million years corresponds to roughly 240,000–280,000 generations. Because neutral mutations become fixed at about the rate at which they arise, and because mutations accumulate on the human and chimpanzee lineages independently, this simple calculation predicts a combined total of roughly 14–25 million differences, the same order of magnitude as the approximately 35 million observed. Factors the calculation ignores, such as shorter generation times in earlier ancestors and genetic variation already present in the ancestral population, push the expected total higher. The observed divergence is consistent with known mutation rates and known timescales.2, 21
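The arithmetic behind this answer can be checked directly. The sketch below uses the mutation rate range, generation time, genome size, and time span quoted above. It assumes that neutral mutations become fixed at about the rate at which they arise and that one haploid genome copy is tracked per lineage. The function name is hypothetical, and the result is an order-of-magnitude estimate, not a model of real population history.

```python
GENOME_SIZE_BP = 3_000_000_000          # haploid genome size, from the text
GENERATION_YEARS = 25                   # from the text
MUTATION_RATES = (1.0e-8, 1.5e-8)       # per base pair per generation, from the text
SPLIT_TIMES_YEARS = (6_000_000, 7_000_000)

def expected_differences(rate, split_years):
    """Expected single-nucleotide differences between the two lineages."""
    generations = split_years / GENERATION_YEARS
    per_generation = rate * GENOME_SIZE_BP      # new mutations per genome copy
    per_lineage = per_generation * generations  # accumulated along one lineage
    return 2 * per_lineage                      # both lineages diverge independently

low = expected_differences(MUTATION_RATES[0], SPLIT_TIMES_YEARS[0])
high = expected_differences(MUTATION_RATES[1], SPLIT_TIMES_YEARS[1])
print(f"expected differences: {low / 1e6:.1f} to {high / 1e6:.1f} million")
```

The result is in the tens of millions, the same order of magnitude as the roughly 35 million observed substitutions, so the objection that there are far too many differences does not survive the calculation.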
Why it matters
The 98.7% figure is not merely a curiosity or a debating point. It has concrete scientific implications that extend across medicine, evolutionary biology, and our understanding of what makes humans human. In medicine, the close genetic similarity between humans and chimpanzees explains why chimpanzees have historically been used as model organisms for studying human diseases, and it informs ongoing research into why the two species differ in their susceptibility to conditions such as HIV/AIDS, Alzheimer's disease, and certain cancers.2
In evolutionary biology, the detailed comparison of human and chimpanzee genomes has provided an unprecedented catalog of the genetic changes associated with the evolution of bipedalism, large brains, language, and other distinctively human traits. By identifying which genes changed, which regulatory elements were altered, and which regions of the genome were subject to natural selection, researchers can move from simply documenting that humans evolved from ape-like ancestors to understanding the molecular mechanisms by which that evolution occurred.2, 10, 11
Perhaps most fundamentally, the human-chimpanzee genome comparison demonstrates that evolution operates not by wholesale rewriting of genetic blueprints but by tinkering—modifying existing components, adjusting regulatory controls, and repurposing ancient genetic machinery for new functions. The overwhelming majority of our genome is inherited, nearly unchanged, from the ancestor we share with chimpanzees. The small fraction that differs has been sufficient to produce all the anatomical, physiological, and cognitive differences between the two species—a testament to the power of small genetic changes accumulated over millions of years of natural selection.1, 2
References
Divergence between samples of chimpanzee and human DNA sequences is 5%, counting indels
Microcephalin, a gene regulating brain size, continues to evolve adaptively in humans
Accelerated evolution of the ASPM gene controlling brain size begins prior to human brain expansion
Generation times in wild chimpanzees and gorillas suggest earlier divergence times in great ape and human evolution
Differences between human and chimpanzee genomes and their implications in gene expression, protein functions and biochemical properties of the two species