COVID-19 Origins

Bayesian review of data, publications, and narrative

The COVID-19 pandemic did not begin as a sudden outbreak from a seafood market in December 2019. Instead, it emerged from the daily life of millions of commuters, nearly a million of whom traveled across town and to the countryside on public transit, from a global sporting event, and from a series of operational failures that started months before anyone realized what was happening.

Figure: Wuhan Metro overlaid with December 15-31 cases

Watching Contagion, the 2011 pandemic thriller, feels prescient and visionary. That is, until you read about the 2002 SARS outbreak origin and timeline, and spot the adaptation of real-world events into a fictional narrative. By 2019, a previous outbreak, mostly remembered through fictionalized tales, lived in the shadows of a global zeitgeist, easing the collective acceptance of history repeating.

Most debates between the zoonotic spillover and lab leak hypotheses focus on the Huanan market. Lazily or efficiently, depending on your viewpoint, they map existing narratives onto this location rather than starting from an assumption-free baseline and examining the locations and timelines of the early cases.

Below, December 2019 cases are overlaid on the greater Wuhan area and its suburbs. Pink dots mark cases connected to the Huanan market, and green dots mark independent cases.

Figure: December Wuhan Region Cases with Population Density

In 2025, Andrew Levin, a former Federal Reserve Board economist and professor of Economics at Dartmouth, published a comprehensive spatiotemporal Bayesian analysis reviewing four layers: China, Wuhan, the Market, and December cases.1 He had presented an earlier draft of the work, with slides, in 2024.23 It is the best-known paper reviewing the early data, and its discussion recognizes the data limitations and offers alternatives for locations and intermediate hosts.


  December 2019 and January 2020 Cases

The initial paper, "Patients infected with 2019 novel coronavirus in Wuhan, China," has been cited over 37k times and is the primary reference for the Huanan Seafood Market as the origin of zoonotic spillover, yet it contains data that undercut two common narratives.4

Figure: NEJM sourced Huanan/Non-Huanan Cases

The first is that the Huanan market was the epicenter of an explosive outbreak. It is true that over 60% of the 41 patients in the first study had exposure to the market, but the first market-linked case does not appear until December 10, 2019, and five days pass before the onset of the next two. Most days in December have 1-2 cases, with two outlier days of 5 cases each. In the subsequent paper, which extends the published research to the initial 425 confirmed cases, Huanan represents a minority of the outbreak at roughly 9% of the first cases.5

The seafood market has 10k daily visitors, making it an easy hub for spreading between people, but hardly an explosion considering the traffic volume and early infectiousness, and clearly not the only superspreader site.

Looking at the first three market-linked cases, one patient lived several blocks south of the market, and two lived in different neighborhoods along Wuhan Metro Line 1.16 In the subsequent days, the cluster moves to the Huanan neighborhood, with market-connected cases expanding along other metro lines and several suburban merchants testing positive.

More importantly, the data highlights something else.

Figure: Lancet December 1–January 1 Huanan Cases

The second key data point in the report is that the first three cases between December 1 and 10 did not have market exposure. The second extended report misses some of the early cases, but also confirms that the earliest cases did not have market exposure, and potentially adds 1-2 more non-market cases. This data works against the narrative of zoonotic spillover starting at the Huanan market and hints at a timeline earlier than December.

    Shifting Timelines

Without assuming any malice, the officially reported start of the outbreak slowly moved earlier over time. What began as a mid-December onset connected to the Huanan market was later moved by the Chinese government to the beginning of December. A South China Morning Post report then revealed that internal government data traced a case back to November 17, 2019.7 Researchers, departing from the official data, pushed the date back further, to September through November, based on molecular data or on reviews of early cases in France and Italy.89 Later research found that the virus was already spreading throughout Paris in December 2019.10 The patient in this first documented French case had no connection to China and had not recently traveled abroad.

The 2024 final report of the Select Subcommittee on the Coronavirus Pandemic, part of the U.S. House of Representatives Committee on Oversight and Accountability, found that Chinese scientists had presented similar timeline data to the World Health Organization (WHO).11 The scientists presented an analysis of 76,000 medical records from October, November, and early December 2019, revealing 92 patients who displayed symptoms consistent with COVID-19 during this critical early period. However, the Chinese scientists claimed that none of these patients tested positive for COVID-19 antibodies according to the medical records provided, and they refused to give the WHO access to the raw data for independent analysis.

Figure: Mid-December Cases Mapped to Wuhan City

A close reading of the WHO's April 2021 origins report shows that it notes an analysis of molecular sequence data pointing to emergence as early as late September.12 It also acknowledges reviewing data from published studies worldwide and finding evidence of circulation that preceded the initial December case detection by several weeks. Unfortunately, it does not cite those studies, so we are left with the aforementioned Paris case.

Reviewing the data from the two studies published in January 2020, we see that all testable data was destroyed for those patients as well. Despite the lack of transparency, it does create an additional datapoint, albeit weak, pointing to earlier contagion. Unfortunately, as of the end of 2025, neither the CDC nor WHO has updated their publicly reported timelines to address consensus research showing an earlier outbreak timeline.1314

  Analyzing the Market

Figure: Huanan Market stalls patients, swabs, and sample timeline

To better understand the market-origin narrative, the most frequently linked and cited study is the "Surveillance of SARS-CoV-2 at the Huanan Seafood Market" paper from 2023, which surveys 10 prior papers on the market and natural origins.15

Figure: Huanan Market Swabs and Cases, mapped against live animal zones

The paper analyzes zoonotic origins, the market, stall-level virus testing, and sample genomics. Again, its data undercut the market-origin narrative on several points: no bats or pangolins are sold at the market, no bat or pangolin samples were tested, and all 457 animal samples, from 18 species, tested negative for SARS-CoV-2. Unfortunately, the majority of animal sampling did not occur until late January 2020, leaving enough temporal space for doubt. It is important to note that this result falls short of the market testing in the SARS-CoV-1 outbreak, which did find positive animal samples months after the initial outbreak. Additionally, the locations where live virus was found map almost completely to known human cases, at stalls where vendors sold shrimp and seafood products, unlinked to zoonotic transmission.1

Figure: Accidental vs. Zoonotic spillover hypotheses mapped to Huanan Market regions over time

The highest viral presence, visualized on the diagram above, was in the West Zone (streets 1-8) with 87.5% of positive environmental samples, primarily at wildlife and seafood stalls, though live virus was only isolated from three environmental samples and not from any animal products. Of 923 environmental samples (floors, walls, tools, sewers), 74 tested positive, reflecting viral RNA persistence on surfaces from earlier human contamination rather than direct animal infection. Complicating common myths further, the market does not sell bats or pangolins and rarely sells civets or raccoon dogs, with vendors providing them on demand. No civets were sold in October, November, or December, and records for Wuhan show fewer than one raccoon dog sold per day in fall 2019.1

The study's authors emphasized that environmental positives cannot prove animal infection and noted that sampling occurred after human cases were already linked to the market, making it impossible to rule out human-to-animal transmission or simple surface contamination from infected visitors.


  The SARS-CoV-2 Origins Data Set

The following is the key evidence to review when considering each hypothesis, along with how that evidence weighs on its probability.

    Bayesian hypothesis comparison: Accidental Lab Leak (A) vs Zoonotic Origin (Z)

To quantify the relative probability of laboratory versus zoonotic spillover origin, we can apply Bayesian inference to systematically weight each piece of evidence. This approach moves beyond intuitive assessments to provide a mathematical framework for evaluating competing hypotheses based on how well each explains the observed data.

Bayesian analysis requires three components: prior probabilities (our initial beliefs before examining evidence), likelihood ratios (how much more likely each piece of evidence is under one hypothesis versus another), and the posterior probability (our updated belief after considering all evidence). The strength of this approach lies in its ability to combine multiple independent lines of evidence while accounting for their relative diagnostic power.

The complete Bayesian calculation incorporating all evidence clusters can be expressed as:

$$
P(Lab|Evidence) = \frac{P(Evidence|Lab) \times P(Lab)}{P(Evidence)}
$$

Where

$$
P(Evidence) = P(Evidence|Lab) \times P(Lab) + P(Evidence|Zoonotic) \times P(Zoonotic)
$$
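
The step from Bayes' theorem to a product of likelihood ratios is easier to follow in odds form. The following is a sketch rather than part of the cited analysis: it assumes the evidence items, written here as $E_1, \dots, E_n$ (notation introduced only for this illustration), are conditionally independent given each hypothesis. Dividing the posterior for Lab by the posterior for Zoonotic cancels the shared $P(Evidence)$ term:

```latex
\begin{aligned}
\frac{P(Lab \mid Evidence)}{P(Zoonotic \mid Evidence)}
  &= \frac{P(Evidence \mid Lab)}{P(Evidence \mid Zoonotic)}
     \times \frac{P(Lab)}{P(Zoonotic)} \\[4pt]
\frac{P(E_1, \dots, E_n \mid Lab)}{P(E_1, \dots, E_n \mid Zoonotic)}
  &= \prod_{i=1}^{n} \frac{P(E_i \mid Lab)}{P(E_i \mid Zoonotic)}
   = \prod_{i=1}^{n} LR_i
   \quad \text{(assuming conditional independence)}
\end{aligned}
```

If the evidence items are correlated, the product overstates or understates the combined ratio, so the independence assumption matters as much as the individual values.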

The likelihood ratio for each evidence cluster multiplies together:

$$
LR_{total} = LR_{geographic} \times LR_{biological} \times LR_{timeline} \times LR_{clinical} \times LR_{epidemiological}
$$

| Evidence Category | Likelihood Ratio Components |
| --- | --- |
| LR_geographic | Wuhan Hub × Military Games × Weather Patterns × Metro Network |
| LR_biological | DEFUSE/FCS × Missing Host × Genomic Signatures × BANAL-52 × Lineages A/B |
| LR_timeline | Hospital Surge × Database Removal × MACE Blackout × Cryptic Spread |
| LR_clinical | Researcher Illness × PLA Hospital Files |
| LR_epidemiological | Market Cluster × Facility Proximity × CDC Relocation |

Using conservative estimates, this yields a posterior probability range of 63-99.99% for laboratory origin, with the DEFUSE proposal, MACE blackout, and researcher illness creating the strongest likelihood ratios that cannot be overcome by natural spillover explanations.
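
To make the arithmetic behind a posterior like this concrete, here is a minimal Python sketch of the odds-form update. It is not the calculation used in this article or in the cited paper: the function names, the 50/50 prior, and every likelihood ratio value below are hypothetical placeholders chosen only to show the mechanics, and the multiplication assumes the evidence clusters are independent.

```python
import math
from typing import Iterable


def posterior_from_lrs(prior: float, likelihood_ratios: Iterable[float]) -> float:
    """Return P(H | evidence) from a prior P(H) and per-cluster likelihood ratios.

    Works in log-odds so that many ratios can be combined without overflow.
    Assumes the ratios are independent, which is an assumption of this sketch.
    """
    if not 0.0 < prior < 1.0:
        raise ValueError("prior must be strictly between 0 and 1")
    log_odds = math.log(prior / (1.0 - prior))
    for lr in likelihood_ratios:
        if lr <= 0.0:
            raise ValueError("likelihood ratios must be positive")
        log_odds += math.log(lr)
    # Convert log-odds back to a probability (logistic function).
    return 1.0 / (1.0 + math.exp(-log_odds))


def implied_total_lr(prior: float, posterior: float) -> float:
    """Return the combined likelihood ratio that moves `prior` to `posterior`."""
    for p in (prior, posterior):
        if not 0.0 < p < 1.0:
            raise ValueError("probabilities must be strictly between 0 and 1")
    return (posterior / (1.0 - posterior)) / (prior / (1.0 - prior))


# HYPOTHETICAL values for illustration only; they are not estimates from any source.
ASSUMED_PRIOR = 0.5
HYPOTHETICAL_LRS = {
    "geographic": 2.0,
    "biological": 3.0,
    "timeline": 1.5,
    "clinical": 2.0,
    "epidemiological": 0.5,  # a ratio below 1 favors the competing hypothesis
}

if __name__ == "__main__":
    posterior = posterior_from_lrs(ASSUMED_PRIOR, HYPOTHETICAL_LRS.values())
    print(f"posterior with placeholder ratios: {posterior:.3f}")
    for target in (0.63, 0.9999):
        lr = implied_total_lr(ASSUMED_PRIOR, target)
        print(f"posterior {target} implies a combined LR of {lr:.2f}")
```

The placeholder ratios multiply to 9, which under the assumed 50/50 prior gives a posterior of 0.9. Run in the other direction, the same prior implies a combined likelihood ratio of about 1.7 for a 63% posterior and about 9,999 for a 99.99% posterior, which shows how sensitive the result is to the individual ratios and to the prior.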

    Geographic Context

Wuhan Hub vs. WIV. The prior probability of an outbreak occurring in Wuhan based on facility location versus natural risk factors. Wuhan is a transit hub (11M people), but houses the WIV, which holds the world's largest sarbecovirus collection.16 The likelihood of a natural bat-coronavirus pandemic starting specifically in Wuhan city limits (vs. Southern China/SE Asia) is low compared to the concentration of high-risk research there.

2019 Weather Patterns. The unusual temperature conditions during the cryptic spread period. September to October 2019 experienced unusually hot weather in Wuhan, with temperatures ranging from 30-35°C (86-95°F). This created an environment where people spent more time outdoors and increased ventilation usage, conditions that naturally lowered the virus's reproduction rate (R₀) through improved air circulation and UV exposure. The hot weather may have contributed to the cryptic nature of early transmission, keeping case numbers low and symptoms mild until colder temperatures in mid-October and November created more favorable conditions for viral spread in enclosed spaces.17

2019 Military World Games. The timing and scale of the international sporting event in Wuhan. The Military World Games occurred in October 2019 with over 9,000 athletes from 100+ countries. Multiple athletes reported COVID-like symptoms during and after the games, with some testing positive for antibodies months later.18 This event provided a potential superspreader mechanism and global distribution vector if the virus was already circulating cryptically in Wuhan by October 2019.

Metro Transit Network. The critical transportation infrastructure connecting WIV facilities to outbreak epicenters. Wuhan's metro system serves over 1 million daily commuters across multiple lines that create direct pathways from WIV areas to early outbreak locations. Metro Line 2 directly connects the WIV Zhengdian campus area to the Huanan Seafood Market via Hankou Railway Station, while also serving major hospitals (PLA Hospital, Wuhan Central Hospital) and, through adjacent lines, residential areas where WIV staff live.19 Lines 1 and 5, which also connect to WIV-adjacent areas, are the locations of the first known cases outside the market cluster, on the east and west sides of the river.1 This comprehensive transit network creates multiple transmission vectors from lab facilities to various outbreak epicenters, explaining the cryptic spread pattern and distributed clustering without requiring natural spillover.

    Biological & Genetic

DEFUSE Grant & FCS. The correlation between specific features proposed in the 2018 DEFUSE grant and the actual virus. WIV/EcoHealth proposed inserting a Furin Cleavage Site (PRRAR) into SARS-like viruses to enhance transmissibility. SARS-CoV-2 is the only virus in its clade to possess this exact feature, appearing one year later in the city where the proposal originated.20

Missing Intermediate Host. The absence of a confirmed animal host after extensive searching. After 5 years of searching, no intermediate animal host has been found. In contrast, the SARS-1 host (civet) was found in 4 months, and MERS (camel) within a year.2122 This supports the "humanized mouse" lab host theory.

Genomic Signatures. The physical structure of the viral genome. The genome lacks restriction enzyme "scars," which is consistent with natural evolution but also consistent with the "No See'm" seamless cloning techniques explicitly proposed in the DEFUSE grant.2324

BANAL-52 / Laos. Existence of natural viral backbones in the wild. Viruses like BANAL-52 found in Laos share a high similarity to the SARS-CoV-2 backbone, proving natural ingredients exist.25 However, WIV frequently sampled in this specific region, providing a mechanism for acquisition.

Lineage A & B Separation. The presence of two distinct early viral lineages. Early sequencing found two distinct lineages (A and B), suggesting two separate introduction events.26 This is common in market spillovers (multiple infected animals) but requires complex explanations (double leak or cryptic evolution) for a lab origin.

    Timeline & Digital

Hospital Traffic Surge. Retrospective analysis of hospital usage and internet search terms. Satellite imagery showed a statistically significant (2+ sigma) surge in hospital parking volume and Baidu searches for COVID-like symptoms in Sept 2019, 3 months prior to the official timeline, indicating cryptic spread.27

Database Removal / RaTG13. The scrubbing of WIV public data archives. On Sept 12, 2019 (during the surge window), WIV removed its database of 22,000 samples. They also renamed the sample BtCoV/4991 to RaTG13 to obscure its link to the 2012 Mojiang mine deaths, indicating an intent to hide origins.28

MACE Blackout. Commercial telemetry and satellite analysis of the WIV Zhengdian complex. Data indicates a total cessation of cell phone activity (zero pings) at the high-security facility from Oct 7–24, 2019, accompanied by roadblocks. This suggests a facility-wide quarantine or "stand-down" event.29

Cryptic Spread & Molecular Timeline Alignment. The timeline discrepancy between official narratives and molecular evidence. SCMP government data traced the first retrospective case to Nov 17, 2019, while TMRCA phylogenetics estimates emergence between mid-October and mid-November 2019.7 Retroactive testing in France and Italy identified SARS-CoV-2 RNA in Dec 2019 patients with no China travel history, indicating cryptic transmission before the market cluster.10 For natural spillover, this requires stuttering cryptic chains and asymptomatic carriers exporting virus before detection. For lab leak, this requires 6-8 weeks of cryptic spread that mimics diffuse community introduction rather than clustering around lab staff.

    Human & Clinical

Researcher Illnesses. American intelligence regarding specific WIV staff falling ill. Ben Hu, the lead researcher for the humanized mouse/DEFUSE experiments, along with two colleagues, allegedly required hospitalization in Nov 2019 with symptoms consistent with COVID-19/pneumonia. Hu later denied these claims, though U.S. intelligence agencies continue to stand by their information.30

PLA Hospital Files. Early medical records from military hospitals. Genomic files for patients were created on Dec 10, 2019, at the PLA Hospital. This facility is located on the eastern side of Wuhan (east bank of the Yangtze River) near the WIV academic/shuttle hub, suggesting spread in the residential center prior to the market explosion on the west bank.31

    Epidemiological

Huanan Market Cluster. A mid-December minor cluster representing less than 9% of December-January cases in Wuhan. The market cluster was not among the first confirmed cases but became a notable concentration point. The earliest market-associated cases were seafood sellers, with the very first stall case linked to a shrimp vendor rather than wildlife stalls. While initially considered strong evidence for natural spillover, no wildlife tested positive, and the market is on Metro Line 2, which directly connects to WIV campus one and services staff residential areas.15

Facility Proximity. Distance between the labs and the outbreak center. The high-risk Zhengdian lab is 20 miles from the market, but WIV operates shuttle services connecting to Campus I, which has direct Metro Line 2 access to the market area. The Wuhan CDC is only 350m from the market, though it held lower-risk collections. Notably, the earliest confirmed cases were concentrated on the eastern side of Wuhan near Metro Line 5, which connects to WIV-adjacent areas, rather than near the market itself—suggesting transmission patterns originating from lab facilities rather than the market cluster.32

CDC Relocation. Operational changes at the local CDC. In fall 2019, the Wuhan CDC moved to a facility located just 350m from the market. While a potential leak source, it lacks the specific chimeric research capabilities of the WIV.32


  References

  1. Levin, A. T. "Bayesian Assessment of COVID-19 Origins" NBER Working Paper No. 33428 (2025)

  2. Levin, A. T. "Presentation: Bayesian Assessment of COVID-19 Origins" Hoover Institution Workshop (2024)

  3. Levin, A. T. "Slides: Bayesian Assessment of COVID-19 Origins" Hoover Institution Presentation Slides (2024)

  4. Huang, C. et al. "Patients infected with 2019 novel coronavirus in Wuhan, China" The Lancet (2020)

  5. Li, Q. et al. "Early Transmission Dynamics in Wuhan" New England Journal of Medicine (2020)

  6. "Statement of the Wuhan Institute of Virology, Chinese Academy of Sciences" Wuhan Institute of Virology (2020)

  7. Ma, J. "China's first confirmed Covid-19 case traced back to November 17" South China Morning Post (2020)

  8. To, K. K-W. et al. "Lessons learned 1 year after SARS-CoV-2 emergence" Emerging Microbes & Infections (2021)

  9. Platto, S. et al. "COVID19: an announced pandemic" Cell Death & Disease (2020)

  10. Deslandes, A. et al. "SARS-CoV-2 was already spreading in France" International Journal of Antimicrobial Agents (2020)

  11. Select Subcommittee on the Coronavirus Pandemic. "Final Report: After Action Review of the COVID-19 Pandemic" U.S. House of Representatives (2024)

  12. WHO-convened Global Study of Origins of SARS-CoV-2. "Joint WHO-China Study: China Part" World Health Organization (2021)

  13. CDC Museum. "COVID-19 Pandemic Timeline" Centers for Disease Control and Prevention.

  14. WHO. "WHO's response to COVID-19" World Health Organization (2020)

  15. Liu, W. J. et al. "Surveillance of SARS-CoV-2 at the Huanan Seafood Market" Nature (2023)

  16. McCaul, M. "The Origins of COVID-19: An Investigation of the Wuhan Institute of Virology" House Foreign Affairs Committee (2021)

  17. Weather Spark. "Historical Weather during 2019 in Wuhan, China" Weather Spark.

  18. Rogin, J. "Congressional investigation will delve into the Wuhan Military Games" The Washington Post (2021)

  19. Quay, S. C. "WHO should probe Wuhan Metro Line 2 for spread of pandemic" The Sunday Guardian (2021)

  20. Lerner, S. et al. "Leaked Grant Proposal Details High-Risk Coronavirus Research" The Intercept (2021)

  21. Select Subcommittee on the Coronavirus Pandemic. "Final Report: COVID-19 Pandemic Review" U.S. House of Representatives (2024)

  22. Mallapaty, S. "The hunt for the origins of SARS-CoV-2 will look beyond the Wuhan market" Nature (2021)

  23. Bruttel, V. et al. "Endonuclease fingerprint indicates a synthetic origin of SARS-CoV-2" bioRxiv (2022)

  24. Deigin, Y. et al. "The genetic structure of SARS-CoV-2 does not rule out a laboratory origin" BioEssays (2021)

  25. Temmam, S. et al. "Bat coronaviruses related to SARS-CoV-2 and infectious for human cells" Nature (2022)

  26. Pekar, J. et al. "The molecular epidemiology of multiple zoonotic origins of SARS-CoV-2" Science (2022)

  27. Nsoesie, E. et al. "Analysis of hospital traffic and search engine data in Wuhan China indicates early disease activity in the Fall of 2019" Harvard DASH (2020)

  28. Bloom, J. "Recovery of Deleted Deep Sequencing Data Sheds Light on the Early Evolution of SARS-CoV-2" PLOS Biology (2021)

  29. Burr, R. et al. "An Analysis of the Origins of the COVID-19 Pandemic: Interim Report" Senate Committee on Health Education, Labor and Pensions (2022)

  30. Gordon, M. et al. "U.S.-Funded Scientist Among Three Chinese Researchers Who Fell Ill Amid Early Covid-19 Outbreak" Wall Street Journal (2023)

  31. U.S. House of Representatives. "Memo: Re: New Evidence Regarding COVID-19 Origins" Select Subcommittee on the Coronavirus Pandemic (2024)

  32. Xiao, B. "The possible origins of 2019-nCoV coronavirus" ResearchGate (2020)

Published on November 29, 2025, updated on December 1, 2025

COVID-19 Origins | Stephen M. Walker II