The Hidden Science of Wikipedia

What Millions of Citations Reveal About Knowledge


More Than Just Click-Editing

Have you ever wondered what happens when millions of volunteers collectively document human knowledge? Wikipedia isn't just the world's largest encyclopedia—it's becoming one of the most revealing maps of how scientific knowledge connects and evolves.

Every time an editor cites a research paper, they're not just verifying a fact; they're creating a digital thread between established science and public understanding. Now, researchers have begun tracing these threads to reveal a surprising picture of how scientific knowledge is consumed and connected in the digital age 1 .

When scientists examine these connections through a method called 'co-citation analysis,' they can visualize which fields of science are most visible to the public, which journals truly shape public understanding, and how open scientific knowledge really is.

A groundbreaking study published in PLOS ONE in 2020 pioneered this approach, starting from more than 1.4 million Wikipedia references to scientific papers 1 8 . What the researchers discovered reveals not just what we know, but how we know it together.

The Science of Connections

Understanding Co-citation and Knowledge Maps

What Exactly is Co-citation?

Imagine two scientific papers that are often mentioned together in the same Wikipedia articles. Every time this happens, it's like an invisible vote suggesting these papers are related. This phenomenon is called co-citation—the frequency with which two documents are cited together by other documents 5 .

Figure: Visualizing co-citation. Documents A and B are each cited by Documents C, D, and E, giving them a co-citation strength of 3.

Here's a simple way to visualize it: If Documents A and B are both cited by Documents C, D, and E, then A and B have a co-citation strength of three. The higher this number, the stronger their semantic relationship is considered to be 5 . It's like two people always being mentioned together in conversations—eventually, you assume they must be connected in some important way.
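The counting rule above fits in a few lines of Python. This is a minimal illustration, not the study's code: the input format (a dictionary mapping each citing document to the documents it cites) and the function name `cocitation_counts` are hypothetical.

```python
from collections import Counter
from itertools import combinations


def cocitation_counts(citations):
    """Count how often each pair of documents is cited together.

    citations: dict mapping a citing document ID to the IDs it cites
    (hypothetical input format). Returns a Counter keyed by sorted pairs.
    """
    counts = Counter()
    for cited in citations.values():
        # Deduplicate and sort so (A, B) and (B, A) share one key and a
        # document cited twice by the same source is not double counted.
        for a, b in combinations(sorted(set(cited)), 2):
            counts[(a, b)] += 1
    return counts


# The example from the text: C, D and E each cite both A and B.
# E also cites a third document F, which does not affect the A-B count.
citations = {
    "C": ["A", "B"],
    "D": ["A", "B"],
    "E": ["A", "B", "F"],
}
strength = cocitation_counts(citations)
print(strength[("A", "B")])  # 3
print(strength[("A", "F")])  # 1
```

Because each new citing document only adds to the existing counts, the same loop can simply be rerun as citations accumulate.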

Co-citation analysis provides a forward-looking assessment of document similarity, revealing how relationships between research evolve as new citations accumulate over time 5 . This makes it particularly powerful for mapping dynamic fields of knowledge.

Wikipedia as a Global Knowledge Mirror

Wikipedia represents an extraordinary experiment in collective intelligence and distributed epistemology 1 . With over 5.5 million entries in the English version alone (as of 2018) and a top-ten global website ranking, Wikipedia has become a primary gateway to knowledge for millions worldwide 1 .

What makes Wikipedia particularly valuable for research is its emphasis on verifiability and reliable sources. The encyclopedia explicitly prioritizes academic and peer-reviewed publications, creating an intentional bridge between scholarly research and public knowledge 1 . That bridge offers a unique opportunity to study how scientific information flows from specialized research communities to broader public understanding.

Mapping Knowledge: The Wikipedia Co-citation Experiment

The Research Study at a Glance

In 2020, a team of researchers set out to investigate how Wikipedia editors regard science through their references to scientific papers. Their study, "Science through Wikipedia: A novel representation of open knowledge through co-citation networks," established a novel methodology for analyzing the consumption of scientific literature through this open encyclopedia 1 8 .

The researchers adapted co-citation analysis to Wikipedia's context, generating Pathfinder networks (PFNET) that highlighted the most relevant scientific journals and categories, along with their interactions 1 . Additionally, they studied the obsolescence of references through the Price Index to understand how current Wikipedia's scientific references are 1 .
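The Price Index is conventionally the percentage of references that are at most five years older than the citing work, so a higher value means more recent references. The sketch below assumes that definition, one reference year per citing item, and inclusive counting of the five-year boundary (conventions differ on that point, and the study's exact settings are not given here). The function name `price_index` is hypothetical.

```python
def price_index(cited_years, reference_year, window=5):
    """Percentage of references at most `window` years older than reference_year.

    Assumes integer publication years and that no cited year is later than
    reference_year. Whether the boundary year counts is a convention; here it
    does (age <= window).
    """
    if not cited_years:
        raise ValueError("price_index needs at least one reference")
    recent = sum(1 for year in cited_years if reference_year - year <= window)
    return 100.0 * recent / len(cited_years)


# Toy data (made up): publication years of five cited papers,
# assessed from the year 2019. Ages are 0, 1, 4, 9 and 18 years.
years = [2019, 2018, 2015, 2010, 2001]
print(price_index(years, 2019))  # 60.0
```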

Study Highlights

  • Started from 1.4M+ scientific references (847,512 after cleaning)
  • Examined 193,802 Wikipedia articles
  • Covered 14,149 scientific journals
  • Applied co-citation network analysis
  • Published in PLOS ONE (2020)

Step-by-Step: How the Mapping Was Done

Data Collection

The team started with 1,433,457 references available from Altmetric.com that linked Wikipedia articles to scientific papers 1 .

Data Processing

Through rigorous pre-processing and linking with Elsevier's CiteScore Metrics, the sample was refined to 847,512 references from 193,802 Wikipedia articles citing 598,746 scientific articles across 14,149 journals 1 .

Network Construction

The researchers built co-citation networks at three different levels to obtain a holistic view: journal co-citation maps, main field co-citation maps, and field co-citation maps 1 .
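Moving from paper-level to journal- or field-level maps amounts to relabelling each cited paper before counting pairs. The sketch below is one plausible way to do this, not necessarily the study's procedure: it assumes each Wikipedia article contributes at most one co-citation to any pair of labels, and it lets a paper carry several labels because a journal can belong to more than one subject category. The function name and the toy IDs are hypothetical.

```python
from collections import Counter
from itertools import combinations


def aggregated_cocitation(article_refs, labels_of):
    """Co-citation counts between labels such as journals or fields.

    article_refs: Wikipedia article -> list of cited paper IDs
    labels_of:    paper ID -> set of labels (a journal, or one or more fields)
    Papers without a label (e.g. not matched to a journal) are skipped, and
    each article adds at most 1 to any label pair.
    """
    counts = Counter()
    for papers in article_refs.values():
        labels = set()
        for paper in papers:
            labels |= labels_of.get(paper, set())
        for a, b in combinations(sorted(labels), 2):
            counts[(a, b)] += 1
    return counts


# Toy data (made up): three articles citing five papers in three journals.
article_refs = {
    "Article 1": ["p1", "p2", "p3"],
    "Article 2": ["p1", "p4"],
    "Article 3": ["p5", "p2"],
}
journal_of = {
    "p1": {"J1"}, "p2": {"J2"}, "p3": {"J3"}, "p4": {"J2"}, "p5": {"J3"},
}
journal_net = aggregated_cocitation(article_refs, journal_of)
print(journal_net[("J1", "J2")])  # 2
```

Passing a paper-to-field mapping (for example, one derived from each journal's subject categories) instead of `journal_of` produces a field-level map with the same function.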

Analysis Techniques

Using these networks, the team could identify which journals and scientific fields were most central to Wikipedia's representation of science, how these fields interconnected, and what patterns emerged from this massive dataset of scientific consumption 1 .

What Wikipedia's Citations Reveal About Science Consumption

The Dominance of Medicine and Biology

The analysis revealed a significant concentration of Wikipedia's scientific references in specific fields. Medicine, together with Biochemistry, Genetics and Molecular Biology (a single Scopus subject area), emerged as particularly dominant in Wikipedia's scientific coverage 1 .

This finding suggests that Wikipedia editors—and potentially readers—have a stronger interest in scientific fields with direct implications for human health and biology. The consumption of scientific literature through Wikipedia isn't evenly distributed across all disciplines but reflects particular public interests and concerns.

Figure: Scientific field representation on Wikipedia

The Multidisciplinary Journal Preference

When examining which journals were most frequently cited, the researchers found that the most important journals are multidisciplinary 1 . Journals with high impact factors were more likely to be cited, suggesting that Wikipedia editors prioritize authoritative and recognized sources 1 .

Field or Journal Type | Representation Level | Potential Reasons
Medicine | High | Direct health relevance, broad public interest
Biochemistry, Genetics & Molecular Biology | High | Rapid advancements, societal implications
Multidisciplinary journals | High | Authority, breadth of content, recognition
Humanities | Lower (5.49%) | Different citation patterns, fewer easily verifiable facts

The Open Access Gap

Despite Wikipedia's ethos of open knowledge, the study uncovered a surprising limitation: only 13.44% of Wikipedia citations are to Open Access journals 1 . This suggests significant barriers to truly open science communication, as much of the research cited remains behind paywalls, limiting access for readers who wish to explore primary sources.

Figure: Open Access vs. subscription content in Wikipedia citations. 13.44% of citations point to Open Access journals, giving all readers direct access to primary sources; 86.56% point to subscription-based journals, where paywalls keep many readers from the source material.

Inside the Toolbox: Tools and Data Sources

Just as biological research requires specific reagents and tools, the analysis of co-citation networks depends on specialized digital tools and platforms.

Tool/Platform | Primary Function | Role in Research
Altmetric.com | Tracks alternative metrics for scholarly content | Provided initial data on Wikipedia references to scientific papers 1
Elsevier's CiteScore Metrics | Provides citation metrics for journals | Enabled linking references to journal information and categories 1
Pathfinder Networks (PFNET) | Algorithm for pruning and visualizing networks | Highlighted most relevant journals and their interactions 1
R Software | Statistical computing and graphics | Used for data pre-processing and analysis 1
Scopus ASJC Classification | Classifies journals into subject areas | Enabled categorization of scientific fields 1
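Pathfinder pruning keeps these maps readable by removing a link whenever two nodes are already joined more strongly through some other route. The sketch below implements the commonly used PFNET(r = ∞, q = n − 1) variant, in which a path's length is its longest single hop and a direct link survives only if no alternative path has a shorter longest hop. The parameter choice, the conversion of co-citation strength to distance as 1/strength, and the function name `pfnet` are assumptions for illustration; the study's exact settings are not stated here.

```python
import math


def pfnet(n, edges):
    """Prune a weighted network with PFNET(r=inf, q=n-1).

    n:     number of nodes, labelled 0..n-1
    edges: dict {(i, j): strength} with i < j and strength > 0
           (e.g. co-citation counts between journals)
    Returns the edges that survive pruning, with their original strengths.
    """
    # Pathfinder works on dissimilarities; 1/strength is an assumed conversion.
    dist = [[math.inf] * n for _ in range(n)]
    for i in range(n):
        dist[i][i] = 0.0
    for (i, j), strength in edges.items():
        dist[i][j] = dist[j][i] = 1.0 / strength

    # Minimax distances via Floyd-Warshall with (min, max) instead of (min, +):
    # with r = inf, a path is only as long as its longest hop.
    mm = [row[:] for row in dist]
    for k in range(n):
        for i in range(n):
            for j in range(n):
                via_k = max(mm[i][k], mm[k][j])
                if via_k < mm[i][j]:
                    mm[i][j] = via_k

    # A direct link survives only if no other path beats it (ties are kept).
    return {(i, j): s for (i, j), s in edges.items() if dist[i][j] <= mm[i][j]}


# Toy triangle (made up): 0-1 and 1-2 are strong links, 0-2 is weak.
edges = {(0, 1): 3, (1, 2): 3, (0, 2): 1}
print(pfnet(3, edges))  # {(0, 1): 3, (1, 2): 3}
```

The weak 0-2 link is dropped because the route through node 1 has a shorter longest hop. The triple loop is O(n³), so for networks with thousands of journals a faster route is preferable: PFNET(r = ∞, q = n − 1) equals the union of all minimum spanning trees of the distance graph.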

Visualizing the Network

The researchers created co-citation networks to visualize the relationships between different scientific fields and journals. These networks help identify clusters of closely related research areas and the bridges that connect different domains of knowledge.

Figure: Simplified co-citation network visualization, representing how different scientific fields connect through co-citation patterns in Wikipedia.

Why This Matters: The Bigger Picture of Knowledge Mapping

Tracking How Knowledge Evolves

Co-citation analysis on platforms like Wikipedia does more than just create pretty network visualizations—it provides a dynamic window into how scientific fields connect and evolve 2 . By examining which papers are cited together frequently, researchers can identify emerging relationships between scientific concepts that might not be apparent through traditional literature reviews.

This approach is particularly valuable for enhancing transdisciplinarity—helping researchers identify key literature and concepts across disciplinary boundaries 2 . In our increasingly interconnected scientific world, these bridges between specialties are where some of the most exciting innovations occur.

Science Communication in the Digital Age

The Wikipedia co-citation study also reveals important insights about how scientific knowledge reaches the public. The preference for high-impact multidisciplinary journals suggests that Wikipedia editors function as knowledge curators, prioritizing sources they judge to be most authoritative and significant 1 .

Furthermore, the low percentage of Open Access citations highlights a significant challenge in making scientific research truly accessible. If even Wikipedia—with its massive reach and explicit mission of sharing knowledge—primarily points readers toward paywalled content, there's still considerable work to be done in opening up the scientific literature 1 .

Conclusion: The Living Map of Human Knowledge

The analysis of co-citation networks within Wikipedia reveals more than just patterns of scientific consumption—it shows us how collective intelligence processes and organizes human knowledge.

Through the seemingly simple act of citing sources, Wikipedia editors are unconsciously mapping the connections between different fields of science, highlighting which research matters most to public understanding, and creating a living representation of how scientific knowledge fits together.

As this research continues to evolve, it promises to provide even deeper insights into the dynamic relationship between specialized research and public knowledge. The hidden patterns in Wikipedia's citations don't just tell us what we know—they reveal how we know together, creating a fascinating window into the collective scientific consciousness of our digital society.

The next time you read a Wikipedia article and click through to a scientific reference, remember that you're not just verifying a fact—you're participating in a vast network of knowledge connections that maps the very structure of human understanding.

References