What Millions of Citations Reveal About Knowledge
Have you ever wondered what happens when millions of volunteers collectively document human knowledge? Wikipedia isn't just the world's largest encyclopedia; it's becoming one of the most revealing maps of how scientific knowledge connects and evolves.
Every time an editor cites a research paper, they're not just verifying a fact; they're creating a digital thread between established science and public understanding. Now, researchers have begun tracing these threads to reveal a surprising picture of how scientific knowledge is consumed and connected in the digital age 1 .
When scientists examine these connections through a method called 'co-citation analysis,' they can map which fields of science are most visible to the public, which journals most shape public understanding, and how open scientific knowledge really is.
A groundbreaking study published in PLOS ONE in 2020 pioneered this approach by analyzing more than 1.4 million references to scientific papers across Wikipedia articles 1 8 . Its findings reveal not just what we know, but how we know it together.
Understanding Co-citation and Knowledge Maps
Imagine two scientific papers that are often mentioned together in the same Wikipedia articles. Every time this happens, it's like an invisible vote suggesting these papers are related. This phenomenon is called co-citation—the frequency with which two documents are cited together by other documents 5 .
Here's a simple way to visualize it: if documents A and B are both cited by documents C, D, and E, then A and B have a co-citation strength of three. The higher this number, the stronger their semantic relationship is considered to be 5 . It's like two people always being mentioned together in conversations; eventually, you assume they must be connected in some important way.
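The counting rule just described can be sketched in a few lines of Python. This is an illustrative sketch, not code from the study: the function `cocitation_counts` and the document IDs are hypothetical, and it assumes each citing document is reduced to a set of cited IDs, so citing the same paper twice in one document still counts once.

```python
from collections import Counter
from itertools import combinations


def cocitation_counts(citing_docs):
    """Count how many citing documents cite each pair of cited documents.

    citing_docs maps a citing document ID to the IDs of the documents it cites.
    Each unordered pair is counted at most once per citing document.
    """
    counts = Counter()
    for cited in citing_docs.values():
        # sorted(set(...)) removes duplicates and gives each pair a canonical
        # order, so (A, B) and (B, A) are tallied under the same key.
        for pair in combinations(sorted(set(cited)), 2):
            counts[pair] += 1
    return counts


# The example from the text: C, D and E all cite both A and B.
citations = {
    "C": ["A", "B"],
    "D": ["A", "B", "F"],
    "E": ["B", "A"],
}
strengths = cocitation_counts(citations)
print(strengths[("A", "B")])  # 3
```

Only D cites F, so the pair (A, F) gets a strength of 1, which is how weak links end up separated from strong ones.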
Co-citation analysis provides a forward-looking assessment of document similarity, revealing how relationships between research papers evolve as new citations accumulate over time 5 . This makes it particularly powerful for mapping dynamic fields of knowledge.
Wikipedia represents an extraordinary experiment in collective intelligence and distributed epistemology 1 . With over 5.5 million entries in the English version alone (as of 2018) and a top-ten global website ranking, Wikipedia has become a primary gateway to knowledge for millions worldwide 1 .
What makes Wikipedia particularly valuable for research is its emphasis on verifiability and reliable sources. The encyclopedia explicitly prioritizes academic and peer-reviewed publications, creating an intentional bridge between scholarly research and public knowledge 1 . That bridge offers a unique opportunity to study how scientific information flows from specialized research communities to broader public understanding.
In 2020, a team of researchers set out to investigate how Wikipedia editors regard science through their references to scientific papers. Their study, "Science through Wikipedia: A novel representation of open knowledge through co-citation networks," established a novel methodology for analyzing the consumption of scientific literature through this open encyclopedia 1 8 .
The researchers adapted co-citation analysis to Wikipedia's context, generating Pathfinder networks (PFNET) that highlighted the most relevant scientific journals and categories, along with their interactions 1 . They also measured the obsolescence of the cited literature with the Price Index, to gauge how recent Wikipedia's scientific references are 1 .
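The Price Index, introduced by Derek de Solla Price, is usually defined as the percentage of references that point to literature published within the previous five years, so a higher value means more recent sources. Below is a minimal sketch assuming each reference is reduced to a (citing year, cited year) pair; the function name `price_index`, the `max_age` parameter and the sample years are hypothetical, and conventions differ on whether the window covers ages 0 to 5 or 0 to 4, which is why it is a parameter here.

```python
def price_index(reference_years, max_age=5):
    """Return the Price Index as a percentage.

    reference_years: iterable of (citing_year, cited_year) pairs, e.g. the year
    a reference was added to Wikipedia and the cited paper's publication year
    (an assumed convention; the study's exact dating may differ).
    A reference counts as recent if it is at most `max_age` years old.
    """
    pairs = list(reference_years)
    if not pairs:
        raise ValueError("need at least one reference")
    recent = sum(1 for citing, cited in pairs if citing - cited <= max_age)
    return 100.0 * recent / len(pairs)


# Hypothetical references with ages of 2, 8, 5 and 0 years.
refs = [(2018, 2016), (2018, 2010), (2017, 2012), (2019, 2019)]
print(price_index(refs))             # 75.0
print(price_index(refs, max_age=4))  # 50.0
```

Negative ages (a cited year later than the citing year) would be counted as recent, so such data errors should be cleaned before computing the index.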
The team started with 1,433,457 references available from Altmetric.com that linked Wikipedia articles to scientific papers 1 .
Through rigorous pre-processing and linking with Elsevier's CiteScore Metrics, the sample was refined to 847,512 references from 193,802 Wikipedia articles citing 598,746 scientific articles across 14,149 journals 1 .
The researchers built co-citation networks at three different levels to obtain a holistic view: journal co-citation maps, main field co-citation maps, and field co-citation maps 1 .
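Moving from papers to journals or fields changes only the unit being counted. The sketch below is a hypothetical illustration, not the study's pipeline: the function `grouped_cocitation`, the IDs `W1` to `W3`, `p1` to `p4`, `J1` to `J3` and the category assignments are invented, and it assumes two groups are co-cited once for every Wikipedia article that cites papers from both.

```python
from collections import Counter
from itertools import combinations


def grouped_cocitation(articles, groups_of):
    """Co-citation counts at an aggregated level (journals or subject fields).

    articles: maps a Wikipedia article ID to the paper IDs it cites.
    groups_of: maps a paper ID to the set of groups it belongs to
        (its journal, or its journal's subject categories).
    Two groups are co-cited once per article citing papers from both.
    """
    counts = Counter()
    for papers in articles.values():
        groups = set()
        for paper in papers:
            groups.update(groups_of.get(paper, ()))  # unmapped papers are skipped
        for pair in combinations(sorted(groups), 2):
            counts[pair] += 1
    return counts


articles = {"W1": ["p1", "p2"], "W2": ["p2", "p3"], "W3": ["p1", "p3", "p4"]}
journal_of = {"p1": {"J1"}, "p2": {"J2"}, "p3": {"J3"}, "p4": {"J2"}}
fields_of_journal = {"J1": {"Medicine"}, "J2": {"Medicine", "Biochemistry"},
                     "J3": {"Multidisciplinary"}}
# Field level: a paper inherits every category of its journal.
fields_of = {p: {f for j in js for f in fields_of_journal[j]}
             for p, js in journal_of.items()}

journal_counts = grouped_cocitation(articles, journal_of)
field_counts = grouped_cocitation(articles, fields_of)
print(journal_counts[("J1", "J2")])                 # 2
print(field_counts[("Biochemistry", "Medicine")])  # 3
```

Because a journal can belong to several subject categories, this convention also links a journal's own categories whenever it is cited; the study may have handled multi-category journals differently (for example, with fractional counting).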
Using these networks, the team could identify which journals and scientific fields were most central to Wikipedia's representation of science, how these fields interconnected, and what patterns emerged from this massive dataset of scientific consumption 1 .
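The Pathfinder pruning that turns these dense co-citation matrices into readable maps keeps a link only when no indirect route between the same two nodes is stronger along its whole length. Here is a minimal sketch of the widely used PFNET(r = infinity, q = n-1) parameterization; assuming this was the study's exact setting is an assumption, and the function name `pfnet`, the 1/similarity distance transform and the toy matrix of four journals are hypothetical.

```python
import math


def pfnet(similarity):
    """Prune a symmetric similarity matrix with PFNET(r=inf, q=n-1).

    similarity: list of lists; similarity[i][j] > 0 is a link weight
    (e.g. a co-citation count) and 0 means no link.
    Returns the set of surviving undirected links (i, j) with i < j.
    """
    n = len(similarity)
    # Higher co-citation means closer, so convert similarities to distances.
    # With r = infinity only the ordering of weights matters, so 1/s suffices.
    dist = [[0.0 if i == j else
             (1.0 / similarity[i][j] if similarity[i][j] > 0 else math.inf)
             for j in range(n)] for i in range(n)]
    # Minimax path distances (Floyd-Warshall with max instead of +):
    # a path's length is its longest link; keep the shortest such length.
    minimax = [row[:] for row in dist]
    for k in range(n):
        for i in range(n):
            for j in range(n):
                via_k = max(minimax[i][k], minimax[k][j])
                if via_k < minimax[i][j]:
                    minimax[i][j] = via_k
    # A direct link survives only if no indirect path beats it.
    return {(i, j) for i in range(n) for j in range(i + 1, n)
            if dist[i][j] < math.inf and dist[i][j] <= minimax[i][j]}


# Hypothetical co-citation counts among journals 0..3.
sim = [[0, 10, 8, 1],
       [10, 0, 3, 0],
       [8, 3, 0, 5],
       [1, 0, 5, 0]]
print(sorted(pfnet(sim)))  # [(0, 1), (0, 2), (2, 3)]
```

The link between journals 1 and 2 is dropped because the route through journal 0 uses only stronger links. The triple loop takes O(n³) time, which is fine for small maps; with these parameters the result equals the union of all minimum spanning trees of the distance graph, and faster MST-based algorithms exist for large networks.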
The analysis revealed a significant concentration of Wikipedia's scientific references in specific fields. Two subject areas were particularly dominant in Wikipedia's scientific coverage: Medicine, and Biochemistry, Genetics and Molecular Biology 1 .
This finding suggests that Wikipedia editors—and potentially readers—have a stronger interest in scientific fields with direct implications for human health and biology. The consumption of scientific literature through Wikipedia isn't evenly distributed across all disciplines but reflects particular public interests and concerns.
When examining which journals were most frequently cited, researchers found that the most important journals are multidisciplinary in nature 1 . Journals with high impact factors were also more likely to be cited, suggesting that Wikipedia editors favor authoritative, well-recognized sources 1 .
| Field of Science | Representation Level | Potential Reasons |
|---|---|---|
| Medicine | High | Direct health relevance, broad public interest |
| Biochemistry, Genetics & Molecular Biology | High | Rapid advancements, societal implications |
| Multidisciplinary Journals | High | Authority, breadth of content, recognition |
| Humanities | Lower (5.49%) | Different citation patterns, fewer verifiable facts |
Despite Wikipedia's ethos of open knowledge, the study uncovered a surprising limitation: only 13.44% of Wikipedia citations are to Open Access journals 1 . This suggests significant barriers to truly open science communication, as much of the research cited remains behind paywalls, limiting access for readers who wish to explore primary sources.
Open Access citations give every reader direct access to the primary source, whereas paywalls prevent many readers from reaching the source material behind the other citations.
Just as biological research requires specific reagents and tools, the analysis of co-citation networks depends on specialized digital tools and platforms.
| Tool/Platform | Primary Function | Role in Research |
|---|---|---|
| Altmetric.com | Tracks alternative metrics for scholarly content | Provided initial data on Wikipedia references to scientific papers 1 |
| Elsevier's CiteScore Metrics | Provides citation metrics for journals | Enabled linking references to journal information and categories 1 |
| Pathfinder Networks (PFNET) | Algorithm for pruning and visualizing networks | Highlighted most relevant journals and their interactions 1 |
| R Software | Statistical computing and graphics | Used for data pre-processing and analysis 1 |
| Scopus ASJC Classification | Classifies journals into subject areas | Enabled categorization of scientific fields 1 |
The researchers created co-citation networks to visualize the relationships between different scientific fields and journals. These networks help identify clusters of closely related research areas and the bridges that connect different domains of knowledge.
Figure: A simplified visualization of how different scientific fields connect through co-citation patterns in Wikipedia.
Co-citation analysis on platforms like Wikipedia does more than just create pretty network visualizations—it provides a dynamic window into how scientific fields connect and evolve 2 . By examining which papers are cited together frequently, researchers can identify emerging relationships between scientific concepts that might not be apparent through traditional literature reviews.
This approach is particularly valuable for enhancing transdisciplinarity—helping researchers identify key literature and concepts across disciplinary boundaries 2 . In our increasingly interconnected scientific world, these bridges between specialties are where some of the most exciting innovations occur.
The Wikipedia co-citation study also reveals important insights about how scientific knowledge reaches the public. The preference for high-impact multidisciplinary journals suggests that Wikipedia editors function as knowledge curators, prioritizing sources they judge to be most authoritative and significant 1 .
Furthermore, the low percentage of Open Access citations highlights a significant challenge in making scientific research truly accessible. If even Wikipedia—with its massive reach and explicit mission of sharing knowledge—primarily points readers toward paywalled content, there's still considerable work to be done in opening up the scientific literature 1 .
The analysis of co-citation networks within Wikipedia reveals more than just patterns of scientific consumption—it shows us how collective intelligence processes and organizes human knowledge.
Through the seemingly simple act of citing sources, Wikipedia editors are unconsciously mapping the connections between different fields of science, highlighting which research matters most to public understanding, and creating a living representation of how scientific knowledge fits together.
As this research continues to evolve, it promises to provide even deeper insights into the dynamic relationship between specialized research and public knowledge. The hidden patterns in Wikipedia's citations don't just tell us what we know—they reveal how we know together, creating a fascinating window into the collective scientific consciousness of our digital society.
The next time you read a Wikipedia article and click through to a scientific reference, remember that you're not just verifying a fact—you're participating in a vast network of knowledge connections that maps the very structure of human understanding.