
Web Data Commons team publishes a new large hyperlink graph

last modified Dec 18, 2013 09:38 PM
The Web Data Commons team is happy to announce the publication of a new large hyperlink graph.

The graph has been extracted from the Common Crawl 2012 web corpus and covers 3.5 billion web pages and 128 billion hyperlinks between these pages. To the best of our knowledge, the graph is the largest hyperlink graph that is available to the public.

The graph can be downloaded in various formats from:
http://webdatacommons.org/hyperlinkgraph

We provide initial statistics about the topology of the graph at:
http://webdatacommons.org/hyperlinkgraph/topology.html

We would like to thank the Common Crawl project for providing their great web crawl, which made the creation of the WDC Hyperlink Graph possible.