Submitted by Dr Petra Vertes on Wed, 18/12/2013 - 21:38
The WDC Hyperlink Graph has been extracted from the Common Crawl 2012 web corpus and covers 3.5 billion web pages and 128 billion hyperlinks between these pages. To the best of our knowledge, it is the largest hyperlink graph available to the public.
The graph can be downloaded in various formats from:
http://webdatacommons.org/hyperlinkgraph
We provide initial statistics about the topology of the graph at:
http://webdatacommons.org/hyperlinkgraph/topology.html
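To give a rough sense of scale, the following minimal sketch derives two basic quantities from the page and link counts quoted above. It uses only those rounded headline figures and does not reproduce the statistics on the topology page. The function names and the assumption of a simple directed graph without self-loops are illustrative.

```python
# Rounded headline figures from the announcement (not exact counts).
PAGES = 3_500_000_000      # nodes: web pages
LINKS = 128_000_000_000    # directed edges: hyperlinks


def mean_out_degree(pages: int, links: int) -> float:
    """Average number of outgoing links per page."""
    return links / pages


def density(pages: int, links: int) -> float:
    """Fraction of possible directed edges that are present.

    Assumes a simple directed graph without self-loops, so the
    maximum number of edges is pages * (pages - 1).
    """
    return links / (pages * (pages - 1))


if __name__ == "__main__":
    print(f"mean out-degree: {mean_out_degree(PAGES, LINKS):.2f}")
    print(f"density: {density(PAGES, LINKS):.3e}")
```

With these figures the average works out to roughly 36.6 links per page, while the density is on the order of 1e-8, which shows how sparse the graph is.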
We want to thank the Common Crawl project for providing their great web crawl and thus enabling the creation of the WDC Hyperlink Graph.