Last modified: Dec 18, 2013 09:38 PM
The Web Data Commons team is happy to announce the publication of a new large hyperlink graph.
The graph has been extracted from the Common Crawl 2012 web corpus and covers 3.5 billion web pages and 128 billion hyperlinks between these pages. To the best of our knowledge, the graph is the largest hyperlink graph that is available to the public.
The graph can be downloaded in various formats from:
http://webdatacommons.org/hyperlinkgraph
We provide initial statistics about the topology of the graph at:
http://webdatacommons.org/hyperlinkgraph/topology.html
We want to thank the Common Crawl project for providing their great web crawl and thus enabling the creation of the WDC Hyperlink Graph.