Web Data Commons team publishes a new large hyperlink graph

Submitted by Dr Petra Vertes on Wed, 18/12/2013 - 21:38

The Web Data Commons team is happy to announce the publication of a new large hyperlink graph.

The graph has been extracted from the Common Crawl 2012 web corpus and covers 3.5 billion web pages and 128 billion hyperlinks between these pages. To the best of our knowledge, it is the largest hyperlink graph available to the public.

The graph can be downloaded in various formats from:
http://webdatacommons.org/hyperlinkgraph

We provide initial statistics about the topology of the graph at:
http://webdatacommons.org/hyperlinkgraph/topology.html

We want to thank the Common Crawl project for providing their great web crawl and thus enabling the creation of the WDC Hyperlink Graph.

