Publication Details

Khalili, H, Yahyavi, A & Oroumchian, F, Web-graph pre-compression for similarity based algorithms, Proceedings of the Third International Conference on Modeling, Simulation and Applied Optimization, Sharjah, U.A.E, 2009, 20-22 .January 2009.


The size of web-graph created from crawling the web is an issue for every search engine. The amount of data gathered by web crawlers makes it impossible to load the web-graph into memory which increases the number of I/O operations. Compression can help reduce run-time of web-graph processing algorithms. We introduce a new algorithm for compressing the link structure of the web graph by grouping similar pages together and building a smaller representation of the graph. The resulted graph has far less edges than the original and the similarity between adjacency lists of nodes is increased dramatically which makes it more suitable for graph compression algorithms. The characteristics of the resulted graph are similar to the original. We also ensure fast decompression of the compressed graph. Partial decompression of the web-graph is also possible. The result of the precompression can be loaded into memory and the increased similarity between adjacency nodes makes it suitable for further use of compression and web-algorithms.