University of Wollongong

Web-graph pre-compression for similarity based algorithms

Conference contribution, posted on 2024-11-18, authored by Hamid Khalili, Amir Yahyavi and Farhad Oroumchian
The size of the web graph produced by crawling the web is an issue for every search engine. The amount of data gathered by web crawlers makes it impossible to load the web graph into memory, which increases the number of I/O operations. Compression can help reduce the run-time of web-graph processing algorithms. We introduce a new algorithm for compressing the link structure of the web graph by grouping similar pages together and building a smaller representation of the graph. The resulting graph has far fewer edges than the original, and the similarity between the adjacency lists of its nodes increases dramatically, which makes it more suitable for graph compression algorithms. The characteristics of the resulting graph are similar to those of the original. We also ensure fast decompression of the compressed graph, and partial decompression of the web graph is possible. The result of the pre-compression can be loaded into memory, and the increased similarity between adjacent nodes makes it suitable for further compression and for web algorithms.
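
The abstract describes the pre-compression idea only at a high level; the actual grouping criterion and encoding are not given here. As an illustration only, the following minimal Python sketch assumes a hypothetical rule of grouping pages whose out-link lists exceed a Jaccard-similarity threshold, then contracting each group into a single super-node, which yields a smaller graph with far fewer edges:

# Illustrative sketch only (not the paper's algorithm): assumes a hypothetical
# grouping rule based on Jaccard similarity of out-link lists, then contracts
# each group of similar pages into one super-node.

from collections import defaultdict


def jaccard(a, b):
    """Similarity between two adjacency lists, represented as sets."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def precompress(graph, threshold=0.8):
    """graph: dict mapping node -> set of out-neighbours.

    Returns (group_of, contracted): group_of maps each original node to its
    group id; contracted is the smaller, group-level graph.
    """
    nodes = list(graph)
    group_of = {}
    next_group = 0
    for u in nodes:                        # O(n^2) greedy grouping, fine for a sketch
        if u in group_of:
            continue
        group_of[u] = next_group
        for v in nodes:
            if v not in group_of and jaccard(graph[u], graph[v]) >= threshold:
                group_of[v] = next_group
        next_group += 1
    contracted = defaultdict(set)          # one edge per (group, group) pair
    for u, out in graph.items():
        for v in out:
            gv = group_of.get(v)
            if gv is not None:             # ignore links to uncrawled pages
                contracted[group_of[u]].add(gv)
    return group_of, dict(contracted)


if __name__ == "__main__":
    web = {
        "a": {"c", "d"},   # a and b have identical out-links,
        "b": {"c", "d"},   # so they collapse into one super-node
        "c": {"d"},
        "d": set(),
    }
    groups, small = precompress(web)
    print(groups)  # e.g. {'a': 0, 'b': 0, 'c': 1, 'd': 2}
    print(small)   # {0: {1, 2}, 1: {2}} -- 3 edges instead of 5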

History

Citation

Khalili, H, Yahyavi, A & Oroumchian, F, 'Web-graph pre-compression for similarity based algorithms', Proceedings of the Third International Conference on Modeling, Simulation and Applied Optimization, Sharjah, U.A.E., 20-22 January 2009.

Language

English
