University of Wollongong

A scalable lightweight distributed crawler for crawling with limited resources

conference contribution
posted on 2024-11-14, 10:25 authored by Milly Kc, Markus Hagenbuchner, Ah Chung Tsoi
Web page crawlers are an essential component in a number of Web applications. The sheer size of the Internet can pose problems in the design of Web crawlers. All currently known crawlers implement approximations or have limitations so as to maximize the throughput of the crawl, and hence, maximize the number of pages that can be retrieved within a given time frame. This paper proposes a distributed crawling concept which is designed to avoid approximations, to limit the network overhead, and to run on relatively inexpensive hardware. A set of experiments and comparisons highlights the effectiveness of the proposed approach.

Citation

Kc, M. W., Hagenbuchner, M. & Tsoi, A. (2008). A scalable lightweight distributed crawler for crawling with limited resources. IEEE/WIC/ACM International Conference on Web Intelligence and Intelligent Agent Technology (pp. 663-666). Sydney, Australia: IEEE Computer Society.

Parent title

Proceedings - 2008 IEEE/WIC/ACM International Conference on Web Intelligence and Intelligent Agent Technology - Workshops, WI-IAT Workshops 2008

Pagination

663-666

Language

English

RIS ID

25476
