Efficient visualisation of the relative distribution of keyword search results in a corpus data cube
Most keyword searches target precision for finding the most relevant document. However some target recall, finding all relevant documents. Our system supports high recall searches that return hundreds or thousands of relevant results. In particular, it provides a visualization that shows the distribution of search results relative to the distribution of items for the entire corpus. Such relative distributional features include over and under representation, clusters and outliers. The contribution of this paper is efficient visualisation, that is, how to provide the best relative distribution view for a given data cube size. This requirement is translated to: for which limited size meta-data summary cube are search results disambiguated the most in our relative distribution view. We identify metrics and several algorithms for such a summary cube selection.