Compact and discriminative visual codebooks are preferred in many visual recognition tasks. In the literature, several researchers have taken the approach of hierarchically merging the visual words of an initial large codebook, but have implemented this idea with different merging criteria. In this work, we show that, by defining different class-conditional distribution functions and parameter estimation methods, these merging criteria can be unified under a single probabilistic framework. More importantly, by adopting new distribution functions and/or parameter estimation methods, we can generalize this framework to produce a spectrum of novel merging criteria. We focus on two of them in this work. For the first criterion, we adopt the multinomial distribution to model each object class; for the second, we propose a max-margin-based parameter estimation method. Both theoretical analysis and experimental study demonstrate the superior performance of the two new merging criteria and the general applicability of our probabilistic framework.
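
To make the idea of hierarchically merging visual words concrete, the following is a minimal sketch and not the criteria proposed in this work. It assumes a table of per-class visual-word counts and, purely for illustration, greedily merges the pair of words whose pooling loses the least log-likelihood of the class labels given the words. The function names (`label_log_likelihood`, `merge_columns`, `greedy_merge`) and the choice of criterion are assumptions introduced here.

```python
import math
from itertools import combinations


def label_log_likelihood(counts):
    """Sum over classes c and words k of n[c][k] * log(n[c][k] / n[.][k]).

    counts[c][k] is the number of times visual word k occurs in class c.
    This score is an illustrative assumption, not the paper's criterion.
    """
    total = 0.0
    num_words = len(counts[0])
    for k in range(num_words):
        column_sum = sum(row[k] for row in counts)
        for row in counts:
            if row[k] > 0:
                total += row[k] * math.log(row[k] / column_sum)
    return total


def merge_columns(counts, r, s):
    """Return a new table in which words r and s are pooled into one word.

    The pooled word is appended as the last column.
    """
    merged = []
    for row in counts:
        new_row = [v for k, v in enumerate(row) if k not in (r, s)]
        new_row.append(row[r] + row[s])
        merged.append(new_row)
    return merged


def greedy_merge(counts, target_size):
    """Merge words pairwise until only target_size words remain.

    At each step, pick the pair whose merge keeps the score highest.
    Returns the groups of original word indices and the merged count table.
    """
    groups = [[k] for k in range(len(counts[0]))]
    while len(groups) > target_size:
        r, s = max(
            combinations(range(len(groups)), 2),
            key=lambda pair: label_log_likelihood(merge_columns(counts, *pair)),
        )
        counts = merge_columns(counts, r, s)
        pooled = groups[r] + groups[s]
        # Keep group order aligned with the column order of merge_columns.
        groups = [g for k, g in enumerate(groups) if k not in (r, s)] + [pooled]
    return groups, counts
```

For instance, with two classes and the hypothetical counts `[[10, 9, 0, 1], [0, 1, 10, 9]]`, calling `greedy_merge(counts, 2)` pools words 0 and 1 into one group and words 2 and 3 into the other, since each pair occurs predominantly in the same class. The exhaustive pair search is quadratic per step and is meant only to illustrate the merging loop.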