Purpose - Learning to rank algorithms face several inherent challenges, the most important being the high dimensionality of the training data, the dynamic nature of Web information resources and the lack of click-through data. High dimensionality of the training data affects both the effectiveness and the efficiency of learning algorithms. In addition, most learning to rank benchmark data sets do not include click-through data, a very rich source of information about the search behavior of users as they interact with ranked lists of search results. To deal with these limitations, this paper aims to introduce a novel learning to rank algorithm that uses a set of complex click-through features in a reinforcement learning (RL) model. These features are calculated from the existing click-through information in the data set or even from data sets without any explicit click-through information. Design/methodology/approach - The proposed ranking algorithm (QRC-Rank) applies RL techniques to a set of calculated click-through features. QRC-Rank is a two-step process. In the first step, the Transformation phase, a compact benchmark data set is created which contains a set of click-through features. These features are calculated from the original click-through information available in the data set and constitute a compact representation of click-through information. To find the most effective click-through features, a number of scenarios are investigated. The second step is the Model-Generation phase, in which an RL model is built to rank the documents. This model is created by applying temporal difference learning methods such as Q-Learning and SARSA. Findings - The proposed learning to rank method, QRC-Rank, is evaluated on the WCL2R and LETOR4.0 data sets. Experimental results demonstrate that QRC-Rank outperforms state-of-the-art learning to rank methods such as SVMRank, RankBoost, ListNet and AdaRank on the precision and normalized discounted cumulative gain evaluation criteria. 
The use of the click-through features calculated from the training data set is a major contributor to the performance of the system. Originality/value - This paper demonstrates the viability of the proposed features, which provide a compact representation of the click-through data in a learning to rank application. These compact click-through features are calculated from the original features of the learning to rank benchmark data set. In addition, a Markov Decision Process model is proposed for the learning to rank problem using RL, including the sets of states and actions, the rewarding strategy and the transition function.
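
As an illustration of the temporal difference methods named above, the following minimal sketch contrasts the standard tabular Q-Learning and SARSA update rules. It is not the QRC-Rank implementation: the function names, the dictionary-based Q-table, the opaque state and action labels, and the default values of the learning rate `alpha` and discount factor `gamma` are all assumptions made for illustration, since the abstract does not specify the states, actions or rewards used.

```python
from collections import defaultdict


def q_learning_update(Q, s, a, r, s_next, next_actions, alpha=0.1, gamma=0.9):
    """Off-policy TD update: bootstrap from the best action in the next state.

    Q is a mapping from (state, action) pairs to estimated values.
    An empty next_actions list is treated as a terminal next state.
    """
    best_next = max((Q[(s_next, a2)] for a2 in next_actions), default=0.0)
    Q[(s, a)] += alpha * (r + gamma * best_next - Q[(s, a)])


def sarsa_update(Q, s, a, r, s_next, a_next, alpha=0.1, gamma=0.9):
    """On-policy TD update: bootstrap from the action actually taken next.

    a_next=None is treated as a terminal next state.
    """
    next_value = 0.0 if a_next is None else Q[(s_next, a_next)]
    Q[(s, a)] += alpha * (r + gamma * next_value - Q[(s, a)])


# Hypothetical toy values: two actions available in state "s1".
Q_off = defaultdict(float, {("s1", "a0"): 1.0, ("s1", "a1"): 2.0})
Q_on = defaultdict(float, {("s1", "a0"): 1.0, ("s1", "a1"): 2.0})

# Same transition (s0, a0, reward 1.0, s1) seen by both learners.
q_learning_update(Q_off, "s0", "a0", 1.0, "s1", ["a0", "a1"], alpha=0.5, gamma=0.5)
sarsa_update(Q_on, "s0", "a0", 1.0, "s1", "a0", alpha=0.5, gamma=0.5)

print(Q_off[("s0", "a0")])  # uses max over next actions (value 2.0)
print(Q_on[("s0", "a0")])   # uses the chosen next action a0 (value 1.0)
```

The two rules differ only in the bootstrap term: Q-Learning uses the maximum value over the next state's actions, whereas SARSA uses the value of the action the current policy actually selects, so the two learners can diverge whenever the policy does not act greedily.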