Using cost-sensitive learning and feature selection algorithms to improve the performance of imbalanced classification
Imbalanced data problem is widely present in network intrusion detection, spam filtering, biomedical engineering, finance, science, being a challenge in many real-life data-intensive applications. Classifier bias occurs when traditional classification algorithms are used to deal with imbalanced data. As already known, the General Vector Machine (GVM) algorithm has good generalization ability, though it does not work well for the imbalanced classification. Additionally, the state-of-the-art Binary Ant Lion Optimizer (BALO) algorithm has high exploitability and fast convergence rate. Based on these facts, we have proposed in this paper a Cost-sensitive Feature selection General Vector Machine (CFGVM) algorithm based on GVM and BALO algorithms to tackle the imbalanced classification problem, delivering different cost weights to different classes of samples. In our method, the BALO algorithm determines the cost weights and extract more significant features to improve the classification performance. Experiments conducted on eleven imbalanced data sets have shown that the CFGVM algorithm significantly improves the classification performance of minority class samples. By comparing with similar algorithms and state-of-the-art algorithms, the proposed algorithm significantly outperforms in performance and produces better classification results.
Feng, F., Li, K., Shen, J., Zhou, Q. & Yang, X. (2020). Using cost-sensitive learning and feature selection algorithms to improve the performance of imbalanced classification. IEEE Access, 8 (1), 69979-69996.