Naive Bayes classification for email spam detection
Advanced Interdisciplinary Applications of Machine Learning Python Libraries for Data Science
Email is one of the cheapest forms of communication and is used by internet users of all kinds, from individuals to businesses. Because it is simple and widely available, it is an easy target for spam sent with malicious intent, which has caused large financial losses and threatened the privacy of millions of individuals. Not all spam is malicious, but even benign spam is a nuisance to users. For these reasons, there is a strong need for spam detection systems that can automatically identify spam emails. This chapter addresses that need by proposing a Naive Bayes approach to spam detection, built on a combination of the Enron Email dataset and the 419 fraud dataset. The datasets are lemmatized to improve both execution time and accuracy, and grid search is among the techniques adopted to maximize accuracy. Finally, the model is evaluated using several metrics, and a comparative analysis is performed.
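The approach summarized above can be illustrated with a minimal sketch, which is not the chapter's implementation. It assumes a multinomial Naive Bayes model with additive smoothing, substitutes lowercase whitespace tokenization for lemmatization, and runs a small grid search over the smoothing parameter `alpha` against a holdout set. The abstract does not state which hyperparameters the authors tuned. The messages are invented toy examples, not drawn from the Enron or 419 datasets, and all function names are hypothetical.

```python
import math
from collections import Counter

# Invented toy messages (assumption): not taken from the Enron or 419 datasets.
TRAIN_DOCS = [
    "claim your prize money now",
    "urgent transfer of funds required",
    "win free money today",
    "meeting agenda for monday",
    "please review the attached report",
    "lunch on friday with the team",
]
TRAIN_LABELS = ["spam", "spam", "spam", "ham", "ham", "ham"]
VAL_DOCS = ["free prize money", "report for the meeting"]
VAL_LABELS = ["spam", "ham"]


def tokenize(text):
    # Stand-in for lemmatization (assumption): lowercase and split on whitespace.
    return text.lower().split()


def train(docs, labels, alpha=1.0):
    """Fit a multinomial Naive Bayes model with additive smoothing (alpha > 0)."""
    classes = sorted(set(labels))
    word_counts = {c: Counter() for c in classes}
    doc_counts = Counter(labels)
    for text, label in zip(docs, labels):
        word_counts[label].update(tokenize(text))
    vocab = {w for c in classes for w in word_counts[c]}
    model = {"classes": classes, "vocab": vocab, "log_prior": {}, "log_like": {}}
    for c in classes:
        denom = sum(word_counts[c].values()) + alpha * len(vocab)
        model["log_prior"][c] = math.log(doc_counts[c] / len(docs))
        model["log_like"][c] = {
            w: math.log((word_counts[c][w] + alpha) / denom) for w in vocab
        }
    return model


def predict(model, text):
    """Return the class with the highest log posterior score."""
    scores = {}
    for c in model["classes"]:
        score = model["log_prior"][c]
        for w in tokenize(text):
            if w in model["vocab"]:  # tokens unseen in training are ignored
                score += model["log_like"][c][w]
        scores[c] = score
    return max(scores, key=scores.get)


def accuracy(model, docs, labels):
    hits = sum(predict(model, d) == y for d, y in zip(docs, labels))
    return hits / len(docs)


def grid_search(alphas):
    """Try each alpha and keep the one with the best holdout accuracy."""
    results = []
    for alpha in alphas:
        model = train(TRAIN_DOCS, TRAIN_LABELS, alpha=alpha)
        results.append((accuracy(model, VAL_DOCS, VAL_LABELS), alpha))
    best_acc, best_alpha = max(results, key=lambda r: r[0])
    return best_alpha, best_acc


if __name__ == "__main__":
    best_alpha, best_acc = grid_search([0.1, 0.5, 1.0])
    print("best alpha:", best_alpha, "holdout accuracy:", best_acc)
```

On a real corpus, the holdout split would typically be replaced by cross-validation, and accuracy would be reported alongside metrics such as precision and recall, since spam datasets are often imbalanced.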
Open Access Status
This publication is not available as open access