Statistical methods and software used in nutrition and dietetics research: A review of the published literature using text mining

Nutrition and Dietetics


Aim: Dietitians must be statistically literate to effectively interpret the scientific literature underpinning the discipline. Despite this, no study has objectively identified the statistical methods and software packages commonly used in the current nutrition and dietetics literature. This study aimed to identify statistical methods and software frequently used in nutrition and dietetics research. Methods: A text mining approach using the bag-of-words method was applied to a random sample of articles published in 2018 and drawn from all journals in the ‘Nutrition and Dietetics’ subject category of the SCImago Journal and Country Rank portal. A list of 229 statistical terms and 19 statistical software packages was developed to define the search terms to be mined. Statistical information from the methods section of included articles was extracted into Microsoft Excel (2016) for data cleaning. Statistical analyses were conducted in R (Version 3.6.0) and Microsoft Excel (2016). Results: Seven hundred and fifty-seven journal articles were included. Numerical descriptive statistics were the most common statistical method group, appearing in 83.2% of articles (n = 630). This was followed by specific hypothesis tests (68.8%, n = 521), general hypothesis concepts (58.4%, n = 442), regression (44.4%, n = 336), and ANOVA (30.8%, n = 233). IBM SPSS Statistics was the most common statistical software package, reported in 41.7% of included articles. Conclusion: These findings provide useful information for educators to evaluate current statistics curricula and develop short courses for continuing education. They may also act as a starting point for dietitians to educate themselves on the statistical methods they are likely to encounter.
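The term search described in the Methods can be illustrated with a minimal Python sketch. It is not the authors' pipeline (their analyses were run in R and Microsoft Excel). The term list is a hypothetical, heavily abbreviated stand-in for the 229 terms and 19 packages used in the study, and the function names are invented for illustration. Multi-word terms are matched as consecutive tokens, which goes slightly beyond a strict bag of words.

```python
import re
from collections import Counter

# Hypothetical, abbreviated term list grouped by method category.
# The study's actual list (229 terms, 19 packages) is not reproduced here.
TERM_GROUPS = {
    "numerical descriptive statistics": ["mean", "median", "standard deviation"],
    "specific hypothesis tests": ["t-test", "chi-square", "mann-whitney"],
    "regression": ["linear regression", "logistic regression"],
    "anova": ["anova", "analysis of variance"],
}


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def contains_term(tokens, term):
    """Return True if the term's tokens occur consecutively in tokens."""
    term_tokens = tokenize(term)
    n = len(term_tokens)
    return any(tokens[i:i + n] == term_tokens
               for i in range(len(tokens) - n + 1))


def groups_in_article(methods_text):
    """Return the set of method groups mentioned at least once."""
    tokens = tokenize(methods_text)
    return {group for group, terms in TERM_GROUPS.items()
            if any(contains_term(tokens, term) for term in terms)}


def group_prevalence(articles):
    """Map each group to (article count, percentage of articles)."""
    counts = Counter()
    for text in articles:
        counts.update(groups_in_article(text))
    total = len(articles)
    if total == 0:
        return {group: (0, 0.0) for group in TERM_GROUPS}
    return {group: (counts[group], round(100 * counts[group] / total, 1))
            for group in TERM_GROUPS}
```

Each article is counted at most once per group, which matches the way the Results report prevalence as the share of articles mentioning a method group (for example, 630 of 757 articles gives 83.2%).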

