In many mainland Chinese universities, undergraduates majoring in English language and applied linguistics are required to write a dissertation of about 5,000 words in English reporting original research. This task poses considerable difficulty not only at the genre or discourse level but also at the lexico-grammatical level. The teaching of academic writing in Chinese universities tends to focus on general discourse-level features such as "move" structures. By contrast, micro-level, form-focused knowledge and skills are comparatively underexplored, and the features that are taught are usually chosen by intuition or arbitrarily. This paper presents a data-driven, pedagogically oriented analysis of a corpus of 78 Chinese undergraduate dissertations and two native-speaker comparison corpora. It focuses on characteristically problematic areas revealed through keyword analyses and complementary qualitative investigation of collocations and word clusters. Most of the overused words and phrases turn out to be function words and high-frequency "common" words, which are typically not the focus of academic writing instruction. These usages are highly patterned rather than random and are therefore, in principle, amenable to teaching through a data-driven pedagogical approach. The paper argues that by systematically deriving potential teaching items from a learner corpus, EAP writing pedagogy can become more needs-based and learner-centered, two conditions that facilitate successful form-focused instruction.