There is evidence, mostly with phones (consonants and vowels), that visual concomitants of articulation facilitate speech perception. Here the visual concomitants of lexical tone are considered. In tone languages, fundamental frequency variations signal lexical meaning. In a word identification experiment with auditory-visual words differing only in tone, Cantonese perceivers performed above chance in a Visual Only condition. A subsequent study showed that word-pair discrimination in noise was better in an Auditory-Visual than in an Auditory Only condition for Cantonese speakers, for speakers of Thai (also a tone language), and even for Australian speakers of a non-tone language. The source of this perceptual information was sought in an OPTOTRAK production study of a Cantonese speaker. Functional Data Analysis (FDA) and Principal Component (PC) extraction suggest that the PCs most salient for distinguishing tones involve rigid motion of the head rather than non-rigid face motion. Results of a final perception study, in which OPTOTRAK output was used to present rigid or non-rigid motion independently in tone-differing or phone-differing conditions, suggest that non-rigid motion is most useful for the discrimination of phones, whereas rigid motion is most useful for the discrimination of tones.