In this article we describe a user-driven adaptive method to control the sonic response of digital musical instruments using information extracted from the timbre of the human voice. The mapping between heterogeneous attributes of the input and output timbres is determined from data collected through machine-listening techniques and then processed by unsupervised machine-learning algorithms. This approach is based on a minimum-loss mapping that hides any synthesizer-specific parameters and that maps the vocal interaction directly to perceptual characteristics of the generated sound. The mapping adapts to the dynamics detected in the voice and maximizes the timbral space covered by the sound synthesizer. The strategies for mapping vocal control to perceptual timbral features and for automating the customization of vocal interfaces for different users and synthesizers, in general, are evaluated through a variety of qualitative and quantitative methods.