Two of the most salient features of music are the blending of pitch and the matching of time. I propose here a possible evolutionary precursor of human music based on a process I call “contagious heterophony”. Heterophony is a form of pitch blending in which individuals generate similar musical lines that are poorly synchronized with one another. A wonderful example can be found in the howling of wolves: each wolf makes a similar call, but the resulting chorus is poorly blended in time. The other major feature of the current hypothesis is contagion. Once one animal starts calling, other members of the group join in through a spreading process. While this type of heterophonic calling is well represented in nature, synchronized polyphony is not. In this article, I discuss evolutionary scenarios by which the human capacity to integrate musical parts in pitch-space and in time may have emerged. In doing so, I mention neuroimaging findings that shed light on the neural mechanisms of vocal imitation and metric entrainment in humans, two key processes underlying musical integration.