Amazon?s Mechanical Turk service makes linguistic experimentation quick, easy, and inexpensive. However, researchers have not been certain about its reliability. In a series of experiments, this paper compares data collected via Mechanical Turk to those obtained using more traditional methods One set of experiments measured the predictability of words in sentences using the Cloze sentence completion task (Taylor, 1953). The correlation between traditional and Turk Cloze scores is high (rho=0.823) and both data sets perform similarly against alternative measures of contextual predictability. Five other experiments on the semantic relatedness of verbs and phrasal verbs (how much is ?lift? part of ?lift up?) manipulate the presence of the sentence context and the composition of the experimental list. The results indicate that Turk data correlate well between experiments and with data from traditional methods (rho up to 0.9), and they show high inter-rater consistency and agreement. We conclude that Mechanical Turk is a reliable source of data for complex linguistic tasks in heavy use by psycholinguists. The paper provides suggestions for best practices in data collection and scrubbing.