A clinician can treat patients empirically, but treatment is greatly enhanced when the correct diagnosis is known. Diagnostic investigations are therefore an important part of modern medicine, and diagnostic studies aim to evaluate the accuracy of a test and to provide information on its utility in the management of the patient. The diagnostic utility of a test is best evaluated by metrics that are expressed in pairs, such as sensitivity and specificity, positive and negative predictive values, and positive and negative likelihood ratios. As with any other study design, a variety of issues can limit the validity of a diagnostic study. Selection and spectrum bias can alter the type of patient enrolled into a study and influence the apparent accuracy of a test. A case‐control design is an extreme example of this, and it usually overestimates the accuracy of a test. Lack of blinding of those administering the new test or the reference standard also tends to give an overly optimistic view of the test's performance. Selecting the most appropriate reference standard against which to compare the new test is also a challenge, especially when no highly accurate test exists.
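As an illustration of the paired metrics named above, the following minimal sketch computes them from a 2×2 table of new-test results against the reference standard. The function name `diagnostic_metrics`, the `DiagnosticMetrics` container, and the example counts are hypothetical and chosen only for illustration. The sketch also assumes that all four cells are non-zero, which avoids division by zero but will not hold in every study.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DiagnosticMetrics:
    """Paired accuracy metrics derived from a 2x2 table (hypothetical container)."""
    sensitivity: float
    specificity: float
    ppv: float          # positive predictive value
    npv: float          # negative predictive value
    lr_positive: float  # positive likelihood ratio
    lr_negative: float  # negative likelihood ratio


def diagnostic_metrics(tp: int, fp: int, fn: int, tn: int) -> DiagnosticMetrics:
    """Compute paired metrics from counts classified against the reference standard.

    tp: test positive, reference positive    fp: test positive, reference negative
    fn: test negative, reference positive    tn: test negative, reference negative
    """
    # Simplifying assumption: every cell is non-zero, so no ratio is undefined.
    if min(tp, fp, fn, tn) <= 0:
        raise ValueError("this sketch requires all four cells to be positive")

    sensitivity = tp / (tp + fn)  # proportion of diseased patients who test positive
    specificity = tn / (tn + fp)  # proportion of non-diseased patients who test negative
    ppv = tp / (tp + fp)          # proportion of positive tests that are truly diseased
    npv = tn / (tn + fn)          # proportion of negative tests that are truly non-diseased
    lr_positive = sensitivity / (1 - specificity)
    lr_negative = (1 - sensitivity) / specificity

    return DiagnosticMetrics(sensitivity, specificity, ppv, npv, lr_positive, lr_negative)


if __name__ == "__main__":
    # Hypothetical counts, not taken from any study.
    print(diagnostic_metrics(tp=90, fp=20, fn=10, tn=80))
```

The predictive values computed this way depend on the proportion of diseased patients in the sample, so they are only meaningful when that proportion reflects the population in which the test will be used. Sensitivity, specificity, and the likelihood ratios do not depend on that proportion in the same way.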