Background. Health economists recommend that when patients rate their own health state using utility and health state preference measures such as the feeling thermometer (FT) and standard gamble (SG), they first rate hypothetical health states (clinical marker states [CMS]). However, there is no evidence that using CMS improves measurement properties. The authors evaluated the validity and responsiveness of the SG and FT with and without administration of the CMS.
Methods. Respiratory rehabilitation improves health-related quality of life in patients with chronic airflow limitation. The authors randomized 84 patients undergoing pulmonary rehabilitation to administration of the FT and SG with (FT+ or SG+) or without (FT- or SG-) CMS before and after a standard 12-week respiratory rehabilitation program. Patients also completed the Health Utilities Index 3 (HUI3), the Chronic Respiratory Questionnaire (CRQ), and the St. George Respiratory Questionnaire (SGRQ) to evaluate longitudinal validity.
Results. Marker state status did not significantly affect baseline scores on either the FT or the SG (FT+ 0.54, FT- 0.60, SG+ 0.68, SG- 0.66, on a scale from 0 [dead] to 1.0 [full health]). The improvement after the rehabilitation program was 0.14 (P < 0.001) in the FT+ group and 0.08 (P = 0.02) in the FT- group (difference between FT+ and FT- = 0.06; 95% confidence interval [CI] = -0.03 to 0.15, P = 0.17). The corresponding improvement was 0.12 (P = 0.009) in the SG+ group and 0.07 (P = 0.11) in the SG- group (difference between SG+ and SG- = 0.05; 95% CI = -0.07 to 0.17, P = 0.39). Correlations of change in the FT+ with the CRQ and SGRQ were slightly, but not significantly, lower than those of change in the FT- with these 2 specific instruments. Correlations of change in the SG with the CRQ and the SGRQ were weaker in patients randomized to SG+ (range 0.00 to 0.17) than in the SG- group (range 0.21 to 0.58). These differences were statistically significant for the CRQ domains of dyspnea and fatigue.
Conclusion. The authors found nonsignificant trends toward superior responsiveness when patients rated hypothetical health states before rating their own health state. Although including hypothetical health states did not significantly influence the validity of the FT, it decreased the longitudinal validity of the SG. This study fails to show a convincing advantage for the use of marker states. Theoretical arguments in favor of marker states cannot stand alone without empirical support.
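The between-group differences in the Results are reported with 95% CIs and P values, and the two can be cross-checked against each other. The sketch below is an illustration only, not the authors' analysis: it assumes a symmetric interval and a normal approximation (the abstract does not state which test was used), and the function names `normal_cdf` and `check_interval` are hypothetical.

```python
import math


def normal_cdf(z):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def check_interval(diff, lower, upper, z_crit=1.96):
    """Back out the standard error from a symmetric 95% CI and
    return (se, z, two_sided_p) under a normal approximation.

    Assumption: the interval is diff +/- z_crit * se. The original
    analysis may have used a t distribution or another method.
    """
    se = (upper - lower) / (2.0 * z_crit)
    z = diff / se
    p = 2.0 * (1.0 - normal_cdf(abs(z)))
    return se, z, p


# Values copied from the Results: (difference, CI lower, CI upper, reported P)
reported = {
    "FT+ vs FT-": (0.06, -0.03, 0.15, 0.17),
    "SG+ vs SG-": (0.05, -0.07, 0.17, 0.39),
}

for label, (diff, lower, upper, p_reported) in reported.items():
    se, z, p = check_interval(diff, lower, upper)
    print(f"{label}: SE ~ {se:.3f}, z ~ {z:.2f}, "
          f"approx P = {p:.2f} (reported P = {p_reported})")
```

Under these assumptions the approximate P values come out near 0.19 and 0.41, close to but not identical with the reported 0.17 and 0.39. A gap of that size is what one would expect from inputs rounded to two decimals and a possibly different test, so the reported intervals and P values look mutually consistent.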