Introduction
The medical education community has implemented writing exercises that foster critical analysis and nurture reflective capacity. The REFLECT rubric (Wald et al. 2012) was developed to address the challenge of assessing these written reflections. The objective of this replication study is to explore the reproducibility of the reliability characteristics reported by the REFLECT developers.

Methods
Five raters used the REFLECT rubric to evaluate narratives written by medical students and experienced clinicians. Inter-rater reliability across rubric domains was determined via the intraclass correlation coefficient, and internal consistency was determined via Cronbach’s alpha.

Results
Intraclass correlation coefficients demonstrated poor reliability for ratings across all rubric criteria (0.350–0.452), including overall ratings of narratives (0.448). Internal consistency among scale items was also poor across all criteria (0.529–0.621).

Discussion
We did not replicate the reliability characteristics presented in the original REFLECT article. We consider these findings with respect to the contextual differences between our study and that of Wald and colleagues, in particular the possible influence that repeated testing and refinement of the tool may have had on their reviewers’ shared understanding of its use. We conclude with a discussion of the challenges inherent to reductionist approaches to assessing reflection.
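
For readers unfamiliar with the two reliability statistics named in the Methods, the following is a minimal sketch of how each can be computed from a score matrix. It is not the authors' analysis code. The abstract does not state which form of the intraclass correlation coefficient was used, so the two-way random-effects, absolute-agreement, single-rater form, ICC(2,1), is an assumption here, and the function names `cronbach_alpha` and `icc2_1` are hypothetical.

```python
import numpy as np


def cronbach_alpha(scores):
    """Cronbach's alpha; rows = narratives, columns = rubric items."""
    x = np.asarray(scores, dtype=float)
    k = x.shape[1]                          # number of items
    item_vars = x.var(axis=0, ddof=1)       # sample variance of each item
    total_var = x.sum(axis=1).var(ddof=1)   # variance of the summed score
    return (k / (k - 1)) * (1.0 - item_vars.sum() / total_var)


def icc2_1(ratings):
    """ICC(2,1), assumed form; rows = narratives, columns = raters."""
    x = np.asarray(ratings, dtype=float)
    n, k = x.shape
    grand = x.mean()
    # Two-way ANOVA sums of squares without replication.
    ss_rows = k * ((x.mean(axis=1) - grand) ** 2).sum()
    ss_cols = n * ((x.mean(axis=0) - grand) ** 2).sum()
    ss_err = ((x - grand) ** 2).sum() - ss_rows - ss_cols
    ms_rows = ss_rows / (n - 1)
    ms_cols = ss_cols / (k - 1)
    ms_err = ss_err / ((n - 1) * (k - 1))
    return (ms_rows - ms_err) / (
        ms_rows + (k - 1) * ms_err + k * (ms_cols - ms_err) / n
    )
```

Either function takes a complete matrix with no missing ratings. A consistency-type or average-rater ICC would use the same mean squares with a different final ratio, so the values reported in the Results should not be expected to match this particular form unless the study used it.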