Sample size determination for bibliographic retrieval studies
- Additional Document Info
- View All
BACKGROUND: Research for developing search strategies to retrieve high-quality clinical journal articles from MEDLINE is expensive and time-consuming. The objective of this study was to determine the minimal number of high-quality articles in a journal subset that would need to be hand-searched to update or create new MEDLINE search strategies for treatment, diagnosis, and prognosis studies. METHODS: The desired width of the 95% confidence intervals (W) for the lowest sensitivity among existing search strategies was used to calculate the number of high-quality articles needed to reliably update search strategies. New search strategies were derived in journal subsets formed by 2 approaches: random sampling of journals and top journals (having the most high-quality articles). The new strategies were tested in both the original large journal database and in a low-yielding journal (having few high-quality articles) subset. RESULTS: For treatment studies, if W was 10% or less for the lowest sensitivity among our existing search strategies, a subset of 15 randomly selected journals or 2 top journals were adequate for updating search strategies, based on each approach having at least 99 high-quality articles. The new strategies derived in 15 randomly selected journals or 2 top journals performed well in the original large journal database. Nevertheless, the new search strategies developed using the random sampling approach performed better than those developed using the top journal approach in a low-yielding journal subset. For studies of diagnosis and prognosis, no journal subset had enough high-quality articles to achieve the expected W (10%). CONCLUSION: The approach of randomly sampling a small subset of journals that includes sufficient high-quality articles is an efficient way to update or create search strategies for high-quality articles on therapy in MEDLINE. The concentrations of diagnosis and prognosis articles are too low for this approach.
has subject area