MEDLINE: Difference between revisions

From Citizendium
imported>Robert Badgett
Line 72:


===Citation analysis or PageRank===
There are conflicting results over the role of ranking results based on citation counts or [[PageRank]]. A study using [[Google]]'s own [[PageRank]] found PubMed's clinical queries to be better.<ref name="pmid17603909">{{cite journal |author=Haase A, Follmann M, Skipka G, Kirchner H |title=Developing search strategies for clinical practice guidelines in SUMSearch and Google Scholar and assessing their retrieval performance |journal=BMC Med Res Methodol |volume=7 |issue= |pages=28 |year=2007 |pmid=17603909 |doi=10.1186/1471-2288-7-28}}</ref> However, a comparative study found better results for a metric analogous to PageRank for biomedical journals based on:<ref name="pmid16221938">{{cite journal |author=Bernstam EV, Herskovic JR, Aphinyanaphongs Y, Aliferis CF, Sriram MG, Hersh WR |title=Using citation data to improve retrieval from MEDLINE |journal=J Am Med Inform Assoc |volume=13 |issue=1 |pages=96–105 |year=2006 |pmid=16221938 |doi=10.1197/jamia.M1909}}</ref><ref name="pmid16779053">{{cite journal |author=Herskovic JR, Bernstam EV |title=Using incomplete citation data for MEDLINE results ranking |journal=AMIA Annu Symp Proc |volume= |issue= |pages=316–20 |year=2005 |pmid=16779053 |doi=}} [http://www.pubmedcentral.nih.gov/articlerender.fcgi?tool=citizendium&pubmedid=16779053 PubMed Central]</ref>
There are conflicting results over the role of ranking results based on citation counts or [[PageRank]]. A study using [[Google]]'s own [[PageRank]] found PubMed's clinical queries to be better.<ref name="pmid17603909">{{cite journal |author=Haase A, Follmann M, Skipka G, Kirchner H |title=Developing search strategies for clinical practice guidelines in SUMSearch and Google Scholar and assessing their retrieval performance |journal=BMC Med Res Methodol |volume=7 |issue= |pages=28 |year=2007 |pmid=17603909 |doi=10.1186/1471-2288-7-28}}</ref> However, a comparative study found better results for a metric analogous to PageRank for biomedical journals based on:<ref name="pmid16221938">{{cite journal |author=Bernstam EV, Herskovic JR, Aphinyanaphongs Y, Aliferis CF, Sriram MG, Hersh WR |title=Using citation data to improve retrieval from MEDLINE |journal=J Am Med Inform Assoc |volume=13 |issue=1 |pages=96–105 |year=2006 |pmid=16221938 |doi=10.1197/jamia.M1909}} ''This study may have been biased towards ranking systems because all retrieval methods analyzed a "preliminary result set using simple PubMed queries"''</ref><ref name="pmid16779053">{{cite journal |author=Herskovic JR, Bernstam EV |title=Using incomplete citation data for MEDLINE results ranking |journal=AMIA Annu Symp Proc |volume= |issue= |pages=316–20 |year=2005 |pmid=16779053 |doi=}} [http://www.pubmedcentral.nih.gov/articlerender.fcgi?tool=citizendium&pubmedid=16779053 PubMed Central]</ref>


:<math>\text{PageRank for the index article} = \frac{\text{the number of articles citing the index article }}{\text{the number of articles cited by the index article}}</math>

Revision as of 01:48, 27 June 2011

This article is developing and not approved.

According to the U.S. National Library of Medicine, "MEDLINE® (Medical Literature Analysis and Retrieval System Online) is the U.S. National Library of Medicine's® (NLM) premier bibliographic database that contains over 16 million references to journal articles in life sciences with a concentration on biomedicine. A distinctive feature of MEDLINE is that the records are indexed with NLM's Medical Subject Headings (MeSH®)."[1]

PubMed is the National Library of Medicine's free online search system for MEDLINE.

Structure

MEDLINE® (Medical Literature Analysis and Retrieval System Online) is a database of predominantly biomedical bibliographic citations maintained by the U.S. National Library of Medicine (NLM).[2] Each citation includes bibliographic data, abstract if available, links to full text of the article and keywords.

The NLM describes its process for selecting journals for MEDLINE in a fact sheet.[3]

The keywords are indexed with the NLM's Medical Subject Headings (MeSH®)[4] and subheadings.[5] Human indexers are assisted in assigning MeSH terms by the Medical Text Indexer (MTI).[6]

The important MeSH terms “Randomized Controlled Trial” and “Controlled Clinical Trial” were introduced in 1991 and 1995, respectively.[7] The Cochrane Collaboration helps MEDLINE correctly retag articles with these terms.[7]

The National Library of Medicine's Indexing Initiative is investigating whether the assignment of MeSH terms can be fully or semi-automated.[8]

PubMed provides relevance feedback with its "See related" feature.[9][10]

Methods to improve searching MEDLINE

There is much ongoing research into improving MEDLINE search results.

Citation tracking

Citation tracking may help identify relevant studies in MEDLINE.[11][12]

Clustering

Clustering search results may help.[13]

Filters (hedges)

MEDLINE filters, also called hedges, are Boolean combinations of search terms, both text words and MeSH terms, that have been optimized to retrieve a particular type of article. Many filters have been developed by the Hedges Team and are available as Clinical Queries at PubMed. The Clinical Queries may improve the quality of the articles retrieved.[14]

Filters have been criticized as imperfect; for example, methodological filters for diagnostic accuracy studies can omit relevant studies.[15]

Filters for article types

Evolution of search filters

1994[16]
  Treatment
    Strategy with high sensitivity: randomized controlled trial[Publication Type] OR drug therapy[MeSH Subheading] OR therapeutic use[MeSH Subheading] OR random*[Title/Abstract]
    Strategy with high specificity: placebo*[Title/Abstract] OR (double[Title/Abstract] AND blind*[Title/Abstract])
  Diagnosis
    (no strategies listed)
2005[17]
  Treatment
    Strategy with high sensitivity: (clinical[Title/Abstract] AND trial[Title/Abstract]) OR clinical trials[MeSH Terms] OR clinical trial[Publication Type] OR random*[Title/Abstract] OR random allocation[MeSH Terms] OR therapeutic use[MeSH Subheading]
    Strategy with high specificity: randomized controlled trial[Publication Type] OR (randomized[Title/Abstract] AND controlled[Title/Abstract] AND trial[Title/Abstract])
  Diagnosis
    Strategy with high sensitivity: sensitiv*[Title/Abstract] OR sensitivity and specificity[MeSH Terms] OR diagnos*[Title/Abstract] OR diagnosis[MeSH:noexp] OR diagnostic * [MeSH:noexp] OR diagnosis,differential[MeSH:noexp] OR diagnosis[Subheading:noexp]
    Strategy with high specificity: specificity[Title/Abstract]

Many MEDLINE filters, including one for identifying randomized controlled trials, have been developed by the Hedges Team,[17] supported by a grant from the National Library of Medicine.[18] The filters were first published in 1994[16] and were revised and republished in 2005.[19]

Examples include filters for randomized controlled trials[20] and systematic reviews[21].
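As a rough illustration of how such a filter is used, a topic search can be combined with a filter by a Boolean AND. The sketch below is not the Hedges Team's code: the two filter strings are the 2005 high-specificity strategies listed in this section, while the helper name `apply_filter` and the example topic are hypothetical.

```python
# Filter strings are the 2005 high-specificity strategies quoted in this
# section; the helper name and the example topic are hypothetical.
TREATMENT_SPECIFIC_2005 = (
    "randomized controlled trial[Publication Type] OR "
    "(randomized[Title/Abstract] AND controlled[Title/Abstract] "
    "AND trial[Title/Abstract])"
)
DIAGNOSIS_SPECIFIC_2005 = "specificity[Title/Abstract]"


def apply_filter(topic: str, filter_expr: str) -> str:
    """AND a topic query with a methodological filter.

    Both sides are parenthesized so that OR terms inside the filter
    cannot bind to the topic.
    """
    return f"({topic}) AND ({filter_expr})"


query = apply_filter("asthma[MeSH Terms]", TREATMENT_SPECIFIC_2005)
print(query)
```

The resulting string can be pasted into the PubMed search box; swapping in a high-sensitivity strategy trades precision for recall.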

Filters for subject types

Filters have been developed for articles about kidney disease,[22] dentistry,[23] and specific age ranges.[24]

Relevancy ranking

Although MEDLINE is usually searched for exact matches using Boolean terms, relevancy ranking has been studied. In an early comparison, relevancy ranking performed well; however, the Boolean version of MEDLINE did not fully use MeSH terms.[25][26]

eTBLAST uses text mining to search for similar publications.[27][28]

Citation analysis or PageRank

Studies conflict on the value of ranking results by citation counts or PageRank. A study using Google's own PageRank found PubMed's clinical queries to be better.[29] However, a comparative study found better results for a metric analogous to PageRank for biomedical journals based on:[30][31]

:<math>\text{PageRank for the index article} = \frac{\text{number of articles citing the index article}}{\text{number of articles cited by the index article}}</math>
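The ratio used by this metric (articles citing the index article divided by articles cited by the index article) can be computed from a citation graph. The following is only a sketch under stated assumptions, not the cited studies' implementation: the graph is a toy example, the function name is hypothetical, and returning `None` when the index article cites nothing is an assumption, since the ratio is undefined in that case.

```python
from typing import Dict, Optional, Set


def citation_ratio(index_id: str, cites: Dict[str, Set[str]]) -> Optional[float]:
    """Ratio of incoming to outgoing citations for one article.

    `cites` maps each article ID to the set of article IDs it cites.
    Returns None when the index article cites nothing (assumption:
    the ratio is treated as undefined rather than infinite).
    """
    cited_by_index = len(cites.get(index_id, set()))
    citing_index = sum(1 for refs in cites.values() if index_id in refs)
    if cited_by_index == 0:
        return None
    return citing_index / cited_by_index


# Toy citation graph (hypothetical article IDs).
cites = {
    "A": {"B", "C"},
    "B": {"C"},
    "C": set(),
    "D": {"A", "B", "C"},
}

for article in sorted(cites):
    print(article, citation_ratio(article, cites))
```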

Machine learning

Machine learning methods, in which the search engine seeks articles that most resemble a set of included articles, may be more accurate than Boolean methods (see EBMSearch below).[32][33] However, the study by Aphinyanaphongs compared machine learning only to the 1994 Boolean filters.[32]

Machine learning may be improved by an ensemble learning method that uses stacked generalization (or stacking) to emphasize the role of UMLS concepts and title words.[34]

Machine learning may[35][33] or may not[30] be more accurate than citation-based strategies. Citation or link strategies may improve upon text categorization.[36]

A machine learning model built to categorize articles against one gold standard may not work as well in another setting.[35]

Research methods for comparative studies

For more information, see: Information retrieval.

In comparing the information retrieval of search strategies, there are two experimental methods.

  1. If a complete test collection of articles is available that is already divided into articles meeting inclusion criteria and articles not meeting criteria, then each strategy is compared for its ability to identify the articles meeting criteria (sensitivity) and to exclude the articles not meeting criteria (specificity). Sensitivity is also called "recall".[37]
  2. If only a partial test collection is available, consisting solely of articles meeting inclusion criteria (for example, articles meeting inclusion criteria for ACP Journal Club[32], articles included in a systematic review of a clinical topic, or articles in an annotated bibliography[31]), then the sensitivity is again the proportion of relevant articles identified by the strategy. However, the specificity is not computable. Instead, one of several related measures is calculated. These measures are all based on the positive predictive value (PPV) of the strategy. As with the PPV used in diagnostic testing, the PPV varies directly with the prevalence of relevant articles in the collection and thus is not stable across prevalences.[38]
    1. Precision is "the proportion of retrieved articles that meet criteria" and thus is the same as the PPV.[39][40]
    2. Number Needed to Read (NNR) is 1/precision and is "how many papers in a journal have to be read to find one of adequate clinical quality and relevance."[41][42][38][29] Of note, the NNR has been proposed as a metric to help libraries to decide which journals to subscribe to.[41]
    3. Hit curve "is the number of important articles among the first n results."[43][30]
    4. An 11-point precision-recall graph is similar to a receiver operating characteristic curve.[32]
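The measures above can be made concrete with a small worked example. This is a minimal sketch on a hypothetical ten-article collection; the function names and article IDs are illustrative assumptions. The last function uses the standard relationship between PPV, sensitivity, specificity and prevalence to show why precision is not stable across collections.

```python
from typing import List, Set


def sensitivity(retrieved: Set[str], relevant: Set[str]) -> float:
    """Proportion of relevant articles that the strategy retrieved (recall)."""
    return len(retrieved & relevant) / len(relevant)


def specificity(retrieved: Set[str], relevant: Set[str], collection: Set[str]) -> float:
    """Proportion of non-relevant articles that the strategy excluded."""
    non_relevant = collection - relevant
    return len(non_relevant - retrieved) / len(non_relevant)


def precision(retrieved: Set[str], relevant: Set[str]) -> float:
    """Proportion of retrieved articles that are relevant (same as PPV)."""
    return len(retrieved & relevant) / len(retrieved)


def number_needed_to_read(retrieved: Set[str], relevant: Set[str]) -> float:
    """NNR = 1 / precision."""
    return 1 / precision(retrieved, relevant)


def hit_curve(ranked: List[str], relevant: Set[str]) -> List[int]:
    """Number of relevant articles among the first n results, for each n."""
    hits, curve = 0, []
    for article in ranked:
        hits += article in relevant
        curve.append(hits)
    return curve


def ppv_from_prevalence(sens: float, spec: float, prevalence: float) -> float:
    """PPV implied by a given sensitivity, specificity and prevalence."""
    true_pos = sens * prevalence
    false_pos = (1 - spec) * (1 - prevalence)
    return true_pos / (true_pos + false_pos)


# Hypothetical complete test collection: 10 articles, 4 of them relevant.
collection = {f"a{i}" for i in range(1, 11)}
relevant = {"a1", "a2", "a3", "a4"}
ranked = ["a1", "a5", "a2", "a6", "a3"]  # ranked output of some strategy
retrieved = set(ranked)

print(sensitivity(retrieved, relevant))             # 3 of 4 relevant found
print(specificity(retrieved, relevant, collection)) # 4 of 6 non-relevant excluded
print(precision(retrieved, relevant))               # 3 of 5 retrieved are relevant
print(number_needed_to_read(retrieved, relevant))
print(hit_curve(ranked, relevant))
# Same sensitivity and specificity, lower prevalence: the PPV drops.
print(ppv_from_prevalence(0.75, 4 / 6, 0.1))
```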

Methods to access MEDLINE

There are many third-party interfaces to search MEDLINE, such as Ovid.[44] The National Library of Medicine's own search interface is PubMed (http://pubmed.gov).

PubMed

For more information, see: PubMed.

PubMed (http://pubmed.gov) is the National Library of Medicine's own free Internet access to MEDLINE. PubMed has been freely available since 1997.

EBM Search

EBM Search (http://www.ahsl.arizona.edu/ebmsearch/) is a federated medical search engine.[45]

EBMSearch

EBMSearch (http://ebmsearch.org/) maintains its own copy of MEDLINE and uses machine learning to rank articles.[32]

eTBLAST

eTBLAST uses text mining to search for similar publications.[27][28]

GoPubMed

GoPubMed (http://www.GoPubMed.org/) applies social networking to MEDLINE.[46]

HubMed

HubMed (http://www.hubmed.org/) does not maintain its own copy of MEDLINE, but rather uses PubMed's EUtils web service to retrieve MEDLINE records stored at PubMed.[47]
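HubMed's own code is not shown here; the following is only a sketch of how any client might build a search request for the E-utilities ESearch endpoint. The endpoint and the `db`, `term`, `retmax` and `retmode` parameters belong to NCBI's E-utilities, while the helper name `esearch_url` and the example query are hypothetical. No network request is made.

```python
from urllib.parse import parse_qs, urlencode, urlsplit

# NCBI E-utilities ESearch endpoint.
ESEARCH = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"


def esearch_url(term: str, retmax: int = 20) -> str:
    """Build an ESearch URL that asks PubMed for the IDs matching `term`."""
    params = {
        "db": "pubmed",     # search the PubMed database
        "term": term,       # Boolean query, same syntax as the PubMed search box
        "retmax": retmax,   # maximum number of IDs to return
        "retmode": "json",  # ask for a JSON response
    }
    return f"{ESEARCH}?{urlencode(params)}"


term = "asthma[MeSH Terms] AND randomized controlled trial[Publication Type]"
url = esearch_url(term, retmax=5)
print(url)

# Round trip: decoding the query string recovers the original term.
print(parse_qs(urlsplit(url).query)["term"][0])
```

A client would then fetch this URL and pass the returned IDs to a second E-utilities call to retrieve the records themselves.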

Ovid

SUMSearch

SUMSearch (http://sumsearch.uthscsa.edu/) is a federated medical search engine. It does not maintain its own copy of MEDLINE, but rather queries PubMed and revises searches if too few or too many citations are retrieved. At the same time, SUMSearch queries the National Guideline Clearinghouse, DARE, Wikipedia, and other resources.
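The idea of revising a search when the number of citations is unsuitable can be sketched as a simple loop. This is not SUMSearch's actual algorithm, which is not described here; it is a hypothetical illustration in which progressively narrower filters are added while too many citations are returned. The function name, the thresholds and the hit counts are all invented for the example.

```python
from typing import Callable, List, Tuple


def revise_until_manageable(
    topic: str,
    filters: List[str],
    count_hits: Callable[[str], int],
    min_hits: int = 1,
    max_hits: int = 50,
) -> Tuple[str, int]:
    """Narrow a search step by step while it returns too many citations.

    `filters` is ordered from broadest to narrowest. If a narrower query
    would return fewer than `min_hits`, the previous query is kept.
    """
    query = topic
    count = count_hits(query)
    for filter_expr in filters:
        if count <= max_hits:
            break  # already manageable
        candidate = f"({topic}) AND ({filter_expr})"
        candidate_count = count_hits(candidate)
        if candidate_count < min_hits:
            break  # narrowing overshot; keep the previous query
        query, count = candidate, candidate_count
    return query, count


# Invented hit counts standing in for a live search service.
fake_counts = {
    "asthma": 5000,
    "(asthma) AND (random*[Title/Abstract])": 400,
    "(asthma) AND (randomized controlled trial[Publication Type])": 30,
}

final_query, final_count = revise_until_manageable(
    "asthma",
    ["random*[Title/Abstract]", "randomized controlled trial[Publication Type]"],
    lambda q: fake_counts.get(q, 0),
)
print(final_query, final_count)
```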

References

  1. MEDLINE Fact Sheet. National Library of Medicine. Retrieved on 2008-01-22.
  2. National Library of Medicine. MEDLINE Fact Sheet. Retrieved on 2007-11-09.
  3. Anonymous (2007). MEDLINE® Journal Selection Fact Sheet. National Library of Medicine. Retrieved on 2010-04-04.
  4. National Library of Medicine. Medical Subject Headings (MESH®) Fact Sheet. Retrieved on 2007-11-09.
  5. Anonymous (2008). Qualifiers - 2008. National Library of Medicine. Retrieved on 2008-03-19.
  6. Anonymous. Medical Text Indexer (MTI). National Library of Medicine
  7. Glanville JM, Lefebvre C, Miles JN, Camosso-Stefinovic J (2006). "How to identify randomized controlled trials in MEDLINE: ten years on". J Med Libr Assoc 94 (2): 130–6. PMID 16636704. PMC 1435857.
  8. National Library of Medicine. Indexing Initiative. Retrieved on 2007-11-25.
  9. Lin J, Wilbur WJ (2007). "PubMed related articles: a probabilistic topic-based model for content similarity". BMC Bioinformatics 8: 423. DOI:10.1186/1471-2105-8-423. PMID 17971238. PMC 2212667.
  10. Anonymous (2011). PubMed Help: Computation of Related Citations.
  11. Bakkalbasi N, Bauer K, Glover J, Wang L (2006). "Three options for citation tracking: Google Scholar, Scopus and Web of Science". Biomed Digit Libr 3: 7. DOI:10.1186/1742-5581-3-7. PMID 16805916.
  12. Kuper H, Nicholson A, Hemingway H (2006). "Searching for observational studies: what does citation tracking add to PubMed? A case study in depression and coronary heart disease". BMC Med Res Methodol 6: 4. DOI:10.1186/1471-2288-6-4. PMID 16483366.
  13. Lin Y, Li W, Chen K, Liu Y (2007). "A document clustering and ranking system for exploring MEDLINE citations". J Am Med Inform Assoc 14 (5): 651–61. DOI:10.1197/jamia.M2215. PMID 17600104.
  14. Lokker C, Haynes RB, Wilczynski NL, McKibbon KA, Walter SD (2011). "Retrieval of diagnostic and treatment studies for clinical use through PubMed and PubMed's Clinical Queries filters". J Am Med Inform Assoc. DOI:10.1136/amiajnl-2011-000233. PMID 21680559.
  15. Leeflang MM, Scholten RJ, Rutjes AW, Reitsma JB, Bossuyt PM (2006). "Use of methodological search filters to identify diagnostic accuracy studies can lead to the omission of relevant studies". J Clin Epidemiol 59 (3): 234–40. DOI:10.1016/j.jclinepi.2005.07.014. PMID 16488353.
  16. Haynes RB, Wilczynski N, McKibbon KA, Walker CJ, Sinclair JC (1994). "Developing optimal search strategies for detecting clinically sound studies in MEDLINE". J Am Med Inform Assoc 1 (6): 447–58. PMID 7850570. PMC 116228.
  17. Hedges Team. Search Strategies. Retrieved on 2011-03-15.
  18. Project Information - NIH RePORTER – NIH Research Portfolio Online Reporting Tool Expenditures and Results. Retrieved on 2007-11-25.
  19. Haynes RB, McKibbon KA, Wilczynski NL, Walter SD, Werre SR, Hedges Team (2005). "Optimal search strategies for retrieving scientifically strong studies of treatment from Medline: analytical survey". BMJ 330 (7501): 1179. DOI:10.1136/bmj.38446.498542.8F. PMID 15894554. PMC 558012.
  20. McKibbon KA, Wilczynski NL, Haynes RB (2009). "Retrieving randomized controlled trials from MEDLINE: a comparison of 38 published search filters". Health Info Libr J 26 (3): 187–202. DOI:10.1111/j.1471-1842.2008.00827.x. PMID 19712211.
  21. Wilczynski NL, Haynes RB (2009). "Consistency and accuracy of indexing systematic review articles and meta-analyses in MEDLINE". Health Info Libr J 26 (3): 203–10. DOI:10.1111/j.1471-1842.2008.00823.x. PMID 19712212.
  22. Garg AX, Iansavichus AV, Wilczynski NL, Kastner M, Baier LA, Shariff SZ et al. (2009). "Filtering Medline for a clinical discipline: diagnostic test assessment framework". BMJ 339: b3435. DOI:10.1136/bmj.b3435. PMID 19767336.
  23. Niederman R, Chen L, Murzyn L, Conway S. Benchmarking the dental randomised controlled literature on MEDLINE. Evidence-Based Dentistry. 2002;3:5-9 DOI:10.1038/sj/ebd/4600095
  24. Kastner M, Wilczynski NL, Walker-Dilks C, McKibbon KA, Haynes B (2006). "Age-specific search strategies for Medline". J Med Internet Res 8 (4): e25. DOI:10.2196/jmir.8.4.e25. PMID 17213044. PMC 1794003.
  25. Hersh WR, Hickam DH (1992). "A comparison of retrieval effectiveness for three methods of indexing medical literature". Am. J. Med. Sci. 303 (5): 292–300. PMID 1580316.
  26. Hersh WR, Hickam DH, Haynes RB, McKibbon KA (1994). "A performance and failure analysis of SAPHIRE with a MEDLINE test collection". J Am Med Inform Assoc 1 (1): 51–60. PMID 7719787.
  27. Errami M, Wren JD, Hicks JM, Garner HR (2007). "eTBLAST: a web server to identify expert reviewers, appropriate journals and similar publications". Nucleic Acids Res 35 (Web Server issue): W12–5. DOI:10.1093/nar/gkm221. PMID 17452348. PMC 1933238.
  28. Lewis J, Ossowski S, Hicks J, Errami M, Garner HR (2006). "Text similarity: an alternative way to search MEDLINE". Bioinformatics 22 (18): 2298–304. DOI:10.1093/bioinformatics/btl388. PMID 16926219.
  29. Haase A, Follmann M, Skipka G, Kirchner H (2007). "Developing search strategies for clinical practice guidelines in SUMSearch and Google Scholar and assessing their retrieval performance". BMC Med Res Methodol 7: 28. DOI:10.1186/1471-2288-7-28. PMID 17603909.
  30. Bernstam EV, Herskovic JR, Aphinyanaphongs Y, Aliferis CF, Sriram MG, Hersh WR (2006). "Using citation data to improve retrieval from MEDLINE". J Am Med Inform Assoc 13 (1): 96–105. DOI:10.1197/jamia.M1909. PMID 16221938. This study may have been biased towards ranking systems because all retrieval methods analyzed a "preliminary result set using simple PubMed queries".
  31. Herskovic JR, Bernstam EV (2005). "Using incomplete citation data for MEDLINE results ranking". AMIA Annu Symp Proc: 316–20. PMID 16779053. PubMed Central.
  32. Aphinyanaphongs Y, Tsamardinos I, Statnikov A, Hardin D, Aliferis CF (2005). "Text categorization models for high-quality article retrieval in internal medicine". J Am Med Inform Assoc 12 (2): 207–16. DOI:10.1197/jamia.M1641. PMID 15561789.
  33. Fu LD, Wang L, Aphinyanagphongs Y, Aliferis CF (2007). "A comparison of impact factor, clinical query filters, and pattern recognition query filters in terms of sensitivity to topic". Stud Health Technol Inform 129 (Pt 1): 716–20. PMID 17911810.
  34. Kilicoglu H, Demner-Fushman D, Rindflesch TC, Wilczynski NL, Haynes RB (2009). "Towards automatic recognition of scientifically rigorous clinical research evidence". J Am Med Inform Assoc 16 (1): 25–31. DOI:10.1197/jamia.M2996. PMID 18952929. PMC 2605595.
  35. Aphinyanaphongs Y, Statnikov A, Aliferis CF (2006). "A comparison of citation metrics to machine learning filters for the identification of high quality MEDLINE documents". J Am Med Inform Assoc 13 (4): 446–55. DOI:10.1197/jamia.M2031. PMID 16622165. PMC 1513679.
  36. Lin J (2008). "PageRank without hyperlinks: reranking with PubMed related article networks for biomedical text retrieval". BMC Bioinformatics 9: 270. DOI:10.1186/1471-2105-9-270. PMID 18538027. PMC 2442104.
  37. Hersh, William R. (2008). Information Retrieval: A Health and Biomedical Perspective (Health Informatics). Berlin: Springer. ISBN 0-387-78702-X.
  38. Bachmann LM, Coray R, Estermann P, Ter Riet G (2002). "Identifying diagnostic studies in MEDLINE: reducing the number needed to read". J Am Med Inform Assoc 9 (6): 653–8. PMID 12386115.
  39. Haynes RB, Wilczynski NL (2004). "Optimal search strategies for retrieving scientifically strong studies of diagnosis from Medline: analytical survey". BMJ 328 (7447): 1040. DOI:10.1136/bmj.38068.557998.EE. PMID 15073027.
  40. Zhang L, Ajiferuke I, Sampson M (2006). "Optimizing search strategies to identify randomized controlled trials in MEDLINE". BMC Med Res Methodol 6: 23. DOI:10.1186/1471-2288-6-23. PMID 16684359. PMC 1488863.
  41. Toth B, Gray JA, Brice A (2005). "The number needed to read-a new measure of journal value". Health Info Libr J 22 (2): 81–2. DOI:10.1111/j.1471-1842.2005.00568.x. PMID 15910578.
  42. McKibbon KA, Wilczynski NL, Haynes RB (2004). "What do evidence-based secondary journals tell us about the publication of clinically important articles in primary healthcare journals?". BMC Med 2: 33. DOI:10.1186/1741-7015-2-33. PMID 15350200.
  43. Herskovic JR, Iyengar MS, Bernstam EV (2007). "Using hit curves to compare search algorithm performance". J Biomed Inform 40 (2): 93–9. DOI:10.1016/j.jbi.2005.12.007. PMID 16469545.
  44. Anonymous. MEDLINE® - Ovid's MEDLINE. Retrieved on 2007-11-09.
  45. Bracke PJ, Howse DK, Keim SM (April 2008). "Evidence-based Medicine Search: a customizable federated search engine". J Med Libr Assoc 96 (2): 108–13. DOI:10.3163/1536-5050.96.2.108. PMID 18379665. PMC 2268222.
  46. Doms A, Schroeder M (July 2005). "GoPubMed: exploring PubMed with the Gene Ontology". Nucleic acids research 33 (Web Server issue): W783–6. DOI:10.1093/nar/gki470. PMID 15980585. PMC 1160231.
  47. Eaton AD (July 2006). "HubMed: a web-based biomedical literature search interface". Nucleic acids research 34 (Web Server issue): W745–7. DOI:10.1093/nar/gkl037. PMID 16845111. PMC 1538859.

External links