Document (#22387)

Author
Qin, J.
Title
Semantic similarities between a keyword database and a controlled vocabulary database : an investigation in the antibiotic resistance literature
Source
Journal of the American Society for Information Science. 51(2000) no.2, pp.166-180
Year
2000
Abstract
The 'KeyWords Plus' in the Science Citation Index database represents an approach to combining citation and semantic indexing in describing the document content. This paper explores the similarities and dissimilarities between citation-semantic and analytic indexing. The dataset consisted of over 400 matching records in the SCI and MEDLINE databases on antibiotic resistance in pneumonia. The degree of similarity in indexing terms was found to vary on a scale from completely different to completely identical, with various levels in between. The within-document similarity in the 2 databases was measured by a variation on the Jaccard coefficient, the Inclusion Index. The average inclusion coefficient was 0.4134 for SCI and 0.3371 for MEDLINE. The 20 terms occurring most frequently in each database were identified. The 2 groups of terms shared the terms that constitute the 'intellectual base' for the subject. Conceptual similarity was analyzed through scatterplots of matching and nonmatching terms vs. partially identical and broader/narrower terms. The study also found that the two databases differed in assigning terms in various semantic categories. Implications of this research and further studies are suggested.
Theme
Indexing studies
Field
Pharmacy
Object
Science Citation Index
Medline
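
The abstract names the Jaccard coefficient and a variation on it, the Inclusion Index, but does not give the formula. The sketch below is an illustration only. It assumes an asymmetric inclusion measure (the share of one database's terms that also appear in the other), which would be consistent with separate averages being reported for SCI and MEDLINE but is not confirmed by the record. The term sets are invented for the example, and only exact matches are counted, so the partial and broader/narrower matches mentioned in the abstract are not modeled.

```python
def jaccard(a, b):
    """Symmetric overlap: shared terms divided by all distinct terms."""
    a, b = set(a), set(b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def inclusion(a, b):
    """Assumed asymmetric variant: share of the terms in `a` also found in `b`."""
    a, b = set(a), set(b)
    return len(a & b) / len(a) if a else 0.0


# Hypothetical index terms for one record as it might appear in each database.
keywords_plus = {
    "streptococcus pneumoniae", "penicillin resistance",
    "children", "infections",
}
mesh_terms = {
    "streptococcus pneumoniae", "penicillin resistance",
    "humans", "drug resistance, microbial", "pneumonia",
}

print(jaccard(keywords_plus, mesh_terms))    # 2 shared terms out of 7 distinct
print(inclusion(keywords_plus, mesh_terms))  # 2 of the 4 keyword terms
print(inclusion(mesh_terms, keywords_plus))  # 2 of the 5 subject headings
```

Under this assumed definition the two inclusion values differ whenever the two term sets differ in size, whereas the Jaccard coefficient yields a single value per record pair.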

Similar documents (content)

  1. Tudhope, D.; Taylor, C.: A unified similarity coefficient for navigating through multi-dimensional information (1996) 0.24
    0.23770477 = sum of:
      0.23770477 = product of:
        0.8489456 = sum of:
          0.058350693 = weight(abstract_txt:index in 7460) [ClassicSimilarity], result of:
            0.058350693 = score(doc=7460,freq=2.0), product of:
              0.11120989 = queryWeight, product of:
                1.2573751 = boost
                4.74895 = idf(docFreq=1040, maxDocs=44218)
                0.018624343 = queryNorm
              0.5246898 = fieldWeight in 7460, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                4.74895 = idf(docFreq=1040, maxDocs=44218)
                0.078125 = fieldNorm(doc=7460)
          0.024006646 = weight(abstract_txt:between in 7460) [ClassicSimilarity], result of:
            0.024006646 = score(doc=7460,freq=1.0), product of:
              0.088723935 = queryWeight, product of:
                1.3754954 = boost
                3.4633842 = idf(docFreq=3764, maxDocs=44218)
                0.018624343 = queryNorm
              0.2705769 = fieldWeight in 7460, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                3.4633842 = idf(docFreq=3764, maxDocs=44218)
                0.078125 = fieldNorm(doc=7460)
          0.12052298 = weight(abstract_txt:matching in 7460) [ClassicSimilarity], result of:
            0.12052298 = score(doc=7460,freq=2.0), product of:
              0.18036793 = queryWeight, product of:
                1.6013004 = boost
                6.047913 = idf(docFreq=283, maxDocs=44218)
                0.018624343 = queryNorm
              0.6682063 = fieldWeight in 7460, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                6.047913 = idf(docFreq=283, maxDocs=44218)
                0.078125 = fieldNorm(doc=7460)
          0.18037356 = weight(abstract_txt:coefficient in 7460) [ClassicSimilarity], result of:
            0.18037356 = score(doc=7460,freq=1.0), product of:
              0.2973295 = queryWeight, product of:
                2.055948 = boost
                7.7650614 = idf(docFreq=50, maxDocs=44218)
                0.018624343 = queryNorm
              0.6066454 = fieldWeight in 7460, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                7.7650614 = idf(docFreq=50, maxDocs=44218)
                0.078125 = fieldNorm(doc=7460)
          0.27892286 = weight(abstract_txt:similarity in 7460) [ClassicSimilarity], result of:
            0.27892286 = score(doc=7460,freq=6.0), product of:
              0.25047177 = queryWeight, product of:
                2.311097 = boost
                5.8191514 = idf(docFreq=356, maxDocs=44218)
                0.018624343 = queryNorm
              1.11359 = fieldWeight in 7460, product of:
                2.4494898 = tf(freq=6.0), with freq of:
                  6.0 = termFreq=6.0
                5.8191514 = idf(docFreq=356, maxDocs=44218)
                0.078125 = fieldNorm(doc=7460)
          0.09760336 = weight(abstract_txt:semantic in 7460) [ClassicSimilarity], result of:
            0.09760336 = score(doc=7460,freq=2.0), product of:
              0.19743882 = queryWeight, product of:
                2.3693237 = boost
                4.4743214 = idf(docFreq=1369, maxDocs=44218)
                0.018624343 = queryNorm
              0.49434733 = fieldWeight in 7460, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                4.4743214 = idf(docFreq=1369, maxDocs=44218)
                0.078125 = fieldNorm(doc=7460)
          0.08916549 = weight(abstract_txt:terms in 7460) [ClassicSimilarity], result of:
            0.08916549 = score(doc=7460,freq=1.0), product of:
              0.28223437 = queryWeight, product of:
                3.7474177 = boost
                4.0438666 = idf(docFreq=2106, maxDocs=44218)
                0.018624343 = queryNorm
              0.3159271 = fieldWeight in 7460, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.0438666 = idf(docFreq=2106, maxDocs=44218)
                0.078125 = fieldNorm(doc=7460)
        0.28 = coord(7/25)
    
  2. Diodato, V.; Henry, G.: The rates of assignment of narrower terms in the Thesaurus of ERIC Descriptors (1993) 0.21
    0.21032842 = sum of:
      0.21032842 = product of:
        0.8763684 = sum of:
          0.26921242 = weight(abstract_txt:narrower in 3921) [ClassicSimilarity], result of:
            0.26921242 = score(doc=3921,freq=6.0), product of:
              0.15020055 = queryWeight, product of:
                1.0332701 = boost
                7.805067 = idf(docFreq=48, maxDocs=44218)
                0.018624343 = queryNorm
              1.7923532 = fieldWeight in 3921, product of:
                2.4494898 = tf(freq=6.0), with freq of:
                  6.0 = termFreq=6.0
                7.805067 = idf(docFreq=48, maxDocs=44218)
                0.09375 = fieldNorm(doc=3921)
          0.051712822 = weight(abstract_txt:document in 3921) [ClassicSimilarity], result of:
            0.051712822 = score(doc=3921,freq=2.0), product of:
              0.090863764 = queryWeight, product of:
                1.13655 = boost
                4.2926083 = idf(docFreq=1642, maxDocs=44218)
                0.018624343 = queryNorm
              0.5691248 = fieldWeight in 3921, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                4.2926083 = idf(docFreq=1642, maxDocs=44218)
                0.09375 = fieldNorm(doc=3921)
          0.04074063 = weight(abstract_txt:between in 3921) [ClassicSimilarity], result of:
            0.04074063 = score(doc=3921,freq=2.0), product of:
              0.088723935 = queryWeight, product of:
                1.3754954 = boost
                3.4633842 = idf(docFreq=3764, maxDocs=44218)
                0.018624343 = queryNorm
              0.45918423 = fieldWeight in 3921, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                3.4633842 = idf(docFreq=3764, maxDocs=44218)
                0.09375 = fieldNorm(doc=3921)
          0.057063047 = weight(abstract_txt:indexing in 3921) [ClassicSimilarity], result of:
            0.057063047 = score(doc=3921,freq=1.0), product of:
              0.13993798 = queryWeight, product of:
                1.7274543 = boost
                4.3495874 = idf(docFreq=1551, maxDocs=44218)
                0.018624343 = queryNorm
              0.40777382 = fieldWeight in 3921, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.3495874 = idf(docFreq=1551, maxDocs=44218)
                0.09375 = fieldNorm(doc=3921)
          0.13664374 = weight(abstract_txt:similarity in 3921) [ClassicSimilarity], result of:
            0.13664374 = score(doc=3921,freq=1.0), product of:
              0.25047177 = queryWeight, product of:
                2.311097 = boost
                5.8191514 = idf(docFreq=356, maxDocs=44218)
                0.018624343 = queryNorm
              0.54554546 = fieldWeight in 3921, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                5.8191514 = idf(docFreq=356, maxDocs=44218)
                0.09375 = fieldNorm(doc=3921)
          0.32099572 = weight(abstract_txt:terms in 3921) [ClassicSimilarity], result of:
            0.32099572 = score(doc=3921,freq=9.0), product of:
              0.28223437 = queryWeight, product of:
                3.7474177 = boost
                4.0438666 = idf(docFreq=2106, maxDocs=44218)
                0.018624343 = queryNorm
              1.1373374 = fieldWeight in 3921, product of:
                3.0 = tf(freq=9.0), with freq of:
                  9.0 = termFreq=9.0
                4.0438666 = idf(docFreq=2106, maxDocs=44218)
                0.09375 = fieldNorm(doc=3921)
        0.24 = coord(6/25)
    
  3. Lee, D.H.; Schleyer, T.: Social tagging is no substitute for controlled indexing : a comparison of Medical Subject Headings and CiteULike tags assigned to 231,388 papers (2012) 0.20
    0.1965043 = sum of:
      0.1965043 = product of:
        0.70180106 = sum of:
          0.066418104 = weight(abstract_txt:consisted in 383) [ClassicSimilarity], result of:
            0.066418104 = score(doc=383,freq=1.0), product of:
              0.14068368 = queryWeight, product of:
                7.5537524 = idf(docFreq=62, maxDocs=44218)
                0.018624343 = queryNorm
              0.47210953 = fieldWeight in 383, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                7.5537524 = idf(docFreq=62, maxDocs=44218)
                0.0625 = fieldNorm(doc=383)
          0.1548943 = weight(abstract_txt:jaccard in 383) [ClassicSimilarity], result of:
            0.1548943 = score(doc=383,freq=2.0), product of:
              0.19636603 = queryWeight, product of:
                1.181439 = boost
                8.924298 = idf(docFreq=15, maxDocs=44218)
                0.018624343 = queryNorm
              0.788804 = fieldWeight in 383, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                8.924298 = idf(docFreq=15, maxDocs=44218)
                0.0625 = fieldNorm(doc=383)
          0.019205317 = weight(abstract_txt:between in 383) [ClassicSimilarity], result of:
            0.019205317 = score(doc=383,freq=1.0), product of:
              0.088723935 = queryWeight, product of:
                1.3754954 = boost
                3.4633842 = idf(docFreq=3764, maxDocs=44218)
                0.018624343 = queryNorm
              0.21646151 = fieldWeight in 383, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                3.4633842 = idf(docFreq=3764, maxDocs=44218)
                0.0625 = fieldNorm(doc=383)
          0.07608406 = weight(abstract_txt:indexing in 383) [ClassicSimilarity], result of:
            0.07608406 = score(doc=383,freq=4.0), product of:
              0.13993798 = queryWeight, product of:
                1.7274543 = boost
                4.3495874 = idf(docFreq=1551, maxDocs=44218)
                0.018624343 = queryNorm
              0.54369843 = fieldWeight in 383, product of:
                2.0 = tf(freq=4.0), with freq of:
                  4.0 = termFreq=4.0
                4.3495874 = idf(docFreq=1551, maxDocs=44218)
                0.0625 = fieldNorm(doc=383)
          0.09686624 = weight(abstract_txt:medline in 383) [ClassicSimilarity], result of:
            0.09686624 = score(doc=383,freq=1.0), product of:
              0.22795251 = queryWeight, product of:
                1.8001775 = boost
                6.7990475 = idf(docFreq=133, maxDocs=44218)
                0.018624343 = queryNorm
              0.42494047 = fieldWeight in 383, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                6.7990475 = idf(docFreq=133, maxDocs=44218)
                0.0625 = fieldNorm(doc=383)
          0.12882894 = weight(abstract_txt:similarity in 383) [ClassicSimilarity], result of:
            0.12882894 = score(doc=383,freq=2.0), product of:
              0.25047177 = queryWeight, product of:
                2.311097 = boost
                5.8191514 = idf(docFreq=356, maxDocs=44218)
                0.018624343 = queryNorm
              0.51434517 = fieldWeight in 383, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                5.8191514 = idf(docFreq=356, maxDocs=44218)
                0.0625 = fieldNorm(doc=383)
          0.15950407 = weight(abstract_txt:terms in 383) [ClassicSimilarity], result of:
            0.15950407 = score(doc=383,freq=5.0), product of:
              0.28223437 = queryWeight, product of:
                3.7474177 = boost
                4.0438666 = idf(docFreq=2106, maxDocs=44218)
                0.018624343 = queryNorm
              0.5651476 = fieldWeight in 383, product of:
                2.236068 = tf(freq=5.0), with freq of:
                  5.0 = termFreq=5.0
                4.0438666 = idf(docFreq=2106, maxDocs=44218)
                0.0625 = fieldNorm(doc=383)
        0.28 = coord(7/25)
    
  4. Berry, M.W.; Dumais, S.T.; O'Brien, G.W.: Using linear algebra for intelligent information retrieval (1995) 0.20
    0.19631974 = sum of:
      0.19631974 = product of:
        0.6134992 = sum of:
          0.04309402 = weight(abstract_txt:document in 2206) [ClassicSimilarity], result of:
            0.04309402 = score(doc=2206,freq=2.0), product of:
              0.090863764 = queryWeight, product of:
                1.13655 = boost
                4.2926083 = idf(docFreq=1642, maxDocs=44218)
                0.018624343 = queryNorm
              0.4742707 = fieldWeight in 2206, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                4.2926083 = idf(docFreq=1642, maxDocs=44218)
                0.078125 = fieldNorm(doc=2206)
          0.033950526 = weight(abstract_txt:between in 2206) [ClassicSimilarity], result of:
            0.033950526 = score(doc=2206,freq=2.0), product of:
              0.088723935 = queryWeight, product of:
                1.3754954 = boost
                3.4633842 = idf(docFreq=3764, maxDocs=44218)
                0.018624343 = queryNorm
              0.3826535 = fieldWeight in 2206, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                3.4633842 = idf(docFreq=3764, maxDocs=44218)
                0.078125 = fieldNorm(doc=2206)
          0.06724945 = weight(abstract_txt:indexing in 2206) [ClassicSimilarity], result of:
            0.06724945 = score(doc=2206,freq=2.0), product of:
              0.13993798 = queryWeight, product of:
                1.7274543 = boost
                4.3495874 = idf(docFreq=1551, maxDocs=44218)
                0.018624343 = queryNorm
              0.48056605 = fieldWeight in 2206, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                4.3495874 = idf(docFreq=1551, maxDocs=44218)
                0.078125 = fieldNorm(doc=2206)
          0.049654458 = weight(abstract_txt:databases in 2206) [ClassicSimilarity], result of:
            0.049654458 = score(doc=2206,freq=1.0), product of:
              0.14403188 = queryWeight, product of:
                1.7525407 = boost
                4.4127526 = idf(docFreq=1456, maxDocs=44218)
                0.018624343 = queryNorm
              0.3447463 = fieldWeight in 2206, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.4127526 = idf(docFreq=1456, maxDocs=44218)
                0.078125 = fieldNorm(doc=2206)
          0.13569084 = weight(abstract_txt:completely in 2206) [ClassicSimilarity], result of:
            0.13569084 = score(doc=2206,freq=1.0), product of:
              0.24593654 = queryWeight, product of:
                1.869841 = boost
                7.062158 = idf(docFreq=102, maxDocs=44218)
                0.018624343 = queryNorm
              0.5517311 = fieldWeight in 2206, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                7.062158 = idf(docFreq=102, maxDocs=44218)
                0.078125 = fieldNorm(doc=2206)
          0.0604048 = weight(abstract_txt:database in 2206) [ClassicSimilarity], result of:
            0.0604048 = score(doc=2206,freq=1.0), product of:
              0.18065377 = queryWeight, product of:
                2.2663743 = boost
                4.2799077 = idf(docFreq=1663, maxDocs=44218)
                0.018624343 = queryNorm
              0.33436778 = fieldWeight in 2206, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.2799077 = idf(docFreq=1663, maxDocs=44218)
                0.078125 = fieldNorm(doc=2206)
          0.069015995 = weight(abstract_txt:semantic in 2206) [ClassicSimilarity], result of:
            0.069015995 = score(doc=2206,freq=1.0), product of:
              0.19743882 = queryWeight, product of:
                2.3693237 = boost
                4.4743214 = idf(docFreq=1369, maxDocs=44218)
                0.018624343 = queryNorm
              0.34955636 = fieldWeight in 2206, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.4743214 = idf(docFreq=1369, maxDocs=44218)
                0.078125 = fieldNorm(doc=2206)
          0.15443914 = weight(abstract_txt:terms in 2206) [ClassicSimilarity], result of:
            0.15443914 = score(doc=2206,freq=3.0), product of:
              0.28223437 = queryWeight, product of:
                3.7474177 = boost
                4.0438666 = idf(docFreq=2106, maxDocs=44218)
                0.018624343 = queryNorm
              0.54720175 = fieldWeight in 2206, product of:
                1.7320508 = tf(freq=3.0), with freq of:
                  3.0 = termFreq=3.0
                4.0438666 = idf(docFreq=2106, maxDocs=44218)
                0.078125 = fieldNorm(doc=2206)
        0.32 = coord(8/25)
    
  5. Leydesdorff, L.; Rotolo, D.; Rafols, I.: Bibliometric perspectives on medical innovation using the medical subject headings of PubMed (2012) 0.18
    0.18026336 = sum of:
      0.18026336 = product of:
        0.6437977 = sum of:
          0.032676425 = weight(abstract_txt:various in 494) [ClassicSimilarity], result of:
            0.032676425 = score(doc=494,freq=1.0), product of:
              0.09519462 = queryWeight, product of:
                1.1633204 = boost
                4.3937173 = idf(docFreq=1484, maxDocs=44218)
                0.018624343 = queryNorm
              0.34325916 = fieldWeight in 494, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.3937173 = idf(docFreq=1484, maxDocs=44218)
                0.078125 = fieldNorm(doc=494)
          0.07146472 = weight(abstract_txt:index in 494) [ClassicSimilarity], result of:
            0.07146472 = score(doc=494,freq=3.0), product of:
              0.11120989 = queryWeight, product of:
                1.2573751 = boost
                4.74895 = idf(docFreq=1040, maxDocs=44218)
                0.018624343 = queryNorm
              0.64261115 = fieldWeight in 494, product of:
                1.7320508 = tf(freq=3.0), with freq of:
                  3.0 = termFreq=3.0
                4.74895 = idf(docFreq=1040, maxDocs=44218)
                0.078125 = fieldNorm(doc=494)
          0.024006646 = weight(abstract_txt:between in 494) [ClassicSimilarity], result of:
            0.024006646 = score(doc=494,freq=1.0), product of:
              0.088723935 = queryWeight, product of:
                1.3754954 = boost
                3.4633842 = idf(docFreq=3764, maxDocs=44218)
                0.018624343 = queryNorm
              0.2705769 = fieldWeight in 494, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                3.4633842 = idf(docFreq=3764, maxDocs=44218)
                0.078125 = fieldNorm(doc=494)
          0.20972157 = weight(abstract_txt:medline in 494) [ClassicSimilarity], result of:
            0.20972157 = score(doc=494,freq=3.0), product of:
              0.22795251 = queryWeight, product of:
                1.8001775 = boost
                6.7990475 = idf(docFreq=133, maxDocs=44218)
                0.018624343 = queryNorm
              0.9200231 = fieldWeight in 494, product of:
                1.7320508 = tf(freq=3.0), with freq of:
                  3.0 = termFreq=3.0
                6.7990475 = idf(docFreq=133, maxDocs=44218)
                0.078125 = fieldNorm(doc=494)
          0.09595322 = weight(abstract_txt:citation in 494) [ClassicSimilarity], result of:
            0.09595322 = score(doc=494,freq=2.0), product of:
              0.17735748 = queryWeight, product of:
                1.9447489 = boost
                4.896717 = idf(docFreq=897, maxDocs=44218)
                0.018624343 = queryNorm
              0.5410159 = fieldWeight in 494, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                4.896717 = idf(docFreq=897, maxDocs=44218)
                0.078125 = fieldNorm(doc=494)
          0.1208096 = weight(abstract_txt:database in 494) [ClassicSimilarity], result of:
            0.1208096 = score(doc=494,freq=4.0), product of:
              0.18065377 = queryWeight, product of:
                2.2663743 = boost
                4.2799077 = idf(docFreq=1663, maxDocs=44218)
                0.018624343 = queryNorm
              0.66873556 = fieldWeight in 494, product of:
                2.0 = tf(freq=4.0), with freq of:
                  4.0 = termFreq=4.0
                4.2799077 = idf(docFreq=1663, maxDocs=44218)
                0.078125 = fieldNorm(doc=494)
          0.08916549 = weight(abstract_txt:terms in 494) [ClassicSimilarity], result of:
            0.08916549 = score(doc=494,freq=1.0), product of:
              0.28223437 = queryWeight, product of:
                3.7474177 = boost
                4.0438666 = idf(docFreq=2106, maxDocs=44218)
                0.018624343 = queryNorm
              0.3159271 = fieldWeight in 494, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.0438666 = idf(docFreq=2106, maxDocs=44218)
                0.078125 = fieldNorm(doc=494)
        0.28 = coord(7/25)
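
The score trees above follow the Lucene ClassicSimilarity pattern: each matching term contributes queryWeight × fieldWeight, the contributions are summed, and the sum is scaled by the coord factor. The sketch below recomputes the score of the first entry (doc 7460) from the boost, docFreq, and freq values listed in its explanation. It assumes the standard ClassicSimilarity definitions tf = sqrt(freq) and idf = 1 + ln(maxDocs / (docFreq + 1)), which reproduce the idf values shown. The function and variable names are illustrative and not part of any Lucene API.

```python
import math

MAX_DOCS = 44218          # maxDocs in every idf(...) line
QUERY_NORM = 0.018624343  # queryNorm, shared by all terms
FIELD_NORM = 0.078125     # fieldNorm(doc=7460)
QUERY_TERMS = 25          # denominator of coord(7/25)

# (term, boost, docFreq, freq) copied from the explanation for doc 7460.
MATCHES = [
    ("index",       1.2573751, 1040, 2.0),
    ("between",     1.3754954, 3764, 1.0),
    ("matching",    1.6013004,  283, 2.0),
    ("coefficient", 2.055948,    50, 1.0),
    ("similarity",  2.311097,   356, 6.0),
    ("semantic",    2.3693237, 1369, 2.0),
    ("terms",       3.7474177, 2106, 1.0),
]


def idf(doc_freq, max_docs=MAX_DOCS):
    """Inverse document frequency as used by ClassicSimilarity."""
    return 1.0 + math.log(max_docs / (doc_freq + 1))


def term_score(boost, doc_freq, freq):
    """Contribution of one term: queryWeight * fieldWeight."""
    query_weight = boost * idf(doc_freq) * QUERY_NORM
    field_weight = math.sqrt(freq) * idf(doc_freq) * FIELD_NORM
    return query_weight * field_weight


def document_score(matches, query_terms=QUERY_TERMS):
    """Sum of the term contributions, scaled by coord = matched / total terms."""
    total = sum(term_score(boost, df, freq) for _, boost, df, freq in matches)
    coord = len(matches) / query_terms
    return total * coord


score = document_score(MATCHES)
print(f"{score:.4f}")
```

The result should agree with the listed 0.23770477 up to small rounding differences, since Lucene works in single-precision floats and Python in double precision.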