Document (#20418)

Author
Lee, D.L.
Ren, L.
Title
Document ranking on weight-partitioned signature files
Source
ACM transactions on information systems. 14(1996) no.2, pp.109-137
Year
1996
Abstract
Proposes the weight-partitioned signature file, a signature file organization for supporting document ranking. It uses multiple signature files, each corresponding to one term frequency, to represent terms with different term frequencies. Words with the same term frequency in a document are grouped together and hashed into the signature file corresponding to that term frequency. Investigates the effect of false drops on retrieval effectiveness. Analyses the performance of the weight-partitioned signature file under different search strategies and configurations. Obtains an optimal formula for storage allocation to minimise the effect of false drops on document ranks. Analytical results are supported by experiments on document collections.
Theme
Retrieval algorithms
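
The grouping step described in the abstract can be illustrated with a small sketch. This is not the authors' implementation: the signature width, the number of bits set per term, the SHA-256 based hash, the cap `max_tf`, and the function names `term_bits`, `build_signatures`, and `estimate_tf` are all assumptions chosen for illustration. It keeps one superimposed signature per term frequency for a single document and looks a query term up from the highest frequency downwards.

```python
import hashlib
from collections import Counter

SIG_BITS = 64       # assumed signature width per partition
BITS_PER_TERM = 3   # assumed number of bits set per term


def term_bits(term):
    """Hash a term to a bit mask with up to BITS_PER_TERM bits set."""
    digest = hashlib.sha256(term.encode("utf-8")).digest()
    mask = 0
    for i in range(BITS_PER_TERM):
        pos = int.from_bytes(digest[2 * i:2 * i + 2], "big") % SIG_BITS
        mask |= 1 << pos
    return mask


def build_signatures(tokens, max_tf=3):
    """Return {term frequency: signature} for one document.

    Terms occurring max_tf times or more share the last partition
    (an assumption of this sketch, not taken from the abstract).
    """
    sigs = {tf: 0 for tf in range(1, max_tf + 1)}
    for term, freq in Counter(tokens).items():
        sigs[min(freq, max_tf)] |= term_bits(term)
    return sigs


def estimate_tf(sigs, term):
    """Return the highest partition whose signature covers the term's bits.

    Returns 0 if no partition matches. A match can be a false drop,
    so the result may overestimate the true term frequency.
    """
    mask = term_bits(term)
    for tf in sorted(sigs, reverse=True):
        if sigs[tf] & mask == mask:
            return tf
    return 0


doc = "signature file signature ranking signature file".split()
sigs = build_signatures(doc)
print(estimate_tf(sigs, "signature"), estimate_tf(sigs, "file"))
```

Because each signature superimposes several terms, a lookup can match a partition the term was never hashed into. In this sketch such a false drop can only raise the estimated frequency, which is one way false drops could distort document ranks.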

Similar documents (content)

  1. Lam, W.; Wong, K.-F.; Wong, C.-Y.: Chinese document indexing based on new partitioned signature file : model and evaluation (2001) 0.43
    0.4272128 = sum of:
      0.4272128 = product of:
        1.52576 = sum of:
          0.027032623 = weight(abstract_txt:analytical in 303) [ClassicSimilarity], result of:
            0.027032623 = score(doc=303,freq=1.0), product of:
              0.06627264 = queryWeight, product of:
                1.0759566 = boost
                6.5264034 = idf(docFreq=175, maxDocs=44218)
                0.009437688 = queryNorm
              0.4079002 = fieldWeight in 303, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                6.5264034 = idf(docFreq=175, maxDocs=44218)
                0.0625 = fieldNorm(doc=303)
          0.05148986 = weight(abstract_txt:files in 303) [ClassicSimilarity], result of:
            0.05148986 = score(doc=303,freq=2.0), product of:
              0.10183333 = queryWeight, product of:
                1.8861994 = boost
                5.720536 = idf(docFreq=393, maxDocs=44218)
                0.009437688 = queryNorm
              0.50562876 = fieldWeight in 303, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                5.720536 = idf(docFreq=393, maxDocs=44218)
                0.0625 = fieldNorm(doc=303)
          0.08661039 = weight(abstract_txt:false in 303) [ClassicSimilarity], result of:
            0.08661039 = score(doc=303,freq=1.0), product of:
              0.18146747 = queryWeight, product of:
                2.5179205 = boost
                7.636444 = idf(docFreq=57, maxDocs=44218)
                0.009437688 = queryNorm
              0.47727776 = fieldWeight in 303, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                7.636444 = idf(docFreq=57, maxDocs=44218)
                0.0625 = fieldNorm(doc=303)
          0.0384592 = weight(abstract_txt:document in 303) [ClassicSimilarity], result of:
            0.0384592 = score(doc=303,freq=1.0), product of:
              0.14335042 = queryWeight, product of:
                3.5384421 = boost
                4.2926083 = idf(docFreq=1642, maxDocs=44218)
                0.009437688 = queryNorm
              0.26828802 = fieldWeight in 303, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.2926083 = idf(docFreq=1642, maxDocs=44218)
                0.0625 = fieldNorm(doc=303)
          0.15101898 = weight(abstract_txt:file in 303) [ClassicSimilarity], result of:
            0.15101898 = score(doc=303,freq=5.0), product of:
              0.1936986 = queryWeight, product of:
                3.6789243 = boost
                5.57879 = idf(docFreq=453, maxDocs=44218)
                0.009437688 = queryNorm
              0.7796596 = fieldWeight in 303, product of:
                2.236068 = tf(freq=5.0), with freq of:
                  5.0 = termFreq=5.0
                5.57879 = idf(docFreq=453, maxDocs=44218)
                0.0625 = fieldNorm(doc=303)
          0.21680044 = weight(abstract_txt:partitioned in 303) [ClassicSimilarity], result of:
            0.21680044 = score(doc=303,freq=1.0), product of:
              0.38296226 = queryWeight, product of:
                4.4798784 = boost
                9.05783 = idf(docFreq=13, maxDocs=44218)
                0.009437688 = queryNorm
              0.56611437 = fieldWeight in 303, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                9.05783 = idf(docFreq=13, maxDocs=44218)
                0.0625 = fieldNorm(doc=303)
          0.95434856 = weight(abstract_txt:signature in 303) [ClassicSimilarity], result of:
            0.95434856 = score(doc=303,freq=7.0), product of:
              0.6774822 = queryWeight, product of:
                8.426597 = boost
                8.518833 = idf(docFreq=23, maxDocs=44218)
                0.009437688 = queryNorm
              1.4086696 = fieldWeight in 303, product of:
                2.6457512 = tf(freq=7.0), with freq of:
                  7.0 = termFreq=7.0
                8.518833 = idf(docFreq=23, maxDocs=44218)
                0.0625 = fieldNorm(doc=303)
        0.28 = coord(7/25)
    
  2. Carterette, B.; Can, F.: Comparing inverted files and signature files for searching a large lexicon (2005) 0.22
    0.22494857 = sum of:
      0.22494857 = product of:
        1.4059286 = sum of:
          0.078827426 = weight(abstract_txt:files in 1029) [ClassicSimilarity], result of:
            0.078827426 = score(doc=1029,freq=3.0), product of:
              0.10183333 = queryWeight, product of:
                1.8861994 = boost
                5.720536 = idf(docFreq=393, maxDocs=44218)
                0.009437688 = queryNorm
              0.7740828 = fieldWeight in 1029, product of:
                1.7320508 = tf(freq=3.0), with freq of:
                  3.0 = termFreq=3.0
                5.720536 = idf(docFreq=393, maxDocs=44218)
                0.078125 = fieldNorm(doc=1029)
          0.053812988 = weight(abstract_txt:term in 1029) [ClassicSimilarity], result of:
            0.053812988 = score(doc=1029,freq=1.0), product of:
              0.1434653 = queryWeight, product of:
                3.1661468 = boost
                4.8012047 = idf(docFreq=987, maxDocs=44218)
                0.009437688 = queryNorm
              0.37509412 = fieldWeight in 1029, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.8012047 = idf(docFreq=987, maxDocs=44218)
                0.078125 = fieldNorm(doc=1029)
          0.16884436 = weight(abstract_txt:file in 1029) [ClassicSimilarity], result of:
            0.16884436 = score(doc=1029,freq=4.0), product of:
              0.1936986 = queryWeight, product of:
                3.6789243 = boost
                5.57879 = idf(docFreq=453, maxDocs=44218)
                0.009437688 = queryNorm
              0.871686 = fieldWeight in 1029, product of:
                2.0 = tf(freq=4.0), with freq of:
                  4.0 = termFreq=4.0
                5.57879 = idf(docFreq=453, maxDocs=44218)
                0.078125 = fieldNorm(doc=1029)
          1.1044438 = weight(abstract_txt:signature in 1029) [ClassicSimilarity], result of:
            1.1044438 = score(doc=1029,freq=6.0), product of:
              0.6774822 = queryWeight, product of:
                8.426597 = boost
                8.518833 = idf(docFreq=23, maxDocs=44218)
                0.009437688 = queryNorm
              1.6302183 = fieldWeight in 1029, product of:
                2.4494898 = tf(freq=6.0), with freq of:
                  6.0 = termFreq=6.0
                8.518833 = idf(docFreq=23, maxDocs=44218)
                0.078125 = fieldNorm(doc=1029)
        0.16 = coord(4/25)
    
  3. Lee, D.L.: Massive parallelism on the hybrid text-retrieval machine (1995) 0.16
    0.16363908 = sum of:
      0.16363908 = product of:
        1.363659 = sum of:
          0.10130661 = weight(abstract_txt:file in 4075) [ClassicSimilarity], result of:
            0.10130661 = score(doc=4075,freq=1.0), product of:
              0.1936986 = queryWeight, product of:
                3.6789243 = boost
                5.57879 = idf(docFreq=453, maxDocs=44218)
                0.009437688 = queryNorm
              0.52301157 = fieldWeight in 4075, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                5.57879 = idf(docFreq=453, maxDocs=44218)
                0.09375 = fieldNorm(doc=4075)
          0.32520065 = weight(abstract_txt:partitioned in 4075) [ClassicSimilarity], result of:
            0.32520065 = score(doc=4075,freq=1.0), product of:
              0.38296226 = queryWeight, product of:
                4.4798784 = boost
                9.05783 = idf(docFreq=13, maxDocs=44218)
                0.009437688 = queryNorm
              0.8491715 = fieldWeight in 4075, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                9.05783 = idf(docFreq=13, maxDocs=44218)
                0.09375 = fieldNorm(doc=4075)
          0.93715173 = weight(abstract_txt:signature in 4075) [ClassicSimilarity], result of:
            0.93715173 = score(doc=4075,freq=3.0), product of:
              0.6774822 = queryWeight, product of:
                8.426597 = boost
                8.518833 = idf(docFreq=23, maxDocs=44218)
                0.009437688 = queryNorm
              1.3832861 = fieldWeight in 4075, product of:
                1.7320508 = tf(freq=3.0), with freq of:
                  3.0 = termFreq=3.0
                8.518833 = idf(docFreq=23, maxDocs=44218)
                0.09375 = fieldNorm(doc=4075)
        0.12 = coord(3/25)
    
  4. Kelledy, F.; Smeaton, A.F.: Signature files and beyond (1996) 0.15
    0.14623533 = sum of:
      0.14623533 = product of:
        1.2186278 = sum of:
          0.09102207 = weight(abstract_txt:files in 6973) [ClassicSimilarity], result of:
            0.09102207 = score(doc=6973,freq=4.0), product of:
              0.10183333 = queryWeight, product of:
                1.8861994 = boost
                5.720536 = idf(docFreq=393, maxDocs=44218)
                0.009437688 = queryNorm
              0.89383376 = fieldWeight in 6973, product of:
                2.0 = tf(freq=4.0), with freq of:
                  4.0 = termFreq=4.0
                5.720536 = idf(docFreq=393, maxDocs=44218)
                0.078125 = fieldNorm(doc=6973)
          0.119390994 = weight(abstract_txt:file in 6973) [ClassicSimilarity], result of:
            0.119390994 = score(doc=6973,freq=2.0), product of:
              0.1936986 = queryWeight, product of:
                3.6789243 = boost
                5.57879 = idf(docFreq=453, maxDocs=44218)
                0.009437688 = queryNorm
              0.6163751 = fieldWeight in 6973, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                5.57879 = idf(docFreq=453, maxDocs=44218)
                0.078125 = fieldNorm(doc=6973)
          1.0082147 = weight(abstract_txt:signature in 6973) [ClassicSimilarity], result of:
            1.0082147 = score(doc=6973,freq=5.0), product of:
              0.6774822 = queryWeight, product of:
                8.426597 = boost
                8.518833 = idf(docFreq=23, maxDocs=44218)
                0.009437688 = queryNorm
              1.488179 = fieldWeight in 6973, product of:
                2.236068 = tf(freq=5.0), with freq of:
                  5.0 = termFreq=5.0
                8.518833 = idf(docFreq=23, maxDocs=44218)
                0.078125 = fieldNorm(doc=6973)
        0.12 = coord(3/25)
    
  5. Zhu, W.Z.; Allen, R.B.: Document clustering using the LSI subspace signature model (2013) 0.11
    0.110032216 = sum of:
      0.110032216 = product of:
        0.68770134 = sum of:
          0.06034056 = weight(abstract_txt:ranking in 690) [ClassicSimilarity], result of:
            0.06034056 = score(doc=690,freq=2.0), product of:
              0.09754576 = queryWeight, product of:
                1.8460642 = boost
                5.598813 = idf(docFreq=444, maxDocs=44218)
                0.009437688 = queryNorm
              0.61858726 = fieldWeight in 690, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                5.598813 = idf(docFreq=444, maxDocs=44218)
                0.078125 = fieldNorm(doc=690)
          0.09320682 = weight(abstract_txt:term in 690) [ClassicSimilarity], result of:
            0.09320682 = score(doc=690,freq=3.0), product of:
              0.1434653 = queryWeight, product of:
                3.1661468 = boost
                4.8012047 = idf(docFreq=987, maxDocs=44218)
                0.009437688 = queryNorm
              0.64968204 = fieldWeight in 690, product of:
                1.7320508 = tf(freq=3.0), with freq of:
                  3.0 = termFreq=3.0
                4.8012047 = idf(docFreq=987, maxDocs=44218)
                0.078125 = fieldNorm(doc=690)
          0.083266616 = weight(abstract_txt:document in 690) [ClassicSimilarity], result of:
            0.083266616 = score(doc=690,freq=3.0), product of:
              0.14335042 = queryWeight, product of:
                3.5384421 = boost
                4.2926083 = idf(docFreq=1642, maxDocs=44218)
                0.009437688 = queryNorm
              0.5808606 = fieldWeight in 690, product of:
                1.7320508 = tf(freq=3.0), with freq of:
                  3.0 = termFreq=3.0
                4.2926083 = idf(docFreq=1642, maxDocs=44218)
                0.078125 = fieldNorm(doc=690)
          0.45088732 = weight(abstract_txt:signature in 690) [ClassicSimilarity], result of:
            0.45088732 = score(doc=690,freq=1.0), product of:
              0.6774822 = queryWeight, product of:
                8.426597 = boost
                8.518833 = idf(docFreq=23, maxDocs=44218)
                0.009437688 = queryNorm
              0.66553384 = fieldWeight in 690, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                8.518833 = idf(docFreq=23, maxDocs=44218)
                0.078125 = fieldNorm(doc=690)
        0.16 = coord(4/25)
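
The score breakdowns above follow the arithmetic of Lucene's ClassicSimilarity as shown in the listing: tf is the square root of the term frequency, idf is 1 + ln(maxDocs / (docFreq + 1)), each term score is queryWeight times fieldWeight, and the sum is scaled by the coord factor. The sketch below recomputes the score of the first hit from the values listed for it. The function names are hypothetical, the boost, queryNorm, and fieldNorm values are copied from the listing rather than derived, and small differences from the listed figures are expected because Lucene computes in single precision.

```python
import math

MAX_DOCS = 44218          # maxDocs from the listing
QUERY_NORM = 0.009437688  # queryNorm from the listing


def classic_idf(doc_freq, max_docs=MAX_DOCS):
    """idf as shown in the listing: 1 + ln(maxDocs / (docFreq + 1))."""
    return 1.0 + math.log(max_docs / (doc_freq + 1))


def term_score(freq, doc_freq, boost, field_norm):
    """Score of one query term: queryWeight * fieldWeight."""
    idf = classic_idf(doc_freq)
    query_weight = boost * idf * QUERY_NORM
    field_weight = math.sqrt(freq) * idf * field_norm
    return query_weight * field_weight


def document_score(terms, field_norm, matched, query_terms):
    """Sum the term scores and apply coord(matched / query_terms)."""
    total = sum(term_score(f, df, b, field_norm) for f, df, b in terms)
    return total * (matched / query_terms)


# (freq, docFreq, boost) for the seven matching terms of hit 1 (doc 303)
hit1_terms = [
    (1.0, 175, 1.0759566),   # analytical
    (2.0, 393, 1.8861994),   # files
    (1.0, 57, 2.5179205),    # false
    (1.0, 1642, 3.5384421),  # document
    (5.0, 453, 3.6789243),   # file
    (1.0, 13, 4.4798784),    # partitioned
    (7.0, 23, 8.426597),     # signature
]

score = document_score(hit1_terms, field_norm=0.0625, matched=7, query_terms=25)
print(round(score, 4))
```

Under these assumptions the result agrees with the 0.4272128 listed for the first hit up to rounding. The term "signature" dominates because it combines the largest boost, a high idf, and the highest term frequency.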