Document (#15828)

Author
Persin, M.
Zobel, J.
Sacks-Davis, R.
Title
Filtered document retrieval with frequency-sorted indexes
Source
Journal of the American Society for Information Science. 47(1996) no.10, p.749-764
Year
1996
Abstract
Proposes an evaluation technique for ranking that uses early recognition of which documents are likely to be highly ranked to reduce costs. Queries are evaluated in 2% of the memory of a standard implementation without degradation in retrieval effectiveness. CPU time and disc traffic can also be dramatically reduced by designing inverted indexes explicitly to support the technique. Inverted lists are sorted by decreasing within-document frequency rather than by document number; experimentally, this reduces CPU time and disk traffic to around one third of the original requirement. Frequency sorting can lead to a net reduction in index size, regardless of whether the index is compressed.
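
The idea of sorting inverted lists by decreasing within-document frequency can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the simple tf * idf weighting, and the fixed `min_tf` cut-off are assumptions made for illustration only, and the paper's actual filtering rule is not reproduced here. The sketch shows only why frequency ordering lets evaluation stop reading a list early.

```python
import math
from collections import defaultdict


def build_frequency_sorted_index(docs):
    """Map each term to postings (doc_id, tf), sorted by decreasing tf."""
    index = defaultdict(list)
    for doc_id, text in enumerate(docs):
        counts = defaultdict(int)
        for term in text.lower().split():
            counts[term] += 1
        for term, tf in counts.items():
            index[term].append((doc_id, tf))
    for postings in index.values():
        # Decreasing within-document frequency; ties broken by document number.
        postings.sort(key=lambda p: (-p[1], p[0]))
    return dict(index)


def rank(index, num_docs, query, min_tf=1):
    """Score documents, reading each list only while tf >= min_tf.

    min_tf is a hypothetical fixed threshold standing in for a real
    filtering rule. Returns (ranked list, number of postings read).
    """
    accumulators = defaultdict(float)
    postings_read = 0
    for term in query.lower().split():
        postings = index.get(term, [])
        if not postings:
            continue
        idf = math.log(1 + num_docs / len(postings))
        for doc_id, tf in postings:
            if tf < min_tf:
                break  # the list is tf-sorted, so no later posting can qualify
            accumulators[doc_id] += tf * idf
            postings_read += 1
    ranked = sorted(accumulators.items(), key=lambda kv: (-kv[1], kv[0]))
    return ranked, postings_read
```

Because each list is ordered by frequency, the `break` skips the whole low-frequency tail at once. With document-number ordering, every posting would have to be read to find the high-frequency ones.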

Similar documents (author)

  1. Kaszkiel, M.; Zobel, J.: Effective ranking with arbitrary passages (2001) 1.95
    1.9539574 = sum of:
      1.9539574 = product of:
        3.9079149 = sum of:
          3.9079149 = weight(author_txt:zobel in 5764) [ClassicSimilarity], result of:
            3.9079149 = score(doc=5764,freq=1.0), product of:
              0.8319755 = queryWeight, product of:
                1.2245661 = boost
                9.394302 = idf(docFreq=9, maxDocs=44218)
                0.07232089 = queryNorm
              4.697151 = fieldWeight in 5764, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                9.394302 = idf(docFreq=9, maxDocs=44218)
                0.5 = fieldNorm(doc=5764)
        0.5 = coord(1/2)
    
  2. Heinz, S.; Zobel, J.: Efficient single-pass index construction for text databases (2003) 1.95
    1.9539574 = sum of:
      1.9539574 = product of:
        3.9079149 = sum of:
          3.9079149 = weight(author_txt:zobel in 1678) [ClassicSimilarity], result of:
            3.9079149 = score(doc=1678,freq=1.0), product of:
              0.8319755 = queryWeight, product of:
                1.2245661 = boost
                9.394302 = idf(docFreq=9, maxDocs=44218)
                0.07232089 = queryNorm
              4.697151 = fieldWeight in 1678, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                9.394302 = idf(docFreq=9, maxDocs=44218)
                0.5 = fieldNorm(doc=1678)
        0.5 = coord(1/2)
    
  3. Uitdenbogerd, A.L.; Zobel, J.: An architecture for effective music information retrieval (2004) 1.95
    1.9539574 = sum of:
      1.9539574 = product of:
        3.9079149 = sum of:
          3.9079149 = weight(author_txt:zobel in 3055) [ClassicSimilarity], result of:
            3.9079149 = score(doc=3055,freq=1.0), product of:
              0.8319755 = queryWeight, product of:
                1.2245661 = boost
                9.394302 = idf(docFreq=9, maxDocs=44218)
                0.07232089 = queryNorm
              4.697151 = fieldWeight in 3055, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                9.394302 = idf(docFreq=9, maxDocs=44218)
                0.5 = fieldNorm(doc=3055)
        0.5 = coord(1/2)
    
  4. Hoad, T.C.; Zobel, J.: Methods for identifying versioned and plagiarized documents (2003) 1.95
    1.9539574 = sum of:
      1.9539574 = product of:
        3.9079149 = sum of:
          3.9079149 = weight(author_txt:zobel in 5159) [ClassicSimilarity], result of:
            3.9079149 = score(doc=5159,freq=1.0), product of:
              0.8319755 = queryWeight, product of:
                1.2245661 = boost
                9.394302 = idf(docFreq=9, maxDocs=44218)
                0.07232089 = queryNorm
              4.697151 = fieldWeight in 5159, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                9.394302 = idf(docFreq=9, maxDocs=44218)
                0.5 = fieldNorm(doc=5159)
        0.5 = coord(1/2)
    
  5. Moffat, A.; Zobel, J.: Self-indexing inverted files for fast text retrieval (1996) 1.95
    1.9539574 = sum of:
      1.9539574 = product of:
        3.9079149 = sum of:
          3.9079149 = weight(author_txt:zobel in 9) [ClassicSimilarity], result of:
            3.9079149 = score(doc=9,freq=1.0), product of:
              0.8319755 = queryWeight, product of:
                1.2245661 = boost
                9.394302 = idf(docFreq=9, maxDocs=44218)
                0.07232089 = queryNorm
              4.697151 = fieldWeight in 9, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                9.394302 = idf(docFreq=9, maxDocs=44218)
                0.5 = fieldNorm(doc=9)
        0.5 = coord(1/2)
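
The score breakdowns in this list follow the TF-IDF arithmetic of Lucene's ClassicSimilarity: tf is the square root of the term frequency, idf is 1 + ln(maxDocs / (docFreq + 1)), queryWeight is boost * idf * queryNorm, and fieldWeight is tf * idf * fieldNorm. The sketch below recomputes one entry from the numbers printed above. The function names are hypothetical. Lucene works in single precision, so the last digits may differ slightly.

```python
import math


def classic_idf(doc_freq, max_docs):
    """idf as printed in the explain output: 1 + ln(maxDocs / (docFreq + 1))."""
    return 1.0 + math.log(max_docs / (doc_freq + 1.0))


def classic_term_score(freq, doc_freq, max_docs, boost, query_norm, field_norm):
    """Score of one matching term: queryWeight * fieldWeight."""
    idf = classic_idf(doc_freq, max_docs)
    query_weight = boost * idf * query_norm          # query side
    field_weight = math.sqrt(freq) * idf * field_norm  # document side
    return query_weight * field_weight


# Values copied from the author_txt:zobel entries above.
term_score = classic_term_score(
    freq=1.0, doc_freq=9, max_docs=44218,
    boost=1.2245661, query_norm=0.07232089, field_norm=0.5,
)
final_score = term_score * 0.5  # coord(1/2): one of two query clauses matched
print(round(final_score, 4))
```

The result should agree with the listed 1.9539574 to within rounding. All five entries get the same score because each matches the same single author term with identical frequency and field norm.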
    

Similar documents (content)

  1. Moffat, A.; Bell, T.A.H.: In situ generation of compressed inverted files (1995) 0.18
    0.1812259 = sum of:
      0.1812259 = product of:
        0.75510794 = sum of:
          0.116546325 = weight(abstract_txt:disc in 2648) [ClassicSimilarity], result of:
            0.116546325 = score(doc=2648,freq=2.0), product of:
              0.14517571 = queryWeight, product of:
                1.0477824 = boost
                7.2660704 = idf(docFreq=83, maxDocs=44218)
                0.019068794 = queryNorm
              0.80279493 = fieldWeight in 2648, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                7.2660704 = idf(docFreq=83, maxDocs=44218)
                0.078125 = fieldNorm(doc=2648)
          0.15268874 = weight(abstract_txt:compressed in 2648) [ClassicSimilarity], result of:
            0.15268874 = score(doc=2648,freq=1.0), product of:
              0.21899939 = queryWeight, product of:
                1.2869023 = boost
                8.924298 = idf(docFreq=15, maxDocs=44218)
                0.019068794 = queryNorm
              0.6972108 = fieldWeight in 2648, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                8.924298 = idf(docFreq=15, maxDocs=44218)
                0.078125 = fieldNorm(doc=2648)
          0.0797018 = weight(abstract_txt:index in 2648) [ClassicSimilarity], result of:
            0.0797018 = score(doc=2648,freq=3.0), product of:
              0.12402805 = queryWeight, product of:
                1.3696169 = boost
                4.74895 = idf(docFreq=1040, maxDocs=44218)
                0.019068794 = queryNorm
              0.64261115 = fieldWeight in 2648, product of:
                1.7320508 = tf(freq=3.0), with freq of:
                  3.0 = termFreq=3.0
                4.74895 = idf(docFreq=1040, maxDocs=44218)
                0.078125 = fieldNorm(doc=2648)
          0.08086246 = weight(abstract_txt:indexes in 2648) [ClassicSimilarity], result of:
            0.08086246 = score(doc=2648,freq=1.0), product of:
              0.18061183 = queryWeight, product of:
                1.6527692 = boost
                5.7307405 = idf(docFreq=389, maxDocs=44218)
                0.019068794 = queryNorm
              0.4477141 = fieldWeight in 2648, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                5.7307405 = idf(docFreq=389, maxDocs=44218)
                0.078125 = fieldNorm(doc=2648)
          0.05097647 = weight(abstract_txt:document in 2648) [ClassicSimilarity], result of:
            0.05097647 = score(doc=2648,freq=1.0), product of:
              0.15200521 = queryWeight, product of:
                1.8570089 = boost
                4.2926083 = idf(docFreq=1642, maxDocs=44218)
                0.019068794 = queryNorm
              0.33536002 = fieldWeight in 2648, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.2926083 = idf(docFreq=1642, maxDocs=44218)
                0.078125 = fieldNorm(doc=2648)
          0.2743321 = weight(abstract_txt:inverted in 2648) [ClassicSimilarity], result of:
            0.2743321 = score(doc=2648,freq=2.0), product of:
              0.3236602 = queryWeight, product of:
                2.2125025 = boost
                7.6715355 = idf(docFreq=55, maxDocs=44218)
                0.019068794 = queryNorm
              0.84759295 = fieldWeight in 2648, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                7.6715355 = idf(docFreq=55, maxDocs=44218)
                0.078125 = fieldNorm(doc=2648)
        0.24 = coord(6/25)
    
  2. Moffat, A.; Zobel, J.: Self-indexing inverted files for fast text retrieval (1996) 0.16
    0.16362047 = sum of:
      0.16362047 = product of:
        0.68175197 = sum of:
          0.024985155 = weight(abstract_txt:retrieval in 9) [ClassicSimilarity], result of:
            0.024985155 = score(doc=9,freq=3.0), product of:
              0.06641546 = queryWeight, product of:
                1.0022446 = boost
                3.4751394 = idf(docFreq=3720, maxDocs=44218)
                0.019068794 = queryNorm
              0.37619486 = fieldWeight in 9, product of:
                1.7320508 = tf(freq=3.0), with freq of:
                  3.0 = termFreq=3.0
                3.4751394 = idf(docFreq=3720, maxDocs=44218)
                0.0625 = fieldNorm(doc=9)
          0.081968375 = weight(abstract_txt:reduced in 9) [ClassicSimilarity], result of:
            0.081968375 = score(doc=9,freq=2.0), product of:
              0.1332291 = queryWeight, product of:
                1.0037457 = boost
                6.9606886 = idf(docFreq=113, maxDocs=44218)
                0.019068794 = queryNorm
              0.61524373 = fieldWeight in 9, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                6.9606886 = idf(docFreq=113, maxDocs=44218)
                0.0625 = fieldNorm(doc=9)
          0.054866843 = weight(abstract_txt:time in 9) [ClassicSimilarity], result of:
            0.054866843 = score(doc=9,freq=5.0), product of:
              0.0946393 = queryWeight, product of:
                1.1963959 = boost
                4.148331 = idf(docFreq=1897, maxDocs=44218)
                0.019068794 = queryNorm
              0.5797469 = fieldWeight in 9, product of:
                2.236068 = tf(freq=5.0), with freq of:
                  5.0 = termFreq=5.0
                4.148331 = idf(docFreq=1897, maxDocs=44218)
                0.0625 = fieldNorm(doc=9)
          0.1727476 = weight(abstract_txt:compressed in 9) [ClassicSimilarity], result of:
            0.1727476 = score(doc=9,freq=2.0), product of:
              0.21899939 = queryWeight, product of:
                1.2869023 = boost
                8.924298 = idf(docFreq=15, maxDocs=44218)
                0.019068794 = queryNorm
              0.788804 = fieldWeight in 9, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                8.924298 = idf(docFreq=15, maxDocs=44218)
                0.0625 = fieldNorm(doc=9)
          0.03681269 = weight(abstract_txt:index in 9) [ClassicSimilarity], result of:
            0.03681269 = score(doc=9,freq=1.0), product of:
              0.12402805 = queryWeight, product of:
                1.3696169 = boost
                4.74895 = idf(docFreq=1040, maxDocs=44218)
                0.019068794 = queryNorm
              0.29680938 = fieldWeight in 9, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.74895 = idf(docFreq=1040, maxDocs=44218)
                0.0625 = fieldNorm(doc=9)
          0.31037134 = weight(abstract_txt:inverted in 9) [ClassicSimilarity], result of:
            0.31037134 = score(doc=9,freq=4.0), product of:
              0.3236602 = queryWeight, product of:
                2.2125025 = boost
                7.6715355 = idf(docFreq=55, maxDocs=44218)
                0.019068794 = queryNorm
              0.95894194 = fieldWeight in 9, product of:
                2.0 = tf(freq=4.0), with freq of:
                  4.0 = termFreq=4.0
                7.6715355 = idf(docFreq=55, maxDocs=44218)
                0.0625 = fieldNorm(doc=9)
        0.24 = coord(6/25)
    
  3. MacFarlane, A.; McCann, J.A.; Robertson, S.E.: Parallel methods for the update of partitioned inverted files (2007) 0.15
    0.15157232 = sum of:
      0.15157232 = product of:
        0.6315513 = sum of:
          0.020400293 = weight(abstract_txt:retrieval in 819) [ClassicSimilarity], result of:
            0.020400293 = score(doc=819,freq=2.0), product of:
              0.06641546 = queryWeight, product of:
                1.0022446 = boost
                3.4751394 = idf(docFreq=3720, maxDocs=44218)
                0.019068794 = queryNorm
              0.3071618 = fieldWeight in 819, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                3.4751394 = idf(docFreq=3720, maxDocs=44218)
                0.0625 = fieldNorm(doc=819)
          0.07184937 = weight(abstract_txt:requirement in 819) [ClassicSimilarity], result of:
            0.07184937 = score(doc=819,freq=1.0), product of:
              0.15374236 = queryWeight, product of:
                1.0782535 = boost
                7.4773793 = idf(docFreq=67, maxDocs=44218)
                0.019068794 = queryNorm
              0.4673362 = fieldWeight in 819, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                7.4773793 = idf(docFreq=67, maxDocs=44218)
                0.0625 = fieldNorm(doc=819)
          0.03681269 = weight(abstract_txt:index in 819) [ClassicSimilarity], result of:
            0.03681269 = score(doc=819,freq=1.0), product of:
              0.12402805 = queryWeight, product of:
                1.3696169 = boost
                4.74895 = idf(docFreq=1040, maxDocs=44218)
                0.019068794 = queryNorm
              0.29680938 = fieldWeight in 819, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.74895 = idf(docFreq=1040, maxDocs=44218)
                0.0625 = fieldNorm(doc=819)
          0.06468997 = weight(abstract_txt:indexes in 819) [ClassicSimilarity], result of:
            0.06468997 = score(doc=819,freq=1.0), product of:
              0.18061183 = queryWeight, product of:
                1.6527692 = boost
                5.7307405 = idf(docFreq=389, maxDocs=44218)
                0.019068794 = queryNorm
              0.35817128 = fieldWeight in 819, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                5.7307405 = idf(docFreq=389, maxDocs=44218)
                0.0625 = fieldNorm(doc=819)
          0.057673294 = weight(abstract_txt:document in 819) [ClassicSimilarity], result of:
            0.057673294 = score(doc=819,freq=2.0), product of:
              0.15200521 = queryWeight, product of:
                1.8570089 = boost
                4.2926083 = idf(docFreq=1642, maxDocs=44218)
                0.019068794 = queryNorm
              0.37941656 = fieldWeight in 819, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                4.2926083 = idf(docFreq=1642, maxDocs=44218)
                0.0625 = fieldNorm(doc=819)
          0.3801257 = weight(abstract_txt:inverted in 819) [ClassicSimilarity], result of:
            0.3801257 = score(doc=819,freq=6.0), product of:
              0.3236602 = queryWeight, product of:
                2.2125025 = boost
                7.6715355 = idf(docFreq=55, maxDocs=44218)
                0.019068794 = queryNorm
              1.1744592 = fieldWeight in 819, product of:
                2.4494898 = tf(freq=6.0), with freq of:
                  6.0 = termFreq=6.0
                7.6715355 = idf(docFreq=55, maxDocs=44218)
                0.0625 = fieldNorm(doc=819)
        0.24 = coord(6/25)
    
  4. Tseng, Y.-H.: Automatic thesaurus generation for Chinese documents (2002) 0.13
    0.12513718 = sum of:
      0.12513718 = product of:
        0.5214049 = sum of:
          0.057960395 = weight(abstract_txt:reduced in 5226) [ClassicSimilarity], result of:
            0.057960395 = score(doc=5226,freq=1.0), product of:
              0.1332291 = queryWeight, product of:
                1.0037457 = boost
                6.9606886 = idf(docFreq=113, maxDocs=44218)
                0.019068794 = queryNorm
              0.43504304 = fieldWeight in 5226, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                6.9606886 = idf(docFreq=113, maxDocs=44218)
                0.0625 = fieldNorm(doc=5226)
          0.08108265 = weight(abstract_txt:decreasing in 5226) [ClassicSimilarity], result of:
            0.08108265 = score(doc=5226,freq=1.0), product of:
              0.16664675 = queryWeight, product of:
                1.1225935 = boost
                7.7848644 = idf(docFreq=49, maxDocs=44218)
                0.019068794 = queryNorm
              0.48655403 = fieldWeight in 5226, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                7.7848644 = idf(docFreq=49, maxDocs=44218)
                0.0625 = fieldNorm(doc=5226)
          0.024537198 = weight(abstract_txt:time in 5226) [ClassicSimilarity], result of:
            0.024537198 = score(doc=5226,freq=1.0), product of:
              0.0946393 = queryWeight, product of:
                1.1963959 = boost
                4.148331 = idf(docFreq=1897, maxDocs=44218)
                0.019068794 = queryNorm
              0.2592707 = fieldWeight in 5226, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.148331 = idf(docFreq=1897, maxDocs=44218)
                0.0625 = fieldNorm(doc=5226)
          0.057673294 = weight(abstract_txt:document in 5226) [ClassicSimilarity], result of:
            0.057673294 = score(doc=5226,freq=2.0), product of:
              0.15200521 = queryWeight, product of:
                1.8570089 = boost
                4.2926083 = idf(docFreq=1642, maxDocs=44218)
                0.019068794 = queryNorm
              0.37941656 = fieldWeight in 5226, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                4.2926083 = idf(docFreq=1642, maxDocs=44218)
                0.0625 = fieldNorm(doc=5226)
          0.19168429 = weight(abstract_txt:sorted in 5226) [ClassicSimilarity], result of:
            0.19168429 = score(doc=5226,freq=1.0), product of:
              0.3726026 = queryWeight, product of:
                2.373898 = boost
                8.231152 = idf(docFreq=31, maxDocs=44218)
                0.019068794 = queryNorm
              0.514447 = fieldWeight in 5226, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                8.231152 = idf(docFreq=31, maxDocs=44218)
                0.0625 = fieldNorm(doc=5226)
          0.108467095 = weight(abstract_txt:frequency in 5226) [ClassicSimilarity], result of:
            0.108467095 = score(doc=5226,freq=1.0), product of:
              0.29179913 = queryWeight, product of:
                2.5729227 = boost
                5.947494 = idf(docFreq=313, maxDocs=44218)
                0.019068794 = queryNorm
              0.37171838 = fieldWeight in 5226, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                5.947494 = idf(docFreq=313, maxDocs=44218)
                0.0625 = fieldNorm(doc=5226)
        0.24 = coord(6/25)
    
  5. Shieh, W.-Y.; Chung, C.-P.: A statistics-based approach to incrementally update inverted files (2005) 0.12
    0.12277911 = sum of:
      0.12277911 = product of:
        0.61389554 = sum of:
          0.018031482 = weight(abstract_txt:retrieval in 1010) [ClassicSimilarity], result of:
            0.018031482 = score(doc=1010,freq=1.0), product of:
              0.06641546 = queryWeight, product of:
                1.0022446 = boost
                3.4751394 = idf(docFreq=3720, maxDocs=44218)
                0.019068794 = queryNorm
              0.27149525 = fieldWeight in 1010, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                3.4751394 = idf(docFreq=3720, maxDocs=44218)
                0.078125 = fieldNorm(doc=1010)
          0.08045897 = weight(abstract_txt:reduction in 1010) [ClassicSimilarity], result of:
            0.08045897 = score(doc=1010,freq=1.0), product of:
              0.14287443 = queryWeight, product of:
                1.0394447 = boost
                7.208251 = idf(docFreq=88, maxDocs=44218)
                0.019068794 = queryNorm
              0.5631446 = fieldWeight in 1010, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                7.208251 = idf(docFreq=88, maxDocs=44218)
                0.078125 = fieldNorm(doc=1010)
          0.030671496 = weight(abstract_txt:time in 1010) [ClassicSimilarity], result of:
            0.030671496 = score(doc=1010,freq=1.0), product of:
              0.0946393 = queryWeight, product of:
                1.1963959 = boost
                4.148331 = idf(docFreq=1897, maxDocs=44218)
                0.019068794 = queryNorm
              0.32408836 = fieldWeight in 1010, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.148331 = idf(docFreq=1897, maxDocs=44218)
                0.078125 = fieldNorm(doc=1010)
          0.05097647 = weight(abstract_txt:document in 1010) [ClassicSimilarity], result of:
            0.05097647 = score(doc=1010,freq=1.0), product of:
              0.15200521 = queryWeight, product of:
                1.8570089 = boost
                4.2926083 = idf(docFreq=1642, maxDocs=44218)
                0.019068794 = queryNorm
              0.33536002 = fieldWeight in 1010, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.2926083 = idf(docFreq=1642, maxDocs=44218)
                0.078125 = fieldNorm(doc=1010)
          0.43375716 = weight(abstract_txt:inverted in 1010) [ClassicSimilarity], result of:
            0.43375716 = score(doc=1010,freq=5.0), product of:
              0.3236602 = queryWeight, product of:
                2.2125025 = boost
                7.6715355 = idf(docFreq=55, maxDocs=44218)
                0.019068794 = queryNorm
              1.3401622 = fieldWeight in 1010, product of:
                2.236068 = tf(freq=5.0), with freq of:
                  5.0 = termFreq=5.0
                7.6715355 = idf(docFreq=55, maxDocs=44218)
                0.078125 = fieldNorm(doc=1010)
        0.2 = coord(5/25)