Document (#23765)

Author
Kaszkiel, M.
Zobel, J.
Title
Effective ranking with arbitrary passages
Source
Journal of the American Society for Information Science and Technology. 52(2001) no.4, pp.344-364
Year
2001
Abstract
Text retrieval systems store a great variety of documents, from abstracts, newspaper articles, and Web pages to journal articles, books, court transcripts, and legislation. Collections of diverse types of documents expose shortcomings in current approaches to ranking. Use of short fragments of documents, called passages, instead of whole documents can overcome these shortcomings: passage ranking provides convenient units of text to return to the user, can avoid the difficulties of comparing documents of different length, and enables identification of short blocks of relevant material among otherwise irrelevant text. In this article, we compare several kinds of passage in an extensive series of experiments. We introduce a new type of passage, overlapping fragments of either fixed or variable length. We show that ranking with these arbitrary passages gives substantial improvements in retrieval effectiveness over traditional document ranking schemes, particularly for queries on collections of long documents. Ranking with arbitrary passages shows consistent improvements compared to ranking with whole documents, and to ranking with previous passage types that depend on document structure or topic shifts in documents.
Theme
Retrieval algorithms
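
The abstract describes passages as overlapping fragments of fixed or variable length. The following is a minimal sketch of the fixed-length case only, not the authors' implementation: the function names, the window length, the step size, and the match-count score are all assumptions chosen for illustration.

```python
def sliding_passages(words, length, step):
    """Split a word list into overlapping fixed-length windows (toy sketch)."""
    if length <= 0 or step <= 0:
        raise ValueError("length and step must be positive")
    if len(words) <= length:
        return [list(words)]
    last_start = len(words) - length
    starts = list(range(0, last_start + 1, step))
    if starts[-1] != last_start:
        starts.append(last_start)  # make sure the document tail is covered
    return [words[s:s + length] for s in starts]


def best_passage_score(words, query_terms, length, step):
    """Score a document by its best window; the score is a plain match count
    (an assumption for illustration, not a real ranking function)."""
    terms = {t.lower() for t in query_terms}
    return max(
        sum(1 for w in passage if w.lower() in terms)
        for passage in sliding_passages(words, length, step)
    )


doc = "a long report where only one short block covers passage ranking methods".split()
print(sliding_passages(doc, length=4, step=2)[:2])
print(best_passage_score(doc, ["passage", "ranking"], length=4, step=2))
```

Scoring a document by its best window is one way a short relevant block can surface even when most of the surrounding text does not match the query.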

Similar documents (author)

  1. Heinz, S.; Zobel, J.: Efficient single-pass index construction for text databases (2003) 4.70
    4.697151 = sum of:
      4.697151 = weight(author_txt:zobel in 1678) [ClassicSimilarity], result of:
        4.697151 = fieldWeight in 1678, product of:
          1.0 = tf(freq=1.0), with freq of:
            1.0 = termFreq=1.0
          9.394302 = idf(docFreq=9, maxDocs=44218)
          0.5 = fieldNorm(doc=1678)
    
  2. Uitdenbogerd, A.L.; Zobel, J.: An architecture for effective music information retrieval (2004) 4.70
    4.697151 = sum of:
      4.697151 = weight(author_txt:zobel in 3055) [ClassicSimilarity], result of:
        4.697151 = fieldWeight in 3055, product of:
          1.0 = tf(freq=1.0), with freq of:
            1.0 = termFreq=1.0
          9.394302 = idf(docFreq=9, maxDocs=44218)
          0.5 = fieldNorm(doc=3055)
    
  3. Hoad, T.C.; Zobel, J.: Methods for identifying versioned and plagiarized documents (2003) 4.70
    4.697151 = sum of:
      4.697151 = weight(author_txt:zobel in 5159) [ClassicSimilarity], result of:
        4.697151 = fieldWeight in 5159, product of:
          1.0 = tf(freq=1.0), with freq of:
            1.0 = termFreq=1.0
          9.394302 = idf(docFreq=9, maxDocs=44218)
          0.5 = fieldNorm(doc=5159)
    
  4. Moffat, A.; Zobel, J.: Self-indexing inverted files for fast text retrieval (1996) 4.70
    4.697151 = sum of:
      4.697151 = weight(author_txt:zobel in 9) [ClassicSimilarity], result of:
        4.697151 = fieldWeight in 9, product of:
          1.0 = tf(freq=1.0), with freq of:
            1.0 = termFreq=1.0
          9.394302 = idf(docFreq=9, maxDocs=44218)
          0.5 = fieldNorm(doc=9)
    
  5. Hawking, D.; Zobel, J.: Does topic metadata help with Web search? (2007) 4.70
    4.697151 = sum of:
      4.697151 = weight(author_txt:zobel in 204) [ClassicSimilarity], result of:
        4.697151 = fieldWeight in 204, product of:
          1.0 = tf(freq=1.0), with freq of:
            1.0 = termFreq=1.0
          9.394302 = idf(docFreq=9, maxDocs=44218)
          0.5 = fieldNorm(doc=204)
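
The author matches above all share the same score because each is a single-term fieldWeight with identical inputs. As a rough check, the sketch below recomputes that value with the ClassicSimilarity formulas tf = sqrt(freq) and idf = 1 + ln(maxDocs / (docFreq + 1)). The function names are illustrative, and Lucene computes in single precision, so the last digits may differ slightly.

```python
import math


def classic_tf(freq):
    # ClassicSimilarity term frequency: square root of the raw count
    return math.sqrt(freq)


def classic_idf(doc_freq, max_docs):
    # ClassicSimilarity inverse document frequency
    return 1.0 + math.log(max_docs / (doc_freq + 1.0))


def field_weight(freq, doc_freq, max_docs, field_norm):
    # fieldWeight = tf * idf * fieldNorm, as in the explain trees above
    return classic_tf(freq) * classic_idf(doc_freq, max_docs) * field_norm


# Inputs taken from the author_txt:zobel entries above
print(round(classic_idf(9, 44218), 4))
print(round(field_weight(1.0, 9, 44218, 0.5), 4))
```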
    

Similar documents (content)

  1. Mengle, S.; Goharian, N.: Passage detection using text classification (2009) 0.58
    0.5754234 = sum of:
      0.5754234 = product of:
        1.7981982 = sum of:
          0.04158578 = weight(abstract_txt:blocks in 2765) [ClassicSimilarity], result of:
            0.04158578 = score(doc=2765,freq=1.0), product of:
              0.09865533 = queryWeight, product of:
                1.0182503 = boost
                7.7079034 = idf(docFreq=53, maxDocs=44218)
                0.012569839 = queryNorm
              0.42152596 = fieldWeight in 2765, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                7.7079034 = idf(docFreq=53, maxDocs=44218)
                0.0546875 = fieldNorm(doc=2765)
          0.024882298 = weight(abstract_txt:document in 2765) [ClassicSimilarity], result of:
            0.024882298 = score(doc=2765,freq=3.0), product of:
              0.06119565 = queryWeight, product of:
                1.1341475 = boost
                4.2926083 = idf(docFreq=1642, maxDocs=44218)
                0.012569839 = queryNorm
              0.4066024 = fieldWeight in 2765, product of:
                1.7320508 = tf(freq=3.0), with freq of:
                  3.0 = termFreq=3.0
                4.2926083 = idf(docFreq=1642, maxDocs=44218)
                0.0546875 = fieldNorm(doc=2765)
          0.036031123 = weight(abstract_txt:text in 2765) [ClassicSimilarity], result of:
            0.036031123 = score(doc=2765,freq=4.0), product of:
              0.08146347 = queryWeight, product of:
                1.6026415 = boost
                4.0438666 = idf(docFreq=2106, maxDocs=44218)
                0.012569839 = queryNorm
              0.4422979 = fieldWeight in 2765, product of:
                2.0 = tf(freq=4.0), with freq of:
                  4.0 = termFreq=4.0
                4.0438666 = idf(docFreq=2106, maxDocs=44218)
                0.0546875 = fieldNorm(doc=2765)
          0.0141846035 = weight(abstract_txt:with in 2765) [ClassicSimilarity], result of:
            0.0141846035 = score(doc=2765,freq=4.0), product of:
              0.05188065 = queryWeight, product of:
                1.6511328 = boost
                2.4997334 = idf(docFreq=9868, maxDocs=44218)
                0.012569839 = queryNorm
              0.27340835 = fieldWeight in 2765, product of:
                2.0 = tf(freq=4.0), with freq of:
                  4.0 = termFreq=4.0
                2.4997334 = idf(docFreq=9868, maxDocs=44218)
                0.0546875 = fieldNorm(doc=2765)
          0.050753657 = weight(abstract_txt:length in 2765) [ClassicSimilarity], result of:
            0.050753657 = score(doc=2765,freq=1.0), product of:
              0.1419533 = queryWeight, product of:
                1.7273568 = boost
                6.537832 = idf(docFreq=173, maxDocs=44218)
                0.012569839 = queryNorm
              0.3575377 = fieldWeight in 2765, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                6.537832 = idf(docFreq=173, maxDocs=44218)
                0.0546875 = fieldNorm(doc=2765)
          0.08808262 = weight(abstract_txt:documents in 2765) [ClassicSimilarity], result of:
            0.08808262 = score(doc=2765,freq=3.0), product of:
              0.22563529 = queryWeight, product of:
                4.355548 = boost
                4.1213026 = idf(docFreq=1949, maxDocs=44218)
                0.012569839 = queryNorm
              0.39037606 = fieldWeight in 2765, product of:
                1.7320508 = tf(freq=3.0), with freq of:
                  3.0 = termFreq=3.0
                4.1213026 = idf(docFreq=1949, maxDocs=44218)
                0.0546875 = fieldNorm(doc=2765)
          0.7667569 = weight(abstract_txt:passage in 2765) [ClassicSimilarity], result of:
            0.7667569 = score(doc=2765,freq=14.0), product of:
              0.4534956 = queryWeight, product of:
                4.3662724 = boost
                8.2629 = idf(docFreq=30, maxDocs=44218)
                0.012569839 = queryNorm
              1.6907703 = fieldWeight in 2765, product of:
                3.7416575 = tf(freq=14.0), with freq of:
                  14.0 = termFreq=14.0
                8.2629 = idf(docFreq=30, maxDocs=44218)
                0.0546875 = fieldNorm(doc=2765)
          0.77592117 = weight(abstract_txt:passages in 2765) [ClassicSimilarity], result of:
            0.77592117 = score(doc=2765,freq=14.0), product of:
              0.45710188 = queryWeight, product of:
                4.383599 = boost
                8.29569 = idf(docFreq=29, maxDocs=44218)
                0.012569839 = queryNorm
              1.6974797 = fieldWeight in 2765, product of:
                3.7416575 = tf(freq=14.0), with freq of:
                  14.0 = termFreq=14.0
                8.29569 = idf(docFreq=29, maxDocs=44218)
                0.0546875 = fieldNorm(doc=2765)
        0.32 = coord(8/25)
    
  2. Stamatatos, E.: Plagiarism detection using stopword n-grams (2011) 0.21
    0.21104448 = sum of:
      0.21104448 = product of:
        0.879352 = sum of:
          0.020522574 = weight(abstract_txt:document in 4955) [ClassicSimilarity], result of:
            0.020522574 = score(doc=4955,freq=1.0), product of:
              0.06119565 = queryWeight, product of:
                1.1341475 = boost
                4.2926083 = idf(docFreq=1642, maxDocs=44218)
                0.012569839 = queryNorm
              0.33536002 = fieldWeight in 4955, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.2926083 = idf(docFreq=1642, maxDocs=44218)
                0.078125 = fieldNorm(doc=4955)
          0.026831679 = weight(abstract_txt:collections in 4955) [ClassicSimilarity], result of:
            0.026831679 = score(doc=4955,freq=1.0), product of:
              0.07316969 = queryWeight, product of:
                1.240152 = boost
                4.693822 = idf(docFreq=1099, maxDocs=44218)
                0.012569839 = queryNorm
              0.36670482 = fieldWeight in 4955, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.693822 = idf(docFreq=1099, maxDocs=44218)
                0.078125 = fieldNorm(doc=4955)
          0.017548895 = weight(abstract_txt:with in 4955) [ClassicSimilarity], result of:
            0.017548895 = score(doc=4955,freq=3.0), product of:
              0.05188065 = queryWeight, product of:
                1.6511328 = boost
                2.4997334 = idf(docFreq=9868, maxDocs=44218)
                0.012569839 = queryNorm
              0.3382551 = fieldWeight in 4955, product of:
                1.7320508 = tf(freq=3.0), with freq of:
                  3.0 = termFreq=3.0
                2.4997334 = idf(docFreq=9868, maxDocs=44218)
                0.078125 = fieldNorm(doc=4955)
          0.10274165 = weight(abstract_txt:documents in 4955) [ClassicSimilarity], result of:
            0.10274165 = score(doc=4955,freq=2.0), product of:
              0.22563529 = queryWeight, product of:
                4.355548 = boost
                4.1213026 = idf(docFreq=1949, maxDocs=44218)
                0.012569839 = queryNorm
              0.4553439 = fieldWeight in 4955, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                4.1213026 = idf(docFreq=1949, maxDocs=44218)
                0.078125 = fieldNorm(doc=4955)
          0.29274914 = weight(abstract_txt:passage in 4955) [ClassicSimilarity], result of:
            0.29274914 = score(doc=4955,freq=1.0), product of:
              0.4534956 = queryWeight, product of:
                4.3662724 = boost
                8.2629 = idf(docFreq=30, maxDocs=44218)
                0.012569839 = queryNorm
              0.6455391 = fieldWeight in 4955, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                8.2629 = idf(docFreq=30, maxDocs=44218)
                0.078125 = fieldNorm(doc=4955)
          0.41895804 = weight(abstract_txt:passages in 4955) [ClassicSimilarity], result of:
            0.41895804 = score(doc=4955,freq=2.0), product of:
              0.45710188 = queryWeight, product of:
                4.383599 = boost
                8.29569 = idf(docFreq=29, maxDocs=44218)
                0.012569839 = queryNorm
              0.91655284 = fieldWeight in 4955, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                8.29569 = idf(docFreq=29, maxDocs=44218)
                0.078125 = fieldNorm(doc=4955)
        0.24 = coord(6/25)
    
  3. Wan, X.; Yang, J.; Xiao, J.: Towards a unified approach to document similarity search using manifold-ranking of blocks (2008) 0.20
    0.1994604 = sum of:
      0.1994604 = product of:
        0.7123586 = sum of:
          0.10627273 = weight(abstract_txt:blocks in 2081) [ClassicSimilarity], result of:
            0.10627273 = score(doc=2081,freq=5.0), product of:
              0.09865533 = queryWeight, product of:
                1.0182503 = boost
                7.7079034 = idf(docFreq=53, maxDocs=44218)
                0.012569839 = queryNorm
              1.0772122 = fieldWeight in 2081, product of:
                2.236068 = tf(freq=5.0), with freq of:
                  5.0 = termFreq=5.0
                7.7079034 = idf(docFreq=53, maxDocs=44218)
                0.0625 = fieldNorm(doc=2081)
          0.046437286 = weight(abstract_txt:document in 2081) [ClassicSimilarity], result of:
            0.046437286 = score(doc=2081,freq=8.0), product of:
              0.06119565 = queryWeight, product of:
                1.1341475 = boost
                4.2926083 = idf(docFreq=1642, maxDocs=44218)
                0.012569839 = queryNorm
              0.7588331 = fieldWeight in 2081, product of:
                2.828427 = tf(freq=8.0), with freq of:
                  8.0 = termFreq=8.0
                4.2926083 = idf(docFreq=1642, maxDocs=44218)
                0.0625 = fieldNorm(doc=2081)
          0.042713895 = weight(abstract_txt:whole in 2081) [ClassicSimilarity], result of:
            0.042713895 = score(doc=2081,freq=1.0), product of:
              0.11575829 = queryWeight, product of:
                1.559859 = boost
                5.9038734 = idf(docFreq=327, maxDocs=44218)
                0.012569839 = queryNorm
              0.3689921 = fieldWeight in 2081, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                5.9038734 = idf(docFreq=327, maxDocs=44218)
                0.0625 = fieldNorm(doc=2081)
          0.035661563 = weight(abstract_txt:text in 2081) [ClassicSimilarity], result of:
            0.035661563 = score(doc=2081,freq=3.0), product of:
              0.08146347 = queryWeight, product of:
                1.6026415 = boost
                4.0438666 = idf(docFreq=2106, maxDocs=44218)
                0.012569839 = queryNorm
              0.4377614 = fieldWeight in 2081, product of:
                1.7320508 = tf(freq=3.0), with freq of:
                  3.0 = termFreq=3.0
                4.0438666 = idf(docFreq=2106, maxDocs=44218)
                0.0625 = fieldNorm(doc=2081)
          0.008105488 = weight(abstract_txt:with in 2081) [ClassicSimilarity], result of:
            0.008105488 = score(doc=2081,freq=1.0), product of:
              0.05188065 = queryWeight, product of:
                1.6511328 = boost
                2.4997334 = idf(docFreq=9868, maxDocs=44218)
                0.012569839 = queryNorm
              0.15623334 = fieldWeight in 2081, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                2.4997334 = idf(docFreq=9868, maxDocs=44218)
                0.0625 = fieldNorm(doc=2081)
          0.116238914 = weight(abstract_txt:documents in 2081) [ClassicSimilarity], result of:
            0.116238914 = score(doc=2081,freq=4.0), product of:
              0.22563529 = queryWeight, product of:
                4.355548 = boost
                4.1213026 = idf(docFreq=1949, maxDocs=44218)
                0.012569839 = queryNorm
              0.5151628 = fieldWeight in 2081, product of:
                2.0 = tf(freq=4.0), with freq of:
                  4.0 = termFreq=4.0
                4.1213026 = idf(docFreq=1949, maxDocs=44218)
                0.0625 = fieldNorm(doc=2081)
          0.35692874 = weight(abstract_txt:ranking in 2081) [ClassicSimilarity], result of:
            0.35692874 = score(doc=2081,freq=6.0), product of:
              0.4164184 = queryWeight, product of:
                5.9170365 = boost
                5.598813 = idf(docFreq=444, maxDocs=44218)
                0.012569839 = queryNorm
              0.8571397 = fieldWeight in 2081, product of:
                2.4494898 = tf(freq=6.0), with freq of:
                  6.0 = termFreq=6.0
                5.598813 = idf(docFreq=444, maxDocs=44218)
                0.0625 = fieldNorm(doc=2081)
        0.28 = coord(7/25)
    
  4. Moura, E.S. de; Fernandes, D.; Ribeiro-Neto, B.; Silva, A.S. da; Gonçalves, M.A.: Using structural information to improve search in Web collections (2010) 0.16
    0.15752788 = sum of:
      0.15752788 = product of:
        0.65636617 = sum of:
          0.084015965 = weight(abstract_txt:blocks in 4119) [ClassicSimilarity], result of:
            0.084015965 = score(doc=4119,freq=2.0), product of:
              0.09865533 = queryWeight, product of:
                1.0182503 = boost
                7.7079034 = idf(docFreq=53, maxDocs=44218)
                0.012569839 = queryNorm
              0.851611 = fieldWeight in 4119, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                7.7079034 = idf(docFreq=53, maxDocs=44218)
                0.078125 = fieldNorm(doc=4119)
          0.020522574 = weight(abstract_txt:document in 4119) [ClassicSimilarity], result of:
            0.020522574 = score(doc=4119,freq=1.0), product of:
              0.06119565 = queryWeight, product of:
                1.1341475 = boost
                4.2926083 = idf(docFreq=1642, maxDocs=44218)
                0.012569839 = queryNorm
              0.33536002 = fieldWeight in 4119, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.2926083 = idf(docFreq=1642, maxDocs=44218)
                0.078125 = fieldNorm(doc=4119)
          0.037945725 = weight(abstract_txt:collections in 4119) [ClassicSimilarity], result of:
            0.037945725 = score(doc=4119,freq=2.0), product of:
              0.07316969 = queryWeight, product of:
                1.240152 = boost
                4.693822 = idf(docFreq=1099, maxDocs=44218)
                0.012569839 = queryNorm
              0.5185989 = fieldWeight in 4119, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                4.693822 = idf(docFreq=1099, maxDocs=44218)
                0.078125 = fieldNorm(doc=4119)
          0.05339237 = weight(abstract_txt:whole in 4119) [ClassicSimilarity], result of:
            0.05339237 = score(doc=4119,freq=1.0), product of:
              0.11575829 = queryWeight, product of:
                1.559859 = boost
                5.9038734 = idf(docFreq=327, maxDocs=44218)
                0.012569839 = queryNorm
              0.4612401 = fieldWeight in 4119, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                5.9038734 = idf(docFreq=327, maxDocs=44218)
                0.078125 = fieldNorm(doc=4119)
          0.014328613 = weight(abstract_txt:with in 4119) [ClassicSimilarity], result of:
            0.014328613 = score(doc=4119,freq=2.0), product of:
              0.05188065 = queryWeight, product of:
                1.6511328 = boost
                2.4997334 = idf(docFreq=9868, maxDocs=44218)
                0.012569839 = queryNorm
              0.27618414 = fieldWeight in 4119, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                2.4997334 = idf(docFreq=9868, maxDocs=44218)
                0.078125 = fieldNorm(doc=4119)
          0.4461609 = weight(abstract_txt:ranking in 4119) [ClassicSimilarity], result of:
            0.4461609 = score(doc=4119,freq=6.0), product of:
              0.4164184 = queryWeight, product of:
                5.9170365 = boost
                5.598813 = idf(docFreq=444, maxDocs=44218)
                0.012569839 = queryNorm
              1.0714246 = fieldWeight in 4119, product of:
                2.4494898 = tf(freq=6.0), with freq of:
                  6.0 = termFreq=6.0
                5.598813 = idf(docFreq=444, maxDocs=44218)
                0.078125 = fieldNorm(doc=4119)
        0.24 = coord(6/25)
    
  5. Otterbacher, J.; Erkan, G.; Radev, D.R.: Biased LexRank : passage retrieval using random walks with question-based priors (2009) 0.15
    0.15017802 = sum of:
      0.15017802 = product of:
        1.2514834 = sum of:
          0.043676317 = weight(abstract_txt:text in 2450) [ClassicSimilarity], result of:
            0.043676317 = score(doc=2450,freq=2.0), product of:
              0.08146347 = queryWeight, product of:
                1.6026415 = boost
                4.0438666 = idf(docFreq=2106, maxDocs=44218)
                0.012569839 = queryNorm
              0.53614604 = fieldWeight in 2450, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                4.0438666 = idf(docFreq=2106, maxDocs=44218)
                0.09375 = fieldNorm(doc=2450)
          0.49681178 = weight(abstract_txt:passage in 2450) [ClassicSimilarity], result of:
            0.49681178 = score(doc=2450,freq=2.0), product of:
              0.4534956 = queryWeight, product of:
                4.3662724 = boost
                8.2629 = idf(docFreq=30, maxDocs=44218)
                0.012569839 = queryNorm
              1.0955162 = fieldWeight in 2450, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                8.2629 = idf(docFreq=30, maxDocs=44218)
                0.09375 = fieldNorm(doc=2450)
          0.7109954 = weight(abstract_txt:passages in 2450) [ClassicSimilarity], result of:
            0.7109954 = score(doc=2450,freq=4.0), product of:
              0.45710188 = queryWeight, product of:
                4.383599 = boost
                8.29569 = idf(docFreq=29, maxDocs=44218)
                0.012569839 = queryNorm
              1.5554419 = fieldWeight in 2450, product of:
                2.0 = tf(freq=4.0), with freq of:
                  4.0 = termFreq=4.0
                8.29569 = idf(docFreq=29, maxDocs=44218)
                0.09375 = fieldNorm(doc=2450)
        0.12 = coord(3/25)
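
The content scores combine more factors than the author scores: each term score is queryWeight (boost * idf * queryNorm) times fieldWeight (tf * idf * fieldNorm), and the sum over matching terms is scaled by coord (matched terms / query terms). The sketch below recomputes the last entry (doc 2450) from the values printed in its explain tree. The helper names are illustrative, and small floating-point differences from Lucene's single-precision output are expected.

```python
import math

MAX_DOCS = 44218
QUERY_NORM = 0.012569839  # queryNorm as printed in the explain trees


def classic_idf(doc_freq, max_docs=MAX_DOCS):
    # ClassicSimilarity inverse document frequency
    return 1.0 + math.log(max_docs / (doc_freq + 1.0))


def term_score(boost, doc_freq, freq, field_norm):
    idf = classic_idf(doc_freq)
    query_weight = boost * idf * QUERY_NORM
    field_weight = math.sqrt(freq) * idf * field_norm
    return query_weight * field_weight


def document_score(terms, field_norm, matched, query_terms):
    # coord rewards documents that match a larger share of the query terms
    coord = matched / query_terms
    return coord * sum(term_score(b, df, f, field_norm) for b, df, f in terms)


# (boost, docFreq, termFreq) for "text", "passage", "passages" in doc 2450
doc_2450 = [
    (1.6026415, 2106, 2.0),
    (4.3662724, 30, 2.0),
    (4.383599, 29, 4.0),
]
print(round(document_score(doc_2450, 0.09375, matched=3, query_terms=25), 4))
```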