Document (#33027)

Author
Moens, M.-F.
Angheluta, R.
Dumortier, J.
Title
Generic technologies for single- and multi-document summarization
Source
Information Processing and Management. 41(2005) no.3, pp.569-586
Year
2005
Abstract
The technologies for single- and multi-document summarization that are described and evaluated in this article can be used on heterogeneous texts for different summarization tasks. They comprise the extraction of important sentences from the documents, the compression of sentences to their essential or relevant content, and the detection of redundant content across sentences. The technologies were tested at the Document Understanding Conference (DUC), organized by the National Institute of Standards and Technology (NIST), USA, in 2002 and 2003, where the system obtained good to very good results. We also tested our summarization system on a variety of English encyclopedia texts and on Dutch magazine articles. The results show that generic linguistic resources and statistical techniques offer a basis for text summarization.
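The abstract names three steps: sentence extraction, sentence compression and redundancy detection. As a rough illustration of the first and third step only, the sketch below scores each sentence by the average frequency of its words across the whole input set, then greedily picks the top-scoring sentences while skipping any whose cosine similarity to an already chosen sentence reaches a threshold. This is a generic textbook heuristic, not the system described in the article: the function names, the scoring formula, the naive sentence splitter and the 0.5 threshold are all hypothetical choices, and sentence compression is not shown.

```python
import math
import re
from collections import Counter

# Generic extractive-summarization sketch; NOT the method of Moens et al.
# All names, the scoring formula and the threshold are illustrative assumptions.

def split_sentences(text):
    # Naive splitter: break after '.', '!' or '?' followed by whitespace.
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]

def tokenize(sentence):
    # Lowercased alphabetic tokens only.
    return re.findall(r"[a-z]+", sentence.lower())

def cosine(a, b):
    # Cosine similarity between two bag-of-words Counters.
    dot = sum(a[t] * b[t] for t in a if t in b)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def summarize(documents, max_sentences=3, redundancy_threshold=0.5,
              stopwords=frozenset()):
    # Pool sentences from all documents (single- or multi-document input).
    sentences = [s for doc in documents for s in split_sentences(doc)]
    bags = [Counter(t for t in tokenize(s) if t not in stopwords)
            for s in sentences]

    # Term frequencies over the whole input set as a crude importance signal.
    totals = Counter()
    for bag in bags:
        totals.update(bag)

    def score(bag):
        # Average input-set frequency of the sentence's tokens.
        n = sum(bag.values())
        return sum(totals[t] * c for t, c in bag.items()) / n if n else 0.0

    # Stable sort: ties keep their original sentence order.
    ranked = sorted(range(len(sentences)), key=lambda i: score(bags[i]),
                    reverse=True)

    chosen = []
    for i in ranked:
        if len(chosen) == max_sentences:
            break
        # Redundancy detection: skip near-duplicates of selected sentences.
        if all(cosine(bags[i], bags[j]) < redundancy_threshold for j in chosen):
            chosen.append(i)

    # Return the selected sentences in their original order.
    return [sentences[i] for i in sorted(chosen)]

if __name__ == "__main__":
    docs = [
        "Summarization selects important sentences. Cats sleep a lot.",
        "Summarization selects important sentences from documents. Dogs bark.",
    ]
    print(summarize(docs, max_sentences=2))
```

With the two toy documents above, the second "Summarization selects important sentences ..." sentence is ranked second but dropped as redundant (cosine similarity about 0.82 with the first), so the two-sentence summary is the first summarization sentence plus "Cats sleep a lot." A real system would use stopword lists, better sentence segmentation and a stronger importance model.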

Similar documents (author)

  1. Moens, M.F.: Automatic indexing and abstracting of document texts (2000) 5.87
    
  2. Moens, M.F.; Dumortier, J.: Use of a text grammar for generating highlight abstracts of magazine articles (2000) 4.70
    
  3. Moens, M.-F.: Summarizing court decisions (2007) 4.70
    
  4. Moens, M.-F.; Dumortier, J.: Text categorization : the assignment of subject descriptors to magazine articles (2000) 4.11
    
  5. Moens, M.-F.; Uyttendaele, C.: Automatic text structuring and categorization as a first step in summarizing legal cases (1997) 4.11
    

Similar documents (content)

  1. Wan, X.; Yang, J.; Xiao, J.: Incorporating cross-document relationships between sentences for single document summarizations (2006) 0.28
    
  2. Agarwal, B.; Ramampiaro, H.; Langseth, H.; Ruocco, M.: A deep network model for paraphrase detection in short text messages (2018) 0.24
    
  3. Lee, J.-H.; Park, S.; Ahn, C.-M.; Kim, D.: Automatic generic document summarization based on non-negative matrix factorization (2009) 0.24
    
  4. Sankarasubramaniam, Y.; Ramanathan, K.; Ghosh, S.: Text summarization using Wikipedia (2014) 0.24
    
  5. Zajic, D.M.; Dorr, B.J.; Lin, J.: Single-document and multi-document summarization techniques for email threads using sentence compression (2008) 0.23