Document (#15671)

Author
Goh, A.
Hui, S.C.
Chan, S.K.
Title
A text extraction system for news reports
Source
Asian libraries. 5(1996) no.1, pp.34-42
Year
1996
Abstract
Describes the design and implementation of a text extraction tool, NEWS_EXT, which automatically produces summaries from news reports by extracting sentences to form indicative abstracts. Selection of sentences is based on sentence importance, measured by means of sentence scoring or simple linguistic analysis of sentence structure. Tests were conducted on 4 approaches for the functioning of the NEWS_EXT system: extraction by keyword frequency; extraction by title keywords; extraction by location; and extraction by indicative phrase. Reports a study comparing the output of NEWS_EXT with manually produced extracts, using relevance as the criterion for effectiveness. 48 newspaper articles were assessed (The Straits Times, International Herald Tribune, Asian Wall Street Journal, and Financial Times). The evaluation was conducted in 2 stages: stage 1 involved abstracts produced manually by 2 human experts; stage 2 involved the generation of abstracts using NEWS_EXT. Results of each of the 4 approaches were compared with the human-produced abstracts; the title and location approaches were found to give the best results for both local and foreign news. Reports plans to refine and enhance NEWS_EXT and incorporate it as a module within a larger newspaper clipping system.
Theme
Automatic abstracting
Object
NEWS_EXT
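
The abstract names extraction by keyword frequency as one of the four approaches tested. The record does not describe how NEWS_EXT actually scores sentences, so the following is only a minimal sketch of the general idea, not the authors' method. The function names, the stopword list, the naive sentence splitter, and the "sum of word frequencies" score are all assumptions made for illustration.

```python
import re
from collections import Counter

# Hypothetical, deliberately tiny stopword list (an assumption, not from NEWS_EXT).
STOPWORDS = frozenset({
    "the", "a", "an", "of", "in", "to", "and", "is", "are", "was", "were",
    "on", "for", "by", "with", "that", "it", "as", "at",
})

def split_sentences(text):
    # Naive splitter: break after '.', '!' or '?' followed by whitespace.
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]

def tokenize(sentence):
    # Lowercase alphabetic tokens with stopwords removed.
    return [w for w in re.findall(r"[a-z]+", sentence.lower()) if w not in STOPWORDS]

def extract_by_keyword_frequency(text, n=2):
    """Return the n highest-scoring sentences, in their original order."""
    sentences = split_sentences(text)
    # Document-wide frequency of every non-stopword token.
    freq = Counter(w for s in sentences for w in tokenize(s))
    # Score each sentence as the sum of the frequencies of its tokens.
    scored = [(sum(freq[w] for w in tokenize(s)), i) for i, s in enumerate(sentences)]
    # Highest score first; ties go to the earlier sentence.
    top = sorted(scored, key=lambda p: (-p[0], p[1]))[:n]
    return [sentences[i] for _, i in sorted(top, key=lambda p: p[1])]
```

A plain sum favours long sentences; dividing by sentence length would be one possible variation. The title, location, and indicative-phrase approaches would need different scoring functions and are not sketched here.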

Similar documents (author)

  1. Chan, L.M.: Year's work in cataloging and classification : 1975 (1976) 4.56
    4.564024 = sum of:
      4.564024 = weight(author_txt:chan in 307) [ClassicSimilarity], result of:
        4.564024 = fieldWeight in 307, product of:
          1.0 = tf(freq=1.0), with freq of:
            1.0 = termFreq=1.0
          7.3024383 = idf(docFreq=80, maxDocs=44218)
          0.625 = fieldNorm(doc=307)
    
  2. Chan, L.M.: 'American poetry' but 'Satire, American' : the direct and inverted forms of subject headings containing national adjectives (1973) 4.56
    4.564024 = sum of:
      4.564024 = weight(author_txt:chan in 382) [ClassicSimilarity], result of:
        4.564024 = fieldWeight in 382, product of:
          1.0 = tf(freq=1.0), with freq of:
            1.0 = termFreq=1.0
          7.3024383 = idf(docFreq=80, maxDocs=44218)
          0.625 = fieldNorm(doc=382)
    
  3. Chan, L.M.: Library of Congress Classification as an online retrieval tool : potentials and limitations (1986) 4.56
    4.564024 = sum of:
      4.564024 = weight(author_txt:chan in 1145) [ClassicSimilarity], result of:
        4.564024 = fieldWeight in 1145, product of:
          1.0 = tf(freq=1.0), with freq of:
            1.0 = termFreq=1.0
          7.3024383 = idf(docFreq=80, maxDocs=44218)
          0.625 = fieldNorm(doc=1145)
    
  4. Chan, L.M.: Library of Congress class numbers in online catalog searching (1989) 4.56
    4.564024 = sum of:
      4.564024 = weight(author_txt:chan in 1146) [ClassicSimilarity], result of:
        4.564024 = fieldWeight in 1146, product of:
          1.0 = tf(freq=1.0), with freq of:
            1.0 = termFreq=1.0
          7.3024383 = idf(docFreq=80, maxDocs=44218)
          0.625 = fieldNorm(doc=1146)
    
  5. Chan, L.M.: Dewey 18: another step in an evolutionary step (1972) 4.56
    4.564024 = sum of:
      4.564024 = weight(author_txt:chan in 1780) [ClassicSimilarity], result of:
        4.564024 = fieldWeight in 1780, product of:
          1.0 = tf(freq=1.0), with freq of:
            1.0 = termFreq=1.0
          7.3024383 = idf(docFreq=80, maxDocs=44218)
          0.625 = fieldNorm(doc=1780)
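
The author-match scores above all follow the same pattern: fieldWeight = tf × idf × fieldNorm. The sketch below reproduces that arithmetic, assuming tf is the square root of the term frequency and idf is 1 + ln(maxDocs / (docFreq + 1)), which is consistent with the values shown in this listing. The function names are hypothetical, and fieldNorm is taken as given from the listing rather than derived.

```python
import math

def classic_idf(doc_freq, max_docs):
    # Inverse document frequency in the form that matches the listing:
    # 1 + ln(maxDocs / (docFreq + 1)).
    return 1.0 + math.log(max_docs / (doc_freq + 1.0))

def classic_tf(freq):
    # Term-frequency factor: square root of the raw term frequency.
    return math.sqrt(freq)

def field_weight(freq, doc_freq, max_docs, field_norm):
    # fieldWeight = tf * idf * fieldNorm, as in the explain tree.
    return classic_tf(freq) * classic_idf(doc_freq, max_docs) * field_norm

# Values copied from the first author match (doc 307).
print(field_weight(freq=1.0, doc_freq=80, max_docs=44218, field_norm=0.625))
```

Because all five author matches share freq=1.0, docFreq=80 and fieldNorm=0.625, they receive the same score of about 4.56.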
    

Similar documents (content)

  1. Goh, A.; Hui, S.C.: TES: a text extraction system (1996) 0.44
    0.43810508 = sum of:
      0.43810508 = product of:
        1.3690784 = sum of:
          0.026210152 = weight(abstract_txt:system in 6599) [ClassicSimilarity], result of:
            0.026210152 = score(doc=6599,freq=1.0), product of:
              0.07105983 = queryWeight, product of:
                1.2337143 = boost
                3.3723085 = idf(docFreq=4123, maxDocs=44218)
                0.01707978 = queryNorm
              0.36884624 = fieldWeight in 6599, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                3.3723085 = idf(docFreq=4123, maxDocs=44218)
                0.109375 = fieldNorm(doc=6599)
          0.13380723 = weight(abstract_txt:manually in 6599) [ClassicSimilarity], result of:
            0.13380723 = score(doc=6599,freq=1.0), product of:
              0.18404913 = queryWeight, product of:
                1.6211522 = boost
                6.6470313 = idf(docFreq=155, maxDocs=44218)
                0.01707978 = queryNorm
              0.7270191 = fieldWeight in 6599, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                6.6470313 = idf(docFreq=155, maxDocs=44218)
                0.109375 = fieldNorm(doc=6599)
          0.03848315 = weight(abstract_txt:results in 6599) [ClassicSimilarity], result of:
            0.03848315 = score(doc=6599,freq=1.0), product of:
              0.101034835 = queryWeight, product of:
                1.698664 = boost
                3.482422 = idf(docFreq=3693, maxDocs=44218)
                0.01707978 = queryNorm
              0.38088992 = fieldWeight in 6599, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                3.482422 = idf(docFreq=3693, maxDocs=44218)
                0.109375 = fieldNorm(doc=6599)
          0.15603484 = weight(abstract_txt:sentences in 6599) [ClassicSimilarity], result of:
            0.15603484 = score(doc=6599,freq=1.0), product of:
              0.20390527 = queryWeight, product of:
                1.7063618 = boost
                6.996407 = idf(docFreq=109, maxDocs=44218)
                0.01707978 = queryNorm
              0.765232 = fieldWeight in 6599, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                6.996407 = idf(docFreq=109, maxDocs=44218)
                0.109375 = fieldNorm(doc=6599)
          0.24587592 = weight(abstract_txt:indicative in 6599) [ClassicSimilarity], result of:
            0.24587592 = score(doc=6599,freq=1.0), product of:
              0.2761159 = queryWeight, product of:
                1.9856495 = boost
                8.14154 = idf(docFreq=34, maxDocs=44218)
                0.01707978 = queryNorm
              0.8904809 = fieldWeight in 6599, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                8.14154 = idf(docFreq=34, maxDocs=44218)
                0.109375 = fieldNorm(doc=6599)
          0.11659882 = weight(abstract_txt:produced in 6599) [ClassicSimilarity], result of:
            0.11659882 = score(doc=6599,freq=1.0), product of:
              0.1922091 = queryWeight, product of:
                2.0290349 = boost
                5.5462847 = idf(docFreq=468, maxDocs=44218)
                0.01707978 = queryNorm
              0.6066249 = fieldWeight in 6599, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                5.5462847 = idf(docFreq=468, maxDocs=44218)
                0.109375 = fieldNorm(doc=6599)
          0.19325915 = weight(abstract_txt:abstracts in 6599) [ClassicSimilarity], result of:
            0.19325915 = score(doc=6599,freq=1.0), product of:
              0.2962903 = queryWeight, product of:
                2.9089122 = boost
                5.963546 = idf(docFreq=308, maxDocs=44218)
                0.01707978 = queryNorm
              0.6522628 = fieldWeight in 6599, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                5.963546 = idf(docFreq=308, maxDocs=44218)
                0.109375 = fieldNorm(doc=6599)
          0.45880908 = weight(abstract_txt:extraction in 6599) [ClassicSimilarity], result of:
            0.45880908 = score(doc=6599,freq=2.0), product of:
              0.4790702 = queryWeight, product of:
                4.530197 = boost
                6.1915555 = idf(docFreq=245, maxDocs=44218)
                0.01707978 = queryNorm
              0.9577074 = fieldWeight in 6599, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                6.1915555 = idf(docFreq=245, maxDocs=44218)
                0.109375 = fieldNorm(doc=6599)
        0.32 = coord(8/25)
    
  2. Yang, C.C.; Wang, F.L.: Hierarchical summarization of large documents (2008) 0.20
    0.19580428 = sum of:
      0.19580428 = product of:
        0.699301 = sum of:
          0.03803197 = weight(abstract_txt:human in 1719) [ClassicSimilarity], result of:
            0.03803197 = score(doc=1719,freq=2.0), product of:
              0.091705486 = queryWeight, product of:
                1.1443386 = boost
                4.692005 = idf(docFreq=1101, maxDocs=44218)
                0.01707978 = queryNorm
              0.41471857 = fieldWeight in 1719, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                4.692005 = idf(docFreq=1101, maxDocs=44218)
                0.0625 = fieldNorm(doc=1719)
          0.030799855 = weight(abstract_txt:conducted in 1719) [ClassicSimilarity], result of:
            0.030799855 = score(doc=1719,freq=1.0), product of:
              0.1003857 = queryWeight, product of:
                1.1972721 = boost
                4.9090424 = idf(docFreq=886, maxDocs=44218)
                0.01707978 = queryNorm
              0.30681515 = fieldWeight in 1719, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.9090424 = idf(docFreq=886, maxDocs=44218)
                0.0625 = fieldNorm(doc=1719)
          0.021990372 = weight(abstract_txt:results in 1719) [ClassicSimilarity], result of:
            0.021990372 = score(doc=1719,freq=1.0), product of:
              0.101034835 = queryWeight, product of:
                1.698664 = boost
                3.482422 = idf(docFreq=3693, maxDocs=44218)
                0.01707978 = queryNorm
              0.21765138 = fieldWeight in 1719, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                3.482422 = idf(docFreq=3693, maxDocs=44218)
                0.0625 = fieldNorm(doc=1719)
          0.15443444 = weight(abstract_txt:sentences in 1719) [ClassicSimilarity], result of:
            0.15443444 = score(doc=1719,freq=3.0), product of:
              0.20390527 = queryWeight, product of:
                1.7063618 = boost
                6.996407 = idf(docFreq=109, maxDocs=44218)
                0.01707978 = queryNorm
              0.7573833 = fieldWeight in 1719, product of:
                1.7320508 = tf(freq=3.0), with freq of:
                  3.0 = termFreq=3.0
                6.996407 = idf(docFreq=109, maxDocs=44218)
                0.0625 = fieldNorm(doc=1719)
          0.0666279 = weight(abstract_txt:produced in 1719) [ClassicSimilarity], result of:
            0.0666279 = score(doc=1719,freq=1.0), product of:
              0.1922091 = queryWeight, product of:
                2.0290349 = boost
                5.5462847 = idf(docFreq=468, maxDocs=44218)
                0.01707978 = queryNorm
              0.3466428 = fieldWeight in 1719, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                5.5462847 = idf(docFreq=468, maxDocs=44218)
                0.0625 = fieldNorm(doc=1719)
          0.1252399 = weight(abstract_txt:sentence in 1719) [ClassicSimilarity], result of:
            0.1252399 = score(doc=1719,freq=1.0), product of:
              0.29275095 = queryWeight, product of:
                2.5041 = boost
                6.8448567 = idf(docFreq=127, maxDocs=44218)
                0.01707978 = queryNorm
              0.42780355 = fieldWeight in 1719, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                6.8448567 = idf(docFreq=127, maxDocs=44218)
                0.0625 = fieldNorm(doc=1719)
          0.2621766 = weight(abstract_txt:extraction in 1719) [ClassicSimilarity], result of:
            0.2621766 = score(doc=1719,freq=2.0), product of:
              0.4790702 = queryWeight, product of:
                4.530197 = boost
                6.1915555 = idf(docFreq=245, maxDocs=44218)
                0.01707978 = queryNorm
              0.54726136 = fieldWeight in 1719, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                6.1915555 = idf(docFreq=245, maxDocs=44218)
                0.0625 = fieldNorm(doc=1719)
        0.28 = coord(7/25)
    
  3. Ibekwe-SanJuan, F.: Semantic metadata annotation : tagging Medline abstracts for enhanced information access (2010) 0.15
    0.15013085 = sum of:
      0.15013085 = product of:
        0.62554526 = sum of:
          0.033444937 = weight(abstract_txt:approaches in 3949) [ClassicSimilarity], result of:
            0.033444937 = score(doc=3949,freq=1.0), product of:
              0.13270423 = queryWeight, product of:
                1.6859508 = boost
                4.6084785 = idf(docFreq=1197, maxDocs=44218)
                0.01707978 = queryNorm
              0.25202617 = fieldWeight in 3949, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.6084785 = idf(docFreq=1197, maxDocs=44218)
                0.0546875 = fieldNorm(doc=3949)
          0.019241575 = weight(abstract_txt:results in 3949) [ClassicSimilarity], result of:
            0.019241575 = score(doc=3949,freq=1.0), product of:
              0.101034835 = queryWeight, product of:
                1.698664 = boost
                3.482422 = idf(docFreq=3693, maxDocs=44218)
                0.01707978 = queryNorm
              0.19044496 = fieldWeight in 3949, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                3.482422 = idf(docFreq=3693, maxDocs=44218)
                0.0546875 = fieldNorm(doc=3949)
          0.13513014 = weight(abstract_txt:sentences in 3949) [ClassicSimilarity], result of:
            0.13513014 = score(doc=3949,freq=3.0), product of:
              0.20390527 = queryWeight, product of:
                1.7063618 = boost
                6.996407 = idf(docFreq=109, maxDocs=44218)
                0.01707978 = queryNorm
              0.66271037 = fieldWeight in 3949, product of:
                1.7320508 = tf(freq=3.0), with freq of:
                  3.0 = termFreq=3.0
                6.996407 = idf(docFreq=109, maxDocs=44218)
                0.0546875 = fieldNorm(doc=3949)
          0.031851668 = weight(abstract_txt:were in 3949) [ClassicSimilarity], result of:
            0.031851668 = score(doc=3949,freq=2.0), product of:
              0.11221619 = queryWeight, product of:
                1.7901921 = boost
                3.6700637 = idf(docFreq=3061, maxDocs=44218)
                0.01707978 = queryNorm
              0.283842 = fieldWeight in 3949, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                3.6700637 = idf(docFreq=3061, maxDocs=44218)
                0.0546875 = fieldNorm(doc=3949)
          0.18980664 = weight(abstract_txt:sentence in 3949) [ClassicSimilarity], result of:
            0.18980664 = score(doc=3949,freq=3.0), product of:
              0.29275095 = queryWeight, product of:
                2.5041 = boost
                6.8448567 = idf(docFreq=127, maxDocs=44218)
                0.01707978 = queryNorm
              0.6483553 = fieldWeight in 3949, product of:
                1.7320508 = tf(freq=3.0), with freq of:
                  3.0 = termFreq=3.0
                6.8448567 = idf(docFreq=127, maxDocs=44218)
                0.0546875 = fieldNorm(doc=3949)
          0.21607031 = weight(abstract_txt:abstracts in 3949) [ClassicSimilarity], result of:
            0.21607031 = score(doc=3949,freq=5.0), product of:
              0.2962903 = queryWeight, product of:
                2.9089122 = boost
                5.963546 = idf(docFreq=308, maxDocs=44218)
                0.01707978 = queryNorm
              0.72925204 = fieldWeight in 3949, product of:
                2.236068 = tf(freq=5.0), with freq of:
                  5.0 = termFreq=5.0
                5.963546 = idf(docFreq=308, maxDocs=44218)
                0.0546875 = fieldNorm(doc=3949)
        0.24 = coord(6/25)
    
  4. Ou, S.; Khoo, S.G.; Goh, D.H.: Automatic multidocument summarization of research abstracts : design and user evaluation (2007) 0.15
    0.14964436 = sum of:
      0.14964436 = product of:
        0.62351817 = sum of:
          0.01497723 = weight(abstract_txt:system in 522) [ClassicSimilarity], result of:
            0.01497723 = score(doc=522,freq=1.0), product of:
              0.07105983 = queryWeight, product of:
                1.2337143 = boost
                3.3723085 = idf(docFreq=4123, maxDocs=44218)
                0.01707978 = queryNorm
              0.21076928 = fieldWeight in 522, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                3.3723085 = idf(docFreq=4123, maxDocs=44218)
                0.0625 = fieldNorm(doc=522)
          0.021990372 = weight(abstract_txt:results in 522) [ClassicSimilarity], result of:
            0.021990372 = score(doc=522,freq=1.0), product of:
              0.101034835 = queryWeight, product of:
                1.698664 = boost
                3.482422 = idf(docFreq=3693, maxDocs=44218)
                0.01707978 = queryNorm
              0.21765138 = fieldWeight in 522, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                3.482422 = idf(docFreq=3693, maxDocs=44218)
                0.0625 = fieldNorm(doc=522)
          0.12609519 = weight(abstract_txt:sentences in 522) [ClassicSimilarity], result of:
            0.12609519 = score(doc=522,freq=2.0), product of:
              0.20390527 = queryWeight, product of:
                1.7063618 = boost
                6.996407 = idf(docFreq=109, maxDocs=44218)
                0.01707978 = queryNorm
              0.6184009 = fieldWeight in 522, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                6.996407 = idf(docFreq=109, maxDocs=44218)
                0.0625 = fieldNorm(doc=522)
          0.036401905 = weight(abstract_txt:were in 522) [ClassicSimilarity], result of:
            0.036401905 = score(doc=522,freq=2.0), product of:
              0.11221619 = queryWeight, product of:
                1.7901921 = boost
                3.6700637 = idf(docFreq=3061, maxDocs=44218)
                0.01707978 = queryNorm
              0.32439086 = fieldWeight in 522, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                3.6700637 = idf(docFreq=3061, maxDocs=44218)
                0.0625 = fieldNorm(doc=522)
          0.17711596 = weight(abstract_txt:sentence in 522) [ClassicSimilarity], result of:
            0.17711596 = score(doc=522,freq=2.0), product of:
              0.29275095 = queryWeight, product of:
                2.5041 = boost
                6.8448567 = idf(docFreq=127, maxDocs=44218)
                0.01707978 = queryNorm
              0.60500556 = fieldWeight in 522, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                6.8448567 = idf(docFreq=127, maxDocs=44218)
                0.0625 = fieldNorm(doc=522)
          0.2469375 = weight(abstract_txt:abstracts in 522) [ClassicSimilarity], result of:
            0.2469375 = score(doc=522,freq=5.0), product of:
              0.2962903 = queryWeight, product of:
                2.9089122 = boost
                5.963546 = idf(docFreq=308, maxDocs=44218)
                0.01707978 = queryNorm
              0.8334309 = fieldWeight in 522, product of:
                2.236068 = tf(freq=5.0), with freq of:
                  5.0 = termFreq=5.0
                5.963546 = idf(docFreq=308, maxDocs=44218)
                0.0625 = fieldNorm(doc=522)
        0.24 = coord(6/25)
    
  5. Ling, X.; Jiang, J.; He, X.; Mei, Q.; Zhai, C.; Schatz, B.: Generating gene summaries from biomedical literature : a study of semi-structured summarization (2007) 0.14
    0.14371164 = sum of:
      0.14371164 = product of:
        0.7185582 = sum of:
          0.102840826 = weight(abstract_txt:stage in 946) [ClassicSimilarity], result of:
            0.102840826 = score(doc=946,freq=3.0), product of:
              0.15549247 = queryWeight, product of:
                1.4900868 = boost
                6.1096387 = idf(docFreq=266, maxDocs=44218)
                0.01707978 = queryNorm
              0.6613878 = fieldWeight in 946, product of:
                1.7320508 = tf(freq=3.0), with freq of:
                  3.0 = termFreq=3.0
                6.1096387 = idf(docFreq=266, maxDocs=44218)
                0.0625 = fieldNorm(doc=946)
          0.021990372 = weight(abstract_txt:results in 946) [ClassicSimilarity], result of:
            0.021990372 = score(doc=946,freq=1.0), product of:
              0.101034835 = queryWeight, product of:
                1.698664 = boost
                3.482422 = idf(docFreq=3693, maxDocs=44218)
                0.01707978 = queryNorm
              0.21765138 = fieldWeight in 946, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                3.482422 = idf(docFreq=3693, maxDocs=44218)
                0.0625 = fieldNorm(doc=946)
          0.15443444 = weight(abstract_txt:sentences in 946) [ClassicSimilarity], result of:
            0.15443444 = score(doc=946,freq=3.0), product of:
              0.20390527 = queryWeight, product of:
                1.7063618 = boost
                6.996407 = idf(docFreq=109, maxDocs=44218)
                0.01707978 = queryNorm
              0.7573833 = fieldWeight in 946, product of:
                1.7320508 = tf(freq=3.0), with freq of:
                  3.0 = termFreq=3.0
                6.996407 = idf(docFreq=109, maxDocs=44218)
                0.0625 = fieldNorm(doc=946)
          0.17711596 = weight(abstract_txt:sentence in 946) [ClassicSimilarity], result of:
            0.17711596 = score(doc=946,freq=2.0), product of:
              0.29275095 = queryWeight, product of:
                2.5041 = boost
                6.8448567 = idf(docFreq=127, maxDocs=44218)
                0.01707978 = queryNorm
              0.60500556 = fieldWeight in 946, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                6.8448567 = idf(docFreq=127, maxDocs=44218)
                0.0625 = fieldNorm(doc=946)
          0.2621766 = weight(abstract_txt:extraction in 946) [ClassicSimilarity], result of:
            0.2621766 = score(doc=946,freq=2.0), product of:
              0.4790702 = queryWeight, product of:
                4.530197 = boost
                6.1915555 = idf(docFreq=245, maxDocs=44218)
                0.01707978 = queryNorm
              0.54726136 = fieldWeight in 946, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                6.1915555 = idf(docFreq=245, maxDocs=44218)
                0.0625 = fieldNorm(doc=946)
        0.2 = coord(5/25)
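
The content-match scores combine more factors than the author matches. Each term contributes queryWeight × fieldWeight, where queryWeight = boost × idf × queryNorm and fieldWeight = tf × idf × fieldNorm; the contributions are summed and multiplied by coord (matched query terms divided by total query terms). The sketch below recomputes the first result (doc 6599) from the numbers in the listing. The function names and the tuple layout are assumptions for illustration; the boosts, queryNorm, and fieldNorm are copied from the listing, not derived.

```python
import math

QUERY_NORM = 0.01707978   # queryNorm as printed in the listing
MAX_DOCS = 44218          # maxDocs as printed in the listing

def idf(doc_freq, max_docs=MAX_DOCS):
    # 1 + ln(maxDocs / (docFreq + 1)), consistent with the idf values shown.
    return 1.0 + math.log(max_docs / (doc_freq + 1.0))

def term_score(boost, freq, doc_freq, field_norm, query_norm=QUERY_NORM):
    i = idf(doc_freq)
    query_weight = boost * i * query_norm            # query side
    field_weight = math.sqrt(freq) * i * field_norm  # document side
    return query_weight * field_weight

def document_score(terms, field_norm, matched, total):
    # Sum of per-term scores, scaled by the coordination factor matched/total.
    return sum(term_score(b, f, df, field_norm) for b, f, df in terms) * (matched / total)

# (boost, freq, docFreq) for the 8 matching terms of doc 6599.
DOC_6599_TERMS = [
    (1.2337143, 1.0, 4123),  # system
    (1.6211522, 1.0, 155),   # manually
    (1.698664, 1.0, 3693),   # results
    (1.7063618, 1.0, 109),   # sentences
    (1.9856495, 1.0, 34),    # indicative
    (2.0290349, 1.0, 468),   # produced
    (2.9089122, 1.0, 308),   # abstracts
    (4.530197, 2.0, 245),    # extraction
]

print(document_score(DOC_6599_TERMS, field_norm=0.109375, matched=8, total=25))
```

The printed value should agree with the 0.43810508 shown for doc 6599 up to floating-point rounding. The same functions apply to the other four results by substituting their terms, fieldNorm, and coord values.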