Document (#39701)

Author
Xu, D.
Cheng, G.
Qu, Y.
Title
Preferences in Wikipedia abstracts : empirical findings and implications for automatic entity summarization
Source
Information processing and management. 50(2014) no.2, pp.284-296
Year
2014
Abstract
The volume of entity-centric structured data grows rapidly on the Web. The description of an entity, composed of property-value pairs (a.k.a. features), has become very large in many applications. To avoid information overload, efforts have been made to automatically select a limited number of features to be shown to the user based on certain criteria, which is called automatic entity summarization. However, to the best of our knowledge, there is a lack of extensive studies on how humans rank and select features in practice, which can provide empirical support and inspire future research. In this article, we present a large-scale statistical analysis of the descriptions of entities provided by DBpedia and the abstracts of their corresponding Wikipedia articles, to empirically study, along several different dimensions, which kinds of features are preferable when humans summarize. Implications for automatic entity summarization are drawn from the findings.
Content
Cf.: doi: 10.1016/j.ipm.2013.12.001.
Theme
Automatic abstracting
Object
Wikipedia
DBpedia

Similar documents (author)

  1. Cheng, L.R.L.: Beyond bilingualism : a quest for communicative competence (1996) 5.21
    5.2059946 = sum of:
      5.2059946 = weight(author_txt:cheng in 5223) [ClassicSimilarity], result of:
        5.2059946 = fieldWeight in 5223, product of:
          1.0 = tf(freq=1.0), with freq of:
            1.0 = termFreq=1.0
          8.329592 = idf(docFreq=28, maxDocs=44218)
          0.625 = fieldNorm(doc=5223)
    
  2. Cheng, P.T.K.; Wu, A.K.W.: ACS: an automatic classification system (1995) 4.16
    4.164796 = sum of:
      4.164796 = weight(author_txt:cheng in 2188) [ClassicSimilarity], result of:
        4.164796 = fieldWeight in 2188, product of:
          1.0 = tf(freq=1.0), with freq of:
            1.0 = termFreq=1.0
          8.329592 = idf(docFreq=28, maxDocs=44218)
          0.5 = fieldNorm(doc=2188)
    
  3. Cheng, K.-H.: Automatic identification for topics of electronic documents (1997) 4.16
    4.164796 = sum of:
      4.164796 = weight(author_txt:cheng in 1811) [ClassicSimilarity], result of:
        4.164796 = fieldWeight in 1811, product of:
          1.0 = tf(freq=1.0), with freq of:
            1.0 = termFreq=1.0
          8.329592 = idf(docFreq=28, maxDocs=44218)
          0.5 = fieldNorm(doc=1811)
    
  4. Cheng, L.-y.: On bibliographic(al) control (1998) 4.16
    4.164796 = sum of:
      4.164796 = weight(author_txt:cheng in 3376) [ClassicSimilarity], result of:
        4.164796 = fieldWeight in 3376, product of:
          1.0 = tf(freq=1.0), with freq of:
            1.0 = termFreq=1.0
          8.329592 = idf(docFreq=28, maxDocs=44218)
          0.5 = fieldNorm(doc=3376)
    
  5. Harter, S.P.; Cheng, Y.-R.: Colinked descriptors : improving vocabulary selection for end-user searching (1996) 3.64
    3.6441965 = sum of:
      3.6441965 = weight(author_txt:cheng in 4216) [ClassicSimilarity], result of:
        3.6441965 = fieldWeight in 4216, product of:
          1.0 = tf(freq=1.0), with freq of:
            1.0 = termFreq=1.0
          8.329592 = idf(docFreq=28, maxDocs=44218)
          0.4375 = fieldNorm(doc=4216)
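
The author scores above can be checked by hand. The following is a minimal sketch, assuming the standard Lucene ClassicSimilarity definitions tf = sqrt(freq) and idf = 1 + ln(maxDocs / (docFreq + 1)); the function names are hypothetical and chosen for illustration only. It recomputes the fieldWeight of the first hit from the numbers listed in its explanation tree.

```python
import math

# Hypothetical helper names; the formulas are the ClassicSimilarity
# definitions assumed in the lead-in, not code taken from this database.

def classic_idf(doc_freq: int, max_docs: int) -> float:
    """Inverse document frequency: 1 + ln(maxDocs / (docFreq + 1))."""
    return 1.0 + math.log(max_docs / (doc_freq + 1))

def classic_tf(freq: float) -> float:
    """Term frequency factor: square root of the raw term count."""
    return math.sqrt(freq)

def field_weight(freq: float, doc_freq: int, max_docs: int, field_norm: float) -> float:
    """fieldWeight = tf * idf * fieldNorm, as in the explanation trees."""
    return classic_tf(freq) * classic_idf(doc_freq, max_docs) * field_norm

# Values copied from hit 1 (doc 5223): freq=1.0, docFreq=28,
# maxDocs=44218, fieldNorm=0.625.
top_author_score = field_weight(freq=1.0, doc_freq=28, max_docs=44218, field_norm=0.625)
print(round(top_author_score, 2))
```

Since tf and idf are identical for all five hits, only fieldNorm differs between them (0.625, 0.5, 0.4375). In this similarity model, fieldNorm is a lossily encoded length normalization of the author field, so it alone accounts for the ranking of these hits.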
    

Similar documents (content)

  1. Aker, A.; Gaizauskas, R.: Generating descriptive multi-document summaries of geo-located entities using entity type models (2015) 0.22
    0.22366253 = sum of:
      0.22366253 = product of:
        1.1183126 = sum of:
          0.109776326 = weight(abstract_txt:wikipedia in 1726) [ClassicSimilarity], result of:
            0.109776326 = score(doc=1726,freq=2.0), product of:
              0.19816022 = queryWeight, product of:
                1.8894379 = boost
                6.2675414 = idf(docFreq=227, maxDocs=44218)
                0.016733494 = queryNorm
              0.5539776 = fieldWeight in 1726, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                6.2675414 = idf(docFreq=227, maxDocs=44218)
                0.0625 = fieldNorm(doc=1726)
          0.06632844 = weight(abstract_txt:automatic in 1726) [ClassicSimilarity], result of:
            0.06632844 = score(doc=1726,freq=1.0), product of:
              0.20426045 = queryWeight, product of:
                2.3494282 = boost
                5.1955976 = idf(docFreq=665, maxDocs=44218)
                0.016733494 = queryNorm
              0.32472485 = fieldWeight in 1726, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                5.1955976 = idf(docFreq=665, maxDocs=44218)
                0.0625 = fieldNorm(doc=1726)
          0.08340097 = weight(abstract_txt:features in 1726) [ClassicSimilarity], result of:
            0.08340097 = score(doc=1726,freq=2.0), product of:
              0.2078746 = queryWeight, product of:
                2.7367814 = boost
                4.5391517 = idf(docFreq=1283, maxDocs=44218)
                0.016733494 = queryNorm
              0.4012081 = fieldWeight in 1726, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                4.5391517 = idf(docFreq=1283, maxDocs=44218)
                0.0625 = fieldNorm(doc=1726)
          0.3432067 = weight(abstract_txt:summarization in 1726) [ClassicSimilarity], result of:
            0.3432067 = score(doc=1726,freq=4.0), product of:
              0.3849476 = queryWeight, product of:
                3.2253053 = boost
                7.132539 = idf(docFreq=95, maxDocs=44218)
                0.016733494 = queryNorm
              0.89156735 = fieldWeight in 1726, product of:
                2.0 = tf(freq=4.0), with freq of:
                  4.0 = termFreq=4.0
                7.132539 = idf(docFreq=95, maxDocs=44218)
                0.0625 = fieldNorm(doc=1726)
          0.51560014 = weight(abstract_txt:entity in 1726) [ClassicSimilarity], result of:
            0.51560014 = score(doc=1726,freq=7.0), product of:
              0.49679437 = queryWeight, product of:
                4.730235 = boost
                6.2763524 = idf(docFreq=225, maxDocs=44218)
                0.016733494 = queryNorm
              1.0378542 = fieldWeight in 1726, product of:
                2.6457512 = tf(freq=7.0), with freq of:
                  7.0 = termFreq=7.0
                6.2763524 = idf(docFreq=225, maxDocs=44218)
                0.0625 = fieldNorm(doc=1726)
        0.2 = coord(5/25)
    
  2. Ou, S.; Khoo, C.S.G.; Goh, D.H.: Multi-document summarization of news articles using an event-based framework (2006) 0.11
    0.1104494 = sum of:
      0.1104494 = product of:
        0.46020585 = sum of:
          0.0767902 = weight(abstract_txt:summarize in 657) [ClassicSimilarity], result of:
            0.0767902 = score(doc=657,freq=1.0), product of:
              0.15615211 = queryWeight, product of:
                1.1859963 = boost
                7.8682456 = idf(docFreq=45, maxDocs=44218)
                0.016733494 = queryNorm
              0.49176535 = fieldWeight in 657, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                7.8682456 = idf(docFreq=45, maxDocs=44218)
                0.0625 = fieldNorm(doc=657)
          0.022533005 = weight(abstract_txt:findings in 657) [ClassicSimilarity], result of:
            0.022533005 = score(doc=657,freq=1.0), product of:
              0.08687605 = queryWeight, product of:
                1.2510493 = boost
                4.1499133 = idf(docFreq=1894, maxDocs=44218)
                0.016733494 = queryNorm
              0.25936958 = fieldWeight in 657, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.1499133 = idf(docFreq=1894, maxDocs=44218)
                0.0625 = fieldNorm(doc=657)
          0.0117347855 = weight(abstract_txt:which in 657) [ClassicSimilarity], result of:
            0.0117347855 = score(doc=657,freq=1.0), product of:
              0.06437262 = queryWeight, product of:
                1.3189266 = boost
                2.9167147 = idf(docFreq=6503, maxDocs=44218)
                0.016733494 = queryNorm
              0.18229467 = fieldWeight in 657, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                2.9167147 = idf(docFreq=6503, maxDocs=44218)
                0.0625 = fieldNorm(doc=657)
          0.040135622 = weight(abstract_txt:implications in 657) [ClassicSimilarity], result of:
            0.040135622 = score(doc=657,freq=2.0), product of:
              0.1013205 = queryWeight, product of:
                1.351055 = boost
                4.481647 = idf(docFreq=1359, maxDocs=44218)
                0.016733494 = queryNorm
              0.39612538 = fieldWeight in 657, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                4.481647 = idf(docFreq=1359, maxDocs=44218)
                0.0625 = fieldNorm(doc=657)
          0.06632844 = weight(abstract_txt:automatic in 657) [ClassicSimilarity], result of:
            0.06632844 = score(doc=657,freq=1.0), product of:
              0.20426045 = queryWeight, product of:
                2.3494282 = boost
                5.1955976 = idf(docFreq=665, maxDocs=44218)
                0.016733494 = queryNorm
              0.32472485 = fieldWeight in 657, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                5.1955976 = idf(docFreq=665, maxDocs=44218)
                0.0625 = fieldNorm(doc=657)
          0.2426838 = weight(abstract_txt:summarization in 657) [ClassicSimilarity], result of:
            0.2426838 = score(doc=657,freq=2.0), product of:
              0.3849476 = queryWeight, product of:
                3.2253053 = boost
                7.132539 = idf(docFreq=95, maxDocs=44218)
                0.016733494 = queryNorm
              0.6304333 = fieldWeight in 657, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                7.132539 = idf(docFreq=95, maxDocs=44218)
                0.0625 = fieldNorm(doc=657)
        0.24 = coord(6/25)
    
  3. Sankarasubramaniam, Y.; Ramanathan, K.; Ghosh, S.: Text summarization using Wikipedia (2014) 0.10
    0.104584165 = sum of:
      0.104584165 = product of:
        0.65365106 = sum of:
          0.0117347855 = weight(abstract_txt:which in 2693) [ClassicSimilarity], result of:
            0.0117347855 = score(doc=2693,freq=1.0), product of:
              0.06437262 = queryWeight, product of:
                1.3189266 = boost
                2.9167147 = idf(docFreq=6503, maxDocs=44218)
                0.016733494 = queryNorm
              0.18229467 = fieldWeight in 2693, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                2.9167147 = idf(docFreq=6503, maxDocs=44218)
                0.0625 = fieldNorm(doc=2693)
          0.15524717 = weight(abstract_txt:wikipedia in 2693) [ClassicSimilarity], result of:
            0.15524717 = score(doc=2693,freq=4.0), product of:
              0.19816022 = queryWeight, product of:
                1.8894379 = boost
                6.2675414 = idf(docFreq=227, maxDocs=44218)
                0.016733494 = queryNorm
              0.7834427 = fieldWeight in 2693, product of:
                2.0 = tf(freq=4.0), with freq of:
                  4.0 = termFreq=4.0
                6.2675414 = idf(docFreq=227, maxDocs=44218)
                0.0625 = fieldNorm(doc=2693)
          0.06632844 = weight(abstract_txt:automatic in 2693) [ClassicSimilarity], result of:
            0.06632844 = score(doc=2693,freq=1.0), product of:
              0.20426045 = queryWeight, product of:
                2.3494282 = boost
                5.1955976 = idf(docFreq=665, maxDocs=44218)
                0.016733494 = queryNorm
              0.32472485 = fieldWeight in 2693, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                5.1955976 = idf(docFreq=665, maxDocs=44218)
                0.0625 = fieldNorm(doc=2693)
          0.42034066 = weight(abstract_txt:summarization in 2693) [ClassicSimilarity], result of:
            0.42034066 = score(doc=2693,freq=6.0), product of:
              0.3849476 = queryWeight, product of:
                3.2253053 = boost
                7.132539 = idf(docFreq=95, maxDocs=44218)
                0.016733494 = queryNorm
              1.0919425 = fieldWeight in 2693, product of:
                2.4494898 = tf(freq=6.0), with freq of:
                  6.0 = termFreq=6.0
                7.132539 = idf(docFreq=95, maxDocs=44218)
                0.0625 = fieldNorm(doc=2693)
        0.16 = coord(4/25)
    
  4. Kannan, R.; Ghinea, G.; Swaminathan, S.: What do you wish to see? : A summarization system for movies based on user preferences (2015) 0.10
    0.09680885 = sum of:
      0.09680885 = product of:
        0.60505533 = sum of:
          0.09006351 = weight(abstract_txt:preferences in 2683) [ClassicSimilarity], result of:
            0.09006351 = score(doc=2683,freq=5.0), product of:
              0.11101489 = queryWeight, product of:
                6.634292 = idf(docFreq=157, maxDocs=44218)
                0.016733494 = queryNorm
              0.8112742 = fieldWeight in 2683, product of:
                2.236068 = tf(freq=5.0), with freq of:
                  5.0 = termFreq=5.0
                6.634292 = idf(docFreq=157, maxDocs=44218)
                0.0546875 = fieldNorm(doc=2683)
          0.014521057 = weight(abstract_txt:which in 2683) [ClassicSimilarity], result of:
            0.014521057 = score(doc=2683,freq=2.0), product of:
              0.06437262 = queryWeight, product of:
                1.3189266 = boost
                2.9167147 = idf(docFreq=6503, maxDocs=44218)
                0.016733494 = queryNorm
              0.22557814 = fieldWeight in 2683, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                2.9167147 = idf(docFreq=6503, maxDocs=44218)
                0.0546875 = fieldNorm(doc=2683)
          0.10320344 = weight(abstract_txt:features in 2683) [ClassicSimilarity], result of:
            0.10320344 = score(doc=2683,freq=4.0), product of:
              0.2078746 = queryWeight, product of:
                2.7367814 = boost
                4.5391517 = idf(docFreq=1283, maxDocs=44218)
                0.016733494 = queryNorm
              0.4964697 = fieldWeight in 2683, product of:
                2.0 = tf(freq=4.0), with freq of:
                  4.0 = termFreq=4.0
                4.5391517 = idf(docFreq=1283, maxDocs=44218)
                0.0546875 = fieldNorm(doc=2683)
          0.3972673 = weight(abstract_txt:summarization in 2683) [ClassicSimilarity], result of:
            0.3972673 = score(doc=2683,freq=7.0), product of:
              0.3849476 = queryWeight, product of:
                3.2253053 = boost
                7.132539 = idf(docFreq=95, maxDocs=44218)
                0.016733494 = queryNorm
              1.0320036 = fieldWeight in 2683, product of:
                2.6457512 = tf(freq=7.0), with freq of:
                  7.0 = termFreq=7.0
                7.132539 = idf(docFreq=95, maxDocs=44218)
                0.0546875 = fieldNorm(doc=2683)
        0.16 = coord(4/25)
    
  5. Kar, M.; Nunes, S.; Ribeiro, C.: Summarization of changes in dynamic text collections using Latent Dirichlet Allocation model (2015) 0.10
    0.09650748 = sum of:
      0.09650748 = product of:
        0.4825374 = sum of:
          0.00880109 = weight(abstract_txt:which in 2676) [ClassicSimilarity], result of:
            0.00880109 = score(doc=2676,freq=1.0), product of:
              0.06437262 = queryWeight, product of:
                1.3189266 = boost
                2.9167147 = idf(docFreq=6503, maxDocs=44218)
                0.016733494 = queryNorm
              0.136721 = fieldWeight in 2676, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                2.9167147 = idf(docFreq=6503, maxDocs=44218)
                0.046875 = fieldNorm(doc=2676)
          0.057379086 = weight(abstract_txt:select in 2676) [ClassicSimilarity], result of:
            0.057379086 = score(doc=2676,freq=1.0), product of:
              0.19625266 = queryWeight, product of:
                1.8803217 = boost
                6.237302 = idf(docFreq=234, maxDocs=44218)
                0.016733494 = queryNorm
              0.29237354 = fieldWeight in 2676, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                6.237302 = idf(docFreq=234, maxDocs=44218)
                0.046875 = fieldNorm(doc=2676)
          0.058217686 = weight(abstract_txt:wikipedia in 2676) [ClassicSimilarity], result of:
            0.058217686 = score(doc=2676,freq=1.0), product of:
              0.19816022 = queryWeight, product of:
                1.8894379 = boost
                6.2675414 = idf(docFreq=227, maxDocs=44218)
                0.016733494 = queryNorm
              0.293791 = fieldWeight in 2676, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                6.2675414 = idf(docFreq=227, maxDocs=44218)
                0.046875 = fieldNorm(doc=2676)
          0.070351936 = weight(abstract_txt:automatic in 2676) [ClassicSimilarity], result of:
            0.070351936 = score(doc=2676,freq=2.0), product of:
              0.20426045 = queryWeight, product of:
                2.3494282 = boost
                5.1955976 = idf(docFreq=665, maxDocs=44218)
                0.016733494 = queryNorm
              0.3444227 = fieldWeight in 2676, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                5.1955976 = idf(docFreq=665, maxDocs=44218)
                0.046875 = fieldNorm(doc=2676)
          0.2877876 = weight(abstract_txt:summarization in 2676) [ClassicSimilarity], result of:
            0.2877876 = score(doc=2676,freq=5.0), product of:
              0.3849476 = queryWeight, product of:
                3.2253053 = boost
                7.132539 = idf(docFreq=95, maxDocs=44218)
                0.016733494 = queryNorm
              0.747602 = fieldWeight in 2676, product of:
                2.236068 = tf(freq=5.0), with freq of:
                  5.0 = termFreq=5.0
                7.132539 = idf(docFreq=95, maxDocs=44218)
                0.046875 = fieldNorm(doc=2676)
        0.2 = coord(5/25)
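
The content scores above combine more factors than the author scores. The following is a minimal sketch, assuming the standard Lucene ClassicSimilarity definitions, in which each matching term contributes queryWeight * fieldWeight, with queryWeight = boost * idf * queryNorm and fieldWeight = sqrt(freq) * idf * fieldNorm, and the sum is scaled by coord (matched query terms / total query terms). The helper names are hypothetical; the input numbers are copied from the explanation tree of hit 1 (doc 1726).

```python
import math

MAX_DOCS = 44218          # maxDocs, as listed in every idf line
QUERY_NORM = 0.016733494  # queryNorm, as listed in every queryWeight line

def idf(doc_freq: int) -> float:
    """ClassicSimilarity idf: 1 + ln(maxDocs / (docFreq + 1))."""
    return 1.0 + math.log(MAX_DOCS / (doc_freq + 1))

def term_score(boost: float, doc_freq: int, freq: float, field_norm: float) -> float:
    """Score of one matching term: queryWeight * fieldWeight."""
    query_weight = boost * idf(doc_freq) * QUERY_NORM
    field_weight = math.sqrt(freq) * idf(doc_freq) * field_norm
    return query_weight * field_weight

def document_score(terms, field_norm: float, matched: int, total: int) -> float:
    """Sum of the term scores, scaled by coord = matched / total."""
    coord = matched / total
    return coord * sum(term_score(b, df, f, field_norm) for b, df, f in terms)

# (boost, docFreq, freq) for the five query terms matched in doc 1726.
terms_1726 = [
    (1.8894379, 227, 2.0),   # wikipedia
    (2.3494282, 665, 1.0),   # automatic
    (2.7367814, 1283, 2.0),  # features
    (3.2253053, 95, 4.0),    # summarization
    (4.730235, 225, 7.0),    # entity
]

score_1726 = document_score(terms_1726, field_norm=0.0625, matched=5, total=25)
print(round(score_1726, 2))
```

The coord factor explains why hit 2 (6 of 25 terms matched) has a higher coord than hit 1 but still ranks lower: its matched terms carry smaller boosts and term frequencies, so the sum being scaled is smaller.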