Document (#13158)

Author
Pirkola, A.
Järvelin, K.
Title
¬The effect of anaphor and ellipsis resolution on proximity searching in a text database
Source
Information processing and management. 32(1996) no.2, p.199-216
Year
1996
Abstract
So far, methods for ellipsis and anaphor resolution have been developed, and the effects of anaphor resolution have been analyzed in the context of statistical information retrieval of scientific abstracts. No significant improvements have been observed. Analyzes the effects of ellipsis and anaphor resolution on proximity searching in a full text database. Anaphora and ellipsis are classified on the basis of the type of their correlates/antecedents rather than, as is traditional, on the basis of their own linguistic type. The classification differentiates proper names and common nouns of basic words, compound words, and phrases. The study was carried out in a newspaper article database containing 55,000 full text articles. A set of 154 keyword pairs in different categories was created. Human resolution of keyword ellipsis and anaphora was performed to identify sentences and paragraphs which would match proximity searches after resolution. Findings indicate that ellipsis and anaphor resolution is most relevant for proper name phrases and only marginal in the other keyword categories. Therefore the recall effect of restricted resolution of proper name phrases only was analyzed for keyword pairs containing at least one proper name phrase. Findings indicate a recall increase of 38.2% in sentence searches and 28.8% in paragraph searches when proper name ellipses were resolved. The recall increase was 17.6% in sentence searches and 19.8% in paragraph searches when proper name anaphora were resolved. A simple and computationally justifiable resolution method might be developed for proper name phrases only, to support keyword based full text information retrieval. Discusses elements of such a method.
Theme
Retrieval studies
Full-text retrieval
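
The abstract describes proximity searches in which both keywords of a pair must occur in the same sentence or the same paragraph, and reports that resolving anaphora by hand makes more text units match. A minimal sketch of that idea follows. It is not the authors' procedure: the function names, the naive sentence splitter, and the toy text about "Acme Corporation" are all invented here for illustration.

```python
import re


def split_units(text, unit):
    """Split text into paragraphs (blank-line separated) or sentences.

    The sentence rule is a naive punctuation heuristic (an assumption),
    good enough for a toy example only.
    """
    if unit == "paragraph":
        parts = re.split(r"\n\s*\n", text)
    elif unit == "sentence":
        parts = re.split(r"(?<=[.!?])\s+", text)
    else:
        raise ValueError("unit must be 'sentence' or 'paragraph'")
    return [p.strip() for p in parts if p.strip()]


def proximity_matches(text, key_a, key_b, unit):
    """Return the units in which both keywords occur (case-insensitive)."""
    a, b = key_a.lower(), key_b.lower()
    return [u for u in split_units(text, unit)
            if a in u.lower() and b in u.lower()]


# Invented toy text, not taken from the study's newspaper database.
ORIGINAL = (
    "Acme Corporation published its results. The company raised its dividend.\n\n"
    "Analysts were surprised."
)
# Hand-resolved version: the anaphor "The company" is replaced by its
# proper name antecedent, mimicking human resolution.
RESOLVED = ORIGINAL.replace("The company", "Acme Corporation")

before = proximity_matches(ORIGINAL, "Acme Corporation", "dividend", "sentence")
after = proximity_matches(RESOLVED, "Acme Corporation", "dividend", "sentence")
paragraph_before = proximity_matches(
    ORIGINAL, "Acme Corporation", "dividend", "paragraph")

print(len(before), len(after), len(paragraph_before))
```

In this toy case the sentence search gains a match only after resolution, while the paragraph search already matched the unresolved text, which is one way the sentence-level and paragraph-level figures can differ.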

Similar documents (author)

  1. Pirkola, A.: Morphological typology of languages for IR (2001) 5.94
    5.937289 = sum of:
      5.937289 = weight(author_txt:pirkola in 4476) [ClassicSimilarity], result of:
        5.937289 = fieldWeight in 4476, product of:
          1.0 = tf(freq=1.0), with freq of:
            1.0 = termFreq=1.0
          9.499662 = idf(docFreq=8, maxDocs=44218)
          0.625 = fieldNorm(doc=4476)
    
  2. Pirkola, A.: Constructing topic-specific search keyphrase suggestion tools for Web information retrieval (2010) 5.94
    5.937289 = sum of:
      5.937289 = weight(author_txt:pirkola in 4665) [ClassicSimilarity], result of:
        5.937289 = fieldWeight in 4665, product of:
          1.0 = tf(freq=1.0), with freq of:
            1.0 = termFreq=1.0
          9.499662 = idf(docFreq=8, maxDocs=44218)
          0.625 = fieldNorm(doc=4665)
    
  3. Pirkola, A.; Järvelin, K.: Employing the resolution power of search keys (2001) 4.75
    4.749831 = sum of:
      4.749831 = weight(author_txt:pirkola in 5907) [ClassicSimilarity], result of:
        4.749831 = fieldWeight in 5907, product of:
          1.0 = tf(freq=1.0), with freq of:
            1.0 = termFreq=1.0
          9.499662 = idf(docFreq=8, maxDocs=44218)
          0.5 = fieldNorm(doc=5907)
    
  4. Pirkola, A.; Puolamäki, D.; Järvelin, K.: Applying query structuring in cross-language retrieval (2003) 3.56
    3.5623734 = sum of:
      3.5623734 = weight(author_txt:pirkola in 1074) [ClassicSimilarity], result of:
        3.5623734 = fieldWeight in 1074, product of:
          1.0 = tf(freq=1.0), with freq of:
            1.0 = termFreq=1.0
          9.499662 = idf(docFreq=8, maxDocs=44218)
          0.375 = fieldNorm(doc=1074)
    
  5. Pirkola, A.; Hedlund, T.; Keskustalo, H.; Järvelin, K.: Dictionary-based cross-language information retrieval : problems, methods, and research findings (2001) 2.97
    2.9686446 = sum of:
      2.9686446 = weight(author_txt:pirkola in 3908) [ClassicSimilarity], result of:
        2.9686446 = fieldWeight in 3908, product of:
          1.0 = tf(freq=1.0), with freq of:
            1.0 = termFreq=1.0
          9.499662 = idf(docFreq=8, maxDocs=44218)
          0.3125 = fieldNorm(doc=3908)
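
The author-match scores above all follow the same arithmetic: fieldWeight = tf × idf × fieldNorm. The listed values appear consistent with tf = sqrt(freq) and idf = 1 + ln(maxDocs / (docFreq + 1)), the formulas commonly documented for Lucene's ClassicSimilarity. The sketch below recomputes them under that assumption; the function names are illustrative and not part of any library.

```python
import math


def classic_idf(doc_freq, max_docs):
    """Inverse document frequency, assuming idf = 1 + ln(maxDocs / (docFreq + 1))."""
    return 1.0 + math.log(max_docs / (doc_freq + 1))


def classic_tf(freq):
    """Term frequency factor, assuming tf = sqrt(freq)."""
    return math.sqrt(freq)


def field_weight(freq, doc_freq, max_docs, field_norm):
    """fieldWeight = tf * idf * fieldNorm, as in the explain trees above."""
    return classic_tf(freq) * classic_idf(doc_freq, max_docs) * field_norm


# (doc id, fieldNorm) pairs copied from the listing above; docs 4476 and 4665
# share the same fieldNorm, so only one of them is repeated here.
for doc_id, norm in [(4476, 0.625), (5907, 0.5), (1074, 0.375), (3908, 0.3125)]:
    print(doc_id, field_weight(1.0, 8, 44218, norm))
```

Because the query term, docFreq, and maxDocs are identical for all five hits, the ranking here is driven entirely by fieldNorm, which is smaller for records with more author tokens in the field.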
    

Similar documents (content)

  1. Steinberger, J.; Poesio, M.; Kabadjov, M.A.; Jezek, K.: Two uses of anaphora resolution in summarization (2007) 0.34
    0.34496117 = sum of:
      0.34496117 = product of:
        2.1560073 = sum of:
          0.06181425 = weight(abstract_txt:sentence in 949) [ClassicSimilarity], result of:
            0.06181425 = score(doc=949,freq=1.0), product of:
              0.1155937 = queryWeight, product of:
                1.2886394 = boost
                6.8448567 = idf(docFreq=127, maxDocs=44218)
                0.013105042 = queryNorm
              0.53475446 = fieldWeight in 949, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                6.8448567 = idf(docFreq=127, maxDocs=44218)
                0.078125 = fieldNorm(doc=949)
          1.3486778 = weight(title_txt:anaphora in 949) [ClassicSimilarity], result of:
            1.3486778 = score(doc=949,freq=1.0), product of:
              0.36309215 = queryWeight, product of:
                2.7971668 = boost
                9.905128 = idf(docFreq=5, maxDocs=44218)
                0.013105042 = queryNorm
              3.7144227 = fieldWeight in 949, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                9.905128 = idf(docFreq=5, maxDocs=44218)
                0.375 = fieldNorm(doc=949)
          0.18802789 = weight(abstract_txt:proper in 949) [ClassicSimilarity], result of:
            0.18802789 = score(doc=949,freq=1.0), product of:
              0.36845064 = queryWeight, product of:
                4.304156 = boost
                6.532101 = idf(docFreq=174, maxDocs=44218)
                0.013105042 = queryNorm
              0.5103204 = fieldWeight in 949, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                6.532101 = idf(docFreq=174, maxDocs=44218)
                0.078125 = fieldNorm(doc=949)
          0.5574876 = weight(abstract_txt:resolution in 949) [ClassicSimilarity], result of:
            0.5574876 = score(doc=949,freq=3.0), product of:
              0.5733182 = queryWeight, product of:
                6.087913 = boost
                7.1860275 = idf(docFreq=90, maxDocs=44218)
                0.013105042 = queryNorm
              0.97238785 = fieldWeight in 949, product of:
                1.7320508 = tf(freq=3.0), with freq of:
                  3.0 = termFreq=3.0
                7.1860275 = idf(docFreq=90, maxDocs=44218)
                0.078125 = fieldNorm(doc=949)
        0.16 = coord(4/25)
    
  2. Colavizza, G.; Boyack, K.W.; Eck, N.J. van; Waltman, L.: ¬The closer the better : similarity of publication pairs at different cocitation levels (2018) 0.17
    0.16622227 = sum of:
      0.16622227 = product of:
        0.593651 = sum of:
          0.029653428 = weight(abstract_txt:indicate in 4214) [ClassicSimilarity], result of:
            0.029653428 = score(doc=4214,freq=1.0), product of:
              0.070836715 = queryWeight, product of:
                1.0087732 = boost
                5.358293 = idf(docFreq=565, maxDocs=44218)
                0.013105042 = queryNorm
              0.41861665 = fieldWeight in 4214, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                5.358293 = idf(docFreq=565, maxDocs=44218)
                0.078125 = fieldNorm(doc=4214)
          0.034116764 = weight(abstract_txt:increase in 4214) [ClassicSimilarity], result of:
            0.034116764 = score(doc=4214,freq=1.0), product of:
              0.07777745 = queryWeight, product of:
                1.0570394 = boost
                5.6146684 = idf(docFreq=437, maxDocs=44218)
                0.013105042 = queryNorm
              0.43864596 = fieldWeight in 4214, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                5.6146684 = idf(docFreq=437, maxDocs=44218)
                0.078125 = fieldNorm(doc=4214)
          0.060581464 = weight(abstract_txt:pairs in 4214) [ClassicSimilarity], result of:
            0.060581464 = score(doc=4214,freq=1.0), product of:
              0.11405167 = queryWeight, product of:
                1.2800152 = boost
                6.7990475 = idf(docFreq=133, maxDocs=44218)
                0.013105042 = queryNorm
              0.5311756 = fieldWeight in 4214, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                6.7990475 = idf(docFreq=133, maxDocs=44218)
                0.078125 = fieldNorm(doc=4214)
          0.08741854 = weight(abstract_txt:sentence in 4214) [ClassicSimilarity], result of:
            0.08741854 = score(doc=4214,freq=2.0), product of:
              0.1155937 = queryWeight, product of:
                1.2886394 = boost
                6.8448567 = idf(docFreq=127, maxDocs=44218)
                0.013105042 = queryNorm
              0.75625694 = fieldWeight in 4214, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                6.8448567 = idf(docFreq=127, maxDocs=44218)
                0.078125 = fieldNorm(doc=4214)
          0.02549271 = weight(abstract_txt:text in 4214) [ClassicSimilarity], result of:
            0.02549271 = score(doc=4214,freq=1.0), product of:
              0.08069175 = queryWeight, product of:
                1.5226282 = boost
                4.0438666 = idf(docFreq=2106, maxDocs=44218)
                0.013105042 = queryNorm
              0.3159271 = fieldWeight in 4214, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.0438666 = idf(docFreq=2106, maxDocs=44218)
                0.078125 = fieldNorm(doc=4214)
          0.24810107 = weight(abstract_txt:paragraph in 4214) [ClassicSimilarity], result of:
            0.24810107 = score(doc=4214,freq=3.0), product of:
              0.20242019 = queryWeight, product of:
                1.7052624 = boost
                9.05783 = idf(docFreq=13, maxDocs=44218)
                0.013105042 = queryNorm
              1.2256736 = fieldWeight in 4214, product of:
                1.7320508 = tf(freq=3.0), with freq of:
                  3.0 = termFreq=3.0
                9.05783 = idf(docFreq=13, maxDocs=44218)
                0.078125 = fieldNorm(doc=4214)
          0.108287044 = weight(abstract_txt:proximity in 4214) [ClassicSimilarity], result of:
            0.108287044 = score(doc=4214,freq=1.0), product of:
              0.19228993 = queryWeight, product of:
                2.0355802 = boost
                7.208251 = idf(docFreq=88, maxDocs=44218)
                0.013105042 = queryNorm
              0.5631446 = fieldWeight in 4214, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                7.208251 = idf(docFreq=88, maxDocs=44218)
                0.078125 = fieldNorm(doc=4214)
        0.28 = coord(7/25)
    
  3. Boyack, K.W.; Small, H.; Klavans, R.: Improving the accuracy of co-citation clustering using full text (2013) 0.15
    0.14775735 = sum of:
      0.14775735 = product of:
        0.46174175 = sum of:
          0.013688028 = weight(abstract_txt:been in 1036) [ClassicSimilarity], result of:
            0.013688028 = score(doc=1036,freq=1.0), product of:
              0.04843205 = queryWeight, product of:
                1.0215894 = boost
                3.617579 = idf(docFreq=3226, maxDocs=44218)
                0.013105042 = queryNorm
              0.28262335 = fieldWeight in 1036, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                3.617579 = idf(docFreq=3226, maxDocs=44218)
                0.078125 = fieldNorm(doc=1036)
          0.03085459 = weight(abstract_txt:effect in 1036) [ClassicSimilarity], result of:
            0.03085459 = score(doc=1036,freq=1.0), product of:
              0.072736934 = queryWeight, product of:
                1.0222139 = boost
                5.4296865 = idf(docFreq=526, maxDocs=44218)
                0.013105042 = queryNorm
              0.42419428 = fieldWeight in 1036, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                5.4296865 = idf(docFreq=526, maxDocs=44218)
                0.078125 = fieldNorm(doc=1036)
          0.034116764 = weight(abstract_txt:increase in 1036) [ClassicSimilarity], result of:
            0.034116764 = score(doc=1036,freq=1.0), product of:
              0.07777745 = queryWeight, product of:
                1.0570394 = boost
                5.6146684 = idf(docFreq=437, maxDocs=44218)
                0.013105042 = queryNorm
              0.43864596 = fieldWeight in 1036, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                5.6146684 = idf(docFreq=437, maxDocs=44218)
                0.078125 = fieldNorm(doc=1036)
          0.031049965 = weight(abstract_txt:only in 1036) [ClassicSimilarity], result of:
            0.031049965 = score(doc=1036,freq=2.0), product of:
              0.06636457 = queryWeight, product of:
                1.1958538 = boost
                4.234672 = idf(docFreq=1740, maxDocs=44218)
                0.013105042 = queryNorm
              0.46786958 = fieldWeight in 1036, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                4.234672 = idf(docFreq=1740, maxDocs=44218)
                0.078125 = fieldNorm(doc=1036)
          0.060581464 = weight(abstract_txt:pairs in 1036) [ClassicSimilarity], result of:
            0.060581464 = score(doc=1036,freq=1.0), product of:
              0.11405167 = queryWeight, product of:
                1.2800152 = boost
                6.7990475 = idf(docFreq=133, maxDocs=44218)
                0.013105042 = queryNorm
              0.5311756 = fieldWeight in 1036, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                6.7990475 = idf(docFreq=133, maxDocs=44218)
                0.078125 = fieldNorm(doc=1036)
          0.0597376 = weight(abstract_txt:full in 1036) [ClassicSimilarity], result of:
            0.0597376 = score(doc=1036,freq=3.0), product of:
              0.08968029 = queryWeight, product of:
                1.3901397 = boost
                4.922663 = idf(docFreq=874, maxDocs=44218)
                0.013105042 = queryNorm
              0.66611737 = fieldWeight in 1036, product of:
                1.7320508 = tf(freq=3.0), with freq of:
                  3.0 = termFreq=3.0
                4.922663 = idf(docFreq=874, maxDocs=44218)
                0.078125 = fieldNorm(doc=1036)
          0.044154666 = weight(abstract_txt:text in 1036) [ClassicSimilarity], result of:
            0.044154666 = score(doc=1036,freq=3.0), product of:
              0.08069175 = queryWeight, product of:
                1.5226282 = boost
                4.0438666 = idf(docFreq=2106, maxDocs=44218)
                0.013105042 = queryNorm
              0.54720175 = fieldWeight in 1036, product of:
                1.7320508 = tf(freq=3.0), with freq of:
                  3.0 = termFreq=3.0
                4.0438666 = idf(docFreq=2106, maxDocs=44218)
                0.078125 = fieldNorm(doc=1036)
          0.18755865 = weight(abstract_txt:proximity in 1036) [ClassicSimilarity], result of:
            0.18755865 = score(doc=1036,freq=3.0), product of:
              0.19228993 = queryWeight, product of:
                2.0355802 = boost
                7.208251 = idf(docFreq=88, maxDocs=44218)
                0.013105042 = queryNorm
              0.9753951 = fieldWeight in 1036, product of:
                1.7320508 = tf(freq=3.0), with freq of:
                  3.0 = termFreq=3.0
                7.208251 = idf(docFreq=88, maxDocs=44218)
                0.078125 = fieldNorm(doc=1036)
        0.32 = coord(8/25)
    
  4. Bonzi, S.: Representation of concepts in text : a comparison of within-document frequency, anaphora, and synonymy (1991) 0.14
    0.1380142 = sum of:
      0.1380142 = product of:
        1.1501184 = sum of:
          0.03059125 = weight(abstract_txt:text in 4933) [ClassicSimilarity], result of:
            0.03059125 = score(doc=4933,freq=1.0), product of:
              0.08069175 = queryWeight, product of:
                1.5226282 = boost
                4.0438666 = idf(docFreq=2106, maxDocs=44218)
                0.013105042 = queryNorm
              0.37911248 = fieldWeight in 4933, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.0438666 = idf(docFreq=2106, maxDocs=44218)
                0.09375 = fieldNorm(doc=4933)
          0.89911854 = weight(title_txt:anaphora in 4933) [ClassicSimilarity], result of:
            0.89911854 = score(doc=4933,freq=1.0), product of:
              0.36309215 = queryWeight, product of:
                2.7971668 = boost
                9.905128 = idf(docFreq=5, maxDocs=44218)
                0.013105042 = queryNorm
              2.476282 = fieldWeight in 4933, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                9.905128 = idf(docFreq=5, maxDocs=44218)
                0.25 = fieldNorm(doc=4933)
          0.2204085 = weight(abstract_txt:keyword in 4933) [ClassicSimilarity], result of:
            0.2204085 = score(doc=4933,freq=3.0), product of:
              0.22482578 = queryWeight, product of:
                2.8415635 = boost
                6.037405 = idf(docFreq=286, maxDocs=44218)
                0.013105042 = queryNorm
              0.9803524 = fieldWeight in 4933, product of:
                1.7320508 = tf(freq=3.0), with freq of:
                  3.0 = termFreq=3.0
                6.037405 = idf(docFreq=286, maxDocs=44218)
                0.09375 = fieldNorm(doc=4933)
        0.12 = coord(3/25)
    
  5. Wu, D.-S.; Liang, T.: Chinese pronominal anaphora resolution using lexical knowledge and entropy-based weight (2008) 0.12
    0.11562741 = sum of:
      0.11562741 = product of:
        1.4453427 = sum of:
          0.89911854 = weight(title_txt:anaphora in 2367) [ClassicSimilarity], result of:
            0.89911854 = score(doc=2367,freq=1.0), product of:
              0.36309215 = queryWeight, product of:
                2.7971668 = boost
                9.905128 = idf(docFreq=5, maxDocs=44218)
                0.013105042 = queryNorm
              2.476282 = fieldWeight in 2367, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                9.905128 = idf(docFreq=5, maxDocs=44218)
                0.25 = fieldNorm(doc=2367)
          0.5462241 = weight(abstract_txt:resolution in 2367) [ClassicSimilarity], result of:
            0.5462241 = score(doc=2367,freq=2.0), product of:
              0.5733182 = queryWeight, product of:
                6.087913 = boost
                7.1860275 = idf(docFreq=90, maxDocs=44218)
                0.013105042 = queryNorm
              0.9527416 = fieldWeight in 2367, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                7.1860275 = idf(docFreq=90, maxDocs=44218)
                0.09375 = fieldNorm(doc=2367)
        0.08 = coord(2/25)
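
The content-similarity trees above combine three pieces: a per-term score = queryWeight × fieldWeight, where queryWeight = boost × idf × queryNorm and fieldWeight = tf × idf × fieldNorm; a sum over the matching terms; and a coord factor (matched terms / query terms). The sketch below recomputes the score of entry 5 (Wu and Liang) from the numbers in its tree, assuming tf = sqrt(freq) and idf = 1 + ln(maxDocs / (docFreq + 1)). Function and variable names are illustrative only.

```python
import math

QUERY_NORM = 0.013105042  # queryNorm as printed in the trees above
MAX_DOCS = 44218          # maxDocs as printed in the trees above


def idf(doc_freq, max_docs=MAX_DOCS):
    """Assumed formula: idf = 1 + ln(maxDocs / (docFreq + 1))."""
    return 1.0 + math.log(max_docs / (doc_freq + 1))


def term_score(freq, doc_freq, boost, field_norm, query_norm=QUERY_NORM):
    """Score of one matching term: queryWeight * fieldWeight."""
    term_idf = idf(doc_freq)
    query_weight = boost * term_idf * query_norm
    field_weight = math.sqrt(freq) * term_idf * field_norm
    return query_weight * field_weight


def document_score(terms, matched, total):
    """Sum of the term scores, scaled by coord = matched / total."""
    return (matched / total) * sum(term_score(**t) for t in terms)


# Values copied from entry 5: title_txt:anaphora and abstract_txt:resolution.
wu_liang_terms = [
    dict(freq=1.0, doc_freq=5, boost=2.7971668, field_norm=0.25),
    dict(freq=2.0, doc_freq=90, boost=6.087913, field_norm=0.09375),
]
score = document_score(wu_liang_terms, matched=2, total=25)
print(score)
```

The coord factor explains why entry 5 ranks last despite a large raw sum: only 2 of the 25 query terms match, so its sum of 1.4453427 is scaled by 0.08.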