Document (#38099)

Author
Akiva, N.
Koppel, M.
Title
¬A generic unsupervised method for decomposing multi-author documents
Source
Journal of the American Society for Information Science and Technology. 64(2013) no.11, p.2256-2264
Year
2013
Abstract
Given an unsegmented multi-author text, we wish to automatically separate out distinct authorial threads. We present a novel, entirely unsupervised method that achieves strong results on multiple testbeds, including those for which authorial threads are topically identical. Unlike previous work, our method requires no specialized linguistic tools and can be easily applied to any text.

Similar documents (author)

  1. Koppel, T.P.: Public access catalogs through Internet (1990) 6.01
    6.010904 = sum of:
      6.010904 = weight(author_txt:koppel in 4070) [ClassicSimilarity], result of:
        6.010904 = fieldWeight in 4070, product of:
          1.0 = tf(freq=1.0), with freq of:
            1.0 = termFreq=1.0
          9.617446 = idf(docFreq=7, maxDocs=44218)
          0.625 = fieldNorm(doc=4070)
    
  2. Koppel, M.; Schweitzer, N.: Measuring direct and indirect authorial influence in historical corpora (2014) 4.81
    4.808723 = sum of:
      4.808723 = weight(author_txt:koppel in 1506) [ClassicSimilarity], result of:
        4.808723 = fieldWeight in 1506, product of:
          1.0 = tf(freq=1.0), with freq of:
            1.0 = termFreq=1.0
          9.617446 = idf(docFreq=7, maxDocs=44218)
          0.5 = fieldNorm(doc=1506)
    
  3. Koppel, M.; Winter, Y.: Determining if two documents are written by the same author (2014) 4.81
    4.808723 = sum of:
      4.808723 = weight(author_txt:koppel in 1602) [ClassicSimilarity], result of:
        4.808723 = fieldWeight in 1602, product of:
          1.0 = tf(freq=1.0), with freq of:
            1.0 = termFreq=1.0
          9.617446 = idf(docFreq=7, maxDocs=44218)
          0.5 = fieldNorm(doc=1602)
    
  4. Koppel, M.; Akiva, N.; Dagan, I.: Feature instability as a criterion for selecting potential style markers (2006) 3.61
    3.606542 = sum of:
      3.606542 = weight(author_txt:koppel in 6092) [ClassicSimilarity], result of:
        3.606542 = fieldWeight in 6092, product of:
          1.0 = tf(freq=1.0), with freq of:
            1.0 = termFreq=1.0
          9.617446 = idf(docFreq=7, maxDocs=44218)
          0.375 = fieldNorm(doc=6092)
    
  5. Koppel, M.; Schler, J.; Argamon, S.: Computational methods in authorship attribution (2009) 3.61
    3.606542 = sum of:
      3.606542 = weight(author_txt:koppel in 2683) [ClassicSimilarity], result of:
        3.606542 = fieldWeight in 2683, product of:
          1.0 = tf(freq=1.0), with freq of:
            1.0 = termFreq=1.0
          9.617446 = idf(docFreq=7, maxDocs=44218)
          0.375 = fieldNorm(doc=2683)
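
The author-match scores in this list can be reproduced by hand. The following is a minimal sketch, assuming the classic TF-IDF formulas that fit the figures shown here: tf = sqrt(freq) and idf = 1 + ln(maxDocs / (docFreq + 1)). The function names are illustrative and do not belong to any library. The fieldNorm values (0.625, 0.5, 0.375) are copied from the listing rather than recomputed, since the index stores them in a lossy encoded form.

```python
import math


def tf(freq):
    """Term-frequency factor: square root of the raw frequency."""
    return math.sqrt(freq)


def idf(doc_freq, max_docs):
    """Inverse document frequency, as assumed from the listed values."""
    return 1.0 + math.log(max_docs / (doc_freq + 1))


def field_weight(freq, doc_freq, max_docs, field_norm):
    """fieldWeight = tf * idf * fieldNorm, matching the explain tree."""
    return tf(freq) * idf(doc_freq, max_docs) * field_norm


MAX_DOCS = 44218  # maxDocs from the listing
DOC_FREQ = 7      # docFreq of author_txt:koppel

# (doc id, fieldNorm) pairs taken from the listing
for doc_id, norm in [(4070, 0.625), (1506, 0.5), (6092, 0.375)]:
    weight = field_weight(1.0, DOC_FREQ, MAX_DOCS, norm)
    print(doc_id, round(weight, 4))
```

Because tf and idf are identical for every hit on the same author term, the ranking among these five records is decided by fieldNorm alone, which is larger for records with shorter author fields.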
    

Similar documents (content)

  1. Xu, L.; Qiu, J.: Unsupervised multi-class sentiment classification approach (2019) 0.11
    0.10884502 = sum of:
      0.10884502 = product of:
        0.5442251 = sum of:
          0.030044898 = weight(abstract_txt:novel in 5003) [ClassicSimilarity], result of:
            0.030044898 = score(doc=5003,freq=1.0), product of:
              0.086939074 = queryWeight, product of:
                1.0022658 = boost
                5.529371 = idf(docFreq=476, maxDocs=44218)
                0.015687596 = queryNorm
              0.34558567 = fieldWeight in 5003, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                5.529371 = idf(docFreq=476, maxDocs=44218)
                0.0625 = fieldNorm(doc=5003)
          0.040681094 = weight(abstract_txt:distinct in 5003) [ClassicSimilarity], result of:
            0.040681094 = score(doc=5003,freq=1.0), product of:
              0.10640522 = queryWeight, product of:
                1.1088094 = boost
                6.1171575 = idf(docFreq=264, maxDocs=44218)
                0.015687596 = queryNorm
              0.38232234 = fieldWeight in 5003, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                6.1171575 = idf(docFreq=264, maxDocs=44218)
                0.0625 = fieldNorm(doc=5003)
          0.16694137 = weight(abstract_txt:multi in 5003) [ClassicSimilarity], result of:
            0.16694137 = score(doc=5003,freq=5.0), product of:
              0.20095424 = queryWeight, product of:
                2.154959 = boost
                5.9443145 = idf(docFreq=314, maxDocs=44218)
                0.015687596 = queryNorm
              0.8307432 = fieldWeight in 5003, product of:
                2.236068 = tf(freq=5.0), with freq of:
                  5.0 = termFreq=5.0
                5.9443145 = idf(docFreq=314, maxDocs=44218)
                0.0625 = fieldNorm(doc=5003)
          0.08420503 = weight(abstract_txt:method in 5003) [ClassicSimilarity], result of:
            0.08420503 = score(doc=5003,freq=3.0), product of:
              0.17281969 = queryWeight, product of:
                2.447556 = boost
                4.50095 = idf(docFreq=1333, maxDocs=44218)
                0.015687596 = queryNorm
              0.4872421 = fieldWeight in 5003, product of:
                1.7320508 = tf(freq=3.0), with freq of:
                  3.0 = termFreq=3.0
                4.50095 = idf(docFreq=1333, maxDocs=44218)
                0.0625 = fieldNorm(doc=5003)
          0.22235268 = weight(abstract_txt:unsupervised in 5003) [ClassicSimilarity], result of:
            0.22235268 = score(doc=5003,freq=2.0), product of:
              0.3301638 = queryWeight, product of:
                2.7622 = boost
                7.61935 = idf(docFreq=58, maxDocs=44218)
                0.015687596 = queryNorm
              0.67346174 = fieldWeight in 5003, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                7.61935 = idf(docFreq=58, maxDocs=44218)
                0.0625 = fieldNorm(doc=5003)
        0.2 = coord(5/25)
    
  2. Giannella, C.: ¬An improved algorithm for unsupervised decomposition of a multi-author document (2016) 0.10
    0.09742049 = sum of:
      0.09742049 = product of:
        0.48710245 = sum of:
          0.04155173 = weight(abstract_txt:text in 2642) [ClassicSimilarity], result of:
            0.04155173 = score(doc=2642,freq=2.0), product of:
              0.093000926 = queryWeight, product of:
                1.4660002 = boost
                4.0438666 = idf(docFreq=2106, maxDocs=44218)
                0.015687596 = queryNorm
              0.44678837 = fieldWeight in 2642, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                4.0438666 = idf(docFreq=2106, maxDocs=44218)
                0.078125 = fieldNorm(doc=2642)
          0.094924 = weight(abstract_txt:author in 2642) [ClassicSimilarity], result of:
            0.094924 = score(doc=2642,freq=3.0), product of:
              0.14092286 = queryWeight, product of:
                1.8046007 = boost
                4.9778743 = idf(docFreq=827, maxDocs=44218)
                0.015687596 = queryNorm
              0.6735884 = fieldWeight in 2642, product of:
                1.7320508 = tf(freq=3.0), with freq of:
                  3.0 = termFreq=3.0
                4.9778743 = idf(docFreq=827, maxDocs=44218)
                0.078125 = fieldNorm(doc=2642)
          0.09332307 = weight(abstract_txt:multi in 2642) [ClassicSimilarity], result of:
            0.09332307 = score(doc=2642,freq=1.0), product of:
              0.20095424 = queryWeight, product of:
                2.154959 = boost
                5.9443145 = idf(docFreq=314, maxDocs=44218)
                0.015687596 = queryNorm
              0.46439958 = fieldWeight in 2642, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                5.9443145 = idf(docFreq=314, maxDocs=44218)
                0.078125 = fieldNorm(doc=2642)
          0.060769748 = weight(abstract_txt:method in 2642) [ClassicSimilarity], result of:
            0.060769748 = score(doc=2642,freq=1.0), product of:
              0.17281969 = queryWeight, product of:
                2.447556 = boost
                4.50095 = idf(docFreq=1333, maxDocs=44218)
                0.015687596 = queryNorm
              0.3516367 = fieldWeight in 2642, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.50095 = idf(docFreq=1333, maxDocs=44218)
                0.078125 = fieldNorm(doc=2642)
          0.19653387 = weight(abstract_txt:unsupervised in 2642) [ClassicSimilarity], result of:
            0.19653387 = score(doc=2642,freq=1.0), product of:
              0.3301638 = queryWeight, product of:
                2.7622 = boost
                7.61935 = idf(docFreq=58, maxDocs=44218)
                0.015687596 = queryNorm
              0.5952617 = fieldWeight in 2642, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                7.61935 = idf(docFreq=58, maxDocs=44218)
                0.078125 = fieldNorm(doc=2642)
        0.2 = coord(5/25)
    
  3. Ferreira, A.A.; Veloso, A.; Gonçalves, M.A.; Laender, A.H.F.: Self-training author name disambiguation for information scarce scenarios (2014) 0.08
    0.08332667 = sum of:
      0.08332667 = product of:
        0.41663334 = sum of:
          0.030044898 = weight(abstract_txt:novel in 1292) [ClassicSimilarity], result of:
            0.030044898 = score(doc=1292,freq=1.0), product of:
              0.086939074 = queryWeight, product of:
                1.0022658 = boost
                5.529371 = idf(docFreq=476, maxDocs=44218)
                0.015687596 = queryNorm
              0.34558567 = fieldWeight in 1292, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                5.529371 = idf(docFreq=476, maxDocs=44218)
                0.0625 = fieldNorm(doc=1292)
          0.034092657 = weight(abstract_txt:requires in 1292) [ClassicSimilarity], result of:
            0.034092657 = score(doc=1292,freq=1.0), product of:
              0.094581984 = queryWeight, product of:
                1.0453931 = boost
                5.767298 = idf(docFreq=375, maxDocs=44218)
                0.015687596 = queryNorm
              0.36045614 = fieldWeight in 1292, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                5.767298 = idf(docFreq=375, maxDocs=44218)
                0.0625 = fieldNorm(doc=1292)
          0.09803709 = weight(abstract_txt:author in 1292) [ClassicSimilarity], result of:
            0.09803709 = score(doc=1292,freq=5.0), product of:
              0.14092286 = queryWeight, product of:
                1.8046007 = boost
                4.9778743 = idf(docFreq=827, maxDocs=44218)
                0.015687596 = queryNorm
              0.69567907 = fieldWeight in 1292, product of:
                2.236068 = tf(freq=5.0), with freq of:
                  5.0 = termFreq=5.0
                4.9778743 = idf(docFreq=827, maxDocs=44218)
                0.0625 = fieldNorm(doc=1292)
          0.0972316 = weight(abstract_txt:method in 1292) [ClassicSimilarity], result of:
            0.0972316 = score(doc=1292,freq=4.0), product of:
              0.17281969 = queryWeight, product of:
                2.447556 = boost
                4.50095 = idf(docFreq=1333, maxDocs=44218)
                0.015687596 = queryNorm
              0.56261873 = fieldWeight in 1292, product of:
                2.0 = tf(freq=4.0), with freq of:
                  4.0 = termFreq=4.0
                4.50095 = idf(docFreq=1333, maxDocs=44218)
                0.0625 = fieldNorm(doc=1292)
          0.1572271 = weight(abstract_txt:unsupervised in 1292) [ClassicSimilarity], result of:
            0.1572271 = score(doc=1292,freq=1.0), product of:
              0.3301638 = queryWeight, product of:
                2.7622 = boost
                7.61935 = idf(docFreq=58, maxDocs=44218)
                0.015687596 = queryNorm
              0.47620937 = fieldWeight in 1292, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                7.61935 = idf(docFreq=58, maxDocs=44218)
                0.0625 = fieldNorm(doc=1292)
        0.2 = coord(5/25)
    
  4. Ko, Y.; Seo, J.: Text classification from unlabeled documents with bootstrapping and feature projection techniques (2009) 0.08
    0.08194213 = sum of:
      0.08194213 = product of:
        0.40971065 = sum of:
          0.042202383 = weight(abstract_txt:automatically in 2452) [ClassicSimilarity], result of:
            0.042202383 = score(doc=2452,freq=2.0), product of:
              0.086546436 = queryWeight, product of:
                5.5168705 = idf(docFreq=482, maxDocs=44218)
                0.015687596 = queryNorm
              0.48762706 = fieldWeight in 2452, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                5.5168705 = idf(docFreq=482, maxDocs=44218)
                0.0625 = fieldNorm(doc=2452)
          0.039384007 = weight(abstract_txt:easily in 2452) [ClassicSimilarity], result of:
            0.039384007 = score(doc=2452,freq=1.0), product of:
              0.10413126 = queryWeight, product of:
                1.0968972 = boost
                6.0514402 = idf(docFreq=282, maxDocs=44218)
                0.015687596 = queryNorm
              0.37821501 = fieldWeight in 2452, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                6.0514402 = idf(docFreq=282, maxDocs=44218)
                0.0625 = fieldNorm(doc=2452)
          0.062188935 = weight(abstract_txt:text in 2452) [ClassicSimilarity], result of:
            0.062188935 = score(doc=2452,freq=7.0), product of:
              0.093000926 = queryWeight, product of:
                1.4660002 = boost
                4.0438666 = idf(docFreq=2106, maxDocs=44218)
                0.015687596 = queryNorm
              0.6686916 = fieldWeight in 2452, product of:
                2.6457512 = tf(freq=7.0), with freq of:
                  7.0 = termFreq=7.0
                4.0438666 = idf(docFreq=2106, maxDocs=44218)
                0.0625 = fieldNorm(doc=2452)
          0.10870823 = weight(abstract_txt:method in 2452) [ClassicSimilarity], result of:
            0.10870823 = score(doc=2452,freq=5.0), product of:
              0.17281969 = queryWeight, product of:
                2.447556 = boost
                4.50095 = idf(docFreq=1333, maxDocs=44218)
                0.015687596 = queryNorm
              0.6290269 = fieldWeight in 2452, product of:
                2.236068 = tf(freq=5.0), with freq of:
                  5.0 = termFreq=5.0
                4.50095 = idf(docFreq=1333, maxDocs=44218)
                0.0625 = fieldNorm(doc=2452)
          0.1572271 = weight(abstract_txt:unsupervised in 2452) [ClassicSimilarity], result of:
            0.1572271 = score(doc=2452,freq=1.0), product of:
              0.3301638 = queryWeight, product of:
                2.7622 = boost
                7.61935 = idf(docFreq=58, maxDocs=44218)
                0.015687596 = queryNorm
              0.47620937 = fieldWeight in 2452, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                7.61935 = idf(docFreq=58, maxDocs=44218)
                0.0625 = fieldNorm(doc=2452)
        0.2 = coord(5/25)
    
  5. Yu, C.; Xue, H.; An, L.; Li, G.: ¬A lightweight semantic-enhanced interactive network for efficient short-text matching (2023) 0.08
    0.08016788 = sum of:
      0.08016788 = product of:
        0.4008394 = sum of:
          0.06312021 = weight(abstract_txt:unlike in 890) [ClassicSimilarity], result of:
            0.06312021 = score(doc=890,freq=1.0), product of:
              0.14260896 = queryWeight, product of:
                1.2836565 = boost
                7.0817666 = idf(docFreq=100, maxDocs=44218)
                0.015687596 = queryNorm
              0.4426104 = fieldWeight in 890, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                7.0817666 = idf(docFreq=100, maxDocs=44218)
                0.0625 = fieldNorm(doc=890)
          0.07430052 = weight(abstract_txt:achieves in 890) [ClassicSimilarity], result of:
            0.07430052 = score(doc=890,freq=1.0), product of:
              0.1589873 = queryWeight, product of:
                1.3553661 = boost
                7.4773793 = idf(docFreq=67, maxDocs=44218)
                0.015687596 = queryNorm
              0.4673362 = fieldWeight in 890, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                7.4773793 = idf(docFreq=67, maxDocs=44218)
                0.0625 = fieldNorm(doc=890)
          0.05757577 = weight(abstract_txt:text in 890) [ClassicSimilarity], result of:
            0.05757577 = score(doc=890,freq=6.0), product of:
              0.093000926 = queryWeight, product of:
                1.4660002 = boost
                4.0438666 = idf(docFreq=2106, maxDocs=44218)
                0.015687596 = queryNorm
              0.6190881 = fieldWeight in 890, product of:
                2.4494898 = tf(freq=6.0), with freq of:
                  6.0 = termFreq=6.0
                4.0438666 = idf(docFreq=2106, maxDocs=44218)
                0.0625 = fieldNorm(doc=890)
          0.0486158 = weight(abstract_txt:method in 890) [ClassicSimilarity], result of:
            0.0486158 = score(doc=890,freq=1.0), product of:
              0.17281969 = queryWeight, product of:
                2.447556 = boost
                4.50095 = idf(docFreq=1333, maxDocs=44218)
                0.015687596 = queryNorm
              0.28130937 = fieldWeight in 890, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.50095 = idf(docFreq=1333, maxDocs=44218)
                0.0625 = fieldNorm(doc=890)
          0.1572271 = weight(abstract_txt:unsupervised in 890) [ClassicSimilarity], result of:
            0.1572271 = score(doc=890,freq=1.0), product of:
              0.3301638 = queryWeight, product of:
                2.7622 = boost
                7.61935 = idf(docFreq=58, maxDocs=44218)
                0.015687596 = queryNorm
              0.47620937 = fieldWeight in 890, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                7.61935 = idf(docFreq=58, maxDocs=44218)
                0.0625 = fieldNorm(doc=890)
        0.2 = coord(5/25)
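
The content-match scores combine a query-side and a document-side weight for each matching term, then scale the sum by the fraction of query terms that matched. The sketch below recomputes the first content hit (doc 5003) under the same assumed formulas: queryWeight = boost * idf * queryNorm, fieldWeight = sqrt(freq) * idf * fieldNorm, and score = coord * sum of the per-term products. The helper names are hypothetical. The boost, queryNorm, and fieldNorm values are copied from the listing.

```python
import math


def idf(doc_freq, max_docs):
    """Inverse document frequency, as assumed from the listed values."""
    return 1.0 + math.log(max_docs / (doc_freq + 1))


def term_score(freq, doc_freq, max_docs, boost, query_norm, field_norm):
    """Per-term score: queryWeight * fieldWeight."""
    term_idf = idf(doc_freq, max_docs)
    query_weight = boost * term_idf * query_norm
    field_weight = math.sqrt(freq) * term_idf * field_norm
    return query_weight * field_weight


MAX_DOCS = 44218
QUERY_NORM = 0.015687596
FIELD_NORM = 0.0625  # fieldNorm(doc=5003)

# (term, freq, docFreq, boost) for doc 5003, copied from the listing
terms = [
    ("novel", 1.0, 476, 1.0022658),
    ("distinct", 1.0, 264, 1.1088094),
    ("multi", 5.0, 314, 2.154959),
    ("method", 3.0, 1333, 2.447556),
    ("unsupervised", 2.0, 58, 2.7622),
]

total = sum(
    term_score(freq, doc_freq, MAX_DOCS, boost, QUERY_NORM, FIELD_NORM)
    for _, freq, doc_freq, boost in terms
)
coord = 5 / 25  # 5 of the 25 query terms matched
score = coord * total
print(round(score, 4))
```

The coord factor of 0.2 applies to all five content hits, so each record matched exactly five of the 25 query terms and the ranking reflects which terms matched and how often.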