Document (#21064)

Author
Sun, Q.
Shaw, D.
Davis, C.H.
Title
¬A model for estimating the occurrence of same-frequency words and the boundary between high- and low-frequency words in texts
Source
Journal of the American Society for Information Science. 50(1999) no.3, pp.280-286
Year
1999
Abstract
A simpler model is proposed for estimating the frequency of same-frequency words (words that occur equally often in a text) and for identifying the boundary point between high-frequency and low-frequency words in a text. The model is based on a 'maximum-ranking method': it assigns ranks to the words and estimates word frequency with a formula. The boundary value between high-frequency and low-frequency words is the square root of the number of different words in the text. This straightforward model was used successfully on both English and Chinese texts.
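The ranking and boundary steps in the abstract can be sketched in Python. This is a minimal illustration under stated assumptions, not the authors' method. The tokenizer is a naive lowercase split on letters. 'Maximum ranking' is read here as giving every word in a tie group the largest rank in that group. Using the square-root boundary as a frequency threshold, with words at or above it counted as high-frequency, is also an assumption. The paper's frequency-estimation formula is not reproduced, because the abstract does not state it, and all function names are hypothetical.

```python
import math
import re
from collections import Counter


def tokenize(text: str) -> list[str]:
    # Assumption: naive lowercase split on ASCII letters (English only, no Chinese segmentation).
    return re.findall(r"[a-z]+", text.lower())


def max_ranks(freqs: dict[str, int]) -> dict[str, int]:
    # Hypothetical reading of 'maximum ranking': rank words by descending frequency,
    # and give every word in a tie group the largest (last) rank of that group.
    ordered = sorted(freqs.values(), reverse=True)
    last_pos: dict[int, int] = {}
    for pos, f in enumerate(ordered, start=1):
        last_pos[f] = pos  # later positions overwrite earlier ones, keeping the maximum
    return {w: last_pos[f] for w, f in freqs.items()}


def boundary_value(freqs: dict[str, int]) -> float:
    # From the abstract: square root of the number of different words in the text.
    return math.sqrt(len(freqs))


def high_frequency_words(freqs: dict[str, int]) -> list[str]:
    # Assumption: the boundary is applied as a frequency threshold (>=).
    b = boundary_value(freqs)
    return sorted(w for w, f in freqs.items() if f >= b)


freqs = Counter(tokenize("the cat and the dog and the bird"))
print(max_ranks(freqs), boundary_value(freqs), high_frequency_words(freqs))
```

The toy text has 5 distinct words, so the boundary is sqrt(5), about 2.24. Under the assumed threshold, only 'the' (3 occurrences) counts as high-frequency. The three words that occur once all share the maximum rank 5.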
Theme
Informetrics

Similar documents (author)

  1. Davis, C.H.; Shaw, D.: Comparison of retrieval system interfaces using an objective measure of screen design effectiveness (1989) 5.55
    5.552513 = sum of:
      5.552513 = sum of:
        2.5846593 = weight(author_txt:davis in 3325) [ClassicSimilarity], result of:
          2.5846593 = score(doc=3325,freq=1.0), product of:
            0.67383105 = queryWeight, product of:
              7.6715355 = idf(docFreq=55, maxDocs=44218)
              0.08783522 = queryNorm
            3.8357677 = fieldWeight in 3325, product of:
              1.0 = tf(freq=1.0), with freq of:
                1.0 = termFreq=1.0
              7.6715355 = idf(docFreq=55, maxDocs=44218)
              0.5 = fieldNorm(doc=3325)
        2.9678535 = weight(author_txt:shaw in 3325) [ClassicSimilarity], result of:
          2.9678535 = score(doc=3325,freq=1.0), product of:
            0.73888546 = queryWeight, product of:
              1.04716 = boost
              8.033325 = idf(docFreq=38, maxDocs=44218)
              0.08783522 = queryNorm
            4.0166626 = fieldWeight in 3325, product of:
              1.0 = tf(freq=1.0), with freq of:
                1.0 = termFreq=1.0
              8.033325 = idf(docFreq=38, maxDocs=44218)
              0.5 = fieldNorm(doc=3325)
    
  2. Shaw, R.R.: Classification systems (1962/63) 1.85
    1.8549085 = sum of:
      1.8549085 = product of:
        3.709817 = sum of:
          3.709817 = weight(author_txt:shaw in 603) [ClassicSimilarity], result of:
            3.709817 = score(doc=603,freq=1.0), product of:
              0.73888546 = queryWeight, product of:
                1.04716 = boost
                8.033325 = idf(docFreq=38, maxDocs=44218)
                0.08783522 = queryNorm
              5.0208282 = fieldWeight in 603, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                8.033325 = idf(docFreq=38, maxDocs=44218)
                0.625 = fieldNorm(doc=603)
        0.5 = coord(1/2)
    
  3. Shaw, W.M.: Subject and citation indexing : pt.1: the clustering structure of composite representations in the cystic fibrosis document collection (1991) 1.85
    1.8549085 = sum of:
      1.8549085 = product of:
        3.709817 = sum of:
          3.709817 = weight(author_txt:shaw in 4841) [ClassicSimilarity], result of:
            3.709817 = score(doc=4841,freq=1.0), product of:
              0.73888546 = queryWeight, product of:
                1.04716 = boost
                8.033325 = idf(docFreq=38, maxDocs=44218)
                0.08783522 = queryNorm
              5.0208282 = fieldWeight in 4841, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                8.033325 = idf(docFreq=38, maxDocs=44218)
                0.625 = fieldNorm(doc=4841)
        0.5 = coord(1/2)
    
  4. Shaw, W.M.: Subject and citation indexing : pt.2: the optimal, cluster-based retrieval performance of composite representations (1991) 1.85
    1.8549085 = sum of:
      1.8549085 = product of:
        3.709817 = sum of:
          3.709817 = weight(author_txt:shaw in 4842) [ClassicSimilarity], result of:
            3.709817 = score(doc=4842,freq=1.0), product of:
              0.73888546 = queryWeight, product of:
                1.04716 = boost
                8.033325 = idf(docFreq=38, maxDocs=44218)
                0.08783522 = queryNorm
              5.0208282 = fieldWeight in 4842, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                8.033325 = idf(docFreq=38, maxDocs=44218)
                0.625 = fieldNorm(doc=4842)
        0.5 = coord(1/2)
    
  5. Shaw, S.: ¬The Internet as an entertainment system (1994) 1.85
    1.8549085 = sum of:
      1.8549085 = product of:
        3.709817 = sum of:
          3.709817 = weight(author_txt:shaw in 198) [ClassicSimilarity], result of:
            3.709817 = score(doc=198,freq=1.0), product of:
              0.73888546 = queryWeight, product of:
                1.04716 = boost
                8.033325 = idf(docFreq=38, maxDocs=44218)
                0.08783522 = queryNorm
              5.0208282 = fieldWeight in 198, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                8.033325 = idf(docFreq=38, maxDocs=44218)
                0.625 = fieldNorm(doc=198)
        0.5 = coord(1/2)
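The scores in the author list are Lucene ClassicSimilarity explanations. The Python sketch below recomputes entries 1 and 2 from the numbers printed above. It assumes the standard ClassicSimilarity definitions:

- idf = 1 + ln(maxDocs / (docFreq + 1))
- tf = sqrt(freq)
- queryWeight = boost * idf * queryNorm
- fieldWeight = tf * idf * fieldNorm

The helper names are illustrative only.

```python
import math


def idf(doc_freq: int, max_docs: int) -> float:
    # ClassicSimilarity inverse document frequency.
    return 1.0 + math.log(max_docs / (doc_freq + 1))


def term_weight(freq: float, doc_freq: int, max_docs: int,
                query_norm: float, field_norm: float, boost: float = 1.0) -> float:
    # score = queryWeight * fieldWeight, as in the explain trees above.
    i = idf(doc_freq, max_docs)
    query_weight = boost * i * query_norm
    field_weight = math.sqrt(freq) * i * field_norm
    return query_weight * field_weight


def coord(matched: int, total: int) -> float:
    # Coordination factor: fraction of query clauses the document matched.
    return matched / total


# Entry 1 (doc 3325): both 'davis' and 'shaw' match, so coord is 2/2 = 1.
davis = term_weight(1, 55, 44218, 0.08783522, 0.5)
shaw = term_weight(1, 38, 44218, 0.08783522, 0.5, boost=1.04716)
print(davis + shaw)  # close to 5.552513

# Entry 2 (doc 603): only 'shaw' matches, so the sum is scaled by coord(1/2).
shaw_603 = term_weight(1, 38, 44218, 0.08783522, 0.625, boost=1.04716)
print(shaw_603 * coord(1, 2))  # close to 1.8549085
```

Because entry 1 matches every query clause, its coordination factor is 1 and the explain output shows no coord node. Entries 2 to 5 match only 'shaw' and are multiplied by coord(1/2). Any differences in the last digits likely come from Lucene computing in single precision.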
    

Similar documents (content)

  1. Khoo, C.S.G.; Dai, D.; Loh, T.E.: Using statistical and contextual information to identify two- and three-character words in Chinese text (2002) 0.26
    0.2649187 = sum of:
      0.2649187 = product of:
        1.103828 = sum of:
          0.052175187 = weight(abstract_txt:chinese in 5206) [ClassicSimilarity], result of:
            0.052175187 = score(doc=5206,freq=3.0), product of:
              0.0764642 = queryWeight, product of:
                1.1175787 = boost
                6.30326 = idf(docFreq=219, maxDocs=44218)
                0.010854624 = queryNorm
              0.6823479 = fieldWeight in 5206, product of:
                1.7320508 = tf(freq=3.0), with freq of:
                  3.0 = termFreq=3.0
                6.30326 = idf(docFreq=219, maxDocs=44218)
                0.0625 = fieldNorm(doc=5206)
          0.053206112 = weight(abstract_txt:formula in 5206) [ClassicSimilarity], result of:
            0.053206112 = score(doc=5206,freq=1.0), product of:
              0.1117284 = queryWeight, product of:
                1.3509238 = boost
                7.61935 = idf(docFreq=58, maxDocs=44218)
                0.010854624 = queryNorm
              0.47620937 = fieldWeight in 5206, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                7.61935 = idf(docFreq=58, maxDocs=44218)
                0.0625 = fieldNorm(doc=5206)
          0.015908463 = weight(abstract_txt:text in 5206) [ClassicSimilarity], result of:
            0.015908463 = score(doc=5206,freq=1.0), product of:
              0.06294357 = queryWeight, product of:
                1.433969 = boost
                4.0438666 = idf(docFreq=2106, maxDocs=44218)
                0.010854624 = queryNorm
              0.25274166 = fieldWeight in 5206, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.0438666 = idf(docFreq=2106, maxDocs=44218)
                0.0625 = fieldNorm(doc=5206)
          0.03047587 = weight(abstract_txt:model in 5206) [ClassicSimilarity], result of:
            0.03047587 = score(doc=5206,freq=1.0), product of:
              0.12232446 = queryWeight, product of:
                2.8270645 = boost
                3.986234 = idf(docFreq=2231, maxDocs=44218)
                0.010854624 = queryNorm
              0.24913962 = fieldWeight in 5206, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                3.986234 = idf(docFreq=2231, maxDocs=44218)
                0.0625 = fieldNorm(doc=5206)
          0.44280487 = weight(abstract_txt:words in 5206) [ClassicSimilarity], result of:
            0.44280487 = score(doc=5206,freq=9.0), product of:
              0.44117746 = queryWeight, product of:
                7.592778 = boost
                5.353007 = idf(docFreq=568, maxDocs=44218)
                0.010854624 = queryNorm
              1.0036888 = fieldWeight in 5206, product of:
                3.0 = tf(freq=9.0), with freq of:
                  9.0 = termFreq=9.0
                5.353007 = idf(docFreq=568, maxDocs=44218)
                0.0625 = fieldNorm(doc=5206)
          0.5092574 = weight(abstract_txt:frequency in 5206) [ClassicSimilarity], result of:
            0.5092574 = score(doc=5206,freq=5.0), product of:
              0.61268646 = queryWeight, product of:
                9.490508 = boost
                5.947494 = idf(docFreq=313, maxDocs=44218)
                0.010854624 = queryNorm
              0.83118755 = fieldWeight in 5206, product of:
                2.236068 = tf(freq=5.0), with freq of:
                  5.0 = termFreq=5.0
                5.947494 = idf(docFreq=313, maxDocs=44218)
                0.0625 = fieldNorm(doc=5206)
        0.24 = coord(6/25)
    
  2. Li, X.; Zhang, A.; Li, C.; Ouyang, J.; Cai, Y.: Exploring coherent topics by topic modeling with term weighting (2018) 0.22
    0.2223955 = sum of:
      0.2223955 = product of:
        0.9266479 = sum of:
          0.04999307 = weight(abstract_txt:straightforward in 5045) [ClassicSimilarity], result of:
            0.04999307 = score(doc=5045,freq=1.0), product of:
              0.10718379 = queryWeight, product of:
                1.3231639 = boost
                7.462781 = idf(docFreq=68, maxDocs=44218)
                0.010854624 = queryNorm
              0.4664238 = fieldWeight in 5045, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                7.462781 = idf(docFreq=68, maxDocs=44218)
                0.0625 = fieldNorm(doc=5045)
          0.07923889 = weight(abstract_txt:assigns in 5045) [ClassicSimilarity], result of:
            0.07923889 = score(doc=5045,freq=1.0), product of:
              0.14570735 = queryWeight, product of:
                1.5427296 = boost
                8.701155 = idf(docFreq=19, maxDocs=44218)
                0.010854624 = queryNorm
              0.54382217 = fieldWeight in 5045, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                8.701155 = idf(docFreq=19, maxDocs=44218)
                0.0625 = fieldNorm(doc=5045)
          0.061500516 = weight(abstract_txt:texts in 5045) [ClassicSimilarity], result of:
            0.061500516 = score(doc=5045,freq=2.0), product of:
              0.12305759 = queryWeight, product of:
                2.005018 = boost
                5.6542544 = idf(docFreq=420, maxDocs=44218)
                0.010854624 = queryNorm
              0.4997702 = fieldWeight in 5045, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                5.6542544 = idf(docFreq=420, maxDocs=44218)
                0.0625 = fieldNorm(doc=5045)
          0.041411247 = weight(abstract_txt:high in 5045) [ClassicSimilarity], result of:
            0.041411247 = score(doc=5045,freq=1.0), product of:
              0.13634583 = queryWeight, product of:
                2.584822 = boost
                4.8595543 = idf(docFreq=931, maxDocs=44218)
                0.010854624 = queryNorm
              0.30372214 = fieldWeight in 5045, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.8595543 = idf(docFreq=931, maxDocs=44218)
                0.0625 = fieldNorm(doc=5045)
          0.46675733 = weight(abstract_txt:words in 5045) [ClassicSimilarity], result of:
            0.46675733 = score(doc=5045,freq=10.0), product of:
              0.44117746 = queryWeight, product of:
                7.592778 = boost
                5.353007 = idf(docFreq=568, maxDocs=44218)
                0.010854624 = queryNorm
              1.0579809 = fieldWeight in 5045, product of:
                3.1622777 = tf(freq=10.0), with freq of:
                  10.0 = termFreq=10.0
                5.353007 = idf(docFreq=568, maxDocs=44218)
                0.0625 = fieldNorm(doc=5045)
          0.22774681 = weight(abstract_txt:frequency in 5045) [ClassicSimilarity], result of:
            0.22774681 = score(doc=5045,freq=1.0), product of:
              0.61268646 = queryWeight, product of:
                9.490508 = boost
                5.947494 = idf(docFreq=313, maxDocs=44218)
                0.010854624 = queryNorm
              0.37171838 = fieldWeight in 5045, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                5.947494 = idf(docFreq=313, maxDocs=44218)
                0.0625 = fieldNorm(doc=5045)
        0.24 = coord(6/25)
    
  3. Arsenault, C.: Aggregation consistency and frequency of Chinese words and characters (2006) 0.21
    0.20693189 = sum of:
      0.20693189 = product of:
        0.86221623 = sum of:
          0.060246713 = weight(abstract_txt:chinese in 609) [ClassicSimilarity], result of:
            0.060246713 = score(doc=609,freq=4.0), product of:
              0.0764642 = queryWeight, product of:
                1.1175787 = boost
                6.30326 = idf(docFreq=219, maxDocs=44218)
                0.010854624 = queryNorm
              0.7879075 = fieldWeight in 609, product of:
                2.0 = tf(freq=4.0), with freq of:
                  4.0 = termFreq=4.0
                6.30326 = idf(docFreq=219, maxDocs=44218)
                0.0625 = fieldNorm(doc=609)
          0.053206112 = weight(abstract_txt:formula in 609) [ClassicSimilarity], result of:
            0.053206112 = score(doc=609,freq=1.0), product of:
              0.1117284 = queryWeight, product of:
                1.3509238 = boost
                7.61935 = idf(docFreq=58, maxDocs=44218)
                0.010854624 = queryNorm
              0.47620937 = fieldWeight in 609, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                7.61935 = idf(docFreq=58, maxDocs=44218)
                0.0625 = fieldNorm(doc=609)
          0.025965197 = weight(abstract_txt:between in 609) [ClassicSimilarity], result of:
            0.025965197 = score(doc=609,freq=3.0), product of:
              0.06925487 = queryWeight, product of:
                1.8421918 = boost
                3.4633842 = idf(docFreq=3764, maxDocs=44218)
                0.010854624 = queryNorm
              0.37492234 = fieldWeight in 609, product of:
                1.7320508 = tf(freq=3.0), with freq of:
                  3.0 = termFreq=3.0
                3.4633842 = idf(docFreq=3764, maxDocs=44218)
                0.0625 = fieldNorm(doc=609)
          0.05856435 = weight(abstract_txt:high in 609) [ClassicSimilarity], result of:
            0.05856435 = score(doc=609,freq=2.0), product of:
              0.13634583 = queryWeight, product of:
                2.584822 = boost
                4.8595543 = idf(docFreq=931, maxDocs=44218)
                0.010854624 = queryNorm
              0.42952797 = fieldWeight in 609, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                4.8595543 = idf(docFreq=931, maxDocs=44218)
                0.0625 = fieldNorm(doc=609)
          0.20874022 = weight(abstract_txt:words in 609) [ClassicSimilarity], result of:
            0.20874022 = score(doc=609,freq=2.0), product of:
              0.44117746 = queryWeight, product of:
                7.592778 = boost
                5.353007 = idf(docFreq=568, maxDocs=44218)
                0.010854624 = queryNorm
              0.47314343 = fieldWeight in 609, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                5.353007 = idf(docFreq=568, maxDocs=44218)
                0.0625 = fieldNorm(doc=609)
          0.45549363 = weight(abstract_txt:frequency in 609) [ClassicSimilarity], result of:
            0.45549363 = score(doc=609,freq=4.0), product of:
              0.61268646 = queryWeight, product of:
                9.490508 = boost
                5.947494 = idf(docFreq=313, maxDocs=44218)
                0.010854624 = queryNorm
              0.74343675 = fieldWeight in 609, product of:
                2.0 = tf(freq=4.0), with freq of:
                  4.0 = termFreq=4.0
                5.947494 = idf(docFreq=313, maxDocs=44218)
                0.0625 = fieldNorm(doc=609)
        0.24 = coord(6/25)
    
  4. Lee, D.L.; Ren, L.: Document ranking on weight-partitioned signature files (1996) 0.20
    0.20202847 = sum of:
      0.20202847 = product of:
        1.0101423 = sum of:
          0.07876618 = weight(abstract_txt:ranks in 2417) [ClassicSimilarity], result of:
            0.07876618 = score(doc=2417,freq=1.0), product of:
              0.11075286 = queryWeight, product of:
                1.3450131 = boost
                7.5860133 = idf(docFreq=60, maxDocs=44218)
                0.010854624 = queryNorm
              0.71118873 = fieldWeight in 2417, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                7.5860133 = idf(docFreq=60, maxDocs=44218)
                0.09375 = fieldNorm(doc=2417)
          0.07980917 = weight(abstract_txt:formula in 2417) [ClassicSimilarity], result of:
            0.07980917 = score(doc=2417,freq=1.0), product of:
              0.1117284 = queryWeight, product of:
                1.3509238 = boost
                7.61935 = idf(docFreq=58, maxDocs=44218)
                0.010854624 = queryNorm
              0.71431404 = fieldWeight in 2417, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                7.61935 = idf(docFreq=58, maxDocs=44218)
                0.09375 = fieldNorm(doc=2417)
          0.03846093 = weight(abstract_txt:same in 2417) [ClassicSimilarity], result of:
            0.03846093 = score(doc=2417,freq=1.0), product of:
              0.08652699 = queryWeight, product of:
                1.6812793 = boost
                4.7412944 = idf(docFreq=1048, maxDocs=44218)
                0.010854624 = queryNorm
              0.44449633 = fieldWeight in 2417, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.7412944 = idf(docFreq=1048, maxDocs=44218)
                0.09375 = fieldNorm(doc=2417)
          0.22140244 = weight(abstract_txt:words in 2417) [ClassicSimilarity], result of:
            0.22140244 = score(doc=2417,freq=1.0), product of:
              0.44117746 = queryWeight, product of:
                7.592778 = boost
                5.353007 = idf(docFreq=568, maxDocs=44218)
                0.010854624 = queryNorm
              0.5018444 = fieldWeight in 2417, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                5.353007 = idf(docFreq=568, maxDocs=44218)
                0.09375 = fieldNorm(doc=2417)
          0.5917036 = weight(abstract_txt:frequency in 2417) [ClassicSimilarity], result of:
            0.5917036 = score(doc=2417,freq=3.0), product of:
              0.61268646 = queryWeight, product of:
                9.490508 = boost
                5.947494 = idf(docFreq=313, maxDocs=44218)
                0.010854624 = queryNorm
              0.9657527 = fieldWeight in 2417, product of:
                1.7320508 = tf(freq=3.0), with freq of:
                  3.0 = termFreq=3.0
                5.947494 = idf(docFreq=313, maxDocs=44218)
                0.09375 = fieldNorm(doc=2417)
        0.2 = coord(5/25)
    
  5. Ferrer-i-Cancho, R.; Vitevitch, M.S.: ¬The origins of Zipf's meaning-frequency law (2018) 0.19
    0.18933578 = sum of:
      0.18933578 = product of:
        0.9466789 = sum of:
          0.07581042 = weight(abstract_txt:root in 4546) [ClassicSimilarity], result of:
            0.07581042 = score(doc=4546,freq=1.0), product of:
              0.12191818 = queryWeight, product of:
                1.4111829 = boost
                7.9592175 = idf(docFreq=41, maxDocs=44218)
                0.010854624 = queryNorm
              0.6218139 = fieldWeight in 4546, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                7.9592175 = idf(docFreq=41, maxDocs=44218)
                0.078125 = fieldNorm(doc=4546)
          0.090356275 = weight(abstract_txt:square in 4546) [ClassicSimilarity], result of:
            0.090356275 = score(doc=4546,freq=1.0), product of:
              0.13705286 = queryWeight, product of:
                1.496212 = boost
                8.43879 = idf(docFreq=25, maxDocs=44218)
                0.010854624 = queryNorm
              0.6592805 = fieldWeight in 4546, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                8.43879 = idf(docFreq=25, maxDocs=44218)
                0.078125 = fieldNorm(doc=4546)
          0.026500618 = weight(abstract_txt:between in 4546) [ClassicSimilarity], result of:
            0.026500618 = score(doc=4546,freq=2.0), product of:
              0.06925487 = queryWeight, product of:
                1.8421918 = boost
                3.4633842 = idf(docFreq=3764, maxDocs=44218)
                0.010854624 = queryNorm
              0.3826535 = fieldWeight in 4546, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                3.4633842 = idf(docFreq=3764, maxDocs=44218)
                0.078125 = fieldNorm(doc=4546)
          0.26092526 = weight(abstract_txt:words in 4546) [ClassicSimilarity], result of:
            0.26092526 = score(doc=4546,freq=2.0), product of:
              0.44117746 = queryWeight, product of:
                7.592778 = boost
                5.353007 = idf(docFreq=568, maxDocs=44218)
                0.010854624 = queryNorm
              0.5914293 = fieldWeight in 4546, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                5.353007 = idf(docFreq=568, maxDocs=44218)
                0.078125 = fieldNorm(doc=4546)
          0.4930863 = weight(abstract_txt:frequency in 4546) [ClassicSimilarity], result of:
            0.4930863 = score(doc=4546,freq=3.0), product of:
              0.61268646 = queryWeight, product of:
                9.490508 = boost
                5.947494 = idf(docFreq=313, maxDocs=44218)
                0.010854624 = queryNorm
              0.8047939 = fieldWeight in 4546, product of:
                1.7320508 = tf(freq=3.0), with freq of:
                  3.0 = termFreq=3.0
                5.947494 = idf(docFreq=313, maxDocs=44218)
                0.078125 = fieldNorm(doc=4546)
        0.2 = coord(5/25)
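The content-based scores follow the same pattern, summed over the abstract terms a document shares with the query. The Python sketch below checks entry 1 (doc 5206). It takes the boost and idf values exactly as printed in the explain output and shows two things: tf grows as sqrt(freq), so 'words' with freq=9 gets tf=3.0, and the raw sum is scaled by coord(6/25). The data layout and function names are illustrative.

```python
import math

QUERY_NORM = 0.010854624  # queryNorm from the explain output
FIELD_NORM = 0.0625       # fieldNorm(doc=5206)

# (term, freq in the abstract, boost, idf) as reported for doc 5206
TERMS = [
    ("chinese", 3, 1.1175787, 6.30326),
    ("formula", 1, 1.3509238, 7.61935),
    ("text", 1, 1.433969, 4.0438666),
    ("model", 1, 2.8270645, 3.986234),
    ("words", 9, 7.592778, 5.353007),
    ("frequency", 5, 9.490508, 5.947494),
]


def clause_score(freq: int, boost: float, idf: float) -> float:
    # queryWeight * fieldWeight, with tf = sqrt(freq)
    query_weight = boost * idf * QUERY_NORM
    field_weight = math.sqrt(freq) * idf * FIELD_NORM
    return query_weight * field_weight


def doc_score(terms, total_clauses: int = 25) -> float:
    # Sum the matching clauses, then apply coord(matched/total).
    raw = sum(clause_score(f, b, i) for _, f, b, i in terms)
    return raw * len(terms) / total_clauses


print(doc_score(TERMS))  # close to 0.2649187
```

Entries 4 and 5 match only five of the 25 query clauses, so their sums are scaled by coord(5/25) = 0.2 instead.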