Document (#34763)

Author
Ferrer-i-Cancho, R.
Gavaldà, R.
Title
The frequency spectrum of finite samples from the intermittent silence process
Source
Journal of the American Society for Information Science and Technology. 60(2009) no.4, pp.837-843
Year
2009
Abstract
It has been argued that the actual distribution of word frequencies could be reproduced or explained by generating a random sequence of letters and spaces according to the so-called intermittent silence process. The same kind of process could reproduce or explain the counts of other kinds of units from a wide range of disciplines. Taking the linguistic metaphor, we focus on the frequency spectrum, i.e., the number of words with a certain frequency, and the vocabulary size, i.e., the number of different words of text generated by an intermittent silence process. We derive and explain how to calculate accurately and efficiently the expected frequency spectrum and the expected vocabulary size as a function of the text size.
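
As a rough illustration of the quantities named in the abstract, the following sketch simulates a random sequence of letters and spaces and counts the resulting frequency spectrum and vocabulary size. It is an empirical simulation, not the analytical calculation the authors derive. The alphabet, the space probability `p_space`, the equal letter probabilities, the measurement of text size in characters, and the dropping of empty words between consecutive spaces are all assumptions made here for the example; the function names are hypothetical.

```python
import random
from collections import Counter


def intermittent_silence_text(n_chars, alphabet="abcde", p_space=0.2, seed=0):
    """Generate n_chars symbols: a space with probability p_space,
    otherwise a letter drawn uniformly from alphabet (assumed model)."""
    rng = random.Random(seed)
    chars = [
        " " if rng.random() < p_space else rng.choice(alphabet)
        for _ in range(n_chars)
    ]
    return "".join(chars)


def frequency_spectrum(text):
    """Return (spectrum, vocabulary_size) for a space-delimited text.

    spectrum[f] is the number of distinct words occurring exactly f times.
    str.split() with no argument discards empty words (an assumption here).
    """
    word_counts = Counter(text.split())
    spectrum = Counter(word_counts.values())
    return spectrum, len(word_counts)


text = intermittent_silence_text(10_000)
spectrum, vocabulary_size = frequency_spectrum(text)
print("vocabulary size:", vocabulary_size)
print("words occurring once:", spectrum[1])
```

Averaging the spectrum over many seeds would give a Monte Carlo estimate of the expected values that the paper computes exactly.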

Similar documents (author)

  1. Sapena, A. Ferrer- => Ferrer-Sapena, A.: 5.04
    5.0379567 = sum of:
      5.0379567 = weight(author_txt:ferrer in 5771) [ClassicSimilarity], result of:
        5.0379567 = fieldWeight in 5771, product of:
          1.4142135 = tf(freq=2.0), with freq of:
            2.0 = termFreq=2.0
          9.499662 = idf(docFreq=8, maxDocs=44218)
          0.375 = fieldNorm(doc=5771)
    
  2. Miro, A.B.; Sahun, X.B.; Ferrer, M.E.: La Library of Congress Classification à la Biblioteca de la Universitat Pompeu Fabra (1993) 3.56
    3.5623734 = sum of:
      3.5623734 = weight(author_txt:ferrer in 7090) [ClassicSimilarity], result of:
        3.5623734 = fieldWeight in 7090, product of:
          1.0 = tf(freq=1.0), with freq of:
            1.0 = termFreq=1.0
          9.499662 = idf(docFreq=8, maxDocs=44218)
          0.375 = fieldNorm(doc=7090)
    
  3. Ferrer Morillo, L.M.; Portillo de Hernández, R.: Tesauros transdisciplinarios : del reduccionismo científico a la unidad del conocimiento (2007) 3.56
    3.5623734 = sum of:
      3.5623734 = weight(author_txt:ferrer in 1107) [ClassicSimilarity], result of:
        3.5623734 = fieldWeight in 1107, product of:
          1.0 = tf(freq=1.0), with freq of:
            1.0 = termFreq=1.0
          9.499662 = idf(docFreq=8, maxDocs=44218)
          0.375 = fieldNorm(doc=1107)
    
  4. Ferrer-i-Cancho, R.; Vitevitch, M.S.: The origins of Zipf's meaning-frequency law (2018) 3.56
    3.5623734 = sum of:
      3.5623734 = weight(author_txt:ferrer in 4546) [ClassicSimilarity], result of:
        3.5623734 = fieldWeight in 4546, product of:
          1.0 = tf(freq=1.0), with freq of:
            1.0 = termFreq=1.0
          9.499662 = idf(docFreq=8, maxDocs=44218)
          0.375 = fieldNorm(doc=4546)
    
  5. Cole, T.W.; Mischo, W.H.; Habing, T.G.; Ferrer, R.H.: Using XML and XSLT to process and render online journals (2001) 2.97
    2.9686446 = sum of:
      2.9686446 = weight(author_txt:ferrer in 4802) [ClassicSimilarity], result of:
        2.9686446 = fieldWeight in 4802, product of:
          1.0 = tf(freq=1.0), with freq of:
            1.0 = termFreq=1.0
          9.499662 = idf(docFreq=8, maxDocs=44218)
          0.3125 = fieldNorm(doc=4802)
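
The author scores listed above can be checked by hand. In each explain tree, fieldWeight is the product of tf, idf, and fieldNorm, where tf is the square root of the term frequency. The idf formula used below, 1 + ln(maxDocs / (docFreq + 1)), is inferred here because it reproduces the listed idf values; it is not stated in the record itself. A minimal sketch, with hypothetical function names:

```python
import math


def classic_idf(doc_freq, max_docs):
    """idf inferred from the listed values: 1 + ln(maxDocs / (docFreq + 1))."""
    return 1.0 + math.log(max_docs / (doc_freq + 1))


def field_weight(freq, doc_freq, max_docs, field_norm):
    """fieldWeight = tf * idf * fieldNorm, with tf = sqrt(freq)."""
    return math.sqrt(freq) * classic_idf(doc_freq, max_docs) * field_norm


# Entry 1 (doc 5771): freq=2.0, docFreq=8, maxDocs=44218, fieldNorm=0.375
entry_1 = field_weight(2.0, 8, 44218, 0.375)
# Entry 5 (doc 4802): freq=1.0, same idf, fieldNorm=0.3125
entry_5 = field_weight(1.0, 8, 44218, 0.3125)
print(round(entry_1, 4), round(entry_5, 4))
```

The results agree with the listed 5.0379567 and 2.9686446 up to single-precision rounding, which is presumably how the listed values were computed.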
    

Similar documents (content)

  1. Sun, Q.; Shaw, D.; Davis, C.H.: A model for estimating the occurrence of same-frequency words and the boundary between high- and low-frequency words in texts (1999) 0.11
    0.10736057 = sum of:
      0.10736057 = product of:
        0.6710036 = sum of:
          0.037961695 = weight(abstract_txt:text in 3063) [ClassicSimilarity], result of:
            0.037961695 = score(doc=3063,freq=2.0), product of:
              0.07080477 = queryWeight, product of:
                1.1895392 = boost
                4.0438666 = idf(docFreq=2106, maxDocs=44218)
                0.014719292 = queryNorm
              0.53614604 = fieldWeight in 3063, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                4.0438666 = idf(docFreq=2106, maxDocs=44218)
                0.09375 = fieldNorm(doc=3063)
          0.02865007 = weight(abstract_txt:number in 3063) [ClassicSimilarity], result of:
            0.02865007 = score(doc=3063,freq=1.0), product of:
              0.07394791 = queryWeight, product of:
                1.2156553 = boost
                4.132649 = idf(docFreq=1927, maxDocs=44218)
                0.014719292 = queryNorm
              0.38743585 = fieldWeight in 3063, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.132649 = idf(docFreq=1927, maxDocs=44218)
                0.09375 = fieldNorm(doc=3063)
          0.1525138 = weight(abstract_txt:words in 3063) [ClassicSimilarity], result of:
            0.1525138 = score(doc=3063,freq=6.0), product of:
              0.124069326 = queryWeight, product of:
                1.5746343 = boost
                5.353007 = idf(docFreq=568, maxDocs=44218)
                0.014719292 = queryNorm
              1.2292627 = fieldWeight in 3063, product of:
                2.4494898 = tf(freq=6.0), with freq of:
                  6.0 = termFreq=6.0
                5.353007 = idf(docFreq=568, maxDocs=44218)
                0.09375 = fieldNorm(doc=3063)
          0.451878 = weight(abstract_txt:frequency in 3063) [ClassicSimilarity], result of:
            0.451878 = score(doc=3063,freq=7.0), product of:
              0.30631402 = queryWeight, product of:
                3.499016 = boost
                5.947494 = idf(docFreq=313, maxDocs=44218)
                0.014719292 = queryNorm
              1.4752116 = fieldWeight in 3063, product of:
                2.6457512 = tf(freq=7.0), with freq of:
                  7.0 = termFreq=7.0
                5.947494 = idf(docFreq=313, maxDocs=44218)
                0.09375 = fieldNorm(doc=3063)
        0.16 = coord(4/25)
    
  2. Ferrer-i-Cancho, R.; Vitevitch, M.S.: The origins of Zipf's meaning-frequency law (2018) 0.09
    0.09449489 = sum of:
      0.09449489 = product of:
        0.47247443 = sum of:
          0.07383625 = weight(abstract_txt:frequencies in 4546) [ClassicSimilarity], result of:
            0.07383625 = score(doc=4546,freq=1.0), product of:
              0.12458508 = queryWeight, product of:
                1.1157465 = boost
                7.5860133 = idf(docFreq=60, maxDocs=44218)
                0.014719292 = queryNorm
              0.59265727 = fieldWeight in 4546, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                7.5860133 = idf(docFreq=60, maxDocs=44218)
                0.078125 = fieldNorm(doc=4546)
          0.03376443 = weight(abstract_txt:number in 4546) [ClassicSimilarity], result of:
            0.03376443 = score(doc=4546,freq=2.0), product of:
              0.07394791 = queryWeight, product of:
                1.2156553 = boost
                4.132649 = idf(docFreq=1927, maxDocs=44218)
                0.014719292 = queryNorm
              0.4565975 = fieldWeight in 4546, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                4.132649 = idf(docFreq=1927, maxDocs=44218)
                0.078125 = fieldNorm(doc=4546)
          0.073378235 = weight(abstract_txt:words in 4546) [ClassicSimilarity], result of:
            0.073378235 = score(doc=4546,freq=2.0), product of:
              0.124069326 = queryWeight, product of:
                1.5746343 = boost
                5.353007 = idf(docFreq=568, maxDocs=44218)
                0.014719292 = queryNorm
              0.5914293 = fieldWeight in 4546, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                5.353007 = idf(docFreq=568, maxDocs=44218)
                0.078125 = fieldNorm(doc=4546)
          0.044975843 = weight(abstract_txt:process in 4546) [ClassicSimilarity], result of:
            0.044975843 = score(doc=4546,freq=1.0), product of:
              0.14211038 = queryWeight, product of:
                2.3832817 = boost
                4.0510116 = idf(docFreq=2091, maxDocs=44218)
                0.014719292 = queryNorm
              0.3164853 = fieldWeight in 4546, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.0510116 = idf(docFreq=2091, maxDocs=44218)
                0.078125 = fieldNorm(doc=4546)
          0.24651965 = weight(abstract_txt:frequency in 4546) [ClassicSimilarity], result of:
            0.24651965 = score(doc=4546,freq=3.0), product of:
              0.30631402 = queryWeight, product of:
                3.499016 = boost
                5.947494 = idf(docFreq=313, maxDocs=44218)
                0.014719292 = queryNorm
              0.8047939 = fieldWeight in 4546, product of:
                1.7320508 = tf(freq=3.0), with freq of:
                  3.0 = termFreq=3.0
                5.947494 = idf(docFreq=313, maxDocs=44218)
                0.078125 = fieldNorm(doc=4546)
        0.2 = coord(5/25)
    
  3. Thelwall, M.: Text characteristics of English language university Web sites (2005) 0.08
    0.0844897 = sum of:
      0.0844897 = product of:
        0.5280607 = sum of:
          0.07383625 = weight(abstract_txt:frequencies in 3463) [ClassicSimilarity], result of:
            0.07383625 = score(doc=3463,freq=1.0), product of:
              0.12458508 = queryWeight, product of:
                1.1157465 = boost
                7.5860133 = idf(docFreq=60, maxDocs=44218)
                0.014719292 = queryNorm
              0.59265727 = fieldWeight in 3463, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                7.5860133 = idf(docFreq=60, maxDocs=44218)
                0.078125 = fieldNorm(doc=3463)
          0.10377249 = weight(abstract_txt:words in 3463) [ClassicSimilarity], result of:
            0.10377249 = score(doc=3463,freq=4.0), product of:
              0.124069326 = queryWeight, product of:
                1.5746343 = boost
                5.353007 = idf(docFreq=568, maxDocs=44218)
                0.014719292 = queryNorm
              0.8364073 = fieldWeight in 3463, product of:
                2.0 = tf(freq=4.0), with freq of:
                  4.0 = termFreq=4.0
                5.353007 = idf(docFreq=568, maxDocs=44218)
                0.078125 = fieldNorm(doc=3463)
          0.10393228 = weight(abstract_txt:size in 3463) [ClassicSimilarity], result of:
            0.10393228 = score(doc=3463,freq=1.0), product of:
              0.2256803 = queryWeight, product of:
                2.6009977 = boost
                5.8947687 = idf(docFreq=330, maxDocs=44218)
                0.014719292 = queryNorm
              0.4605288 = fieldWeight in 3463, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                5.8947687 = idf(docFreq=330, maxDocs=44218)
                0.078125 = fieldNorm(doc=3463)
          0.24651965 = weight(abstract_txt:frequency in 3463) [ClassicSimilarity], result of:
            0.24651965 = score(doc=3463,freq=3.0), product of:
              0.30631402 = queryWeight, product of:
                3.499016 = boost
                5.947494 = idf(docFreq=313, maxDocs=44218)
                0.014719292 = queryNorm
              0.8047939 = fieldWeight in 3463, product of:
                1.7320508 = tf(freq=3.0), with freq of:
                  3.0 = termFreq=3.0
                5.947494 = idf(docFreq=313, maxDocs=44218)
                0.078125 = fieldNorm(doc=3463)
        0.16 = coord(4/25)
    
  4. Colina, J.: Un algoritmo informetrico para la evaluacion de un vocabulario de busqueda (1995) 0.08
    0.08359514 = sum of:
      0.08359514 = product of:
        1.0449393 = sum of:
          0.10408024 = weight(abstract_txt:vocabulary in 6755) [ClassicSimilarity], result of:
            0.10408024 = score(doc=6755,freq=1.0), product of:
              0.124314494 = queryWeight, product of:
                1.5761893 = boost
                5.358293 = idf(docFreq=565, maxDocs=44218)
                0.014719292 = queryNorm
              0.8372333 = fieldWeight in 6755, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                5.358293 = idf(docFreq=565, maxDocs=44218)
                0.15625 = fieldNorm(doc=6755)
          0.9408591 = weight(abstract_txt:silence in 6755) [ClassicSimilarity], result of:
            0.9408591 = score(doc=6755,freq=1.0), product of:
              0.61752766 = queryWeight, product of:
                4.3025045 = boost
                9.7509775 = idf(docFreq=6, maxDocs=44218)
                0.014719292 = queryNorm
              1.5235902 = fieldWeight in 6755, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                9.7509775 = idf(docFreq=6, maxDocs=44218)
                0.15625 = fieldNorm(doc=6755)
        0.08 = coord(2/25)
    
  5. Lee, K.H.; Ng, M.K.M.; Lu, Q.: Text segmentation for Chinese spell checking (1999) 0.08
    0.07811608 = sum of:
      0.07811608 = product of:
        0.32548368 = sum of:
          0.042526834 = weight(abstract_txt:spaces in 3913) [ClassicSimilarity], result of:
            0.042526834 = score(doc=3913,freq=1.0), product of:
              0.10007716 = queryWeight, product of:
                6.7990475 = idf(docFreq=133, maxDocs=44218)
                0.014719292 = queryNorm
              0.42494047 = fieldWeight in 3913, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                6.7990475 = idf(docFreq=133, maxDocs=44218)
                0.0625 = fieldNorm(doc=3913)
          0.030995594 = weight(abstract_txt:text in 3913) [ClassicSimilarity], result of:
            0.030995594 = score(doc=3913,freq=3.0), product of:
              0.07080477 = queryWeight, product of:
                1.1895392 = boost
                4.0438666 = idf(docFreq=2106, maxDocs=44218)
                0.014719292 = queryNorm
              0.4377614 = fieldWeight in 3913, product of:
                1.7320508 = tf(freq=3.0), with freq of:
                  3.0 = termFreq=3.0
                4.0438666 = idf(docFreq=2106, maxDocs=44218)
                0.0625 = fieldNorm(doc=3913)
          0.019100046 = weight(abstract_txt:number in 3913) [ClassicSimilarity], result of:
            0.019100046 = score(doc=3913,freq=1.0), product of:
              0.07394791 = queryWeight, product of:
                1.2156553 = boost
                4.132649 = idf(docFreq=1927, maxDocs=44218)
                0.014719292 = queryNorm
              0.25829056 = fieldWeight in 3913, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.132649 = idf(docFreq=1927, maxDocs=44218)
                0.0625 = fieldNorm(doc=3913)
          0.08301799 = weight(abstract_txt:words in 3913) [ClassicSimilarity], result of:
            0.08301799 = score(doc=3913,freq=4.0), product of:
              0.124069326 = queryWeight, product of:
                1.5746343 = boost
                5.353007 = idf(docFreq=568, maxDocs=44218)
                0.014719292 = queryNorm
              0.66912585 = fieldWeight in 3913, product of:
                2.0 = tf(freq=4.0), with freq of:
                  4.0 = termFreq=4.0
                5.353007 = idf(docFreq=568, maxDocs=44218)
                0.0625 = fieldNorm(doc=3913)
          0.035980675 = weight(abstract_txt:process in 3913) [ClassicSimilarity], result of:
            0.035980675 = score(doc=3913,freq=1.0), product of:
              0.14211038 = queryWeight, product of:
                2.3832817 = boost
                4.0510116 = idf(docFreq=2091, maxDocs=44218)
                0.014719292 = queryNorm
              0.25318822 = fieldWeight in 3913, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.0510116 = idf(docFreq=2091, maxDocs=44218)
                0.0625 = fieldNorm(doc=3913)
          0.11386255 = weight(abstract_txt:frequency in 3913) [ClassicSimilarity], result of:
            0.11386255 = score(doc=3913,freq=1.0), product of:
              0.30631402 = queryWeight, product of:
                3.499016 = boost
                5.947494 = idf(docFreq=313, maxDocs=44218)
                0.014719292 = queryNorm
              0.37171838 = fieldWeight in 3913, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                5.947494 = idf(docFreq=313, maxDocs=44218)
                0.0625 = fieldNorm(doc=3913)
        0.24 = coord(6/25)
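
The content scores above combine a query-side and a document-side factor per matching term: queryWeight is boost times idf times queryNorm, fieldWeight is tf times idf times fieldNorm, and the sum over matching terms is multiplied by coord (matched terms divided by the 25 query terms). The sketch below recomputes entry 4 (doc 6755) from the numbers in its explain tree. The idf formula is inferred from the listed values rather than stated in the record, and the function and variable names are hypothetical.

```python
import math

MAX_DOCS = 44218
QUERY_NORM = 0.014719292  # queryNorm as listed in the explain output


def classic_idf(doc_freq, max_docs=MAX_DOCS):
    """idf inferred from the listed values: 1 + ln(maxDocs / (docFreq + 1))."""
    return 1.0 + math.log(max_docs / (doc_freq + 1))


def term_score(boost, doc_freq, freq, field_norm):
    """Score of one matching term: queryWeight * fieldWeight."""
    idf = classic_idf(doc_freq)
    query_weight = boost * idf * QUERY_NORM
    field_weight = math.sqrt(freq) * idf * field_norm
    return query_weight * field_weight


# Entry 4 (doc 6755): two of the 25 query terms match.
matches = [
    # (boost, docFreq, freq, fieldNorm)
    (1.5761893, 565, 1.0, 0.15625),  # abstract_txt:vocabulary
    (4.3025045, 6, 1.0, 0.15625),    # abstract_txt:silence
]
coord = len(matches) / 25
score = coord * sum(term_score(*m) for m in matches)
print(round(score, 4))
```

The result agrees with the listed 0.08359514 up to single-precision rounding. For a term with no boost line, such as abstract_txt:spaces in entry 5, a boost of 1.0 reproduces the listed queryWeight.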