Document (#33999)

Author
Leroy, G.
Miller, T.
Rosemblat, G.
Browne, A.
Title
¬A balanced approach to health information evaluation : a vocabulary-based naïve Bayes classifier and readability formulas
Source
Journal of the American Society for Information Science and Technology. 59(2008) no.9, pp. 1409-1419
Year
2008
Abstract
Since millions seek health information online, it is vital for this information to be comprehensible. Most studies use readability formulas, which ignore vocabulary, and conclude that online health information is too difficult. We developed a vocabulary-based, naïve Bayes classifier to distinguish between three difficulty levels in text. It proved 98% accurate in a 250-document evaluation. We compared our classifier with readability formulas for 90 new documents with different origins and asked representative human evaluators, an expert and a consumer, to judge each document. Average readability grade levels for educational and commercial pages were 10th grade or higher, too difficult according to current literature. In contrast, the classifier showed that 70-90% of these pages were written at an intermediate, appropriate level, indicating that vocabulary usage is frequently appropriate in text considered too difficult by readability formula evaluations. The expert considered the pages more difficult for a consumer than the consumer did.
Theme
Automatic classification
Field
Medicine

Similar documents (author)

  1. Rosemblat, G.; Graham, L.: Cross-language search in a monolingual health information system : flexible designs and lexical processes (2006) 0.72
    0.7232668 = sum of:
      0.7232668 = product of:
        2.8930671 = sum of:
          2.8930671 = weight(author_txt:rosemblat in 241) [ClassicSimilarity], result of:
            2.8930671 = score(doc=241,freq=1.0), product of:
              0.58415544 = queryWeight, product of:
                1.388724 = boost
                9.905128 = idf(docFreq=5, maxDocs=44218)
                0.04246708 = queryNorm
              4.952564 = fieldWeight in 241, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                9.905128 = idf(docFreq=5, maxDocs=44218)
                0.5 = fieldNorm(doc=241)
        0.25 = coord(1/4)
    
  2. Browne, G.: Scope notes for LISA subject headings (1992) 0.71
    0.708463 = sum of:
      0.708463 = product of:
        2.833852 = sum of:
          2.833852 = weight(author_txt:browne in 1430) [ClassicSimilarity], result of:
            2.833852 = score(doc=1430,freq=1.0), product of:
              0.49651706 = queryWeight, product of:
                1.2803209 = boost
                9.131938 = idf(docFreq=12, maxDocs=44218)
                0.04246708 = queryNorm
              5.7074614 = fieldWeight in 1430, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                9.131938 = idf(docFreq=12, maxDocs=44218)
                0.625 = fieldNorm(doc=1430)
        0.25 = coord(1/4)
    
  3. Browne, G.: Professional liability of indexers (1996) 0.71
    0.708463 = sum of:
      0.708463 = product of:
        2.833852 = sum of:
          2.833852 = weight(author_txt:browne in 3643) [ClassicSimilarity], result of:
            2.833852 = score(doc=3643,freq=1.0), product of:
              0.49651706 = queryWeight, product of:
                1.2803209 = boost
                9.131938 = idf(docFreq=12, maxDocs=44218)
                0.04246708 = queryNorm
              5.7074614 = fieldWeight in 3643, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                9.131938 = idf(docFreq=12, maxDocs=44218)
                0.625 = fieldNorm(doc=3643)
        0.25 = coord(1/4)
    
  4. Browne, G.: ¬The definite article : acknowledging The in index entries (2001) 0.71
    0.708463 = sum of:
      0.708463 = product of:
        2.833852 = sum of:
          2.833852 = weight(author_txt:browne in 12) [ClassicSimilarity], result of:
            2.833852 = score(doc=12,freq=1.0), product of:
              0.49651706 = queryWeight, product of:
                1.2803209 = boost
                9.131938 = idf(docFreq=12, maxDocs=44218)
                0.04246708 = queryNorm
              5.7074614 = fieldWeight in 12, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                9.131938 = idf(docFreq=12, maxDocs=44218)
                0.625 = fieldNorm(doc=12)
        0.25 = coord(1/4)
    
  5. Browne, G.: Changes in website indexing (2007) 0.71
    0.708463 = sum of:
      0.708463 = product of:
        2.833852 = sum of:
          2.833852 = weight(author_txt:browne in 747) [ClassicSimilarity], result of:
            2.833852 = score(doc=747,freq=1.0), product of:
              0.49651706 = queryWeight, product of:
                1.2803209 = boost
                9.131938 = idf(docFreq=12, maxDocs=44218)
                0.04246708 = queryNorm
              5.7074614 = fieldWeight in 747, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                9.131938 = idf(docFreq=12, maxDocs=44218)
                0.625 = fieldNorm(doc=747)
        0.25 = coord(1/4)
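
The author-match trees above all follow the same arithmetic, which can be sketched in a few lines. This is an illustrative reconstruction, not the catalog's own code: the function names are hypothetical, and the idf and tf formulas are the ones Lucene's ClassicSimilarity is documented to use (idf = 1 + ln(maxDocs / (docFreq + 1)), tf = sqrt(freq)). The input values are copied from the first match (Rosemblat, doc 241).

```python
import math


def classic_idf(doc_freq: int, max_docs: int) -> float:
    """Inverse document frequency: 1 + ln(maxDocs / (docFreq + 1))."""
    return 1.0 + math.log(max_docs / (doc_freq + 1))


def classic_tf(freq: float) -> float:
    """Term-frequency factor: square root of the raw frequency."""
    return math.sqrt(freq)


def term_score(boost, doc_freq, max_docs, query_norm, freq, field_norm):
    """Single-term score = queryWeight * fieldWeight (hypothetical helper)."""
    idf = classic_idf(doc_freq, max_docs)
    query_weight = boost * idf * query_norm             # boost * idf * queryNorm
    field_weight = classic_tf(freq) * idf * field_norm  # tf * idf * fieldNorm
    return query_weight * field_weight


# Values taken from the tree for doc 241 (author_txt:rosemblat).
raw = term_score(boost=1.388724, doc_freq=5, max_docs=44218,
                 query_norm=0.04246708, freq=1.0, field_norm=0.5)

# coord(1/4): one of the four author query terms matched this document.
final_score = raw * (1 / 4)
print(f"raw={raw:.4f} final={final_score:.4f}")
```

Under these assumptions the result agrees with the listed 0.7232668 to about four decimal places. Small differences in later digits are expected because Lucene computes in single precision.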
    

Similar documents (content)

  1. Collins-Thompson, K.; Callan, J.: Predicting reading difficulty with statistical language models (2005) 0.28
    0.27906615 = sum of:
      0.27906615 = product of:
        0.87208176 = sum of:
          0.02361966 = weight(abstract_txt:text in 4579) [ClassicSimilarity], result of:
            0.02361966 = score(doc=4579,freq=3.0), product of:
              0.053955555 = queryWeight, product of:
                1.0334855 = boost
                4.0438666 = idf(docFreq=2106, maxDocs=44218)
                0.012910258 = queryNorm
              0.4377614 = fieldWeight in 4579, product of:
                1.7320508 = tf(freq=3.0), with freq of:
                  3.0 = termFreq=3.0
                4.0438666 = idf(docFreq=2106, maxDocs=44218)
                0.0625 = fieldNorm(doc=4579)
          0.016311215 = weight(abstract_txt:document in 4579) [ClassicSimilarity], result of:
            0.016311215 = score(doc=4579,freq=1.0), product of:
              0.060797404 = queryWeight, product of:
                1.0970562 = boost
                4.2926083 = idf(docFreq=1642, maxDocs=44218)
                0.012910258 = queryNorm
              0.26828802 = fieldWeight in 4579, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.2926083 = idf(docFreq=1642, maxDocs=44218)
                0.0625 = fieldNorm(doc=4579)
          0.0058520497 = weight(abstract_txt:information in 4579) [ClassicSimilarity], result of:
            0.0058520497 = score(doc=4579,freq=1.0), product of:
              0.038676113 = queryWeight, product of:
                1.2374359 = boost
                2.4209464 = idf(docFreq=10677, maxDocs=44218)
                0.012910258 = queryNorm
              0.15130915 = fieldWeight in 4579, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                2.4209464 = idf(docFreq=10677, maxDocs=44218)
                0.0625 = fieldNorm(doc=4579)
          0.041478954 = weight(abstract_txt:levels in 4579) [ClassicSimilarity], result of:
            0.041478954 = score(doc=4579,freq=2.0), product of:
              0.089902 = queryWeight, product of:
                1.3340466 = boost
                5.219915 = idf(docFreq=649, maxDocs=44218)
                0.012910258 = queryNorm
              0.46137965 = fieldWeight in 4579, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                5.219915 = idf(docFreq=649, maxDocs=44218)
                0.0625 = fieldNorm(doc=4579)
          0.13976704 = weight(abstract_txt:grade in 4579) [ClassicSimilarity], result of:
            0.13976704 = score(doc=4579,freq=2.0), product of:
              0.20206325 = queryWeight, product of:
                2.0 = boost
                7.825686 = idf(docFreq=47, maxDocs=44218)
                0.012910258 = queryNorm
              0.69169945 = fieldWeight in 4579, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                7.825686 = idf(docFreq=47, maxDocs=44218)
                0.0625 = fieldNorm(doc=4579)
          0.077053055 = weight(abstract_txt:pages in 4579) [ClassicSimilarity], result of:
            0.077053055 = score(doc=4579,freq=2.0), product of:
              0.15551579 = queryWeight, product of:
                2.1489148 = boost
                5.6055775 = idf(docFreq=441, maxDocs=44218)
                0.012910258 = queryNorm
              0.49546772 = fieldWeight in 4579, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                5.6055775 = idf(docFreq=441, maxDocs=44218)
                0.0625 = fieldNorm(doc=4579)
          0.1566842 = weight(abstract_txt:classifier in 4579) [ClassicSimilarity], result of:
            0.1566842 = score(doc=4579,freq=1.0), product of:
              0.346142 = queryWeight, product of:
                3.7019322 = boost
                7.24254 = idf(docFreq=85, maxDocs=44218)
                0.012910258 = queryNorm
              0.45265874 = fieldWeight in 4579, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                7.24254 = idf(docFreq=85, maxDocs=44218)
                0.0625 = fieldNorm(doc=4579)
          0.41131553 = weight(abstract_txt:readability in 4579) [ClassicSimilarity], result of:
            0.41131553 = score(doc=4579,freq=2.0), product of:
              0.56318045 = queryWeight, product of:
                5.279346 = boost
                8.2629 = idf(docFreq=30, maxDocs=44218)
                0.012910258 = queryNorm
              0.7303441 = fieldWeight in 4579, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                8.2629 = idf(docFreq=30, maxDocs=44218)
                0.0625 = fieldNorm(doc=4579)
        0.32 = coord(8/25)
    
  2. Denning, J.; Pera, M.S.; Ng, Y.-K.: ¬A readability level prediction tool for K-12 books (2016) 0.24
    0.24223459 = sum of:
      0.24223459 = product of:
        1.2111729 = sum of:
          0.013636817 = weight(abstract_txt:text in 2772) [ClassicSimilarity], result of:
            0.013636817 = score(doc=2772,freq=1.0), product of:
              0.053955555 = queryWeight, product of:
                1.0334855 = boost
                4.0438666 = idf(docFreq=2106, maxDocs=44218)
                0.012910258 = queryNorm
              0.25274166 = fieldWeight in 2772, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.0438666 = idf(docFreq=2106, maxDocs=44218)
                0.0625 = fieldNorm(doc=2772)
          0.041478954 = weight(abstract_txt:levels in 2772) [ClassicSimilarity], result of:
            0.041478954 = score(doc=2772,freq=2.0), product of:
              0.089902 = queryWeight, product of:
                1.3340466 = boost
                5.219915 = idf(docFreq=649, maxDocs=44218)
                0.012910258 = queryNorm
              0.46137965 = fieldWeight in 2772, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                5.219915 = idf(docFreq=649, maxDocs=44218)
                0.0625 = fieldNorm(doc=2772)
          0.13976704 = weight(abstract_txt:grade in 2772) [ClassicSimilarity], result of:
            0.13976704 = score(doc=2772,freq=2.0), product of:
              0.20206325 = queryWeight, product of:
                2.0 = boost
                7.825686 = idf(docFreq=47, maxDocs=44218)
                0.012910258 = queryNorm
              0.69169945 = fieldWeight in 2772, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                7.825686 = idf(docFreq=47, maxDocs=44218)
                0.0625 = fieldNorm(doc=2772)
          0.24678929 = weight(abstract_txt:formulas in 2772) [ClassicSimilarity], result of:
            0.24678929 = score(doc=2772,freq=2.0), product of:
              0.33790824 = queryWeight, product of:
                3.1676073 = boost
                8.2629 = idf(docFreq=30, maxDocs=44218)
                0.012910258 = queryNorm
              0.7303441 = fieldWeight in 2772, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                8.2629 = idf(docFreq=30, maxDocs=44218)
                0.0625 = fieldNorm(doc=2772)
          0.76950085 = weight(abstract_txt:readability in 2772) [ClassicSimilarity], result of:
            0.76950085 = score(doc=2772,freq=7.0), product of:
              0.56318045 = queryWeight, product of:
                5.279346 = boost
                8.2629 = idf(docFreq=30, maxDocs=44218)
                0.012910258 = queryNorm
              1.3663486 = fieldWeight in 2772, product of:
                2.6457512 = tf(freq=7.0), with freq of:
                  7.0 = termFreq=7.0
                8.2629 = idf(docFreq=30, maxDocs=44218)
                0.0625 = fieldNorm(doc=2772)
        0.2 = coord(5/25)
    
  3. Mengle, S.S.R.; Goharian, N.: Ambiguity measure feature-selection algorithm (2009) 0.20
    0.19759764 = sum of:
      0.19759764 = product of:
        0.8233235 = sum of:
          0.033403244 = weight(abstract_txt:text in 2804) [ClassicSimilarity], result of:
            0.033403244 = score(doc=2804,freq=6.0), product of:
              0.053955555 = queryWeight, product of:
                1.0334855 = boost
                4.0438666 = idf(docFreq=2106, maxDocs=44218)
                0.012910258 = queryNorm
              0.6190881 = fieldWeight in 2804, product of:
                2.4494898 = tf(freq=6.0), with freq of:
                  6.0 = termFreq=6.0
                4.0438666 = idf(docFreq=2106, maxDocs=44218)
                0.0625 = fieldNorm(doc=2804)
          0.023067541 = weight(abstract_txt:document in 2804) [ClassicSimilarity], result of:
            0.023067541 = score(doc=2804,freq=2.0), product of:
              0.060797404 = queryWeight, product of:
                1.0970562 = boost
                4.2926083 = idf(docFreq=1642, maxDocs=44218)
                0.012910258 = queryNorm
              0.37941656 = fieldWeight in 2804, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                4.2926083 = idf(docFreq=1642, maxDocs=44218)
                0.0625 = fieldNorm(doc=2804)
          0.1685422 = weight(abstract_txt:naïve in 2804) [ClassicSimilarity], result of:
            0.1685422 = score(doc=2804,freq=2.0), product of:
              0.22892328 = queryWeight, product of:
                2.1287825 = boost
                8.329592 = idf(docFreq=28, maxDocs=44218)
                0.012910258 = queryNorm
              0.73623884 = fieldWeight in 2804, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                8.329592 = idf(docFreq=28, maxDocs=44218)
                0.0625 = fieldNorm(doc=2804)
          0.18029258 = weight(abstract_txt:bayes in 2804) [ClassicSimilarity], result of:
            0.18029258 = score(doc=2804,freq=2.0), product of:
              0.23944332 = queryWeight, product of:
                2.1771467 = boost
                8.518833 = idf(docFreq=23, maxDocs=44218)
                0.012910258 = queryNorm
              0.75296557 = fieldWeight in 2804, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                8.518833 = idf(docFreq=23, maxDocs=44218)
                0.0625 = fieldNorm(doc=2804)
          0.06766139 = weight(abstract_txt:difficult in 2804) [ClassicSimilarity], result of:
            0.06766139 = score(doc=2804,freq=1.0), product of:
              0.1977568 = queryWeight, product of:
                2.7981246 = boost
                5.474311 = idf(docFreq=503, maxDocs=44218)
                0.012910258 = queryNorm
              0.34214443 = fieldWeight in 2804, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                5.474311 = idf(docFreq=503, maxDocs=44218)
                0.0625 = fieldNorm(doc=2804)
          0.35035652 = weight(abstract_txt:classifier in 2804) [ClassicSimilarity], result of:
            0.35035652 = score(doc=2804,freq=5.0), product of:
              0.346142 = queryWeight, product of:
                3.7019322 = boost
                7.24254 = idf(docFreq=85, maxDocs=44218)
                0.012910258 = queryNorm
              1.0121757 = fieldWeight in 2804, product of:
                2.236068 = tf(freq=5.0), with freq of:
                  5.0 = termFreq=5.0
                7.24254 = idf(docFreq=85, maxDocs=44218)
                0.0625 = fieldNorm(doc=2804)
        0.24 = coord(6/25)
    
  4. Azpiazu, I.M.; Soledad Pera, M.: Is cross-lingual readability assessment possible? (2020) 0.11
    0.11447615 = sum of:
      0.11447615 = product of:
        0.715476 = sum of:
          0.007241542 = weight(abstract_txt:information in 5868) [ClassicSimilarity], result of:
            0.007241542 = score(doc=5868,freq=2.0), product of:
              0.038676113 = queryWeight, product of:
                1.2374359 = boost
                2.4209464 = idf(docFreq=10677, maxDocs=44218)
                0.012910258 = queryNorm
              0.18723552 = fieldWeight in 5868, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                2.4209464 = idf(docFreq=10677, maxDocs=44218)
                0.0546875 = fieldNorm(doc=5868)
          0.025663793 = weight(abstract_txt:levels in 5868) [ClassicSimilarity], result of:
            0.025663793 = score(doc=5868,freq=1.0), product of:
              0.089902 = queryWeight, product of:
                1.3340466 = boost
                5.219915 = idf(docFreq=649, maxDocs=44218)
                0.012910258 = queryNorm
              0.2854641 = fieldWeight in 5868, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                5.219915 = idf(docFreq=649, maxDocs=44218)
                0.0546875 = fieldNorm(doc=5868)
          0.05920371 = weight(abstract_txt:difficult in 5868) [ClassicSimilarity], result of:
            0.05920371 = score(doc=5868,freq=1.0), product of:
              0.1977568 = queryWeight, product of:
                2.7981246 = boost
                5.474311 = idf(docFreq=503, maxDocs=44218)
                0.012910258 = queryNorm
              0.29937637 = fieldWeight in 5868, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                5.474311 = idf(docFreq=503, maxDocs=44218)
                0.0546875 = fieldNorm(doc=5868)
          0.62336695 = weight(abstract_txt:readability in 5868) [ClassicSimilarity], result of:
            0.62336695 = score(doc=5868,freq=6.0), product of:
              0.56318045 = queryWeight, product of:
                5.279346 = boost
                8.2629 = idf(docFreq=30, maxDocs=44218)
                0.012910258 = queryNorm
              1.106869 = fieldWeight in 5868, product of:
                2.4494898 = tf(freq=6.0), with freq of:
                  6.0 = termFreq=6.0
                8.2629 = idf(docFreq=30, maxDocs=44218)
                0.0546875 = fieldNorm(doc=5868)
        0.16 = coord(4/25)
    
  5. Lantz, C.: Evaluating the readability of instructional visuals (1996) 0.10
    0.1019728 = sum of:
      0.1019728 = product of:
        0.8497734 = sum of:
          0.029524574 = weight(abstract_txt:text in 7480) [ClassicSimilarity], result of:
            0.029524574 = score(doc=7480,freq=3.0), product of:
              0.053955555 = queryWeight, product of:
                1.0334855 = boost
                4.0438666 = idf(docFreq=2106, maxDocs=44218)
                0.012910258 = queryNorm
              0.54720175 = fieldWeight in 7480, product of:
                1.7320508 = tf(freq=3.0), with freq of:
                  3.0 = termFreq=3.0
                4.0438666 = idf(docFreq=2106, maxDocs=44218)
                0.078125 = fieldNorm(doc=7480)
          0.007315062 = weight(abstract_txt:information in 7480) [ClassicSimilarity], result of:
            0.007315062 = score(doc=7480,freq=1.0), product of:
              0.038676113 = queryWeight, product of:
                1.2374359 = boost
                2.4209464 = idf(docFreq=10677, maxDocs=44218)
                0.012910258 = queryNorm
              0.18913643 = fieldWeight in 7480, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                2.4209464 = idf(docFreq=10677, maxDocs=44218)
                0.078125 = fieldNorm(doc=7480)
          0.81293374 = weight(abstract_txt:readability in 7480) [ClassicSimilarity], result of:
            0.81293374 = score(doc=7480,freq=5.0), product of:
              0.56318045 = queryWeight, product of:
                5.279346 = boost
                8.2629 = idf(docFreq=30, maxDocs=44218)
                0.012910258 = queryNorm
              1.4434694 = fieldWeight in 7480, product of:
                2.236068 = tf(freq=5.0), with freq of:
                  5.0 = termFreq=5.0
                8.2629 = idf(docFreq=30, maxDocs=44218)
                0.078125 = fieldNorm(doc=7480)
        0.12 = coord(3/25)
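
For the content-based matches, the per-term weights are summed and then scaled by the coord factor (matched query terms divided by total query terms). The sketch below recomputes the first content match (Collins-Thompson and Callan, doc 4579) from the boosts, document frequencies, and term frequencies shown in its tree. The names `TERMS` and `content_score` are hypothetical, and the formulas assume Lucene's ClassicSimilarity definitions rather than anything stated by the catalog itself.

```python
import math

MAX_DOCS = 44218
QUERY_NORM = 0.012910258
FIELD_NORM = 0.0625        # fieldNorm(doc=4579)
TOTAL_QUERY_TERMS = 25     # denominator of coord(8/25)

# (term, boost, docFreq, termFreq) copied from the tree for doc 4579.
TERMS = [
    ("text",        1.0334855,  2106, 3.0),
    ("document",    1.0970562,  1642, 1.0),
    ("information", 1.2374359, 10677, 1.0),
    ("levels",      1.3340466,   649, 2.0),
    ("grade",       2.0,          47, 2.0),
    ("pages",       2.1489148,   441, 2.0),
    ("classifier",  3.7019322,    85, 1.0),
    ("readability", 5.279346,     30, 2.0),
]


def content_score(terms, total_query_terms):
    """Sum the per-term weights, then multiply by coord (hypothetical helper)."""
    total = 0.0
    for _term, boost, doc_freq, freq in terms:
        idf = 1.0 + math.log(MAX_DOCS / (doc_freq + 1))
        query_weight = boost * idf * QUERY_NORM
        field_weight = math.sqrt(freq) * idf * FIELD_NORM
        total += query_weight * field_weight
    coord = len(terms) / total_query_terms   # 8 matched terms out of 25
    return total * coord


score = content_score(TERMS, TOTAL_QUERY_TERMS)
print(f"{score:.4f}")
```

The coord factor explains why a document matching only a few rare terms strongly (such as the fifth match, with 3 of 25 terms) can still rank below one that matches more terms with lower individual weights.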