Document (#31854)

Kok, Y.H.
Holaday, D.A.
Goh, A.
Using cluster analysis to determine the media agenda
Aslib proceedings. 51(1999) no.10, pp.361-371
This paper describes a software tool that aids researchers in the study of agenda setting. Agenda-setting theory claims that the mass media influence what the public thinks and talks about. The tool clusters documents into topically coherent groupings intended to represent the issues dominating press coverage. The documents are taken from the archives of online newspapers. In addition, the tool enables results to be visualised and displayed. Three clustering methods were investigated, of which the Group-Average-Linkage algorithm was chosen for the final testing. The choice of clustering algorithm was based predominantly on the quality of the clusters produced. Comparisons between the computer-based results and a method involving human readers revealed comparable findings and indicated the potential usefulness of the software.
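
The abstract names group-average linkage but gives no implementation details, so the following is only a minimal sketch of that general technique, not the authors' tool. Everything in it is an assumption made for illustration: whitespace tokenisation, raw term-frequency vectors, cosine similarity, a stopping threshold of 0.2, and the function names. "Group average" is taken here to mean the mean similarity over all cross-cluster document pairs.

```python
import math
from collections import Counter
from itertools import combinations


def cosine(a, b):
    """Cosine similarity between two term-frequency Counters."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def group_average(c1, c2, vectors):
    """Mean pairwise similarity between the documents of two clusters."""
    total = sum(cosine(vectors[i], vectors[j]) for i in c1 for j in c2)
    return total / (len(c1) * len(c2))


def cluster(texts, threshold=0.2):
    """Agglomerative clustering; the threshold value is a hypothetical choice."""
    vectors = [Counter(t.lower().split()) for t in texts]
    clusters = [[i] for i in range(len(texts))]  # start with singletons
    while len(clusters) > 1:
        # Find the pair of clusters with the highest group-average similarity.
        (a, b), best = max(
            (
                ((x, y), group_average(clusters[x], clusters[y], vectors))
                for x, y in combinations(range(len(clusters)), 2)
            ),
            key=lambda pair: pair[1],
        )
        if best < threshold:
            break  # no remaining pair is similar enough to merge
        clusters[a] = clusters[a] + clusters[b]
        del clusters[b]  # safe because a < b
    return clusters


# Made-up headlines, used only to exercise the sketch.
texts = [
    "election vote results",
    "vote election campaign",
    "football match score",
    "football score league",
]
print(cluster(texts))
```

Each returned cluster is a list of indexes into the input list. A real system would normally add stop-word removal and term weighting before computing similarities.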

Similar documents (content)

  1. Kishida, K.: High-speed rough clustering for very large document collections (2010) 0.16
    0.16462953 = sum of:
      0.16462953 = product of:
        0.5879626 = sum of:
          0.021551097 = weight(abstract_txt:results in 3463) [ClassicSimilarity], result of:
            0.021551097 = score(doc=3463,freq=2.0), product of:
              0.0700153 = queryWeight, product of:
                1.0163649 = boost
                3.482422 = idf(docFreq=3693, maxDocs=44218)
                0.01978162 = queryNorm
              0.30780554 = fieldWeight in 3463, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                3.482422 = idf(docFreq=3693, maxDocs=44218)
                0.0625 = fieldNorm(doc=3463)
          0.010182931 = weight(abstract_txt:that in 3463) [ClassicSimilarity], result of:
            0.010182931 = score(doc=3463,freq=2.0), product of:
              0.048621196 = queryWeight, product of:
                1.0373174 = boost
                2.3694751 = idf(docFreq=11241, maxDocs=44218)
                0.01978162 = queryNorm
              0.20943399 = fieldWeight in 3463, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                2.3694751 = idf(docFreq=11241, maxDocs=44218)
                0.0625 = fieldNorm(doc=3463)
          0.04374962 = weight(abstract_txt:documents in 3463) [ClassicSimilarity], result of:
            0.04374962 = score(doc=3463,freq=3.0), product of:
              0.098061636 = queryWeight, product of:
                1.2028259 = boost
                4.1213026 = idf(docFreq=1949, maxDocs=44218)
                0.01978162 = queryNorm
              0.44614407 = fieldWeight in 3463, product of:
                1.7320508 = tf(freq=3.0), with freq of:
                  3.0 = termFreq=3.0
                4.1213026 = idf(docFreq=1949, maxDocs=44218)
                0.0625 = fieldNorm(doc=3463)
          0.11607382 = weight(abstract_txt:algorithm in 3463) [ClassicSimilarity], result of:
            0.11607382 = score(doc=3463,freq=3.0), product of:
              0.18793419 = queryWeight, product of:
                1.6651605 = boost
                5.705423 = idf(docFreq=399, maxDocs=44218)
                0.01978162 = queryNorm
              0.6176301 = fieldWeight in 3463, product of:
                1.7320508 = tf(freq=3.0), with freq of:
                  3.0 = termFreq=3.0
                5.705423 = idf(docFreq=399, maxDocs=44218)
                0.0625 = fieldNorm(doc=3463)
          0.22932123 = weight(abstract_txt:clustering in 3463) [ClassicSimilarity], result of:
            0.22932123 = score(doc=3463,freq=7.0), product of:
              0.22309346 = queryWeight, product of:
                1.8142478 = boost
                6.2162485 = idf(docFreq=239, maxDocs=44218)
                0.01978162 = queryNorm
              1.0279155 = fieldWeight in 3463, product of:
                2.6457512 = tf(freq=7.0), with freq of:
                  7.0 = termFreq=7.0
                6.2162485 = idf(docFreq=239, maxDocs=44218)
                0.0625 = fieldNorm(doc=3463)
          0.10137087 = weight(abstract_txt:cluster in 3463) [ClassicSimilarity], result of:
            0.10137087 = score(doc=3463,freq=1.0), product of:
              0.24764647 = queryWeight, product of:
                1.9114779 = boost
                6.5493927 = idf(docFreq=171, maxDocs=44218)
                0.01978162 = queryNorm
              0.40933704 = fieldWeight in 3463, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                6.5493927 = idf(docFreq=171, maxDocs=44218)
                0.0625 = fieldNorm(doc=3463)
          0.06571303 = weight(abstract_txt:tool in 3463) [ClassicSimilarity], result of:
            0.06571303 = score(doc=3463,freq=1.0), product of:
              0.21233493 = queryWeight, product of:
                2.1677518 = boost
                4.951651 = idf(docFreq=849, maxDocs=44218)
                0.01978162 = queryNorm
              0.3094782 = fieldWeight in 3463, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.951651 = idf(docFreq=849, maxDocs=44218)
                0.0625 = fieldNorm(doc=3463)
        0.28 = coord(7/25)
  2. Zamir, O.; Etzioni, O.: Grouper : a dynamic clustering interface to Web search results (1999) 0.14
    0.13909304 = sum of:
      0.13909304 = product of:
        0.5795544 = sum of:
          0.01904866 = weight(abstract_txt:results in 6207) [ClassicSimilarity], result of:
            0.01904866 = score(doc=6207,freq=1.0), product of:
              0.0700153 = queryWeight, product of:
                1.0163649 = boost
                3.482422 = idf(docFreq=3693, maxDocs=44218)
                0.01978162 = queryNorm
              0.27206424 = fieldWeight in 6207, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                3.482422 = idf(docFreq=3693, maxDocs=44218)
                0.078125 = fieldNorm(doc=6207)
          0.009000525 = weight(abstract_txt:that in 6207) [ClassicSimilarity], result of:
            0.009000525 = score(doc=6207,freq=1.0), product of:
              0.048621196 = queryWeight, product of:
                1.0373174 = boost
                2.3694751 = idf(docFreq=11241, maxDocs=44218)
                0.01978162 = queryNorm
              0.18511525 = fieldWeight in 6207, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                2.3694751 = idf(docFreq=11241, maxDocs=44218)
                0.078125 = fieldNorm(doc=6207)
          0.03157357 = weight(abstract_txt:documents in 6207) [ClassicSimilarity], result of:
            0.03157357 = score(doc=6207,freq=1.0), product of:
              0.098061636 = queryWeight, product of:
                1.2028259 = boost
                4.1213026 = idf(docFreq=1949, maxDocs=44218)
                0.01978162 = queryNorm
              0.32197678 = fieldWeight in 6207, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.1213026 = idf(docFreq=1949, maxDocs=44218)
                0.078125 = fieldNorm(doc=6207)
          0.08376907 = weight(abstract_txt:algorithm in 6207) [ClassicSimilarity], result of:
            0.08376907 = score(doc=6207,freq=1.0), product of:
              0.18793419 = queryWeight, product of:
                1.6651605 = boost
                5.705423 = idf(docFreq=399, maxDocs=44218)
                0.01978162 = queryNorm
              0.44573617 = fieldWeight in 6207, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                5.705423 = idf(docFreq=399, maxDocs=44218)
                0.078125 = fieldNorm(doc=6207)
          0.21668819 = weight(abstract_txt:clustering in 6207) [ClassicSimilarity], result of:
            0.21668819 = score(doc=6207,freq=4.0), product of:
              0.22309346 = queryWeight, product of:
                1.8142478 = boost
                6.2162485 = idf(docFreq=239, maxDocs=44218)
                0.01978162 = queryNorm
              0.9712888 = fieldWeight in 6207, product of:
                2.0 = tf(freq=4.0), with freq of:
                  4.0 = termFreq=4.0
                6.2162485 = idf(docFreq=239, maxDocs=44218)
                0.078125 = fieldNorm(doc=6207)
          0.21947436 = weight(abstract_txt:cluster in 6207) [ClassicSimilarity], result of:
            0.21947436 = score(doc=6207,freq=3.0), product of:
              0.24764647 = queryWeight, product of:
                1.9114779 = boost
                6.5493927 = idf(docFreq=171, maxDocs=44218)
                0.01978162 = queryNorm
              0.88624066 = fieldWeight in 6207, product of:
                1.7320508 = tf(freq=3.0), with freq of:
                  3.0 = termFreq=3.0
                6.5493927 = idf(docFreq=171, maxDocs=44218)
                0.078125 = fieldNorm(doc=6207)
        0.24 = coord(6/25)
  3. Hu, G.; Zhou, S.; Guan, J.; Hu, X.: Towards effective document clustering : a constrained K-means based approach (2008) 0.12
    0.120636284 = sum of:
      0.120636284 = product of:
        0.6031814 = sum of:
          0.010800629 = weight(abstract_txt:that in 2113) [ClassicSimilarity], result of:
            0.010800629 = score(doc=2113,freq=1.0), product of:
              0.048621196 = queryWeight, product of:
                1.0373174 = boost
                2.3694751 = idf(docFreq=11241, maxDocs=44218)
                0.01978162 = queryNorm
              0.22213829 = fieldWeight in 2113, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                2.3694751 = idf(docFreq=11241, maxDocs=44218)
                0.09375 = fieldNorm(doc=2113)
          0.053582124 = weight(abstract_txt:documents in 2113) [ClassicSimilarity], result of:
            0.053582124 = score(doc=2113,freq=2.0), product of:
              0.098061636 = queryWeight, product of:
                1.2028259 = boost
                4.1213026 = idf(docFreq=1949, maxDocs=44218)
                0.01978162 = queryNorm
              0.5464127 = fieldWeight in 2113, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                4.1213026 = idf(docFreq=1949, maxDocs=44218)
                0.09375 = fieldNorm(doc=2113)
          0.22518897 = weight(abstract_txt:clustering in 2113) [ClassicSimilarity], result of:
            0.22518897 = score(doc=2113,freq=3.0), product of:
              0.22309346 = queryWeight, product of:
                1.8142478 = boost
                6.2162485 = idf(docFreq=239, maxDocs=44218)
                0.01978162 = queryNorm
              1.009393 = fieldWeight in 2113, product of:
                1.7320508 = tf(freq=3.0), with freq of:
                  3.0 = termFreq=3.0
                6.2162485 = idf(docFreq=239, maxDocs=44218)
                0.09375 = fieldNorm(doc=2113)
          0.21504009 = weight(abstract_txt:cluster in 2113) [ClassicSimilarity], result of:
            0.21504009 = score(doc=2113,freq=2.0), product of:
              0.24764647 = queryWeight, product of:
                1.9114779 = boost
                6.5493927 = idf(docFreq=171, maxDocs=44218)
                0.01978162 = queryNorm
              0.86833495 = fieldWeight in 2113, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                6.5493927 = idf(docFreq=171, maxDocs=44218)
                0.09375 = fieldNorm(doc=2113)
          0.09856955 = weight(abstract_txt:tool in 2113) [ClassicSimilarity], result of:
            0.09856955 = score(doc=2113,freq=1.0), product of:
              0.21233493 = queryWeight, product of:
                2.1677518 = boost
                4.951651 = idf(docFreq=849, maxDocs=44218)
                0.01978162 = queryNorm
              0.4642173 = fieldWeight in 2113, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.951651 = idf(docFreq=849, maxDocs=44218)
                0.09375 = fieldNorm(doc=2113)
        0.2 = coord(5/25)
  4. Robertson, A.M.; Willett, P.: Use of genetic algorithms in information retrieval (1995) 0.10
    0.10197029 = sum of:
      0.10197029 = product of:
        0.50985146 = sum of:
          0.103770174 = weight(abstract_txt:comparable in 2418) [ClassicSimilarity], result of:
            0.103770174 = score(doc=2418,freq=1.0), product of:
              0.13747899 = queryWeight, product of:
                1.0070624 = boost
                6.901097 = idf(docFreq=120, maxDocs=44218)
                0.01978162 = queryNorm
              0.7548075 = fieldWeight in 2418, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                6.901097 = idf(docFreq=120, maxDocs=44218)
                0.109375 = fieldNorm(doc=2418)
          0.02666812 = weight(abstract_txt:results in 2418) [ClassicSimilarity], result of:
            0.02666812 = score(doc=2418,freq=1.0), product of:
              0.0700153 = queryWeight, product of:
                1.0163649 = boost
                3.482422 = idf(docFreq=3693, maxDocs=44218)
                0.01978162 = queryNorm
              0.38088992 = fieldWeight in 2418, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                3.482422 = idf(docFreq=3693, maxDocs=44218)
                0.109375 = fieldNorm(doc=2418)
          0.012600735 = weight(abstract_txt:that in 2418) [ClassicSimilarity], result of:
            0.012600735 = score(doc=2418,freq=1.0), product of:
              0.048621196 = queryWeight, product of:
                1.0373174 = boost
                2.3694751 = idf(docFreq=11241, maxDocs=44218)
                0.01978162 = queryNorm
              0.25916135 = fieldWeight in 2418, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                2.3694751 = idf(docFreq=11241, maxDocs=44218)
                0.109375 = fieldNorm(doc=2418)
          0.16368325 = weight(abstract_txt:groupings in 2418) [ClassicSimilarity], result of:
            0.16368325 = score(doc=2418,freq=1.0), product of:
              0.18629055 = queryWeight, product of:
                1.1722862 = boost
                8.033325 = idf(docFreq=38, maxDocs=44218)
                0.01978162 = queryNorm
              0.87864494 = fieldWeight in 2418, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                8.033325 = idf(docFreq=38, maxDocs=44218)
                0.109375 = fieldNorm(doc=2418)
          0.20312919 = weight(abstract_txt:algorithm in 2418) [ClassicSimilarity], result of:
            0.20312919 = score(doc=2418,freq=3.0), product of:
              0.18793419 = queryWeight, product of:
                1.6651605 = boost
                5.705423 = idf(docFreq=399, maxDocs=44218)
                0.01978162 = queryNorm
              1.0808527 = fieldWeight in 2418, product of:
                1.7320508 = tf(freq=3.0), with freq of:
                  3.0 = termFreq=3.0
                5.705423 = idf(docFreq=399, maxDocs=44218)
                0.109375 = fieldNorm(doc=2418)
        0.2 = coord(5/25)
  5. Wilkinson, D.; Thelwall, M.: Trending Twitter topics in English : an international comparison (2012) 0.10
    0.09928741 = sum of:
      0.09928741 = product of:
        0.49643707 = sum of:
          0.08210702 = weight(abstract_txt:comparisons in 375) [ClassicSimilarity], result of:
            0.08210702 = score(doc=375,freq=2.0), product of:
              0.1355575 = queryWeight, product of:
                6.8527 = idf(docFreq=126, maxDocs=44218)
                0.01978162 = queryNorm
              0.6056988 = fieldWeight in 375, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                6.8527 = idf(docFreq=126, maxDocs=44218)
                0.0625 = fieldNorm(doc=375)
          0.010182931 = weight(abstract_txt:that in 375) [ClassicSimilarity], result of:
            0.010182931 = score(doc=375,freq=2.0), product of:
              0.048621196 = queryWeight, product of:
                1.0373174 = boost
                2.3694751 = idf(docFreq=11241, maxDocs=44218)
                0.01978162 = queryNorm
              0.20943399 = fieldWeight in 375, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                2.3694751 = idf(docFreq=11241, maxDocs=44218)
                0.0625 = fieldNorm(doc=375)
          0.10641655 = weight(abstract_txt:media in 375) [ClassicSimilarity], result of:
            0.10641655 = score(doc=375,freq=5.0), product of:
              0.14959124 = queryWeight, product of:
                1.485615 = boost
                5.090237 = idf(docFreq=739, maxDocs=44218)
                0.01978162 = queryNorm
              0.71138227 = fieldWeight in 375, product of:
                2.236068 = tf(freq=5.0), with freq of:
                  5.0 = termFreq=5.0
                5.090237 = idf(docFreq=739, maxDocs=44218)
                0.0625 = fieldNorm(doc=375)
          0.081397094 = weight(abstract_txt:setting in 375) [ClassicSimilarity], result of:
            0.081397094 = score(doc=375,freq=1.0), product of:
              0.21394196 = queryWeight, product of:
                1.7766472 = boost
                6.087415 = idf(docFreq=272, maxDocs=44218)
                0.01978162 = queryNorm
              0.38046345 = fieldWeight in 375, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                6.087415 = idf(docFreq=272, maxDocs=44218)
                0.0625 = fieldNorm(doc=375)
          0.21633346 = weight(abstract_txt:agenda in 375) [ClassicSimilarity], result of:
            0.21633346 = score(doc=375,freq=1.0), product of:
              0.46989727 = queryWeight, product of:
                3.2247815 = boost
                7.3661537 = idf(docFreq=75, maxDocs=44218)
                0.01978162 = queryNorm
              0.4603846 = fieldWeight in 375, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                7.3661537 = idf(docFreq=75, maxDocs=44218)
                0.0625 = fieldNorm(doc=375)
        0.2 = coord(5/25)
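
The score trees above follow the ClassicSimilarity (TF-IDF) arithmetic: tf is the square root of the term frequency, idf is 1 + ln(maxDocs / (docFreq + 1)), each term contributes queryWeight * fieldWeight, and the sum is scaled by the coord factor. The following minimal sketch recomputes the score of the first hit (doc 3463) from the boost, docFreq and freq values listed in its tree. The function and variable names are illustrative and are not part of any Lucene API.

```python
import math

MAX_DOCS = 44218         # maxDocs, as shown in the explain output
QUERY_NORM = 0.01978162  # queryNorm, as shown in the explain output


def idf(doc_freq, max_docs=MAX_DOCS):
    """Inverse document frequency: 1 + ln(maxDocs / (docFreq + 1))."""
    return 1.0 + math.log(max_docs / (doc_freq + 1))


def term_score(boost, doc_freq, freq, field_norm, query_norm=QUERY_NORM):
    """Score of one matching term: queryWeight * fieldWeight."""
    term_idf = idf(doc_freq)
    query_weight = boost * term_idf * query_norm
    field_weight = math.sqrt(freq) * term_idf * field_norm
    return query_weight * field_weight


def document_score(terms, field_norm, matched, total_clauses):
    """Sum of the term scores, scaled by coord = matched / total_clauses."""
    raw = sum(term_score(b, df, f, field_norm) for b, df, f in terms)
    return raw * matched / total_clauses


# (boost, docFreq, freq) for each matching term of doc 3463, copied from above.
DOC_3463_TERMS = [
    (1.0163649, 3693, 2.0),   # results
    (1.0373174, 11241, 2.0),  # that
    (1.2028259, 1949, 3.0),   # documents
    (1.6651605, 399, 3.0),    # algorithm
    (1.8142478, 239, 7.0),    # clustering
    (1.9114779, 171, 1.0),    # cluster
    (2.1677518, 849, 1.0),    # tool
]

score_3463 = document_score(
    DOC_3463_TERMS, field_norm=0.0625, matched=7, total_clauses=25
)
print(round(score_3463, 4))  # prints 0.1646
```

The result agrees with the listed 0.16462953 up to rounding, since the explain output is computed in single precision while the sketch uses double precision. A term shown without a boost line, such as "comparisons" in hit 5, has an implicit boost of 1.0.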