Document (#37836)

Tirilly, P.
Julien, C.-A.
Random walks for subject hierarchy simplification
Categories, contexts and relations in knowledge organization: Proceedings of the Twelfth International ISKO Conference 6-9 August 2012, Mysore, India. Eds.: Neelameghan, A. and K.S. Raghavan
Würzburg : Ergon Verlag
Advances in knowledge organization; vol.13
Although subject hierarchies are widely used to index document collections, few tools leverage their structure to facilitate collection browsing. This is mostly due to the complexity of such structures, which include thousands of nodes. This paper proposes a new approach to simplifying subject hierarchies based on the distribution of documents among the nodes. A random walk algorithm simulates the route of a user within the hierarchy, under the assumption that the user is attracted by the most populated nodes. Poorly visited nodes can be identified and eliminated, leaving a structure containing only the nodes that best represent the content of the collection. Experiments on a collection indexed using the Library of Congress Subject Headings (LCSH) show that, compared to the state-of-the-art simplification method, the random walk-based approach gives access to a larger part of the collection for the same structure size, and offers more flexibility to customize the complexity of the structure.
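
The idea in the abstract can be illustrated with a small sketch. This is not the authors' algorithm, whose details are not given here; it is one possible reading under stated assumptions: the walker always starts at the root, descends until it reaches a leaf, and picks each child with probability proportional to the number of documents in that child's subtree. The toy hierarchy, the document counts, the function names and the 5% pruning threshold are all hypothetical.

```python
import random

# Hypothetical toy hierarchy: node -> list of child nodes.
CHILDREN = {
    "root": ["A", "B"],
    "A": ["A1", "A2"],
    "B": [],
    "A1": [],
    "A2": [],
}
# Hypothetical number of documents indexed directly at each node.
DOCS = {"root": 0, "A": 90, "B": 1, "A1": 50, "A2": 40}


def subtree_docs(node):
    """Documents attached to a node or to any of its descendants."""
    return DOCS[node] + sum(subtree_docs(child) for child in CHILDREN[node])


def simulate_visits(n_walks, rng):
    """Count node visits over n_walks descents from the root to a leaf."""
    visits = {node: 0 for node in CHILDREN}
    for _ in range(n_walks):
        node = "root"
        visits[node] += 1
        while CHILDREN[node]:
            kids = CHILDREN[node]
            # Assumption: attraction is proportional to subtree population.
            weights = [subtree_docs(kid) for kid in kids]
            node = rng.choices(kids, weights=weights, k=1)[0]
            visits[node] += 1
    return visits


def simplify(n_walks=1000, min_share=0.05, seed=0):
    """Keep only nodes visited in at least min_share of the walks."""
    rng = random.Random(seed)
    visits = simulate_visits(n_walks, rng)
    return {node for node, count in visits.items() if count / n_walks >= min_share}


kept = simplify()
print(sorted(kept))
```

In this toy setting the sparsely populated node "B" attracts about 1 walk in 181 and falls below the threshold, while the populated branch under "A" is kept. Raising or lowering `min_share` is one way to adjust the size of the simplified structure.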
Verbal documentation languages in online retrieval

Similar documents (author)

  1. Julien, S.: The future of GUI development tools (1992) 5.35
    5.3508706 = sum of:
      5.3508706 = weight(author_txt:julien in 4646) [ClassicSimilarity], result of:
        5.3508706 = fieldWeight in 4646, product of:
          1.0 = tf(freq=1.0), with freq of:
            1.0 = termFreq=1.0
          8.561393 = idf(docFreq=22, maxDocs=44218)
          0.625 = fieldNorm(doc=4646)
  2. Julien, H.: Trends in the recent information needs and uses literature : a content analysis (1995) 5.35
    5.3508706 = sum of:
      5.3508706 = weight(author_txt:julien in 3346) [ClassicSimilarity], result of:
        5.3508706 = fieldWeight in 3346, product of:
          1.0 = tf(freq=1.0), with freq of:
            1.0 = termFreq=1.0
          8.561393 = idf(docFreq=22, maxDocs=44218)
          0.625 = fieldNorm(doc=3346)
  3. Julien, H.: A content analysis of the recent information needs and uses literature (1996) 5.35
    5.3508706 = sum of:
      5.3508706 = weight(author_txt:julien in 7875) [ClassicSimilarity], result of:
        5.3508706 = fieldWeight in 7875, product of:
          1.0 = tf(freq=1.0), with freq of:
            1.0 = termFreq=1.0
          8.561393 = idf(docFreq=22, maxDocs=44218)
          0.625 = fieldNorm(doc=7875)
  4. Julien, H.E.: How career information helps adolescents' decision-making (1997) 5.35
    5.3508706 = sum of:
      5.3508706 = weight(author_txt:julien in 992) [ClassicSimilarity], result of:
        5.3508706 = fieldWeight in 992, product of:
          1.0 = tf(freq=1.0), with freq of:
            1.0 = termFreq=1.0
          8.561393 = idf(docFreq=22, maxDocs=44218)
          0.625 = fieldNorm(doc=992)
  5. Julien, H.: Bibliographic instruction trends in Canadian academic libraries (1997) 5.35
    5.3508706 = sum of:
      5.3508706 = weight(author_txt:julien in 2072) [ClassicSimilarity], result of:
        5.3508706 = fieldWeight in 2072, product of:
          1.0 = tf(freq=1.0), with freq of:
            1.0 = termFreq=1.0
          8.561393 = idf(docFreq=22, maxDocs=44218)
          0.625 = fieldNorm(doc=2072)
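
The author scores above can be recomputed from the figures in each tree. The sketch below assumes the classic TF-IDF definitions that the listed values are consistent with: tf is the square root of the term frequency, idf is 1 + ln(maxDocs / (docFreq + 1)), and fieldWeight is their product times fieldNorm. The function names are illustrative, not part of any library API.

```python
import math


def classic_tf(freq):
    """Term-frequency factor: square root of the raw frequency."""
    return math.sqrt(freq)


def classic_idf(doc_freq, max_docs):
    """Inverse document frequency as assumed above."""
    return 1.0 + math.log(max_docs / (doc_freq + 1))


def field_weight(freq, doc_freq, max_docs, field_norm):
    """fieldWeight = tf * idf * fieldNorm."""
    return classic_tf(freq) * classic_idf(doc_freq, max_docs) * field_norm


# Values taken from the trees above: freq=1.0, docFreq=22, maxDocs=44218, fieldNorm=0.625.
idf_value = classic_idf(22, 44218)
weight = field_weight(1.0, 22, 44218, 0.625)
print(idf_value, weight)
```

Because all five entries match the single query term "julien" once, in a field with the same norm, they receive the same score.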

Similar documents (content)

  1. Botafogo, R.A.; Shneiderman, B.; Rivlin, E.: Structural analysis of hypertexts identifying hierarchies and useful metrics (1992) 0.16
    0.15590326 = sum of:
      0.15590326 = product of:
        0.6495969 = sum of:
          0.012529781 = weight(abstract_txt:that in 4027) [ClassicSimilarity], result of:
            0.012529781 = score(doc=4027,freq=2.0), product of:
              0.0478615 = queryWeight, product of:
                1.2843729 = boost
                2.3694751 = idf(docFreq=11241, maxDocs=44218)
                0.015726896 = queryNorm
              0.26179248 = fieldWeight in 4027, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                2.3694751 = idf(docFreq=11241, maxDocs=44218)
                0.078125 = fieldNorm(doc=4027)
          0.0688461 = weight(abstract_txt:complexity in 4027) [ClassicSimilarity], result of:
            0.0688461 = score(doc=4027,freq=1.0), product of:
              0.14903109 = queryWeight, product of:
                1.6025863 = boost
                5.913062 = idf(docFreq=324, maxDocs=44218)
                0.015726896 = queryNorm
              0.461958 = fieldWeight in 4027, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                5.913062 = idf(docFreq=324, maxDocs=44218)
                0.078125 = fieldNorm(doc=4027)
          0.093550295 = weight(abstract_txt:hierarchy in 4027) [ClassicSimilarity], result of:
            0.093550295 = score(doc=4027,freq=1.0), product of:
              0.18283279 = queryWeight, product of:
                1.7750477 = boost
                6.5493927 = idf(docFreq=171, maxDocs=44218)
                0.015726896 = queryNorm
              0.5116713 = fieldWeight in 4027, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                6.5493927 = idf(docFreq=171, maxDocs=44218)
                0.078125 = fieldNorm(doc=4027)
          0.11927279 = weight(abstract_txt:hierarchies in 4027) [ClassicSimilarity], result of:
            0.11927279 = score(doc=4027,freq=1.0), product of:
              0.21497348 = queryWeight, product of:
                1.9247549 = boost
                7.1017675 = idf(docFreq=98, maxDocs=44218)
                0.015726896 = queryNorm
              0.5548256 = fieldWeight in 4027, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                7.1017675 = idf(docFreq=98, maxDocs=44218)
                0.078125 = fieldNorm(doc=4027)
          0.06689763 = weight(abstract_txt:collection in 4027) [ClassicSimilarity], result of:
            0.06689763 = score(doc=4027,freq=1.0), product of:
              0.18420771 = queryWeight, product of:
                2.5197177 = boost
                4.648501 = idf(docFreq=1150, maxDocs=44218)
                0.015726896 = queryNorm
              0.36316413 = fieldWeight in 4027, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.648501 = idf(docFreq=1150, maxDocs=44218)
                0.078125 = fieldNorm(doc=4027)
          0.28850037 = weight(abstract_txt:nodes in 4027) [ClassicSimilarity], result of:
            0.28850037 = score(doc=4027,freq=1.0), product of:
              0.52573663 = queryWeight, product of:
                4.7592344 = boost
                7.0240583 = idf(docFreq=106, maxDocs=44218)
                0.015726896 = queryNorm
              0.5487546 = fieldWeight in 4027, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                7.0240583 = idf(docFreq=106, maxDocs=44218)
                0.078125 = fieldNorm(doc=4027)
        0.24 = coord(6/25)
  2. Kumar, H.P.; Plaisant, C.; Shneiderman, B.: Browsing hierarchical data with multi-level dynamic queries and pruning (1997) 0.14
    0.13992502 = sum of:
      0.13992502 = product of:
        0.8745314 = sum of:
          0.008859893 = weight(abstract_txt:that in 750) [ClassicSimilarity], result of:
            0.008859893 = score(doc=750,freq=1.0), product of:
              0.0478615 = queryWeight, product of:
                1.2843729 = boost
                2.3694751 = idf(docFreq=11241, maxDocs=44218)
                0.015726896 = queryNorm
              0.18511525 = fieldWeight in 750, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                2.3694751 = idf(docFreq=11241, maxDocs=44218)
                0.078125 = fieldNorm(doc=750)
          0.11927279 = weight(abstract_txt:hierarchies in 750) [ClassicSimilarity], result of:
            0.11927279 = score(doc=750,freq=1.0), product of:
              0.21497348 = queryWeight, product of:
                1.9247549 = boost
                7.1017675 = idf(docFreq=98, maxDocs=44218)
                0.015726896 = queryNorm
              0.5548256 = fieldWeight in 750, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                7.1017675 = idf(docFreq=98, maxDocs=44218)
                0.078125 = fieldNorm(doc=750)
          0.03971993 = weight(abstract_txt:subject in 750) [ClassicSimilarity], result of:
            0.03971993 = score(doc=750,freq=1.0), product of:
              0.13012867 = queryWeight, product of:
                2.1177976 = boost
                3.9070187 = idf(docFreq=2415, maxDocs=44218)
                0.015726896 = queryNorm
              0.30523583 = fieldWeight in 750, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                3.9070187 = idf(docFreq=2415, maxDocs=44218)
                0.078125 = fieldNorm(doc=750)
          0.70667875 = weight(abstract_txt:nodes in 750) [ClassicSimilarity], result of:
            0.70667875 = score(doc=750,freq=6.0), product of:
              0.52573663 = queryWeight, product of:
                4.7592344 = boost
                7.0240583 = idf(docFreq=106, maxDocs=44218)
                0.015726896 = queryNorm
              1.3441688 = fieldWeight in 750, product of:
                2.4494898 = tf(freq=6.0), with freq of:
                  6.0 = termFreq=6.0
                7.0240583 = idf(docFreq=106, maxDocs=44218)
                0.078125 = fieldNorm(doc=750)
        0.16 = coord(4/25)
  3. Torres, S.D.; Hiemstra, D.; Weber, I.; Serdyukov, P.: Query recommendation in the information domain of children (2014) 0.11
    0.11041788 = sum of:
      0.11041788 = product of:
        0.5520894 = sum of:
          0.013995924 = weight(abstract_txt:approach in 1300) [ClassicSimilarity], result of:
            0.013995924 = score(doc=1300,freq=1.0), product of:
              0.059790436 = queryWeight, product of:
                1.0150766 = boost
                3.745328 = idf(docFreq=2839, maxDocs=44218)
                0.015726896 = queryNorm
              0.234083 = fieldWeight in 1300, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                3.745328 = idf(docFreq=2839, maxDocs=44218)
                0.0625 = fieldNorm(doc=1300)
          0.10143571 = weight(abstract_txt:walks in 1300) [ClassicSimilarity], result of:
            0.10143571 = score(doc=1300,freq=1.0), product of:
              0.17772475 = queryWeight, product of:
                1.2374907 = boost
                9.131938 = idf(docFreq=12, maxDocs=44218)
                0.015726896 = queryNorm
              0.5707461 = fieldWeight in 1300, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                9.131938 = idf(docFreq=12, maxDocs=44218)
                0.0625 = fieldNorm(doc=1300)
          0.0141758295 = weight(abstract_txt:that in 1300) [ClassicSimilarity], result of:
            0.0141758295 = score(doc=1300,freq=4.0), product of:
              0.0478615 = queryWeight, product of:
                1.2843729 = boost
                2.3694751 = idf(docFreq=11241, maxDocs=44218)
                0.015726896 = queryNorm
              0.2961844 = fieldWeight in 1300, product of:
                2.0 = tf(freq=4.0), with freq of:
                  4.0 = termFreq=4.0
                2.3694751 = idf(docFreq=11241, maxDocs=44218)
                0.0625 = fieldNorm(doc=1300)
          0.22957732 = weight(abstract_txt:walk in 1300) [ClassicSimilarity], result of:
            0.22957732 = score(doc=1300,freq=2.0), product of:
              0.30636558 = queryWeight, product of:
                2.297751 = boost
                8.478011 = idf(docFreq=24, maxDocs=44218)
                0.015726896 = queryNorm
              0.7493574 = fieldWeight in 1300, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                8.478011 = idf(docFreq=24, maxDocs=44218)
                0.0625 = fieldNorm(doc=1300)
          0.1929046 = weight(abstract_txt:random in 1300) [ClassicSimilarity], result of:
            0.1929046 = score(doc=1300,freq=3.0), product of:
              0.27280295 = queryWeight, product of:
                2.655542 = boost
                6.532101 = idf(docFreq=174, maxDocs=44218)
                0.015726896 = queryNorm
              0.70712066 = fieldWeight in 1300, product of:
                1.7320508 = tf(freq=3.0), with freq of:
                  3.0 = termFreq=3.0
                6.532101 = idf(docFreq=174, maxDocs=44218)
                0.0625 = fieldNorm(doc=1300)
        0.2 = coord(5/25)
  4. Frank, E.; Paynter, G.W.: Predicting Library of Congress Classifications from Library of Congress Subject Headings (2004) 0.11
    0.10855277 = sum of:
      0.10855277 = product of:
        0.5427638 = sum of:
          0.015345786 = weight(abstract_txt:that in 2218) [ClassicSimilarity], result of:
            0.015345786 = score(doc=2218,freq=3.0), product of:
              0.0478615 = queryWeight, product of:
                1.2843729 = boost
                2.3694751 = idf(docFreq=11241, maxDocs=44218)
                0.015726896 = queryNorm
              0.320629 = fieldWeight in 2218, product of:
                1.7320508 = tf(freq=3.0), with freq of:
                  3.0 = termFreq=3.0
                2.3694751 = idf(docFreq=11241, maxDocs=44218)
                0.078125 = fieldNorm(doc=2218)
          0.13230008 = weight(abstract_txt:hierarchy in 2218) [ClassicSimilarity], result of:
            0.13230008 = score(doc=2218,freq=2.0), product of:
              0.18283279 = queryWeight, product of:
                1.7750477 = boost
                6.5493927 = idf(docFreq=171, maxDocs=44218)
                0.015726896 = queryNorm
              0.7236124 = fieldWeight in 2218, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                6.5493927 = idf(docFreq=171, maxDocs=44218)
                0.078125 = fieldNorm(doc=2218)
          0.03971993 = weight(abstract_txt:subject in 2218) [ClassicSimilarity], result of:
            0.03971993 = score(doc=2218,freq=1.0), product of:
              0.13012867 = queryWeight, product of:
                2.1177976 = boost
                3.9070187 = idf(docFreq=2415, maxDocs=44218)
                0.015726896 = queryNorm
              0.30523583 = fieldWeight in 2218, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                3.9070187 = idf(docFreq=2415, maxDocs=44218)
                0.078125 = fieldNorm(doc=2218)
          0.06689763 = weight(abstract_txt:collection in 2218) [ClassicSimilarity], result of:
            0.06689763 = score(doc=2218,freq=1.0), product of:
              0.18420771 = queryWeight, product of:
                2.5197177 = boost
                4.648501 = idf(docFreq=1150, maxDocs=44218)
                0.015726896 = queryNorm
              0.36316413 = fieldWeight in 2218, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.648501 = idf(docFreq=1150, maxDocs=44218)
                0.078125 = fieldNorm(doc=2218)
          0.28850037 = weight(abstract_txt:nodes in 2218) [ClassicSimilarity], result of:
            0.28850037 = score(doc=2218,freq=1.0), product of:
              0.52573663 = queryWeight, product of:
                4.7592344 = boost
                7.0240583 = idf(docFreq=106, maxDocs=44218)
                0.015726896 = queryNorm
              0.5487546 = fieldWeight in 2218, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                7.0240583 = idf(docFreq=106, maxDocs=44218)
                0.078125 = fieldNorm(doc=2218)
        0.2 = coord(5/25)
  5. Zhang, J.; Zeng, M.L.: A new similarity measure for subject hierarchical structures (2014) 0.11
    0.105580926 = sum of:
      0.105580926 = product of:
        0.5279046 = sum of:
          0.013995924 = weight(abstract_txt:approach in 1778) [ClassicSimilarity], result of:
            0.013995924 = score(doc=1778,freq=1.0), product of:
              0.059790436 = queryWeight, product of:
                1.0150766 = boost
                3.745328 = idf(docFreq=2839, maxDocs=44218)
                0.015726896 = queryNorm
              0.234083 = fieldWeight in 1778, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                3.745328 = idf(docFreq=2839, maxDocs=44218)
                0.0625 = fieldNorm(doc=1778)
          0.010023825 = weight(abstract_txt:that in 1778) [ClassicSimilarity], result of:
            0.010023825 = score(doc=1778,freq=2.0), product of:
              0.0478615 = queryWeight, product of:
                1.2843729 = boost
                2.3694751 = idf(docFreq=11241, maxDocs=44218)
                0.015726896 = queryNorm
              0.20943399 = fieldWeight in 1778, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                2.3694751 = idf(docFreq=11241, maxDocs=44218)
                0.0625 = fieldNorm(doc=1778)
          0.03307385 = weight(abstract_txt:structure in 1778) [ClassicSimilarity], result of:
            0.03307385 = score(doc=1778,freq=1.0), product of:
              0.12142765 = queryWeight, product of:
                1.7716886 = boost
                4.3579993 = idf(docFreq=1538, maxDocs=44218)
                0.015726896 = queryNorm
              0.27237496 = fieldWeight in 1778, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.3579993 = idf(docFreq=1538, maxDocs=44218)
                0.0625 = fieldNorm(doc=1778)
          0.07105318 = weight(abstract_txt:subject in 1778) [ClassicSimilarity], result of:
            0.07105318 = score(doc=1778,freq=5.0), product of:
              0.13012867 = queryWeight, product of:
                2.1177976 = boost
                3.9070187 = idf(docFreq=2415, maxDocs=44218)
                0.015726896 = queryNorm
              0.5460225 = fieldWeight in 1778, product of:
                2.236068 = tf(freq=5.0), with freq of:
                  5.0 = termFreq=5.0
                3.9070187 = idf(docFreq=2415, maxDocs=44218)
                0.0625 = fieldNorm(doc=1778)
          0.39975786 = weight(abstract_txt:nodes in 1778) [ClassicSimilarity], result of:
            0.39975786 = score(doc=1778,freq=3.0), product of:
              0.52573663 = queryWeight, product of:
                4.7592344 = boost
                7.0240583 = idf(docFreq=106, maxDocs=44218)
                0.015726896 = queryNorm
              0.76037663 = fieldWeight in 1778, product of:
                1.7320508 = tf(freq=3.0), with freq of:
                  3.0 = termFreq=3.0
                7.0240583 = idf(docFreq=106, maxDocs=44218)
                0.0625 = fieldNorm(doc=1778)
        0.2 = coord(5/25)
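
The content scores combine several matching terms. As a worked check, the sketch below recomputes the score of entry 1 in the content list (doc=4027) from the values in its tree, assuming that each term score is queryWeight (boost * idf * queryNorm) times fieldWeight (tf * idf * fieldNorm), and that the sum over matching terms is multiplied by coord (matching terms / query terms). The helper names are illustrative only.

```python
import math

MAX_DOCS = 44218
QUERY_NORM = 0.015726896  # queryNorm as listed in the trees above


def idf(doc_freq):
    """Assumed classic idf: 1 + ln(maxDocs / (docFreq + 1))."""
    return 1.0 + math.log(MAX_DOCS / (doc_freq + 1))


def term_score(boost, freq, doc_freq, field_norm):
    """Score of one term: queryWeight * fieldWeight."""
    query_weight = boost * idf(doc_freq) * QUERY_NORM
    field_weight = math.sqrt(freq) * idf(doc_freq) * field_norm
    return query_weight * field_weight


# (boost, freq, docFreq) for the six terms matched in doc=4027.
TERMS = [
    (1.2843729, 2.0, 11241),  # that
    (1.6025863, 1.0, 324),    # complexity
    (1.7750477, 1.0, 171),    # hierarchy
    (1.9247549, 1.0, 98),     # hierarchies
    (2.5197177, 1.0, 1150),   # collection
    (4.7592344, 1.0, 106),    # nodes
]
FIELD_NORM = 0.078125

coord = 6 / 25  # six of the 25 query terms matched
total = coord * sum(term_score(b, f, df, FIELD_NORM) for b, f, df in TERMS)
print(total)
```

The coord factor explains why entry 2, whose raw sum (0.8745314) is higher than that of entry 1 (0.6495969), still ranks lower: it matches only 4 of the 25 query terms.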