Document (#26669)

Author
Koch, T.
Ardö, A.
Noodén, L.
Title
¬The construction of a robot-generated subject index : DESIRE II D3.6a, Working Paper 1
Source
http://www.lub.lu.se/desire/DESIRE36a-WP1.html
Year
1999
Abstract
This working paper describes the creation of a test database on which to carry out the automatic classification tasks of the DESIRE II work package D3.6a. It is an improved version of NetLab's existing "All" Engineering database, created after a comparative study of the outcome of two different approaches to collecting the documents. These two methods were selected from seven different general methodologies for building robot-generated subject indices, which are presented in this paper. We found a surprisingly low overlap between the Engineering link collections we used as seed pages for the robot, and subsequently an even more surprisingly low overlap between the resources collected by the two different approaches, in spite of using basically the same services to start the harvesting process. An intellectual evaluation of the contents of both databases showed almost exactly the same percentage of relevant documents (77%), indicating that the main difference between the two approaches was the coverage of the resulting database.
Theme
Automatic classification
Internet
Object
DESIRE

Similar documents (author)

  1. Ardö, A.; Koch, T.: Lunds Universitets Elektroniska Bibliotek : Del.2: Gopher, World Wide Web (WWW). Planerade projekt (1993) 5.91
    5.9051695 = sum of:
      5.9051695 = sum of:
        1.974901 = weight(author_txt:koch in 6001) [ClassicSimilarity], result of:
          1.974901 = score(doc=6001,freq=1.0), product of:
            0.53427523 = queryWeight, product of:
              7.3928223 = idf(docFreq=73, maxDocs=44218)
              0.072269455 = queryNorm
            3.6964111 = fieldWeight in 6001, product of:
              1.0 = tf(freq=1.0), with freq of:
                1.0 = termFreq=1.0
              7.3928223 = idf(docFreq=73, maxDocs=44218)
              0.5 = fieldNorm(doc=6001)
        3.9302683 = weight(author_txt:ardö in 6001) [ClassicSimilarity], result of:
          3.9302683 = score(doc=6001,freq=1.0), product of:
            0.84531057 = queryWeight, product of:
              1.2578406 = boost
              9.298992 = idf(docFreq=10, maxDocs=44218)
              0.072269455 = queryNorm
            4.649496 = fieldWeight in 6001, product of:
              1.0 = tf(freq=1.0), with freq of:
                1.0 = termFreq=1.0
              9.298992 = idf(docFreq=10, maxDocs=44218)
              0.5 = fieldNorm(doc=6001)
    
  2. Ardö, A.; Koch, T.: Wide-area information server (WAIS) as the hub of an electronic library service at Lund University (1993) 5.91
    5.9051695 = sum of:
      5.9051695 = sum of:
        1.974901 = weight(author_txt:koch in 8459) [ClassicSimilarity], result of:
          1.974901 = score(doc=8459,freq=1.0), product of:
            0.53427523 = queryWeight, product of:
              7.3928223 = idf(docFreq=73, maxDocs=44218)
              0.072269455 = queryNorm
            3.6964111 = fieldWeight in 8459, product of:
              1.0 = tf(freq=1.0), with freq of:
                1.0 = termFreq=1.0
              7.3928223 = idf(docFreq=73, maxDocs=44218)
              0.5 = fieldNorm(doc=8459)
        3.9302683 = weight(author_txt:ardö in 8459) [ClassicSimilarity], result of:
          3.9302683 = score(doc=8459,freq=1.0), product of:
            0.84531057 = queryWeight, product of:
              1.2578406 = boost
              9.298992 = idf(docFreq=10, maxDocs=44218)
              0.072269455 = queryNorm
            4.649496 = fieldWeight in 8459, product of:
              1.0 = tf(freq=1.0), with freq of:
                1.0 = termFreq=1.0
              9.298992 = idf(docFreq=10, maxDocs=44218)
              0.5 = fieldNorm(doc=8459)
    
  3. Ardö, A.; Koch, T.: Automatic classification applied to full-text Internet documents in a robot-generated subject index (1999) 5.91
    5.9051695 = sum of:
      5.9051695 = sum of:
        1.974901 = weight(author_txt:koch in 382) [ClassicSimilarity], result of:
          1.974901 = score(doc=382,freq=1.0), product of:
            0.53427523 = queryWeight, product of:
              7.3928223 = idf(docFreq=73, maxDocs=44218)
              0.072269455 = queryNorm
            3.6964111 = fieldWeight in 382, product of:
              1.0 = tf(freq=1.0), with freq of:
                1.0 = termFreq=1.0
              7.3928223 = idf(docFreq=73, maxDocs=44218)
              0.5 = fieldNorm(doc=382)
        3.9302683 = weight(author_txt:ardö in 382) [ClassicSimilarity], result of:
          3.9302683 = score(doc=382,freq=1.0), product of:
            0.84531057 = queryWeight, product of:
              1.2578406 = boost
              9.298992 = idf(docFreq=10, maxDocs=44218)
              0.072269455 = queryNorm
            4.649496 = fieldWeight in 382, product of:
              1.0 = tf(freq=1.0), with freq of:
                1.0 = termFreq=1.0
              9.298992 = idf(docFreq=10, maxDocs=44218)
              0.5 = fieldNorm(doc=382)
    
  4. Koch, T.; Ardö, A.: Automatic classification of full-text HTML-documents from one specific subject area : DESIRE II D3.6a, Working Paper 2 (2000) 5.91
    5.9051695 = sum of:
      5.9051695 = sum of:
        1.974901 = weight(author_txt:koch in 1667) [ClassicSimilarity], result of:
          1.974901 = score(doc=1667,freq=1.0), product of:
            0.53427523 = queryWeight, product of:
              7.3928223 = idf(docFreq=73, maxDocs=44218)
              0.072269455 = queryNorm
            3.6964111 = fieldWeight in 1667, product of:
              1.0 = tf(freq=1.0), with freq of:
                1.0 = termFreq=1.0
              7.3928223 = idf(docFreq=73, maxDocs=44218)
              0.5 = fieldNorm(doc=1667)
        3.9302683 = weight(author_txt:ardö in 1667) [ClassicSimilarity], result of:
          3.9302683 = score(doc=1667,freq=1.0), product of:
            0.84531057 = queryWeight, product of:
              1.2578406 = boost
              9.298992 = idf(docFreq=10, maxDocs=44218)
              0.072269455 = queryNorm
            4.649496 = fieldWeight in 1667, product of:
              1.0 = tf(freq=1.0), with freq of:
                1.0 = termFreq=1.0
              9.298992 = idf(docFreq=10, maxDocs=44218)
              0.5 = fieldNorm(doc=1667)
    
  5. Koch, T.; Ardö, A.; Brümmer, A.: ¬The building and maintenance of robot based internet search services : A review of current indexing and data collection methods. Prepared to meet the requirements of Work Package 3 of EU Telematics for Research, project DESIRE. Version D3.11v0.3 (Draft version 3) (1996) 4.43
    4.428877 = sum of:
      4.428877 = sum of:
        1.4811757 = weight(author_txt:koch in 1669) [ClassicSimilarity], result of:
          1.4811757 = score(doc=1669,freq=1.0), product of:
            0.53427523 = queryWeight, product of:
              7.3928223 = idf(docFreq=73, maxDocs=44218)
              0.072269455 = queryNorm
            2.7723083 = fieldWeight in 1669, product of:
              1.0 = tf(freq=1.0), with freq of:
                1.0 = termFreq=1.0
              7.3928223 = idf(docFreq=73, maxDocs=44218)
              0.375 = fieldNorm(doc=1669)
        2.9477012 = weight(author_txt:ardö in 1669) [ClassicSimilarity], result of:
          2.9477012 = score(doc=1669,freq=1.0), product of:
            0.84531057 = queryWeight, product of:
              1.2578406 = boost
              9.298992 = idf(docFreq=10, maxDocs=44218)
              0.072269455 = queryNorm
            3.487122 = fieldWeight in 1669, product of:
              1.0 = tf(freq=1.0), with freq of:
                1.0 = termFreq=1.0
              9.298992 = idf(docFreq=10, maxDocs=44218)
              0.375 = fieldNorm(doc=1669)
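
The score trees above can be checked by hand. The following is a minimal sketch, not the search engine's own code: it assumes the idf form 1 + ln(maxDocs / (docFreq + 1)) and tf = sqrt(freq), which reproduce the figures listed for the first hit (doc 6001). The function names are hypothetical, and small differences in the last digits are expected because the listing uses single-precision floats.

```python
import math

def idf(doc_freq, max_docs):
    # Assumed idf form; it matches the idf values shown in the listing.
    return 1.0 + math.log(max_docs / (doc_freq + 1))

def tf(freq):
    # Term-frequency factor: square root of the raw frequency.
    return math.sqrt(freq)

def term_score(freq, doc_freq, max_docs, query_norm, field_norm, boost=1.0):
    # score = queryWeight * fieldWeight, as in the explanation tree.
    i = idf(doc_freq, max_docs)
    query_weight = boost * i * query_norm
    field_weight = tf(freq) * i * field_norm
    return query_weight * field_weight

# Values copied from the explanation for doc 6001.
koch = term_score(1.0, 73, 44218, 0.072269455, 0.5)
ardo = term_score(1.0, 10, 44218, 0.072269455, 0.5, boost=1.2578406)
total = koch + ardo
print(koch, ardo, total)
```

The fifth hit scores lower only because its fieldNorm is 0.375 instead of 0.5; every other input is identical.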
    

Similar documents (content)

  1. Ardö, A.; Godby, J.; Houghton, A.; Koch, T.; Reighart, R.; Thompson, R.; Vizine-Goetz, D.: Browsing engineering resources on the Web : a general knowledge organization scheme (Dewey) vs. a special scheme (EI) (2000) 0.25
    0.25380602 = sum of:
      0.25380602 = product of:
        1.0575252 = sum of:
          0.029746326 = weight(abstract_txt:subject in 86) [ClassicSimilarity], result of:
            0.029746326 = score(doc=86,freq=1.0), product of:
              0.08121132 = queryWeight, product of:
                1.1240057 = boost
                3.9070187 = idf(docFreq=2415, maxDocs=44218)
                0.018492797 = queryNorm
              0.366283 = fieldWeight in 86, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                3.9070187 = idf(docFreq=2415, maxDocs=44218)
                0.09375 = fieldNorm(doc=86)
          0.04937594 = weight(abstract_txt:documents in 86) [ClassicSimilarity], result of:
            0.04937594 = score(doc=86,freq=2.0), product of:
              0.09036382 = queryWeight, product of:
                1.1856527 = boost
                4.1213026 = idf(docFreq=1949, maxDocs=44218)
                0.018492797 = queryNorm
              0.5464127 = fieldWeight in 86, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                4.1213026 = idf(docFreq=1949, maxDocs=44218)
                0.09375 = fieldNorm(doc=86)
          0.08393728 = weight(abstract_txt:generated in 86) [ClassicSimilarity], result of:
            0.08393728 = score(doc=86,freq=1.0), product of:
              0.16216767 = queryWeight, product of:
                1.5883359 = boost
                5.52102 = idf(docFreq=480, maxDocs=44218)
                0.018492797 = queryNorm
              0.51759565 = fieldWeight in 86, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                5.52102 = idf(docFreq=480, maxDocs=44218)
                0.09375 = fieldNorm(doc=86)
          0.21156341 = weight(abstract_txt:engineering in 86) [ClassicSimilarity], result of:
            0.21156341 = score(doc=86,freq=4.0), product of:
              0.18920593 = queryWeight, product of:
                1.7156456 = boost
                5.963546 = idf(docFreq=308, maxDocs=44218)
                0.018492797 = queryNorm
              1.1181648 = fieldWeight in 86, product of:
                2.0 = tf(freq=4.0), with freq of:
                  4.0 = termFreq=4.0
                5.963546 = idf(docFreq=308, maxDocs=44218)
                0.09375 = fieldNorm(doc=86)
          0.19829196 = weight(abstract_txt:desire in 86) [ClassicSimilarity], result of:
            0.19829196 = score(doc=86,freq=1.0), product of:
              0.28765002 = queryWeight, product of:
                2.1153996 = boost
                7.3530817 = idf(docFreq=76, maxDocs=44218)
                0.018492797 = queryNorm
              0.68935144 = fieldWeight in 86, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                7.3530817 = idf(docFreq=76, maxDocs=44218)
                0.09375 = fieldNorm(doc=86)
          0.48461026 = weight(abstract_txt:robot in 86) [ClassicSimilarity], result of:
            0.48461026 = score(doc=86,freq=1.0), product of:
              0.5974293 = queryWeight, product of:
                3.7337832 = boost
                8.652365 = idf(docFreq=20, maxDocs=44218)
                0.018492797 = queryNorm
              0.8111592 = fieldWeight in 86, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                8.652365 = idf(docFreq=20, maxDocs=44218)
                0.09375 = fieldNorm(doc=86)
        0.24 = coord(6/25)
    
  2. Koch, T.; Vizine-Goetz, D.: Automatic classification and content navigation support for Web services : DESIRE II cooperates with OCLC (1998) 0.18
    0.18417993 = sum of:
      0.18417993 = product of:
        0.9208996 = sum of:
          0.029746326 = weight(abstract_txt:subject in 1568) [ClassicSimilarity], result of:
            0.029746326 = score(doc=1568,freq=1.0), product of:
              0.08121132 = queryWeight, product of:
                1.1240057 = boost
                3.9070187 = idf(docFreq=2415, maxDocs=44218)
                0.018492797 = queryNorm
              0.366283 = fieldWeight in 1568, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                3.9070187 = idf(docFreq=2415, maxDocs=44218)
                0.09375 = fieldNorm(doc=1568)
          0.14959793 = weight(abstract_txt:engineering in 1568) [ClassicSimilarity], result of:
            0.14959793 = score(doc=1568,freq=2.0), product of:
              0.18920593 = queryWeight, product of:
                1.7156456 = boost
                5.963546 = idf(docFreq=308, maxDocs=44218)
                0.018492797 = queryNorm
              0.79066193 = fieldWeight in 1568, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                5.963546 = idf(docFreq=308, maxDocs=44218)
                0.09375 = fieldNorm(doc=1568)
          0.058653157 = weight(abstract_txt:database in 1568) [ClassicSimilarity], result of:
            0.058653157 = score(doc=1568,freq=1.0), product of:
              0.14617924 = queryWeight, product of:
                1.8469226 = boost
                4.2799077 = idf(docFreq=1663, maxDocs=44218)
                0.018492797 = queryNorm
              0.40124136 = fieldWeight in 1568, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.2799077 = idf(docFreq=1663, maxDocs=44218)
                0.09375 = fieldNorm(doc=1568)
          0.19829196 = weight(abstract_txt:desire in 1568) [ClassicSimilarity], result of:
            0.19829196 = score(doc=1568,freq=1.0), product of:
              0.28765002 = queryWeight, product of:
                2.1153996 = boost
                7.3530817 = idf(docFreq=76, maxDocs=44218)
                0.018492797 = queryNorm
              0.68935144 = fieldWeight in 1568, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                7.3530817 = idf(docFreq=76, maxDocs=44218)
                0.09375 = fieldNorm(doc=1568)
          0.48461026 = weight(abstract_txt:robot in 1568) [ClassicSimilarity], result of:
            0.48461026 = score(doc=1568,freq=1.0), product of:
              0.5974293 = queryWeight, product of:
                3.7337832 = boost
                8.652365 = idf(docFreq=20, maxDocs=44218)
                0.018492797 = queryNorm
              0.8111592 = fieldWeight in 1568, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                8.652365 = idf(docFreq=20, maxDocs=44218)
                0.09375 = fieldNorm(doc=1568)
        0.2 = coord(5/25)
    
  3. MacCain, K.W.: Descriptor and citation retrieval in the medical behavioral sciences literature : retrieval overlaps and novelty distribution (1989) 0.15
    0.1526043 = sum of:
      0.1526043 = product of:
        0.54501534 = sum of:
          0.07641959 = weight(abstract_txt:percentage in 2290) [ClassicSimilarity], result of:
            0.07641959 = score(doc=2290,freq=1.0), product of:
              0.13653427 = queryWeight, product of:
                1.0305429 = boost
                7.1642876 = idf(docFreq=92, maxDocs=44218)
                0.018492797 = queryNorm
              0.55970997 = fieldWeight in 2290, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                7.1642876 = idf(docFreq=92, maxDocs=44218)
                0.078125 = fieldNorm(doc=2290)
          0.024788605 = weight(abstract_txt:subject in 2290) [ClassicSimilarity], result of:
            0.024788605 = score(doc=2290,freq=1.0), product of:
              0.08121132 = queryWeight, product of:
                1.1240057 = boost
                3.9070187 = idf(docFreq=2415, maxDocs=44218)
                0.018492797 = queryNorm
              0.30523583 = fieldWeight in 2290, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                3.9070187 = idf(docFreq=2415, maxDocs=44218)
                0.078125 = fieldNorm(doc=2290)
          0.029095054 = weight(abstract_txt:documents in 2290) [ClassicSimilarity], result of:
            0.029095054 = score(doc=2290,freq=1.0), product of:
              0.09036382 = queryWeight, product of:
                1.1856527 = boost
                4.1213026 = idf(docFreq=1949, maxDocs=44218)
                0.018492797 = queryNorm
              0.32197678 = fieldWeight in 2290, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.1213026 = idf(docFreq=1949, maxDocs=44218)
                0.078125 = fieldNorm(doc=2290)
          0.0366289 = weight(abstract_txt:between in 2290) [ClassicSimilarity], result of:
            0.0366289 = score(doc=2290,freq=2.0), product of:
              0.09572341 = queryWeight, product of:
                1.4945654 = boost
                3.4633842 = idf(docFreq=3764, maxDocs=44218)
                0.018492797 = queryNorm
              0.3826535 = fieldWeight in 2290, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                3.4633842 = idf(docFreq=3764, maxDocs=44218)
                0.078125 = fieldNorm(doc=2290)
          0.030704869 = weight(abstract_txt:different in 2290) [ClassicSimilarity], result of:
            0.030704869 = score(doc=2290,freq=1.0), product of:
              0.10722197 = queryWeight, product of:
                1.581786 = boost
                3.6655018 = idf(docFreq=3075, maxDocs=44218)
                0.018492797 = queryNorm
              0.28636733 = fieldWeight in 2290, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                3.6655018 = idf(docFreq=3075, maxDocs=44218)
                0.078125 = fieldNorm(doc=2290)
          0.06912341 = weight(abstract_txt:database in 2290) [ClassicSimilarity], result of:
            0.06912341 = score(doc=2290,freq=2.0), product of:
              0.14617924 = queryWeight, product of:
                1.8469226 = boost
                4.2799077 = idf(docFreq=1663, maxDocs=44218)
                0.018492797 = queryNorm
              0.47286746 = fieldWeight in 2290, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                4.2799077 = idf(docFreq=1663, maxDocs=44218)
                0.078125 = fieldNorm(doc=2290)
          0.27825493 = weight(abstract_txt:overlap in 2290) [ClassicSimilarity], result of:
            0.27825493 = score(doc=2290,freq=4.0), product of:
              0.25648212 = queryWeight, product of:
                1.9975091 = boost
                6.943297 = idf(docFreq=115, maxDocs=44218)
                0.018492797 = queryNorm
              1.0848901 = fieldWeight in 2290, product of:
                2.0 = tf(freq=4.0), with freq of:
                  4.0 = termFreq=4.0
                6.943297 = idf(docFreq=115, maxDocs=44218)
                0.078125 = fieldNorm(doc=2290)
        0.28 = coord(7/25)
    
  4. Lindholm, J.; Schönthal, T.; Jansson, K.: Experiences of harvesting Web resources in engineering using automatic classification (2003) 0.15
    0.15020278 = sum of:
      0.15020278 = product of:
        0.93876743 = sum of:
          0.03966177 = weight(abstract_txt:subject in 4088) [ClassicSimilarity], result of:
            0.03966177 = score(doc=4088,freq=1.0), product of:
              0.08121132 = queryWeight, product of:
                1.1240057 = boost
                3.9070187 = idf(docFreq=2415, maxDocs=44218)
                0.018492797 = queryNorm
              0.48837733 = fieldWeight in 4088, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                3.9070187 = idf(docFreq=2415, maxDocs=44218)
                0.125 = fieldNorm(doc=4088)
          0.11191636 = weight(abstract_txt:generated in 4088) [ClassicSimilarity], result of:
            0.11191636 = score(doc=4088,freq=1.0), product of:
              0.16216767 = queryWeight, product of:
                1.5883359 = boost
                5.52102 = idf(docFreq=480, maxDocs=44218)
                0.018492797 = queryNorm
              0.6901275 = fieldWeight in 4088, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                5.52102 = idf(docFreq=480, maxDocs=44218)
                0.125 = fieldNorm(doc=4088)
          0.14104228 = weight(abstract_txt:engineering in 4088) [ClassicSimilarity], result of:
            0.14104228 = score(doc=4088,freq=1.0), product of:
              0.18920593 = queryWeight, product of:
                1.7156456 = boost
                5.963546 = idf(docFreq=308, maxDocs=44218)
                0.018492797 = queryNorm
              0.7454432 = fieldWeight in 4088, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                5.963546 = idf(docFreq=308, maxDocs=44218)
                0.125 = fieldNorm(doc=4088)
          0.646147 = weight(abstract_txt:robot in 4088) [ClassicSimilarity], result of:
            0.646147 = score(doc=4088,freq=1.0), product of:
              0.5974293 = queryWeight, product of:
                3.7337832 = boost
                8.652365 = idf(docFreq=20, maxDocs=44218)
                0.018492797 = queryNorm
              1.0815456 = fieldWeight in 4088, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                8.652365 = idf(docFreq=20, maxDocs=44218)
                0.125 = fieldNorm(doc=4088)
        0.16 = coord(4/25)
    
  5. Kimmel, S.: WWW search tools in reference services (1997) 0.14
    0.14359052 = sum of:
      0.14359052 = product of:
        1.1965877 = sum of:
          0.05949265 = weight(abstract_txt:subject in 619) [ClassicSimilarity], result of:
            0.05949265 = score(doc=619,freq=1.0), product of:
              0.08121132 = queryWeight, product of:
                1.1240057 = boost
                3.9070187 = idf(docFreq=2415, maxDocs=44218)
                0.018492797 = queryNorm
              0.732566 = fieldWeight in 619, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                3.9070187 = idf(docFreq=2415, maxDocs=44218)
                0.1875 = fieldNorm(doc=619)
          0.16787456 = weight(abstract_txt:generated in 619) [ClassicSimilarity], result of:
            0.16787456 = score(doc=619,freq=1.0), product of:
              0.16216767 = queryWeight, product of:
                1.5883359 = boost
                5.52102 = idf(docFreq=480, maxDocs=44218)
                0.018492797 = queryNorm
              1.0351913 = fieldWeight in 619, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                5.52102 = idf(docFreq=480, maxDocs=44218)
                0.1875 = fieldNorm(doc=619)
          0.9692205 = weight(abstract_txt:robot in 619) [ClassicSimilarity], result of:
            0.9692205 = score(doc=619,freq=1.0), product of:
              0.5974293 = queryWeight, product of:
                3.7337832 = boost
                8.652365 = idf(docFreq=20, maxDocs=44218)
                0.018492797 = queryNorm
              1.6223184 = fieldWeight in 619, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                8.652365 = idf(docFreq=20, maxDocs=44218)
                0.1875 = fieldNorm(doc=619)
        0.12 = coord(3/25)
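
In the content-based list, the summed term weights are additionally multiplied by a coordination factor, coord(matched/total), which rewards documents that match more of the 25 query terms. The sketch below recomputes the last hit (doc 619) under the same assumed idf and tf forms that fit the listing; the names are hypothetical and the inputs are copied from the tree above.

```python
import math

MAX_DOCS = 44218
QUERY_NORM = 0.018492797
FIELD_NORM = 0.1875   # fieldNorm(doc=619)
QUERY_TERMS = 25      # denominator of coord(3/25)

# (term, boost, docFreq, freq) for the three query terms matched by doc 619
matches = [
    ("subject", 1.1240057, 2415, 1.0),
    ("generated", 1.5883359, 480, 1.0),
    ("robot", 3.7337832, 20, 1.0),
]

def term_weight(boost, doc_freq, freq):
    # Assumed forms: idf = 1 + ln(maxDocs / (docFreq + 1)), tf = sqrt(freq).
    idf = 1.0 + math.log(MAX_DOCS / (doc_freq + 1))
    query_weight = boost * idf * QUERY_NORM
    field_weight = math.sqrt(freq) * idf * FIELD_NORM
    return query_weight * field_weight

raw_sum = sum(term_weight(b, df, f) for _, b, df, f in matches)
coord = len(matches) / QUERY_TERMS   # fraction of query terms matched
score = raw_sum * coord
print(raw_sum, coord, score)
```

This explains why doc 619 ranks last although it has the largest raw sum of the five hits: it matches only 3 of the 25 query terms, so its coord factor is the smallest.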