Document (#26916)

Author
Gaese, V.
Title
"Automatische Klassifikation von Presseartikeln in der Gruner + Jahr Dokumentation"
Source
Bibliotheken und Informationseinrichtungen - Aufgaben, Strukturen, Ziele: 29. Arbeits- und Fortbildungstagung der ASpB / Sektion 5 im DBV in Zusammenarbeit mit der BDB, BIB, DBV, DGI und VDB, zugleich DBV-Jahrestagung, 8.-11.4.2003 in Stuttgart. Red.: Margit Bauer
Imprint
Jülich : ASpB / Sektion 5 im DBV
Year
2003
Pages
pp. 401-413
Abstract
Classifying texts, also called indexing, subject cataloguing, or keyword assignment, has always been one of the necessary but very labour-intensive tasks of archives and documentation departments. Their differing purposes and requirements are surely one reason why there are almost as many indexing schemes, thesauri, or subject heading lists as there are documentation departments. In what follows, classification, indexing, subject cataloguing, and keyword assignment are used synonymously. Today about 20 documentalists in the G+J Dokumentation work on selecting and indexing roughly 1,100 articles per day from a total of about 210 titles. The G+J Pressedatenbank (press database) currently holds about 7 million articles, a good 2 million of them as digital full text (OCR/typesetting data). Only articles for which the G+J Dokumentation holds the corresponding rights are archived.
Theme
Automatic indexing
Object
Gruner + Jahr
Location
D
Hamburg
Area
Press archives

Similar documents (content)

  1. Schek, M.: Automatische Klassifizierung in Erschließung und Recherche eines Pressearchivs (2006) 0.17
    0.16700602 = sum of:
      0.16700602 = product of:
        0.59645003 = sum of:
          0.059912506 = weight(abstract_txt:täglich in 6043) [ClassicSimilarity], result of:
            0.059912506 = score(doc=6043,freq=1.0), product of:
              0.14378431 = queryWeight, product of:
                7.61935 = idf(docFreq=58, maxDocs=44218)
                0.018870942 = queryNorm
              0.4166832 = fieldWeight in 6043, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                7.61935 = idf(docFreq=58, maxDocs=44218)
                0.0546875 = fieldNorm(doc=6043)
          0.10806281 = weight(abstract_txt:klassifizierung in 6043) [ClassicSimilarity], result of:
            0.10806281 = score(doc=6043,freq=2.0), product of:
              0.16909887 = queryWeight, product of:
                1.0844626 = boost
                8.2629 = idf(docFreq=30, maxDocs=44218)
                0.018870942 = queryNorm
              0.6390511 = fieldWeight in 6043, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                8.2629 = idf(docFreq=30, maxDocs=44218)
                0.0546875 = fieldNorm(doc=6043)
          0.020680316 = weight(abstract_txt:oder in 6043) [ClassicSimilarity], result of:
            0.020680316 = score(doc=6043,freq=1.0), product of:
              0.08914178 = queryWeight, product of:
                1.1135248 = boost
                4.2421675 = idf(docFreq=1727, maxDocs=44218)
                0.018870942 = queryNorm
              0.23199353 = fieldWeight in 6043, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.2421675 = idf(docFreq=1727, maxDocs=44218)
                0.0546875 = fieldNorm(doc=6043)
          0.08773405 = weight(abstract_txt:dokumentare in 6043) [ClassicSimilarity], result of:
            0.08773405 = score(doc=6043,freq=1.0), product of:
              0.1854152 = queryWeight, product of:
                1.1355778 = boost
                8.652365 = idf(docFreq=20, maxDocs=44218)
                0.018870942 = queryNorm
              0.47317618 = fieldWeight in 6043, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                8.652365 = idf(docFreq=20, maxDocs=44218)
                0.0546875 = fieldNorm(doc=6043)
          0.19449927 = weight(abstract_txt:pressedatenbank in 6043) [ClassicSimilarity], result of:
            0.19449927 = score(doc=6043,freq=3.0), product of:
              0.21857715 = queryWeight, product of:
                1.2329533 = boost
                9.394302 = idf(docFreq=9, maxDocs=44218)
                0.018870942 = queryNorm
              0.8898426 = fieldWeight in 6043, product of:
                1.7320508 = tf(freq=3.0), with freq of:
                  3.0 = termFreq=3.0
                9.394302 = idf(docFreq=9, maxDocs=44218)
                0.0546875 = fieldNorm(doc=6043)
          0.083250776 = weight(abstract_txt:artikel in 6043) [ClassicSimilarity], result of:
            0.083250776 = score(doc=6043,freq=3.0), product of:
              0.15640897 = queryWeight, product of:
                1.4749933 = boost
                5.619245 = idf(docFreq=435, maxDocs=44218)
                0.018870942 = queryNorm
              0.53226346 = fieldWeight in 6043, product of:
                1.7320508 = tf(freq=3.0), with freq of:
                  3.0 = termFreq=3.0
                5.619245 = idf(docFreq=435, maxDocs=44218)
                0.0546875 = fieldNorm(doc=6043)
          0.042310294 = weight(abstract_txt:sind in 6043) [ClassicSimilarity], result of:
            0.042310294 = score(doc=6043,freq=3.0), product of:
              0.114024244 = queryWeight, product of:
                1.5424229 = boost
                3.9174201 = idf(docFreq=2390, maxDocs=44218)
                0.018870942 = queryNorm
              0.371064 = fieldWeight in 6043, product of:
                1.7320508 = tf(freq=3.0), with freq of:
                  3.0 = termFreq=3.0
                3.9174201 = idf(docFreq=2390, maxDocs=44218)
                0.0546875 = fieldNorm(doc=6043)
        0.28 = coord(7/25)
    
  2. Portal "Bibliothek Information Dokumentation" eingestellt (2004) 0.14
    0.14213264 = sum of:
      0.14213264 = product of:
        1.1844387 = sum of:
          0.1647938 = weight(abstract_txt:artikel in 3293) [ClassicSimilarity], result of:
            0.1647938 = score(doc=3293,freq=1.0), product of:
              0.15640897 = queryWeight, product of:
                1.4749933 = boost
                5.619245 = idf(docFreq=435, maxDocs=44218)
                0.018870942 = queryNorm
              1.0536084 = fieldWeight in 3293, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                5.619245 = idf(docFreq=435, maxDocs=44218)
                0.1875 = fieldNorm(doc=3293)
          0.08375266 = weight(abstract_txt:sind in 3293) [ClassicSimilarity], result of:
            0.08375266 = score(doc=3293,freq=1.0), product of:
              0.114024244 = queryWeight, product of:
                1.5424229 = boost
                3.9174201 = idf(docFreq=2390, maxDocs=44218)
                0.018870942 = queryNorm
              0.73451626 = fieldWeight in 3293, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                3.9174201 = idf(docFreq=2390, maxDocs=44218)
                0.1875 = fieldNorm(doc=3293)
          0.9358923 = weight(title_txt:dokumentation in 3293) [ClassicSimilarity], result of:
            0.9358923 = score(doc=3293,freq=1.0), product of:
              0.32396457 = queryWeight, product of:
                2.599882 = boost
                6.603137 = idf(docFreq=162, maxDocs=44218)
                0.018870942 = queryNorm
              2.8888724 = fieldWeight in 3293, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                6.603137 = idf(docFreq=162, maxDocs=44218)
                0.4375 = fieldNorm(doc=3293)
        0.12 = coord(3/25)
    
  3. Rahmstorf, G.: Der eigene Kern der Dokumentation im Wandel der Technik (1997) 0.11
    0.11231629 = sum of:
      0.11231629 = product of:
        0.9359691 = sum of:
          0.117073365 = weight(abstract_txt:indexieren in 2995) [ClassicSimilarity], result of:
            0.117073365 = score(doc=2995,freq=1.0), product of:
              0.15689765 = queryWeight, product of:
                1.0446059 = boost
                7.9592175 = idf(docFreq=41, maxDocs=44218)
                0.018870942 = queryNorm
              0.74617666 = fieldWeight in 2995, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                7.9592175 = idf(docFreq=41, maxDocs=44218)
                0.09375 = fieldNorm(doc=2995)
          0.15040123 = weight(abstract_txt:dokumentare in 2995) [ClassicSimilarity], result of:
            0.15040123 = score(doc=2995,freq=1.0), product of:
              0.1854152 = queryWeight, product of:
                1.1355778 = boost
                8.652365 = idf(docFreq=20, maxDocs=44218)
                0.018870942 = queryNorm
              0.8111592 = fieldWeight in 2995, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                8.652365 = idf(docFreq=20, maxDocs=44218)
                0.09375 = fieldNorm(doc=2995)
          0.6684945 = weight(title_txt:dokumentation in 2995) [ClassicSimilarity], result of:
            0.6684945 = score(doc=2995,freq=1.0), product of:
              0.32396457 = queryWeight, product of:
                2.599882 = boost
                6.603137 = idf(docFreq=162, maxDocs=44218)
                0.018870942 = queryNorm
              2.0634804 = fieldWeight in 2995, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                6.603137 = idf(docFreq=162, maxDocs=44218)
                0.3125 = fieldNorm(doc=2995)
        0.12 = coord(3/25)
    
  4. Gaus, W.: Information und Dokumentation in der Medizin (2004) 0.11
    0.10579957 = sum of:
      0.10579957 = product of:
        0.8816631 = sum of:
          0.023634648 = weight(abstract_txt:oder in 2952) [ClassicSimilarity], result of:
            0.023634648 = score(doc=2952,freq=1.0), product of:
              0.08914178 = queryWeight, product of:
                1.1135248 = boost
                4.2421675 = idf(docFreq=1727, maxDocs=44218)
                0.018870942 = queryNorm
              0.26513547 = fieldWeight in 2952, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.2421675 = idf(docFreq=1727, maxDocs=44218)
                0.0625 = fieldNorm(doc=2952)
          0.05583511 = weight(abstract_txt:sind in 2952) [ClassicSimilarity], result of:
            0.05583511 = score(doc=2952,freq=4.0), product of:
              0.114024244 = queryWeight, product of:
                1.5424229 = boost
                3.9174201 = idf(docFreq=2390, maxDocs=44218)
                0.018870942 = queryNorm
              0.48967752 = fieldWeight in 2952, product of:
                2.0 = tf(freq=4.0), with freq of:
                  4.0 = termFreq=4.0
                3.9174201 = idf(docFreq=2390, maxDocs=44218)
                0.0625 = fieldNorm(doc=2952)
          0.80219334 = weight(title_txt:dokumentation in 2952) [ClassicSimilarity], result of:
            0.80219334 = score(doc=2952,freq=1.0), product of:
              0.32396457 = queryWeight, product of:
                2.599882 = boost
                6.603137 = idf(docFreq=162, maxDocs=44218)
                0.018870942 = queryNorm
              2.4761763 = fieldWeight in 2952, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                6.603137 = idf(docFreq=162, maxDocs=44218)
                0.375 = fieldNorm(doc=2952)
        0.12 = coord(3/25)
    
  5. Grundlagen der praktischen Information und Dokumentation (2004) 0.11
    0.10537202 = sum of:
      0.10537202 = product of:
        0.87810016 = sum of:
          0.04136063 = weight(abstract_txt:oder in 693) [ClassicSimilarity], result of:
            0.04136063 = score(doc=693,freq=4.0), product of:
              0.08914178 = queryWeight, product of:
                1.1135248 = boost
                4.2421675 = idf(docFreq=1727, maxDocs=44218)
                0.018870942 = queryNorm
              0.46398705 = fieldWeight in 693, product of:
                2.0 = tf(freq=4.0), with freq of:
                  4.0 = termFreq=4.0
                4.2421675 = idf(docFreq=1727, maxDocs=44218)
                0.0546875 = fieldNorm(doc=693)
          0.03454621 = weight(abstract_txt:sind in 693) [ClassicSimilarity], result of:
            0.03454621 = score(doc=693,freq=2.0), product of:
              0.114024244 = queryWeight, product of:
                1.5424229 = boost
                3.9174201 = idf(docFreq=2390, maxDocs=44218)
                0.018870942 = queryNorm
              0.3029725 = fieldWeight in 693, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                3.9174201 = idf(docFreq=2390, maxDocs=44218)
                0.0546875 = fieldNorm(doc=693)
          0.80219334 = weight(title_txt:dokumentation in 693) [ClassicSimilarity], result of:
            0.80219334 = score(doc=693,freq=1.0), product of:
              0.32396457 = queryWeight, product of:
                2.599882 = boost
                6.603137 = idf(docFreq=162, maxDocs=44218)
                0.018870942 = queryNorm
              2.4761763 = fieldWeight in 693, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                6.603137 = idf(docFreq=162, maxDocs=44218)
                0.375 = fieldNorm(doc=693)
        0.12 = coord(3/25)
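
The score breakdowns above are labelled [ClassicSimilarity], Lucene's TF-IDF similarity. As a minimal sketch of how one term score and one document score in the listing can be reproduced, the following Python code applies the ClassicSimilarity formulas tf = sqrt(freq) and idf = 1 + ln(maxDocs / (docFreq + 1)). The function names (idf, term_score, document_score) are illustrative assumptions and not part of Lucene or of this database. Results agree with the listing only to about six significant digits, because Lucene computes in 32-bit floats.

```python
import math


def idf(doc_freq: int, max_docs: int) -> float:
    """Inverse document frequency as defined by Lucene's ClassicSimilarity."""
    return 1.0 + math.log(max_docs / (doc_freq + 1))


def term_score(freq: float, doc_freq: int, max_docs: int,
               field_norm: float, query_norm: float,
               boost: float = 1.0) -> float:
    """Score of one query term in one document: queryWeight * fieldWeight."""
    tf = math.sqrt(freq)                        # tf(freq) in the listing
    term_idf = idf(doc_freq, max_docs)          # idf(docFreq, maxDocs)
    query_weight = boost * term_idf * query_norm
    field_weight = tf * term_idf * field_norm
    return query_weight * field_weight


def document_score(term_scores, matched: int, total_clauses: int) -> float:
    """Sum of the term scores scaled by the coordination factor coord(m/n)."""
    return sum(term_scores) * (matched / total_clauses)


if __name__ == "__main__":
    # Values copied from the first entry (doc=6043), term "täglich".
    s = term_score(freq=1.0, doc_freq=58, max_docs=44218,
                   field_norm=0.0546875, query_norm=0.018870942)
    print(f"term score for 'täglich': {s:.9f}")

    # The seven per-term scores listed for doc=6043, with coord(7/25).
    scores_6043 = [0.059912506, 0.10806281, 0.020680316, 0.08773405,
                   0.19449927, 0.083250776, 0.042310294]
    print(f"document score: {document_score(scores_6043, 7, 25):.8f}")
```

The coordination factor explains why entries 2 to 5 rank below entry 1 despite larger raw sums: each matches only 3 of the 25 query clauses, so its sum is multiplied by 0.12 rather than 0.28.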