Document (#36873)

Author
Bredack, J.
Lepsky, K.
Title
Automatische Extraktion von Fachterminologie aus Volltexten
Source
ABI-Technik. 34(2014) no.1, pp.2-8
Year
2014
Abstract
Subject-specific terminology (Fachterminologie) in scholarly texts often takes the form of phrases or multi-word groups (Mehrwortgruppen). The article presents an algorithmic method for identifying and extracting multi-word groups that carry such terminology. Particular emphasis is placed on incorporating German function words so that complex multi-word constructions can be extracted. The automatic indexing system Lingo was used. Results from extracting art-historical terminology from the Reallexikon zur Deutschen Kunstgeschichte demonstrate the suitability of the method.
Theme
Automatisches Indexieren
Field
Kunst
Object
Lingo
RDK

Similar documents (author)

  1. Lepsky, K.: Art and language : Ernst H. Gombrich and Karl Bühler's theory of language (1996) 5.04
    5.0370636 = sum of:
      5.0370636 = weight(author_txt:lepsky in 5229) [ClassicSimilarity], result of:
        5.0370636 = fieldWeight in 5229, product of:
          1.0 = tf(freq=1.0), with freq of:
            1.0 = termFreq=1.0
          8.059301 = idf(docFreq=37, maxDocs=44218)
          0.625 = fieldNorm(doc=5229)
    
  2. Lepsky, K.: Maschinelle Indexierung von Titelaufnahmen zur Verbesserung der sachlichen Erschließung in Online-Publikumskatalogen (1994) 5.04
    5.0370636 = sum of:
      5.0370636 = weight(author_txt:lepsky in 7064) [ClassicSimilarity], result of:
        5.0370636 = fieldWeight in 7064, product of:
          1.0 = tf(freq=1.0), with freq of:
            1.0 = termFreq=1.0
          8.059301 = idf(docFreq=37, maxDocs=44218)
          0.625 = fieldNorm(doc=7064)
    
  3. Lepsky, K.: RSWK - und was noch? : Stellungnahme zum Bericht 'Sacherschließung in Online-Katalogen' der Expertengruppe Online-Kataloge (1995) 5.04
    5.0370636 = sum of:
      5.0370636 = weight(author_txt:lepsky in 772) [ClassicSimilarity], result of:
        5.0370636 = fieldWeight in 772, product of:
          1.0 = tf(freq=1.0), with freq of:
            1.0 = termFreq=1.0
          8.059301 = idf(docFreq=37, maxDocs=44218)
          0.625 = fieldNorm(doc=772)
    
  4. Lepsky, K.: Bild und Wirklichkeit : die Wirklichkeit im Bild (1987) 5.04
    5.0370636 = sum of:
      5.0370636 = weight(author_txt:lepsky in 1346) [ClassicSimilarity], result of:
        5.0370636 = fieldWeight in 1346, product of:
          1.0 = tf(freq=1.0), with freq of:
            1.0 = termFreq=1.0
          8.059301 = idf(docFreq=37, maxDocs=44218)
          0.625 = fieldNorm(doc=1346)
    
  5. Lepsky, K.: Ernst H. Gombrich : Theorie und Methode (1991) 5.04
    5.0370636 = sum of:
      5.0370636 = weight(author_txt:lepsky in 1685) [ClassicSimilarity], result of:
        5.0370636 = fieldWeight in 1685, product of:
          1.0 = tf(freq=1.0), with freq of:
            1.0 = termFreq=1.0
          8.059301 = idf(docFreq=37, maxDocs=44218)
          0.625 = fieldNorm(doc=1685)
    

Similar documents (content)

  1. Bredack, J.: Terminologieextraktion von Mehrwortgruppen in kunsthistorischen Fachtexten (2013) 0.61
    0.60917056 = sum of:
      0.60917056 = product of:
        1.3844786 = sum of:
          0.027699927 = weight(abstract_txt:verfahren in 1054) [ClassicSimilarity], result of:
            0.027699927 = score(doc=1054,freq=4.0), product of:
              0.061534103 = queryWeight, product of:
                5.761993 = idf(docFreq=377, maxDocs=44218)
                0.010679309 = queryNorm
              0.4501557 = fieldWeight in 1054, product of:
                2.0 = tf(freq=4.0), with freq of:
                  4.0 = termFreq=4.0
                5.761993 = idf(docFreq=377, maxDocs=44218)
                0.0390625 = fieldNorm(doc=1054)
          0.026858496 = weight(abstract_txt:sprache in 1054) [ClassicSimilarity], result of:
            0.026858496 = score(doc=1054,freq=3.0), product of:
              0.066348456 = queryWeight, product of:
                1.0383828 = boost
                5.9831543 = idf(docFreq=302, maxDocs=44218)
                0.010679309 = queryNorm
              0.40480965 = fieldWeight in 1054, product of:
                1.7320508 = tf(freq=3.0), with freq of:
                  3.0 = termFreq=3.0
                5.9831543 = idf(docFreq=302, maxDocs=44218)
                0.0390625 = fieldNorm(doc=1054)
          0.026989765 = weight(abstract_txt:texten in 1054) [ClassicSimilarity], result of:
            0.026989765 = score(doc=1054,freq=1.0), product of:
              0.09600257 = queryWeight, product of:
                1.2490605 = boost
                7.1970778 = idf(docFreq=89, maxDocs=44218)
                0.010679309 = queryNorm
              0.28113586 = fieldWeight in 1054, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                7.1970778 = idf(docFreq=89, maxDocs=44218)
                0.0390625 = fieldNorm(doc=1054)
          0.05371062 = weight(abstract_txt:einbindung in 1054) [ClassicSimilarity], result of:
            0.05371062 = score(doc=1054,freq=3.0), product of:
              0.105313286 = queryWeight, product of:
                1.3082286 = boost
                7.538004 = idf(docFreq=63, maxDocs=44218)
                0.010679309 = queryNorm
              0.51000804 = fieldWeight in 1054, product of:
                1.7320508 = tf(freq=3.0), with freq of:
                  3.0 = termFreq=3.0
                7.538004 = idf(docFreq=63, maxDocs=44218)
                0.0390625 = fieldNorm(doc=1054)
          0.038666427 = weight(abstract_txt:verfahrens in 1054) [ClassicSimilarity], result of:
            0.038666427 = score(doc=1054,freq=1.0), product of:
              0.122003645 = queryWeight, product of:
                1.4080837 = boost
                8.113368 = idf(docFreq=35, maxDocs=44218)
                0.010679309 = queryNorm
              0.31692845 = fieldWeight in 1054, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                8.113368 = idf(docFreq=35, maxDocs=44218)
                0.0390625 = fieldNorm(doc=1054)
          0.113193 = weight(abstract_txt:lingo in 1054) [ClassicSimilarity], result of:
            0.113193 = score(doc=1054,freq=4.0), product of:
              0.15728109 = queryWeight, product of:
                1.598749 = boost
                9.211981 = idf(docFreq=11, maxDocs=44218)
                0.010679309 = queryNorm
              0.71968603 = fieldWeight in 1054, product of:
                2.0 = tf(freq=4.0), with freq of:
                  4.0 = termFreq=4.0
                9.211981 = idf(docFreq=11, maxDocs=44218)
                0.0390625 = fieldNorm(doc=1054)
          0.062066175 = weight(abstract_txt:indexierungssystem in 1054) [ClassicSimilarity], result of:
            0.062066175 = score(doc=1054,freq=1.0), product of:
              0.16725795 = queryWeight, product of:
                1.6486765 = boost
                9.499662 = idf(docFreq=8, maxDocs=44218)
                0.010679309 = queryNorm
              0.37108058 = fieldWeight in 1054, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                9.499662 = idf(docFreq=8, maxDocs=44218)
                0.0390625 = fieldNorm(doc=1054)
          0.027717594 = weight(abstract_txt:deutschen in 1054) [ClassicSimilarity], result of:
            0.027717594 = score(doc=1054,freq=2.0), product of:
              0.09772083 = queryWeight, product of:
                1.7821761 = boost
                5.1344433 = idf(docFreq=707, maxDocs=44218)
                0.010679309 = queryNorm
              0.2836406 = fieldWeight in 1054, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                5.1344433 = idf(docFreq=707, maxDocs=44218)
                0.0390625 = fieldNorm(doc=1054)
          0.50735545 = weight(abstract_txt:mehrwortgruppen in 1054) [ClassicSimilarity], result of:
            0.50735545 = score(doc=1054,freq=13.0), product of:
              0.363681 = queryWeight, product of:
                3.4380908 = boost
                9.905128 = idf(docFreq=5, maxDocs=44218)
                0.010679309 = queryNorm
              1.3950564 = fieldWeight in 1054, product of:
                3.6055512 = tf(freq=13.0), with freq of:
                  13.0 = termFreq=13.0
                9.905128 = idf(docFreq=5, maxDocs=44218)
                0.0390625 = fieldNorm(doc=1054)
          0.16978948 = weight(abstract_txt:fachterminologie in 1054) [ClassicSimilarity], result of:
            0.16978948 = score(doc=1054,freq=1.0), product of:
              0.4718432 = queryWeight, product of:
                4.796247 = boost
                9.211981 = idf(docFreq=11, maxDocs=44218)
                0.010679309 = queryNorm
              0.35984302 = fieldWeight in 1054, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                9.211981 = idf(docFreq=11, maxDocs=44218)
                0.0390625 = fieldNorm(doc=1054)
          0.33043176 = weight(abstract_txt:extraktion in 1054) [ClassicSimilarity], result of:
            0.33043176 = score(doc=1054,freq=3.0), product of:
              0.56128603 = queryWeight, product of:
                6.0403786 = boost
                8.701155 = idf(docFreq=19, maxDocs=44218)
                0.010679309 = queryNorm
              0.58870476 = fieldWeight in 1054, product of:
                1.7320508 = tf(freq=3.0), with freq of:
                  3.0 = termFreq=3.0
                8.701155 = idf(docFreq=19, maxDocs=44218)
                0.0390625 = fieldNorm(doc=1054)
        0.44 = coord(11/25)
    
  2. Grün, S.: Bildung von Komposita-Indextermen auf der Basis einer algorithmischen Mehrwortgruppenanalyse mit Lingo (2015) 0.23
    0.22556305 = sum of:
      0.22556305 = product of:
        1.1278152 = sum of:
          0.031013519 = weight(abstract_txt:sprache in 1335) [ClassicSimilarity], result of:
            0.031013519 = score(doc=1335,freq=1.0), product of:
              0.066348456 = queryWeight, product of:
                1.0383828 = boost
                5.9831543 = idf(docFreq=302, maxDocs=44218)
                0.010679309 = queryNorm
              0.46743393 = fieldWeight in 1335, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                5.9831543 = idf(docFreq=302, maxDocs=44218)
                0.078125 = fieldNorm(doc=1335)
          0.113193 = weight(abstract_txt:lingo in 1335) [ClassicSimilarity], result of:
            0.113193 = score(doc=1335,freq=1.0), product of:
              0.15728109 = queryWeight, product of:
                1.598749 = boost
                9.211981 = idf(docFreq=11, maxDocs=44218)
                0.010679309 = queryNorm
              0.71968603 = fieldWeight in 1335, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                9.211981 = idf(docFreq=11, maxDocs=44218)
                0.078125 = fieldNorm(doc=1335)
          0.0391986 = weight(abstract_txt:deutschen in 1335) [ClassicSimilarity], result of:
            0.0391986 = score(doc=1335,freq=1.0), product of:
              0.09772083 = queryWeight, product of:
                1.7821761 = boost
                5.1344433 = idf(docFreq=707, maxDocs=44218)
                0.010679309 = queryNorm
              0.40112838 = fieldWeight in 1335, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                5.1344433 = idf(docFreq=707, maxDocs=44218)
                0.078125 = fieldNorm(doc=1335)
          0.5628604 = weight(abstract_txt:mehrwortgruppen in 1335) [ClassicSimilarity], result of:
            0.5628604 = score(doc=1335,freq=4.0), product of:
              0.363681 = queryWeight, product of:
                3.4380908 = boost
                9.905128 = idf(docFreq=5, maxDocs=44218)
                0.010679309 = queryNorm
              1.5476762 = fieldWeight in 1335, product of:
                2.0 = tf(freq=4.0), with freq of:
                  4.0 = termFreq=4.0
                9.905128 = idf(docFreq=5, maxDocs=44218)
                0.078125 = fieldNorm(doc=1335)
          0.38154975 = weight(abstract_txt:extraktion in 1335) [ClassicSimilarity], result of:
            0.38154975 = score(doc=1335,freq=1.0), product of:
              0.56128603 = queryWeight, product of:
                6.0403786 = boost
                8.701155 = idf(docFreq=19, maxDocs=44218)
                0.010679309 = queryNorm
              0.67977774 = fieldWeight in 1335, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                8.701155 = idf(docFreq=19, maxDocs=44218)
                0.078125 = fieldNorm(doc=1335)
        0.2 = coord(5/25)
    
  3. Glaesener, L.: Automatisches Indexieren einer informationswissenschaftlichen Datenbank mit Mehrwortgruppen (2012) 0.19
    0.1921267 = sum of:
      0.1921267 = product of:
        1.200792 = sum of:
          0.08636725 = weight(abstract_txt:texten in 401) [ClassicSimilarity], result of:
            0.08636725 = score(doc=401,freq=1.0), product of:
              0.09600257 = queryWeight, product of:
                1.2490605 = boost
                7.1970778 = idf(docFreq=89, maxDocs=44218)
                0.010679309 = queryNorm
              0.8996347 = fieldWeight in 401, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                7.1970778 = idf(docFreq=89, maxDocs=44218)
                0.125 = fieldNorm(doc=401)
          0.18110879 = weight(abstract_txt:lingo in 401) [ClassicSimilarity], result of:
            0.18110879 = score(doc=401,freq=1.0), product of:
              0.15728109 = queryWeight, product of:
                1.598749 = boost
                9.211981 = idf(docFreq=11, maxDocs=44218)
                0.010679309 = queryNorm
              1.1514976 = fieldWeight in 401, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                9.211981 = idf(docFreq=11, maxDocs=44218)
                0.125 = fieldNorm(doc=401)
          0.15339373 = weight(abstract_txt:automatische in 401) [ClassicSimilarity], result of:
            0.15339373 = score(doc=401,freq=1.0), product of:
              0.17739111 = queryWeight, product of:
                2.4011705 = boost
                6.9177637 = idf(docFreq=118, maxDocs=44218)
                0.010679309 = queryNorm
              0.86472046 = fieldWeight in 401, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                6.9177637 = idf(docFreq=118, maxDocs=44218)
                0.125 = fieldNorm(doc=401)
          0.7799222 = weight(abstract_txt:mehrwortgruppen in 401) [ClassicSimilarity], result of:
            0.7799222 = score(doc=401,freq=3.0), product of:
              0.363681 = queryWeight, product of:
                3.4380908 = boost
                9.905128 = idf(docFreq=5, maxDocs=44218)
                0.010679309 = queryNorm
              2.144523 = fieldWeight in 401, product of:
                1.7320508 = tf(freq=3.0), with freq of:
                  3.0 = termFreq=3.0
                9.905128 = idf(docFreq=5, maxDocs=44218)
                0.125 = fieldNorm(doc=401)
        0.16 = coord(4/25)
    
  4. Witschel, H.F.: Text, Wörter, Morpheme : Möglichkeiten einer automatischen Terminologie-Extraktion (2004) 0.17
    0.16567932 = sum of:
      0.16567932 = product of:
        0.8283966 = sum of:
          0.038382143 = weight(abstract_txt:verfahren in 126) [ClassicSimilarity], result of:
            0.038382143 = score(doc=126,freq=3.0), product of:
              0.061534103 = queryWeight, product of:
                5.761993 = idf(docFreq=377, maxDocs=44218)
                0.010679309 = queryNorm
              0.623754 = fieldWeight in 126, product of:
                1.7320508 = tf(freq=3.0), with freq of:
                  3.0 = termFreq=3.0
                5.761993 = idf(docFreq=377, maxDocs=44218)
                0.0625 = fieldNorm(doc=126)
          0.024810815 = weight(abstract_txt:sprache in 126) [ClassicSimilarity], result of:
            0.024810815 = score(doc=126,freq=1.0), product of:
              0.066348456 = queryWeight, product of:
                1.0383828 = boost
                5.9831543 = idf(docFreq=302, maxDocs=44218)
                0.010679309 = queryNorm
              0.37394714 = fieldWeight in 126, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                5.9831543 = idf(docFreq=302, maxDocs=44218)
                0.0625 = fieldNorm(doc=126)
          0.06186628 = weight(abstract_txt:verfahrens in 126) [ClassicSimilarity], result of:
            0.06186628 = score(doc=126,freq=1.0), product of:
              0.122003645 = queryWeight, product of:
                1.4080837 = boost
                8.113368 = idf(docFreq=35, maxDocs=44218)
                0.010679309 = queryNorm
              0.5070855 = fieldWeight in 126, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                8.113368 = idf(docFreq=35, maxDocs=44218)
                0.0625 = fieldNorm(doc=126)
          0.27166316 = weight(abstract_txt:fachterminologie in 126) [ClassicSimilarity], result of:
            0.27166316 = score(doc=126,freq=1.0), product of:
              0.4718432 = queryWeight, product of:
                4.796247 = boost
                9.211981 = idf(docFreq=11, maxDocs=44218)
                0.010679309 = queryNorm
              0.5757488 = fieldWeight in 126, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                9.211981 = idf(docFreq=11, maxDocs=44218)
                0.0625 = fieldNorm(doc=126)
          0.43167424 = weight(abstract_txt:extraktion in 126) [ClassicSimilarity], result of:
            0.43167424 = score(doc=126,freq=2.0), product of:
              0.56128603 = queryWeight, product of:
                6.0403786 = boost
                8.701155 = idf(docFreq=19, maxDocs=44218)
                0.010679309 = queryNorm
              0.7690807 = fieldWeight in 126, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                8.701155 = idf(docFreq=19, maxDocs=44218)
                0.0625 = fieldNorm(doc=126)
        0.2 = coord(5/25)
    
  5. Witschel, H.F.: Terminologie-Extraktion : Möglichkeiten der Kombination statistischer und musterbasierter Verfahren (2004) 0.14
    0.14110637 = sum of:
      0.14110637 = product of:
        0.70553184 = sum of:
          0.022159941 = weight(abstract_txt:verfahren in 123) [ClassicSimilarity], result of:
            0.022159941 = score(doc=123,freq=1.0), product of:
              0.061534103 = queryWeight, product of:
                5.761993 = idf(docFreq=377, maxDocs=44218)
                0.010679309 = queryNorm
              0.36012456 = fieldWeight in 123, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                5.761993 = idf(docFreq=377, maxDocs=44218)
                0.0625 = fieldNorm(doc=123)
          0.034800626 = weight(abstract_txt:liegt in 123) [ClassicSimilarity], result of:
            0.034800626 = score(doc=123,freq=2.0), product of:
              0.065985955 = queryWeight, product of:
                1.0355422 = boost
                5.9667873 = idf(docFreq=307, maxDocs=44218)
                0.010679309 = queryNorm
              0.5273945 = fieldWeight in 123, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                5.9667873 = idf(docFreq=307, maxDocs=44218)
                0.0625 = fieldNorm(doc=123)
          0.043183625 = weight(abstract_txt:texten in 123) [ClassicSimilarity], result of:
            0.043183625 = score(doc=123,freq=1.0), product of:
              0.09600257 = queryWeight, product of:
                1.2490605 = boost
                7.1970778 = idf(docFreq=89, maxDocs=44218)
                0.010679309 = queryNorm
              0.44981736 = fieldWeight in 123, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                7.1970778 = idf(docFreq=89, maxDocs=44218)
                0.0625 = fieldNorm(doc=123)
          0.076696865 = weight(abstract_txt:automatische in 123) [ClassicSimilarity], result of:
            0.076696865 = score(doc=123,freq=1.0), product of:
              0.17739111 = queryWeight, product of:
                2.4011705 = boost
                6.9177637 = idf(docFreq=118, maxDocs=44218)
                0.010679309 = queryNorm
              0.43236023 = fieldWeight in 123, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                6.9177637 = idf(docFreq=118, maxDocs=44218)
                0.0625 = fieldNorm(doc=123)
          0.5286908 = weight(abstract_txt:extraktion in 123) [ClassicSimilarity], result of:
            0.5286908 = score(doc=123,freq=3.0), product of:
              0.56128603 = queryWeight, product of:
                6.0403786 = boost
                8.701155 = idf(docFreq=19, maxDocs=44218)
                0.010679309 = queryNorm
              0.9419276 = fieldWeight in 123, product of:
                1.7320508 = tf(freq=3.0), with freq of:
                  3.0 = termFreq=3.0
                8.701155 = idf(docFreq=19, maxDocs=44218)
                0.0625 = fieldNorm(doc=123)
        0.2 = coord(5/25)
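
The content scores combine several per-term scores and then scale the sum by a coordination factor. The sketch below recomputes the score of entry 3 (doc 401) from the numbers shown in its explain tree. It is an illustration under stated assumptions: the helper names are hypothetical, the idf formula is the one that reproduces the printed values, and queryNorm, boosts and fieldNorm are copied from the listing rather than derived.

```python
import math

QUERY_NORM = 0.010679309  # copied from the explain output
MAX_DOCS = 44218          # copied from the explain output

def idf(doc_freq, max_docs=MAX_DOCS):
    # Assumed idf: 1 + ln(maxDocs / (docFreq + 1)).
    return 1.0 + math.log(max_docs / (doc_freq + 1))

def term_score(boost, doc_freq, freq, field_norm):
    # queryWeight = boost * idf * queryNorm
    query_weight = boost * idf(doc_freq) * QUERY_NORM
    # fieldWeight = sqrt(freq) * idf * fieldNorm
    field_weight = math.sqrt(freq) * idf(doc_freq) * field_norm
    return query_weight * field_weight

# (term, boost, docFreq, termFreq) for doc 401, taken from the listing.
terms = [
    ("texten", 1.2490605, 89, 1.0),
    ("lingo", 1.598749, 11, 1.0),
    ("automatische", 2.4011705, 118, 1.0),
    ("mehrwortgruppen", 3.4380908, 5, 3.0),
]

FIELD_NORM = 0.125  # fieldNorm(doc=401)
total = sum(term_score(b, df, f, FIELD_NORM) for _, b, df, f in terms)

# coord(4/25): 4 of the 25 query terms matched this document.
coord = 4 / 25
score = total * coord
print(f"{score:.3f}")  # roughly 0.192, as in the listing
```

The coordination factor explains why entry 1, which matches 11 of the 25 query terms, ranks well above the other entries even though their unscaled sums are of similar size.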