Document (#31990)

Author
Fichtner, K.
Title
Boyer-Moore Suchalgorithmus
Imprint
Münster : Institut für Wirtschaftsinformatik der Westfälische Wilhelms-Universität Münster
Year
2005
Pages
26 S
Abstract
Die Masse der Suchalgorithmen lässt sich in zwei grundlegend verschiedene Teilbereiche untergliedern. Auf der einen Seite stehen Algorithmen, die auf komplexen Datenstrukturen (häufig baumartig) ganze Datensätze unter Verwendung eines Indizes finden. Als geläufiger Vertreter sei hier die binäre Suche auf sortierten Arrays oder in binären Bäumen genannt. Die andere Gruppe, der sich diese Ausarbeitung widmet, dient dazu, Entsprechungen von Mustern in gegebenen Zeichenketten zu finden. Auf den folgenden Seiten werden nun zunächst einige Begriffe eingeführt, die für das weitere Verständnis und einen Vergleich verschiedener Suchalgorithmen nötig sind. Weiterhin wird ein naiver Suchalgorithmus dargestellt und mit der Idee von Boyer und Moore verglichen. Hierzu wird ihr Algorithmus zunächst informal beschrieben, dann mit Blick auf eine Implementation näher erläutert und anschließend einer Effizienzanalyse - sowohl empirisch als auch theoretisch - unterzogen. Abschließend findet eine kurze Bewertung mit Bezug auf Schwachstellen, Vorzüge und Verbesserungsmöglichkeiten statt, im Zuge derer einige prominente Modifikationen des Boyer-Moore Algorithmus vorgestellt werden.
Content
Ausarbeitung im Rahmen des Seminars Suchmaschinen und Suchalgorithmen, Institut für Wirtschaftsinformatik Praktische Informatik in der Wirtschaft, Westfälische Wilhelms-Universität Münster. - Vgl.: http://www-wi.uni-muenster.de/pi/lehre/ss05/seminarSuchen/Ausarbeitungen/KristoferFichtner.pdf
Theme
Retrievalalgorithmen
Object
Boyer-Moore Suchalgorithmus

Similar documents (content)

  1. Ackermann, J.: Knuth-Morris-Pratt (2005) 0.38
    0.3841519 = sum of:
      0.3841519 = product of:
        1.2004747 = sum of:
          0.020348463 = weight(abstract_txt:einen in 865) [ClassicSimilarity], result of:
            0.020348463 = score(doc=865,freq=2.0), product of:
              0.061680764 = queryWeight, product of:
                1.0441401 = boost
                4.2655873 = idf(docFreq=1687, maxDocs=44218)
                0.013848799 = queryNorm
              0.32989967 = fieldWeight in 865, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                4.2655873 = idf(docFreq=1687, maxDocs=44218)
                0.0546875 = fieldNorm(doc=865)
          0.06888468 = weight(abstract_txt:mustern in 865) [ClassicSimilarity], result of:
            0.06888468 = score(doc=865,freq=1.0), product of:
              0.13906263 = queryWeight, product of:
                1.1085981 = boost
                9.05783 = idf(docFreq=13, maxDocs=44218)
                0.013848799 = queryNorm
              0.49535006 = fieldWeight in 865, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                9.05783 = idf(docFreq=13, maxDocs=44218)
                0.0546875 = fieldNorm(doc=865)
          0.12739295 = weight(abstract_txt:zeichenketten in 865) [ClassicSimilarity], result of:
            0.12739295 = score(doc=865,freq=2.0), product of:
              0.16629617 = queryWeight, product of:
                1.2122998 = boost
                9.905128 = idf(docFreq=5, maxDocs=44218)
                0.013848799 = queryNorm
              0.7660606 = fieldWeight in 865, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                9.905128 = idf(docFreq=5, maxDocs=44218)
                0.0546875 = fieldNorm(doc=865)
          0.047038812 = weight(abstract_txt:finden in 865) [ClassicSimilarity], result of:
            0.047038812 = score(doc=865,freq=2.0), product of:
              0.107836604 = queryWeight, product of:
                1.3805972 = boost
                5.6401033 = idf(docFreq=426, maxDocs=44218)
                0.013848799 = queryNorm
              0.4362045 = fieldWeight in 865, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                5.6401033 = idf(docFreq=426, maxDocs=44218)
                0.0546875 = fieldNorm(doc=865)
          0.25428176 = weight(abstract_txt:algorithmus in 865) [ClassicSimilarity], result of:
            0.25428176 = score(doc=865,freq=7.0), product of:
              0.21876748 = queryWeight, product of:
                1.9664155 = boost
                8.033325 = idf(docFreq=38, maxDocs=44218)
                0.013848799 = queryNorm
              1.162338 = fieldWeight in 865, product of:
                2.6457512 = tf(freq=7.0), with freq of:
                  7.0 = termFreq=7.0
                8.033325 = idf(docFreq=38, maxDocs=44218)
                0.0546875 = fieldNorm(doc=865)
          0.16491473 = weight(abstract_txt:suchalgorithmen in 865) [ClassicSimilarity], result of:
            0.16491473 = score(doc=865,freq=1.0), product of:
              0.31355348 = queryWeight, product of:
                2.3541803 = boost
                9.617446 = idf(docFreq=7, maxDocs=44218)
                0.013848799 = queryNorm
              0.52595407 = fieldWeight in 865, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                9.617446 = idf(docFreq=7, maxDocs=44218)
                0.0546875 = fieldNorm(doc=865)
          0.24737206 = weight(abstract_txt:moore in 865) [ClassicSimilarity], result of:
            0.24737206 = score(doc=865,freq=1.0), product of:
              0.47033015 = queryWeight, product of:
                3.5312703 = boost
                9.617446 = idf(docFreq=7, maxDocs=44218)
                0.013848799 = queryNorm
              0.52595407 = fieldWeight in 865, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                9.617446 = idf(docFreq=7, maxDocs=44218)
                0.0546875 = fieldNorm(doc=865)
          0.27024123 = weight(abstract_txt:boyer in 865) [ClassicSimilarity], result of:
            0.27024123 = score(doc=865,freq=1.0), product of:
              0.4988885 = queryWeight, product of:
                3.6368992 = boost
                9.905128 = idf(docFreq=5, maxDocs=44218)
                0.013848799 = queryNorm
              0.54168665 = fieldWeight in 865, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                9.905128 = idf(docFreq=5, maxDocs=44218)
                0.0546875 = fieldNorm(doc=865)
        0.32 = coord(8/25)
    
  2. Uratani, N.; Takeda, M.: ¬A fast string-searching algorithm for multiple patterns (1993) 0.07
    0.070986964 = sum of:
      0.070986964 = product of:
        0.8873371 = sum of:
          0.42406636 = weight(abstract_txt:moore in 6275) [ClassicSimilarity], result of:
            0.42406636 = score(doc=6275,freq=1.0), product of:
              0.47033015 = queryWeight, product of:
                3.5312703 = boost
                9.617446 = idf(docFreq=7, maxDocs=44218)
                0.013848799 = queryNorm
              0.9016355 = fieldWeight in 6275, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                9.617446 = idf(docFreq=7, maxDocs=44218)
                0.09375 = fieldNorm(doc=6275)
          0.4632707 = weight(abstract_txt:boyer in 6275) [ClassicSimilarity], result of:
            0.4632707 = score(doc=6275,freq=1.0), product of:
              0.4988885 = queryWeight, product of:
                3.6368992 = boost
                9.905128 = idf(docFreq=5, maxDocs=44218)
                0.013848799 = queryNorm
              0.9286057 = fieldWeight in 6275, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                9.905128 = idf(docFreq=5, maxDocs=44218)
                0.09375 = fieldNorm(doc=6275)
        0.08 = coord(2/25)
    
  3. Korves, J.: Seiten bewerten : Googles PageRank (2005) 0.05
    0.053435855 = sum of:
      0.053435855 = product of:
        0.3339741 = sum of:
          0.05647102 = weight(abstract_txt:ausarbeitung in 866) [ClassicSimilarity], result of:
            0.05647102 = score(doc=866,freq=1.0), product of:
              0.13499269 = queryWeight, product of:
                1.092255 = boost
                8.924298 = idf(docFreq=15, maxDocs=44218)
                0.013848799 = queryNorm
              0.4183265 = fieldWeight in 866, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                8.924298 = idf(docFreq=15, maxDocs=44218)
                0.046875 = fieldNorm(doc=866)
          0.030781446 = weight(abstract_txt:einige in 866) [ClassicSimilarity], result of:
            0.030781446 = score(doc=866,freq=1.0), product of:
              0.11349129 = queryWeight, product of:
                1.4163324 = boost
                5.7860904 = idf(docFreq=368, maxDocs=44218)
                0.013848799 = queryNorm
              0.27122298 = fieldWeight in 866, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                5.7860904 = idf(docFreq=368, maxDocs=44218)
                0.046875 = fieldNorm(doc=866)
          0.044933777 = weight(abstract_txt:zunächst in 866) [ClassicSimilarity], result of:
            0.044933777 = score(doc=866,freq=2.0), product of:
              0.115915574 = queryWeight, product of:
                1.4313796 = boost
                5.8475623 = idf(docFreq=346, maxDocs=44218)
                0.013848799 = queryNorm
              0.38764226 = fieldWeight in 866, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                5.8475623 = idf(docFreq=346, maxDocs=44218)
                0.046875 = fieldNorm(doc=866)
          0.20178784 = weight(abstract_txt:algorithmus in 866) [ClassicSimilarity], result of:
            0.20178784 = score(doc=866,freq=6.0), product of:
              0.21876748 = queryWeight, product of:
                1.9664155 = boost
                8.033325 = idf(docFreq=38, maxDocs=44218)
                0.013848799 = queryNorm
              0.92238504 = fieldWeight in 866, product of:
                2.4494898 = tf(freq=6.0), with freq of:
                  6.0 = termFreq=6.0
                8.033325 = idf(docFreq=38, maxDocs=44218)
                0.046875 = fieldNorm(doc=866)
        0.16 = coord(4/25)
    
  4. Westermeyer, D.: Adaptive Techniken zur Informationsgewinnung : der Webcrawler InfoSpiders (2005) 0.04
    0.03643675 = sum of:
      0.03643675 = product of:
        0.3036396 = sum of:
          0.07102819 = weight(abstract_txt:unterzogen in 4333) [ClassicSimilarity], result of:
            0.07102819 = score(doc=4333,freq=1.0), product of:
              0.12984379 = queryWeight, product of:
                1.0712221 = boost
                8.752448 = idf(docFreq=18, maxDocs=44218)
                0.013848799 = queryNorm
              0.547028 = fieldWeight in 4333, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                8.752448 = idf(docFreq=18, maxDocs=44218)
                0.0625 = fieldNorm(doc=4333)
          0.04236397 = weight(abstract_txt:zunächst in 4333) [ClassicSimilarity], result of:
            0.04236397 = score(doc=4333,freq=1.0), product of:
              0.115915574 = queryWeight, product of:
                1.4313796 = boost
                5.8475623 = idf(docFreq=346, maxDocs=44218)
                0.013848799 = queryNorm
              0.36547264 = fieldWeight in 4333, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                5.8475623 = idf(docFreq=346, maxDocs=44218)
                0.0625 = fieldNorm(doc=4333)
          0.19024742 = weight(abstract_txt:algorithmus in 4333) [ClassicSimilarity], result of:
            0.19024742 = score(doc=4333,freq=3.0), product of:
              0.21876748 = queryWeight, product of:
                1.9664155 = boost
                8.033325 = idf(docFreq=38, maxDocs=44218)
                0.013848799 = queryNorm
              0.86963296 = fieldWeight in 4333, product of:
                1.7320508 = tf(freq=3.0), with freq of:
                  3.0 = termFreq=3.0
                8.033325 = idf(docFreq=38, maxDocs=44218)
                0.0625 = fieldNorm(doc=4333)
        0.12 = coord(3/25)
    
  5. Fischer, W.L.: Äquivalenz- und Toleranzstrukturen in der Linguistik : zur Theorie der Synonyma (1973) 0.03
    0.03341757 = sum of:
      0.03341757 = product of:
        0.16708785 = sum of:
          0.040931948 = weight(abstract_txt:gegebenen in 66) [ClassicSimilarity], result of:
            0.040931948 = score(doc=66,freq=1.0), product of:
              0.12300486 = queryWeight, product of:
                1.0426296 = boost
                8.518833 = idf(docFreq=23, maxDocs=44218)
                0.013848799 = queryNorm
              0.33276692 = fieldWeight in 66, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                8.518833 = idf(docFreq=23, maxDocs=44218)
                0.0390625 = fieldNorm(doc=66)
          0.014534617 = weight(abstract_txt:einen in 66) [ClassicSimilarity], result of:
            0.014534617 = score(doc=66,freq=2.0), product of:
              0.061680764 = queryWeight, product of:
                1.0441401 = boost
                4.2655873 = idf(docFreq=1687, maxDocs=44218)
                0.013848799 = queryNorm
              0.23564263 = fieldWeight in 66, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                4.2655873 = idf(docFreq=1687, maxDocs=44218)
                0.0390625 = fieldNorm(doc=66)
          0.061385613 = weight(abstract_txt:entsprechungen in 66) [ClassicSimilarity], result of:
            0.061385613 = score(doc=66,freq=1.0), product of:
              0.16116042 = queryWeight, product of:
                1.1934332 = boost
                9.7509775 = idf(docFreq=6, maxDocs=44218)
                0.013848799 = queryNorm
              0.38089755 = fieldWeight in 66, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                9.7509775 = idf(docFreq=6, maxDocs=44218)
                0.0390625 = fieldNorm(doc=66)
          0.023758186 = weight(abstract_txt:finden in 66) [ClassicSimilarity], result of:
            0.023758186 = score(doc=66,freq=1.0), product of:
              0.107836604 = queryWeight, product of:
                1.3805972 = boost
                5.6401033 = idf(docFreq=426, maxDocs=44218)
                0.013848799 = queryNorm
              0.22031653 = fieldWeight in 66, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                5.6401033 = idf(docFreq=426, maxDocs=44218)
                0.0390625 = fieldNorm(doc=66)
          0.026477482 = weight(abstract_txt:zunächst in 66) [ClassicSimilarity], result of:
            0.026477482 = score(doc=66,freq=1.0), product of:
              0.115915574 = queryWeight, product of:
                1.4313796 = boost
                5.8475623 = idf(docFreq=346, maxDocs=44218)
                0.013848799 = queryNorm
              0.2284204 = fieldWeight in 66, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                5.8475623 = idf(docFreq=346, maxDocs=44218)
                0.0390625 = fieldNorm(doc=66)
        0.2 = coord(5/25)