Document (#32912)

Author
Nottelmann, H.
Straccia, U.
Title
Information retrieval and machine learning for probabilistic schema matching
Source
Information processing and management. 43(2007) no.3, pp.552-576
Year
2007
Abstract
Schema matching is the problem of finding correspondences (mapping rules, e.g. logical formulae) between heterogeneous schemas, e.g. in the data exchange domain or for distributed IR in federated digital libraries. This paper introduces a probabilistic framework, called sPLMap, for automatically learning schema mapping rules based on given instances of both schemas. Different techniques, mostly from the IR and machine learning fields, are combined to find suitable mapping candidates. Our approach gives a probabilistic interpretation of the prediction weights of the candidates, selects the rule set with the highest matching probability, and outputs probabilistic rules that are capable of dealing with the intrinsic uncertainty of the mapping process. The approach, in different variants, has been evaluated on several test sets.
Footnote
Contribution to: Special issue on Heterogeneous and Distributed IR

Similar documents (content)

  1. Mao, M.: Ontology mapping : towards semantic interoperability in distributed and heterogeneous environments (2008) 0.41
    0.41272894 = sum of:
      0.41272894 = product of:
        1.0318223 = sum of:
          0.026232358 = weight(abstract_txt:different in 4659) [ClassicSimilarity], result of:
            0.026232358 = score(doc=4659,freq=4.0), product of:
              0.06543134 = queryWeight, product of:
                1.064825 = boost
                3.6655018 = idf(docFreq=3075, maxDocs=44218)
                0.016763864 = queryNorm
              0.40091425 = fieldWeight in 4659, product of:
                2.0 = tf(freq=4.0), with freq of:
                  4.0 = termFreq=4.0
                3.6655018 = idf(docFreq=3075, maxDocs=44218)
                0.0546875 = fieldNorm(doc=4659)
          0.024234677 = weight(abstract_txt:approach in 4659) [ClassicSimilarity], result of:
            0.024234677 = score(doc=4659,freq=3.0), product of:
              0.068312265 = queryWeight, product of:
                1.0880145 = boost
                3.745328 = idf(docFreq=2839, maxDocs=44218)
                0.016763864 = queryNorm
              0.35476318 = fieldWeight in 4659, product of:
                1.7320508 = tf(freq=3.0), with freq of:
                  3.0 = termFreq=3.0
                3.745328 = idf(docFreq=2839, maxDocs=44218)
                0.0546875 = fieldNorm(doc=4659)
          0.12626432 = weight(abstract_txt:correspondences in 4659) [ClassicSimilarity], result of:
            0.12626432 = score(doc=4659,freq=2.0), product of:
              0.18652973 = queryWeight, product of:
                1.2712892 = boost
                8.752448 = idf(docFreq=18, maxDocs=44218)
                0.016763864 = queryNorm
              0.6769126 = fieldWeight in 4659, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                8.752448 = idf(docFreq=18, maxDocs=44218)
                0.0546875 = fieldNorm(doc=4659)
          0.055393543 = weight(abstract_txt:machine in 4659) [ClassicSimilarity], result of:
            0.055393543 = score(doc=4659,freq=2.0), product of:
              0.1356887 = queryWeight, product of:
                1.5334061 = boost
                5.2785225 = idf(docFreq=612, maxDocs=44218)
                0.016763864 = queryNorm
              0.4082399 = fieldWeight in 4659, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                5.2785225 = idf(docFreq=612, maxDocs=44218)
                0.0546875 = fieldNorm(doc=4659)
          0.06172179 = weight(abstract_txt:finding in 4659) [ClassicSimilarity], result of:
            0.06172179 = score(doc=4659,freq=2.0), product of:
              0.1458355 = queryWeight, product of:
                1.5897067 = boost
                5.4723287 = idf(docFreq=504, maxDocs=44218)
                0.016763864 = queryNorm
              0.42322883 = fieldWeight in 4659, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                5.4723287 = idf(docFreq=504, maxDocs=44218)
                0.0546875 = fieldNorm(doc=4659)
          0.0856739 = weight(abstract_txt:learning in 4659) [ClassicSimilarity], result of:
            0.0856739 = score(doc=4659,freq=4.0), product of:
              0.16487585 = queryWeight, product of:
                2.0701866 = boost
                4.750873 = idf(docFreq=1038, maxDocs=44218)
                0.016763864 = queryNorm
              0.51962674 = fieldWeight in 4659, product of:
                2.0 = tf(freq=4.0), with freq of:
                  4.0 = termFreq=4.0
                4.750873 = idf(docFreq=1038, maxDocs=44218)
                0.0546875 = fieldNorm(doc=4659)
          0.057377513 = weight(abstract_txt:rules in 4659) [ClassicSimilarity], result of:
            0.057377513 = score(doc=4659,freq=1.0), product of:
              0.20034224 = queryWeight, product of:
                2.2820084 = boost
                5.236983 = idf(docFreq=638, maxDocs=44218)
                0.016763864 = queryNorm
              0.2863975 = fieldWeight in 4659, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                5.236983 = idf(docFreq=638, maxDocs=44218)
                0.0546875 = fieldNorm(doc=4659)
          0.088372044 = weight(abstract_txt:matching in 4659) [ClassicSimilarity], result of:
            0.088372044 = score(doc=4659,freq=1.0), product of:
              0.26719067 = queryWeight, product of:
                2.63537 = boost
                6.047913 = idf(docFreq=283, maxDocs=44218)
                0.016763864 = queryNorm
              0.33074525 = fieldWeight in 4659, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                6.047913 = idf(docFreq=283, maxDocs=44218)
                0.0546875 = fieldNorm(doc=4659)
          0.34077242 = weight(abstract_txt:mapping in 4659) [ClassicSimilarity], result of:
            0.34077242 = score(doc=4659,freq=11.0), product of:
              0.32516438 = queryWeight, product of:
                3.3570037 = boost
                5.777993 = idf(docFreq=371, maxDocs=44218)
                0.016763864 = queryNorm
              1.0480005 = fieldWeight in 4659, product of:
                3.3166249 = tf(freq=11.0), with freq of:
                  11.0 = termFreq=11.0
                5.777993 = idf(docFreq=371, maxDocs=44218)
                0.0546875 = fieldNorm(doc=4659)
          0.16577975 = weight(abstract_txt:probabilistic in 4659) [ClassicSimilarity], result of:
            0.16577975 = score(doc=4659,freq=1.0), product of:
              0.4473134 = queryWeight, product of:
                3.9373705 = boost
                6.7769065 = idf(docFreq=136, maxDocs=44218)
                0.016763864 = queryNorm
              0.37061208 = fieldWeight in 4659, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                6.7769065 = idf(docFreq=136, maxDocs=44218)
                0.0546875 = fieldNorm(doc=4659)
        0.4 = coord(10/25)
    
  2. Fuhr, N.: Probabilistic datalog : implementing logical information retrieval for advanced applications (2000) 0.19
    0.18510535 = sum of:
      0.18510535 = product of:
        0.7712723 = sum of:
          0.09932407 = weight(abstract_txt:probability in 4380) [ClassicSimilarity], result of:
            0.09932407 = score(doc=4380,freq=1.0), product of:
              0.11541422 = queryWeight, product of:
                6.8847027 = idf(docFreq=122, maxDocs=44218)
                0.016763864 = queryNorm
              0.86058784 = fieldWeight in 4380, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                6.8847027 = idf(docFreq=122, maxDocs=44218)
                0.125 = fieldNorm(doc=4380)
          0.11563031 = weight(abstract_txt:weights in 4380) [ClassicSimilarity], result of:
            0.11563031 = score(doc=4380,freq=1.0), product of:
              0.12772349 = queryWeight, product of:
                1.0519757 = boost
                7.24254 = idf(docFreq=85, maxDocs=44218)
                0.016763864 = queryNorm
              0.9053175 = fieldWeight in 4380, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                7.24254 = idf(docFreq=85, maxDocs=44218)
                0.125 = fieldNorm(doc=4380)
          0.03198148 = weight(abstract_txt:approach in 4380) [ClassicSimilarity], result of:
            0.03198148 = score(doc=4380,freq=1.0), product of:
              0.068312265 = queryWeight, product of:
                1.0880145 = boost
                3.745328 = idf(docFreq=2839, maxDocs=44218)
                0.016763864 = queryNorm
              0.468166 = fieldWeight in 4380, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                3.745328 = idf(docFreq=2839, maxDocs=44218)
                0.125 = fieldNorm(doc=4380)
          0.0142626865 = weight(abstract_txt:with in 4380) [ClassicSimilarity], result of:
            0.0142626865 = score(doc=4380,freq=1.0), product of:
              0.045645464 = queryWeight, product of:
                1.0892556 = boost
                2.4997334 = idf(docFreq=9868, maxDocs=44218)
                0.016763864 = queryNorm
              0.31246668 = fieldWeight in 4380, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                2.4997334 = idf(docFreq=9868, maxDocs=44218)
                0.125 = fieldNorm(doc=4380)
          0.1311486 = weight(abstract_txt:rules in 4380) [ClassicSimilarity], result of:
            0.1311486 = score(doc=4380,freq=1.0), product of:
              0.20034224 = queryWeight, product of:
                2.2820084 = boost
                5.236983 = idf(docFreq=638, maxDocs=44218)
                0.016763864 = queryNorm
              0.65462285 = fieldWeight in 4380, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                5.236983 = idf(docFreq=638, maxDocs=44218)
                0.125 = fieldNorm(doc=4380)
          0.37892514 = weight(abstract_txt:probabilistic in 4380) [ClassicSimilarity], result of:
            0.37892514 = score(doc=4380,freq=1.0), product of:
              0.4473134 = queryWeight, product of:
                3.9373705 = boost
                6.7769065 = idf(docFreq=136, maxDocs=44218)
                0.016763864 = queryNorm
              0.8471133 = fieldWeight in 4380, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                6.7769065 = idf(docFreq=136, maxDocs=44218)
                0.125 = fieldNorm(doc=4380)
        0.24 = coord(6/25)
    
  3. Euzenat, J.; Shvaiko, P.: Ontology matching (2010) 0.17
    0.17453405 = sum of:
      0.17453405 = product of:
        0.72722524 = sum of:
          0.032127947 = weight(abstract_txt:different in 168) [ClassicSimilarity], result of:
            0.032127947 = score(doc=168,freq=6.0), product of:
              0.06543134 = queryWeight, product of:
                1.064825 = boost
                3.6655018 = idf(docFreq=3075, maxDocs=44218)
                0.016763864 = queryNorm
              0.4910177 = fieldWeight in 168, product of:
                2.4494898 = tf(freq=6.0), with freq of:
                  6.0 = termFreq=6.0
                3.6655018 = idf(docFreq=3075, maxDocs=44218)
                0.0546875 = fieldNorm(doc=168)
          0.0062399255 = weight(abstract_txt:with in 168) [ClassicSimilarity], result of:
            0.0062399255 = score(doc=168,freq=1.0), product of:
              0.045645464 = queryWeight, product of:
                1.0892556 = boost
                2.4997334 = idf(docFreq=9868, maxDocs=44218)
                0.016763864 = queryNorm
              0.13670418 = fieldWeight in 168, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                2.4997334 = idf(docFreq=9868, maxDocs=44218)
                0.0546875 = fieldNorm(doc=168)
          0.12626432 = weight(abstract_txt:correspondences in 168) [ClassicSimilarity], result of:
            0.12626432 = score(doc=168,freq=2.0), product of:
              0.18652973 = queryWeight, product of:
                1.2712892 = boost
                8.752448 = idf(docFreq=18, maxDocs=44218)
                0.016763864 = queryNorm
              0.6769126 = fieldWeight in 168, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                8.752448 = idf(docFreq=18, maxDocs=44218)
                0.0546875 = fieldNorm(doc=168)
          0.043643896 = weight(abstract_txt:finding in 168) [ClassicSimilarity], result of:
            0.043643896 = score(doc=168,freq=1.0), product of:
              0.1458355 = queryWeight, product of:
                1.5897067 = boost
                5.4723287 = idf(docFreq=504, maxDocs=44218)
                0.016763864 = queryNorm
              0.29926798 = fieldWeight in 168, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                5.4723287 = idf(docFreq=504, maxDocs=44218)
                0.0546875 = fieldNorm(doc=168)
          0.3061297 = weight(abstract_txt:matching in 168) [ClassicSimilarity], result of:
            0.3061297 = score(doc=168,freq=12.0), product of:
              0.26719067 = queryWeight, product of:
                2.63537 = boost
                6.047913 = idf(docFreq=283, maxDocs=44218)
                0.016763864 = queryNorm
              1.145735 = fieldWeight in 168, product of:
                3.4641016 = tf(freq=12.0), with freq of:
                  12.0 = termFreq=12.0
                6.047913 = idf(docFreq=283, maxDocs=44218)
                0.0546875 = fieldNorm(doc=168)
          0.21281947 = weight(abstract_txt:schema in 168) [ClassicSimilarity], result of:
            0.21281947 = score(doc=168,freq=4.0), product of:
              0.30241194 = queryWeight, product of:
                2.8036928 = boost
                6.434197 = idf(docFreq=192, maxDocs=44218)
                0.016763864 = queryNorm
              0.7037403 = fieldWeight in 168, product of:
                2.0 = tf(freq=4.0), with freq of:
                  4.0 = termFreq=4.0
                6.434197 = idf(docFreq=192, maxDocs=44218)
                0.0546875 = fieldNorm(doc=168)
        0.24 = coord(6/25)
    
  4. Chan, L.M.; Zeng, M.L.: Metadata interoperability and standardization - a study of methodology, part II : achieving interoperability at the record and repository levels (2006) 0.14
    0.14297299 = sum of:
      0.14297299 = product of:
        0.5957208 = sum of:
          0.02271789 = weight(abstract_txt:different in 1177) [ClassicSimilarity], result of:
            0.02271789 = score(doc=1177,freq=3.0), product of:
              0.06543134 = queryWeight, product of:
                1.064825 = boost
                3.6655018 = idf(docFreq=3075, maxDocs=44218)
                0.016763864 = queryNorm
              0.34720194 = fieldWeight in 1177, product of:
                1.7320508 = tf(freq=3.0), with freq of:
                  3.0 = termFreq=3.0
                3.6655018 = idf(docFreq=3075, maxDocs=44218)
                0.0546875 = fieldNorm(doc=1177)
          0.010807867 = weight(abstract_txt:with in 1177) [ClassicSimilarity], result of:
            0.010807867 = score(doc=1177,freq=3.0), product of:
              0.045645464 = queryWeight, product of:
                1.0892556 = boost
                2.4997334 = idf(docFreq=9868, maxDocs=44218)
                0.016763864 = queryNorm
              0.23677856 = fieldWeight in 1177, product of:
                1.7320508 = tf(freq=3.0), with freq of:
                  3.0 = termFreq=3.0
                2.4997334 = idf(docFreq=9868, maxDocs=44218)
                0.0546875 = fieldNorm(doc=1177)
          0.08002351 = weight(abstract_txt:federated in 1177) [ClassicSimilarity], result of:
            0.08002351 = score(doc=1177,freq=1.0), product of:
              0.1734001 = queryWeight, product of:
                1.2257305 = boost
                8.43879 = idf(docFreq=25, maxDocs=44218)
                0.016763864 = queryNorm
              0.46149635 = fieldWeight in 1177, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                8.43879 = idf(docFreq=25, maxDocs=44218)
                0.0546875 = fieldNorm(doc=1177)
          0.18637963 = weight(abstract_txt:schemas in 1177) [ClassicSimilarity], result of:
            0.18637963 = score(doc=1177,freq=3.0), product of:
              0.26615795 = queryWeight, product of:
                2.1476083 = boost
                7.3928223 = idf(docFreq=73, maxDocs=44218)
                0.016763864 = queryNorm
              0.70025945 = fieldWeight in 1177, product of:
                1.7320508 = tf(freq=3.0), with freq of:
                  3.0 = termFreq=3.0
                7.3928223 = idf(docFreq=73, maxDocs=44218)
                0.0546875 = fieldNorm(doc=1177)
          0.15048608 = weight(abstract_txt:schema in 1177) [ClassicSimilarity], result of:
            0.15048608 = score(doc=1177,freq=2.0), product of:
              0.30241194 = queryWeight, product of:
                2.8036928 = boost
                6.434197 = idf(docFreq=192, maxDocs=44218)
                0.016763864 = queryNorm
              0.4976195 = fieldWeight in 1177, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                6.434197 = idf(docFreq=192, maxDocs=44218)
                0.0546875 = fieldNorm(doc=1177)
          0.14530583 = weight(abstract_txt:mapping in 1177) [ClassicSimilarity], result of:
            0.14530583 = score(doc=1177,freq=2.0), product of:
              0.32516438 = queryWeight, product of:
                3.3570037 = boost
                5.777993 = idf(docFreq=371, maxDocs=44218)
                0.016763864 = queryNorm
              0.44686884 = fieldWeight in 1177, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                5.777993 = idf(docFreq=371, maxDocs=44218)
                0.0546875 = fieldNorm(doc=1177)
        0.24 = coord(6/25)
    
  5. Euzenat, J.; Bach, T.Le; Barrasa, J.; Bouquet, P.; Bo, J.De; Dieng, R.; Ehrig, M.; Hauswirth, M.; Jarrar, M.; Lara, R.; Maynard, D.; Napoli, A.; Stamou, G.; Stuckenschmidt, H.; Shvaiko, P.; Tessaris, S.; Acker, S. Van; Zaihrayeu, I.: State of the art on ontology alignment (2004) 0.12
    0.11853657 = sum of:
      0.11853657 = product of:
        0.42334488 = sum of:
          0.04651966 = weight(abstract_txt:instances in 172) [ClassicSimilarity], result of:
            0.04651966 = score(doc=172,freq=1.0), product of:
              0.12078007 = queryWeight, product of:
                1.022982 = boost
                7.042927 = idf(docFreq=104, maxDocs=44218)
                0.016763864 = queryNorm
              0.38516006 = fieldWeight in 172, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                7.042927 = idf(docFreq=104, maxDocs=44218)
                0.0546875 = fieldNorm(doc=172)
          0.013116179 = weight(abstract_txt:different in 172) [ClassicSimilarity], result of:
            0.013116179 = score(doc=172,freq=1.0), product of:
              0.06543134 = queryWeight, product of:
                1.064825 = boost
                3.6655018 = idf(docFreq=3075, maxDocs=44218)
                0.016763864 = queryNorm
              0.20045713 = fieldWeight in 172, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                3.6655018 = idf(docFreq=3075, maxDocs=44218)
                0.0546875 = fieldNorm(doc=172)
          0.0062399255 = weight(abstract_txt:with in 172) [ClassicSimilarity], result of:
            0.0062399255 = score(doc=172,freq=1.0), product of:
              0.045645464 = queryWeight, product of:
                1.0892556 = boost
                2.4997334 = idf(docFreq=9868, maxDocs=44218)
                0.016763864 = queryNorm
              0.13670418 = fieldWeight in 172, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                2.4997334 = idf(docFreq=9868, maxDocs=44218)
                0.0546875 = fieldNorm(doc=172)
          0.03916915 = weight(abstract_txt:machine in 172) [ClassicSimilarity], result of:
            0.03916915 = score(doc=172,freq=1.0), product of:
              0.1356887 = queryWeight, product of:
                1.5334061 = boost
                5.2785225 = idf(docFreq=612, maxDocs=44218)
                0.016763864 = queryNorm
              0.2886692 = fieldWeight in 172, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                5.2785225 = idf(docFreq=612, maxDocs=44218)
                0.0546875 = fieldNorm(doc=172)
          0.04283695 = weight(abstract_txt:learning in 172) [ClassicSimilarity], result of:
            0.04283695 = score(doc=172,freq=1.0), product of:
              0.16487585 = queryWeight, product of:
                2.0701866 = boost
                4.750873 = idf(docFreq=1038, maxDocs=44218)
                0.016763864 = queryNorm
              0.25981337 = fieldWeight in 172, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.750873 = idf(docFreq=1038, maxDocs=44218)
                0.0546875 = fieldNorm(doc=172)
          0.12497694 = weight(abstract_txt:matching in 172) [ClassicSimilarity], result of:
            0.12497694 = score(doc=172,freq=2.0), product of:
              0.26719067 = queryWeight, product of:
                2.63537 = boost
                6.047913 = idf(docFreq=283, maxDocs=44218)
                0.016763864 = queryNorm
              0.4677444 = fieldWeight in 172, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                6.047913 = idf(docFreq=283, maxDocs=44218)
                0.0546875 = fieldNorm(doc=172)
          0.15048608 = weight(abstract_txt:schema in 172) [ClassicSimilarity], result of:
            0.15048608 = score(doc=172,freq=2.0), product of:
              0.30241194 = queryWeight, product of:
                2.8036928 = boost
                6.434197 = idf(docFreq=192, maxDocs=44218)
                0.016763864 = queryNorm
              0.4976195 = fieldWeight in 172, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                6.434197 = idf(docFreq=192, maxDocs=44218)
                0.0546875 = fieldNorm(doc=172)
        0.28 = coord(7/25)
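
The score trees above all follow the same arithmetic, which can be sketched as below. This is a minimal illustration inferred from the numbers in the listing and from the TF-IDF scheme that Lucene's ClassicSimilarity is documented to use (tf as the square root of the term frequency, idf as 1 + ln(maxDocs / (docFreq + 1)), and a coord factor for the share of query terms matched). The function names are hypothetical and are not part of any library. fieldNorm and queryNorm are taken from the listing as given rather than recomputed, and small differences in the last digits are expected because the listing shows single-precision values.

```python
import math


def idf(doc_freq, max_docs):
    # Inverse document frequency as it appears in the listing,
    # e.g. idf(docFreq=3075, maxDocs=44218) is about 3.6655.
    return 1.0 + math.log(max_docs / (doc_freq + 1))


def term_score(freq, doc_freq, max_docs, boost, query_norm, field_norm):
    # score = queryWeight * fieldWeight
    tf = math.sqrt(freq)                          # tf(freq=4.0) = 2.0
    term_idf = idf(doc_freq, max_docs)
    query_weight = boost * term_idf * query_norm  # boost * idf * queryNorm
    field_weight = tf * term_idf * field_norm     # tf * idf * fieldNorm
    return query_weight * field_weight


def document_score(term_scores, matched_terms, query_terms):
    # Final score = sum of term scores * coord(matched/total).
    return sum(term_scores) * (matched_terms / query_terms)


# Term "different" in document 4659 (first entry of the list above).
different_4659 = term_score(
    freq=4.0, doc_freq=3075, max_docs=44218,
    boost=1.064825, query_norm=0.016763864, field_norm=0.0546875,
)

# The ten per-term scores listed for document 4659, combined with coord(10/25).
scores_4659 = [
    0.026232358, 0.024234677, 0.12626432, 0.055393543, 0.06172179,
    0.0856739, 0.057377513, 0.088372044, 0.34077242, 0.16577975,
]
total_4659 = document_score(scores_4659, matched_terms=10, query_terms=25)

print(round(different_4659, 6), round(total_4659, 6))
```

Read this way, the ranking mostly reflects rare, heavily boosted terms such as "probabilistic", "mapping" and "schema", while common terms such as "with" contribute very little. The coord factor further lowers the score of documents that match only a few of the 25 query terms.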