Document (#38575)

Wisser, K.
¬The errors of our ways : using metadata quality research to understand common error patterns in the application of name headings
Metadata and semantics research: 8th Research Conference, MTSR 2014, Karlsruhe, Germany, November 27-29, 2014, Proceedings. Eds.: S. Closs et al.
Cham : Springer
Communications in computer and information science; 478
Using data culled during a metadata quality research project for the Social Network and Archival Context (SNAC) project, this article discusses common errors and problems in the use of standardized languages, specifically unambiguous names for persons and corporate bodies. Errors such as misspelling, qualifiers, format, and mis-encoding point to several areas where quality control measures can improve aggregation of data. Results from a large data set indicate that there are predictable problems that can be retrospectively corrected before aggregation. This research looked specifically at name formation and expression in metadata records, but the errors detected could be extended to other controlled vocabularies as well.

Similar documents (content)

  1. Beall, J.; Kafadar, K.: ¬The effectiveness of copy cataloging at eliminating typographical errors in shared bibliographic records (2004) 0.28
    0.2750996 = sum of:
      0.2750996 = product of:
        1.1462483 = sum of:
          0.08399072 = weight(abstract_txt:error in 4849) [ClassicSimilarity], result of:
            0.08399072 = score(doc=4849,freq=1.0), product of:
              0.13073693 = queryWeight, product of:
                6.8527 = idf(docFreq=126, maxDocs=44218)
                0.019078163 = queryNorm
              0.6424407 = fieldWeight in 4849, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                6.8527 = idf(docFreq=126, maxDocs=44218)
                0.09375 = fieldNorm(doc=4849)
          0.041430533 = weight(abstract_txt:problems in 4849) [ClassicSimilarity], result of:
            0.041430533 = score(doc=4849,freq=1.0), product of:
              0.10283346 = queryWeight, product of:
                1.254247 = boost
                4.297489 = idf(docFreq=1634, maxDocs=44218)
                0.019078163 = queryNorm
              0.4028896 = fieldWeight in 4849, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.297489 = idf(docFreq=1634, maxDocs=44218)
                0.09375 = fieldNorm(doc=4849)
          0.25704104 = weight(abstract_txt:corrected in 4849) [ClassicSimilarity], result of:
            0.25704104 = score(doc=4849,freq=2.0), product of:
              0.218727 = queryWeight, product of:
                1.2934572 = boost
                8.863674 = idf(docFreq=16, maxDocs=44218)
                0.019078163 = queryNorm
              1.1751683 = fieldWeight in 4849, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                8.863674 = idf(docFreq=16, maxDocs=44218)
                0.09375 = fieldNorm(doc=4849)
          0.029079124 = weight(abstract_txt:data in 4849) [ClassicSimilarity], result of:
            0.029079124 = score(doc=4849,freq=1.0), product of:
              0.09296912 = queryWeight, product of:
                1.4605985 = boost
                3.3363478 = idf(docFreq=4274, maxDocs=44218)
                0.019078163 = queryNorm
              0.31278262 = fieldWeight in 4849, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                3.3363478 = idf(docFreq=4274, maxDocs=44218)
                0.09375 = fieldNorm(doc=4849)
          0.07887252 = weight(abstract_txt:quality in 4849) [ClassicSimilarity], result of:
            0.07887252 = score(doc=4849,freq=1.0), product of:
              0.18081522 = queryWeight, product of:
                2.0369434 = boost
                4.6528544 = idf(docFreq=1145, maxDocs=44218)
                0.019078163 = queryNorm
              0.4362051 = fieldWeight in 4849, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.6528544 = idf(docFreq=1145, maxDocs=44218)
                0.09375 = fieldNorm(doc=4849)
          0.6558344 = weight(abstract_txt:errors in 4849) [ClassicSimilarity], result of:
            0.6558344 = score(doc=4849,freq=5.0), product of:
              0.4776798 = queryWeight, product of:
                3.8229558 = boost
                6.5493927 = idf(docFreq=171, maxDocs=44218)
                0.019078163 = queryNorm
              1.3729582 = fieldWeight in 4849, product of:
                2.236068 = tf(freq=5.0), with freq of:
                  5.0 = termFreq=5.0
                6.5493927 = idf(docFreq=171, maxDocs=44218)
                0.09375 = fieldNorm(doc=4849)
        0.24 = coord(6/25)
  2. Pope, J.T.; Holley, R.P.: Google Book Search and metadata (2011) 0.16
    0.16334625 = sum of:
      0.16334625 = product of:
        0.68060935 = sum of:
          0.06999227 = weight(abstract_txt:error in 1887) [ClassicSimilarity], result of:
            0.06999227 = score(doc=1887,freq=1.0), product of:
              0.13073693 = queryWeight, product of:
                6.8527 = idf(docFreq=126, maxDocs=44218)
                0.019078163 = queryNorm
              0.5353672 = fieldWeight in 1887, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                6.8527 = idf(docFreq=126, maxDocs=44218)
                0.078125 = fieldNorm(doc=1887)
          0.034525443 = weight(abstract_txt:problems in 1887) [ClassicSimilarity], result of:
            0.034525443 = score(doc=1887,freq=1.0), product of:
              0.10283346 = queryWeight, product of:
                1.254247 = boost
                4.297489 = idf(docFreq=1634, maxDocs=44218)
                0.019078163 = queryNorm
              0.33574134 = fieldWeight in 1887, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.297489 = idf(docFreq=1634, maxDocs=44218)
                0.078125 = fieldNorm(doc=1887)
          0.03651116 = weight(abstract_txt:project in 1887) [ClassicSimilarity], result of:
            0.03651116 = score(doc=1887,freq=1.0), product of:
              0.10673954 = queryWeight, product of:
                1.277846 = boost
                4.378348 = idf(docFreq=1507, maxDocs=44218)
                0.019078163 = queryNorm
              0.34205842 = fieldWeight in 1887, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.378348 = idf(docFreq=1507, maxDocs=44218)
                0.078125 = fieldNorm(doc=1887)
          0.024232604 = weight(abstract_txt:data in 1887) [ClassicSimilarity], result of:
            0.024232604 = score(doc=1887,freq=1.0), product of:
              0.09296912 = queryWeight, product of:
                1.4605985 = boost
                3.3363478 = idf(docFreq=4274, maxDocs=44218)
                0.019078163 = queryNorm
              0.26065218 = fieldWeight in 1887, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                3.3363478 = idf(docFreq=4274, maxDocs=44218)
                0.078125 = fieldNorm(doc=1887)
          0.16969286 = weight(abstract_txt:metadata in 1887) [ClassicSimilarity], result of:
            0.16969286 = score(doc=1887,freq=5.0), product of:
              0.19900212 = queryWeight, product of:
                2.1369302 = boost
                4.881247 = idf(docFreq=911, maxDocs=44218)
                0.019078163 = queryNorm
              0.85271883 = fieldWeight in 1887, product of:
                2.236068 = tf(freq=5.0), with freq of:
                  5.0 = termFreq=5.0
                4.881247 = idf(docFreq=911, maxDocs=44218)
                0.078125 = fieldNorm(doc=1887)
          0.34565502 = weight(abstract_txt:errors in 1887) [ClassicSimilarity], result of:
            0.34565502 = score(doc=1887,freq=2.0), product of:
              0.4776798 = queryWeight, product of:
                3.8229558 = boost
                6.5493927 = idf(docFreq=171, maxDocs=44218)
                0.019078163 = queryNorm
              0.7236124 = fieldWeight in 1887, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                6.5493927 = idf(docFreq=171, maxDocs=44218)
                0.078125 = fieldNorm(doc=1887)
        0.24 = coord(6/25)
  3. Lardy, J.P.; Herzhaft, L.: Bibliometric treatments according to bibliographic errors and data heterogeneity : the end-user point of view (1992) 0.16
    0.16180216 = sum of:
      0.16180216 = product of:
        0.8090108 = sum of:
          0.081962995 = weight(abstract_txt:common in 5064) [ClassicSimilarity], result of:
            0.081962995 = score(doc=5064,freq=2.0), product of:
              0.12862419 = queryWeight, product of:
                1.40274 = boost
                4.806278 = idf(docFreq=982, maxDocs=44218)
                0.019078163 = queryNorm
              0.6372285 = fieldWeight in 5064, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                4.806278 = idf(docFreq=982, maxDocs=44218)
                0.09375 = fieldNorm(doc=5064)
          0.04112409 = weight(abstract_txt:data in 5064) [ClassicSimilarity], result of:
            0.04112409 = score(doc=5064,freq=2.0), product of:
              0.09296912 = queryWeight, product of:
                1.4605985 = boost
                3.3363478 = idf(docFreq=4274, maxDocs=44218)
                0.019078163 = queryNorm
              0.44234142 = fieldWeight in 5064, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                3.3363478 = idf(docFreq=4274, maxDocs=44218)
                0.09375 = fieldNorm(doc=5064)
          0.099044085 = weight(abstract_txt:name in 5064) [ClassicSimilarity], result of:
            0.099044085 = score(doc=5064,freq=1.0), product of:
              0.18385401 = queryWeight, product of:
                1.6770747 = boost
                5.746245 = idf(docFreq=383, maxDocs=44218)
                0.019078163 = queryNorm
              0.5387105 = fieldWeight in 5064, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                5.746245 = idf(docFreq=383, maxDocs=44218)
                0.09375 = fieldNorm(doc=5064)
          0.07887252 = weight(abstract_txt:quality in 5064) [ClassicSimilarity], result of:
            0.07887252 = score(doc=5064,freq=1.0), product of:
              0.18081522 = queryWeight, product of:
                2.0369434 = boost
                4.6528544 = idf(docFreq=1145, maxDocs=44218)
                0.019078163 = queryNorm
              0.4362051 = fieldWeight in 5064, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.6528544 = idf(docFreq=1145, maxDocs=44218)
                0.09375 = fieldNorm(doc=5064)
          0.5080071 = weight(abstract_txt:errors in 5064) [ClassicSimilarity], result of:
            0.5080071 = score(doc=5064,freq=3.0), product of:
              0.4776798 = queryWeight, product of:
                3.8229558 = boost
                6.5493927 = idf(docFreq=171, maxDocs=44218)
                0.019078163 = queryNorm
              1.0634888 = fieldWeight in 5064, product of:
                1.7320508 = tf(freq=3.0), with freq of:
                  3.0 = termFreq=3.0
                6.5493927 = idf(docFreq=171, maxDocs=44218)
                0.09375 = fieldNorm(doc=5064)
        0.2 = coord(5/25)
  4. Tani, A.; Candela, L.; Castelli, D.: Dealing with metadata quality : the legacy of digital library efforts (2013) 0.15
    0.15290514 = sum of:
      0.15290514 = product of:
        0.63710475 = sum of:
          0.041430533 = weight(abstract_txt:problems in 2662) [ClassicSimilarity], result of:
            0.041430533 = score(doc=2662,freq=1.0), product of:
              0.10283346 = queryWeight, product of:
                1.254247 = boost
                4.297489 = idf(docFreq=1634, maxDocs=44218)
                0.019078163 = queryNorm
              0.4028896 = fieldWeight in 2662, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.297489 = idf(docFreq=1634, maxDocs=44218)
                0.09375 = fieldNorm(doc=2662)
          0.05795659 = weight(abstract_txt:common in 2662) [ClassicSimilarity], result of:
            0.05795659 = score(doc=2662,freq=1.0), product of:
              0.12862419 = queryWeight, product of:
                1.40274 = boost
                4.806278 = idf(docFreq=982, maxDocs=44218)
                0.019078163 = queryNorm
              0.45058858 = fieldWeight in 2662, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.806278 = idf(docFreq=982, maxDocs=44218)
                0.09375 = fieldNorm(doc=2662)
          0.04112409 = weight(abstract_txt:data in 2662) [ClassicSimilarity], result of:
            0.04112409 = score(doc=2662,freq=2.0), product of:
              0.09296912 = queryWeight, product of:
                1.4605985 = boost
                3.3363478 = idf(docFreq=4274, maxDocs=44218)
                0.019078163 = queryNorm
              0.44234142 = fieldWeight in 2662, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                3.3363478 = idf(docFreq=4274, maxDocs=44218)
                0.09375 = fieldNorm(doc=2662)
          0.13661121 = weight(abstract_txt:quality in 2662) [ClassicSimilarity], result of:
            0.13661121 = score(doc=2662,freq=3.0), product of:
              0.18081522 = queryWeight, product of:
                2.0369434 = boost
                4.6528544 = idf(docFreq=1145, maxDocs=44218)
                0.019078163 = queryNorm
              0.7555294 = fieldWeight in 2662, product of:
                1.7320508 = tf(freq=3.0), with freq of:
                  3.0 = termFreq=3.0
                4.6528544 = idf(docFreq=1145, maxDocs=44218)
                0.09375 = fieldNorm(doc=2662)
          0.20225006 = weight(abstract_txt:aggregation in 2662) [ClassicSimilarity], result of:
            0.20225006 = score(doc=2662,freq=1.0), product of:
              0.2959238 = queryWeight, product of:
                2.1276774 = boost
                7.290168 = idf(docFreq=81, maxDocs=44218)
                0.019078163 = queryNorm
              0.6834532 = fieldWeight in 2662, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                7.290168 = idf(docFreq=81, maxDocs=44218)
                0.09375 = fieldNorm(doc=2662)
          0.15773222 = weight(abstract_txt:metadata in 2662) [ClassicSimilarity], result of:
            0.15773222 = score(doc=2662,freq=3.0), product of:
              0.19900212 = queryWeight, product of:
                2.1369302 = boost
                4.881247 = idf(docFreq=911, maxDocs=44218)
                0.019078163 = queryNorm
              0.7926158 = fieldWeight in 2662, product of:
                1.7320508 = tf(freq=3.0), with freq of:
                  3.0 = termFreq=3.0
                4.881247 = idf(docFreq=911, maxDocs=44218)
                0.09375 = fieldNorm(doc=2662)
        0.24 = coord(6/25)
  5. Jarke, M.; Lenzerini, M.; Vassiliou, Y.: Fundamentals of data warehousing (1999) 0.14
    0.13589592 = sum of:
      0.13589592 = product of:
        0.56623304 = sum of:
          0.021680944 = weight(abstract_txt:using in 1302) [ClassicSimilarity], result of:
            0.021680944 = score(doc=1302,freq=1.0), product of:
              0.06677894 = queryWeight, product of:
                1.0107311 = boost
                3.4631186 = idf(docFreq=3765, maxDocs=44218)
                0.019078163 = queryNorm
              0.32466736 = fieldWeight in 1302, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                3.4631186 = idf(docFreq=3765, maxDocs=44218)
                0.09375 = fieldNorm(doc=1302)
          0.043813393 = weight(abstract_txt:project in 1302) [ClassicSimilarity], result of:
            0.043813393 = score(doc=1302,freq=1.0), product of:
              0.10673954 = queryWeight, product of:
                1.277846 = boost
                4.378348 = idf(docFreq=1507, maxDocs=44218)
                0.019078163 = queryNorm
              0.41047013 = fieldWeight in 1302, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.378348 = idf(docFreq=1507, maxDocs=44218)
                0.09375 = fieldNorm(doc=1302)
          0.05815825 = weight(abstract_txt:data in 1302) [ClassicSimilarity], result of:
            0.05815825 = score(doc=1302,freq=4.0), product of:
              0.09296912 = queryWeight, product of:
                1.4605985 = boost
                3.3363478 = idf(docFreq=4274, maxDocs=44218)
                0.019078163 = queryNorm
              0.62556523 = fieldWeight in 1302, product of:
                2.0 = tf(freq=4.0), with freq of:
                  4.0 = termFreq=4.0
                3.3363478 = idf(docFreq=4274, maxDocs=44218)
                0.09375 = fieldNorm(doc=1302)
          0.11154258 = weight(abstract_txt:quality in 1302) [ClassicSimilarity], result of:
            0.11154258 = score(doc=1302,freq=2.0), product of:
              0.18081522 = queryWeight, product of:
                2.0369434 = boost
                4.6528544 = idf(docFreq=1145, maxDocs=44218)
                0.019078163 = queryNorm
              0.61688715 = fieldWeight in 1302, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                4.6528544 = idf(docFreq=1145, maxDocs=44218)
                0.09375 = fieldNorm(doc=1302)
          0.20225006 = weight(abstract_txt:aggregation in 1302) [ClassicSimilarity], result of:
            0.20225006 = score(doc=1302,freq=1.0), product of:
              0.2959238 = queryWeight, product of:
                2.1276774 = boost
                7.290168 = idf(docFreq=81, maxDocs=44218)
                0.019078163 = queryNorm
              0.6834532 = fieldWeight in 1302, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                7.290168 = idf(docFreq=81, maxDocs=44218)
                0.09375 = fieldNorm(doc=1302)
          0.12878782 = weight(abstract_txt:metadata in 1302) [ClassicSimilarity], result of:
            0.12878782 = score(doc=1302,freq=2.0), product of:
              0.19900212 = queryWeight, product of:
                2.1369302 = boost
                4.881247 = idf(docFreq=911, maxDocs=44218)
                0.019078163 = queryNorm
              0.64716804 = fieldWeight in 1302, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                4.881247 = idf(docFreq=911, maxDocs=44218)
                0.09375 = fieldNorm(doc=1302)
        0.24 = coord(6/25)
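
The score breakdowns above follow the explain format of Lucene's ClassicSimilarity (TF-IDF). As a rough sketch, the total for the first entry (doc 4849) can be recomputed from the values printed in its tree, assuming tf = sqrt(freq), idf = 1 + ln(maxDocs / (docFreq + 1)), queryWeight = boost * idf * queryNorm, fieldWeight = tf * idf * fieldNorm, and a final coord factor of matched terms over total query terms. The function and variable names are illustrative and not part of Lucene; queryNorm, fieldNorm, and the boosts are copied from the listing rather than derived.

```python
import math

# Constants copied from the explain output above.
MAX_DOCS = 44218
QUERY_NORM = 0.019078163


def idf(doc_freq, max_docs=MAX_DOCS):
    """Inverse document frequency as ClassicSimilarity defines it."""
    return 1.0 + math.log(max_docs / (doc_freq + 1))


def term_score(freq, doc_freq, field_norm, boost=1.0, query_norm=QUERY_NORM):
    """Score of one matching term: queryWeight * fieldWeight."""
    term_idf = idf(doc_freq)
    query_weight = boost * term_idf * query_norm
    field_weight = math.sqrt(freq) * term_idf * field_norm
    return query_weight * field_weight


def document_score(terms, field_norm, matched, total):
    """Sum the per-term scores, then scale by coord(matched/total)."""
    raw = sum(
        term_score(freq, doc_freq, field_norm, boost)
        for _term, freq, doc_freq, boost in terms
    )
    return raw * (matched / total)


# (term, freq, docFreq, boost) for doc 4849, read off the first tree.
# "error" shows no boost line in the listing, so 1.0 is assumed.
TERMS_4849 = [
    ("error", 1.0, 126, 1.0),
    ("problems", 1.0, 1634, 1.254247),
    ("corrected", 2.0, 16, 1.2934572),
    ("data", 1.0, 4274, 1.4605985),
    ("quality", 1.0, 1145, 2.0369434),
    ("errors", 5.0, 171, 3.8229558),
]

score_4849 = document_score(TERMS_4849, field_norm=0.09375, matched=6, total=25)
print(f"recomputed score for doc 4849: {score_4849:.7f}")
```

The recomputed value should land close to the 0.2750996 reported in the listing; small differences are expected because Lucene does this arithmetic in single-precision floats.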