Document (#32916)

Author
Foulonneau, M.
Title
Information redundancy across metadata collections
Source
Information processing and management. 43(2007) no.3, pp.740-751
Year
2007
Abstract
Metadata records made available by content providers often lack the implicit information of their original use environment. Metadata aggregators therefore tend to emphasize completeness as a primary quality for shareable metadata. However, when adding implicit information to item-level records, data providers increase the redundancy of information contained in records from the same collection. The present paper reports on an effort to assess the extent and potential impact of information redundancy in metadata collections aggregated using the Open Archives Protocol for Metadata Harvesting. The first experiment quantifies the resemblance of metadata records on a collection-by-collection basis across 176 metadata collections aggregated for the CIC metadata portal. A second experiment measures the tendency of items from the same collection to appear together in results lists generated for a set of user queries. Results of the analyses correlate and suggest that within some collections item-level metadata records are not sufficiently differentiated to support certain digital library functions well. Metadata collections have a distinct role when included in larger aggregations, and in that role a minimum level of descriptive granularity is required to support digital library functions implemented by service providers. The experiments suggest possible ways to deal simultaneously with metadata record completeness, consistency, and redundancy.
Footnote
Contribution to: Special issue on Heterogeneous and Distributed IR
Theme
Metadata

Similar documents (content)

  1. Renear, A.H.; Wickett, K.M.; Urban, R.J.; Dubin, D.; Shreeves, S.L.: Collection/item metadata relationships (2008) 0.35
    0.34791395 = sum of:
      0.34791395 = product of:
        1.0872312 = sum of:
          0.024335448 = weight(abstract_txt:support in 2623) [ClassicSimilarity], result of:
            0.024335448 = score(doc=2623,freq=1.0), product of:
              0.07114413 = queryWeight, product of:
                1.0104718 = boost
                4.378348 = idf(docFreq=1507, maxDocs=44218)
                0.016080687 = queryNorm
              0.34205842 = fieldWeight in 2623, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.378348 = idf(docFreq=1507, maxDocs=44218)
                0.078125 = fieldNorm(doc=2623)
          0.03839342 = weight(abstract_txt:across in 2623) [ClassicSimilarity], result of:
            0.03839342 = score(doc=2623,freq=1.0), product of:
              0.09641635 = queryWeight, product of:
                1.1763322 = boost
                5.097017 = idf(docFreq=734, maxDocs=44218)
                0.016080687 = queryNorm
              0.39820445 = fieldWeight in 2623, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                5.097017 = idf(docFreq=734, maxDocs=44218)
                0.078125 = fieldNorm(doc=2623)
          0.014545178 = weight(abstract_txt:information in 2623) [ClassicSimilarity], result of:
            0.014545178 = score(doc=2623,freq=2.0), product of:
              0.054378707 = queryWeight, product of:
                1.3968157 = boost
                2.4209464 = idf(docFreq=10677, maxDocs=44218)
                0.016080687 = queryNorm
              0.2674793 = fieldWeight in 2623, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                2.4209464 = idf(docFreq=10677, maxDocs=44218)
                0.078125 = fieldNorm(doc=2623)
          0.13155821 = weight(abstract_txt:item in 2623) [ClassicSimilarity], result of:
            0.13155821 = score(doc=2623,freq=3.0), product of:
              0.15194428 = queryWeight, product of:
                1.4767156 = boost
                6.39857 = idf(docFreq=199, maxDocs=44218)
                0.016080687 = queryNorm
              0.8658319 = fieldWeight in 2623, product of:
                1.7320508 = tf(freq=3.0), with freq of:
                  3.0 = termFreq=3.0
                6.39857 = idf(docFreq=199, maxDocs=44218)
                0.078125 = fieldNorm(doc=2623)
          0.08849738 = weight(abstract_txt:level in 2623) [ClassicSimilarity], result of:
            0.08849738 = score(doc=2623,freq=5.0), product of:
              0.11262639 = queryWeight, product of:
                1.5571139 = boost
                4.497956 = idf(docFreq=1337, maxDocs=44218)
                0.016080687 = queryNorm
              0.7857606 = fieldWeight in 2623, product of:
                2.236068 = tf(freq=5.0), with freq of:
                  5.0 = termFreq=5.0
                4.497956 = idf(docFreq=1337, maxDocs=44218)
                0.078125 = fieldNorm(doc=2623)
          0.14267665 = weight(abstract_txt:collection in 2623) [ClassicSimilarity], result of:
            0.14267665 = score(doc=2623,freq=6.0), product of:
              0.1603889 = queryWeight, product of:
                2.14564 = boost
                4.648501 = idf(docFreq=1150, maxDocs=44218)
                0.016080687 = queryNorm
              0.88956684 = fieldWeight in 2623, product of:
                2.4494898 = tf(freq=6.0), with freq of:
                  6.0 = termFreq=6.0
                4.648501 = idf(docFreq=1150, maxDocs=44218)
                0.078125 = fieldNorm(doc=2623)
          0.074959785 = weight(abstract_txt:collections in 2623) [ClassicSimilarity], result of:
            0.074959785 = score(doc=2623,freq=1.0), product of:
              0.2044145 = queryWeight, product of:
                2.7081985 = boost
                4.693822 = idf(docFreq=1099, maxDocs=44218)
                0.016080687 = queryNorm
              0.36670482 = fieldWeight in 2623, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.693822 = idf(docFreq=1099, maxDocs=44218)
                0.078125 = fieldNorm(doc=2623)
          0.572265 = weight(abstract_txt:metadata in 2623) [ClassicSimilarity], result of:
            0.572265 = score(doc=2623,freq=8.0), product of:
              0.5305562 = queryWeight, product of:
                6.7592106 = boost
                4.881247 = idf(docFreq=911, maxDocs=44218)
                0.016080687 = queryNorm
              1.0786134 = fieldWeight in 2623, product of:
                2.828427 = tf(freq=8.0), with freq of:
                  8.0 = termFreq=8.0
                4.881247 = idf(docFreq=911, maxDocs=44218)
                0.078125 = fieldNorm(doc=2623)
        0.32 = coord(8/25)
    
  2. Stvilia, B.; Gasser, L.: Value-based metadata quality assessment (2008) 0.27
    0.26970378 = sum of:
      0.26970378 = product of:
        1.1237657 = sum of:
          0.037083484 = weight(abstract_txt:same in 252) [ClassicSimilarity], result of:
            0.037083484 = score(doc=252,freq=1.0), product of:
              0.0834281 = queryWeight, product of:
                1.0942357 = boost
                4.7412944 = idf(docFreq=1048, maxDocs=44218)
                0.016080687 = queryNorm
              0.44449633 = fieldWeight in 252, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.7412944 = idf(docFreq=1048, maxDocs=44218)
                0.09375 = fieldNorm(doc=252)
          0.13906316 = weight(abstract_txt:aggregated in 252) [ClassicSimilarity], result of:
            0.13906316 = score(doc=252,freq=1.0), product of:
              0.20137246 = queryWeight, product of:
                1.7000226 = boost
                7.3661537 = idf(docFreq=75, maxDocs=44218)
                0.016080687 = queryNorm
              0.6905769 = fieldWeight in 252, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                7.3661537 = idf(docFreq=75, maxDocs=44218)
                0.09375 = fieldNorm(doc=252)
          0.13979399 = weight(abstract_txt:collection in 252) [ClassicSimilarity], result of:
            0.13979399 = score(doc=252,freq=4.0), product of:
              0.1603889 = queryWeight, product of:
                2.14564 = boost
                4.648501 = idf(docFreq=1150, maxDocs=44218)
                0.016080687 = queryNorm
              0.87159395 = fieldWeight in 252, product of:
                2.0 = tf(freq=4.0), with freq of:
                  4.0 = termFreq=4.0
                4.648501 = idf(docFreq=1150, maxDocs=44218)
                0.09375 = fieldNorm(doc=252)
          0.13801841 = weight(abstract_txt:providers in 252) [ClassicSimilarity], result of:
            0.13801841 = score(doc=252,freq=1.0), product of:
              0.22935794 = queryWeight, product of:
                2.222067 = boost
                6.4187727 = idf(docFreq=195, maxDocs=44218)
                0.016080687 = queryNorm
              0.6017599 = fieldWeight in 252, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                6.4187727 = idf(docFreq=195, maxDocs=44218)
                0.09375 = fieldNorm(doc=252)
          0.07509143 = weight(abstract_txt:records in 252) [ClassicSimilarity], result of:
            0.07509143 = score(doc=252,freq=1.0), product of:
              0.1812309 = queryWeight, product of:
                2.5500035 = boost
                4.4196396 = idf(docFreq=1446, maxDocs=44218)
                0.016080687 = queryNorm
              0.4143412 = fieldWeight in 252, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.4196396 = idf(docFreq=1446, maxDocs=44218)
                0.09375 = fieldNorm(doc=252)
          0.59471524 = weight(abstract_txt:metadata in 252) [ClassicSimilarity], result of:
            0.59471524 = score(doc=252,freq=6.0), product of:
              0.5305562 = queryWeight, product of:
                6.7592106 = boost
                4.881247 = idf(docFreq=911, maxDocs=44218)
                0.016080687 = queryNorm
              1.1209279 = fieldWeight in 252, product of:
                2.4494898 = tf(freq=6.0), with freq of:
                  6.0 = termFreq=6.0
                4.881247 = idf(docFreq=911, maxDocs=44218)
                0.09375 = fieldNorm(doc=252)
        0.24 = coord(6/25)
    
  3. Park, J.-r.: Semantic interoperability and metadata quality : an analysis of metadata item records of digital image collections (2006) 0.25
    0.25139618 = sum of:
      0.25139618 = product of:
        0.89784354 = sum of:
          0.03268266 = weight(abstract_txt:digital in 172) [ClassicSimilarity], result of:
            0.03268266 = score(doc=172,freq=3.0), product of:
              0.0696772 = queryWeight, product of:
                4.332974 = idf(docFreq=1577, maxDocs=44218)
                0.016080687 = queryNorm
              0.4690582 = fieldWeight in 172, product of:
                1.7320508 = tf(freq=3.0), with freq of:
                  3.0 = termFreq=3.0
                4.332974 = idf(docFreq=1577, maxDocs=44218)
                0.0625 = fieldNorm(doc=172)
          0.04343719 = weight(abstract_txt:across in 172) [ClassicSimilarity], result of:
            0.04343719 = score(doc=172,freq=2.0), product of:
              0.09641635 = queryWeight, product of:
                1.1763322 = boost
                5.097017 = idf(docFreq=734, maxDocs=44218)
                0.016080687 = queryNorm
              0.45051688 = fieldWeight in 172, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                5.097017 = idf(docFreq=734, maxDocs=44218)
                0.0625 = fieldNorm(doc=172)
          0.04632836 = weight(abstract_txt:suggest in 172) [ClassicSimilarity], result of:
            0.04632836 = score(doc=172,freq=2.0), product of:
              0.10064853 = queryWeight, product of:
                1.2018725 = boost
                5.207682 = idf(docFreq=657, maxDocs=44218)
                0.016080687 = queryNorm
              0.46029842 = fieldWeight in 172, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                5.207682 = idf(docFreq=657, maxDocs=44218)
                0.0625 = fieldNorm(doc=172)
          0.060764134 = weight(abstract_txt:item in 172) [ClassicSimilarity], result of:
            0.060764134 = score(doc=172,freq=1.0), product of:
              0.15194428 = queryWeight, product of:
                1.4767156 = boost
                6.39857 = idf(docFreq=199, maxDocs=44218)
                0.016080687 = queryNorm
              0.39991063 = fieldWeight in 172, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                6.39857 = idf(docFreq=199, maxDocs=44218)
                0.0625 = fieldNorm(doc=172)
          0.050060954 = weight(abstract_txt:records in 172) [ClassicSimilarity], result of:
            0.050060954 = score(doc=172,freq=1.0), product of:
              0.1812309 = queryWeight, product of:
                2.5500035 = boost
                4.4196396 = idf(docFreq=1446, maxDocs=44218)
                0.016080687 = queryNorm
              0.27622747 = fieldWeight in 172, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.4196396 = idf(docFreq=1446, maxDocs=44218)
                0.0625 = fieldNorm(doc=172)
          0.10386733 = weight(abstract_txt:collections in 172) [ClassicSimilarity], result of:
            0.10386733 = score(doc=172,freq=3.0), product of:
              0.2044145 = queryWeight, product of:
                2.7081985 = boost
                4.693822 = idf(docFreq=1099, maxDocs=44218)
                0.016080687 = queryNorm
              0.50812113 = fieldWeight in 172, product of:
                1.7320508 = tf(freq=3.0), with freq of:
                  3.0 = termFreq=3.0
                4.693822 = idf(docFreq=1099, maxDocs=44218)
                0.0625 = fieldNorm(doc=172)
          0.5607029 = weight(abstract_txt:metadata in 172) [ClassicSimilarity], result of:
            0.5607029 = score(doc=172,freq=12.0), product of:
              0.5305562 = queryWeight, product of:
                6.7592106 = boost
                4.881247 = idf(docFreq=911, maxDocs=44218)
                0.016080687 = queryNorm
              1.056821 = fieldWeight in 172, product of:
                3.4641016 = tf(freq=12.0), with freq of:
                  12.0 = termFreq=12.0
                4.881247 = idf(docFreq=911, maxDocs=44218)
                0.0625 = fieldNorm(doc=172)
        0.28 = coord(7/25)
    
  4. McCallum, S.H.: Library of Congress metadata landscape (2003) 0.22
    0.22287452 = sum of:
      0.22287452 = product of:
        0.6964829 = sum of:
          0.018869342 = weight(abstract_txt:digital in 1760) [ClassicSimilarity], result of:
            0.018869342 = score(doc=1760,freq=1.0), product of:
              0.0696772 = queryWeight, product of:
                4.332974 = idf(docFreq=1577, maxDocs=44218)
                0.016080687 = queryNorm
              0.27081087 = fieldWeight in 1760, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.332974 = idf(docFreq=1577, maxDocs=44218)
                0.0625 = fieldNorm(doc=1760)
          0.019468358 = weight(abstract_txt:support in 1760) [ClassicSimilarity], result of:
            0.019468358 = score(doc=1760,freq=1.0), product of:
              0.07114413 = queryWeight, product of:
                1.0104718 = boost
                4.378348 = idf(docFreq=1507, maxDocs=44218)
                0.016080687 = queryNorm
              0.27364674 = fieldWeight in 1760, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.378348 = idf(docFreq=1507, maxDocs=44218)
                0.0625 = fieldNorm(doc=1760)
          0.024722325 = weight(abstract_txt:same in 1760) [ClassicSimilarity], result of:
            0.024722325 = score(doc=1760,freq=1.0), product of:
              0.0834281 = queryWeight, product of:
                1.0942357 = boost
                4.7412944 = idf(docFreq=1048, maxDocs=44218)
                0.016080687 = queryNorm
              0.2963309 = fieldWeight in 1760, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.7412944 = idf(docFreq=1048, maxDocs=44218)
                0.0625 = fieldNorm(doc=1760)
          0.060764134 = weight(abstract_txt:item in 1760) [ClassicSimilarity], result of:
            0.060764134 = score(doc=1760,freq=1.0), product of:
              0.15194428 = queryWeight, product of:
                1.4767156 = boost
                6.39857 = idf(docFreq=199, maxDocs=44218)
                0.016080687 = queryNorm
              0.39991063 = fieldWeight in 1760, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                6.39857 = idf(docFreq=199, maxDocs=44218)
                0.0625 = fieldNorm(doc=1760)
          0.04477652 = weight(abstract_txt:level in 1760) [ClassicSimilarity], result of:
            0.04477652 = score(doc=1760,freq=2.0), product of:
              0.11262639 = queryWeight, product of:
                1.5571139 = boost
                4.497956 = idf(docFreq=1337, maxDocs=44218)
                0.016080687 = queryNorm
              0.39756688 = fieldWeight in 1760, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                4.497956 = idf(docFreq=1337, maxDocs=44218)
                0.0625 = fieldNorm(doc=1760)
          0.046598 = weight(abstract_txt:collection in 1760) [ClassicSimilarity], result of:
            0.046598 = score(doc=1760,freq=1.0), product of:
              0.1603889 = queryWeight, product of:
                2.14564 = boost
                4.648501 = idf(docFreq=1150, maxDocs=44218)
                0.016080687 = queryNorm
              0.2905313 = fieldWeight in 1760, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.648501 = idf(docFreq=1150, maxDocs=44218)
                0.0625 = fieldNorm(doc=1760)
          0.084807314 = weight(abstract_txt:collections in 1760) [ClassicSimilarity], result of:
            0.084807314 = score(doc=1760,freq=2.0), product of:
              0.2044145 = queryWeight, product of:
                2.7081985 = boost
                4.693822 = idf(docFreq=1099, maxDocs=44218)
                0.016080687 = queryNorm
              0.41487914 = fieldWeight in 1760, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                4.693822 = idf(docFreq=1099, maxDocs=44218)
                0.0625 = fieldNorm(doc=1760)
          0.39647686 = weight(abstract_txt:metadata in 1760) [ClassicSimilarity], result of:
            0.39647686 = score(doc=1760,freq=6.0), product of:
              0.5305562 = queryWeight, product of:
                6.7592106 = boost
                4.881247 = idf(docFreq=911, maxDocs=44218)
                0.016080687 = queryNorm
              0.7472853 = fieldWeight in 1760, product of:
                2.4494898 = tf(freq=6.0), with freq of:
                  6.0 = termFreq=6.0
                4.881247 = idf(docFreq=911, maxDocs=44218)
                0.0625 = fieldNorm(doc=1760)
        0.32 = coord(8/25)
    
  5. Zavalina, O.L.: Complementarity in subject metadata in large-scale digital libraries : a comparative analysis (2014) 0.22
    0.2167326 = sum of:
      0.2167326 = product of:
        0.90305257 = sum of:
          0.047173355 = weight(abstract_txt:digital in 1972) [ClassicSimilarity], result of:
            0.047173355 = score(doc=1972,freq=4.0), product of:
              0.0696772 = queryWeight, product of:
                4.332974 = idf(docFreq=1577, maxDocs=44218)
                0.016080687 = queryNorm
              0.67702717 = fieldWeight in 1972, product of:
                2.0 = tf(freq=4.0), with freq of:
                  4.0 = termFreq=4.0
                4.332974 = idf(docFreq=1577, maxDocs=44218)
                0.078125 = fieldNorm(doc=1972)
          0.068549775 = weight(abstract_txt:level in 1972) [ClassicSimilarity], result of:
            0.068549775 = score(doc=1972,freq=3.0), product of:
              0.11262639 = queryWeight, product of:
                1.5571139 = boost
                4.497956 = idf(docFreq=1337, maxDocs=44218)
                0.016080687 = queryNorm
              0.6086475 = fieldWeight in 1972, product of:
                1.7320508 = tf(freq=3.0), with freq of:
                  3.0 = termFreq=3.0
                4.497956 = idf(docFreq=1337, maxDocs=44218)
                0.078125 = fieldNorm(doc=1972)
          0.11588597 = weight(abstract_txt:aggregated in 1972) [ClassicSimilarity], result of:
            0.11588597 = score(doc=1972,freq=1.0), product of:
              0.20137246 = queryWeight, product of:
                1.7000226 = boost
                7.3661537 = idf(docFreq=75, maxDocs=44218)
                0.016080687 = queryNorm
              0.57548076 = fieldWeight in 1972, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                7.3661537 = idf(docFreq=75, maxDocs=44218)
                0.078125 = fieldNorm(doc=1972)
          0.10088761 = weight(abstract_txt:collection in 1972) [ClassicSimilarity], result of:
            0.10088761 = score(doc=1972,freq=3.0), product of:
              0.1603889 = queryWeight, product of:
                2.14564 = boost
                4.648501 = idf(docFreq=1150, maxDocs=44218)
                0.016080687 = queryNorm
              0.62901866 = fieldWeight in 1972, product of:
                1.7320508 = tf(freq=3.0), with freq of:
                  3.0 = termFreq=3.0
                4.648501 = idf(docFreq=1150, maxDocs=44218)
                0.078125 = fieldNorm(doc=1972)
          0.074959785 = weight(abstract_txt:collections in 1972) [ClassicSimilarity], result of:
            0.074959785 = score(doc=1972,freq=1.0), product of:
              0.2044145 = queryWeight, product of:
                2.7081985 = boost
                4.693822 = idf(docFreq=1099, maxDocs=44218)
                0.016080687 = queryNorm
              0.36670482 = fieldWeight in 1972, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.693822 = idf(docFreq=1099, maxDocs=44218)
                0.078125 = fieldNorm(doc=1972)
          0.49559608 = weight(abstract_txt:metadata in 1972) [ClassicSimilarity], result of:
            0.49559608 = score(doc=1972,freq=6.0), product of:
              0.5305562 = queryWeight, product of:
                6.7592106 = boost
                4.881247 = idf(docFreq=911, maxDocs=44218)
                0.016080687 = queryNorm
              0.93410665 = fieldWeight in 1972, product of:
                2.4494898 = tf(freq=6.0), with freq of:
                  6.0 = termFreq=6.0
                4.881247 = idf(docFreq=911, maxDocs=44218)
                0.078125 = fieldNorm(doc=1972)
        0.24 = coord(6/25)
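
The score trees above all follow the same arithmetic, which can be checked by hand. The sketch below is a plain Python re-computation, not a call into Lucene. It assumes the standard ClassicSimilarity definitions, tf = sqrt(termFreq) and idf = 1 + ln(maxDocs / (docFreq + 1)), which agree with the figures printed in the trees. Boosts, queryNorm, fieldNorm, and coord are copied from the output for doc 2623. The function and variable names are illustrative only.

```python
import math

MAX_DOCS = 44218          # maxDocs, as printed in the trees
QUERY_NORM = 0.016080687  # queryNorm, as printed in the trees
QUERY_TERMS = 25          # denominator of every coord(n/25)

def idf(doc_freq, max_docs=MAX_DOCS):
    """Assumed ClassicSimilarity idf: 1 + ln(maxDocs / (docFreq + 1))."""
    return 1.0 + math.log(max_docs / (doc_freq + 1))

def term_score(boost, doc_freq, freq, field_norm):
    """queryWeight * fieldWeight for one matching term."""
    i = idf(doc_freq)
    query_weight = boost * i * QUERY_NORM
    field_weight = math.sqrt(freq) * i * field_norm
    return query_weight * field_weight

# (term, boost, docFreq, termFreq) copied from the tree for doc 2623
DOC_2623 = [
    ("support", 1.0104718, 1507, 1),
    ("across", 1.1763322, 734, 1),
    ("information", 1.3968157, 10677, 2),
    ("item", 1.4767156, 199, 3),
    ("level", 1.5571139, 1337, 5),
    ("collection", 2.14564, 1150, 6),
    ("collections", 2.7081985, 1099, 1),
    ("metadata", 6.7592106, 911, 8),
]

def document_score(terms, field_norm):
    """Sum of term scores, scaled by coord = matched terms / query terms."""
    total = sum(term_score(b, df, f, field_norm) for _, b, df, f in terms)
    return total * len(terms) / QUERY_TERMS

# (label, reported sum of term scores, matched terms), in listed order
REPORTED = [
    ("Renear et al. (2008)", 1.0872312, 8),
    ("Stvilia; Gasser (2008)", 1.1237657, 6),
    ("Park (2006)", 0.89784354, 7),
    ("McCallum (2003)", 0.6964829, 8),
    ("Zavalina (2014)", 0.90305257, 6),
]

def ranked(rows):
    """Order the hits by reported term sum times coord, best first."""
    return sorted(rows, key=lambda r: r[1] * r[2] / QUERY_TERMS, reverse=True)

if __name__ == "__main__":
    print("doc 2623:", document_score(DOC_2623, field_norm=0.078125))
    for label, term_sum, matched in ranked(REPORTED):
        print(label, term_sum * matched / QUERY_TERMS)
```

The recomputed value for doc 2623 should land close to the reported 0.34791395, with small differences expected because Lucene works in single precision. The second part shows the effect of coord: Stvilia and Gasser has the largest raw term sum (1.1237657) but matches only 6 of 25 query terms, so it ranks below Renear et al., which matches 8.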