Document (#36759)

Author
Cortez, E.
Herrera, M.R.
Silva, A.S. da
Moura, E.S. de
Neubert, M.
Title
Lightweight methods for large-scale product categorization
Source
Journal of the American Society for Information Science and Technology. 62(2011) no.9, pp. 1839-1848
Year
2011
Abstract
In this article, we present a study about classification methods for large-scale categorization of product offers on e-shopping web sites. We present a study about the performance of previously proposed approaches and deploy a probabilistic approach to model the classification problem. We also studied an alternative way of modeling information about the description of product offers and investigated the use of the price and store of product offers as features in the classification process. Our experiments used two collections of over a million product offers previously categorized by human editors and taxonomies of hundreds of categories from a real e-shopping web site. In these experiments, our method achieved an improvement of up to 9% in the quality of the categorization in comparison with the best baseline we found.
Theme
Automatic classification
Area
Information economy

Similar documents (author)

  1. Cortez, E.; Silva, A.S. da; Gonçalves, M.A.; Mesquita, F.; Moura, E.S. de: A flexible approach for extracting metadata from bibliographic citations (2009) 1.67
    1.6667988 = sum of:
      1.6667988 = product of:
        2.777998 = sum of:
          0.5961975 = weight(author_txt:silva in 2848) [ClassicSimilarity], result of:
            0.5961975 = score(doc=2848,freq=1.0), product of:
              0.31766564 = queryWeight, product of:
                7.5072327 = idf(docFreq=65, maxDocs=44218)
                0.04231461 = queryNorm
              1.8768082 = fieldWeight in 2848, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                7.5072327 = idf(docFreq=65, maxDocs=44218)
                0.25 = fieldNorm(doc=2848)
          0.9282847 = weight(author_txt:moura in 2848) [ClassicSimilarity], result of:
            0.9282847 = score(doc=2848,freq=1.0), product of:
              0.4267409 = queryWeight, product of:
                1.1590363 = boost
                8.701155 = idf(docFreq=19, maxDocs=44218)
                0.04231461 = queryNorm
              2.1752887 = fieldWeight in 2848, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                8.701155 = idf(docFreq=19, maxDocs=44218)
                0.25 = fieldNorm(doc=2848)
          1.2535156 = weight(author_txt:cortez in 2848) [ClassicSimilarity], result of:
            1.2535156 = score(doc=2848,freq=1.0), product of:
              0.52135074 = queryWeight, product of:
                1.2810907 = boost
                9.617446 = idf(docFreq=7, maxDocs=44218)
                0.04231461 = queryNorm
              2.4043615 = fieldWeight in 2848, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                9.617446 = idf(docFreq=7, maxDocs=44218)
                0.25 = fieldNorm(doc=2848)
        0.6 = coord(3/5)
    
  2. Cortez, E.M.: Use of metadata vocabularies in data retrieval (1999) 0.63
    0.6267578 = sum of:
      0.6267578 = product of:
        3.133789 = sum of:
          3.133789 = weight(author_txt:cortez in 4057) [ClassicSimilarity], result of:
            3.133789 = score(doc=4057,freq=1.0), product of:
              0.52135074 = queryWeight, product of:
                1.2810907 = boost
                9.617446 = idf(docFreq=7, maxDocs=44218)
                0.04231461 = queryNorm
              6.010904 = fieldWeight in 4057, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                9.617446 = idf(docFreq=7, maxDocs=44218)
                0.625 = fieldNorm(doc=4057)
        0.2 = coord(1/5)
    
  3. Cortez, E.M.: Planning and implementing a high performance knowledge base (1999) 0.63
    0.6267578 = sum of:
      0.6267578 = product of:
        3.133789 = sum of:
          3.133789 = weight(author_txt:cortez in 6551) [ClassicSimilarity], result of:
            3.133789 = score(doc=6551,freq=1.0), product of:
              0.52135074 = queryWeight, product of:
                1.2810907 = boost
                9.617446 = idf(docFreq=7, maxDocs=44218)
                0.04231461 = queryNorm
              6.010904 = fieldWeight in 6551, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                9.617446 = idf(docFreq=7, maxDocs=44218)
                0.625 = fieldNorm(doc=6551)
        0.2 = coord(1/5)
    
  4. Moura, E.S. de; Fernandes, D.; Ribeiro-Neto, B.; Silva, A.S. da; Gonçalves, M.A.: Using structural information to improve search in Web collections (2010) 0.61
    0.6097929 = sum of:
      0.6097929 = product of:
        1.5244823 = sum of:
          0.5961975 = weight(author_txt:silva in 4119) [ClassicSimilarity], result of:
            0.5961975 = score(doc=4119,freq=1.0), product of:
              0.31766564 = queryWeight, product of:
                7.5072327 = idf(docFreq=65, maxDocs=44218)
                0.04231461 = queryNorm
              1.8768082 = fieldWeight in 4119, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                7.5072327 = idf(docFreq=65, maxDocs=44218)
                0.25 = fieldNorm(doc=4119)
          0.9282847 = weight(author_txt:moura in 4119) [ClassicSimilarity], result of:
            0.9282847 = score(doc=4119,freq=1.0), product of:
              0.4267409 = queryWeight, product of:
                1.1590363 = boost
                8.701155 = idf(docFreq=19, maxDocs=44218)
                0.04231461 = queryNorm
              2.1752887 = fieldWeight in 4119, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                8.701155 = idf(docFreq=19, maxDocs=44218)
                0.25 = fieldNorm(doc=4119)
        0.4 = coord(2/5)
    
  5. Costa Carvalho, A. da; Rossi, C.; Moura, E.S. de; Silva, A.S. da; Fernandes, D.: LePrEF: Learn to precompute evidence fusion for efficient query evaluation (2012) 0.61
    0.6097929 = sum of:
      0.6097929 = product of:
        1.5244823 = sum of:
          0.5961975 = weight(author_txt:silva in 278) [ClassicSimilarity], result of:
            0.5961975 = score(doc=278,freq=1.0), product of:
              0.31766564 = queryWeight, product of:
                7.5072327 = idf(docFreq=65, maxDocs=44218)
                0.04231461 = queryNorm
              1.8768082 = fieldWeight in 278, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                7.5072327 = idf(docFreq=65, maxDocs=44218)
                0.25 = fieldNorm(doc=278)
          0.9282847 = weight(author_txt:moura in 278) [ClassicSimilarity], result of:
            0.9282847 = score(doc=278,freq=1.0), product of:
              0.4267409 = queryWeight, product of:
                1.1590363 = boost
                8.701155 = idf(docFreq=19, maxDocs=44218)
                0.04231461 = queryNorm
              2.1752887 = fieldWeight in 278, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                8.701155 = idf(docFreq=19, maxDocs=44218)
                0.25 = fieldNorm(doc=278)
        0.4 = coord(2/5)
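
The score trees above can be recomputed by hand. The following is a minimal sketch, assuming idf = 1 + ln(maxDocs / (docFreq + 1)) and tf = sqrt(freq), which is consistent with the numbers listed for ClassicSimilarity. The function and variable names are illustrative and are not part of Lucene. Results may differ from the listing in the last digits because the listing was produced with 32-bit floats.

```python
import math

MAX_DOCS = 44218          # maxDocs as shown in the listing
QUERY_NORM = 0.04231461   # queryNorm of the author query


def idf(doc_freq, max_docs=MAX_DOCS):
    # Assumed formula; it reproduces e.g. 7.5072327 for docFreq=65.
    return 1.0 + math.log(max_docs / (doc_freq + 1))


def term_score(freq, doc_freq, field_norm, boost=1.0, query_norm=QUERY_NORM):
    term_idf = idf(doc_freq)
    query_weight = boost * term_idf * query_norm            # "queryWeight" node
    field_weight = math.sqrt(freq) * term_idf * field_norm  # "fieldWeight" node
    return query_weight * field_weight


# The three author terms matched in doc 2848 (fieldNorm 0.25).
silva = term_score(1.0, 65, 0.25)
moura = term_score(1.0, 19, 0.25, boost=1.1590363)
cortez = term_score(1.0, 7, 0.25, boost=1.2810907)

# coord(3/5): three of the five query terms matched.
doc_2848 = (silva + moura + cortez) * (3 / 5)

print(silva, moura, cortez, doc_2848)
```

The listing gives 0.5961975, 0.9282847 and 1.2535156 for the three terms and 1.6667988 for the document, so the printed values should agree to within rounding.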
    

Similar documents (content)

  1. Li, Y.; Xu, S.; Luo, X.; Lin, S.: A new algorithm for product image search based on salient edge characterization (2014) 0.19
    0.1908944 = sum of:
      0.1908944 = product of:
        0.95447195 = sum of:
          0.03028971 = weight(abstract_txt:large in 1552) [ClassicSimilarity], result of:
            0.03028971 = score(doc=1552,freq=1.0), product of:
              0.10880683 = queryWeight, product of:
                1.4480817 = boost
                4.454089 = idf(docFreq=1397, maxDocs=44218)
                0.016869577 = queryNorm
              0.27838057 = fieldWeight in 1552, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.454089 = idf(docFreq=1397, maxDocs=44218)
                0.0625 = fieldNorm(doc=1552)
          0.051916078 = weight(abstract_txt:experiments in 1552) [ClassicSimilarity], result of:
            0.051916078 = score(doc=1552,freq=1.0), product of:
              0.15583345 = queryWeight, product of:
                1.7329872 = boost
                5.3304167 = idf(docFreq=581, maxDocs=44218)
                0.016869577 = queryNorm
              0.33315104 = fieldWeight in 1552, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                5.3304167 = idf(docFreq=581, maxDocs=44218)
                0.0625 = fieldNorm(doc=1552)
          0.05629623 = weight(abstract_txt:scale in 1552) [ClassicSimilarity], result of:
            0.05629623 = score(doc=1552,freq=1.0), product of:
              0.1644797 = queryWeight, product of:
                1.7804147 = boost
                5.476297 = idf(docFreq=502, maxDocs=44218)
                0.016869577 = queryNorm
              0.34226856 = fieldWeight in 1552, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                5.476297 = idf(docFreq=502, maxDocs=44218)
                0.0625 = fieldNorm(doc=1552)
          0.26441345 = weight(abstract_txt:shopping in 1552) [ClassicSimilarity], result of:
            0.26441345 = score(doc=1552,freq=2.0), product of:
              0.36613265 = queryWeight, product of:
                2.6563435 = boost
                8.1705265 = idf(docFreq=33, maxDocs=44218)
                0.016869577 = queryNorm
              0.72217935 = fieldWeight in 1552, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                8.1705265 = idf(docFreq=33, maxDocs=44218)
                0.0625 = fieldNorm(doc=1552)
          0.55155647 = weight(abstract_txt:product in 1552) [ClassicSimilarity], result of:
            0.55155647 = score(doc=1552,freq=9.0), product of:
              0.49138126 = queryWeight, product of:
                4.8656883 = boost
                5.98646 = idf(docFreq=301, maxDocs=44218)
                0.016869577 = queryNorm
              1.1224613 = fieldWeight in 1552, product of:
                3.0 = tf(freq=9.0), with freq of:
                  9.0 = termFreq=9.0
                5.98646 = idf(docFreq=301, maxDocs=44218)
                0.0625 = fieldNorm(doc=1552)
        0.2 = coord(5/25)
    
  2. Goren-Bar, D.; Kuflik, T.: Supporting user-subjective categorization with self-organizing maps and learning vector quantization (2005) 0.17
    0.16863556 = sum of:
      0.16863556 = product of:
        0.70264816 = sum of:
          0.013757698 = weight(abstract_txt:study in 3325) [ClassicSimilarity], result of:
            0.013757698 = score(doc=3325,freq=1.0), product of:
              0.064291954 = queryWeight, product of:
                1.1131234 = boost
                3.423806 = idf(docFreq=3916, maxDocs=44218)
                0.016869577 = queryNorm
              0.21398787 = fieldWeight in 3325, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                3.423806 = idf(docFreq=3916, maxDocs=44218)
                0.0625 = fieldNorm(doc=3325)
          0.0779277 = weight(abstract_txt:hundreds in 3325) [ClassicSimilarity], result of:
            0.0779277 = score(doc=3325,freq=1.0), product of:
              0.16214766 = queryWeight, product of:
                1.2499865 = boost
                7.689554 = idf(docFreq=54, maxDocs=44218)
                0.016869577 = queryNorm
              0.48059714 = fieldWeight in 3325, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                7.689554 = idf(docFreq=54, maxDocs=44218)
                0.0625 = fieldNorm(doc=3325)
          0.024442326 = weight(abstract_txt:methods in 3325) [ClassicSimilarity], result of:
            0.024442326 = score(doc=3325,freq=1.0), product of:
              0.094309285 = queryWeight, product of:
                1.3481624 = boost
                4.146752 = idf(docFreq=1900, maxDocs=44218)
                0.016869577 = queryNorm
              0.259172 = fieldWeight in 3325, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.146752 = idf(docFreq=1900, maxDocs=44218)
                0.0625 = fieldNorm(doc=3325)
          0.043840833 = weight(abstract_txt:about in 3325) [ClassicSimilarity], result of:
            0.043840833 = score(doc=3325,freq=2.0), product of:
              0.12649278 = queryWeight, product of:
                1.9122448 = boost
                3.9211915 = idf(docFreq=2381, maxDocs=44218)
                0.016869577 = queryNorm
              0.34658763 = fieldWeight in 3325, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                3.9211915 = idf(docFreq=2381, maxDocs=44218)
                0.0625 = fieldNorm(doc=3325)
          0.0327119 = weight(abstract_txt:classification in 3325) [ClassicSimilarity], result of:
            0.0327119 = score(doc=3325,freq=1.0), product of:
              0.13110736 = queryWeight, product of:
                1.9468126 = boost
                3.9920752 = idf(docFreq=2218, maxDocs=44218)
                0.016869577 = queryNorm
              0.2495047 = fieldWeight in 3325, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                3.9920752 = idf(docFreq=2218, maxDocs=44218)
                0.0625 = fieldNorm(doc=3325)
          0.50996774 = weight(abstract_txt:categorization in 3325) [ClassicSimilarity], result of:
            0.50996774 = score(doc=3325,freq=12.0), product of:
              0.35737535 = queryWeight, product of:
                3.2142 = boost
                6.590942 = idf(docFreq=164, maxDocs=44218)
                0.016869577 = queryNorm
              1.4269807 = fieldWeight in 3325, product of:
                3.4641016 = tf(freq=12.0), with freq of:
                  12.0 = termFreq=12.0
                6.590942 = idf(docFreq=164, maxDocs=44218)
                0.0625 = fieldNorm(doc=3325)
        0.24 = coord(6/25)
    
  3. Li, H.; Bhowmick, S.S.; Sun, A.: AffRank: affinity-driven ranking of products in online social rating networks (2011) 0.14
    0.14274846 = sum of:
      0.14274846 = product of:
        0.7137423 = sum of:
          0.055537723 = weight(abstract_txt:baseline in 4483) [ClassicSimilarity], result of:
            0.055537723 = score(doc=4483,freq=1.0), product of:
              0.12937236 = queryWeight, product of:
                1.1165309 = boost
                6.8685737 = idf(docFreq=124, maxDocs=44218)
                0.016869577 = queryNorm
              0.42928585 = fieldWeight in 4483, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                6.8685737 = idf(docFreq=124, maxDocs=44218)
                0.0625 = fieldNorm(doc=4483)
          0.024442326 = weight(abstract_txt:methods in 4483) [ClassicSimilarity], result of:
            0.024442326 = score(doc=4483,freq=1.0), product of:
              0.094309285 = queryWeight, product of:
                1.3481624 = boost
                4.146752 = idf(docFreq=1900, maxDocs=44218)
                0.016869577 = queryNorm
              0.259172 = fieldWeight in 4483, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.146752 = idf(docFreq=1900, maxDocs=44218)
                0.0625 = fieldNorm(doc=4483)
          0.03028971 = weight(abstract_txt:large in 4483) [ClassicSimilarity], result of:
            0.03028971 = score(doc=4483,freq=1.0), product of:
              0.10880683 = queryWeight, product of:
                1.4480817 = boost
                4.454089 = idf(docFreq=1397, maxDocs=44218)
                0.016869577 = queryNorm
              0.27838057 = fieldWeight in 4483, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.454089 = idf(docFreq=1397, maxDocs=44218)
                0.0625 = fieldNorm(doc=4483)
          0.051916078 = weight(abstract_txt:experiments in 4483) [ClassicSimilarity], result of:
            0.051916078 = score(doc=4483,freq=1.0), product of:
              0.15583345 = queryWeight, product of:
                1.7329872 = boost
                5.3304167 = idf(docFreq=581, maxDocs=44218)
                0.016869577 = queryNorm
              0.33315104 = fieldWeight in 4483, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                5.3304167 = idf(docFreq=581, maxDocs=44218)
                0.0625 = fieldNorm(doc=4483)
          0.55155647 = weight(abstract_txt:product in 4483) [ClassicSimilarity], result of:
            0.55155647 = score(doc=4483,freq=9.0), product of:
              0.49138126 = queryWeight, product of:
                4.8656883 = boost
                5.98646 = idf(docFreq=301, maxDocs=44218)
                0.016869577 = queryNorm
              1.1224613 = fieldWeight in 4483, product of:
                3.0 = tf(freq=9.0), with freq of:
                  9.0 = termFreq=9.0
                5.98646 = idf(docFreq=301, maxDocs=44218)
                0.0625 = fieldNorm(doc=4483)
        0.2 = coord(5/25)
    
  4. Rijnsoever, F.J. van; Castaldi, C.: Extending consumer categorization based on innovativeness : intentions and technology clusters in consumer electronics (2011) 0.12
    0.1226747 = sum of:
      0.1226747 = product of:
        0.7667169 = sum of:
          0.059850406 = weight(abstract_txt:improvement in 4634) [ClassicSimilarity], result of:
            0.059850406 = score(doc=4634,freq=1.0), product of:
              0.10377674 = queryWeight, product of:
                6.1517096 = idf(docFreq=255, maxDocs=44218)
                0.016869577 = queryNorm
              0.57672274 = fieldWeight in 4634, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                6.1517096 = idf(docFreq=255, maxDocs=44218)
                0.09375 = fieldNorm(doc=4634)
          0.11879807 = weight(abstract_txt:previously in 4634) [ClassicSimilarity], result of:
            0.11879807 = score(doc=4634,freq=1.0), product of:
              0.20650862 = queryWeight, product of:
                1.9949595 = boost
                6.1362057 = idf(docFreq=259, maxDocs=44218)
                0.016869577 = queryNorm
              0.5752693 = fieldWeight in 4634, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                6.1362057 = idf(docFreq=259, maxDocs=44218)
                0.09375 = fieldNorm(doc=4634)
          0.3122902 = weight(abstract_txt:categorization in 4634) [ClassicSimilarity], result of:
            0.3122902 = score(doc=4634,freq=2.0), product of:
              0.35737535 = queryWeight, product of:
                3.2142 = boost
                6.590942 = idf(docFreq=164, maxDocs=44218)
                0.016869577 = queryNorm
              0.87384367 = fieldWeight in 4634, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                6.590942 = idf(docFreq=164, maxDocs=44218)
                0.09375 = fieldNorm(doc=4634)
          0.27577823 = weight(abstract_txt:product in 4634) [ClassicSimilarity], result of:
            0.27577823 = score(doc=4634,freq=1.0), product of:
              0.49138126 = queryWeight, product of:
                4.8656883 = boost
                5.98646 = idf(docFreq=301, maxDocs=44218)
                0.016869577 = queryNorm
              0.56123066 = fieldWeight in 4634, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                5.98646 = idf(docFreq=301, maxDocs=44218)
                0.09375 = fieldNorm(doc=4634)
        0.16 = coord(4/25)
    
  5. Yang, Y.; Wilbur, J.: Using corpus statistics to remove redundant words in text categorization (1996) 0.11
    0.11192429 = sum of:
      0.11192429 = product of:
        0.55962145 = sum of:
          0.04987534 = weight(abstract_txt:improvement in 4199) [ClassicSimilarity], result of:
            0.04987534 = score(doc=4199,freq=1.0), product of:
              0.10377674 = queryWeight, product of:
                6.1517096 = idf(docFreq=255, maxDocs=44218)
                0.016869577 = queryNorm
              0.48060232 = fieldWeight in 4199, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                6.1517096 = idf(docFreq=255, maxDocs=44218)
                0.078125 = fieldNorm(doc=4199)
          0.017197123 = weight(abstract_txt:study in 4199) [ClassicSimilarity], result of:
            0.017197123 = score(doc=4199,freq=1.0), product of:
              0.064291954 = queryWeight, product of:
                1.1131234 = boost
                3.423806 = idf(docFreq=3916, maxDocs=44218)
                0.016869577 = queryNorm
              0.26748484 = fieldWeight in 4199, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                3.423806 = idf(docFreq=3916, maxDocs=44218)
                0.078125 = fieldNorm(doc=4199)
          0.043208335 = weight(abstract_txt:methods in 4199) [ClassicSimilarity], result of:
            0.043208335 = score(doc=4199,freq=2.0), product of:
              0.094309285 = queryWeight, product of:
                1.3481624 = boost
                4.146752 = idf(docFreq=1900, maxDocs=44218)
                0.016869577 = queryNorm
              0.4581557 = fieldWeight in 4199, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                4.146752 = idf(docFreq=1900, maxDocs=44218)
                0.078125 = fieldNorm(doc=4199)
          0.037862137 = weight(abstract_txt:large in 4199) [ClassicSimilarity], result of:
            0.037862137 = score(doc=4199,freq=1.0), product of:
              0.10880683 = queryWeight, product of:
                1.4480817 = boost
                4.454089 = idf(docFreq=1397, maxDocs=44218)
                0.016869577 = queryNorm
              0.34797573 = fieldWeight in 4199, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.454089 = idf(docFreq=1397, maxDocs=44218)
                0.078125 = fieldNorm(doc=4199)
          0.4114785 = weight(abstract_txt:categorization in 4199) [ClassicSimilarity], result of:
            0.4114785 = score(doc=4199,freq=5.0), product of:
              0.35737535 = queryWeight, product of:
                3.2142 = boost
                6.590942 = idf(docFreq=164, maxDocs=44218)
                0.016869577 = queryNorm
              1.1513902 = fieldWeight in 4199, product of:
                2.236068 = tf(freq=5.0), with freq of:
                  5.0 = termFreq=5.0
                6.590942 = idf(docFreq=164, maxDocs=44218)
                0.078125 = fieldNorm(doc=4199)
        0.2 = coord(5/25)
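
In the content-based trees, the same term can contribute very differently depending on its frequency and on the field-length norm of the matching document. The sketch below compares the fieldWeight of "categorization" (docFreq=164) across three documents from the listing. It assumes the same idf and tf formulas that fit the listed numbers; the helper names are hypothetical.

```python
import math

MAX_DOCS = 44218
QUERY_NORM = 0.016869577   # queryNorm of the content query
BOOST = 3.2142             # boost listed for "categorization"
DOC_FREQ = 164             # docFreq listed for "categorization"


def idf(doc_freq, max_docs=MAX_DOCS):
    # Assumed formula; it reproduces 6.590942 for docFreq=164.
    return 1.0 + math.log(max_docs / (doc_freq + 1))


def field_weight(freq, field_norm, doc_freq=DOC_FREQ):
    # tf grows with the square root of the raw frequency, so twelve
    # occurrences count about 3.46 times as much as one, not twelve times.
    return math.sqrt(freq) * idf(doc_freq) * field_norm


# (termFreq, fieldNorm) pairs copied from the listing.
fw_3325 = field_weight(12.0, 0.0625)    # long abstract, many occurrences
fw_4634 = field_weight(2.0, 0.09375)    # shorter abstract, larger norm
fw_4199 = field_weight(5.0, 0.078125)

# Multiplying by the query-side weight yields the per-term score.
query_weight = BOOST * idf(DOC_FREQ) * QUERY_NORM
score_3325 = query_weight * fw_3325

print(fw_3325, fw_4634, fw_4199, score_3325)
```

The listing reports 1.4269807, 0.87384367 and 1.1513902 for the three field weights and 0.50996774 for the term score in doc 3325.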