Document (#43065)

Author
Yang, T.-H.
Hsieh, Y.-L.
Liu, S.-H.
Chang, Y.-C.
Hsu, W.-L.
Title
¬A flexible template generation and matching method with applications for publication reference metadata extraction
Source
Journal of the Association for Information Science and Technology. 72(2021) no.1, p.32-45
Year
2021
Abstract
Conventional rule-based approaches use exact template matching to capture linguistic information and therefore need to enumerate all variations. We propose a novel flexible template generation and matching scheme called the principle-based approach (PBA) based on sequence alignment, and employ it for reference metadata extraction (RME) to demonstrate its effectiveness. The main contributions of this research are threefold. First, we propose an automatic template generation method that can capture prominent patterns using the dominating set algorithm. Second, we devise an alignment-based template-matching technique that uses a logistic regression model, which makes it more general and flexible than pure rule-based approaches. Last, we apply PBA to RME on extensive cross-domain corpora and demonstrate its robustness and generality. Experiments reveal that the same set of templates produced by the PBA framework not only delivers consistent performance on various unseen domains, but also surpasses hand-crafted knowledge (templates). We use four independent journal-style test sets and one conference-style test set in the experiments. When compared to renowned machine learning methods, such as conditional random fields (CRF), as well as recent deep learning methods (i.e., bi-directional long short-term memory with a CRF layer, Bi-LSTM-CRF), PBA has the best performance for all datasets.
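The abstract gives no implementation detail, so the following is only a minimal sketch of the general idea of alignment-based (rather than exact) template matching. It uses a textbook Needleman-Wunsch global alignment over field labels. The label names, the scoring values, and the function name `align_score` are all assumptions made for illustration and are not taken from the paper. The logistic regression component and the dominating-set template generation mentioned in the abstract are not modelled here.

```python
def align_score(template, labels, match=2, mismatch=-1, gap=-1):
    """Needleman-Wunsch global alignment score between two label sequences.

    The match/mismatch/gap values are illustrative assumptions.
    """
    rows, cols = len(template) + 1, len(labels) + 1
    dp = [[0] * cols for _ in range(rows)]
    # Aligning a prefix against an empty sequence costs one gap per element.
    for i in range(1, rows):
        dp[i][0] = i * gap
    for j in range(1, cols):
        dp[0][j] = j * gap
    for i in range(1, rows):
        for j in range(1, cols):
            sub = match if template[i - 1] == labels[j - 1] else mismatch
            dp[i][j] = max(
                dp[i - 1][j - 1] + sub,  # align the two labels
                dp[i - 1][j] + gap,      # skip a template slot
                dp[i][j - 1] + gap,      # skip an input label
            )
    return dp[-1][-1]


# Hypothetical reference template and three candidate label sequences.
template = ["AUTHOR", "YEAR", "TITLE", "JOURNAL", "VOLUME", "PAGES"]
exact = ["AUTHOR", "YEAR", "TITLE", "JOURNAL", "VOLUME", "PAGES"]
variant = ["AUTHOR", "TITLE", "JOURNAL", "VOLUME", "PAGES", "YEAR"]
unrelated = ["URL", "DATE"]

exact_score = align_score(template, exact)
variant_score = align_score(template, variant)
unrelated_score = align_score(template, unrelated)
print(exact_score, variant_score, unrelated_score)
```

An exact matcher would reject the `variant` sequence outright because the year has moved to the end. The alignment score instead degrades gradually, which is the kind of flexibility the abstract contrasts with enumerating every variation by hand.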
Content
Cf.: https://asistdl.onlinelibrary.wiley.com/doi/10.1002/asi.24391.
Theme
Automatic indexing
Metadata

Similar documents (author)

  1. Hsu, C.-N.; Chang, C.-H.; Hsieh, C.-H.; Lu, J.-J.; Chang, C.-C.: Reconfigurable Web wrapper agents for biological information integration (2005) 1.96
    1.9564322 = sum of:
      1.9564322 = product of:
        2.9346483 = sum of:
          1.485926 = weight(author_txt:chang in 5263) [ClassicSimilarity], result of:
            1.485926 = score(doc=5263,freq=2.0), product of:
              0.5491147 = queryWeight, product of:
                1.0634644 = boost
                7.653836 = idf(docFreq=56, maxDocs=44218)
                0.06746227 = queryNorm
              2.7060394 = fieldWeight in 5263, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                7.653836 = idf(docFreq=56, maxDocs=44218)
                0.25 = fieldNorm(doc=5263)
          1.4487222 = weight(author_txt:hsieh in 5263) [ClassicSimilarity], result of:
            1.4487222 = score(doc=5263,freq=1.0), product of:
              0.68024445 = queryWeight, product of:
                1.1836517 = boost
                8.518833 = idf(docFreq=23, maxDocs=44218)
                0.06746227 = queryNorm
              2.1297083 = fieldWeight in 5263, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                8.518833 = idf(docFreq=23, maxDocs=44218)
                0.25 = fieldNorm(doc=5263)
        0.6666667 = coord(2/3)
    
  2. Hsieh-Yee, I.: ¬The cataloging practices of special libraries and their relationship with OCLC (1996) 0.97
    0.9658148 = sum of:
      0.9658148 = product of:
        2.8974445 = sum of:
          2.8974445 = weight(author_txt:hsieh in 4920) [ClassicSimilarity], result of:
            2.8974445 = score(doc=4920,freq=1.0), product of:
              0.68024445 = queryWeight, product of:
                1.1836517 = boost
                8.518833 = idf(docFreq=23, maxDocs=44218)
                0.06746227 = queryNorm
              4.2594166 = fieldWeight in 4920, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                8.518833 = idf(docFreq=23, maxDocs=44218)
                0.5 = fieldNorm(doc=4920)
        0.33333334 = coord(1/3)
    
  3. Hsieh-Yee, I.: Student use of online catalogs and other information channels (1996) 0.97
    0.9658148 = sum of:
      0.9658148 = product of:
        2.8974445 = sum of:
          2.8974445 = weight(author_txt:hsieh in 5543) [ClassicSimilarity], result of:
            2.8974445 = score(doc=5543,freq=1.0), product of:
              0.68024445 = queryWeight, product of:
                1.1836517 = boost
                8.518833 = idf(docFreq=23, maxDocs=44218)
                0.06746227 = queryNorm
              4.2594166 = fieldWeight in 5543, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                8.518833 = idf(docFreq=23, maxDocs=44218)
                0.5 = fieldNorm(doc=5543)
        0.33333334 = coord(1/3)
    
  4. Hsieh-Yee, I.: ¬The retrieval power of selected search engines : how well do they address general reference questions and subject questions? (1998) 0.97
    0.9658148 = sum of:
      0.9658148 = product of:
        2.8974445 = sum of:
          2.8974445 = weight(author_txt:hsieh in 2186) [ClassicSimilarity], result of:
            2.8974445 = score(doc=2186,freq=1.0), product of:
              0.68024445 = queryWeight, product of:
                1.1836517 = boost
                8.518833 = idf(docFreq=23, maxDocs=44218)
                0.06746227 = queryNorm
              4.2594166 = fieldWeight in 2186, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                8.518833 = idf(docFreq=23, maxDocs=44218)
                0.5 = fieldNorm(doc=2186)
        0.33333334 = coord(1/3)
    
  5. Hsieh-Yee, I.: Search tactics of Web users in searching for texts, graphics, known items and subjects : a search simulation study (1998) 0.97
    0.9658148 = sum of:
      0.9658148 = product of:
        2.8974445 = sum of:
          2.8974445 = weight(author_txt:hsieh in 2404) [ClassicSimilarity], result of:
            2.8974445 = score(doc=2404,freq=1.0), product of:
              0.68024445 = queryWeight, product of:
                1.1836517 = boost
                8.518833 = idf(docFreq=23, maxDocs=44218)
                0.06746227 = queryNorm
              4.2594166 = fieldWeight in 2404, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                8.518833 = idf(docFreq=23, maxDocs=44218)
                0.5 = fieldNorm(doc=2404)
        0.33333334 = coord(1/3)
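
The score trees above follow the classic TF-IDF arithmetic printed in each node: `tf` is the square root of the term frequency, `queryWeight` is boost × idf × queryNorm, `fieldWeight` is tf × idf × fieldNorm, and the summed term scores are multiplied by `coord`. The sketch below recomputes entry 1 (doc 5263) from the numbers in the listing. The idf formula `1 + ln(maxDocs / (docFreq + 1))` is the one ClassicSimilarity is documented to use and reproduces the listed idf values. The function names are illustrative, and Python's double-precision floats will differ from the listed single-precision values in the last digits.

```python
import math


def idf(doc_freq: int, max_docs: int) -> float:
    # ClassicSimilarity idf: 1 + ln(maxDocs / (docFreq + 1))
    return 1.0 + math.log(max_docs / (doc_freq + 1))


def term_score(freq, doc_freq, max_docs, boost, query_norm, field_norm):
    term_idf = idf(doc_freq, max_docs)
    query_weight = boost * term_idf * query_norm           # queryWeight node
    field_weight = math.sqrt(freq) * term_idf * field_norm  # fieldWeight node
    return query_weight * field_weight


# Values copied from the explanation of entry 1 (doc 5263).
MAX_DOCS = 44218
QUERY_NORM = 0.06746227

chang = term_score(2.0, 56, MAX_DOCS, 1.0634644, QUERY_NORM, 0.25)
hsieh = term_score(1.0, 23, MAX_DOCS, 1.1836517, QUERY_NORM, 0.25)

# Two of the three query terms matched, hence coord(2/3).
total = (chang + hsieh) * (2 / 3)
print(round(chang, 4), round(hsieh, 4), round(total, 4))
```

The same arithmetic explains why entries 2 to 5 share one score: each matches only `hsieh` once, with fieldNorm 0.5 and coord(1/3).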
    

Similar documents (content)

  1. Li, J.; Zhang, Z.; Li, X.; Chen, H.: Kernel-based learning for biomedical relation extraction (2008) 0.15
    0.14828664 = sum of:
      0.14828664 = product of:
        0.52959514 = sum of:
          0.044150002 = weight(abstract_txt:learning in 1611) [ClassicSimilarity], result of:
            0.044150002 = score(doc=1611,freq=2.0), product of:
              0.0841109 = queryWeight, product of:
                1.0404961 = boost
                4.750873 = idf(docFreq=1038, maxDocs=44218)
                0.017015252 = queryNorm
              0.5249023 = fieldWeight in 1611, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                4.750873 = idf(docFreq=1038, maxDocs=44218)
                0.078125 = fieldNorm(doc=1611)
          0.039735638 = weight(abstract_txt:propose in 1611) [ClassicSimilarity], result of:
            0.039735638 = score(doc=1611,freq=1.0), product of:
              0.09878597 = queryWeight, product of:
                1.1276178 = boost
                5.1486683 = idf(docFreq=697, maxDocs=44218)
                0.017015252 = queryNorm
              0.4022397 = fieldWeight in 1611, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                5.1486683 = idf(docFreq=697, maxDocs=44218)
                0.078125 = fieldNorm(doc=1611)
          0.044093944 = weight(abstract_txt:experiments in 1611) [ClassicSimilarity], result of:
            0.044093944 = score(doc=1611,freq=1.0), product of:
              0.10588337 = queryWeight, product of:
                1.1674229 = boost
                5.3304167 = idf(docFreq=581, maxDocs=44218)
                0.017015252 = queryNorm
              0.41643882 = fieldWeight in 1611, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                5.3304167 = idf(docFreq=581, maxDocs=44218)
                0.078125 = fieldNorm(doc=1611)
          0.11968934 = weight(abstract_txt:extraction in 1611) [ClassicSimilarity], result of:
            0.11968934 = score(doc=1611,freq=3.0), product of:
              0.14285815 = queryWeight, product of:
                1.3560222 = boost
                6.1915555 = idf(docFreq=245, maxDocs=44218)
                0.017015252 = queryNorm
              0.83781946 = fieldWeight in 1611, product of:
                1.7320508 = tf(freq=3.0), with freq of:
                  3.0 = termFreq=3.0
                6.1915555 = idf(docFreq=245, maxDocs=44218)
                0.078125 = fieldNorm(doc=1611)
          0.07487443 = weight(abstract_txt:capture in 1611) [ClassicSimilarity], result of:
            0.07487443 = score(doc=1611,freq=1.0), product of:
              0.1507061 = queryWeight, product of:
                1.392771 = boost
                6.3593493 = idf(docFreq=207, maxDocs=44218)
                0.017015252 = queryNorm
              0.49682415 = fieldWeight in 1611, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                6.3593493 = idf(docFreq=207, maxDocs=44218)
                0.078125 = fieldNorm(doc=1611)
          0.04084336 = weight(abstract_txt:based in 1611) [ClassicSimilarity], result of:
            0.04084336 = score(doc=1611,freq=3.0), product of:
              0.0946809 = queryWeight, product of:
                1.7454826 = boost
                3.1879277 = idf(docFreq=4958, maxDocs=44218)
                0.017015252 = queryNorm
              0.4313791 = fieldWeight in 1611, product of:
                1.7320508 = tf(freq=3.0), with freq of:
                  3.0 = termFreq=3.0
                3.1879277 = idf(docFreq=4958, maxDocs=44218)
                0.078125 = fieldNorm(doc=1611)
          0.16620839 = weight(abstract_txt:templates in 1611) [ClassicSimilarity], result of:
            0.16620839 = score(doc=1611,freq=1.0), product of:
              0.25645456 = queryWeight, product of:
                1.816852 = boost
                8.29569 = idf(docFreq=29, maxDocs=44218)
                0.017015252 = queryNorm
              0.64810073 = fieldWeight in 1611, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                8.29569 = idf(docFreq=29, maxDocs=44218)
                0.078125 = fieldNorm(doc=1611)
        0.28 = coord(7/25)
    
  2. Yim, W.-w.; Kwan, S.W.; Yetisgen, M.: Classifying tumor event attributes in radiology reports (2017) 0.14
    0.14284174 = sum of:
      0.14284174 = product of:
        0.7142087 = sum of:
          0.024975013 = weight(abstract_txt:learning in 3929) [ClassicSimilarity], result of:
            0.024975013 = score(doc=3929,freq=1.0), product of:
              0.0841109 = queryWeight, product of:
                1.0404961 = boost
                4.750873 = idf(docFreq=1038, maxDocs=44218)
                0.017015252 = queryNorm
              0.29692957 = fieldWeight in 3929, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.750873 = idf(docFreq=1038, maxDocs=44218)
                0.0625 = fieldNorm(doc=3929)
          0.078180745 = weight(abstract_txt:extraction in 3929) [ClassicSimilarity], result of:
            0.078180745 = score(doc=3929,freq=2.0), product of:
              0.14285815 = queryWeight, product of:
                1.3560222 = boost
                6.1915555 = idf(docFreq=245, maxDocs=44218)
                0.017015252 = queryNorm
              0.54726136 = fieldWeight in 3929, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                6.1915555 = idf(docFreq=245, maxDocs=44218)
                0.0625 = fieldNorm(doc=3929)
          0.018864742 = weight(abstract_txt:based in 3929) [ClassicSimilarity], result of:
            0.018864742 = score(doc=3929,freq=1.0), product of:
              0.0946809 = queryWeight, product of:
                1.7454826 = boost
                3.1879277 = idf(docFreq=4958, maxDocs=44218)
                0.017015252 = queryNorm
              0.19924548 = fieldWeight in 3929, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                3.1879277 = idf(docFreq=4958, maxDocs=44218)
                0.0625 = fieldNorm(doc=3929)
          0.13296671 = weight(abstract_txt:templates in 3929) [ClassicSimilarity], result of:
            0.13296671 = score(doc=3929,freq=1.0), product of:
              0.25645456 = queryWeight, product of:
                1.816852 = boost
                8.29569 = idf(docFreq=29, maxDocs=44218)
                0.017015252 = queryNorm
              0.5184806 = fieldWeight in 3929, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                8.29569 = idf(docFreq=29, maxDocs=44218)
                0.0625 = fieldNorm(doc=3929)
          0.4592215 = weight(abstract_txt:template in 3929) [ClassicSimilarity], result of:
            0.4592215 = score(doc=3929,freq=2.0), product of:
              0.6311995 = queryWeight, product of:
                4.5067935 = boost
                8.231152 = idf(docFreq=31, maxDocs=44218)
                0.017015252 = queryNorm
              0.7275379 = fieldWeight in 3929, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                8.231152 = idf(docFreq=31, maxDocs=44218)
                0.0625 = fieldNorm(doc=3929)
        0.2 = coord(5/25)
    
  3. Taniguchi, S.: ¬A system for analyzing cataloguing rules : a feasibility study (1996) 0.13
    0.13138637 = sum of:
      0.13138637 = product of:
        0.82116485 = sum of:
          0.2022979 = weight(abstract_txt:rule in 4198) [ClassicSimilarity], result of:
            0.2022979 = score(doc=4198,freq=9.0), product of:
              0.16309045 = queryWeight, product of:
                1.4488672 = boost
                6.615483 = idf(docFreq=160, maxDocs=44218)
                0.017015252 = queryNorm
              1.240403 = fieldWeight in 4198, product of:
                3.0 = tf(freq=9.0), with freq of:
                  9.0 = termFreq=9.0
                6.615483 = idf(docFreq=160, maxDocs=44218)
                0.0625 = fieldNorm(doc=4198)
          0.026678773 = weight(abstract_txt:based in 4198) [ClassicSimilarity], result of:
            0.026678773 = score(doc=4198,freq=2.0), product of:
              0.0946809 = queryWeight, product of:
                1.7454826 = boost
                3.1879277 = idf(docFreq=4958, maxDocs=44218)
                0.017015252 = queryNorm
              0.28177565 = fieldWeight in 4198, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                3.1879277 = idf(docFreq=4958, maxDocs=44218)
                0.0625 = fieldNorm(doc=4198)
          0.13296671 = weight(abstract_txt:templates in 4198) [ClassicSimilarity], result of:
            0.13296671 = score(doc=4198,freq=1.0), product of:
              0.25645456 = queryWeight, product of:
                1.816852 = boost
                8.29569 = idf(docFreq=29, maxDocs=44218)
                0.017015252 = queryNorm
              0.5184806 = fieldWeight in 4198, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                8.29569 = idf(docFreq=29, maxDocs=44218)
                0.0625 = fieldNorm(doc=4198)
          0.4592215 = weight(abstract_txt:template in 4198) [ClassicSimilarity], result of:
            0.4592215 = score(doc=4198,freq=2.0), product of:
              0.6311995 = queryWeight, product of:
                4.5067935 = boost
                8.231152 = idf(docFreq=31, maxDocs=44218)
                0.017015252 = queryNorm
              0.7275379 = fieldWeight in 4198, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                8.231152 = idf(docFreq=31, maxDocs=44218)
                0.0625 = fieldNorm(doc=4198)
        0.16 = coord(4/25)
    
  4. Kuikka, E.; Salminen, A.: Two-dimensional filters for structured text (1997) 0.12
    0.1174333 = sum of:
      0.1174333 = product of:
        0.9786109 = sum of:
          0.16620839 = weight(abstract_txt:templates in 7251) [ClassicSimilarity], result of:
            0.16620839 = score(doc=7251,freq=1.0), product of:
              0.25645456 = queryWeight, product of:
                1.816852 = boost
                8.29569 = idf(docFreq=29, maxDocs=44218)
                0.017015252 = queryNorm
              0.64810073 = fieldWeight in 7251, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                8.29569 = idf(docFreq=29, maxDocs=44218)
                0.078125 = fieldNorm(doc=7251)
          0.109366 = weight(abstract_txt:flexible in 7251) [ClassicSimilarity], result of:
            0.109366 = score(doc=7251,freq=1.0), product of:
              0.22208901 = queryWeight, product of:
                2.07073 = boost
                6.30326 = idf(docFreq=219, maxDocs=44218)
                0.017015252 = queryNorm
              0.4924422 = fieldWeight in 7251, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                6.30326 = idf(docFreq=219, maxDocs=44218)
                0.078125 = fieldNorm(doc=7251)
          0.7030365 = weight(abstract_txt:template in 7251) [ClassicSimilarity], result of:
            0.7030365 = score(doc=7251,freq=3.0), product of:
              0.6311995 = queryWeight, product of:
                4.5067935 = boost
                8.231152 = idf(docFreq=31, maxDocs=44218)
                0.017015252 = queryNorm
              1.1138103 = fieldWeight in 7251, product of:
                1.7320508 = tf(freq=3.0), with freq of:
                  3.0 = termFreq=3.0
                8.231152 = idf(docFreq=31, maxDocs=44218)
                0.078125 = fieldNorm(doc=7251)
        0.12 = coord(3/25)
    
  5. Cheng, Y.-Y.; Xia, Y.: ¬A systematic review of methods for aligning, mapping, merging taxonomies in information sciences (2023) 0.11
    0.10630292 = sum of:
      0.10630292 = product of:
        0.5315146 = sum of:
          0.08868381 = weight(abstract_txt:threefold in 1029) [ClassicSimilarity], result of:
            0.08868381 = score(doc=1029,freq=1.0), product of:
              0.15538223 = queryWeight, product of:
                9.131938 = idf(docFreq=12, maxDocs=44218)
                0.017015252 = queryNorm
              0.5707461 = fieldWeight in 1029, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                9.131938 = idf(docFreq=12, maxDocs=44218)
                0.0625 = fieldNorm(doc=1029)
          0.022795975 = weight(abstract_txt:approaches in 1029) [ClassicSimilarity], result of:
            0.022795975 = score(doc=1029,freq=1.0), product of:
              0.07914447 = queryWeight, product of:
                1.0093101 = boost
                4.6084785 = idf(docFreq=1197, maxDocs=44218)
                0.017015252 = queryNorm
              0.2880299 = fieldWeight in 1029, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.6084785 = idf(docFreq=1197, maxDocs=44218)
                0.0625 = fieldNorm(doc=1029)
          0.2476267 = weight(abstract_txt:alignment in 1029) [ClassicSimilarity], result of:
            0.2476267 = score(doc=1029,freq=7.0), product of:
              0.20293055 = queryWeight, product of:
                1.6161737 = boost
                7.3793993 = idf(docFreq=74, maxDocs=44218)
                0.017015252 = queryNorm
              1.2202535 = fieldWeight in 1029, product of:
                2.6457512 = tf(freq=7.0), with freq of:
                  7.0 = termFreq=7.0
                7.3793993 = idf(docFreq=74, maxDocs=44218)
                0.0625 = fieldNorm(doc=1029)
          0.026678773 = weight(abstract_txt:based in 1029) [ClassicSimilarity], result of:
            0.026678773 = score(doc=1029,freq=2.0), product of:
              0.0946809 = queryWeight, product of:
                1.7454826 = boost
                3.1879277 = idf(docFreq=4958, maxDocs=44218)
                0.017015252 = queryNorm
              0.28177565 = fieldWeight in 1029, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                3.1879277 = idf(docFreq=4958, maxDocs=44218)
                0.0625 = fieldNorm(doc=1029)
          0.14572935 = weight(abstract_txt:matching in 1029) [ClassicSimilarity], result of:
            0.14572935 = score(doc=1029,freq=2.0), product of:
              0.27261296 = queryWeight, product of:
                2.6491258 = boost
                6.047913 = idf(docFreq=283, maxDocs=44218)
                0.017015252 = queryNorm
              0.53456503 = fieldWeight in 1029, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                6.047913 = idf(docFreq=283, maxDocs=44218)
                0.0625 = fieldNorm(doc=1029)
        0.2 = coord(5/25)