Document (#40051)

Author
Arora, S.K.
Li, Y.
Youtie, J.
Shapira, P.
Title
Using the Wayback Machine to mine websites in the social sciences : a methodological resource
Source
Journal of the Association for Information Science and Technology. 67(2016) no.8, pp. 1904-1915
Year
2016
Abstract
Websites offer an unobtrusive data source for developing and analyzing information about various types of social science phenomena. In this paper, we provide a methodological resource for social scientists looking to expand their toolkit using unstructured web-based text, and in particular, with the Wayback Machine, to access historical website data. After providing a literature review of existing research that uses the Wayback Machine, we put forward a step-by-step description of how the analyst can design a research project using archived websites. We draw on the example of a project that analyzes indicators of innovation activities and strategies in 300 U.S. small- and medium-sized enterprises in green goods industries. We present six steps to access historical Wayback website data: (a) sampling, (b) organizing and defining the boundaries of the web crawl, (c) crawling, (d) website variable operationalization, (e) integration with other data sources, and (f) analysis. Although our examples draw on specific types of firms in green goods industries, the method can be generalized to other areas of research. In discussing the limitations and benefits of using the Wayback Machine, we note that both machine and human effort are essential to developing a high-quality data set from archived web information.
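As a rough illustration of steps (a) to (c), the sketch below builds one Wayback Machine address per sampled site and year. It is not the authors' tooling: the function names, the mid-year target date, and the example domains are all assumptions made for this note. The only external convention relied on is the public Wayback URL pattern, web.archive.org/web/<timestamp>/<url>, which serves the capture closest to the timestamp.

```python
from datetime import date

WAYBACK_PREFIX = "https://web.archive.org/web"


def snapshot_url(site_url: str, when: date) -> str:
    """Build a Wayback Machine address for the capture nearest to `when`."""
    timestamp = when.strftime("%Y%m%d")  # partial timestamps (YYYYMMDD) are accepted
    return f"{WAYBACK_PREFIX}/{timestamp}/{site_url}"


def crawl_plan(sites, years):
    """Return (site, year, url) rows: one target snapshot per site and year.

    The 1 July target date is an arbitrary choice for this sketch.
    """
    return [
        (site, year, snapshot_url(site, date(year, 7, 1)))
        for site in sites
        for year in years
    ]


if __name__ == "__main__":
    # Hypothetical sample; a real project would load its sampled firm list here.
    for row in crawl_plan(["http://example.com", "http://example.org"], [2008, 2010]):
        print(row)
```

The plan only lists what to fetch. Crawling, variable operationalization, and the manual checks the abstract calls for would follow as separate steps.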
Content
Cf.: http://onlinelibrary.wiley.com/doi/10.1002/asi.23503/abstract.
Theme
Informetrics
Field
Social sciences

Similar documents (author)

  1. Shapira, B.: Hypertext browsing : a new model for information filtering based on user profiles and data clustering (1996) 5.62
    5.6180234 = sum of:
      5.6180234 = weight(author_txt:shapira in 4711) [ClassicSimilarity], result of:
        5.6180234 = fieldWeight in 4711, product of:
          1.0 = tf(freq=1.0), with freq of:
            1.0 = termFreq=1.0
          8.988837 = idf(docFreq=14, maxDocs=44218)
          0.625 = fieldNorm(doc=4711)
    
  2. Shapira, B.; Zabar, B.: Personalized search : integrating collaboration and social networks (2011) 4.49
    4.4944186 = sum of:
      4.4944186 = weight(author_txt:shapira in 4140) [ClassicSimilarity], result of:
        4.4944186 = fieldWeight in 4140, product of:
          1.0 = tf(freq=1.0), with freq of:
            1.0 = termFreq=1.0
          8.988837 = idf(docFreq=14, maxDocs=44218)
          0.5 = fieldNorm(doc=4140)
    
  3. Shapira, B.; Shoval, P.; Hanani, U.: Stereotypes in information filtering systems (1997) 3.37
    3.3708138 = sum of:
      3.3708138 = weight(author_txt:shapira in 157) [ClassicSimilarity], result of:
        3.3708138 = fieldWeight in 157, product of:
          1.0 = tf(freq=1.0), with freq of:
            1.0 = termFreq=1.0
          8.988837 = idf(docFreq=14, maxDocs=44218)
          0.375 = fieldNorm(doc=157)
    
  4. Shapira, B.; Kantor, P.B.; Melamed, B.: The effect of extrinsic motivation on user behavior in a collaborative information finding system (2001) 3.37
    3.3708138 = sum of:
      3.3708138 = weight(author_txt:shapira in 6525) [ClassicSimilarity], result of:
        3.3708138 = fieldWeight in 6525, product of:
          1.0 = tf(freq=1.0), with freq of:
            1.0 = termFreq=1.0
          8.988837 = idf(docFreq=14, maxDocs=44218)
          0.375 = fieldNorm(doc=6525)
    
  5. Kuflik, T.; Shapira, B.; Shoval, P.: Stereotype-based versus personal-based filtering rules in information filtering systems (2003) 3.37
    3.3708138 = sum of:
      3.3708138 = weight(author_txt:shapira in 1234) [ClassicSimilarity], result of:
        3.3708138 = fieldWeight in 1234, product of:
          1.0 = tf(freq=1.0), with freq of:
            1.0 = termFreq=1.0
          8.988837 = idf(docFreq=14, maxDocs=44218)
          0.375 = fieldNorm(doc=1234)
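
The author scores above follow the classic Lucene TF-IDF arithmetic: tf is the square root of the term frequency, idf is 1 + ln(maxDocs / (docFreq + 1)), and the field weight is tf × idf × fieldNorm. The sketch below recomputes the first and last entries from the numbers in the listing. The function names are chosen for this note, and the fieldNorm values are copied from the output rather than derived.

```python
import math


def classic_tf(freq: float) -> float:
    """Term-frequency factor used by ClassicSimilarity: sqrt(freq)."""
    return math.sqrt(freq)


def classic_idf(doc_freq: int, max_docs: int) -> float:
    """Inverse document frequency: 1 + ln(maxDocs / (docFreq + 1))."""
    return 1.0 + math.log(max_docs / (doc_freq + 1))


def field_weight(freq: float, doc_freq: int, max_docs: int, field_norm: float) -> float:
    """fieldWeight = tf * idf * fieldNorm, as in the explain output."""
    return classic_tf(freq) * classic_idf(doc_freq, max_docs) * field_norm


# Values taken from the listing: author_txt:shapira, docFreq=14, maxDocs=44218.
first = field_weight(1.0, 14, 44218, 0.625)  # doc 4711
last = field_weight(1.0, 14, 44218, 0.375)   # doc 1234
print(f"{first:.4f} {last:.4f}")
```

Since tf and idf are identical across all five entries, the ranking here is driven entirely by fieldNorm, the stored per-document normalization of the author field.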
    

Similar documents (content)

  1. Vaughan, L.: Uncovering information from social media hyperlinks (2016) 0.11
    0.11491012 = sum of:
      0.11491012 = product of:
        0.41039327 = sum of:
          0.04263895 = weight(abstract_txt:types in 2892) [ClassicSimilarity], result of:
            0.04263895 = score(doc=2892,freq=3.0), product of:
              0.08814633 = queryWeight, product of:
                1.0684203 = boost
                4.4684987 = idf(docFreq=1377, maxDocs=44218)
                0.018462922 = queryNorm
              0.48372915 = fieldWeight in 2892, product of:
                1.7320508 = tf(freq=3.0), with freq of:
                  3.0 = termFreq=3.0
                4.4684987 = idf(docFreq=1377, maxDocs=44218)
                0.0625 = fieldNorm(doc=2892)
          0.013187695 = weight(abstract_txt:research in 2892) [ClassicSimilarity], result of:
            0.013187695 = score(doc=2892,freq=1.0), product of:
              0.0665554 = queryWeight, product of:
                1.1370442 = boost
                3.170338 = idf(docFreq=5046, maxDocs=44218)
                0.018462922 = queryNorm
              0.19814612 = fieldWeight in 2892, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                3.170338 = idf(docFreq=5046, maxDocs=44218)
                0.0625 = fieldNorm(doc=2892)
          0.037594248 = weight(abstract_txt:developing in 2892) [ClassicSimilarity], result of:
            0.037594248 = score(doc=2892,freq=1.0), product of:
              0.11689284 = queryWeight, product of:
                1.230365 = boost
                5.145807 = idf(docFreq=699, maxDocs=44218)
                0.018462922 = queryNorm
              0.32161292 = fieldWeight in 2892, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                5.145807 = idf(docFreq=699, maxDocs=44218)
                0.0625 = fieldNorm(doc=2892)
          0.06909659 = weight(abstract_txt:methodological in 2892) [ClassicSimilarity], result of:
            0.06909659 = score(doc=2892,freq=1.0), product of:
              0.17539264 = queryWeight, product of:
                1.5071126 = boost
                6.30326 = idf(docFreq=219, maxDocs=44218)
                0.018462922 = queryNorm
              0.39395374 = fieldWeight in 2892, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                6.30326 = idf(docFreq=219, maxDocs=44218)
                0.0625 = fieldNorm(doc=2892)
          0.062097557 = weight(abstract_txt:social in 2892) [ClassicSimilarity], result of:
            0.062097557 = score(doc=2892,freq=4.0), product of:
              0.11778782 = queryWeight, product of:
                1.512641 = boost
                4.2175875 = idf(docFreq=1770, maxDocs=44218)
                0.018462922 = queryNorm
              0.52719843 = fieldWeight in 2892, product of:
                2.0 = tf(freq=4.0), with freq of:
                  4.0 = termFreq=4.0
                4.2175875 = idf(docFreq=1770, maxDocs=44218)
                0.0625 = fieldNorm(doc=2892)
          0.07684865 = weight(abstract_txt:data in 2892) [ClassicSimilarity], result of:
            0.07684865 = score(doc=2892,freq=9.0), product of:
              0.12284674 = queryWeight, product of:
                1.9943062 = boost
                3.3363478 = idf(docFreq=4274, maxDocs=44218)
                0.018462922 = queryNorm
              0.62556523 = fieldWeight in 2892, product of:
                3.0 = tf(freq=9.0), with freq of:
                  9.0 = termFreq=9.0
                3.3363478 = idf(docFreq=4274, maxDocs=44218)
                0.0625 = fieldNorm(doc=2892)
          0.108929574 = weight(abstract_txt:websites in 2892) [ClassicSimilarity], result of:
            0.108929574 = score(doc=2892,freq=1.0), product of:
              0.27195764 = queryWeight, product of:
                2.2984564 = boost
                6.4086204 = idf(docFreq=197, maxDocs=44218)
                0.018462922 = queryNorm
              0.40053877 = fieldWeight in 2892, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                6.4086204 = idf(docFreq=197, maxDocs=44218)
                0.0625 = fieldNorm(doc=2892)
        0.28 = coord(7/25)
    
  2. Borrego, Á.: Measuring compliance with a Spanish Government open access mandate (2016) 0.10
    0.10146838 = sum of:
      0.10146838 = product of:
        0.50734186 = sum of:
          0.022841757 = weight(abstract_txt:research in 2841) [ClassicSimilarity], result of:
            0.022841757 = score(doc=2841,freq=3.0), product of:
              0.0665554 = queryWeight, product of:
                1.1370442 = boost
                3.170338 = idf(docFreq=5046, maxDocs=44218)
                0.018462922 = queryNorm
              0.34319913 = fieldWeight in 2841, product of:
                1.7320508 = tf(freq=3.0), with freq of:
                  3.0 = termFreq=3.0
                3.170338 = idf(docFreq=5046, maxDocs=44218)
                0.0625 = fieldNorm(doc=2841)
          0.031048778 = weight(abstract_txt:social in 2841) [ClassicSimilarity], result of:
            0.031048778 = score(doc=2841,freq=1.0), product of:
              0.11778782 = queryWeight, product of:
                1.512641 = boost
                4.2175875 = idf(docFreq=1770, maxDocs=44218)
                0.018462922 = queryNorm
              0.26359922 = fieldWeight in 2841, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.2175875 = idf(docFreq=1770, maxDocs=44218)
                0.0625 = fieldNorm(doc=2841)
          0.31890553 = weight(abstract_txt:green in 2841) [ClassicSimilarity], result of:
            0.31890553 = score(doc=2841,freq=4.0), product of:
              0.30628684 = queryWeight, product of:
                1.9916096 = boost
                8.329592 = idf(docFreq=28, maxDocs=44218)
                0.018462922 = queryNorm
              1.041199 = fieldWeight in 2841, product of:
                2.0 = tf(freq=4.0), with freq of:
                  4.0 = termFreq=4.0
                8.329592 = idf(docFreq=28, maxDocs=44218)
                0.0625 = fieldNorm(doc=2841)
          0.025616216 = weight(abstract_txt:data in 2841) [ClassicSimilarity], result of:
            0.025616216 = score(doc=2841,freq=1.0), product of:
              0.12284674 = queryWeight, product of:
                1.9943062 = boost
                3.3363478 = idf(docFreq=4274, maxDocs=44218)
                0.018462922 = queryNorm
              0.20852174 = fieldWeight in 2841, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                3.3363478 = idf(docFreq=4274, maxDocs=44218)
                0.0625 = fieldNorm(doc=2841)
          0.108929574 = weight(abstract_txt:websites in 2841) [ClassicSimilarity], result of:
            0.108929574 = score(doc=2841,freq=1.0), product of:
              0.27195764 = queryWeight, product of:
                2.2984564 = boost
                6.4086204 = idf(docFreq=197, maxDocs=44218)
                0.018462922 = queryNorm
              0.40053877 = fieldWeight in 2841, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                6.4086204 = idf(docFreq=197, maxDocs=44218)
                0.0625 = fieldNorm(doc=2841)
        0.2 = coord(5/25)
    
  3. Xu, C.; Zhang, Q.: The dominant factor of social tags for users' decision behavior on e-commerce websites : color or text (2019) 0.09
    0.09033033 = sum of:
      0.09033033 = product of:
        0.5645646 = sum of:
          0.031048778 = weight(abstract_txt:social in 5359) [ClassicSimilarity], result of:
            0.031048778 = score(doc=5359,freq=1.0), product of:
              0.11778782 = queryWeight, product of:
                1.512641 = boost
                4.2175875 = idf(docFreq=1770, maxDocs=44218)
                0.018462922 = queryNorm
              0.26359922 = fieldWeight in 5359, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.2175875 = idf(docFreq=1770, maxDocs=44218)
                0.0625 = fieldNorm(doc=5359)
          0.022918861 = weight(abstract_txt:using in 5359) [ClassicSimilarity], result of:
            0.022918861 = score(doc=5359,freq=1.0), product of:
              0.10588773 = queryWeight, product of:
                1.6560668 = boost
                3.4631186 = idf(docFreq=3765, maxDocs=44218)
                0.018462922 = queryNorm
              0.21644491 = fieldWeight in 5359, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                3.4631186 = idf(docFreq=3765, maxDocs=44218)
                0.0625 = fieldNorm(doc=5359)
          0.35654724 = weight(abstract_txt:green in 5359) [ClassicSimilarity], result of:
            0.35654724 = score(doc=5359,freq=5.0), product of:
              0.30628684 = queryWeight, product of:
                1.9916096 = boost
                8.329592 = idf(docFreq=28, maxDocs=44218)
                0.018462922 = queryNorm
              1.1640959 = fieldWeight in 5359, product of:
                2.236068 = tf(freq=5.0), with freq of:
                  5.0 = termFreq=5.0
                8.329592 = idf(docFreq=28, maxDocs=44218)
                0.0625 = fieldNorm(doc=5359)
          0.1540497 = weight(abstract_txt:websites in 5359) [ClassicSimilarity], result of:
            0.1540497 = score(doc=5359,freq=2.0), product of:
              0.27195764 = queryWeight, product of:
                2.2984564 = boost
                6.4086204 = idf(docFreq=197, maxDocs=44218)
                0.018462922 = queryNorm
              0.5664474 = fieldWeight in 5359, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                6.4086204 = idf(docFreq=197, maxDocs=44218)
                0.0625 = fieldNorm(doc=5359)
        0.16 = coord(4/25)
    
  4. Khoo, C.S.G.; Zhang, D.; Wang, M.; Yun, X.J.: Subject organization in three types of information resources : an exploratory study (2012) 0.08
    0.08041268 = sum of:
      0.08041268 = product of:
        0.40206337 = sum of:
          0.034814555 = weight(abstract_txt:types in 831) [ClassicSimilarity], result of:
            0.034814555 = score(doc=831,freq=2.0), product of:
              0.08814633 = queryWeight, product of:
                1.0684203 = boost
                4.4684987 = idf(docFreq=1377, maxDocs=44218)
                0.018462922 = queryNorm
              0.3949632 = fieldWeight in 831, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                4.4684987 = idf(docFreq=1377, maxDocs=44218)
                0.0625 = fieldNorm(doc=831)
          0.05950983 = weight(abstract_txt:resource in 831) [ClassicSimilarity], result of:
            0.05950983 = score(doc=831,freq=3.0), product of:
              0.11008438 = queryWeight, product of:
                1.193996 = boost
                4.993699 = idf(docFreq=814, maxDocs=44218)
                0.018462922 = queryNorm
              0.5405838 = fieldWeight in 831, product of:
                1.7320508 = tf(freq=3.0), with freq of:
                  3.0 = termFreq=3.0
                4.993699 = idf(docFreq=814, maxDocs=44218)
                0.0625 = fieldNorm(doc=831)
          0.08801865 = weight(abstract_txt:step in 831) [ClassicSimilarity], result of:
            0.08801865 = score(doc=831,freq=2.0), product of:
              0.16358624 = queryWeight, product of:
                1.4555041 = boost
                6.087415 = idf(docFreq=272, maxDocs=44218)
                0.018462922 = queryNorm
              0.53805655 = fieldWeight in 831, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                6.087415 = idf(docFreq=272, maxDocs=44218)
                0.0625 = fieldNorm(doc=831)
          0.031048778 = weight(abstract_txt:social in 831) [ClassicSimilarity], result of:
            0.031048778 = score(doc=831,freq=1.0), product of:
              0.11778782 = queryWeight, product of:
                1.512641 = boost
                4.2175875 = idf(docFreq=1770, maxDocs=44218)
                0.018462922 = queryNorm
              0.26359922 = fieldWeight in 831, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.2175875 = idf(docFreq=1770, maxDocs=44218)
                0.0625 = fieldNorm(doc=831)
          0.18867156 = weight(abstract_txt:websites in 831) [ClassicSimilarity], result of:
            0.18867156 = score(doc=831,freq=3.0), product of:
              0.27195764 = queryWeight, product of:
                2.2984564 = boost
                6.4086204 = idf(docFreq=197, maxDocs=44218)
                0.018462922 = queryNorm
              0.6937535 = fieldWeight in 831, product of:
                1.7320508 = tf(freq=3.0), with freq of:
                  3.0 = termFreq=3.0
                6.4086204 = idf(docFreq=197, maxDocs=44218)
                0.0625 = fieldNorm(doc=831)
        0.2 = coord(5/25)
    
  5. Aranyi, G.; Schaik, P. van: Modeling user experience with news websites (2015) 0.08
    0.07500275 = sum of:
      0.07500275 = product of:
        0.37501374 = sum of:
          0.043518197 = weight(abstract_txt:types in 2332) [ClassicSimilarity], result of:
            0.043518197 = score(doc=2332,freq=2.0), product of:
              0.08814633 = queryWeight, product of:
                1.0684203 = boost
                4.4684987 = idf(docFreq=1377, maxDocs=44218)
                0.018462922 = queryNorm
              0.49370402 = fieldWeight in 2332, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                4.4684987 = idf(docFreq=1377, maxDocs=44218)
                0.078125 = fieldNorm(doc=2332)
          0.028552195 = weight(abstract_txt:research in 2332) [ClassicSimilarity], result of:
            0.028552195 = score(doc=2332,freq=3.0), product of:
              0.0665554 = queryWeight, product of:
                1.1370442 = boost
                3.170338 = idf(docFreq=5046, maxDocs=44218)
                0.018462922 = queryNorm
              0.42899892 = fieldWeight in 2332, product of:
                1.7320508 = tf(freq=3.0), with freq of:
                  3.0 = termFreq=3.0
                3.170338 = idf(docFreq=5046, maxDocs=44218)
                0.078125 = fieldNorm(doc=2332)
          0.028648578 = weight(abstract_txt:using in 2332) [ClassicSimilarity], result of:
            0.028648578 = score(doc=2332,freq=1.0), product of:
              0.10588773 = queryWeight, product of:
                1.6560668 = boost
                3.4631186 = idf(docFreq=3765, maxDocs=44218)
                0.018462922 = queryNorm
              0.27055615 = fieldWeight in 2332, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                3.4631186 = idf(docFreq=3765, maxDocs=44218)
                0.078125 = fieldNorm(doc=2332)
          0.13616197 = weight(abstract_txt:websites in 2332) [ClassicSimilarity], result of:
            0.13616197 = score(doc=2332,freq=1.0), product of:
              0.27195764 = queryWeight, product of:
                2.2984564 = boost
                6.4086204 = idf(docFreq=197, maxDocs=44218)
                0.018462922 = queryNorm
              0.5006735 = fieldWeight in 2332, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                6.4086204 = idf(docFreq=197, maxDocs=44218)
                0.078125 = fieldNorm(doc=2332)
          0.13813281 = weight(abstract_txt:website in 2332) [ClassicSimilarity], result of:
            0.13813281 = score(doc=2332,freq=1.0), product of:
              0.2745756 = queryWeight, product of:
                2.3094928 = boost
                6.439392 = idf(docFreq=191, maxDocs=44218)
                0.018462922 = queryNorm
              0.5030775 = fieldWeight in 2332, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                6.439392 = idf(docFreq=191, maxDocs=44218)
                0.078125 = fieldNorm(doc=2332)
        0.2 = coord(5/25)
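
The content scores combine several per-term products. For each matched term, score = queryWeight × fieldWeight, where queryWeight = boost × idf × queryNorm and fieldWeight = tf × idf × fieldNorm. The per-term scores are summed and scaled by coord, the share of query terms that matched. As a check, the sketch below recomputes the last entry (doc 2332) from the values shown. Boosts, queryNorm, fieldNorm, and coord are copied from the listing, and the helper names are assumptions made for this note.

```python
import math

MAX_DOCS = 44218
QUERY_NORM = 0.018462922  # queryNorm, copied from the explain output
FIELD_NORM = 0.078125     # fieldNorm(doc=2332)
COORD = 5 / 25            # 5 of the 25 query terms matched

# (term, boost, docFreq, freq) as listed for doc 2332
TERMS = [
    ("types", 1.0684203, 1377, 2.0),
    ("research", 1.1370442, 5046, 3.0),
    ("using", 1.6560668, 3765, 1.0),
    ("websites", 2.2984564, 197, 1.0),
    ("website", 2.3094928, 191, 1.0),
]


def classic_idf(doc_freq: int, max_docs: int = MAX_DOCS) -> float:
    """Inverse document frequency: 1 + ln(maxDocs / (docFreq + 1))."""
    return 1.0 + math.log(max_docs / (doc_freq + 1))


def term_score(boost: float, doc_freq: int, freq: float) -> float:
    """Per-term score = queryWeight * fieldWeight."""
    idf = classic_idf(doc_freq)
    query_weight = boost * idf * QUERY_NORM
    field_weight = math.sqrt(freq) * idf * FIELD_NORM
    return query_weight * field_weight


total = COORD * sum(term_score(boost, df, freq) for _, boost, df, freq in TERMS)
print(f"{total:.6f}")
```

Note that idf enters each term twice, once on the query side and once on the field side. This is why a rare term such as "green" (docFreq=28) dominates entries 2 and 3 despite low overall coordination.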