E1543e - Collection of Disaster Related Web Contents and Use of Social Bookmarks

The original article was published in Japanese ( http://current.ndl.go.jp/e1543 )

Current Awareness-E No.256

27 March, 2014


Collection of Disaster Related Web Contents
and Use of Social Bookmarks

●Aim of This Article

On March 11, 2014, three years after the Great East Japan Earthquake, the National Diet Library (NDL) began calling for information as part of its effort to collect records related to the earthquake. NDL is asking individuals to submit photos and movies, as well as websites that should be collected. The effort is made in collaboration with archive institutions engaged in preserving disaster records. For websites in particular, NDL developed a system for recommending websites for preservation to the “Digital Archive of Japan’s 2011 Disasters” of NDL and The Edwin O. Reischauer Institute of Japanese Studies at Harvard University. This system was made possible through cooperation from Hatena Co., Ltd., which provides a social bookmark service called “Hatena Bookmark.” Using this service, users can tag websites that they wish to have preserved. This article discusses the purpose of this effort and its background.

●Why Collect Information? Why Preserve Records?

Lessons learned from the enormous damage caused by the disaster must be put to use in preparing for future disasters. We, the Japanese people, live on land where earthquakes are unavoidable. Scientific insights drawn from the experience of disasters, when preserved as records that can be examined and reused, may protect the lives of the next generation and may also help other countries that suffer similar disasters.

The 2011 catastrophe was one of the rare disasters in human history to be recorded by a wide variety of digital media. Photos and movies were taken with individuals’ cellular phones as well as car-mounted cameras. Real-time accounts were posted on the Internet, television and radio stations carried remote broadcasts, and statistics were produced by fixed-point observation equipment. These data record in detail the chaos in transportation and logistics at the time, as well as the progress from damage to recovery, for example after the breakdown of communication and power networks. In a society where digital equipment and the Internet are widespread, preserving not only publications but also primary digital data in a form that can be examined and reused is an important task for extracting scientific implications from the disaster.

●NDL Efforts and Limitation of Archives within Japan

NDL has developed legal foundations to strengthen the basis on which Japanese citizens create shared knowledge. For example, the deposit system requires every publication issued within Japan to be deposited with NDL. Also, given the increasing importance of web contents made available on the Internet, legal foundations are being put in place for collecting and preserving the online materials of public organizations, as well as online resources that carry no digital rights management (DRM) and are provided free of charge by actors other than public organizations (see E1046, E1464). However, at this moment, there is no legal foundation for NDL to collect and preserve the photos, movies, and web contents of individuals. Many disaster records remain under the management of individuals.

Long-term preservation of digital records requires comprehensive management of the physical records themselves, the information needed to reproduce their file formats, the information needed to find a given item through search, and information on the rights related to reuse. Continuous preservation across generations is thus a difficult task.

Difficulty in dealing with portrait rights and copyrights is a further obstacle for libraries and private archive institutions that are trying to preserve disaster records (see E1375). If NDL were to collect private web contents, permission would have to be obtained for every single item. Such data collection has not been successful, because verifying rights requires an immense amount of procedure. The United States, on the other hand, has been successful in collecting and preserving private contents. Examples include the collection of web contents by the Internet Archive, a dark archive of individuals’ “tweet” data donated by Twitter to the Library of Congress (see E1042, E1385), and The Vanderbilt Television News Archive of Vanderbilt University.


●Exhaustive Comprehension of Disaster Related Web Contents through Collaboration on Search and Collective Efforts to Intellectually Extract Data To Be Collected

As the best possible strategy for now, NDL has developed a portal site called “Hinagiku” (see E1413). Through cooperation on search functions with archive institutions, this site allows wide-ranging searches of information related to the disaster through a single window. Given the constraints on budget, technology, and human resources, it is difficult for a single institution to cover the massive amount of digital information. Building partnerships with existing disaster archives is thus necessary, including private web archive institutions with management know-how.

Because of their massive volume, collecting disaster records on the Internet requires appropriate selection of information. Although natural language processing and image processing technologies have advanced, current mechanical analysis is insufficient for judging, extracting, and/or categorizing contents. On the other hand, classifying contents through manual tagging by people on the Internet (called folksonomy, see CA1623) is effective in extracting information on specific topics, and systems for this have already been developed by social bookmark services. For this reason, in collecting disaster related web contents, NDL decided to seek cooperation from “Hatena Bookmark,” managed by Hatena Co., Ltd. This firm has been active in building partnerships with libraries by providing free official blog services. Data for preservation is first extracted from a list of web contents selected by the users. Among these, NDL preserves the data for which it was able to obtain permission. Those that NDL was unable to collect are to be collected and preserved by the Internet Archive, a partner of The Edwin O. Reischauer Institute of Japanese Studies at Harvard University.
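
The selection flow described above (users tag pages, tagged pages become candidates, NDL keeps those for which permission was obtained, and the remainder go to the partner archive) can be illustrated with a minimal sketch. This is not the actual NDL or Hatena implementation: the tag name, the Bookmark record, the permission flag, and the function name are all hypothetical and exist only for illustration.

```python
from dataclasses import dataclass

# Hypothetical tag; the article does not state the tag users actually apply.
PRESERVE_TAG = "disaster-archive"


@dataclass(frozen=True)
class Bookmark:
    """One user bookmark (illustrative structure, not a real API type)."""
    url: str
    tags: frozenset
    permission_granted: bool  # assumed flag: did the rights holder consent?


def route_bookmarks(bookmarks, tag=PRESERVE_TAG):
    """Split tagged URLs into (ndl_list, partner_list).

    URLs carrying the tag and having permission go to the first list;
    tagged URLs without permission go to the second. Untagged bookmarks
    and repeated URLs are skipped.
    """
    ndl, partner = [], []
    seen = set()
    for bm in bookmarks:
        if tag not in bm.tags or bm.url in seen:
            continue
        seen.add(bm.url)
        if bm.permission_granted:
            ndl.append(bm.url)
        else:
            partner.append(bm.url)
    return ndl, partner
```

In this sketch a URL is routed on its first tagged occurrence only, so several users bookmarking the same page yield one candidate. How the real system handles duplicates or permission checks is not described in the article.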

●Mutual Partnerships and Future Challenges

Crowd-sourcing, which seeks help on the Internet (see E1494), and machine-to-machine data partnerships and mutual use (API, Open Data, Open Government) are effective ways for institutions to conserve each other's resources, and are useful for public agencies in providing online services. NDL intends to expand its partnerships and cooperation with private institutions. In considering the legal challenges that currently stand in the way of developing disaster archives, we need to discuss and develop a legal system that fits the times, the actual conditions, and the objective of analyzing, extracting, and sharing memories as well as lessons learned from the disaster. Topics for discussion may include data whose long-term preservation is uncertain, such as 1) traffic logs of transportation and communication, 2) data on the transmission, distribution, and demand of electricity, and 3) data from fixed-point observation, as well as 4) steps to collect and preserve primary information on social infrastructure at the time of disasters, such as data from sensors embedded in mobile devices, and 5) the handling of photos and movies for which copyrights and portrait rights are difficult to clear.

Written by Takashi Magome,
Digital Information Planning Division, Digital Information Department,
the National Diet Library

 Translated by Okada Aya