The original article was published in Japanese (https://current.ndl.go.jp/ca2082).
Current Awareness
No.364 June 20, 2025
CA2082e
The Current Status and Problems Concerning the Acquisition of Doctoral Dissertations by the National Diet Library: Reporting in 2025
1. Introduction
To help promote the public disclosure and use of academic research results, the National Diet Library (NDL) regards dissertations submitted for doctoral degrees (hereafter “doctoral dissertations”) as an important part of its collection and works to collect and preserve them comprehensively.
On April 1, 2013, a ministerial ordinance (Ordinance of the Ministry of Education, Culture, Sports, Science and Technology (MEXT) No. 5 of 2013) partially amending the Degree Regulations came into effect (see E1418), requiring doctoral dissertations to be published via the Internet. Since then, the NDL has been collecting doctoral dissertations mainly in electronic form. This effort complements the decentralized preservation and provision of doctoral dissertations by each degree-conferring institution, with the NDL serving as a long-term, comprehensive national-level archive of doctoral dissertations.
A considerable amount of time has passed since the 2017 survey on the revision of the Degree Regulations and the status of doctoral dissertation collection (hereafter “previous survey”; see CA1900). This article revisits the current status of collection at the NDL and discusses problems related to the collection of doctoral dissertations.
2. Collection of Doctoral Dissertations at the NDL
The NDL collects doctoral dissertations by three methods: (a) automatic collection in collaboration with the Institutional Repositories DataBase (IRDB) of the National Institute of Informatics (NII) (1), (b) electronic submission from degree-conferring institutions (hereafter “electronic submission”), and (c) submission of printed copies (see CA1900 for details). In principle, each dissertation is collected by only one of these methods.
However, the same doctoral dissertation is sometimes received by two of these methods, resulting in duplicate submissions. For example, when there are “unavoidable reasons” such as the protection of personal information or a journal’s prohibition of duplicate publication, a summary may be published in place of the full text with the approval of the degree-conferring institution (Article 9, Paragraph 2 of the Degree Regulations). In such cases, the full text of the dissertation is collected via method (b). Once the reasons are resolved and the full text is published in the institutional repository, however, the identical document is also collected via method (a), automatic collection. Because there is currently no mechanism to identify and exclude duplicates, two copies of the same doctoral dissertation are stored. With the aim of eliminating such duplicates, we surveyed the number of collected dissertations by manually checking report numbers and other relevant information from each degree-conferring institution.
2.1 Status of Collection
We estimated the NDL’s collection rate for doctoral dissertations using the data on degrees awarded in fiscal year 2021 released by MEXT, taking the degrees awarded in that fiscal year as the sample. According to MEXT, 15,767 doctoral degrees were awarded in fiscal year 2021 (2). Table 1 presents the number of doctoral dissertations collected at the NDL.
Methods of Collection | Number of Collected Dissertations |
---|---|
Automatic Collection | 8,871 |
Electronic Submission | 3,821 |
Print Submission | 844 |
Total | 13,536 |
* If an identical doctoral dissertation was collected by more than one method and one of them was “automatic collection,” the dissertation is counted under “automatic collection.” If an identical doctoral dissertation was collected via both “electronic submission” and “print submission,” it is counted under “electronic submission.”
Therefore, among 15,767 doctoral degrees awarded, the NDL collected 13,536 dissertations. The collection rate is 85.9% and an estimated rate of non-collection is 14.1%.
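The counting rule in the note to Table 1 and the rate calculation above can be sketched as follows. This is a minimal illustration only: the function name and the short method labels are hypothetical, while the figures are those given in Table 1 and the MEXT statistics cited in the text.

```python
# Priority described in the note to Table 1:
# automatic collection > electronic submission > print submission.
PRIORITY = ["automatic", "electronic", "print"]


def counted_method(methods):
    """Return the single method under which a dissertation is counted."""
    for method in PRIORITY:
        if method in methods:
            return method
    raise ValueError("no known collection method")


# Figures from Table 1 and from the MEXT statistics (fiscal year 2021).
collected = {"automatic": 8871, "electronic": 3821, "print": 844}
degrees_awarded = 15767

total_collected = sum(collected.values())  # 13,536
collection_rate = round(100 * total_collected / degrees_awarded, 1)
non_collection_rate = round(100 - collection_rate, 1)

print(counted_method({"electronic", "print"}))
print(total_collected, collection_rate, non_collection_rate)
```

A dissertation received by both electronic and print submission is counted once, under electronic submission, so the three rows of Table 1 can be summed without double counting.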
2.2 Reasons for Non-collection
To investigate why doctoral dissertations were not collected, we used data for fiscal year 2021. Following the criteria used in the previous survey, we selected as samples “universities whose collection rate is below 20% and with 20 or more uncollected dissertations.” As a result, 878 doctoral dissertations from 15 universities were extracted. Please note that the following discussion refers primarily to doctoral dissertations in electronic form.
First, there were 767 doctoral dissertations from 15 universities whose full text was published neither in institutional repositories nor on university websites and for which no deposit was made via electronic submission. Table 2 presents the publication status of materials other than the full text.
Status | Number of Universities | Number of Dissertations |
---|---|---|
Only abstract, review summary, and summary published | 1 | 4 |
Only abstract and review summary published | 9 | 269 |
Only summary published | 1 | 1 |
Only an item of doctoral dissertation available (no publication of abstract, review summary, or summary) | 1 | 71 |
No item of doctoral dissertation available (nothing found in institutional repository search) | 6 | 361 |
Only abstract and review summary published on university website (nothing found in institutional repository) | 1 | 31 |
No publication on university website (nothing found in institutional repository) | 1 | 30 |
Total | 20 | 767 |
* “Summary” refers to a summary of the entire dissertation including decisions on research topic(s), methodology, experiment/analysis, conclusion/discussion. A summary is different from an abstract (of the content).
* “Item” refers to a single unit of information consisting of content files (e.g. full text files) and metadata.
* Because one university may have multiple situations, the total number of universities does not match 15.
In most cases, only abstracts were published in institutional repositories, or no item of doctoral dissertation was available. In either case, because the full text of doctoral dissertation is not publicly available, it is unclear whether this constitutes “unavoidable reasons” as defined in the Degree Regulations.
Next, there were 111 cases at five universities in which the full text of the doctoral dissertation was published in the institutional repository but could not be collected because of harvesting errors caused by discrepancies between the metadata descriptions and the data linkage rules (3). The breakdown by metadata element is shown in Table 3.
Name of Metadata Element | Number of Universities | Number of Dissertations |
---|---|---|
5 Access Rights (dcterms:accessRights) | 4 | 48 |
14 Resource Type (dc:type) | 2 | 19 |
35.1 Full text URL (jpcoar:file/jpcoar:URI) | 5 | 109 |
Total | 11 | 176 |
* Because one dissertation may correspond to multiple elements, the totals for each column do not match 111 cases in five universities.
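A check of the three elements in Table 3 might look like the sketch below. The actual data linkage rules are defined in the IRDB documentation cited in note (3); the required values used here (`open access`, `doctoral thesis`), the flat dictionary layout, and the function name are assumptions made purely for illustration.

```python
def linkage_problems(record):
    """List the Table 3 elements that would block harvesting.

    The required values below are assumed for illustration; they are not
    taken from the official data linkage rules.
    """
    problems = []
    if record.get("dcterms:accessRights") != "open access":
        problems.append("dcterms:accessRights")
    if record.get("dc:type") != "doctoral thesis":
        problems.append("dc:type")
    if not record.get("jpcoar:file/jpcoar:URI"):
        problems.append("jpcoar:file/jpcoar:URI")
    return problems


# Hypothetical record: the full text exists in the repository, but the
# access rights value is wrong and no full text URL is described.
record = {
    "dcterms:accessRights": "metadata only access",
    "dc:type": "doctoral thesis",
    "jpcoar:file/jpcoar:URI": "",
}
print(linkage_problems(record))
```

Because a single record can fail more than one check, counting problems per element yields totals larger than the number of dissertations, which is why the column totals in Table 3 exceed 111.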
At the time of the previous survey, the mainstream schema was junii2, which was widely distributed through NII’s IRDB as a standard metadata schema. JPCOAR Schema ver1.0(4) was then released in October 2017, followed by JPCOAR Schema ver2.0 in December 2022. Today, the JPCOAR Schema is the mainstream schema(5).
2.3 Linkage of Non-Target Data
Among the doctoral dissertations that the NDL collected automatically from institutional repositories, we investigated cases in which data not originally intended for collection were linked. The investigation covered cases identified during fiscal year 2023, the most recent at the time of the survey(6). Of the 16,166 items automatically collected, 6,076 were non-target data, accounting for 37.6% (note that non-target data is not stored or made public and is not included in Table 1). The main reasons for classification as non-target data are as follows.
- Because the status remained as scheduled for publication, the NDL could not download the full text file. Only metadata and/or abstract were collected.
- Because of a system renewal of the institutional repository or the deletion and re-registration of items, the identifier used as the key for updating linked metadata (OAI-PMH identifier) changed. As a result, previously collected metadata for a doctoral dissertation was identified as a new item and re-collected.
- Although the full text is not publicly available, metadata was collected because it met the conditions of junii2 or JPCOAR schema.
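The second reason above, re-collection after an identifier change, can be illustrated with a small sketch. It assumes a simplified store keyed only by the OAI-PMH identifier; the function, the record fields, and the identifiers are hypothetical and do not describe the NDL's actual system.

```python
def harvest(store, records):
    """Insert or update records keyed by OAI-PMH identifier.

    Returns the identifiers that were treated as new items.
    """
    new_keys = []
    for rec in records:
        key = rec["oai_identifier"]
        if key not in store:
            new_keys.append(key)
        store[key] = rec
    return new_keys


store = {}
first = [{"oai_identifier": "oai:repo.example.ac.jp:00001234", "title": "Thesis A"}]
harvest(store, first)

# After a hypothetical system renewal, the same thesis is exposed
# under a different identifier.
second = [{"oai_identifier": "oai:repo.example.ac.jp:2000001", "title": "Thesis A"}]
new_after_renewal = harvest(store, second)

print(len(store), new_after_renewal)
```

With the identifier as the only key, the store cannot tell that the second record describes a dissertation it already holds, so the record is collected again as a new item.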
The percentage in the previous survey was 8.7%, so it has increased significantly in this survey. This is likely due to the version upgrade of JAIRO Cloud, data linkage problems that arose with the NDL’s system renewal, and large-scale data creation and updates at each institution. In addition, the following issues may be identified during the post-collection verification process.
- The full text PDF is not authorized for printing(7).
- Data corruption and overwriting is found in the full text PDF.
- A doctoral dissertation by another author has been registered by mistake.
Upon identifying these cases, the NDL requests degree-conferring institutions to correct the metadata or dissertation file.
3. Status of Internet Publication by Degree-Conferring Institutions
The revised Degree Regulations require public disclosure of doctoral dissertations so that they can be mutually referenced. This is a mechanism for mutually ensuring the quality of doctoral dissertations as outcomes of university education and research. However, as mentioned earlier, if the degree-conferring institution approves that there are “unavoidable reasons,” the Regulations require publication of a summary of the doctoral dissertation in place of the full text (Article 9, Paragraph 2 of the Degree Regulations). Based on the results of the surveys described in “2.1 Status of Collection” and “2.2 Reasons for Non-collection,” we estimated the status of full text publication of doctoral dissertations.
Doctoral dissertations collected by the NDL for the fiscal year 2021 were categorized as follows and were classified either as “published” or “not published.”
- (1) Doctoral dissertations automatically collected by the NDL from institutional repositories →Classified as “published”
- (2) Doctoral dissertations collected by the NDL from degree-conferring institutions in electronic form
・“Partial reference (URI)” entered → Investigate the publication status of all relevant dissertations
・“Partial reference (URI)” not entered → Classified as “not published”
- (3) Doctoral dissertations collected by the NDL from degree-conferring institutions via print submission → Classified as “not published”
- (4) Doctoral dissertations not collected by the NDL → Conduct a sample survey and estimate the rate of publication directly
Regarding (2), the NDL asks degree-conferring institutions submitting doctoral dissertations that are published on the Internet to enter the URL of the web page where the dissertation file is posted in the metadata field “partial reference (URI).” Of the 3,821 doctoral dissertations collected via electronic submission, 40 included a “partial reference (URI)” entry. A review of all 40 cases confirmed that the full text was published on the Internet in every case. However, 26 of them were also collected through automatic collection, while the remaining 14 were published on university websites. Therefore, these 14 cases were classified as “published” in this category.
(4) was estimated from the results of the investigation in “2.2 Reasons for Non-collection.” Of the 878 cases investigated as “uncollected,” 767 cases (Table 2) were classified as “not published” because the full text of these doctoral dissertations was not publicly available in institutional repositories or on university websites. The remaining 111 cases, which had harvesting errors due to metadata description issues, were classified as “published” because their full text was available in institutional repositories. The rate of publication for this category is therefore estimated at 12.6% (111 of the 878 cases investigated).
The results above are summarized in Table 4. In the survey targeting the fiscal year 2021, the estimated percentage of Internet publication of full text of doctoral dissertations was approximately 58.1%.
Method of Collection | Number of Dissertations | Number Published | Rate of Publication |
---|---|---|---|
Automatic Collection | 8,871 | 8,871 | 100% |
Electronic Submission | 3,821 | 14 | 0.4% |
Print Submission | 844 | 0 | 0% |
Not Collected by the NDL | 2,231 | (281) | 12.6% |
Total | 15,767 | 9,166 | 58.1% |
* Rates of publication are rounded to one decimal place. The number in parentheses is back-calculated from the estimated rate of publication.
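The arithmetic behind Table 4 can be reproduced with the short sketch below. The variable names are illustrative; all figures come from Tables 1 and 4 and from the sample survey in “2.2 Reasons for Non-collection.”

```python
# (number of dissertations, number published) per collection method.
rows = {
    "automatic": (8871, 8871),
    "electronic": (3821, 14),
    "print": (844, 0),
}
degrees_awarded = 15767

# Dissertations not collected by the NDL.
uncollected = degrees_awarded - sum(n for n, _ in rows.values())

# Publication rate among the 878 sampled uncollected dissertations,
# rounded to one decimal place as in Table 4.
sample_rate = round(100 * 111 / 878, 1)

# Back-calculate the estimated number published among all uncollected ones.
est_published_uncollected = round(uncollected * sample_rate / 100)

total_published = sum(p for _, p in rows.values()) + est_published_uncollected
overall_rate = round(100 * total_published / degrees_awarded, 1)

print(uncollected, sample_rate, est_published_uncollected)
print(total_published, overall_rate)
```

Note that the figure for uncollected dissertations is an extrapolation from the sampled universities to all uncollected dissertations, so the overall rate is an estimate rather than a count.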
To promote digitalization of and open access to the outcomes of education and research in graduate schools, doctoral dissertations should be made public on the Internet as much as possible. However, from the results presented in Table 4, it is estimated that more than 40% of them fall under Article 9, Paragraph 2 of the Degree Regulations, so their full texts are not publicly available. In addition, at the request of degree-conferring institutions, a certain number of the doctoral dissertations collected by electronic submission(8) are subject to restrictions on viewing and copying at the NDL. Please take these circumstances into consideration when using doctoral dissertations.
4. Problems
The following are challenges related to the collection of doctoral dissertations by the NDL.
- Data linkage in automatic collection: Automatic collection of doctoral dissertations works by importing into the NDL Digital Collections the metadata that meet the conditions for automatic collection, selected from the metadata that NDL Search collects from IRDB (9). As explained in “2.3 Linkage of Non-Target Data,” several data linkage issues arise in this process. For example, even when the metadata description conforms to the data linkage rules, only the abstract is collected because the full text file is still scheduled for publication, or a previously collected dissertation is collected again as a new one because its item was deleted and re-created. We believe it is desirable to reduce the percentage of non-target data by changing the linkage conditions of NDL Search, but the details are still under consideration.
- Duplicate collection: As mentioned in “2. Collection of Doctoral Dissertations at the NDL,” identical doctoral dissertations are sometimes collected and stored more than once. Manual duplicate checking is an option, but automatic collection occurs every time metadata is updated, and large-scale updates also take place due to operational changes at each institutional repository. This results in a large number of items requiring verification, and continuing to verify all of them manually is considered difficult. We are considering identifying duplicates mechanically during the collection process, but the information required for identification is insufficient(10). We therefore also look to degree-conferring institutions to improve and enrich their metadata, including by assigning DOIs.
- Missing collections: As stated in “2.1 Status of Collection,” an estimated 14.1% of all doctoral dissertations have not been collected by the NDL. Furthermore, as shown in Table 4, the percentage of doctoral dissertations whose full text is published on the Internet remains below 60%, and those that have not been published need to be collected through submission by degree-conferring institutions. We are considering requesting deposits from universities identified in this survey as having missing collections.
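For the duplicate collection issue above, one way mechanical identification could work once DOIs are assigned is sketched here. This is an assumption-laden illustration, not the NDL's planned mechanism: the record fields, the function name, and the sample DOI are hypothetical, and DOIs are assumed to be stored as bare strings without a URL prefix.

```python
def split_by_doi(records):
    """Group records by DOI; return (duplicates, needs_review).

    DOIs are compared case-insensitively. Records without a DOI cannot be
    matched mechanically and are set aside for manual review.
    """
    by_doi = {}
    needs_review = []
    for rec in records:
        doi = (rec.get("doi") or "").strip().lower()
        if doi:
            by_doi.setdefault(doi, []).append(rec)
        else:
            needs_review.append(rec)
    duplicates = {doi: recs for doi, recs in by_doi.items() if len(recs) > 1}
    return duplicates, needs_review


# Hypothetical records: the same dissertation arrives by two methods with
# titles in different languages, so title matching alone would miss it.
records = [
    {"method": "electronic", "title": "A Study of X", "doi": "10.99999/EXAMPLE.0001"},
    {"method": "automatic", "title": "Xに関する研究", "doi": "10.99999/example.0001"},
    {"method": "print", "title": "Another Thesis", "doi": None},
]
duplicates, needs_review = split_by_doi(records)
print(sorted(duplicates), len(needs_review))
```

Keying on the DOI sidesteps the title and author name discrepancies described in note (10), but it only helps for dissertations to which a DOI has actually been assigned.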
5. Conclusion
This article introduced the status of the collection of doctoral dissertations at the NDL and discussed the issues faced.
We assume that it is a considerable burden for each degree-conferring institution to maintain and manage institutional repositories and to transmit dissertations to the NDL. However, we ask for their continuous cooperation in our activity in order to contribute to the long-term preservation as well as provision and use of academic research results.
* References are not translated and remain in Japanese.
(1) A database service that collects and provides metadata of contents registered in repositories of academic institutions within Japan.
“IRDBとは”. 学術機関リポジトリデータベースサポート.
https://support.irdb.nii.ac.jp/ja, (参照 2025-04-22).
(2) The figure 15,767 does not include degrees conferred by the National Institution for Academic Degrees and Quality Enhancement of Higher Education.
“令和3年度博士・修士・専門職学位の学位授与状況”. 文部科学省.
https://www.mext.go.jp/a_menu/koutou/daigakuin/detail/mext_0251203.html, (参照 2025-04-22).
(3) “データ連携 – 国立国会図書館”. 学術機関リポジトリデータベースサポート.
https://support.irdb.nii.ac.jp/ja/harvest/jpcoar/dataprovide_ndl, (参照 2025-04-22).
(4) 大園隼彦ほか. JPCOARスキーマの策定:日本の学術成果の円滑な国際的流通を目指して. 情報管理. 2018, 60 (10), p. 719-729.
https://doi.org/10.1241/johokanri.60.719, (参照 2025-04-22).
(5) According to the 2024 Annual General Meeting materials of the Japan Consortium for Open Access Repository (JPCOAR), approximately 95% of national, public, and private universities are using “JAIRO Cloud.” JAIRO Cloud is a cloud-based institutional repository service jointly operated by JPCOAR and the National Institute of Informatics (NII). The underlying software “WEKO” underwent a version upgrade from WEKO2 to WEKO3 between 2020 and 2023. WEKO3 adopts the JPCOAR schema as its standard metadata schema.
“2024年度JPCOAR総会”. JPCOAR.
https://jpcoar.repo.nii.ac.jp/records/2000581, (参照 2025-04-22).
(6) Because the timing for publishing doctoral dissertations varies among degree-conferring institutions, this includes doctoral dissertations awarded prior to the fiscal year 2023 (covering multiple academic years).
(7) To ensure suitability for long-term preservation and use, submissions are to be provided in a format that does not impose printing restrictions or similar limitations.
(8) Although specific data are not presented in the main text of this article, as of March 2025 the actual figure among the submitted doctoral dissertations of fiscal year 2021 degree recipients is approximately 25%.
(9) 植山廣紀. “「国内博士論文の収集」の舞台裏!~国立国会図書館インタビュー”. JPCOARウェブマガジン. 2024-10-29.
https://magazine.jpcoar.org/news/17822e4f-c842-49c6-acd2-69e44b5b4e95, (参照 2025-04-22).
(10) Because descriptions may be in Japanese or in English, titles, author names, degree-conferring institutions, and other fields can differ between automatic collection and electronic submission.
[Accepted: 2025-05-23]
Written by
Nishimura Yoshiki
Digital Library Division,
Kansai-kan of the National Diet Library
Ito Hibiki
Digital Library Division,
Kansai-kan of the National Diet Library
Shimomura Shu
Acquisitions Administration and Bibliographic Control Division,
Acquisitions and Bibliography Department, National Diet Library
* Digital Library Division, Kansai-kan of the National Diet Library at the time of writing
Translated by Okada Aya
Digital Library Division,
Kansai-kan of the National Diet Library