A new biomedical corpus covering molecular to higher-order events related to idiopathic pulmonary fibrosis (IPF) has been constructed with the aim of clarifying the pathogenetic mechanisms of IPF.

The corpus, reported in Scientific Reports, is freely available and can also be used to find information on other lung diseases.

“Text-mining systems have been developed for biomedical research, with information extraction algorithms and corpora corresponding particularly to systems biology, for which pathways and networks are often constructed,” the authors wrote. “This work particularly examines the annotation of IPF-related entities, events, and relations to facilitate the automatic extraction of IPF-related information from scientific texts.”


Currently, there is limited availability of trained annotators with IPF knowledge. The research team created the new corpus by training a text-mining system on 150 manually selected abstracts containing 9297 entities related to IPF events and pathogenetic mechanisms. The abstracts were selected from 6500 major medical journals registered in PubMed from 2013 to 2018.

Automatic and manual annotation were employed to create the corpus, and its performance was analyzed with regard to finding missing links and extracting molecules related to inflammation and fibrosis as well as to acute exacerbation and progressive respiratory failure.

Furthermore, unlike previously available corpora, this corpus can also be used to find upstream regulatory molecules of the extracted molecules, which the researchers expect will assist with the search for treatment approaches for IPF.

The authors note that although the focus of this corpus is on IPF, it can be used to find information on other lung diseases such as lung cancer and interstitial pneumonia due to COVID-19, given that certain events and entities of IPF are also related to those diseases.

Reference

Nagano N, Tokunaga N, Ikeda M, et al. A novel corpus of molecular to higher-order events that facilitates the understanding of the pathogenic mechanisms of idiopathic pulmonary fibrosis. Sci Rep. Published online April 12, 2023. doi:10.1038/s41598-023-32915