
Medical research, in crude terms, is all about generating data. The more data we have about a disease (its prevalence, its incidence, and the biological processes underpinning it), the better we get at developing therapies that save lives. Generated data can also form the foundation for future research, which in turn leads to more data. We can consider our knowledge of a disease adequate when it is sufficient to develop treatments good enough to allow patients to live a relatively normal life.

However, this is more difficult to achieve with rare diseases. By definition, rare diseases affect only a small fraction of a population, meaning that the data we have on them are also severely limited. This sometimes has the unhappy result of discouraging further research, meaning that our understanding of certain rare diseases may remain sparse.

Machine learning has been touted as a possible solution to this problem. Machine learning is a form of artificial intelligence that “learns” patterns from an initial set of data (which can range from limited to abundant) and then makes predictions based on those patterns. For example, machine learning can be used to predict the weather in a locality from historical weather data. The workings of machine learning are complex and nuanced, but the field is more advanced than many people realize and has the potential to change the world, including, of course, the world of medicine.
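To make the weather example concrete, here is a minimal sketch of one of the simplest learning methods, a nearest-neighbor classifier. The observations, the choice of features (humidity and air pressure), and the function name `predict` are all made up for illustration; real forecasting models are far more elaborate.

```python
from math import dist

# Made-up historical observations (illustrative only):
# (humidity in %, air pressure in hPa) -> the weather that followed.
HISTORY = [
    ((85.0, 1002.0), "rain"),
    ((90.0, 998.0), "rain"),
    ((80.0, 1005.0), "rain"),
    ((40.0, 1020.0), "dry"),
    ((35.0, 1025.0), "dry"),
    ((50.0, 1018.0), "dry"),
]

def predict(features, history=HISTORY, k=3):
    """Return the most common label among the k most similar past observations."""
    nearest = sorted(history, key=lambda item: dist(item[0], features))[:k]
    labels = [label for _, label in nearest]
    return max(set(labels), key=labels.count)

print(predict((88.0, 1000.0)))  # humid, low pressure
print(predict((42.0, 1022.0)))  # dry air, high pressure
```

The "learning" here is nothing more than storing past examples and comparing new conditions against them, which also shows why the amount of input data matters: with only a handful of examples, predictions for unusual conditions rest on very little evidence.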


In this article, we will look at the efforts of a team of researchers to use machine learning to predict the severity of hemophilia A (HA), a rare disease with relatively limited data. This study was published in npj Systems Biology and Applications.

The Shortfalls of Current Treatments

Hemophilia A is an X-linked disease caused by defects in the gene encoding coagulation factor VIII (FVIII), which interrupt the body’s natural coagulation pathways and cause excessive bleeding. The disease ranges from mild (ie, the patient experiences rare bleeding episodes) to severe (ie, the patient experiences permanent bleeding complications, such as joint damage).

Hemophilia A is one of the few rare diseases whose pathophysiology we firmly understand. Once the coagulation pathway had been mapped out and the culprit identified (ie, the missing coagulation factor VIII), scientists were able to develop a therapy that replaces the defective FVIII protein. This has drastically improved the quality of life and life expectancy of patients.

However, the researchers behind this study have identified specific areas of weakness in current replacement therapy regimens. First, the half-life of recombinant FVIII proteins needs to be made longer to reduce the need for frequent infusions. Second, the immunogenic profile of these proteins needs to be improved to stop neutralizing antibodies from developing. Third, recombinant proteins that are suitable for both the prophylaxis and treatment of serious bleeding episodes need to be developed.

For these improvements to be made, we need to know more about the FVIII protein structure. Existing medical literature points to a single amino acid that is related to the severity of the disease. However, “the lack of strict data curation and the analysis of each property in isolation prevented these methods from predicting and mechanistically understanding the occurrence of mild, moderate, and severe phenotypes,” the authors of the study wrote.

A Bigger Protein Picture

Researchers attempted to address this problem by using machine learning methods to analyze all of the protein properties in conjunction, and they named this framework “Hema-Class.” In essence, machine learning uses data to uncover patterns and generate predictions; thus, the more input data, the better. However, given that hemophilia A is a rare disease, existing clinical and molecular data are limited.

So, what can be done? Researchers solved this problem by “establishing a systematic data curation strategy,” and after training Hema-Class with a limited amount of data, “challenged it with prediction tasks of increasing difficulty to gradually fine-tune its parameters.” The method for doing so is highly technical, so, for the sake of brevity, we will not delve into it here.

The study team also used Hema-Class to predict the severity of all possible FVIII mutations, including those yet to be reported in the medical literature. This allows us to better understand the FVIII protein by knowing which mutations are actually detrimental to its function. In addition, researchers intentionally designed Hema-Class as an open-source system that can be retrained as newly reported hemophilia A mutations are fed into it.
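To give a sense of what "all possible mutations" means in practice, the sketch below enumerates every single amino acid substitution for a short sequence. It is an illustration only, not the authors' code: the 10-residue sequence is an arbitrary stand-in rather than the FVIII sequence, and the function name `all_single_substitutions` is hypothetical. Each variant is labeled in the common "original residue, position, new residue" style.

```python
# The 20 standard amino acids, written as one-letter codes.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

def all_single_substitutions(sequence):
    """List every single-residue substitution, e.g. 'M1A' = M at position 1 replaced by A."""
    variants = []
    for position, original in enumerate(sequence, start=1):
        for replacement in AMINO_ACIDS:
            if replacement != original:  # skip the unchanged residue
                variants.append(f"{original}{position}{replacement}")
    return variants

# Arbitrary toy sequence (an assumption for illustration, NOT the FVIII protein).
toy_sequence = "MKTAYIAKQR"
variants = all_single_substitutions(toy_sequence)

print(len(variants))  # 10 positions x 19 alternatives each = 190
print(variants[:3])   # ['M1A', 'M1C', 'M1D']
```

Because each position has 19 possible replacements, the list grows linearly with protein length, which makes scoring every variant computationally far more practical than testing each one in the laboratory.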


Researchers created a representation of the FVIII protein, which enabled them to verify substitutions to the critical residues of the protein that result in detrimental FVIII function. Once they had collected and evaluated the structural properties of the FVIII protein, they were able to use machine learning to make predictions about disease severity.

In addition, Hema-Class was used to analyze 344 alanine mutations in the A2 and C2 domains. “We verified a close agreement between the in silico and in vitro results, as evidenced by the fact that the most dramatic reductions in the chromogenic and secretion activities of the FVIII protein were accompanied by high Severity Scores,” the authors wrote.

This shows that machine learning can be used to capture FVIII properties that were previously only observable using in vitro assays. Importantly, the framework can be used by researchers to predict the severity of all possible mutations of the FVIII protein.

A Bright Future 

The use of machine learning to analyze medical data is truly exciting; it opens the door for us to uncover latent secrets in other rare diseases as well.

Remember what this is all about in the case of hemophilia A: developing better drugs that have a longer half-life and improved immunogenic profiles and are more effective as both prophylaxis and treatment in severe bleeding episodes. In other words, it is about saving lives.


Lopes TJS, Rios R, Nogueira T, et al. Prediction of hemophilia A severity using a small-input machine-learning framework. npj Syst Biol Appl. Published online May 25, 2021. doi:10.1038/s41540-021-00183-9

Peters R, Harris T. Advances and innovations in haemophilia treatment. Nat Rev Drug Discov. 2018;17(7):493-508. doi:10.1038/nrd.2018.70