Herin Setianingsih; Nasywa Zahra Sajida Tsuroyya; Hilyatuz Zahroh; Eka Diyah Putri Lestari; Didik Huswo Utomo; Muhammad Rezki Rayak
Abstract
Protein target identification is a crucial part of drug discovery. This study used a machine learning approach to screen potential targets for marine natural products. A total of 6,314 compounds from 11 marine taxa were collected from the Comprehensive Marine Natural Products Database (CMNPD) as drug repurposing candidates for COVID-19. Well-characterized SARS-CoV-2 proteins, including Spike, PLpro, Mpro, Nucleocapsid, ORF9b, ORF3a, and ORF8, were designated as protein targets. We used three supervised classification algorithms, namely logistic regression (LR), support vector machine (SVM), and random forest (RF), as implemented in scikit-learn. We also built a deep learning model with h2o.ai to predict active compounds. Finally, reverse docking was used to obtain more reliable results. The results revealed that compounds from bryozoans, sponges, and bacteria had the best binding affinity scores for the Spike protein. Among the machine learning models, LR performed best. Compounds predicted to be active by both the machine learning and the deep learning screens gave more consistent results and showed more stable binding interactions than compounds predicted to be active by only one of the screening methods.
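The consensus step described at the end of the abstract, keeping only compounds flagged by both screens, can be illustrated with a minimal sketch. It is not the authors' pipeline: the compound IDs, the scores, the 0.5 threshold, and the helper names `predicted_active` and `consensus_hits` are all assumptions made up for illustration, and ranking by mean score is one possible choice among several.

```python
# Hypothetical activity probabilities for the same compounds from two screens.
# IDs and scores are invented for illustration; they are not data from the study.
ml_scores = {"cmpd_A": 0.91, "cmpd_B": 0.40, "cmpd_C": 0.77, "cmpd_D": 0.12}
dl_scores = {"cmpd_A": 0.85, "cmpd_B": 0.72, "cmpd_C": 0.66, "cmpd_D": 0.08}


def predicted_active(scores, threshold=0.5):
    """Return the set of compound IDs whose score meets the threshold."""
    return {cid for cid, p in scores.items() if p >= threshold}


def consensus_hits(ml, dl, threshold=0.5):
    """Return compounds predicted active by both screens, best mean score first."""
    both = predicted_active(ml, threshold) & predicted_active(dl, threshold)
    return sorted(both, key=lambda cid: (ml[cid] + dl[cid]) / 2, reverse=True)


hits = consensus_hits(ml_scores, dl_scores)
print(hits)
```

In this toy input, `cmpd_B` is flagged by only one screen and is therefore dropped; in a workflow like the one described, only the consensus compounds would be carried forward to docking.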