Jackson Shuey
Archived · Feb 2026 · Mind

Drug Discovery Pipeline

A reproducible QSAR pipeline for the GLP-1 receptor, with every reported finding tied to a PubMed-verified paper.

Stack: Python · RDKit · scikit-learn · ChEMBL · PubMed E-utils
Links: GitHub

What it predicts

Given a SMILES string for a candidate compound, the pipeline predicts pIC50 (the negative base-10 logarithm of the IC50 in mol/L, so higher means more potent), used here as a measure of how strongly that molecule binds to the glucagon-like peptide 1 receptor (CHEMBL1784), the target that gives Ozempic and Wegovy (both semaglutide) their effect. The goal isn't to discover a drug; it's to build a reproducible end-to-end loop from public bioactivity data to a validated prediction that an interpretable model could justify.
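As a quick illustration of the target variable, here is a minimal sketch of the IC50-to-pIC50 conversion. It assumes IC50 values arrive in nanomolar; the function names are hypothetical and not taken from the project's code.

```python
import math


def ic50_nm_to_pic50(ic50_nm: float) -> float:
    """Convert an IC50 in nanomolar to pIC50 = -log10(IC50 in mol/L)."""
    if ic50_nm <= 0:
        raise ValueError("IC50 must be positive")
    # 1 nM = 1e-9 mol/L, so -log10(ic50_nm * 1e-9) = 9 - log10(ic50_nm)
    return 9.0 - math.log10(ic50_nm)


def pic50_to_ic50_nm(pic50: float) -> float:
    """Inverse conversion: pIC50 back to an IC50 in nanomolar."""
    return 10.0 ** (9.0 - pic50)


# Illustrative concentrations only, not values from the dataset.
for ic50_nm in (1.0, 100.0, 10_000.0):
    print(f"{ic50_nm:>8.1f} nM -> pIC50 {ic50_nm_to_pic50(ic50_nm):.2f}")
```

Each tenfold drop in IC50 raises pIC50 by one unit, which is why the log scale is the usual regression target.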

Methodology

The pipeline pulls 342 GLP-1R IC50 measurements from ChEMBL, classifies them by activity, converts them to pIC50, and describes each compound with 30 RDKit molecular descriptors: molecular weight, LogP, hydrogen-bond donor and acceptor counts, and topological and atom-count features. An 80/20 train/test split feeds a comparison of 42 regression models via LazyPredict; the best model reaches a test R² of 0.80 with an RMSE of 0.73 pIC50 units, with NumO (oxygen atom count) as the top descriptor.
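To make the evaluation step concrete, the sketch below shows an 80/20 split and the two reported metrics in plain Python. It is not the project's code: the helper names, the seed, and the choice to round the test set up are assumptions, and the metric inputs are toy numbers rather than pipeline results.

```python
import math
import random


def split_80_20(rows, seed=42):
    """Shuffle a copy of rows and return (train, test) with roughly 80/20 sizes."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)
    n_test = math.ceil(0.2 * len(shuffled))  # assumption: round the test set up
    return shuffled[n_test:], shuffled[:n_test]


def rmse(y_true, y_pred):
    """Root mean squared error, in the same units as the target (pIC50)."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))


def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


# 342 placeholder row indices standing in for the ChEMBL measurements.
train, test = split_80_20(range(342))
print(len(train), len(test))  # 273 69

# Toy pIC50 values, only to exercise the metric functions.
y_true = [5.0, 6.0, 7.0, 8.0]
y_pred = [5.5, 6.0, 6.5, 8.0]
print(round(r_squared(y_true, y_pred), 3), round(rmse(y_true, y_pred), 3))  # 0.9 0.354
```

Because the target is pIC50, an RMSE of 0.73 corresponds to a typical error of a bit under one log unit of IC50.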

Literature validation

Every finding is cross-referenced against the published GLP-1 / QSAR literature. A retrieval-augmented generation (RAG) step over PubMed searches for relevant studies, extracts their reported metrics, and aligns this pipeline's methodology with the best-matching paper. Each row in the final findings table carries a PMID, and a credibility report flags where this work agrees with the field and where it diverges.
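The retrieval half of that step could look roughly like the sketch below, which builds an NCBI E-utilities ESearch request for PubMed and pulls PMIDs out of a JSON response. The query string and function names are hypothetical; the page does not say how the project phrases its searches, and no network call is made here.

```python
import json
from typing import List
from urllib.parse import urlencode

ESEARCH_URL = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"


def build_esearch_url(query: str, max_results: int = 20) -> str:
    """Build a PubMed ESearch URL that asks for a JSON list of matching PMIDs."""
    params = {"db": "pubmed", "term": query, "retmode": "json", "retmax": max_results}
    return f"{ESEARCH_URL}?{urlencode(params)}"


def extract_pmids(esearch_json: str) -> List[str]:
    """Return the PMID list from an ESearch JSON response body, or [] if absent."""
    payload = json.loads(esearch_json)
    return payload.get("esearchresult", {}).get("idlist", [])


# Hypothetical query; the real pipeline may phrase its searches differently.
url = build_esearch_url("GLP-1 receptor AND QSAR")
print(url)
```

Fetching that URL (for example with `urllib.request.urlopen`) and passing the response body to `extract_pmids` would yield the candidate PMIDs to attach to each finding.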