About NPClassifier AI
An advanced deep-learning tool for hierarchical classification of natural products, developed at ICAR-IASRI, New Delhi.
Model Architecture
The system employs ChemBERTa, a transformer-based language model pre-trained on SMILES representations of millions of molecules. A custom hierarchical multi-head classification layer is appended for three-level natural product taxonomy prediction.
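To illustrate the multi-head idea, here is a minimal sketch in plain Python. It assumes the encoder has already produced one pooled embedding per molecule and that each taxonomy level has its own linear layer followed by a softmax. The embedding size, label counts, random weights, and the names `make_head`, `apply_head`, and `predict_levels` are all hypothetical and are not taken from the NPClassifier AI code.

```python
import math
import random

def softmax(logits):
    """Turn raw scores into probabilities that sum to 1."""
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def make_head(rng, in_dim, n_labels):
    """Create a hypothetical linear head: one weight row per label, plus biases."""
    weights = [[rng.uniform(-0.1, 0.1) for _ in range(in_dim)] for _ in range(n_labels)]
    biases = [0.0] * n_labels
    return weights, biases

def apply_head(head, embedding):
    """Apply one linear head to the shared embedding and return label probabilities."""
    weights, biases = head
    logits = [sum(w * x for w, x in zip(row, embedding)) + b
              for row, b in zip(weights, biases)]
    return softmax(logits)

def predict_levels(heads, embedding):
    """Run every head on the same embedding; report the top label index and its probability."""
    result = {}
    for level, head in heads.items():
        probs = apply_head(head, embedding)
        best = max(range(len(probs)), key=probs.__getitem__)
        result[level] = {"label_index": best, "confidence": probs[best]}
    return result

# Assumed sizes for illustration only; the real model's dimensions are not stated here.
EMBED_DIM = 8
LABEL_COUNTS = {"pathway": 3, "superclass": 5, "class": 7}

rng = random.Random(0)
heads = {level: make_head(rng, EMBED_DIM, n) for level, n in LABEL_COUNTS.items()}
embedding = [rng.uniform(-1, 1) for _ in range(EMBED_DIM)]  # stand-in for the encoder output
prediction = predict_levels(heads, embedding)
```

Sharing one embedding across the three heads is what makes the layer "multi-head": the encoder runs once per molecule, and each level adds only a small output layer. In this sketch the reported confidence is simply the largest softmax probability, which is one common convention and may differ from what the tool reports.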
Hierarchical Taxonomy
Predictions span three levels of biological–chemical ontology:
- Pathway — Broad biosynthetic origin (e.g., Terpenoids, Polyketides)
- Superclass — Structural family (e.g., Monoterpenoids, Flavonoids)
- Class — Specific structural subgroup (e.g., Iridoids, Isoflavones)
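The three levels nest, so a prediction can be checked for internal consistency. The sketch below stores a tiny illustrative slice of such a hierarchy as a nested dictionary. Only the example labels named above are used; placing Flavonoids under a "Shikimates and Phenylpropanoids" pathway is an assumption made for this example, and `TAXONOMY` and `is_consistent` are hypothetical names, not part of the tool.

```python
# Illustrative slice only: pathway -> superclass -> list of classes.
TAXONOMY = {
    "Terpenoids": {
        "Monoterpenoids": ["Iridoids"],
    },
    # Assumed pathway for Flavonoids; not stated in this document.
    "Shikimates and Phenylpropanoids": {
        "Flavonoids": ["Isoflavones"],
    },
}

def is_consistent(pathway, superclass, cls):
    """Return True if the class sits under the superclass, and the superclass under the pathway."""
    return cls in TAXONOMY.get(pathway, {}).get(superclass, [])

ok = is_consistent("Terpenoids", "Monoterpenoids", "Iridoids")
mismatch = is_consistent("Terpenoids", "Flavonoids", "Isoflavones")
```

Here `ok` is `True` and `mismatch` is `False`, because Flavonoids is not listed under Terpenoids in this toy table.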
Training Data
The model was trained on a curated dataset of natural products with verified taxonomic annotations, selected to cover a broad range of known natural product chemical space.
Performance
The model was evaluated using stratified cross-validation, with accuracy, macro-F1, and weighted-F1 reported at each of the three classification levels. A confidence score accompanies every prediction.
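Macro-F1 and weighted-F1 differ only in how per-class F1 scores are averaged, which matters when classes are imbalanced. The following sketch computes both on a made-up set of four pathway labels. The data and the function names `per_class_f1` and `summarize` are invented for illustration and are not the tool's evaluation code or results.

```python
from collections import Counter

def per_class_f1(y_true, y_pred):
    """Compute F1 = 2*TP / (2*TP + FP + FN) separately for each label."""
    labels = sorted(set(y_true) | set(y_pred))
    scores = {}
    for label in labels:
        tp = sum(t == label and p == label for t, p in zip(y_true, y_pred))
        fp = sum(t != label and p == label for t, p in zip(y_true, y_pred))
        fn = sum(t == label and p != label for t, p in zip(y_true, y_pred))
        denom = 2 * tp + fp + fn
        scores[label] = 2 * tp / denom if denom else 0.0
    return scores

def summarize(y_true, y_pred):
    """Return accuracy, macro-F1 (plain mean), and weighted-F1 (mean weighted by true-label count)."""
    f1 = per_class_f1(y_true, y_pred)
    support = Counter(y_true)
    n = len(y_true)
    return {
        "accuracy": sum(t == p for t, p in zip(y_true, y_pred)) / n,
        "macro_f1": sum(f1.values()) / len(f1),
        "weighted_f1": sum(f1[label] * support[label] for label in f1) / n,
    }

# Toy data: three Terpenoids and one Polyketides, with one Terpenoids mislabeled.
y_true = ["Terpenoids", "Terpenoids", "Terpenoids", "Polyketides"]
y_pred = ["Terpenoids", "Terpenoids", "Polyketides", "Polyketides"]
metrics = summarize(y_true, y_pred)
# accuracy = 0.75, macro_f1 ≈ 0.7333, weighted_f1 ≈ 0.7667
```

The per-class scores are 0.8 for Terpenoids and about 0.667 for Polyketides. Macro-F1 treats the two classes equally, while weighted-F1 leans toward the more frequent Terpenoids class, so it comes out higher.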
Technology Stack
API Reference
- /api/predict: Single or multi-SMILES prediction
- /api/predict/batch: Batch prediction
- /api/health: Health check
- /api/model-info: Model metadata
- /api/classes: List all class labels
Example Request
curl -X POST http://snehasis19-cabin-npcbert.hf.space/api/predict \
-H "Content-Type: application/json" \
-d '{"smiles": "CC(=O)Oc1ccccc1C(=O)O"}'
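The same request can be built from Python with only the standard library. This is a sketch that mirrors the curl call above; the helper names `build_predict_request` and `predict` and the 30-second timeout are hypothetical choices. The response schema is not documented here, so the sketch returns the parsed JSON without assuming any field names.

```python
import json
import urllib.request

BASE_URL = "http://snehasis19-cabin-npcbert.hf.space"  # same host as the curl example

def build_predict_request(smiles, base_url=BASE_URL):
    """Build a POST request to /api/predict with a JSON body of the form {"smiles": ...}."""
    payload = json.dumps({"smiles": smiles}).encode("utf-8")
    return urllib.request.Request(
        url=f"{base_url}/api/predict",
        data=payload,
        headers={"Content-Type": "application/json"},
        method="POST",
    )

def predict(smiles, timeout=30):
    """Send the request and return the decoded JSON response (structure not assumed)."""
    request = build_predict_request(smiles)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))

# Building the request does not touch the network; call predict(...) to actually send it.
request = build_predict_request("CC(=O)Oc1ccccc1C(=O)O")
```

Separating request construction from sending makes the payload easy to inspect or log before any network traffic occurs.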