About NPClassifier AI

A deep-learning tool for hierarchical classification of natural products from their SMILES structures, developed at ICAR-IASRI, New Delhi.

Model Architecture

The system uses ChemBERTa, a transformer-based language model pre-trained on SMILES representations of millions of molecules, as its encoder. A custom hierarchical multi-head classification layer on top of the encoder predicts the natural-product taxonomy at all three levels.
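The "shared encoder, several heads" idea can be illustrated with a minimal pure-Python sketch. It assumes that each taxonomy level gets its own linear head over one pooled embedding vector, and that a prediction's confidence is the top softmax probability. This page confirms neither detail. The label lists, the toy `HIDDEN_SIZE`, the random weights, and the class names `LinearHead` and `MultiHeadClassifier` are all hypothetical. In the real system the embedding would come from ChemBERTa and the heads would be trained PyTorch layers.

```python
import math
import random

# Hypothetical label sets taken from the examples on this page;
# the real model has many more labels per level.
LEVELS = {
    "pathway": ["Terpenoids", "Polyketides"],
    "superclass": ["Monoterpenoids", "Flavonoids"],
    "class": ["Iridoids", "Isoflavones"],
}

HIDDEN_SIZE = 8  # toy size; a real ChemBERTa embedding is much wider


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


class LinearHead:
    """One classification head: logits = W @ h + b (untrained random weights)."""

    def __init__(self, in_dim, labels, rng):
        self.labels = labels
        self.weights = [[rng.gauss(0.0, 0.1) for _ in range(in_dim)] for _ in labels]
        self.bias = [0.0 for _ in labels]

    def __call__(self, h):
        return [sum(w * x for w, x in zip(row, h)) + b
                for row, b in zip(self.weights, self.bias)]


class MultiHeadClassifier:
    """One shared embedding feeds one independent head per taxonomy level."""

    def __init__(self, in_dim, levels, seed=0):
        rng = random.Random(seed)
        self.heads = {level: LinearHead(in_dim, labels, rng)
                      for level, labels in levels.items()}

    def predict(self, embedding):
        result = {}
        for level, head in self.heads.items():
            probs = softmax(head(embedding))
            best = max(range(len(probs)), key=probs.__getitem__)
            # Assumption: the confidence score is the top softmax probability.
            result[level] = (head.labels[best], probs[best])
        return result


if __name__ == "__main__":
    # Stand-in for the pooled ChemBERTa embedding of one SMILES string.
    embedding = [0.5] * HIDDEN_SIZE
    model = MultiHeadClassifier(HIDDEN_SIZE, LEVELS)
    for level, (label, conf) in model.predict(embedding).items():
        print(f"{level:>10}: {label} ({conf:.2f})")
```

Fully independent heads can disagree, for example by predicting a Polyketides pathway together with an Iridoids class. A hierarchical layer may add cross-level conditioning or a consistency step to prevent this; the sketch leaves that out.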

Hierarchical Taxonomy

Predictions span three levels of biological–chemical ontology:

  • Pathway — Broad biosynthetic origin (e.g., Terpenoids, Polyketides)
  • Superclass — Structural family (e.g., Monoterpenoids, Flavonoids)
  • Class — Specific structural subgroup (e.g., Iridoids, Isoflavones)
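Because the levels are nested, a predicted (Pathway, Superclass, Class) triple can be checked for consistency. The sketch below assumes that each Class has a single parent Superclass and each Superclass a single parent Pathway. It stores the hierarchy as child-to-parent maps built only from the examples above. The map names and the `is_consistent` and `lineage` helpers are hypothetical. The pathway for Flavonoids is deliberately left out because this page does not state it.

```python
# Hypothetical child-to-parent maps built only from the examples above;
# the real taxonomy is much larger and is not reproduced here.
SUPERCLASS_TO_PATHWAY = {
    "Monoterpenoids": "Terpenoids",
    # Flavonoids omitted: its pathway is not given on this page.
}
CLASS_TO_SUPERCLASS = {
    "Iridoids": "Monoterpenoids",
    "Isoflavones": "Flavonoids",
}


def is_consistent(pathway, superclass, cls):
    """Return True if the three labels form a valid parent chain.

    Labels missing from the maps count as inconsistent, a conservative
    choice for this sketch.
    """
    return (CLASS_TO_SUPERCLASS.get(cls) == superclass
            and SUPERCLASS_TO_PATHWAY.get(superclass) == pathway)


def lineage(cls):
    """Walk upward from a class label; unknown parents come back as None."""
    superclass = CLASS_TO_SUPERCLASS.get(cls)
    pathway = SUPERCLASS_TO_PATHWAY.get(superclass)
    return pathway, superclass, cls


print(is_consistent("Terpenoids", "Monoterpenoids", "Iridoids"))   # True
print(is_consistent("Polyketides", "Monoterpenoids", "Iridoids"))  # False
print(lineage("Iridoids"))  # ('Terpenoids', 'Monoterpenoids', 'Iridoids')
```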

Training Data

The model was trained on a curated dataset of natural products with verified taxonomic annotations, selected to cover a broad range of known natural-product chemical space.

Performance

The model was evaluated with stratified cross-validation, reporting accuracy, macro-F1, and weighted-F1 at each of the three classification levels. Every prediction is returned with a confidence score.
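The three metrics answer different questions. Accuracy is the fraction of correct predictions. Macro-F1 averages per-class F1 so that every class counts equally, which exposes weak performance on rare classes. Weighted-F1 averages per-class F1 weighted by how often each class occurs. The standard-library sketch below computes all three for one level and adds a simple round-robin stratified fold assignment. It is an illustration, not the project's evaluation code. The function names are hypothetical, and this page does not state how many folds the authors used.

```python
import random
from collections import Counter, defaultdict


def f1_per_class(y_true, y_pred):
    """F1 = 2TP / (2TP + FP + FN) for every label seen in y_true or y_pred."""
    scores = {}
    for label in sorted(set(y_true) | set(y_pred)):
        pairs = list(zip(y_true, y_pred))
        tp = sum(t == label and p == label for t, p in pairs)
        fp = sum(t != label and p == label for t, p in pairs)
        fn = sum(t == label and p != label for t, p in pairs)
        denom = 2 * tp + fp + fn
        scores[label] = 2 * tp / denom if denom else 0.0
    return scores


def evaluate(y_true, y_pred):
    """Accuracy, macro-F1 and weighted-F1 for one taxonomy level."""
    accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
    f1 = f1_per_class(y_true, y_pred)
    support = Counter(y_true)  # true count of each class
    macro_f1 = sum(f1.values()) / len(f1)
    weighted_f1 = sum(f1[label] * n for label, n in support.items()) / len(y_true)
    return {"accuracy": accuracy, "macro_f1": macro_f1, "weighted_f1": weighted_f1}


def stratified_folds(labels, k, seed=0):
    """Assign sample indices to k folds so each class is spread evenly.

    Indices of each class are shuffled, then dealt round-robin; the offset
    carries over between classes to keep fold sizes balanced.
    """
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)
    folds = [[] for _ in range(k)]
    offset = 0
    for label in sorted(by_class):
        indices = by_class[label]
        rng.shuffle(indices)
        for i, idx in enumerate(indices):
            folds[(offset + i) % k].append(idx)
        offset += len(indices)
    return folds


if __name__ == "__main__":
    y_true = ["Terpenoids", "Terpenoids", "Terpenoids", "Polyketides"]
    y_pred = ["Terpenoids", "Terpenoids", "Polyketides", "Polyketides"]
    print(evaluate(y_true, y_pred))
```

Running `evaluate` separately on the pathway, superclass, and class labels of each held-out fold gives per-level figures of the kind described above. In the demo, macro-F1 (about 0.733) falls below weighted-F1 (about 0.767) because the rarer Polyketides class has the lower F1.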

Technology Stack

  • Python
  • PyTorch
  • HuggingFace
  • Flask
  • RDKit
  • JavaScript

API Reference

Method  Endpoint              Description
POST    /api/predict          Single- or multi-SMILES prediction
POST    /api/predict/batch    Batch prediction
GET     /api/health           Health check
GET     /api/model-info       Model metadata
GET     /api/classes          List all class labels

Example Request

curl -X POST https://snehasis19-cabin-npcbert.hf.space/api/predict \
  -H "Content-Type: application/json" \
  -d '{"smiles": "CC(=O)Oc1ccccc1C(=O)O"}'
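The same request can be sent from Python using only the standard library. This sketch builds the POST request exactly as the curl command does and adds a GET helper for the read-only endpoints. It assumes that the Space is reachable over HTTPS and that responses are JSON. The response fields, and the request bodies for multi-SMILES and batch prediction, are not documented on this page, so they are not shown. The helper names are hypothetical.

```python
import json
import urllib.request

# Assumption: the Space is served over HTTPS.
BASE_URL = "https://snehasis19-cabin-npcbert.hf.space"


def build_predict_request(smiles, base_url=BASE_URL):
    """Build the same POST /api/predict request as the curl example."""
    body = json.dumps({"smiles": smiles}).encode("utf-8")
    return urllib.request.Request(
        f"{base_url}/api/predict",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def build_get_request(path, base_url=BASE_URL):
    """Build a GET request, e.g. for /api/health, /api/model-info or /api/classes."""
    return urllib.request.Request(f"{base_url}{path}", method="GET")


def send(request, timeout=30):
    """Send a request and parse the body as JSON (assumed response format).

    The response schema is not documented here, so the parsed JSON is
    returned unchanged.
    """
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))


# Usage (needs network access):
#   print(send(build_get_request("/api/health")))
#   print(send(build_predict_request("CC(=O)Oc1ccccc1C(=O)O")))
```

Keeping request construction separate from `send` makes the requests easy to inspect or test without touching the network.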