HELI-Cas

High-dimensional Ensemble Learning for Identification of Cas proteins

A binary machine-learning classifier that predicts whether a single protein sequence is a CRISPR-associated (Cas) protein, using a multi-modal feature pipeline and a 5-fold stacked ensemble.

Binary: Cas vs Non-Cas
5-fold stacked ensemble
PSSM · HMM · physicochemical features
Export CSV · JSON · PDF
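
As a rough illustration of how a 5-fold ensemble can yield a binary call, the sketch below combines one probability per fold model into a Cas / Non-Cas label with a confidence value. The simple averaging rule, the 0.5 threshold, the confidence definition, and the function name `combine_fold_probabilities` are all assumptions made for this example; the page does not state how HELI-Cas combines its fold models.

```python
from statistics import mean


def combine_fold_probabilities(fold_probs, threshold=0.5):
    """Average per-fold P(Cas) values into a label plus confidence.

    Assumptions (not taken from HELI-Cas itself): plain averaging,
    a 0.5 decision threshold, and confidence defined as the averaged
    probability of whichever class is chosen.
    """
    if not fold_probs:
        raise ValueError("need at least one fold probability")
    if any(not 0.0 <= p <= 1.0 for p in fold_probs):
        raise ValueError("probabilities must lie in [0, 1]")

    p_cas = mean(fold_probs)
    label = "Cas" if p_cas >= threshold else "Non-Cas"
    confidence = p_cas if label == "Cas" else 1.0 - p_cas
    return label, confidence


# Hypothetical outputs of five fold models for a single sequence.
label, confidence = combine_fold_probabilities([0.91, 0.88, 0.95, 0.90, 0.86])
print(label, round(confidence, 2))
```

A true stacked ensemble would normally feed the base-model outputs into a trained meta-learner rather than averaging them; averaging is used here only to keep the example dependency-free.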

Tool Workflow

1. Input FASTA: 50–2000 aa
2. Feature pipeline: PSSM · HMM · physico-chem · disorder
3. 5-fold ensemble: CatBoost + LightGBM
4. Prediction: Cas / Non-Cas + confidence

From one FASTA to a Cas / Non-Cas call.
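
The Input FASTA stage of the workflow can be pre-checked locally before submitting a sequence. The following is a minimal sketch that assumes the 50–2000 aa range is inclusive and that only the 20 standard amino-acid letters are accepted; neither detail is stated on this page, and the helper names `read_single_fasta` and `validate_sequence` are hypothetical.

```python
MIN_LEN, MAX_LEN = 50, 2000  # length range from the workflow (assumed inclusive)
# Assumption: only the 20 standard amino acids are accepted.
STANDARD_AA = set("ACDEFGHIKLMNPQRSTVWY")


def read_single_fasta(text):
    """Return (header, sequence) for a FASTA string holding exactly one record."""
    lines = [ln.strip() for ln in text.strip().splitlines() if ln.strip()]
    if not lines or not lines[0].startswith(">"):
        raise ValueError("FASTA input must start with a '>' header line")
    if any(ln.startswith(">") for ln in lines[1:]):
        raise ValueError("expected a single sequence, found several records")
    header = lines[0][1:].strip()
    sequence = "".join(lines[1:]).upper()
    return header, sequence


def validate_sequence(sequence):
    """Raise ValueError if the sequence is out of range or has unknown letters."""
    if not MIN_LEN <= len(sequence) <= MAX_LEN:
        raise ValueError(
            f"sequence length {len(sequence)} is outside {MIN_LEN}-{MAX_LEN} aa"
        )
    unknown = set(sequence) - STANDARD_AA
    if unknown:
        raise ValueError(f"non-standard residues: {''.join(sorted(unknown))}")
    return sequence


# Usage with a made-up 120-residue sequence.
header, seq = read_single_fasta(">example_protein\n" + "MKV" * 40)
validate_sequence(seq)
print(header, len(seq))
```

Rejecting multi-record input up front mirrors the single-sequence design described above, so a length or alphabet problem surfaces before any feature extraction is attempted.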