ScamShield: Interpretable Multi-Signal Scam Detection
Authors: Vishwajeet Adkine
Affiliation: Gokulnath High School
Publication date: 2026-05-17
Journal/archive name: NSRI Student Research Journal
Volume: 1 Issue: 1 Pages/article: Pending
DOI: Pending DOI assignment
Abstract
Scam and phishing detection systems typically rely on either rigid heuristic rules or opaque large language models. This paper presents ScamShield, a hybrid, interpretable pipeline that combines a 24-feature supervised machine learning model, an LLM semantic safety layer, and rule-based heuristics in a unified ensemble. ScamShield is evaluated on a synthetic benchmark of 19,992 messages spanning 17 scam categories and validated on the UCI SMS Spam Collection (5,574 real-world messages). It achieves cross-validated F1 = 0.9969 ± 0.0004 on the synthetic benchmark and F1 = 0.9303 ± 0.0098 on the real-world data. These results are competitive with a fine-tuned DistilBERT (estimated F1 ≈ 0.97–0.99), while ScamShield requires 125× less storage, runs at sub-5 ms inference latency, and provides full per-prediction interpretability via coefficient attribution. McNemar's test against all four baselines confirms statistical significance (p < 0.001). Adversarial evaluation reveals recall drops of 71–82% under obfuscation attacks, identifying semantic embeddings and character-level robustness as the primary directions for improvement.
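To illustrate what per-prediction coefficient attribution can look like, the following is a minimal sketch that assumes the supervised component is a linear model with a logistic output. The abstract does not state this. The feature names, weights, and bias are hypothetical placeholders and are not taken from ScamShield's 24 features. Each feature's contribution is its coefficient multiplied by its value, so the contributions plus the bias sum exactly to the model's logit.

```python
import math

def attribute(weights, bias, features):
    """Return (scam probability, contributions ranked by absolute size).

    Assumes a linear-logit model: logit = bias + sum(w_i * x_i).
    """
    # Per-feature contribution to the logit: coefficient times feature value.
    contributions = {name: weights[name] * value for name, value in features.items()}
    logit = bias + sum(contributions.values())
    probability = 1.0 / (1.0 + math.exp(-logit))
    # Largest-magnitude contributions first, so the main drivers are listed on top.
    ranked = sorted(contributions.items(), key=lambda kv: abs(kv[1]), reverse=True)
    return probability, ranked

# Hypothetical coefficients and features, for illustration only.
WEIGHTS = {"has_url": 1.5, "urgency_words": 2.0, "msg_length_norm": -0.5}
BIAS = -1.0
message_features = {"has_url": 1.0, "urgency_words": 1.0, "msg_length_norm": 0.4}

prob, ranked = attribute(WEIGHTS, BIAS, message_features)
print(f"P(scam) = {prob:.3f}")
for name, contribution in ranked:
    print(f"{name:>16}: {contribution:+.2f}")
```

Because the explanation is read directly from the coefficients, it needs no separate explainer model, which fits the low storage and latency figures reported above.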
Keywords
Applied Science - Computer Science
Citation
References
Reference metadata is pending and must be finalized before DOI deposit.