CNN-LSTM anomaly detection model
A hybrid convolutional–recurrent architecture designed for multimodal time-series classification on resource-constrained edge devices. Three 1D convolutional blocks extract local temporal features; stacked LSTM layers capture long-range sequential patterns; a dense head produces five-class anomaly probabilities every 200 ms.
// architecture diagram — input tensor (6 channels) → 3× Conv1D blocks (kernel=3, ReLU, pooling, dropout=0.2) → stacked LSTM → dense softmax head, per-class probabilities over C0–C4
// per-class detection thresholds (τ)
// FP32 vs INT8 accuracy cost
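The layer stack above can be traced as a shape walk-through. A minimal sketch: the 250-sample window length (5 s at an assumed 50 Hz sampling rate) and the filter counts (32, 64, 128) are illustrative assumptions — the source specifies only kernel=3, pooling, dropout=0.2, 6 input channels, and 5 output classes.

```python
# Shape walk-through of the Conv -> Pool -> LSTM -> Dense stack.
# Window length and filter counts are assumptions, not the deployed values.

def conv1d_out(length, kernel=3, padding=0, stride=1):
    """Output length of a 1D convolution (valid padding by default)."""
    return (length + 2 * padding - kernel) // stride + 1

def pool_out(length, pool=2):
    """Output length of non-overlapping pooling."""
    return length // pool

def trace_shapes(window_len=250, channels=6, filters=(32, 64, 128)):
    shapes = [(window_len, channels)]
    length = window_len
    for f in filters:                  # three conv blocks: kernel=3 conv + pool
        length = pool_out(conv1d_out(length, kernel=3))
        shapes.append((length, f))
    shapes.append((5,))                # dense softmax head over C0-C4
    return shapes

print(trace_shapes())
```

The remaining sequence length after the third block is what the stacked LSTM layers consume as time steps.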
10,847-window multimodal dataset
Two publicly licensed benchmark datasets supplemented by school-specific synthetic data. MIT-BIH Arrhythmia Database provides physiological signals; SHAR-100-20 provides IMU activity data; AMASS-generated synthetic samples fill classroom-specific gaps absent from both benchmarks.
| Class | Label | Source | Pre-aug (n) | Post-aug (n) | Train | Val | Test* |
|---|---|---|---|---|---|---|---|
| C0 Normal | Normal activity | MIT-BIH NSR + SHAR | 5,640 | 5,640 | 3,948 | 846 | 846 |
| C1 Fall/Assault | Fall · vigorous play | SHAR | 1,519 | 1,519 | 1,063 | 228 | 228 |
| C2 Seizure | VT + VF + synthetic | MIT-BIH + AMASS | 868 | 1,302 | 911 | 195 | 130 |
| C3 Unauth exit | Running + synthetic | SHAR + AMASS | 1,843 | 1,843 | 1,291 | 276 | 276 |
| C4 Phys distress | Bradycardia + synthetic | MIT-BIH + AMASS | 977 | 1,302 | 1,009 | 145 | 148 |
| TOTAL | — | — | 10,847 | 11,606 | 8,222 | 1,690 | 1,627 |
// class distribution — pre-augmentation
// train / val / test split by class
Federated learning round procedure
FedAvg with Paillier homomorphic encryption. 10 participating gateways, 5 local epochs per round, 24-hour aggregation schedule, 200 total rounds. Convergence criterion: Δacc <0.3% over 10 consecutive rounds.
BROADCAST θ^(t−1) to all Gᵢ ∈ S via 5G priority channel

FOR EACH Gᵢ ∈ S (parallel, local execution):
    LOAD local dataset Dᵢ from encrypted 72-hr rolling buffer
    SET θᵢ ← θ^(t−1)
    FOR epoch e = 1 to E_local (= 5):
        SAMPLE mini-batch B ⊂ Dᵢ (size = 32)
        UPDATE θᵢ ← θᵢ − η · ∇L(θᵢ; B)            [SGD, η=0.01, momentum=0.9]
    COMPUTE gradient update: Δθᵢ = θᵢ − θ^(t−1)
    ENCRYPT E(Δθᵢ) ← Paillier.Encrypt(Δθᵢ, pk)    [2048-bit key]
    TRANSMIT E(Δθᵢ) and nᵢ (sample count) to aggregation server

AT AGGREGATION SERVER:
    VERIFY E(Δθᵢ) hash against PoA blockchain      // reject if tampered
    COMPUTE E(Δθ_global) ← ⊕ᵢ (nᵢ/n_total) · E(Δθᵢ)   [HE additive property]
    DECRYPT Δθ_global ← Paillier.Decrypt(E(Δθ_global), sk_shares)   [2-of-3 key]
    UPDATE θ^(t) ← θ^(t−1) + Δθ_global
    EVALUATE θ^(t) on held-out validation shard → accuracy_val
    LOG PoA transaction: {round_id, Merkle_root, accuracy_val, admin_signatures}

IF |accuracy_val(t) − accuracy_val(t−10)| < 0.003:
    CONVERGED → RETURN
ELSE:
    BROADCAST θ^(t) to all Gᵢ ∈ S
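The server-side COMPUTE/UPDATE steps reduce to a sample-count-weighted average of client deltas. A minimal plaintext sketch of that aggregation (encryption and blockchain verification omitted; `fedavg_aggregate` and the toy values are illustrative):

```python
# Plaintext FedAvg aggregation: delta_global = sum_i (n_i / n_total) * delta_i.
# Under Paillier HE the same weighted sum is taken over ciphertexts via the
# additive homomorphic property; the arithmetic is identical.

def fedavg_aggregate(global_params, client_deltas, client_counts):
    """client_deltas: per-client parameter deltas (theta_i - theta_prev);
    client_counts: local sample counts n_i used as aggregation weights."""
    n_total = sum(client_counts)
    delta_global = [0.0] * len(global_params)
    for delta, n in zip(client_deltas, client_counts):
        w = n / n_total
        for j, d in enumerate(delta):
            delta_global[j] += w * d
    # theta(t) <- theta(t-1) + delta_global
    return [p + d for p, d in zip(global_params, delta_global)]

theta = [1.0, -2.0]
deltas = [[0.2, 0.0], [-0.1, 0.4]]      # two gateways' updates
counts = [300, 100]                     # n_i -> weights 0.75 / 0.25
print(fedavg_aggregate(theta, deltas, counts))
```

Weighting by nᵢ means gateways with larger 72-hour buffers pull the global model proportionally harder, which is the standard FedAvg behaviour under non-IID data.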
Real-time anomaly detection — gateway inference loop
Thread 1 of the school gateway — runs continuously during school hours. Each LoRaWAN packet from a wearable triggers this procedure. Two-window hysteresis prevents false positives from transient movements.
PARSE pkt → {device_id d, timestamp ts, class_idx c, prob_vector P, gps loc}
RETRIEVE 30s historical feature cache H_d from LRU cache
APPEND current 5s feature window to H_d

IF len(H_d) ≥ 6 (30s of history available):
    X ← normalize(H_d, μ_train, σ_train)
    P_full ← CNN_LSTM_GPU.infer(X)                [full-precision, Jetson GPU, 22ms]
    c_pred ← argmax(P_full)
    IF c_pred ≠ C0 AND P_full[c_pred] ≥ τ(c_pred):
        hysteresis_counter[d] += 1
        IF hysteresis_counter[d] ≥ 2:             // two consecutive windows confirmed
            ALERT confirmed → dispatch_to_thread3(c_pred, P_full, d, ts)
    ELSE:
        hysteresis_counter[d] ← 0

// PARALLEL GEOFENCE CHECK (runs concurrently with above)
IF GPS loc outside school polygon > 30s → trigger C3 alert

UPDATE LRU cache: H_d ← H_d[−5:]                  [keep 5 windows; the next 5s append restores 6 windows = 30s]
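The two-window hysteresis rule can be isolated as a small per-device state machine. A sketch, assuming hypothetical τ values and class indices (the deployed per-class thresholds are given elsewhere in the document only for C2):

```python
# Two-window hysteresis: an anomaly class must clear its threshold tau on two
# consecutive 5 s windows before an alert fires; any normal or sub-threshold
# window resets the device's counter.

from collections import defaultdict

C0_NORMAL = 0

class HysteresisFilter:
    def __init__(self, tau, confirm_windows=2):
        self.tau = tau                      # per-class thresholds tau(c)
        self.confirm = confirm_windows
        self.counter = defaultdict(int)     # per-device consecutive-hit counters

    def update(self, device_id, c_pred, prob):
        """Return True when an alert is confirmed for this device."""
        if c_pred != C0_NORMAL and prob >= self.tau[c_pred]:
            self.counter[device_id] += 1
            return self.counter[device_id] >= self.confirm
        self.counter[device_id] = 0         # transient movement: reset
        return False

f = HysteresisFilter(tau={1: 0.85, 2: 0.92, 3: 0.88, 4: 0.90})
print(f.update("w-01", 2, 0.95))   # first anomalous window -> False
print(f.update("w-01", 2, 0.94))   # second consecutive window -> True
print(f.update("w-01", 0, 0.99))   # normal window resets -> False
```

A transient spike in one window never dispatches; only two consecutive super-threshold windows of the same non-normal class reach Thread 3.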
Experimental configuration for reproducibility
All stochastic elements seeded with global seed=42, five runs with offsets 42–46. The test set was accessed only once for final model evaluation after hyperparameter search on the validation partition.
| Component | Configuration | Rationale |
|---|---|---|
| Network simulator | NS-3 v3.38 | Current stable release; compatible with the LoRaWAN module of Magrin et al. (2019) |
| ML framework | TensorFlow 2.12 | TFLite Micro for INT8 on ESP32-C3 |
| FL framework | TensorFlow Federated 0.55 | FedAvg orchestration |
| CPU | Intel Core i9-13900K (24 cores, 5.8 GHz) | Parallel NS-3 simulation |
| GPU | NVIDIA RTX 4090 (24 GB) | CNN-LSTM training acceleration |
| RAM | 64 GB DDR5-5600 | 150-node concurrent simulation |
| Random seeds | 42, 43, 44, 45, 46 | 5 independent runs for variance estimation |
| Wearable nodes | 150 simulated ESP32-C3 | 30 classes × 5 cohorts |
| Edge gateways | 10 simulated Jetson Nano | One per 25×100m school zone |
| LoRaWAN propagation | Okumura-Hata model · SF7–SF12 adaptive | West African urban sub-1 GHz calibration |
| 5G NR model | 3GPP TR 38.913 | Standard NR performance evaluation |
| FL rounds | 200 total · convergence Δacc <0.3% | Grid search optimal vs client drift |
| Paillier key length | 2048-bit | 112-bit classical security (NIST SP 800-57) |
| Dirichlet α | 0.5 | Standard heterogeneity model (Nguyen et al. 2021) |
| Campus footprint | 2.5 hectares (250×100m) | Typical Nigerian secondary school compound |
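The additive homomorphic property the aggregation server relies on (and that the 2048-bit keying above secures) can be demonstrated with a toy Paillier instance. This is a sketch with deliberately tiny, insecure primes, for illustration only; real deployments use vetted libraries and 2048-bit moduli:

```python
# Toy Paillier cryptosystem (g = n + 1 variant). Shows that
# Dec(Enc(a) * Enc(b) mod n^2) = a + b, the property used to aggregate
# encrypted gradient updates without decrypting any single client's delta.

import math, random

def keygen(p=293, q=433):                  # toy primes, NOT secure
    n = p * q
    lam = math.lcm(p - 1, q - 1)
    mu = pow(lam, -1, n)                   # valid because g = n + 1
    return (n,), (n, lam, mu)

def encrypt(pk, m):
    (n,) = pk
    n2 = n * n
    r = random.randrange(1, n)
    while math.gcd(r, n) != 1:             # r must be a unit mod n
        r = random.randrange(1, n)
    return (pow(n + 1, m, n2) * pow(r, n, n2)) % n2

def decrypt(sk, c):
    n, lam, mu = sk
    n2 = n * n
    L = (pow(c, lam, n2) - 1) // n         # L(u) = (u - 1) / n
    return (L * mu) % n

pk, sk = keygen()
c1, c2 = encrypt(pk, 1234), encrypt(pk, 5678)
print(decrypt(sk, (c1 * c2) % (pk[0] * pk[0])))   # -> 6912 = 1234 + 5678
```

Multiplying ciphertexts adds plaintexts, so the server can form the weighted sum of client deltas while seeing only ciphertext, exactly as in the round procedure above.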
The HE participation-confidence effect
A finding that, to our knowledge, has not previously been characterised in the federated learning literature: cryptographic privacy guarantees have an indirect positive effect on model accuracy through institutional participation decisions.
Privacy encryption indirectly improved accuracy by +1.8 pp
The ablation study (A1→A2) shows that adding Paillier homomorphic encryption to the FL pipeline improved accuracy by 1.8 percentage points beyond what federated learning alone achieves. This is not a direct technical effect of encryption: Paillier HE is exact and alters no gradient values. Rather, strong cryptographic confidentiality of gradients persuaded school administrators who would otherwise have withheld participation to join the federation, increasing training-data diversity across the 10-node network. At national scale (2,000 schools), this effect would be expected to be substantially larger, potentially closing the remaining 0.4 pp gap to centralised training accuracy.
// ablation — accuracy gain per component
// HE overhead vs accuracy gain
Latency distribution — mean and tail
Reporting only the mean (127 ms) omits the tail behaviour critical for systems engineering. The 95th-percentile latency of 312 ms arises on alerts that require the hysteresis filter's second-window confirmation, which adds a further inference pass and transmission before dispatch. Both values remain within their respective clinical thresholds.
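Mean and P95 come from the same per-alert latency samples. A sketch using the nearest-rank percentile definition (the sample values below are illustrative, not the measured distribution):

```python
# Nearest-rank percentile: P95 is the value at position ceil(0.95 * n) in the
# sorted sample -- the tail statistic that a mean alone hides.

import math

def percentile(samples, p):
    s = sorted(samples)
    rank = math.ceil(p / 100 * len(s))      # nearest-rank definition
    return s[rank - 1]

# 20 illustrative end-to-end latencies (ms): 19 typical, one slow tail
latencies = [120, 125, 118, 130, 127, 122, 126, 124, 119, 131,
             128, 123, 121, 129, 126, 125, 124, 127, 122, 310]
mean = sum(latencies) / len(latencies)
print(round(mean, 1), percentile(latencies, 95), percentile(latencies, 100))
```

Note how a single 310 ms outlier barely moves the mean but dominates the maximum; with more tail mass it would surface at P95 as well.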
// latency distribution across 5 simulation runs
// CEICS vs baselines — mean + 95th percentile
Six constraints on generalisability
These limitations define the boundary of the experiment — they do not invalidate the findings, but they must be resolved before field deployment. Each one is matched to a specific future research direction below.
Five priority directions for next-stage work
These directions are not aspirational additions — they are specific, bounded next steps that directly address the six limitations above.
Design Science Research methodology
CEICS adopts the Design Science Research (DSR) paradigm (Hevner et al., 2004) — the primary research artifact (CEICS) is designed, prototyped as a simulation, and evaluated against explicit performance requirements. The evaluation follows the quasi-experimental structure recommended by Wohlin et al. (2024).
| Variable type | Variable | Values / levels |
|---|---|---|
| Independent | Processing paradigm | Centralised cloud · Edge-only · Standard FL · CEICS |
| Dependent | Detection accuracy | Macro accuracy, Precision, Recall, F1, AUC-ROC |
| Dependent | System latency | L_E2E = T_sensor + T_inference + T_hysteresis + T_tx |
| Dependent | Network bandwidth | B_rel = B_CEICS / B_cloud × 100% |
| Dependent | Privacy resilience | IRR = (N_thwarted / N_attempts) × 100% |
| Control | CNN-LSTM architecture | Identical across all four paradigms |
| Statistical | Primary comparison | One-way ANOVA · Tukey HSD · Bonferroni α=0.000167 |
| Statistical | Secondary comparisons | Welch t-tests · Bonferroni-corrected · 95% CIs |
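The Welch t statistic and Welch–Satterthwaite degrees of freedom used for the secondary comparisons can be computed directly from per-run summary statistics. A pure-Python sketch (no SciPy); the worked example plugs in the CEICS and Standard FL accuracy summaries as an illustration, not a reproduction of the reported tests:

```python
# Welch's unequal-variance t-test from summary statistics:
#   t  = (m1 - m2) / sqrt(s1^2/n1 + s2^2/n2)
#   df = (s1^2/n1 + s2^2/n2)^2 / [ (s1^2/n1)^2/(n1-1) + (s2^2/n2)^2/(n2-1) ]

import math

def welch_t(m1, s1, n1, m2, s2, n2):
    v1, v2 = s1**2 / n1, s2**2 / n2
    t = (m1 - m2) / math.sqrt(v1 + v2)
    df = (v1 + v2)**2 / (v1**2 / (n1 - 1) + v2**2 / (n2 - 1))
    return t, df

# e.g. CEICS (95.3 +/- 0.41) vs Standard FL (92.8 +/- 0.49), 5 runs each
t, df = welch_t(95.3, 0.41, 5, 92.8, 0.49, 5)
print(round(t, 2), round(df, 1))
```

Welch's form is the right default here because the baselines have visibly different run-to-run variances, so pooling them (Student's t) would be unjustified.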
FL convergence trajectory — accuracy ±SD vs round
Convergence gap at Round 200 is non-significant (Welch t(7.2)=1.94, p=0.09), confirming statistical parity with centralised training while preserving complete data locality. Communication cost at convergence (486 MB) is 73% lower than equivalent raw sensor stream volume (1,800 MB).
| FL Round | CEICS Accuracy (%) ±SD | 95% CI | Centralised (%) | Gap (pp) | Comm. Cost (MB) |
|---|---|---|---|---|---|
| 20 | 79.2 ±1.82 | [77.0, 81.4] | 95.7 | 16.5 | 48.6 |
| 60 | 86.4 ±1.11 | [85.0, 87.8] | 95.7 | 9.3 | 145.8 |
| 100 | 93.4 ±0.71 | [92.5, 94.3] | 95.7 | 2.3 | 243.0 |
| 120 | 94.1 ±0.58 | [93.4, 94.8] | 95.7 | 1.6 | 291.6 |
| 160 | 94.9 ±0.44 | [94.4, 95.4] | 95.7 | 0.8 | 388.8 |
| 200 ★ | 95.3 ±0.41 | [94.8, 95.7] | 95.7 | 0.4 | 486.0 |
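The convergence rule (Δacc < 0.3 pp over 10 consecutive rounds) can be checked mechanically against a logged accuracy trajectory. A sketch with a synthetic, illustrative trajectory; `converged_round` is a hypothetical helper, not part of the pipeline:

```python
# Convergence test: stop at round t if |acc[t] - acc[t-10]| < 0.3 pp.

def converged_round(acc_by_round, window=10, eps=0.3):
    """acc_by_round: accuracy (%) logged per round.
    Returns the first round index satisfying the criterion, else None."""
    for t in range(window, len(acc_by_round)):
        if abs(acc_by_round[t] - acc_by_round[t - window]) < eps:
            return t
    return None

# illustrative trajectory: fast early gains, then an asymptotic plateau
acc = [79.0 + 16.0 * (1 - 0.97**t) for t in range(201)]
print(converged_round(acc))
```

Comparing against the value 10 rounds back, rather than the previous round, prevents a noisy plateau from triggering premature convergence on a single flat step.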
Detection performance — CEICS vs all baselines
One-way ANOVA: F(3,16)=847.3, p<0.001, η²=0.994. Model type explains 99.4% of observed variance. All Tukey HSD pairwise comparisons significant at p<0.0001 (Bonferroni-corrected α=0.000167). Test set n=1,627 windows.
| Model | Accuracy ±SD | Precision ±SD | Recall ±SD | F1 ±SD | AUC-ROC ±SD | Δ F1 vs CEICS | Cohen's d |
|---|---|---|---|---|---|---|---|
| Centralised Cloud | 88.4 ±0.61% | 87.1 ±0.72% | 86.9 ±0.68% | 87.0 ±0.69% | 0.941 ±0.005 | −7.7 pp*** | 16.8 |
| Edge-Only | 84.1 ±0.53% | 82.7 ±0.61% | 83.4 ±0.58% | 83.0 ±0.59% | 0.912 ±0.007 | −11.7 pp*** | 27.3 |
| Standard FL (no HE) | 92.8 ±0.49% | 91.9 ±0.53% | 92.2 ±0.51% | 92.0 ±0.52% | 0.961 ±0.004 | −2.7 pp*** | 6.1 |
| CEICS (proposed) | 95.3 ±0.41% | 94.9 ±0.44% | 94.6 ±0.43% | 94.7 ±0.43% | 0.981 ±0.003 | Reference | — |
Per-class detection performance ±SD
C4 Physiological Distress shows the highest inter-run variance (SD=0.71%) reflecting within-class heterogeneity — conflating hypoxia, cardiac irregularity, and acute stress — and sensitivity to non-IID partition assignment. C2 Seizure recall (99.1%) is statistically equivalent to the best specialized single-class detector (IIETA 2025: 99.0%, p=0.65).
| Class | Precision ±SD | Recall ±SD | F1 ±SD | Support (n) | Best prior recall | Δ pp |
|---|---|---|---|---|---|---|
| C0 Normal | 97.2 ±0.31% | 96.8 ±0.34% | 97.0 ±0.32% | 845 | — | — |
| C1 Fall/Assault | 95.1 ±0.48% | 94.3 ±0.52% | 94.7 ±0.49% | 228 | SHAR: 91.2% | +3.1*** |
| C2 Seizure | 98.7 ±0.39% | 99.1 ±0.41% | 98.9 ±0.39% | 130 | Masci 2025: 91.4% | +7.7*** |
| C3 Unauth. Exit | 95.8 ±0.43% | 96.5 ±0.46% | 96.2 ±0.44% | 276 | GPS base: 93.1% | +3.4*** |
| C4 Phys. Distress | 91.8 ±0.68% | 90.6 ±0.71% | 91.2 ±0.71% | 148 | MIT-BIH BR: 88.3% | +2.3** |
Privacy and security performance
CEICS achieved 78.4%±1.8% intrusion resistance across 1,000 simulated attacks (400 passive eavesdropping, 300 active MITM, 300 FL replay). The dominant vulnerability for the cloud baseline is data volume in transit — edge-local processing is the critical privacy control, not just encryption.
| Security metric | Centralised cloud | Standard FL | CEICS | Δ vs Cloud | Test statistic |
|---|---|---|---|---|---|
| Overall IRR ±SD | 22.1 ±2.3% | 61.4 ±2.1% | 78.4 ±1.8% | +56.3 pp*** | t(6.8)=44.7, d=28.2 |
| IRR: Passive eavesdrop | 11.4 ±1.9% | N/A | 97.2 ±0.8% | +85.8 pp*** | t(4.6)=89.4, d=46.1 |
| IRR: MITM 5G backhaul | 31.2 ±3.1% | N/A | 71.3 ±2.4% | +40.1 pp*** | t(7.2)=23.6, d=14.9 |
| IRR: FL replay attack | N/A | 43.4 ±2.8% | 63.7 ±3.1% | +20.3 pp*** (vs Standard FL) | t(7.9)=11.7, d=6.9 |
| HE overhead ±SD | N/A | N/A | 8.7 ±0.3% | — | Stable; pooled SD=0.31% |
| Blockchain TPS ±SD | N/A | N/A | 84 ±2.1 | — | 6.3× max alert rate |
| Audit completeness | ~71% | 0% | 100% | +29 pp | — |
| False alert rate ±SD | 4.8 ±0.41% | 3.6 ±0.38% | 2.1 ±0.29% | −2.7 pp*** | t(6.3)=12.4, d=7.6 |
C2 (Seizure) threshold sensitivity analysis
Post-hoc sensitivity analysis across τ ∈ {0.70, 0.75, 0.80, 0.85, 0.90, 0.92, 0.95} for the most safety-critical class. The deployed threshold τ=0.92 was selected as optimal: it maximises F1, satisfies the clinical recall floor of ≥0.93 derived from the Shoaib et al. (2021) pediatric emergency-alert literature, and yields a false alert rate of only 1.6%. The stricter τ=0.95 was rejected: recall falls by 1.7 pp (99.1% → 97.4%) in exchange for only a 0.4 pp reduction in false alerts, an unfavourable trade for the class where missed detections carry the highest clinical cost.
| τ (C2) | C2 Precision | C2 Recall | C2 F1 | False Alert Rate | Clinical assessment |
|---|---|---|---|---|---|
| 0.70 | 84.1% | 99.8% | 91.3% | 11.4% | Unacceptable — FAR exceeds 5% deployment limit |
| 0.75 | 89.3% | 99.6% | 94.2% | 7.6% | Borderline — FAR above 5% limit |
| 0.80 | 93.8% | 99.4% | 96.5% | 4.2% | Acceptable — FAR within limit, high recall |
| 0.85 | 96.1% | 99.3% | 97.7% | 2.8% | Good — balanced precision-recall |
| 0.90 | 97.9% | 99.2% | 98.5% | 1.9% | Very good — low FAR, high recall |
| 0.92 ★ | 98.7% | 99.1% | 98.9% | 1.6% | ★ Selected — optimal F1; recall ≥0.93 satisfied |
| 0.95 | 99.1% | 97.4% | 98.2% | 1.2% | Rejected — recall loss outweighs marginal FAR gain |
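The selection rule implied above, maximise F1 subject to the recall floor and the 5% FAR deployment limit, can be applied mechanically to the sweep results. A sketch; `select_threshold` is an illustrative helper:

```python
# Pick tau maximising F1 among operating points that satisfy both constraints:
# recall >= recall_floor and false-alert rate <= far_limit.

# (tau, precision, recall, F1, FAR) -- values from the C2 sweep table, in %
SWEEP = [
    (0.70, 84.1, 99.8, 91.3, 11.4),
    (0.75, 89.3, 99.6, 94.2, 7.6),
    (0.80, 93.8, 99.4, 96.5, 4.2),
    (0.85, 96.1, 99.3, 97.7, 2.8),
    (0.90, 97.9, 99.2, 98.5, 1.9),
    (0.92, 98.7, 99.1, 98.9, 1.6),
    (0.95, 99.1, 97.4, 98.2, 1.2),
]

def select_threshold(sweep, recall_floor=93.0, far_limit=5.0):
    feasible = [row for row in sweep
                if row[2] >= recall_floor and row[4] <= far_limit]
    return max(feasible, key=lambda row: row[3])[0]   # argmax F1

print(select_threshold(SWEEP))   # -> 0.92
```

Encoding the constraints explicitly makes the choice auditable: tightening the recall floor or the FAR limit reruns the same rule rather than relying on visual inspection of the table.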