// §3.3 — Model architecture

CNN-LSTM anomaly detection model

A hybrid convolutional–recurrent architecture designed for multimodal time-series classification on resource-constrained edge devices. Three 1D convolutional blocks extract local temporal features; stacked LSTM layers capture long-range sequential patterns; a dense head produces five-class anomaly probabilities every 200 ms.

487,621
Total parameters (FP32)
3×Conv1D + 2×LSTM + Dense + Softmax
74.4%
Compression — INT8 quantization
4.18 MB → 1.07 MB · −1.2 pp accuracy cost
22 ms
Inference time on Jetson Nano GPU
Down from 47 ms FP32 · TensorFlow Lite Micro
Input
250×6
tensor
5s @ 50Hz
6 channels
Conv1D
32 filters
kernel=3
BN · ReLU
Conv1D
64 filters
kernel=3
BN · ReLU
Conv1D
128 filters
kernel=3
BN · ReLU
GlobalMax
Pool
dim reduction
→ 128-d vec
LSTM
64 units
dropout=0.2
LSTM
32 units
dropout=0.2
Dense
64 units
ReLU
Softmax
5 classes
C0–C4
τ thresholds
per class
Input tensor: Each 250×6 window contains 5 seconds of data at 50 Hz across 6 channels — ax, ay, az (IMU), HR (PPG), SpO₂ (PPG), EDA. GPS coordinates are processed separately through the geofencing module contributing to C3 classification independently of the CNN-LSTM pipeline.

// per-class detection thresholds (τ)

// FP32 vs INT8 accuracy cost

// §3.6 — Dataset composition

10,847-window multimodal dataset

Two publicly licensed benchmark datasets supplemented by school-specific synthetic data. MIT-BIH Arrhythmia Database provides physiological signals; SHAR-100-20 provides IMU activity data; AMASS-generated synthetic samples fill classroom-specific gaps absent from both benchmarks.

ClassLabelSourcePre-aug (n)Post-aug (n)TrainValTest*
C0 NormalNormal activityMIT-BIH NSR + SHAR5,6405,6403,948846846
C1 Fall/AssaultFall · vigorous playSHAR1,5191,5191,063228228
C2 SeizureVT + VF + syntheticMIT-BIH + AMASS8681,302911195130
C3 Unauth exitRunning + syntheticSHAR + AMASS1,8431,8431,291276276
C4 Phys distressBradycardia + syntheticMIT-BIH + AMASS9771,3021,009145148
TOTAL10,84711,6068,2221,6901,627
Anti-leakage note: SMOTE-NC oversampling (k=5) applied to training partition only. Test partition uses original pre-augmentation samples exclusively. AMASS-generated synthetic samples are confined to the training set. Dirichlet (α=0.5) non-IID distribution across 10 gateways produced mean Gini coefficient 0.52 ± 0.09.

// class distribution — pre-augmentation

// train / val / test split by class

Preprocessing pipeline (6 stages): (1) Butterworth LPF 20 Hz for IMU, bandpass 0.5–40 Hz for PPG, LPF 5 Hz for EDA. (2) 5s windows at 250 samples/50 Hz, 50% overlap. (3) Per-channel z-score normalisation using training-set statistics. (4) Linear interpolation for 1–3 missing samples; exclusion for >3 gaps (2.3% excluded). (5) SMOTE-NC oversampling to 12% training floor for C2 and C4. (6) Dirichlet α=0.5 non-IID partitioning.
// §3.4 — Algorithm 1

Federated learning round procedure

FedAvg with Paillier homomorphic encryption. 10 participating gateways, 5 local epochs per round, 24-hour aggregation schedule, 200 total rounds. Convergence criterion: Δacc <0.3% over 10 consecutive rounds.

Algorithm 1 — CEICS Federated Learning Round · Aggregation Server §3.4
// INPUT: Global model θ^(t−1), node set S={G₁,…,G_N}, Paillier key (pk, sk_shares) // OUTPUT: Updated global model θ^(t), PoA blockchain hash
BROADCAST θ^(t−1) to all Gᵢ ∈ S via 5G priority channel
FOR EACH Gᵢ ∈ S (parallel, local execution): LOAD local dataset Dᵢ from encrypted 72-hr rolling buffer SET θᵢ ← θ^(t−1) FOR epoch e = 1 to E_local (= 5): SAMPLE mini-batch B ⊂ Dᵢ (size = 32) UPDATE θᵢ ← θᵢ − η · ∇θᵢ(B) [SGD, η=0.01, momentum=0.9] COMPUTE gradient update: Δθᵢ = θᵢ − θ^(t−1) ENCRYPT E(Δθᵢ) ← Paillier.Encrypt(Δθᵢ, pk) [2048-bit key] TRANSMIT E(Δθᵢ) and nᵢ (sample count) to aggregation server
AT AGGREGATION SERVER: VERIFY E(Δθᵢ) hash against PoA blockchain // reject if tampered COMPUTE E(Δθ_global) ← ⊕ᵢ (nᵢ/n_total) · E(Δθᵢ) [HE additive property] DECRYPT Δθ_global ← Paillier.Decrypt(E(Δθ_global), sk_shares) [2-of-3 key] UPDATE θ^(t) ← θ^(t−1) + Δθ_global EVALUATE θ^(t) on held-out validation shard → accuracy_val LOG PoA transaction: {round_id, Merkle_root, accuracy_val, admin_signatures}
IF |accuracy_val(t) − accuracy_val(t−10)| < 0.003: CONVERGEDRETURN BROADCAST θ^(t) to all Gᵢ ∈ S
// §3.3 — Algorithm 2

Real-time anomaly detection — gateway inference loop

Thread 1 of the school gateway — runs continuously during school hours. Each LoRaWAN packet from a wearable triggers this procedure. Two-window hysteresis prevents false positives from transient movements.

Algorithm 2 — CEICS Real-Time Anomaly Detection · School Gateway · Thread 1 §3.3
// INPUT: LoRaWAN packet pkt from wearable device d // Runs every 200 ms per device · GPU-accelerated on Jetson Nano
PARSE pkt{device_id d, timestamp ts, class_idx c, prob_vector P, gps loc} RETRIEVE 30s historical feature cache H_d from LRU cache APPEND current 5s feature window to H_d
IF len(H_d) ≥ 6 (30s of history available): Xnormalize(H_d, μ_train, σ_train) P_fullCNN_LSTM_GPU.infer(X) [full-precision, Jetson GPU, 22ms] c_predargmax(P_full) IF c_pred ≠ C0 AND P_full[c_pred] ≥ τ(c_pred): hysteresis_counter[d] += 1 IF hysteresis_counter[d] ≥ 2: // TWO CONSECUTIVE WINDOWS CONFIRMED → dispatch alert ALERT confirmed → dispatch_to_thread3(c_pred, P_full, d, ts) ELSE: hysteresis_counter[d]0
// PARALLEL GEOFENCE CHECK (runs concurrently with above) IF GPS loc outside school polygon > 30s → trigger C3 alert
UPDATE LRU cache: H_dH_d[−5:] [retain last 30s = 6 windows]
Hysteresis design rationale: Two consecutive 5-second windows (10 seconds total) must both exceed the class-specific threshold τ before an alert is dispatched. A child who jumps up and knocks their badge produces a single-window spike that resets the counter. A genuine sustained event (seizure, fall, assault) persists across both windows and fires. This design reduces the false alert rate from ~8% to 2.1%.
// §4.1 — Simulation environment (Table 3)

Experimental configuration for reproducibility

All stochastic elements seeded with global seed=42, five runs with offsets 42–46. The test set was accessed only once for final model evaluation after hyperparameter search on the validation partition.

ComponentConfigurationRationale
Network simulatorNS-3 v3.38Current stable; compatible LoRaWAN module (Magrin et al. 2019)
ML frameworkTensorFlow 2.12TFLite Micro for INT8 on ESP32-C3
FL frameworkTensorFlow Federated 0.55FedAvg orchestration
CPUIntel Core i9-13900K (24 cores, 5.8 GHz)Parallel NS-3 simulation
GPUNVIDIA RTX 4090 (24 GB)CNN-LSTM training acceleration
RAM64 GB DDR5-5600150-node concurrent simulation
Random seeds42, 43, 44, 45, 465 independent runs for variance estimation
Wearable nodes150 simulated ESP32-C330 classes × 5 cohorts
Edge gateways10 simulated Jetson NanoOne per 25×100m school zone
LoRaWAN propagationOkumura-Hata model · SF7–SF12 adaptiveWest African urban sub-1 GHz calibration
5G NR model3GPP TR 38.913Standard NR performance evaluation
FL rounds200 total · convergence Δacc <0.3%Grid search optimal vs client drift
Paillier key length2048-bit112-bit classical security (NIST SP 800-57)
Dirichlet α0.5Standard heterogeneity model (Nguyen et al. 2021)
Campus footprint2.5 hectares (250×100m)Typical Nigerian secondary school compound
// §4.7 + §4.9 — Novel finding

The HE participation-confidence effect

A finding not previously characterised in the federated learning literature — cryptographic privacy guarantees have an indirect positive effect on model accuracy through institutional participation decisions.

// Novel contribution — ablation A2 − A1

Privacy encryption indirectly improved accuracy by +1.8 pp

The ablation study (A1→A2) shows that adding Paillier homomorphic encryption to the FL pipeline improved accuracy by 1.8 percentage points — beyond what federated learning alone achieves. This is not a direct technical effect of encryption. It reflects that by providing strong cryptographic gradient confidentiality, Paillier HE enabled school administrators who would otherwise withhold participation to join the federation, increasing training data diversity across the 10-node network. At national scale (2,000 schools), this effect would be expected to be substantially larger — potentially closing the remaining 0.4 pp gap to centralised training accuracy.

// ablation — accuracy gain per component

// HE overhead vs accuracy gain

// §4.5 — Performance transparency

Latency distribution — mean and tail

Reporting only the mean (127 ms) omits the tail behaviour critical for systems engineering. The 95th-percentile latency of 312 ms occurs when the hysteresis filter requires a second window confirmation — adding one full 5-second window plus transmission. Both values remain within their respective clinical thresholds.

// latency distribution across 5 simulation runs

// CEICS vs baselines — mean + 95th percentile

Clinical thresholds: Mean latency 127 ms meets the sub-150 ms immediate-response requirement. The 95th-percentile 312 ms (second-window hysteresis events) remains well within the 500 ms non-immediate escalation threshold from Shoaib et al. (2021). Both are orders of magnitude within the 2–5 minute first-response window for pediatric status epilepticus.
// §4.10 — Limitations

Six constraints on generalisability

These limitations define the boundary of the experiment — they do not invalidate the findings, but they must be resolved before field deployment. Each one is matched to a specific future research direction below.

LIMITATION 01
Simulation-based evaluation only
NS-3 cannot capture all real-world RF propagation variability in Nigerian school buildings. Real-world LoRaWAN latency may be 20–40% higher than simulated values due to structural interference, humidity, and vegetation.
Addresses: Future direction 1 (NE Nigeria field trials)
LIMITATION 02
Adult benchmark datasets
MIT-BIH and SHAR-100-20 were collected from adults (median age 58 and 30 respectively). Pediatric physiological signals differ in HR range (60–180 vs 60–100 bpm), smaller EDA amplitude, and higher baseline activity. Estimated C2 and C4 recall reduction: 3–7% until locally calibrated training data is available.
Addresses: Future direction 1 (pediatric dataset collection)
LIMITATION 03
Device homogeneity
All 150 wearable nodes are simulated as identical ESP32-C3 devices. Real deployments will include manufacturer variations in sensor accuracy, clock drift, and battery capacity across procurement batches.
Mitigated by: Per-channel z-score normalisation at inference time
LIMITATION 04
Fixed single-building topology
The 2.5-hectare single-building simulation does not model multi-building campuses, underground passages, or GPS-occluded corridors where satellite signal is degraded. These are common in urban Nigerian school environments.
Addresses: Future direction 4 (hierarchical FL for multi-campus)
LIMITATION 05
Limited simulation run count (n=5)
Five simulation runs provide limited statistical power for rare-class conclusions. The C2 test set contains only 130 windows, producing wider confidence intervals for the 99.1% recall claim than would be available with n≥30 runs or a larger pediatric dataset.
Confidence intervals correctly reported; wider n=5 CIs are disclosed
LIMITATION 06
No Byzantine adversarial evaluation
The FedAvg protocol is known to be vulnerable to gradient poisoning attacks (Kairouz et al., 2021). This work evaluates passive eavesdropping, MITM, and replay attacks — but not adversarial gradient injection from compromised gateways.
Addresses: Future direction 2 (Byzantine-robust aggregation)
// §5 — Future research directions

Five priority directions for next-stage work

These directions are not aspirational additions — they are specific, bounded next steps that directly address the six limitations above.

1
Prospective field trials — northeast Nigeria
Deploy CEICS at 3–5 schools in Borno and Yobe states with full ethical oversight, participatory design sessions with school communities, and collection of pediatric physiological baseline data to replace the adult benchmark datasets. Primary outcome: real-world latency, battery life, and false alert rate under Nigerian RF conditions.
Requires: NITDA ethics clearance · NBHIS data sharing agreement · UNICEF Nigeria partnership
2
Byzantine-robust FL aggregation
Integrate Krum and Coordinate-wise Median aggregation rules as alternatives to FedAvg for environments where some gateways may be physically captured and reprogrammed by hostile actors. Evaluate under simulated gradient poisoning attacks (10%, 30%, 50% compromised node ratios).
Algorithms: Blanchard et al. (2017) Krum · Yin et al. (2018) CoordMedian
3
Post-quantum cryptographic primitives
Migrate from Paillier (2048-bit classical security) to CRYSTALS-Kyber (key encapsulation) and CRYSTALS-Dilithium (signatures). Paillier's 112-bit classical security is theoretically breakable by future quantum adversaries. With a 10–15 year deployment horizon, post-quantum migration should be designed in from the next version.
Standards: NIST PQC Round 3 winners · FIPS 203/204/205
4
Hierarchical federated learning for national scale
Extend from 10-node simulation to hierarchical FL architecture supporting 2,000+ schools across Nigeria's 36 states. Local aggregation at state level, global aggregation at federal level. Expected to close the remaining 0.4 pp accuracy gap to centralised training through larger federation diversity, and amplify the HE participation-confidence effect.
Reference: Deng et al. (2021) SHARE hierarchical FL architecture
5
Integration with national emergency dispatch systems
Develop standardised API integration between CEICS blockchain alert tokens and the Nigeria Police Force Command and Control system, NEMA emergency dispatch, and UNICEF Rapid Response teams. Standardised alert token format (JSON-LD) with Oslo SSD incident category codes would enable direct incident reporting without manual transcription.
Standards: ITU-T X.1500 IODEF · W3C JSON-LD · Oslo SSD Annex B reporting format
// §3.1 — Research design

Design Science Research methodology

CEICS adopts the Design Science Research (DSR) paradigm (Hevner et al., 2004) — the primary research artifact (CEICS) is designed, prototyped as a simulation, and evaluated against explicit performance requirements. The evaluation follows the quasi-experimental structure recommended by Wohlin et al. (2024).

DSR
Research paradigm
Hevner et al., 2004 · MIS Quarterly
5
Independent simulation runs
Seeds 42–46 · variance estimation
Test set access
Accessed once after hyperparameter search
Variable typeVariableValues / levels
IndependentProcessing paradigmCentralised cloud · Edge-only · Standard FL · CEICS
DependentDetection accuracyMacro accuracy, Precision, Recall, F1, AUC-ROC
DependentSystem latencyL_E2E = T_sensor + T_inference + T_hysteresis + T_tx
DependentNetwork bandwidthB_rel = B_CEICS / B_cloud × 100%
DependentPrivacy resilienceIRR = (N_thwarted / N_attempts) × 100%
ControlCNN-LSTM architectureIdentical across all four paradigms
StatisticalPrimary comparisonOne-way ANOVA · Tukey HSD · Bonferroni α=0.000167
StatisticalSecondary comparisonsWelch t-tests · Bonferroni-corrected · 95% CIs
Replication protocol: All stochastic elements were seeded with global seed=42 (five runs with offsets 42–46). The validation partition was used for hyperparameter search (grid search). The test set was accessed exactly once for final model evaluation, after all hyperparameter decisions were locked. This prevents any form of test-set leakage into model selection.
// §4.2 — Table 4 (published)

FL convergence trajectory — accuracy ±SD vs round

Convergence gap at Round 200 is non-significant (Welch t(7.2)=1.94, p=0.09), confirming statistical parity with centralised training while preserving complete data locality. Communication cost at convergence (486 MB) is 73% lower than equivalent raw sensor stream volume (1,800 MB).

FL RoundCEICS Accuracy (%) ±SD95% CICentralised (%)Gap (pp)Comm. Cost (MB)
2079.2 ±1.82[77.0, 81.4]95.716.548.6
6086.4 ±1.11[85.0, 87.8]95.79.3145.8
10093.4 ±0.71[92.5, 94.3]95.72.3243.0
12094.1 ±0.58[93.4, 94.8]95.71.6291.6
16094.9 ±0.44[94.4, 95.4]95.70.8388.8
200 ★95.3 ±0.41[94.8, 95.7]95.70.4486.0
★ Convergence criterion: |accuracy_val(t) − accuracy_val(t−10)| < 0.003. Gap at Round 200 (0.4 pp) is non-significant (p=0.09). All gaps at Rounds 20–160 are significant (p<0.001). SD reported across 5 runs (seeds 42–46); 95% CI via t-distribution (df=4).
// §4.3 — Table 5 (published)

Detection performance — CEICS vs all baselines

One-way ANOVA: F(3,16)=847.3, p<0.001, η²=0.994. Model type explains 99.4% of observed variance. All Tukey HSD pairwise comparisons significant at p<0.0001 (Bonferroni-corrected α=0.000167). Test set n=1,627 windows.

ModelAccuracy ±SDPrecision ±SDRecall ±SDF1 ±SDAUC-ROC ±SDΔ F1 vs CEICSCohen's d
Centralised Cloud88.4 ±0.61%87.1 ±0.72%86.9 ±0.68%87.0 ±0.69%0.941 ±0.005−7.7 pp***16.8
Edge-Only84.1 ±0.53%82.7 ±0.61%83.4 ±0.58%83.0 ±0.59%0.912 ±0.007−11.7 pp***27.3
Standard FL (no HE)92.8 ±0.49%91.9 ±0.53%92.2 ±0.51%92.0 ±0.52%0.961 ±0.004−2.7 pp***6.1
CEICS (proposed)95.3 ±0.41%94.9 ±0.44%94.6 ±0.43%94.7 ±0.43%0.981 ±0.003Reference
*** p<0.0001 (Tukey HSD, Bonferroni-corrected α=0.000167). n=5 runs (seeds 42–46). Test set n=1,627 windows. Macro-averaging selected because false negatives for rare emergency classes carry greater consequence than errors on majority Normal class.
// §4.4 — Table 6 (published)

Per-class detection performance ±SD

C4 Physiological Distress shows the highest inter-run variance (SD=0.71%) reflecting within-class heterogeneity — conflating hypoxia, cardiac irregularity, and acute stress — and sensitivity to non-IID partition assignment. C2 Seizure recall (99.1%) is statistically equivalent to the best specialized single-class detector (IIETA 2025: 99.0%, p=0.65).

ClassPrecision ±SDRecall ±SDF1 ±SDSupport (n)Best prior recallΔ pp
C0 Normal97.2 ±0.31%96.8 ±0.34%97.0 ±0.32%845
C1 Fall/Assault95.1 ±0.48%94.3 ±0.52%94.7 ±0.49%228SHAR: 91.2%+3.1***
C2 Seizure98.7 ±0.39%99.1 ±0.41%98.9 ±0.39%130Masci 2025: 91.4%+7.7***
C3 Unauth. Exit95.8 ±0.43%96.5 ±0.46%96.2 ±0.44%276GPS base: 93.1%+3.4***
C4 Phys. Distress91.8 ±0.68%90.6 ±0.71%91.2 ±0.71%148MIT-BIH BR: 88.3%+2.3**
*** p<0.001, ** p<0.01 (Welch t-test vs prior recall; baseline SD assumed 0.5%). C4 SD=0.71% is highest — reflects non-IID partition sensitivity. C2 recall statistically equivalent to IIETA (2025) 99.0% (p=0.65, d=0.24).
// §4.6 — Table 8 (published)

Privacy and security performance

CEICS achieved 78.4%±1.8% intrusion resistance across 1,000 simulated attacks (400 passive eavesdropping, 300 active MITM, 300 FL replay). The dominant vulnerability for the cloud baseline is data volume in transit — edge-local processing is the critical privacy control, not just encryption.

Security metricCentralised cloudStandard FLCEICSΔ vs CloudTest statistic
Overall IRR ±SD22.1 ±2.3%61.4 ±2.1%78.4 ±1.8%+56.3 pp***t(6.8)=44.7, d=28.2
IRR: Passive eavesdrop11.4 ±1.9%N/A97.2 ±0.8%+85.8 pp***t(4.6)=89.4, d=46.1
IRR: MITM 5G backhaul31.2 ±3.1%N/A71.3 ±2.4%+40.1 pp***t(7.2)=23.6, d=14.9
IRR: FL replay attackN/A43.4 ±2.8%63.7 ±3.1%+20.3 pp***t(7.9)=11.7, d=6.9
HE overhead ±SDN/AN/A8.7 ±0.3%Stable; pooled SD=0.31%
Blockchain TPS ±SDN/AN/A84 ±2.16.3× max alert rate
Audit completeness~71%0%100%+29 pp
False alert rate ±SD4.8 ±0.41%3.6 ±0.38%2.1 ±0.29%−2.7 pp***t(6.3)=12.4, d=7.6
*** p<0.0001 (Welch t-test). IRR = Intrusion Resistance Rate. 1,000 simulated attacks: 400 passive eavesdropping, 300 active MITM on 5G backhaul, 300 FL aggregation replay. Residual 28.7% MITM vulnerability reflects simulated enterprise CA compromise — mitigable by certificate pinning.
// §4.8 — Table 10 (published)

C2 (Seizure) threshold sensitivity analysis

Post-hoc sensitivity analysis across τ ∈ {0.70, 0.75, 0.80, 0.85, 0.90, 0.92, 0.95} for the most safety-critical class. The deployed threshold τ=0.92 was selected as optimal: it satisfies the clinical recall floor of ≥0.93 derived from Shoaib et al. (2021) pediatric emergency alert literature, while achieving a false alert rate of only 1.6%. The stricter τ=0.95 was rejected because recall drops to 97.4% — below the 0.93 clinical floor.

τ (C2)C2 PrecisionC2 RecallC2 F1False Alert RateClinical assessment
0.7084.1%99.8%91.3%11.4%Unacceptable — FAR exceeds 5% deployment limit
0.7589.3%99.6%94.2%7.6%Borderline — FAR above 5% limit
0.8093.8%99.4%96.5%4.2%Acceptable — FAR within limit, high recall
0.8596.1%99.3%97.7%2.8%Good — balanced precision-recall
0.9097.9%99.2%98.5%1.9%Very good — low FAR, high recall
0.92 ★98.7%99.1%98.9%1.6%★ Selected — optimal F1; recall ≥0.93 satisfied
0.9599.1%97.4%98.2%1.2%Rejected — recall < 0.93 clinical floor
Clinical floor rationale: The ≥0.93 recall floor for C2 is derived from Shoaib et al. (2021) pediatric emergency alert acceptability threshold. Missing a seizure (false negative) is clinically more dangerous than an occasional false alarm — the asymmetric cost function justifies the recall-prioritised threshold selection.