CNN-LSTM anomaly detection model
A hybrid convolutional–recurrent architecture designed for multimodal time-series classification on resource-constrained edge devices. Three 1D convolutional blocks extract local temporal features; stacked LSTM layers capture long-range sequential patterns; a dense head produces five-class anomaly probabilities every 200 ms.
// architecture diagram — input tensor (6 channels) → 3× Conv1D blocks (kernel=3, ReLU, pooling, dropout=0.2) → stacked LSTM → dense softmax head, per-class probabilities over C0–C4
// per-class detection thresholds (τ)
// FP32 vs INT8 accuracy cost
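The layer stack above can be traced as a shape walk-through. A minimal sketch: the 250-sample window length (5 s at an assumed 50 Hz sampling rate) and the filter counts (32, 64, 128) are illustrative assumptions — the source specifies only kernel=3, pooling, dropout=0.2, 6 input channels, and 5 output classes.

```python
# Shape walk-through of the Conv -> Pool -> LSTM -> Dense stack.
# Window length and filter counts are assumptions, not the deployed values.

def conv1d_out(length, kernel=3, padding=0, stride=1):
    """Output length of a 1D convolution (valid padding by default)."""
    return (length + 2 * padding - kernel) // stride + 1

def pool_out(length, pool=2):
    """Output length of non-overlapping pooling."""
    return length // pool

def trace_shapes(window_len=250, channels=6, filters=(32, 64, 128)):
    shapes = [(window_len, channels)]
    length = window_len
    for f in filters:                  # three conv blocks: kernel=3 conv + pool
        length = pool_out(conv1d_out(length, kernel=3))
        shapes.append((length, f))
    shapes.append((5,))                # dense softmax head over C0-C4
    return shapes

print(trace_shapes())
```

The remaining sequence length after the third block is what the stacked LSTM layers consume as time steps.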
10,847-window multimodal dataset
Two publicly licensed benchmark datasets supplemented by school-specific synthetic data. MIT-BIH Arrhythmia Database provides physiological signals; SHAR-100-20 provides IMU activity data; AMASS-generated synthetic samples fill classroom-specific gaps absent from both benchmarks.
| Class | Label | Source | Pre-aug (n) | Post-aug (n) | Train | Val | Test* |
|---|---|---|---|---|---|---|---|
| C0 Normal | Normal activity | MIT-BIH NSR + SHAR | 5,640 | 5,640 | 3,948 | 846 | 846 |
| C1 Fall/Assault | Fall · vigorous play | SHAR | 1,519 | 1,519 | 1,063 | 228 | 228 |
| C2 Seizure | VT + VF + synthetic | MIT-BIH + AMASS | 868 | 1,302 | 911 | 195 | 130 |
| C3 Unauth exit | Running + synthetic | SHAR + AMASS | 1,843 | 1,843 | 1,291 | 276 | 276 |
| C4 Phys distress | Bradycardia + synthetic | MIT-BIH + AMASS | 977 | 1,302 | 1,009 | 145 | 148 |
| TOTAL | — | — | 10,847 | 11,606 | 8,222 | 1,690 | 1,627 |
// class distribution — pre-augmentation
// train / val / test split by class
Federated learning round procedure
FedAvg with Paillier homomorphic encryption. 10 participating gateways, 5 local epochs per round, 24-hour aggregation schedule, 200 total rounds. Convergence criterion: Δacc <0.3% over 10 consecutive rounds.
BROADCAST θ^(t−1) to all Gᵢ ∈ S via 5G priority channel

FOR EACH Gᵢ ∈ S (parallel, local execution):
    LOAD local dataset Dᵢ from encrypted 72-hr rolling buffer
    SET θᵢ ← θ^(t−1)
    FOR epoch e = 1 to E_local (= 5):
        SAMPLE mini-batch B ⊂ Dᵢ (size = 32)
        UPDATE θᵢ ← θᵢ − η · ∇L(θᵢ; B)            [SGD, η=0.01, momentum=0.9]
    COMPUTE gradient update: Δθᵢ = θᵢ − θ^(t−1)
    ENCRYPT E(Δθᵢ) ← Paillier.Encrypt(Δθᵢ, pk)    [2048-bit key]
    TRANSMIT E(Δθᵢ) and nᵢ (sample count) to aggregation server

AT AGGREGATION SERVER:
    VERIFY E(Δθᵢ) hash against PoA blockchain      // reject if tampered
    COMPUTE E(Δθ_global) ← ⊕ᵢ (nᵢ/n_total) · E(Δθᵢ)   [HE additive property]
    DECRYPT Δθ_global ← Paillier.Decrypt(E(Δθ_global), sk_shares)   [2-of-3 key]
    UPDATE θ^(t) ← θ^(t−1) + Δθ_global
    EVALUATE θ^(t) on held-out validation shard → accuracy_val
    LOG PoA transaction: {round_id, Merkle_root, accuracy_val, admin_signatures}

IF |accuracy_val(t) − accuracy_val(t−10)| < 0.003:
    CONVERGED → RETURN
ELSE:
    BROADCAST θ^(t) to all Gᵢ ∈ S
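The server-side COMPUTE/UPDATE steps reduce to a sample-count-weighted average of client deltas. A minimal plaintext sketch of that aggregation (encryption and blockchain verification omitted; `fedavg_aggregate` and the toy values are illustrative):

```python
# Plaintext FedAvg aggregation: delta_global = sum_i (n_i / n_total) * delta_i.
# Under Paillier HE the same weighted sum is taken over ciphertexts via the
# additive homomorphic property; the arithmetic is identical.

def fedavg_aggregate(global_params, client_deltas, client_counts):
    """client_deltas: per-client parameter deltas (theta_i - theta_prev);
    client_counts: local sample counts n_i used as aggregation weights."""
    n_total = sum(client_counts)
    delta_global = [0.0] * len(global_params)
    for delta, n in zip(client_deltas, client_counts):
        w = n / n_total
        for j, d in enumerate(delta):
            delta_global[j] += w * d
    # theta(t) <- theta(t-1) + delta_global
    return [p + d for p, d in zip(global_params, delta_global)]

theta = [1.0, -2.0]
deltas = [[0.2, 0.0], [-0.1, 0.4]]      # two gateways' updates
counts = [300, 100]                     # n_i -> weights 0.75 / 0.25
print(fedavg_aggregate(theta, deltas, counts))
```

Weighting by nᵢ means gateways with larger 72-hour buffers pull the global model proportionally harder, which is the standard FedAvg behaviour under non-IID data.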
Real-time anomaly detection — gateway inference loop
Thread 1 of the school gateway — runs continuously during school hours. Each LoRaWAN packet from a wearable triggers this procedure. Two-window hysteresis prevents false positives from transient movements.
PARSE pkt → {device_id d, timestamp ts, class_idx c, prob_vector P, gps loc}
RETRIEVE 30s historical feature cache H_d from LRU cache
APPEND current 5s feature window to H_d

IF len(H_d) ≥ 6 (30s of history available):
    X ← normalize(H_d, μ_train, σ_train)
    P_full ← CNN_LSTM_GPU.infer(X)                [full-precision, Jetson GPU, 22ms]
    c_pred ← argmax(P_full)
    IF c_pred ≠ C0 AND P_full[c_pred] ≥ τ(c_pred):
        hysteresis_counter[d] += 1
        IF hysteresis_counter[d] ≥ 2:             // two consecutive windows confirmed
            ALERT confirmed → dispatch_to_thread3(c_pred, P_full, d, ts)
    ELSE:
        hysteresis_counter[d] ← 0

// PARALLEL GEOFENCE CHECK (runs concurrently with above)
IF GPS loc outside school polygon > 30s → trigger C3 alert

UPDATE LRU cache: H_d ← H_d[−5:]                  [keep 5 windows; the next 5s append restores 6 windows = 30s]
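The two-window hysteresis rule can be isolated as a small per-device state machine. A sketch, assuming hypothetical τ values and class indices (the deployed per-class thresholds are given elsewhere in the document only for C2):

```python
# Two-window hysteresis: an anomaly class must clear its threshold tau on two
# consecutive 5 s windows before an alert fires; any normal or sub-threshold
# window resets the device's counter.

from collections import defaultdict

C0_NORMAL = 0

class HysteresisFilter:
    def __init__(self, tau, confirm_windows=2):
        self.tau = tau                      # per-class thresholds tau(c)
        self.confirm = confirm_windows
        self.counter = defaultdict(int)     # per-device consecutive-hit counters

    def update(self, device_id, c_pred, prob):
        """Return True when an alert is confirmed for this device."""
        if c_pred != C0_NORMAL and prob >= self.tau[c_pred]:
            self.counter[device_id] += 1
            return self.counter[device_id] >= self.confirm
        self.counter[device_id] = 0         # transient movement: reset
        return False

f = HysteresisFilter(tau={1: 0.85, 2: 0.92, 3: 0.88, 4: 0.90})
print(f.update("w-01", 2, 0.95))   # first anomalous window -> False
print(f.update("w-01", 2, 0.94))   # second consecutive window -> True
print(f.update("w-01", 0, 0.99))   # normal window resets -> False
```

A transient spike in one window never dispatches; only two consecutive super-threshold windows of the same non-normal class reach Thread 3.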
Experimental configuration for reproducibility
All stochastic elements seeded with global seed=42, five runs with offsets 42–46. The test set was accessed only once for final model evaluation after hyperparameter search on the validation partition.
| Component | Configuration | Rationale |
|---|---|---|
| Network simulator | NS-3 v3.38 | Current stable release; compatible with the LoRaWAN module of Magrin et al. (2019) |
| ML framework | TensorFlow 2.12 | TFLite Micro for INT8 on ESP32-C3 |
| FL framework | TensorFlow Federated 0.55 | FedAvg orchestration |
| CPU | Intel Core i9-13900K (24 cores, 5.8 GHz) | Parallel NS-3 simulation |
| GPU | NVIDIA RTX 4090 (24 GB) | CNN-LSTM training acceleration |
| RAM | 64 GB DDR5-5600 | 150-node concurrent simulation |
| Random seeds | 42, 43, 44, 45, 46 | 5 independent runs for variance estimation |
| Wearable nodes | 150 simulated ESP32-C3 | 30 classes × 5 cohorts |
| Edge gateways | 10 simulated Jetson Nano | One per 25×100m school zone |
| LoRaWAN propagation | Okumura-Hata model · SF7–SF12 adaptive | West African urban sub-1 GHz calibration |
| 5G NR model | 3GPP TR 38.913 | Standard NR performance evaluation |
| FL rounds | 200 total · convergence Δacc <0.3% | Grid search optimal vs client drift |
| Paillier key length | 2048-bit | 112-bit classical security (NIST SP 800-57) |
| Dirichlet α | 0.5 | Standard heterogeneity model (Nguyen et al. 2021) |
| Campus footprint | 2.5 hectares (250×100m) | Typical Nigerian secondary school compound |
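The additive homomorphic property the aggregation server relies on (and that the 2048-bit keying above secures) can be demonstrated with a toy Paillier instance. This is a sketch with deliberately tiny, insecure primes, for illustration only; real deployments use vetted libraries and 2048-bit moduli:

```python
# Toy Paillier cryptosystem (g = n + 1 variant). Shows that
# Dec(Enc(a) * Enc(b) mod n^2) = a + b, the property used to aggregate
# encrypted gradient updates without decrypting any single client's delta.

import math, random

def keygen(p=293, q=433):                  # toy primes, NOT secure
    n = p * q
    lam = math.lcm(p - 1, q - 1)
    mu = pow(lam, -1, n)                   # valid because g = n + 1
    return (n,), (n, lam, mu)

def encrypt(pk, m):
    (n,) = pk
    n2 = n * n
    r = random.randrange(1, n)
    while math.gcd(r, n) != 1:             # r must be a unit mod n
        r = random.randrange(1, n)
    return (pow(n + 1, m, n2) * pow(r, n, n2)) % n2

def decrypt(sk, c):
    n, lam, mu = sk
    n2 = n * n
    L = (pow(c, lam, n2) - 1) // n         # L(u) = (u - 1) / n
    return (L * mu) % n

pk, sk = keygen()
c1, c2 = encrypt(pk, 1234), encrypt(pk, 5678)
print(decrypt(sk, (c1 * c2) % (pk[0] * pk[0])))   # -> 6912 = 1234 + 5678
```

Multiplying ciphertexts adds plaintexts, so the server can form the weighted sum of client deltas while seeing only ciphertext, exactly as in the round procedure above.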
The HE participation-confidence effect
A finding that, to our knowledge, has not previously been characterised in the federated learning literature: cryptographic privacy guarantees have an indirect positive effect on model accuracy through institutional participation decisions.
Privacy encryption indirectly improved accuracy by +1.8 pp
The ablation study (A1→A2) shows that adding Paillier homomorphic encryption to the FL pipeline improved accuracy by 1.8 percentage points beyond what federated learning alone achieves. This is not a direct technical effect of encryption: Paillier HE is exact and alters no gradient values. Rather, strong cryptographic confidentiality of gradients persuaded school administrators who would otherwise have withheld participation to join the federation, increasing training-data diversity across the 10-node network. At national scale (2,000 schools), this effect would be expected to be substantially larger, potentially closing the remaining 0.4 pp gap to centralised training accuracy.
// ablation — accuracy gain per component
// HE overhead vs accuracy gain
Latency distribution — mean and tail
Reporting only the mean (127 ms) omits the tail behaviour critical for systems engineering. The 95th-percentile latency of 312 ms arises on alerts that require the hysteresis filter's second-window confirmation, which adds a further inference pass and transmission before dispatch. Both values remain within their respective clinical thresholds.
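Mean and P95 come from the same per-alert latency samples. A sketch using the nearest-rank percentile definition (the sample values below are illustrative, not the measured distribution):

```python
# Nearest-rank percentile: P95 is the value at position ceil(0.95 * n) in the
# sorted sample -- the tail statistic that a mean alone hides.

import math

def percentile(samples, p):
    s = sorted(samples)
    rank = math.ceil(p / 100 * len(s))      # nearest-rank definition
    return s[rank - 1]

# 20 illustrative end-to-end latencies (ms): 19 typical, one slow tail
latencies = [120, 125, 118, 130, 127, 122, 126, 124, 119, 131,
             128, 123, 121, 129, 126, 125, 124, 127, 122, 310]
mean = sum(latencies) / len(latencies)
print(round(mean, 1), percentile(latencies, 95), percentile(latencies, 100))
```

Note how a single 310 ms outlier barely moves the mean but dominates the maximum; with more tail mass it would surface at P95 as well.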
// latency distribution across 5 simulation runs
// CEICS vs baselines — mean + 95th percentile
Six constraints on generalisability
These limitations define the boundary of the experiment — they do not invalidate the findings, but they must be resolved before field deployment. Each one is matched to a specific future research direction below.
Five priority directions for next-stage work
These directions are not aspirational additions — they are specific, bounded next steps that directly address the six limitations above.
Design Science Research methodology
CEICS adopts the Design Science Research (DSR) paradigm (Hevner et al., 2004) — the primary research artifact (CEICS) is designed, prototyped as a simulation, and evaluated against explicit performance requirements. The evaluation follows the quasi-experimental structure recommended by Wohlin et al. (2024).
| Variable type | Variable | Values / levels |
|---|---|---|
| Independent | Processing paradigm | Centralised cloud · Edge-only · Standard FL · CEICS |
| Dependent | Detection accuracy | Macro accuracy, Precision, Recall, F1, AUC-ROC |
| Dependent | System latency | L_E2E = T_sensor + T_inference + T_hysteresis + T_tx |
| Dependent | Network bandwidth | B_rel = B_CEICS / B_cloud × 100% |
| Dependent | Privacy resilience | IRR = (N_thwarted / N_attempts) × 100% |
| Control | CNN-LSTM architecture | Identical across all four paradigms |
| Statistical | Primary comparison | One-way ANOVA · Tukey HSD · Bonferroni α=0.000167 |
| Statistical | Secondary comparisons | Welch t-tests · Bonferroni-corrected · 95% CIs |
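The Welch t statistic and Welch–Satterthwaite degrees of freedom used for the secondary comparisons can be computed directly from per-run summary statistics. A pure-Python sketch (no SciPy); the worked example plugs in the CEICS and Standard FL accuracy summaries as an illustration, not a reproduction of the reported tests:

```python
# Welch's unequal-variance t-test from summary statistics:
#   t  = (m1 - m2) / sqrt(s1^2/n1 + s2^2/n2)
#   df = (s1^2/n1 + s2^2/n2)^2 / [ (s1^2/n1)^2/(n1-1) + (s2^2/n2)^2/(n2-1) ]

import math

def welch_t(m1, s1, n1, m2, s2, n2):
    v1, v2 = s1**2 / n1, s2**2 / n2
    t = (m1 - m2) / math.sqrt(v1 + v2)
    df = (v1 + v2)**2 / (v1**2 / (n1 - 1) + v2**2 / (n2 - 1))
    return t, df

# e.g. CEICS (95.3 +/- 0.41) vs Standard FL (92.8 +/- 0.49), 5 runs each
t, df = welch_t(95.3, 0.41, 5, 92.8, 0.49, 5)
print(round(t, 2), round(df, 1))
```

Welch's form is the right default here because the baselines have visibly different run-to-run variances, so pooling them (Student's t) would be unjustified.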
FL convergence trajectory — accuracy ±SD vs round
Convergence gap at Round 200 is non-significant (Welch t(7.2)=1.94, p=0.09), confirming statistical parity with centralised training while preserving complete data locality. Communication cost at convergence (486 MB) is 73% lower than equivalent raw sensor stream volume (1,800 MB).
| FL Round | CEICS Accuracy (%) ±SD | 95% CI | Centralised (%) | Gap (pp) | Comm. Cost (MB) |
|---|---|---|---|---|---|
| 20 | 79.2 ±1.82 | [77.0, 81.4] | 95.7 | 16.5 | 48.6 |
| 60 | 86.4 ±1.11 | [85.0, 87.8] | 95.7 | 9.3 | 145.8 |
| 100 | 93.4 ±0.71 | [92.5, 94.3] | 95.7 | 2.3 | 243.0 |
| 120 | 94.1 ±0.58 | [93.4, 94.8] | 95.7 | 1.6 | 291.6 |
| 160 | 94.9 ±0.44 | [94.4, 95.4] | 95.7 | 0.8 | 388.8 |
| 200 ★ | 95.3 ±0.41 | [94.8, 95.7] | 95.7 | 0.4 | 486.0 |
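The convergence rule (Δacc < 0.3 pp over 10 consecutive rounds) can be checked mechanically against a logged accuracy trajectory. A sketch with a synthetic, illustrative trajectory; `converged_round` is a hypothetical helper, not part of the pipeline:

```python
# Convergence test: stop at round t if |acc[t] - acc[t-10]| < 0.3 pp.

def converged_round(acc_by_round, window=10, eps=0.3):
    """acc_by_round: accuracy (%) logged per round.
    Returns the first round index satisfying the criterion, else None."""
    for t in range(window, len(acc_by_round)):
        if abs(acc_by_round[t] - acc_by_round[t - window]) < eps:
            return t
    return None

# illustrative trajectory: fast early gains, then an asymptotic plateau
acc = [79.0 + 16.0 * (1 - 0.97**t) for t in range(201)]
print(converged_round(acc))
```

Comparing against the value 10 rounds back, rather than the previous round, prevents a noisy plateau from triggering premature convergence on a single flat step.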
Detection performance — CEICS vs all baselines
One-way ANOVA: F(3,16)=847.3, p<0.001, η²=0.994. Model type explains 99.4% of observed variance. All Tukey HSD pairwise comparisons significant at p<0.0001 (Bonferroni-corrected α=0.000167). Test set n=1,627 windows.
| Model | Accuracy ±SD | Precision ±SD | Recall ±SD | F1 ±SD | AUC-ROC ±SD | Δ F1 vs CEICS | Cohen's d |
|---|---|---|---|---|---|---|---|
| Centralised Cloud | 88.4 ±0.61% | 87.1 ±0.72% | 86.9 ±0.68% | 87.0 ±0.69% | 0.941 ±0.005 | −7.7 pp*** | 16.8 |
| Edge-Only | 84.1 ±0.53% | 82.7 ±0.61% | 83.4 ±0.58% | 83.0 ±0.59% | 0.912 ±0.007 | −11.7 pp*** | 27.3 |
| Standard FL (no HE) | 92.8 ±0.49% | 91.9 ±0.53% | 92.2 ±0.51% | 92.0 ±0.52% | 0.961 ±0.004 | −2.7 pp*** | 6.1 |
| CEICS (proposed) | 95.3 ±0.41% | 94.9 ±0.44% | 94.6 ±0.43% | 94.7 ±0.43% | 0.981 ±0.003 | Reference | — |
Per-class detection performance ±SD
C4 Physiological Distress shows the highest inter-run variance (SD=0.71%) reflecting within-class heterogeneity — conflating hypoxia, cardiac irregularity, and acute stress — and sensitivity to non-IID partition assignment. C2 Seizure recall (99.1%) is statistically equivalent to the best specialized single-class detector (IIETA 2025: 99.0%, p=0.65).
| Class | Precision ±SD | Recall ±SD | F1 ±SD | Support (n) | Best prior recall | Δ pp |
|---|---|---|---|---|---|---|
| C0 Normal | 97.2 ±0.31% | 96.8 ±0.34% | 97.0 ±0.32% | 845 | — | — |
| C1 Fall/Assault | 95.1 ±0.48% | 94.3 ±0.52% | 94.7 ±0.49% | 228 | SHAR: 91.2% | +3.1*** |
| C2 Seizure | 98.7 ±0.39% | 99.1 ±0.41% | 98.9 ±0.39% | 130 | Masci 2025: 91.4% | +7.7*** |
| C3 Unauth. Exit | 95.8 ±0.43% | 96.5 ±0.46% | 96.2 ±0.44% | 276 | GPS base: 93.1% | +3.4*** |
| C4 Phys. Distress | 91.8 ±0.68% | 90.6 ±0.71% | 91.2 ±0.71% | 148 | MIT-BIH BR: 88.3% | +2.3** |
Privacy and security performance
CEICS achieved 78.4%±1.8% intrusion resistance across 1,000 simulated attacks (400 passive eavesdropping, 300 active MITM, 300 FL replay). The dominant vulnerability for the cloud baseline is data volume in transit — edge-local processing is the critical privacy control, not just encryption.
| Security metric | Centralised cloud | Standard FL | CEICS | Δ vs Cloud | Test statistic |
|---|---|---|---|---|---|
| Overall IRR ±SD | 22.1 ±2.3% | 61.4 ±2.1% | 78.4 ±1.8% | +56.3 pp*** | t(6.8)=44.7, d=28.2 |
| IRR: Passive eavesdrop | 11.4 ±1.9% | N/A | 97.2 ±0.8% | +85.8 pp*** | t(4.6)=89.4, d=46.1 |
| IRR: MITM 5G backhaul | 31.2 ±3.1% | N/A | 71.3 ±2.4% | +40.1 pp*** | t(7.2)=23.6, d=14.9 |
| IRR: FL replay attack | N/A | 43.4 ±2.8% | 63.7 ±3.1% | +20.3 pp*** (vs Standard FL) | t(7.9)=11.7, d=6.9 |
| HE overhead ±SD | N/A | N/A | 8.7 ±0.3% | — | Stable; pooled SD=0.31% |
| Blockchain TPS ±SD | N/A | N/A | 84 ±2.1 | — | 6.3× max alert rate |
| Audit completeness | ~71% | 0% | 100% | +29 pp | — |
| False alert rate ±SD | 4.8 ±0.41% | 3.6 ±0.38% | 2.1 ±0.29% | −2.7 pp*** | t(6.3)=12.4, d=7.6 |
C2 (Seizure) threshold sensitivity analysis
Post-hoc sensitivity analysis across τ ∈ {0.70, 0.75, 0.80, 0.85, 0.90, 0.92, 0.95} for the most safety-critical class. The deployed threshold τ=0.92 was selected as optimal: it maximises F1, satisfies the clinical recall floor of ≥0.93 derived from the Shoaib et al. (2021) pediatric emergency-alert literature, and yields a false alert rate of only 1.6%. The stricter τ=0.95 was rejected: recall falls by 1.7 pp (99.1% → 97.4%) in exchange for only a 0.4 pp reduction in false alerts, an unfavourable trade for the class where missed detections carry the highest clinical cost.
| τ (C2) | C2 Precision | C2 Recall | C2 F1 | False Alert Rate | Clinical assessment |
|---|---|---|---|---|---|
| 0.70 | 84.1% | 99.8% | 91.3% | 11.4% | Unacceptable — FAR exceeds 5% deployment limit |
| 0.75 | 89.3% | 99.6% | 94.2% | 7.6% | Borderline — FAR above 5% limit |
| 0.80 | 93.8% | 99.4% | 96.5% | 4.2% | Acceptable — FAR within limit, high recall |
| 0.85 | 96.1% | 99.3% | 97.7% | 2.8% | Good — balanced precision-recall |
| 0.90 | 97.9% | 99.2% | 98.5% | 1.9% | Very good — low FAR, high recall |
| 0.92 ★ | 98.7% | 99.1% | 98.9% | 1.6% | ★ Selected — optimal F1; recall ≥0.93 satisfied |
| 0.95 | 99.1% | 97.4% | 98.2% | 1.2% | Rejected — recall loss outweighs marginal FAR gain |
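The selection rule implied above, maximise F1 subject to the recall floor and the 5% FAR deployment limit, can be applied mechanically to the sweep results. A sketch; `select_threshold` is an illustrative helper:

```python
# Pick tau maximising F1 among operating points that satisfy both constraints:
# recall >= recall_floor and false-alert rate <= far_limit.

# (tau, precision, recall, F1, FAR) -- values from the C2 sweep table, in %
SWEEP = [
    (0.70, 84.1, 99.8, 91.3, 11.4),
    (0.75, 89.3, 99.6, 94.2, 7.6),
    (0.80, 93.8, 99.4, 96.5, 4.2),
    (0.85, 96.1, 99.3, 97.7, 2.8),
    (0.90, 97.9, 99.2, 98.5, 1.9),
    (0.92, 98.7, 99.1, 98.9, 1.6),
    (0.95, 99.1, 97.4, 98.2, 1.2),
]

def select_threshold(sweep, recall_floor=93.0, far_limit=5.0):
    feasible = [row for row in sweep
                if row[2] >= recall_floor and row[4] <= far_limit]
    return max(feasible, key=lambda row: row[3])[0]   # argmax F1

print(select_threshold(SWEEP))   # -> 0.92
```

Encoding the constraints explicitly makes the choice auditable: tightening the recall floor or the FAR limit reruns the same rule rather than relying on visual inspection of the table.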