Abstract

Voice communication is a significant vector for social engineering attacks and the spread of disinformation. Existing cloud-based countermeasures have substantial drawbacks, including high latency, dependence on network connectivity, and privacy risks, which make them unsuitable for real-time applications. This paper proposes a resource-efficient, modular keyword spotting model designed for autonomous operation on resource-constrained edge devices. The model transforms sequences of Mel-frequency cepstral coefficients (MFCCs) into compact string "fingerprints" using differentiated weighting of informative features, and then classifies them using the Levenshtein distance. Experimental validation on a Ukrainian-language command corpus demonstrated high performance: the F1-score reached 0.92 under ideal conditions and 0.78 at a signal-to-noise ratio of 5 dB. The proposed model offers a substantially better balance of accuracy, speed, and resource efficiency than baseline and classical counterparts, which confirms its suitability for building autonomous systems for proactive detection of auditory threats.
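
The final classification stage described above can be illustrated with a minimal sketch. It assumes that each utterance has already been converted to a fingerprint string and that one reference fingerprint is stored per keyword. The function names, the `templates` dictionary, and the `max_distance` rejection threshold are hypothetical and are not taken from the paper. The conversion from Mel-frequency cepstral coefficients to a fingerprint is not shown.

```python
from typing import Dict, Optional


def levenshtein(a: str, b: str) -> int:
    """Edit distance between two strings (insertions, deletions, substitutions)."""
    # Keep the shorter string in the inner loop so the row buffer stays small.
    if len(a) < len(b):
        a, b = b, a
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def classify(fingerprint: str,
             templates: Dict[str, str],
             max_distance: int) -> Optional[str]:
    """Return the label of the nearest template, or None if none is close enough.

    `templates` maps a keyword label to its reference fingerprint (assumed format).
    `max_distance` is an illustrative rejection threshold, not a value from the paper.
    """
    best_label, best_dist = None, None
    for label, template in templates.items():
        dist = levenshtein(fingerprint, template)
        if best_dist is None or dist < best_dist:
            best_label, best_dist = label, dist
    if best_dist is None or best_dist > max_distance:
        return None
    return best_label
```

Because the distance computation keeps only two rows of the dynamic-programming table, memory use grows with the length of the shorter string, which suits the resource-constrained setting the abstract targets.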

A Keyword Spotting Model for Countering Social Engineering and Disinformation

Дідус Андрій, Терейковський Ігор

National Technical University of Ukraine "Igor Sikorsky Kyiv Polytechnic Institute"

Abstract