
Hybrid Approach to Detect Prolonged Speech Segments
K B Drakshayini¹, Anusuya M A²

¹K B Drakshayini, Research Scholar, Visvesvaraya Technological University (VTU), Belgaum (Karnataka), India.
²Dr. Anusuya M A, Associate Professor, Department of Computer Science and Engineering, JSS Science and Technology University, Mysore (Karnataka), India.
Manuscript received on 29 March 2023 | Revised Manuscript received on 06 April 2023 | Manuscript Accepted on 15 April 2023 | Manuscript published on 30 April 2023 | PP: 77-87 | Volume-12 Issue-4, April 2023 | Retrieval Number: 100.1/ijeat.D41060412423 | DOI: 10.35940/ijeat.D4106.0412423

© The Authors. Blue Eyes Intelligence Engineering and Sciences Publication (BEIESP). This is an open access article under the CC-BY-NC-ND license (http://creativecommons.org/licenses/by-nc-nd/4.0/)

Abstract: Over recent decades, various methods have been introduced to automatically detect prolonged speech segments in stuttered speech signals. However, researchers have paid less attention to detecting prolongation disorder at the parametric level. This study proposes a hybrid approach that detects prolonged speech segments by combining several spectral parameters, and reports the corresponding recognition accuracies for the reconstructed speech signal. The paper covers the detection of prolonged segments using individual parameters, the combination of several spectral parameters, the validation of the prolongation detection system, and the extraction of MFCC features. It also evaluates the accuracies of baseline models on the reconstructed signals. The proposed methods are simulated and evaluated on a UCLASS-derived dataset, and the results are compared with existing work on prolongation detection at the parametric and word levels. The hybrid parameters yield a 92% recognition rate for the larger frame size of 200 ms when modelled with SVM. The results are also tabulated and discussed in terms of sensitivity, specificity, and accuracy in detecting prolonged segments. The study further examines the prolongation characteristics of vocalised and non-vocalised sounds at the phoneme level. A detection accuracy of 71% is observed for vocalised prolonged vowel phonemes compared with non-vocalised prolonged signals.
Objectives: The first objective of this work is to propose a hybrid algorithm for automatically detecting prolonged segments in speech signals with prolongation disorder. The second is to evaluate the performance of the obtained spectral parameters by applying several evaluation metrics and models to compute the recognition accuracy of the reconstructed signal. The work additionally highlights the importance of variable frame size and analyses the variations between vocalised and non-vocalised sounds.
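The sensitivity, specificity, and accuracy metrics named in the abstract can be illustrated with a minimal Python sketch. It assumes the standard confusion-matrix definitions applied to per-frame labels (True meaning "prolonged"). The abstract does not state how the authors compute or aggregate these metrics, so the function name `detection_metrics` and the per-frame labelling are hypothetical choices made for illustration only.

```python
def detection_metrics(predicted, actual):
    """Sensitivity, specificity and accuracy for binary frame labels.

    predicted, actual: equal-length sequences where a truthy value means
    the frame is labelled as prolonged. (Hypothetical helper, not the
    authors' implementation.)
    """
    if len(predicted) != len(actual):
        raise ValueError("label sequences must have the same length")

    pairs = [(bool(p), bool(a)) for p, a in zip(predicted, actual)]
    tp = sum(1 for p, a in pairs if p and a)          # prolonged, detected
    tn = sum(1 for p, a in pairs if not p and not a)  # normal, not flagged
    fp = sum(1 for p, a in pairs if p and not a)      # normal, flagged
    fn = sum(1 for p, a in pairs if not p and a)      # prolonged, missed

    # Guard against empty classes so the ratios are always defined.
    sensitivity = tp / (tp + fn) if (tp + fn) else 0.0
    specificity = tn / (tn + fp) if (tn + fp) else 0.0
    accuracy = (tp + tn) / len(pairs) if pairs else 0.0
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "accuracy": accuracy,
    }
```

Sensitivity here measures how many truly prolonged frames are detected, while specificity measures how many normal frames are correctly left unflagged; reporting both alongside accuracy guards against a detector that looks accurate only because one class dominates.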
Methods: The methods adopted to detect prolonged speech segments are discussed at two levels: preprocessing and modelling. At the preprocessing level, parameters are applied both individually and in hybrid form, combining the Centroid, Entropy, Energy, and ZCR parameters with the MFCC feature extraction method. A new method based on the specificity, sensitivity, and accuracy metrics is applied to validate the performance of the prolongation detection model. At the modelling level, the same parameters are assessed by applying evaluation metrics to clustering and classification models, namely K-means, FCM, and SVM. The performance of these methods is evaluated in terms of the prolonged-segment detection accuracy on speech signals reconstructed from vocalised and non-vocalised sounds. All these methods are discussed in detail in the following sections.
Findings: Hybridising spectral parameters to detect prolonged speech segments automatically is a significant finding of this work. The specificity, sensitivity, and accuracy metrics are also found to play a substantial role in designing and validating the prolongation detection model. Further experiments show that the hybrid parameters and verification metrics are better suited to vocalised and non-vocalised sounds when larger frame lengths are considered. SVM performs better than the other models under all of the above conditions.
Novelty: The literature survey shows that existing works detect prolongation using individual parameters or combinations of only a few parameters, and do not address applying or combining more than two parameters to detect prolonged speech segments. The novelty of this work lies in selecting and combining spectral parameters at the preprocessing stage to detect prolongation disorder. Spectral centroid and entropy are considered appropriate parameters alongside the ZCR and Energy parameters. Hybridising these parameters therefore results in a novel proposal for an automatic prolongation detection system. The novelty is further enhanced by applying the specificity, sensitivity, and accuracy metrics to build and evaluate the detection system for vocalised and non-vocalised prolonged sounds.
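As a rough illustration of the four frame-level parameters named in the Methods (Energy, ZCR, spectral Centroid, and spectral Entropy), the following Python sketch computes them for fixed-length frames using only the standard library. It uses common textbook definitions, which are an assumption: the abstract does not give the authors' exact formulas, windowing, frame overlap, or detection thresholds. All function names are hypothetical, and the naive DFT is chosen for self-containment rather than speed.

```python
import cmath
import math


def frame_signal(samples, frame_len):
    """Split samples into non-overlapping frames, dropping a short tail."""
    return [samples[i:i + frame_len]
            for i in range(0, len(samples) - frame_len + 1, frame_len)]


def short_time_energy(frame):
    """Mean squared amplitude of the frame."""
    return sum(x * x for x in frame) / len(frame)


def zero_crossing_rate(frame):
    """Fraction of adjacent sample pairs whose signs differ."""
    if len(frame) < 2:
        return 0.0
    crossings = sum(1 for a, b in zip(frame, frame[1:])
                    if (a >= 0) != (b >= 0))
    return crossings / (len(frame) - 1)


def magnitude_spectrum(frame):
    """Naive O(N^2) DFT magnitudes for bins 0..N/2 (no windowing)."""
    n = len(frame)
    return [abs(sum(x * cmath.exp(-2j * math.pi * k * t / n)
                    for t, x in enumerate(frame)))
            for k in range(n // 2 + 1)]


def spectral_centroid(mags, sample_rate, frame_len):
    """Magnitude-weighted mean frequency in Hz."""
    total = sum(mags)
    if total == 0:
        return 0.0
    bin_hz = sample_rate / frame_len
    return sum(k * bin_hz * m for k, m in enumerate(mags)) / total


def spectral_entropy(mags):
    """Shannon entropy (bits) of the normalised power spectrum."""
    power = [m * m for m in mags]
    total = sum(power)
    if total == 0:
        return 0.0
    return -sum((p / total) * math.log2(p / total) for p in power if p > 0)


def frame_parameters(samples, sample_rate, frame_ms):
    """Return one dict of the four parameters per frame."""
    frame_len = int(sample_rate * frame_ms / 1000)
    if frame_len < 1:
        raise ValueError("frame_ms is too short for this sample rate")
    rows = []
    for frame in frame_signal(samples, frame_len):
        mags = magnitude_spectrum(frame)
        rows.append({
            "energy": short_time_energy(frame),
            "zcr": zero_crossing_rate(frame),
            "centroid_hz": spectral_centroid(mags, sample_rate, frame_len),
            "entropy_bits": spectral_entropy(mags),
        })
    return rows
```

A call such as `frame_parameters(samples, 16000, 200)` would return one row per 200 ms frame, where the 16 kHz sample rate is an illustrative value not taken from the paper. One plausible way to use the rows, again an assumption rather than the authors' stated procedure, is to flag runs of consecutive frames whose parameters stay nearly constant, since a sustained sound changes little from frame to frame.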

Keywords: Prolongation, Centroid, Entropy, Specificity, Sensitivity, Autocorrelation, Frame Length, Threshold.
Scope of the Article: Expert Approaches