Naqvi, Syed Meesam Raza, Tahir, Muhammad Ali, Javed, Kamran, Khan, Hassan Aqeel, Raza, Ali and Saeed, Zubair (2024). Code-mixed street address recognition and accent adaptation for voice-activated navigation services. IEEE Access, 12, pp. 168393-168411.
Abstract
This study presents the development of a real-time, application-specific Automatic Speech Recognition (ASR) system for voice-activated navigation services. The system is designed to recognize Urdu-English code-mixed street addresses, which is challenging due to their complex nature and structure, especially in under-resourced languages such as Urdu. Two separate corpora are collected for ASR system development: a Unicode Urdu corpus of general Urdu recordings (around 61.82 hours by 144 speakers) and a Roman Urdu-English code-mixed address corpus (around 16.89 hours by 20 speakers). The Unicode Urdu data gives the acoustic models general language coverage, while the code-mixed street addresses provide code-mixing (code-switching) coverage. The hybrid ASR system employed in this study plays a crucial role in addressing the challenges of low-resource settings (only 16.89 hours of task-specific data), especially in the context of Urdu-English code-switching. The study compares various acoustic models; a mixed Time Delay Neural Network and Long Short-Term Memory (TDNN-LSTM) model performs best, with a Word Error Rate (WER), Character Error Rate (CER), and Sentence Error Rate (SER) of 4.02%, 0.8%, and 15.14%, respectively, on random street addresses. In addition to testing on street addresses, we performed accent-based and manual decoding tests on the developed ASR system. Results indicate the need to develop and deploy custom ASR systems for better accent adaptation and application-specific coverage. The developed ASR system is integrated into the TPL Maps (https://tplmaps.com/) mobile application. It is Pakistan’s first real-time Large Vocabulary Continuous Speech Recognition (LVCSR) system to provide Urdu-based voice-activated navigation services.
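The three metrics quoted in the abstract differ only in the unit being scored: words for WER, characters for CER, and whole utterances for SER. The sketch below shows the standard edit-distance definitions in Python. It is not the authors' scoring pipeline, which this record does not describe. The function names and the sample addresses are hypothetical, and counting spaces as characters in CER is an assumption.

```python
from typing import Dict, List, Sequence


def edit_distance(ref: Sequence, hyp: Sequence) -> int:
    """Levenshtein distance: substitutions, insertions, deletions each cost 1."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # match or substitution
        prev = curr
    return prev[-1]


def error_rates(refs: List[str], hyps: List[str]) -> Dict[str, float]:
    """Corpus-level WER, CER and SER in percent (standard definitions)."""
    word_errs = word_total = char_errs = char_total = sent_errs = 0
    for ref, hyp in zip(refs, hyps):
        ref_words, hyp_words = ref.split(), hyp.split()
        word_errs += edit_distance(ref_words, hyp_words)
        word_total += len(ref_words)
        # Assumption: spaces count as characters when scoring CER.
        char_errs += edit_distance(ref, hyp)
        char_total += len(ref)
        # A sentence is wrong if any word differs.
        sent_errs += int(ref_words != hyp_words)
    return {
        "WER": 100.0 * word_errs / word_total,
        "CER": 100.0 * char_errs / char_total,
        "SER": 100.0 * sent_errs / len(refs),
    }


# Hypothetical reference and recognized addresses, for illustration only.
refs = ["house 12 street 5 gulberg lahore", "block b model town"]
hyps = ["house 12 street 9 gulberg lahore", "block b model town"]
print(error_rates(refs, hyps))
```

A single wrong word makes the whole utterance count as an error, which is why SER is typically much higher than WER on the same test set, as in the figures reported above.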
| Publication DOI: | https://doi.org/10.1109/ACCESS.2024.3496617 | 
|---|---|
| Divisions: | College of Engineering & Physical Sciences > School of Computer Science and Digital Technologies > Applied AI & Robotics | 
| Additional Information: | Copyright © 2024 The Authors. This work is licensed under a Creative Commons Attribution 4.0 License. For more information, see https://creativecommons.org/licenses/by/4.0/ | 
| Uncontrolled Keywords: | Speech recognition, Hidden Markov models, Acoustics, Vocabulary, Speech coding, Real-time systems, Navigation, Long short term memory, Error analysis, Switches, Urdu-English code-mixing, Roman Urdu addresses, accent adaptation, Gaussian mixture models, voice-activated navigation, deep neural network, General Computer Science, General Materials Science, General Engineering |
| Publication ISSN: | 2169-3536 | 
| Last Modified: | 13 Oct 2025 16:20 | 
| Date Deposited: | 14 Nov 2024 12:54 | 
| Related URLs: | https://ieeexpl ... ument/10750818/ (Publisher URL), http://www.scop ... tnerID=8YFLogxK (Scopus URL) |
| PURE Output Type: | Article |
| Published Date: | 2024-11-22 | 
| Published Online Date: | 2024-11-12 | 
| Accepted Date: | 2024-11-08 | 
| Authors: | Naqvi, Syed Meesam Raza; Tahir, Muhammad Ali; Javed, Kamran; Khan, Hassan Aqeel (ORCID: 0000-0002-5501-160X); Raza, Ali; Saeed, Zubair |
Download
Version: Accepted Version
Access Restriction: Restricted to Repository staff only
License: Unspecified

Version: Published Version
License: Creative Commons Attribution