AUTOMATED SPEECH-TO-TEXT CONVERSION SYSTEMS IN BANGLA LANGUAGE: A SYSTEMATIC LITERATURE REVIEW

Authors

  • Aysha Akther, Computer Science and Engineering Discipline, Khulna University, Khulna 9208, Bangladesh
  • Rameswar Debnath, Computer Science and Engineering Discipline, Khulna University, Khulna 9208, Bangladesh

DOI:

https://doi.org/10.53808/KUS.2022.ICSTEM4IR.0107-se

Keywords:

Natural Language Processing, Speech-to-Text conversion, Bangla Speech Recognition, Human-computer interaction

Abstract

The 4th Industrial Revolution (4IR) is creating new ways of working and affecting all disciplines, industries, and economies. In the future, people will need to communicate seamlessly with machines and deal with enormous amounts of information. As speech is the most natural way for humans to communicate, research in Natural Language Processing (NLP) is growing over time. Speech-to-Text (STT) conversion is particularly important for making human-computer interaction effortless. Considerable research has been carried out to allow machines to interact naturally with humans in many languages, such as English, Spanish, and Japanese. Bangla is the primary language of Bangladesh and of the Indian state of West Bengal, and is spoken by over 250 million people worldwide. Speech processing in the Bangla language is still an open research field. This literature review studies recent advances in automated speech-to-text conversion for Bangla. We present a comprehensive comparative study of state-of-the-art Bangla speech-to-text conversion systems in terms of dataset size, feature extraction techniques, methodologies, toolkits, and accuracy. Furthermore, we discuss the challenges associated with Bangla speech processing research, the applications of automatic speech-to-text conversion in different fields for the Bangla language, and possible directions for future research.

Published

20-11-2022

How to Cite

[1] A. Akther and R. Debnath, “Automated Speech-to-Text Conversion Systems in Bangla Language: A Systematic Literature Review”, Khulna Univ. Stud., pp. 566–583, Nov. 2022.
