Automated Text-to-Audio Conversion for Visually Impaired People Using Optical Character Recognition

Abstract
This work extracts text from images and from documents such as Portable Document Format (PDF) and PowerPoint (PPT) files using Optical Character Recognition (OCR), then converts that text to speech and saves it as audio files. The audio files are organized in a dedicated folder so that they are easy to find and play back. The proposed tool accepts documents, PDFs, or PPT files as input and extracts the letters and numbers they contain, which also makes it useful for quick data entry from printed documents. Input images are processed by automated pattern recognition that locates and extracts the characters. The implementation is in Python. OCR is first validated on still images using a Python wrapper for Tesseract; the same solution is then applied to a live video feed from a smartphone, processed with OpenCV. The recognized text is converted to speech with Google Text-To-Speech (gTTS), so the system can read aloud the text it detects. By combining image processing, OCR, and text-to-speech, the system aims to make listening to written text easy and enjoyable.
Keywords: Google Text-To-Speech, OpenCV, Optical Character Recognition, Tesseract, Text-To-Audio Conversion.
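
The pipeline described in the abstract (OCR on an image, then speech synthesis, then storage in a dedicated folder) could be sketched roughly as follows. This is a minimal illustration and not the authors' implementation: it assumes the `pytesseract` wrapper (with Pillow) for Tesseract and the `gTTS` package, and the function names `ocr_with_tesseract`, `speak_with_gtts`, `clean_text`, and `image_to_audio` are hypothetical. The whitespace cleanup step is also an assumption, added because OCR output often contains stray line breaks.

```python
import re
from pathlib import Path


def ocr_with_tesseract(image_path):
    """OCR one image file (assumes Tesseract, pytesseract and Pillow are installed)."""
    from PIL import Image
    import pytesseract
    return pytesseract.image_to_string(Image.open(image_path))


def speak_with_gtts(text, mp3_path):
    """Synthesize English speech to an MP3 file (assumes gTTS and network access)."""
    from gtts import gTTS
    gTTS(text=text, lang="en").save(mp3_path)


def clean_text(raw):
    """Collapse the line breaks and repeated spaces left by OCR into single spaces."""
    return re.sub(r"\s+", " ", raw).strip()


def image_to_audio(image_path, out_dir, ocr=ocr_with_tesseract, tts=speak_with_gtts):
    """Run OCR on one image and save the spoken text in out_dir.

    Returns the path of the audio file, or None when no text was recognized.
    The ocr and tts callables are injectable so other engines can be swapped in.
    """
    text = clean_text(ocr(str(image_path)))
    if not text:
        return None  # nothing to read aloud
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)  # dedicated folder for the audio files
    mp3_path = out / (Path(image_path).stem + ".mp3")
    tts(text, str(mp3_path))
    return mp3_path
```

A call such as `image_to_audio("page1.png", "audio_output")` would, under these assumptions, write `audio_output/page1.mp3`. For the live smartphone feed mentioned in the abstract, frames captured with OpenCV could be passed through the same OCR step, although that part is not shown here.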

Author(s): SM Kamali, V Malathy, G Uma Devi, SK Deepa, M Anand, S Vaishnodevi, Ratchagaraja Dhairiyasamy*, Subhav Singh
Volume: 6 Issue: 2 Pages: 992-1008
DOI: https://doi.org/10.47857/irjms.2025.v06i02.03672