PDF books to audio books using Python in 5 simple steps
- Tejas Achar
- Apr 25, 2020
- 2 min read
Updated: May 17, 2020

This thought popped up one day while I was listening to a Harry Potter sample on Audible. I soon realized that not all the audio books on Audible are free; however, you can find many books in PDF format online for free.
Since Google's text-to-speech API is available, it is possible to write a simple Python script that converts the text of a PDF file to MP3 audio.
Let's see how we can do this in 5 simple steps.
Step 1:
Import all the required libraries. In this case just "gtts" and "PyPDF2".
import PyPDF2
from gtts import gTTS
Step 2:
Using open(), create a file object for the PDF file that needs to be converted. The file is opened in binary read mode ('rb').
pdf_File=open('Path To The PDF file', 'rb')
Step 3:
Read the file using PyPDF2.PdfFileReader and store the number of pages in the PDF file in a variable called "count". Also create an empty list, "TextList", to hold the text of each page.
pdf_Reader=PyPDF2.PdfFileReader(pdf_File)
count=pdf_Reader.numPages
TextList= []
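Steps 2 and 3 can also be wrapped in a small helper. This is only a sketch: the names page_count and count_pdf_pages are made up for illustration. It assumes that newer PyPDF2 releases (and the successor package pypdf) expose PdfReader and len(reader.pages) in place of PdfFileReader and numPages, and falls back to the older names used in this post when they are missing. The with block closes the file automatically.

```python
def page_count(reader):
    """Return the number of pages for either an old- or new-style reader."""
    pages = getattr(reader, "pages", None)
    if pages is not None:
        # Newer API: the reader exposes a list-like "pages" attribute.
        return len(pages)
    # Older API, as used in this post.
    return reader.numPages


def count_pdf_pages(pdf_path):
    """Open a PDF in binary mode and return how many pages it has."""
    import PyPDF2  # third-party; imported here so page_count stays importable

    # Assumption: prefer PdfReader when it exists, else use PdfFileReader.
    reader_cls = getattr(PyPDF2, "PdfReader", None) or PyPDF2.PdfFileReader
    with open(pdf_path, "rb") as pdf_file:
        return page_count(reader_cls(pdf_file))
```

Calling count_pdf_pages('Path To The PDF file') should give the same value as "count" above.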
Step 4:
Read the text from each page of the PDF file and append it to the list. Once all pages are read, join the list into a single string.
for i in range(count):
    try:
        page = pdf_Reader.getPage(i)
        text = page.extractText()
        print(text)
        TextList.append(text)
    except Exception:
        # Skip pages whose text cannot be extracted
        pass
TextString = " ".join(TextList)
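The extracted text often contains stray line breaks and runs of spaces, and some pages (scanned images, for example) may give no text at all. The sketch below is one way to tidy this up before the text-to-speech step. The helper names extract_page_text and pages_to_text are hypothetical, and the code assumes each page object offers either extract_text() (newer PyPDF2 and pypdf) or extractText() (the older name used above).

```python
import re


def extract_page_text(page):
    """Call whichever text-extraction method the page object provides."""
    extractor = getattr(page, "extract_text", None) or page.extractText
    return extractor() or ""


def pages_to_text(pages):
    """Join the text of all pages, skipping unreadable or empty ones."""
    chunks = []
    for number, page in enumerate(pages, start=1):
        try:
            raw = extract_page_text(page)
        except Exception as error:
            # Report the page instead of silently ignoring the failure.
            print(f"Skipping page {number}: {error}")
            continue
        # Collapse newlines, tabs and repeated spaces into single spaces.
        cleaned = re.sub(r"\s+", " ", raw).strip()
        if cleaned:
            chunks.append(cleaned)
    return " ".join(chunks)
```

If the result is an empty string, the PDF probably has no extractable text, and there is nothing to pass on to Step 5.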
Step 5:
Set the language to en (English), then initialize a gTTS object and pass it the "TextString" string. Note: the slow parameter takes a boolean value; if set to True, the speech will be slower.
language='en'
myobj=gTTS(text=TextString, lang=language, slow=False)
Finally, save the audio to a new file with the .mp3 file extension.
myobj.save("AudioBook.mp3")
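For a whole book, one very long conversion can take a while, and if it fails midway everything has to be redone. One possible workaround, shown here only as a sketch, is to split the text into smaller pieces and save one MP3 per piece. The helper names split_text and save_audio_parts, the 5000-character limit and the file name pattern are all assumptions for illustration; only gTTS(text=..., lang=..., slow=...) and .save() come from the steps above.

```python
def split_text(text, max_chars=5000):
    """Split text into pieces of at most max_chars, breaking on spaces.

    A single word longer than max_chars is kept whole.
    """
    pieces, current = [], ""
    for word in text.split():
        candidate = f"{current} {word}".strip()
        if len(candidate) > max_chars and current:
            pieces.append(current)
            current = word
        else:
            current = candidate
    if current:
        pieces.append(current)
    return pieces


def save_audio_parts(text, prefix="AudioBook", language="en", slow=False):
    """Convert each piece of text to its own MP3 and return the file names."""
    from gtts import gTTS  # third-party; needs an internet connection

    paths = []
    for index, piece in enumerate(split_text(text), start=1):
        path = f"{prefix}_{index:03d}.mp3"
        gTTS(text=piece, lang=language, slow=slow).save(path)
        paths.append(path)
    return paths
```

Calling save_audio_parts(TextString) would then produce AudioBook_001.mp3, AudioBook_002.mp3 and so on, which can be played in order.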
And there you go!
Note: The speed of conversion depends on the size of the PDF file.
Even though the voice is a bit robotic, it sure is easier to listen to someone reading a book out to you.
The full project is available on git; it also includes a GUI version built with the Kivy library and an API version of this code.
Share and follow for more simple tricks.