Below I describe how to take audio, either from local files or pulled from online videos, and output the speech as text files.
I currently keep a database of all the video essays and websites I come across,
and this speech-to-text step is part of that system.
To try this online with a video:
Otherwise, run it locally with Python (this needs the pytube and openai-whisper packages, plus ffmpeg installed on your system):
import os
import pytube as pt
import whisper
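# download the audio-only stream from the YouTube video
# (it is saved under an .mp3 name, but the container is usually mp4 or webm;
# Whisper decodes it through ffmpeg, so the extension does not matter)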
yt = pt.YouTube("VIDEO_ADDRESS_HERE")
stream = yt.streams.filter(only_audio=True).first()
print("Getting text for video: {} - {}".format(yt.author, yt.title))
tempFile = "audio_english.mp3"
stream.download(filename=tempFile)
# load the Whisper "base" model and transcribe the downloaded audio
model = whisper.load_model("base")
result = model.transcribe(tempFile)
# write temp text file
print("writing output")
outFile = "output.txt"
with open(outFile, 'w', encoding='utf-8') as f:
    f.write(result["text"])
# rename as final file
titleFile = "{} - {}.txt".format(yt.author, yt.title)
os.rename(outFile, titleFile)
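
One thing to watch out for with the final rename: video titles often contain characters such as / or ? that are not valid in file names, so os.rename can fail or write to an unexpected path. Below is a minimal sketch of one way to clean the name first. The helper safe_filename is a name I made up for illustration (it is not part of pytube or whisper), and the set of replaced characters is an assumption based on what Windows and Unix-like systems typically reject.

```python
import re


def safe_filename(author: str, title: str, ext: str = ".txt") -> str:
    """Build '<author> - <title><ext>' with filesystem-unsafe characters replaced.

    Hypothetical helper for illustration; adjust the character set to your platform.
    """
    name = "{} - {}".format(author, title)
    # Replace path separators, characters Windows rejects, and control characters.
    name = re.sub(r'[\\/:*?"<>|\x00-\x1f]', "_", name)
    # Collapse runs of whitespace and trim leading/trailing spaces and dots.
    name = re.sub(r"\s+", " ", name).strip(" .")
    # Fall back to a placeholder if nothing usable is left.
    return (name or "untitled") + ext


if __name__ == "__main__":
    print(safe_filename("Some Channel", "What is 1/2? A: Half"))
```

In the script above this would replace the plain format call, for example titleFile = safe_filename(yt.author, yt.title).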