HOWTO: quality video/audio to text transcription

In the following, I’m going to detail how to take audio, either from files or pulled from online videos, and output the speech as text files.

I’ve been building a database of all the video essays and websites I come across,
and this text solution is part of that system.

To try this online with a video:

Otherwise, run it locally via Python:

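import os
import re

import pytube as pt
import whisper

# download the audio-only stream from the YouTube video
# (usually an mp4/webm container rather than a real mp3; Whisper decodes it via ffmpeg either way)
yt = pt.YouTube("VIDEO_ADDRESS_HERE")
stream = yt.streams.filter(only_audio=True).first()
print(f"Getting text for video: {yt.author} - {yt.title}")

temp_file = "audio_english.mp3"
stream.download(filename=temp_file)

# transcribe with the "base" model (larger models are slower but more accurate)
model = whisper.load_model("base")
result = model.transcribe(temp_file)

# replace characters that are not allowed in file names (e.g. / \ : * ? " < > |)
safe_name = re.sub(r'[\\/:*?"<>|]', "_", f"{yt.author} - {yt.title}")

# write the transcript straight to its final name; utf-8 so non-ASCII text survives on Windows
print("writing output")
title_file = f"{safe_name}.txt"
with open(title_file, "w", encoding="utf-8") as f:
    f.write(result["text"])

# delete the temporary audio download
os.remove(temp_file)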

Interesting. Is there something like this for non-YouTube vids?

Whisper itself has no YouTube dependency,
so just swap out the pytube library for another downloader, or give it an audio file directly.
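Taking that advice literally, the pytube step can be dropped and local files fed to Whisper directly. A minimal sketch, assuming only that Whisper's `transcribe()` returns a dict with a `"text"` key; `find_media`, `transcribe_folder` and the suffix list are hypothetical names and choices, and the transcriber is passed in as a parameter so any model size can be used.

```python
from pathlib import Path
from typing import Callable

# Extensions treated as media input (an assumption; extend as needed).
# Whisper decodes through ffmpeg, so most common containers work.
MEDIA_SUFFIXES = {".mp3", ".m4a", ".wav", ".flac", ".ogg", ".webm", ".mp4", ".mkv"}


def find_media(folder: str) -> list[Path]:
    """Return media files directly inside `folder` (not recursive), sorted by name."""
    return sorted(
        p for p in Path(folder).iterdir()
        if p.is_file() and p.suffix.lower() in MEDIA_SUFFIXES
    )


def transcribe_folder(folder: str, transcribe: Callable[[str], dict]) -> list[Path]:
    """Write <name>.txt next to each media file and return the written paths.

    `transcribe` is any callable taking a file path and returning a dict with
    a "text" key, e.g. whisper.load_model("base").transcribe.
    Note: files sharing a stem (talk.mp3, talk.mp4) overwrite each other's .txt.
    """
    written = []
    for media in find_media(folder):  # list is built before any .txt is written
        result = transcribe(str(media))
        out = media.with_suffix(".txt")
        out.write_text(result["text"].strip() + "\n", encoding="utf-8")
        written.append(out)
    return written
```

With Whisper installed, usage would be `transcribe_folder("recordings", whisper.load_model("base").transcribe)`.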

This is pretty sweet. It led me to Whisper. I just wish there were an easier way to churn the output into a subtitle format.

Have you tried this with the subtitle flags?

Jesus, I was using Whisper to transcribe and generate a .srt, then used MKVToolNix to throw it all together. How in the world did I not know that was there?

At least I learned how to use Whisper.
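Besides the Whisper command-line tool's own `--output_format srt` option, the dict returned by `model.transcribe()` carries a `"segments"` list whose entries include `start` and `end` (in seconds) and `text`, so an .srt can also be built from Python. A minimal sketch, assuming that segment layout; `srt_timestamp` and `segments_to_srt` are hypothetical helper names.

```python
def srt_timestamp(seconds: float) -> str:
    """Format a time in seconds as an SRT timestamp, HH:MM:SS,mmm."""
    total_ms = round(seconds * 1000)
    hours, rem = divmod(total_ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    secs, ms = divmod(rem, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{ms:03d}"


def segments_to_srt(segments) -> str:
    """Build SRT text from Whisper-style segments (dicts with start/end/text)."""
    blocks = []
    for index, seg in enumerate(segments, start=1):
        start = srt_timestamp(seg["start"])
        end = srt_timestamp(seg["end"])
        # each cue: number, time range, text; cues are separated by a blank line
        blocks.append(f"{index}\n{start} --> {end}\n{seg['text'].strip()}\n")
    return "\n".join(blocks)
```

The result can be written next to the audio, e.g. `open("talk.srt", "w", encoding="utf-8").write(segments_to_srt(result["segments"]))`, and then muxed in with MKVToolNix as before.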
