HOWTO quality video/audio to text translation

Iron_Bound · January 13, 2023, 1:22pm

Below I’ll detail how to take audio, either from local files or pulled from online videos, and output the speech as text files.

I’ve been keeping a database of all the video essays and websites I see,
and this speech-to-text step is part of that system.

To try this online with a video:

Otherwise, run it locally with Python:

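import os
import re

import pytube as pt
import whisper

# Download the audio-only stream from a YouTube video.
# Note: the stream is usually MP4/WebM audio rather than real MP3, despite
# the file name below; Whisper decodes it through ffmpeg either way.
yt = pt.YouTube("VIDEO_ADDRESS_HERE")
stream = yt.streams.filter(only_audio=True).first()
print("Getting text for video: {} - {}".format(yt.author, yt.title))

tempFile = "audio_english.mp3"
stream.download(filename=tempFile)

# transcribe the audio with the "base" Whisper model
model = whisper.load_model("base")
result = model.transcribe(tempFile)

# write temp text file
print("writing output")
outFile = "output.txt"
with open(outFile, "w", encoding="utf-8") as f:
    f.write(result["text"])

# rename as final file, replacing characters that are not valid in file names
titleFile = "{} - {}.txt".format(yt.author, yt.title)
titleFile = re.sub(r'[\\/:*?"<>|]', "_", titleFile)
os.replace(outFile, titleFile)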

wwed26 · January 13, 2023, 1:49pm

Interesting. Is there something like this for non-YouTube vids?

Iron_Bound · January 13, 2023, 1:52pm

Whisper doesn’t depend on YouTube or pytube here,
so just swap out the pytube library for something else, or give it an audio file directly.
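
As a rough sketch of the "give it an audio file directly" route: the helper names transcript_path and transcribe_file, and the example file name, are made up for illustration. Only whisper.load_model and model.transcribe come from the script above. It assumes ffmpeg is installed so Whisper can decode the file.

```python
from pathlib import Path


def transcript_path(audio_path):
    """Return the .txt path that sits next to the given audio/video file."""
    return Path(audio_path).with_suffix(".txt")


def transcribe_file(audio_path, model_name="base"):
    """Transcribe a local file with Whisper and save the text beside it."""
    # Imported here so transcript_path() works even without whisper installed.
    import whisper

    model = whisper.load_model(model_name)
    result = model.transcribe(str(audio_path))

    out_path = transcript_path(audio_path)
    out_path.write_text(result["text"], encoding="utf-8")
    return out_path


# Example call (hypothetical file name):
# transcribe_file("lecture.mp4")
```

Anything ffmpeg can read should work as input, so a video file can usually be passed as-is without extracting the audio first.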

wwed26 · January 30, 2023, 12:32am

This is pretty sweet. Led me to Whisper. I just wish there was an easier way to churn the output out in a subtitle format.

Iron_Bound · January 30, 2023, 10:20pm

Have you tried this with the subtitle flags?
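
If you'd rather stay inside the Python script, here is a minimal sketch of turning Whisper's output into .srt text. It relies on result["segments"] from model.transcribe, where each segment carries "start", "end" (in seconds) and "text". The function names srt_timestamp and segments_to_srt are made up for this example. Recent versions of the whisper command-line tool can also write subtitles themselves via --output_format srt, which may be simpler if you don't need the script.

```python
def srt_timestamp(seconds):
    """Format a time in seconds as an SRT timestamp: HH:MM:SS,mmm."""
    millis = int(round(seconds * 1000))
    hours, millis = divmod(millis, 3_600_000)
    minutes, millis = divmod(millis, 60_000)
    secs, millis = divmod(millis, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{millis:03d}"


def segments_to_srt(segments):
    """Build SRT text from a list of dicts with start, end and text keys."""
    blocks = []
    for index, seg in enumerate(segments, start=1):
        start = srt_timestamp(seg["start"])
        end = srt_timestamp(seg["end"])
        # Each cue: counter, time range, text, then a blank separator line.
        blocks.append(f"{index}\n{start} --> {end}\n{seg['text'].strip()}\n")
    return "\n".join(blocks)


# With the `result` from the script above:
# with open("output.srt", "w", encoding="utf-8") as f:
#     f.write(segments_to_srt(result["segments"]))
```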

wwed26 · February 26, 2023, 2:41am

Jesus, I was using Whisper to transcribe and generate a .srt, then used MKVToolNix to throw it all together. How in the world did I not know that was there?

At least I learned how to use Whisper.