Joe Rogan Speech Synthesis (TTS)

Continuing the discussion from News Story Dump Thread - Stories Only, because discussion is not permitted there.

Some researchers/engineers at Dessa made a Text To Speech (TTS) engine which uses the voice of Joe Rogan. Articles about it mainly focus on the quality of the voice, but systems like Tacotron 2 were already capable of very convincing TTS voices before this.

Is the novelty here that the person being modeled did no additional recording specifically for the TTS engine?
I haven’t seen any proof of this, but his initial reactions definitely sound like he was not involved with the team who did this.

Site with real/synthesized comparison tests:

Announcement article:


It sounds like he is reading ads. In my head the next step would be for it to learn how to fuck it up: difficult words would come out stretched and not quite right, and it would use wrong terms, which it then corrects.

OR just have it yell a few words; it could learn that from UFC, I guess.

He's talked about that thing so many times that it was just a matter of time :smirk:

I read about this somewhere else too, but all the articles go with how accurately it sounds like him, and I just don’t think it does. It is his voice, but there is no animation in it: none of the sudden raises in voice he has, or the errors like @anon25377527 said. This is like a super clean version of him that is incredibly relaxed and rehearsed, which is not him normally.

Edit: listening to it again I remembered what I was thinking: a lot of the sentences end with the same inflection, down. The pauses between sentences don’t feel natural. It is amazing for sure, but either it is not good enough yet or they picked the wrong target. A slower, more level personality might have been more convincing.

“Reading ads with a tiny bit too much compression” is how I’d describe it if I got that from a friend.


Combine this with deepfake video and a concerted, organized effort to provide video from different angles of a major event taking place (political rally?) and it will give a whole new meaning to the term “Fake News”.

Who’s to say which version of events is accurate?

We are headed for some interesting times…


I am actually really excited for that eventuality; I have been thinking about it since the deepfakes appeared and looked almost right. Both of these are now so close to being correct that it will be pretty amazing when it happens.

As for the way I am excited though? It will force a change in thinking in the viewer and a level of forethought for the person being impersonated, and information will have to be verifiable in a way we just don’t have now. It is sad to think that we will literally have to deceive people on a scale never known before to get some sort of awareness and accuracy that has been needed, and eroded, for a long time now.

Or that could all be wishful thinking and this will just make everything worse while people run around hair-on-fire style, not doing anything about it but screaming that the end is nigh.


2020 will be severely messed up because of all this.

Let’s hope it’s used against both political parties.

Give everyone dmt.

Half the people will be scarred for life, a quarter will have a great time, and the people that are left are pussies that Joe can eat to death or something.

That is impressive! Give that thing a bit more text to read and put minimal effort into editing and you have something that is actually believable.

I would look at the samples on the website instead; I mis-categorized a few of them despite being fairly confident from watching the video.

Rewatching the video, the Rogan-synth of “She sells seashells…” was noticeably worse than Tacotron 2.


Will check them out for sure. I had seen the video before I was on here and was not aware there were others yet.