09-15-2023 05:19 AM - edited 09-15-2023 05:24 AM
Earlier in the year I was invited to co-host one of our "Platform Academy" episodes: number 30, to be exact.
If you've not seen it, check it out; it's all about Localization in the platform. The purpose of the episode was to cover the what, the why, the how, and the various tools and capabilities we have in the platform to help those of you who want or need to provide your instance in other languages.
The Mission
After the episode was recorded, I thought it might be neat to have it translated into some other languages. The first challenge was to decide which languages. After a bit of discussion in the team, we decided to settle on the following:
- 🇫🇷 French
- 🇨🇦 French Canadian (Quebecois)
- 🇩🇪 German
- 🇮🇹 Italian
- 🇯🇵 Japanese
- 🇧🇷 Brazilian Portuguese
- 🇪🇸 Spanish
Whilst this isn't the full list of UI languages we offer, these seemed the most sensible ones to start with. The main purpose of this exercise was actually this blog post, so that I can walk you through the journey and help you understand the complexities, the subtle nuances, and the way the Localization world works.
The Scope
Now that we knew which languages we wanted to offer, the next step was to figure out how deep down this rabbit-hole we wanted to go. Sure, with an endless budget we could have had every slide in the video translated and re-transcoded the recording per language, but in reality that's quite unrealistic in terms of outcome versus effort.
Quickly we determined that subtitles were a must. So, on the original English video you will see we've enabled subtitles for each language. Each of these was professionally translated from a human transcription of what was said; we didn't leave it to chance with YouTube's auto-transcription feature. That said, this approach did present some interesting challenges.
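For context, subtitles like these are typically uploaded to YouTube as timed-text files such as WebVTT, where each cue pairs a time range with the translated line. A minimal illustrative cue might look like this (the timestamp and text are made up, not from the actual episode):

```
WEBVTT

00:00:01.000 --> 00:00:04.500
Willkommen bei der Platform Academy, Folge 30.
```

Every cue in every language file has to line up with the same timecodes as the original speech, which is part of why a clean human transcription matters so much as the starting point.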
The second decision concerned viewers whose English wasn't strong: we felt it was asking too much of them to keep track of subtitles in their language whilst also following what was happening on screen. We therefore decided to dub the audio as well! For this we tried three approaches before settling on our final decision.
The Challenges
So, the challenges. Well, I won't sugar-coat it: there were a few. The first was that the transcription comparison ended up being more complex than originally envisaged, because some of the languages (like Japanese) needed very specific context checks. We did end up leveraging YouTube's auto-transcribe feature as a double-check. On the whole it did a very good job of picking up the words and sounds that were said, but background noise occasionally tripped it up, and it didn't always choose the correct word, sometimes losing context due to similar-sounding words.
In a lot of automated solutions it's these auto-transcribed subtitles that are then MT'd (Machine Translated), which is what leads to some of the rather funny mistakes in the subtitle translations YouTube produces.
The second issue we faced was how we wanted to present the dubbing. At the time we started this, YouTube had just announced a new feature allowing additional audio streams on a video (which is still in Beta as I write this). Unfortunately, we can't leverage it just yet, so we will revisit it in the future and bring the audio streams into the original video. In the meantime, this meant we needed to post an additional video per language.
The last challenge was around how we would generate the audio. Here we had two main options (and we tried them both):
- Leverage an AI engine (because there are tools out there) to AI dub each language's translations.
- Human voice actors.
As I thought it would be fun to see, we tried the AI route first, testing two different early-access tools:
- one was a completely AI-generated voice-model tool focused specifically on voice-over translations;
- the other was an AI model based on human recordings (much like how Siri is built on recorded human speech).
Both approaches have their pros and cons. The latter (using human sound recordings) attempts to solve the problem of inflections and to make the output more emotive where it can; in reality, though, we found both sounded a bit too un-human (or disconnected) for our liking. The second tool also took into account Lisa's voice at the beginning as well as mine, meaning it offered both a female and a male voice in the same video. Personally, whilst this technology is currently in its infancy, I believe you will see a lot more of it.
At an industry event I attended recently, I even saw a further take on this concept where they can now even deep-fake the face of the individual for the lip movements in that language, almost in real-time, which is some truly amazing stuff when you see it happen.
For us the whole purpose of this exercise was to have an output that connects to our intended audience as best as it possibly can, whilst also helping us tell the story (this story 🙂).
The Voices
So: we knew the languages, we knew we wanted subtitles, and we knew we wanted voices for each language in scope. This meant we needed to work with our partners to find suitable "voice actors" to meet our needs.
To be fair, any LSP (Language Service Provider) should be able to help you with this exercise, but I specifically asked for options per language. The super-nuanced part was that we needed two voices per language, and we needed to review the translations so that the voice actors weren't reading out something that was essentially gibberish. This meant the translations had to follow our "Glossary": whenever we talk about a feature in the platform, the voice actor had to use the same term the platform itself uses in that language.
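To give a flavour of what enforcing a glossary can involve, here's a minimal sketch of an automated consistency check (the term pairs below are hypothetical examples, not our actual glossary): for each English platform term found in a source line, confirm the approved target-language term appears in the translation.

```python
# Minimal glossary consistency check (illustrative terms only, not a real glossary).
# Maps an English platform term to its approved German UI term.
GLOSSARY_DE = {
    "incident": "Incident",
    "knowledge base": "Wissensdatenbank",
}

def check_line(source: str, translation: str, glossary: dict) -> list:
    """Return English terms whose approved translation is missing from the line."""
    issues = []
    for en_term, target_term in glossary.items():
        # Case-insensitive containment is a deliberately naive heuristic;
        # real checks would handle inflection, word boundaries, etc.
        if en_term in source.lower() and target_term.lower() not in translation.lower():
            issues.append(en_term)
    return issues
```

A check like this only flags candidates for a human reviewer; it can't judge whether the sentence as a whole makes sense, which is exactly why native-speaker review still mattered for us.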
This last part required quite a bit of to-ing and fro-ing to ensure things made sense. For a language like Japanese, I had a member of my team (a Japanese native) review both the translations and the voice-overs so that we could do our very best to ensure it made sense.
In reality, we have to accept that, with the subtlety of languages, it's nearly impossible to be 100% perfect 100% of the time. Instead, the best objectives to focus on are whether it's usable (first and foremost) and whether it makes sense (i.e. the high-level context and original meaning).
The Outputs
So, after a few months and numerous meetings and reviews, here are the outputs. As I write this, Lisa said to me "this is just so wild" (in reference to hearing her voice dubbed into other languages as well as her native one); I'll let you be the judge of that 🙂. If there's a version below that you can understand, I urge you to check it out and see if you feel it makes sense. And if you have feedback, please post a comment below, as I'm keen to keep this an open topic so that anyone can see the complexities of Localizing in general.
I fully expect some people to say a particular sentence isn't ideal, whilst on the flip side someone else might say it's fine. That in itself highlights the challenge: it can be subjective as to what is considered a good-quality translation versus what is merely usable.
🇫🇷 French
🇨🇦 French Canadian (Quebecois)
🇩🇪 German
🇮🇹 Italian
🇯🇵 Japanese
🇧🇷 Brazilian Portuguese
🇪🇸 Spanish
Summary
The question I'm sure you're wondering is, "did we have to do this?" Well, no, of course not. However, now that we have, there's a chance that more people in our ecosystem and community can understand the content Lisa and I talked about in the episode, whether because they're early in their SN journey and want to learn more, or because they're a business (non-technical) person curious about the topic.
Will it ever be perfect? No, not likely: some of the voice-overs needed multiple takes as we decided on changes to the translations, or the speed of the VO at a particular moment needed adjusting, or an inflection wasn't quite right.
Hopefully this shows you that it's all doable: if you need to provide translated content, it can be done. With a bit of planning, a few decisions, and some time to review the output (if it's high-touch content), it can be pretty good. Even more so if it serves its purpose of helping people.
As always, please like, share and subscribe if you found this useful as it always helps.
I was honored to co-host episode 30 of our "Platform Academy," focusing on Localization. We delved into the what, why, and how of localization, showcasing tools and platform capabilities for those seeking to offer their instances in multiple languages. Check it out for insightful guidance.
This is great content. Thank you very much Alex!! @Alex Coope - SN