Human-in-the-loop: who are the humans, and why are they in the loop?

Humans have long been obsessed with the idea of creating machines capable of speaking ‘human’. Some medieval alchemists supposedly possessed ‘brazen heads’, which were machines given life and gifted with human speech. In the 1950s, physicists created the first computer-based speech synthesis systems, and the first text-to-speech systems were developed in the following decade. Computer-synthesized speech has been in our lives via personal computers and devices since the 2010s, but it’s only in recent years that the wizardry of AI dubbing has truly become a reality.

The thing about magic, though, is that it typically requires a magician, and here at Papercup, we employ a whole host of magicians to help corral and refine our speech technologies into producing AI dubbing magic. They are our humans-in-the-loop!

What is the loop?

‘The Loop’ is the network of automated voice technologies deployed by Papercup to provide its AI dubbing services. Text-to-speech, speech-to-speech, and hybrid text-to-speech and speech-to-speech technologies are used at various points in our workflows depending on the type and style of the content we’re dubbing. Whether documentary/reality content or scripted dramas, UN-style voice-over or lip-synced dubbing, each video requires its own combination of technologies to dub it into a new language. On their own, these technologies can achieve surprising results, and they are improving by the day. However, humans are still integral to ensuring that the dubbing doesn’t sound robotic and to achieving truly lifelike voices that capture the nuances and cultural particularities of human speech.

Who are the humans?

Utilizing AI technology in the workflow allows Papercup to dub its clients’ content at a speed and scale unachievable by traditional dubbing. Combining the technology’s efficiency with our human-in-the-loop model ensures that we do not compromise on the quality of voices or localization.

Much like dubbing studios or localization service providers offering dubbing services, Papercup has a large community of talented people undertaking all of the same roles expected in a traditional dubbing environment. These people are involved at various stages of the workflow and to varying degrees depending on the level of service required by the client.

The role of Papercup localization specialists

Our localization specialists are experienced audiovisual translators and linguists who use Papercup’s text-to-speech technology to provide high-quality localization of client videos in multiple languages.

Localization specialists use machine transcription and machine translations created by large language models to create a translated dubbing script that Papercup’s catalogue of proprietary voices can then voice.

What is the point of AI if localization specialists are still essential to the process?

Much of the AI in use today requires a human to be the arbiter of quality, which is why AI is best placed to augment processes rather than automate them. In the AI dubbing process, machine learning is used to increase the speed of the translation and narration, freeing translators to focus on creative script adaptation, and voice actors to enhance expressivity where it’s required.

Don’t humans slow down the process?

With the efficiencies gained from streamlining the translation process, our localization specialists can spend their time editing the translations to convey cultural nuances – a job usually carried out in the script adaptation stage of traditional dubbing. They then use text-based commands within Papercup’s studio platform to manipulate the synthetic voices’ pronunciation, cadence and tone to achieve a fully localized and dubbed video.

The role of Papercup voice actors

When the content and style of dubbing require it, Papercup works with talented voice actors with a wealth of experience in traditional voice acting to enhance the expressivity of its synthetic voices.

We work with voice actors in two distinct ways. We commission them to record bespoke scripts and use that data to produce world-class realistic AI voices for specific use cases. They’re also involved in our production workflows. For highly expressive video content, where the level of emotion, pitch, and tone cannot be achieved through text-to-speech alone, Papercup’s voice actors work from their home studios to help identify areas where the prosody could be improved to match the source language dialogue and record an improved version in the target language.

Voice conversion models are then used to map the voice actor’s performance onto a specific segment of dialogue and improve the quality and expressivity of the synthetic voice. At Papercup, we only need one voice actor to do the voice work for each video because controls within the studio allow voice actors to adapt their voices to sound like any gender, age or style of speaker. This saves us, and our clients, hours of casting and studio time. This approach is used to varying degrees depending on the level of expressivity the video demands.

The role of Papercup video editors and audio engineers

The final stages of the workflow are handled by Papercup’s skilled post-production team, which consists of video editors who localize video graphics, embed subtitles, and create different video cuts, and audio engineers who clean, edit, and mix the audio files to our clients’ unique specifications.

Automated video and audio outputs are always available; however, our talented video editors and audio engineers allow us to offer the same level of post-production services as a traditional dubbing provider and the tailored solution that our clients, especially those distributing on streaming and broadcast platforms, inevitably demand.

What the localization operations team is and what it does

These humans-in-the-loop are managed by Papercup’s in-house localization operations team, home to people with a wealth of industry experience in localization project management. Much of the team has worked in traditional dubbing environments for some of the world’s biggest streaming and broadcast clients.

The localization operations team ensures that client videos undergo a workflow crafted to their unique content. This involves pulling in the right people with the right experience to deliver the highest-quality dubbing in specific formats to a specific schedule.

Why the Papercup human-in-the-loop model makes us the best solution for your content

Traditional dubbing offers the highest-quality dubbing

Traditional dubbing undeniably achieves high-quality results, but it can often be cumbersome and time-consuming: scripts have to be manually translated, multiple voice actors have to be cast, studio time has to be booked, and files have to be sent to post-production houses to be edited and mixed, not to mention the retakes!

Fully automated services offer rapid dubbing

Fully automated dubbing services do everything at speed, without human involvement. This works for time-sensitive content such as news, or for content such as training videos that doesn’t need to convey complex emotions like disappointment or sarcasm.

AI dubbing, perfected by humans, offers the best of both worlds

The AI automation that forms the bedrock of Papercup’s studio improves efficiency at every stage of the dubbing process. Every stage is managed in one place, the Papercup studio, which vastly simplifies the workflow.

Papercup’s community of expert humans-in-the-loop guarantees quality translations, expressive voices and professional post-production services. And the best part? Our team’s work and feedback are looped back in and used by Papercup’s machine learning and engineering teams to improve the baseline quality of our automated workflows and synthetic voices. Magic!

Get in touch to see how AI dubbing can transform your localization.