Limecraft Flow – using AI for automatic Subtitling and Localisation

Please welcome Maarten Verwaest.
He’ll talk about harnessing A.I. for automatic subtitling. Take it away.
Thank you, good morning. So we’re going to talk for a couple
of minutes on the specific subject of subtitling. Subtitling in the same
language and subtitling to other languages, the latter being an
issue in Europe because we have 27 official languages. The European
Commission does all it can to stimulate the proliferation of
European works throughout Europe and beyond. Limecraft was incorporated in
2010 as an asset management offering. Our customers are producers,
distributors and broadcasters. Initially, we focused on processing raw material, automating the
workflow into post-production. Eventually, we added subtitling
to the workflow, because we wanted to offer a complete
workflow solution, from the camera to the distribution. With Limecraft Flow,
a browser-based application, and the ingest software, which runs locally, we can support all sorts of complex
workflows, including TV series, documentary, entertainment and
e-learning applications, as well as simpler workflows with a couple
of cameras and recording management in the field. We also automate all the
steps to and from post-production, and eventually we take care of
the dispatching to VOD services. Customers include VRT,
France Télévisions, BBC, YLE and other broadcast stations, as well
as post facilities and independent producers. But let’s talk for a couple
of minutes on the subject of A.I. One of the engines we’re
using is obviously Microsoft VideoIndexer. We all know it is kind
of fancy to include A.I. in your offering, but at the end of the day it’s
just a tool, an instrument. So we asked ourselves these questions, on
behalf of our customers. How do we leverage the intelligence? What to
do with it? How do we cope with the massive amount of metadata that is
generated? How do we make it usable? And because we’re talking
about subtitling, let’s explore the areas of transcription
and the creation of subtitles. Afterwards, we’ll have a short demo. In total, we support
over 100 languages. That’s because we’re running
different speech-to-text engines in the background. The most tangible application,
the lowest hanging fruit, was to index content
for archive operators that need to check in hundreds
of hours of material per day. Speech transcription accelerates
that process and makes such content searchable. That’s the easy use case. It is what most asset management
vendors have in their offering. The speech to text on its own
can then be extended with auto tagging or auto categorisation,
which is a use case on its own. But of course we went the extra
mile. Here is another interesting use case. In a fiction context, we listen to the actors’ voices
and compare the transcript with the screenplay, to match the
shots with the screenplay. So we automatically prepare the
rough cuts for the editor. So now we’re about to look at
the more interesting cases. A third application uses speech
transcription for the purpose of documentary making. Documentary makers would typically
hire students or interns to transcribe content using a
Word document, print it out, and do the paper edit by using
a color marker. They would highlight the quotes that make up a storyboard and that’s how
they prepare the edit. So we just do the same, only now it happens in a browser. And then in 2015, we had the European
Accessibility Act. In the United States, there is the
Americans with Disabilities Act. Overnight, it became mandatory to
use closed captions on your content. So a few broadcasters out there
asked us, given that we had mastered the speech-to-text part, why we didn’t
come up with a piece of software that takes the transcript, takes
into account all the spotting rules, and cuts the transcript into
properly timed subtitles. From an engineering point of
view, it was an interesting challenge. It was much more complex than we thought. BBC and NPO in the
Netherlands were the lead customers. Bear in mind that in
Europe we’ve been using subtitles since the 70s and that
quality standards are really high. So whereas speech technology has been on the market for 30 years and was
quite easy to implement, software that cuts a transcript into subtitles
simply didn’t exist. The BBC style guide, just to give you
an idea, is a document of over 100 pages specifying all the timing rules
for subtitles. Minimum 2.5 seconds on screen, maximum five seconds.
Unless you’re close to a scene change, because subtitles should be
timed along the rhythm of the edit. For the BBC, we have to align exactly
on the shot cut, whereas for Netflix we have to stay three frames
away from the shot cut. So when our customers are dispatching BBC content
for Netflix distribution abroad, we’ll modify the template
accordingly. All that intelligence is now available in software. So when you start from the
transcription, just hit automatic subtitling and that’s it. Let’s
see if we can have a short demo. Where is the pointer…
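Before the demo starts, a quick aside: the spotting rules just described, the 2.5 to 5 second on-screen window plus the shot-cut alignment, can be sketched roughly as follows. This is an illustrative sketch only; the function name, the profile names, the 0.5 second "close to a cut" window and the 25 fps frame rate are my assumptions, not Limecraft's actual rule engine.

```python
# Illustrative sketch of the spotting rules from the talk. The figures
# (2.5-5 s on screen, three frames for Netflix) come from the
# presentation; everything else here is assumed for the example.

FPS = 25  # assumed PAL frame rate
FRAME = 1.0 / FPS

def apply_spotting_rules(start, end, shot_cuts, profile="bbc"):
    """Adjust one subtitle's in/out times (in seconds) to a house style."""
    # Clamp on-screen time: minimum 2.5 seconds, maximum five seconds.
    end = start + max(2.5, min(5.0, end - start))

    # Subtitles should follow the rhythm of the edit near scene changes.
    for cut in shot_cuts:
        if abs(end - cut) < 0.5:  # assumed "close to a cut" window
            if profile == "bbc":
                end = cut               # align exactly on the shot cut
            elif profile == "netflix":
                end = cut - 3 * FRAME   # stay three frames away from it
    return start, end
```

Dispatching the same cues to another platform then just means re-running them with a different profile, which is the template change mentioned above.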
You’re on extended mode right now. So, there we go. This
piece of content has been transcribed. As a result of the
transcription, all words have timecodes… Hi and welcome to localisation
essentials. My name is Sven and… You’ve noticed that every word now
has time codes. So we can create fancy applications, like clicking
in the text and the player will follow. Now we’ll open the Subtitler app. Brilliant. We have the text. We have set the timing and styling
rules in the configuration section, where we can turn the knobs as
we want. If you’re doing Premier League football and the speakers have
a pace of 240 words per minute, you’ll need 60 characters on a
line. Whereas the BBC requires 37 characters maximum. And if
you’re publishing on social media, you might even go smaller to
35. It depends on the use case. But let’s cut this piece of
five minutes into subtitles. It’ll take 10 to 20 seconds to process the timed transcript and
the shot cuts, if any (these are talking heads), to
create the closed captions. …from Norway. And I’m Christina
from Lebanon. We both… Machine transcription is a source
of errors. So I didn’t cheat… My name is Sven and I’m from Norway. Then there should be a speaker
change, but because there was no pause, it couldn’t detect the
speaker change. Speech recognition is best effort. You would then
need to go into the editor to add a period and split the
subtitles. Yet, according to our statistics, you’ll still save between 80 and 90 percent of the
turnaround time to create subtitles. We guarantee that, using the given
transcript, the created subtitles are as close as possible
to official guidelines. In case it is not possible
to accommodate all the styling rules, we’ll put
a gentle warning sign for the media manager or the subtitle
operator. So he or she can decide to either modify the content,
or to publish it as such. For the hard of hearing it might
be necessary to use a literal transcription and then we’ll leave
it as such. When finished, you can export the subtitles
in a couple of formats. SRT is the most common. You can
use SRT files for YouTube or Facebook. Most content delivery
networks accept SRT files. Next, as suggested in the
introduction, in Europe we’re using 27 official languages.
Producers have a material interest in localising content. So we added machine translation to the pipeline. Let’s assume we’re addressing a Latin American audience. We’ll translate this on
the spot into Spanish. Engineering wise, this was an
even more complex challenge. The subtitles might have been
modified by the subtitle operator, so we can’t rely on the
original transcription. We have to use the subtitle template
to reconstruct a pseudo-script. Upon translation there is a lot of
time management going on in the background, because machine
translation is not aware of time codes. We are, but machine
translation is not. As soon as we have recreated a translated and
re-timed transcript, we can cut it again into subtitles, along the same rhythm of the template. …are part
of the localization team here at Google and we’ll be your guides
into the world of localization… not fluent in Spanish… So what
is localisation? In the world of information technology, localization
means translating and adapting a product or service to a particular
language, culture or geographic market. There are two main aspects to
that. The first one is stylistic. You have to make sure the language
tone you use is appropriate for each local culture. Then there is
also a technical aspect. You may have to make changes to things
like date and time formats. Alphabetisation or even the…
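The time management described before this clip, reconstructing a pseudo-script from the possibly hand-edited subtitle template, translating it, and putting timing back, might look roughly like the sketch below. It is hedged throughout: `translate` stands in for any machine translation engine, and spreading the time span proportionally to character counts is my own simplification, not Limecraft's actual re-timing logic.

```python
# Hedged sketch of the re-timing idea: machine translation is not aware
# of time codes, so we strip them off, translate the pseudo-script as a
# whole, and then redistribute the original time span over the
# translated words. `translate` is a stand-in for any MT engine; the
# proportional redistribution is an assumed simplification.

def retime_translation(cues, translate):
    """cues: list of (start, end, text) taken from the subtitle template."""
    # Reconstruct a pseudo-script from the possibly hand-edited cues.
    script = " ".join(text for _, _, text in cues)
    words = translate(script).split()

    # Spread the overall time span over the translated words,
    # proportionally to their length in characters.
    total_chars = sum(len(w) for w in words) or 1
    t = cues[0][0]
    span = cues[-1][1] - t
    timed = []
    for w in words:
        dt = span * len(w) / total_chars
        timed.append((t, t + dt, w))
        t += dt
    return timed  # a translated, re-timed transcript, ready to re-cut
```

From there, the same subtitle-cutting step runs again, along the rhythm of the original template.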
So, what we’re looking at is no rocket science. Honestly. We’ve taken the tools that
are out on the market: ASR, subtitle cutting, machine
translation… We built a properly working combination. But it is a real
use case, not just technology. It is a valuable use case which saves our
users, producers, distributors and broadcasters a lot of time. I’ll save myself the hassle of
going back to the PowerPoint but if you want, there is
a free trial available. More info is on our website. We’re happy to support
you with some test samples. You guys have an open API? Of course we do. Every possible action on a
video item, on the transcript or the subtitles, is available as well
through the API. As a matter of fact, 50 percent of our use comes
from third party applications that don’t use the user interface but just use the backend. Wonderful. Do we have any
other questions for Maarten? Yeah. Let me bring
the mike over to you. Can you start from a different
language like Hindi for example? We can start from different
languages. You just need to import the subtitle file. To be very honest with you, there are
two different use cases. The first one is same-language subtitling,
usually for accessibility purposes. This is where we turn domestic
audio into domestic subtitles. The other use case is localisation
for VOD operators. In this use case we get original content in whatever
language with a set of original subtitles. The challenge for the
language service provider or the translation agency is to translate
the given subtitles to another language. It saves them the burden
of executing speech recognition which is the most important
source of errors. So yes, we can. Wonderful. Any other questions? All right, well, please give Maarten
a big round of applause, everyone. Thank you, Maarten. That
was a really fascinating presentation. We appreciate it.
