Automatically create video subtitles with Amazon Transcribe and Lambda

AWS is amazing. I can't stress enough how smoothly it automates workflows that would otherwise be tedious to build. If I were to list my favorite AWS services, Lambda would probably be at the top. Its serverless model lets you focus on the most critical aspect, your business logic, without having to worry about deployments, scaling, or server administration in general. What's even better, you can code in the language of your choice, as Lambda supports most commonly used programming languages.

In this article, we will learn how to use AWS services such as Lambda, S3, and Amazon Transcribe to automate the generation of subtitles for a video file.

Problem Statement: Automatically generate a subtitle file (.vtt/.srt) when an mp4 video file gets uploaded to S3.

AWS Services Used:

Amazon Transcribe: The speech-to-text service provided by Amazon Web Services. It uses deep-learning-based automatic speech recognition (ASR) to convert audio to text quickly and with high accuracy.
Amazon S3: An S3 bucket serves as input storage for your video files and as output storage for the generated subtitle file (.vtt/.srt).
AWS Lambda: One function is triggered by an S3 event when an mp4 file is created and calls the Transcribe service. Once transcription is complete, a second Lambda function is triggered by the output subtitle file and deletes the transcription job.

Let's Code

const AWS = require('aws-sdk')
const transcribe = new AWS.TranscribeService()

exports.handler = async (event, context) => {
  try {
    const eventRecord = event.Records && event.Records[0]

    // input bucket
    const bucket = eventRecord.s3.bucket.name

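    // this is your mp4 file; S3 URL-encodes keys in event payloads
    // (spaces arrive as "+"), so decode the key before using it
    const filename = decodeURIComponent(eventRecord.s3.object.key.replace(/\+/g, ' '))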

    // Extract uuid as my filename is in the format of
    // "<uuid>-video.mp4"
    const uuid = filename.slice(0, 36)

    // Create a job name based on the uuid so we can identify the job
    // later and delete it
    const jobName = `transcribe-${uuid}`

    console.log('Going to start transcription job: ' + jobName)

    // S3 URI of the file to transcribe
    const fileUri = `s3://${bucket}/${filename}`

    // Parameters for the transcribe job, see: https://docs.aws.amazon.com/transcribe/latest/APIReference/API_StartTranscriptionJob.html

    const params = {
      LanguageCode: 'en-US',
      Media: {
        MediaFileUri: fileUri,
      },
      MediaFormat: 'mp4', // Incoming media format, supports mp3 | mp4 | wav | flac | ogg | amr | webm
      TranscriptionJobName: jobName,
      OutputBucketName: bucket, // We use the same bucket for output
      OutputKey: uuid + '-audio', // use same UUID for the output file
      Subtitles: {
        Formats: ['vtt'],
        OutputStartIndex: 0,
      },
    }

    // Await the request directly so a failure reaches the catch block
    // instead of being logged and then reported as a success
    const data = await transcribe.startTranscriptionJob(params).promise()
    console.log('Transcribe job: ' + JSON.stringify(data))

    return {
      responseCode: 200,
      body: `Successfully started transcription job: ${jobName}`,
    }
  } catch (err) {
    console.log(err, err.stack)
    // Report the failure instead of a 200
    return {
      responseCode: 500,
      body: JSON.stringify(err.message),
    }
  }
}
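
The event-parsing steps at the top of the handler can be pulled into a small pure function, which makes the filename convention easier to test. What follows is only a minimal sketch and not part of the original function: parseS3Record and UUID_LENGTH are hypothetical names, the bucket and key in the sample are made up, and it assumes object keys start with a 36-character uuid as described above. It also decodes the key, since S3 URL-encodes object keys in event notifications.

```javascript
// Hypothetical helper: turns one S3 event record into the values the
// transcribe handler needs. Assumes keys look like "<uuid>-video.mp4".
const UUID_LENGTH = 36

function parseS3Record(record) {
  const bucket = record.s3.bucket.name
  // S3 URL-encodes keys in event payloads and uses "+" for spaces
  const key = decodeURIComponent(record.s3.object.key.replace(/\+/g, ' '))
  const uuid = key.slice(0, UUID_LENGTH)
  return {
    bucket,
    key,
    uuid,
    jobName: `transcribe-${uuid}`,
    mediaUri: `s3://${bucket}/${key}`,
  }
}

// Usage with a made-up record
const sampleRecord = {
  s3: {
    bucket: { name: 'my-video-bucket' },
    object: { key: '123e4567-e89b-12d3-a456-426614174000-video.mp4' },
  },
}
console.log(parseS3Record(sampleRecord))
```

Inside the handler, the returned jobName and mediaUri would take the place of the inline string building.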

The transcribe lambda creates a unique transcription job each time it is invoked, and we no longer need that job once transcription is complete. To automate its deletion, we can set up another lambda that is triggered by an S3 event for .vtt file creation. Remember, we used the same uuid as the incoming file for both the output file and the transcription job name, so the job name can be rebuilt from the .vtt filename. Here's a code snippet for the same:

const AWS = require('aws-sdk')
const transcribe = new AWS.TranscribeService()

exports.handler = async (event, context) => {
  try {
    const eventRecord = event.Records && event.Records[0]
    const filename = eventRecord.s3.object.key
    const uuid = filename.slice(0, 36)
    const jobName = `transcribe-${uuid}`

    console.log('Going to delete transcription job: ' + jobName)

    const params = {
      TranscriptionJobName: jobName,
    }

    // Await the request directly so a failure reaches the catch block
    // instead of being logged and then reported as a success
    await transcribe.deleteTranscriptionJob(params).promise()
    console.log('Done!')

    return {
      responseCode: 200,
      body: `Successfully deleted transcription job: ${jobName}`,
    }
  } catch (err) {
    console.log(err, err.stack)
    // Report the failure instead of a 200
    return {
      responseCode: 500,
      body: JSON.stringify(err.message),
    }
  }
}
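
The deletion logic can also be written so that it handles every record in the event and ignores keys that are not subtitle files. This is a rough sketch under a few assumptions: deleteJobsForEvent and makeStubClient are hypothetical names, the client is passed in as a parameter (my choice, so the logic can be exercised without AWS), and the client is expected to expose the same deleteTranscriptionJob(params).promise() call shape used above.

```javascript
// Sketch: delete the transcription job for every .vtt record in an S3 event.
// `client` is assumed to expose deleteTranscriptionJob(params).promise(),
// as the aws-sdk v2 TranscribeService client does.
async function deleteJobsForEvent(event, client) {
  const deleted = []
  for (const record of event.Records || []) {
    const key = decodeURIComponent(record.s3.object.key.replace(/\+/g, ' '))
    // Defensive check in addition to the S3 suffix filter on the trigger
    if (!key.endsWith('.vtt')) continue
    const jobName = `transcribe-${key.slice(0, 36)}`
    await client.deleteTranscriptionJob({ TranscriptionJobName: jobName }).promise()
    deleted.push(jobName)
  }
  return deleted
}

// Stub client for local testing: records job names instead of calling AWS
function makeStubClient(calls) {
  return {
    deleteTranscriptionJob: (params) => ({
      promise: async () => {
        calls.push(params.TranscriptionJobName)
        return {}
      },
    }),
  }
}
```

In the real lambda, the handler would call deleteJobsForEvent(event, transcribe) with the TranscribeService client created at the top of the file.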

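Make sure the execution role of your transcribe lambda has these managed policies attached: AmazonS3FullAccess and AmazonTranscribeFullAccess. The role of your deleteTranscription lambda also needs permission to delete transcription jobs. Finally, create an S3 event trigger for .mp4 file creation on your transcribe lambda and one for .vtt file creation on your deleteTranscription lambda.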
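
If you prefer to set up the two triggers in code rather than in the console, the request could look roughly like the sketch below. It only builds the parameters for the S3 putBucketNotificationConfiguration call. The helper names, the bucket name, and both function ARNs are placeholders I made up, and the sketch assumes S3 has already been granted permission to invoke both functions (the console adds that permission for you when you create a trigger there).

```javascript
// Hypothetical helper: one Lambda trigger for objects whose key ends in `suffix`
function lambdaTrigger(functionArn, suffix) {
  return {
    LambdaFunctionArn: functionArn,
    Events: ['s3:ObjectCreated:*'],
    Filter: { Key: { FilterRules: [{ Name: 'suffix', Value: suffix }] } },
  }
}

// Builds params for S3 putBucketNotificationConfiguration:
// .mp4 uploads go to the transcribe lambda, .vtt outputs to the delete lambda
function buildNotificationParams(bucket, transcribeArn, deleteArn) {
  return {
    Bucket: bucket,
    NotificationConfiguration: {
      LambdaFunctionConfigurations: [
        lambdaTrigger(transcribeArn, '.mp4'),
        lambdaTrigger(deleteArn, '.vtt'),
      ],
    },
  }
}

// Placeholder values, replace with your own bucket and function ARNs
const notificationParams = buildNotificationParams(
  'my-video-bucket',
  'arn:aws:lambda:us-east-1:123456789012:function:transcribe',
  'arn:aws:lambda:us-east-1:123456789012:function:deleteTranscription'
)
console.log(JSON.stringify(notificationParams, null, 2))

// To apply it with aws-sdk v2 (not executed here):
//   await new AWS.S3().putBucketNotificationConfiguration(notificationParams).promise()
```

Keep in mind that this call replaces the bucket's entire notification configuration, so any existing triggers would need to be included in the same request. The suffix filters are also what keep the transcribe lambda from firing on its own output when input and output share one bucket.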

That's it. With this, you should be able to create a subtitle file when an mp4 video is uploaded to S3.

Cheers!