Recently, we helped one of our clients, a prominent marketing agency, build the backend for their new campaign site. On the site, people upload videos, admins approve them, and the approved videos can then be streamed by visitors. At the end of the campaign, prizes were given to the top videos selected by judges.
A single video can be up to 1 GB. As it was a relatively large campaign, a huge number of submissions was expected, so storage and scalability were major concerns. The system also had to handle concurrent uploads and downloads of videos, even under high traffic. Another challenge was smooth streaming: the majority of the traffic comes from mobile devices, and streaming directly from the video source over mobile data performs poorly. Finally, we had to process each video after it was uploaded.
Why Amazon Web Services (AWS)
After requirement analysis, we concluded that AWS was the cloud platform best suited to our needs. We were already using AWS in several of our projects and were familiar with its capabilities and scalability. We wanted to keep our application backend as simple as possible by pushing most of the compute-intensive tasks to the cloud platform. The crucial part of building the application was choosing object storage that provides durability and scalability at a low price, and this is where Amazon S3 comes in. Amazon S3 also integrates easily with Amazon CloudFront, the CDN service of AWS, which addresses our video streaming problem. In addition, we needed to overlay watermarks on the uploaded videos. This is easily done with Amazon Elastic Transcoder, which charges based on video duration rather than video size. For these reasons, we chose to go with AWS.
How We Use AWS
We built our web application with Django and hosted it on AWS. We used two Amazon EC2 instances for compute capacity, with HTTP requests distributed between them by an Elastic Load Balancer; Amazon EBS for persistent block-level storage volumes; Amazon RDS as the cloud database; Amazon S3 for storing and retrieving data; Amazon Route 53 as a highly available, scalable DNS service; AWS Lambda to trigger jobs in Amazon Elastic Transcoder for media transcoding; Amazon SNS to send email notifications through Amazon SES; and finally Amazon CloudFront for media caching.
Fig: Reference architecture for web application hosting
We created three private buckets in S3: one for uploaded videos, one for video thumbnail images and one for transcoded videos. When working with S3, files are normally uploaded from the server side using the SDK. In this process, the server first receives the file from the client and then uploads it to S3. During this transition from client to server to S3, the file is temporarily held on the server. This might not be an issue for small files, but it is certainly a big issue when the files are very large.
Consider a scenario where people can upload files of up to 5 GB each and the application server is hosted on an AWS EC2 instance with a 40 GB AWS EBS volume attached. If 10 people concurrently upload 5 GB each, the server receives 10 × 5 GB = 50 GB, which is more than the 40 GB it can hold. We certainly did not want this to happen on our own production server. So, the solution was to upload files directly to S3, without any intervention from our application backend.
When the video upload form is submitted, we generate a presigned URL for the video using the SDK and then send the video to that presigned URL using AJAX. This way, videos are uploaded directly to S3 from the client side, and our application backend does not have to handle the file at all. Similarly, when an admin wants to download a video, we generate a presigned URL for it, and the video is downloaded directly from S3 using that URL. We also enabled Transfer Acceleration on the video upload bucket to reduce upload latency. With it enabled, a video is first uploaded to the nearest edge location and then travels over the AWS backbone network to S3. This is particularly useful when the bucket lives in a different region than the user.
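The presigned URL flow described above can be sketched with boto3, the AWS SDK for Python. This is a minimal sketch under stated assumptions, not the code from the project: the bucket name, the helper names, the expiry times and the choice of an HTTP PUT upload with a fixed content type are all hypothetical. Only `boto3.client`, `botocore.config.Config` and `generate_presigned_url` are real SDK calls.

```python
"""Sketch: presigned S3 URLs for direct browser uploads and admin downloads."""

UPLOAD_BUCKET = "campaign-video-uploads"  # hypothetical bucket name


def make_s3_client(accelerate=False):
    """Build a boto3 S3 client, optionally on the Transfer Acceleration endpoint."""
    # Imported lazily so the helpers below can be exercised with a stub client.
    import boto3
    from botocore.config import Config

    config = Config(s3={"use_accelerate_endpoint": accelerate})
    return boto3.client("s3", config=config)


def presign_upload(s3_client, key, content_type="video/mp4", expires=3600):
    """Return a URL the browser can PUT the video to, valid for `expires` seconds."""
    return s3_client.generate_presigned_url(
        ClientMethod="put_object",
        Params={
            "Bucket": UPLOAD_BUCKET,
            "Key": key,
            # The browser must send the same Content-Type header on upload.
            "ContentType": content_type,
        },
        ExpiresIn=expires,
        HttpMethod="PUT",
    )


def presign_download(s3_client, key, expires=900):
    """Return a short-lived URL an admin can use to download the video."""
    return s3_client.generate_presigned_url(
        ClientMethod="get_object",
        Params={"Bucket": UPLOAD_BUCKET, "Key": key},
        ExpiresIn=expires,
    )
```

In a Django view, one would call something like `presign_upload(make_s3_client(accelerate=True), "videos/entry-42.mp4")` and return the URL to the page as JSON; the AJAX request then sends the file body straight to S3. The accelerated endpoint only works once Transfer Acceleration is enabled on the bucket.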
To transcode videos, we created a pipeline in Elastic Transcoder and defined its input, output and thumbnail buckets. We also enabled the on-completion and on-error events, which use SNS to send emails through SES. Then we created a transcoding preset that defines the desired output video format and size, along with the watermark settings such as size and position. To transcode a video, a job has to be created using the pipeline and the preset. We wanted this job to run automatically every time a new video is uploaded to the S3 bucket.
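For readers who prefer to script the pipeline setup rather than use the console, the configuration above maps roughly onto the `create_pipeline` call of the boto3 `elastictranscoder` client. The following is a sketch under assumptions: the function names and all argument values (pipeline name, role ARN, bucket names, topic ARNs) are hypothetical, and the two unused notification events are left as empty strings.

```python
"""Sketch: an Elastic Transcoder pipeline with separate output and thumbnail buckets."""


def build_pipeline_request(name, role_arn, input_bucket, output_bucket,
                           thumbnail_bucket, completed_topic_arn, error_topic_arn):
    """Assemble the keyword arguments for `create_pipeline`."""
    return {
        "Name": name,
        "Role": role_arn,  # IAM role Elastic Transcoder assumes to read/write S3
        "InputBucket": input_bucket,
        # ContentConfig/ThumbnailConfig are used instead of OutputBucket so that
        # transcoded videos and thumbnails can go to different buckets.
        "ContentConfig": {"Bucket": output_bucket, "StorageClass": "Standard"},
        "ThumbnailConfig": {"Bucket": thumbnail_bucket, "StorageClass": "Standard"},
        "Notifications": {
            "Progressing": "",  # no notification for this event
            "Warning": "",      # no notification for this event
            "Completed": completed_topic_arn,
            "Error": error_topic_arn,
        },
    }


def create_pipeline(transcoder_client, request):
    """Create the pipeline and return its ID."""
    response = transcoder_client.create_pipeline(**request)
    return response["Pipeline"]["Id"]
```

With real credentials, `transcoder_client` would be `boto3.client("elastictranscoder")`. The SNS topics referenced here still need email subscriptions of their own for the notifications to reach anyone.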
Fig: Sample lambda architecture
We created a Lambda function that is triggered automatically whenever a video is uploaded to the S3 bucket. It creates a Transcoder job for that video, and the Transcoder then runs the job. Finally, we created an RTMP distribution in CloudFront for the transcoded video bucket for faster streaming of videos.
At the time of writing this article, more than 5,000 videos have been uploaded within the first 3 weeks, and the site has 10,000 visitors per day on average. Even with that much traffic, CPU utilization remains low and Elastic Transcoder is performing flawlessly. And because these cloud services require minimal administration, developers can spend their time working with the product team to add business value rather than managing infrastructure.
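The Lambda trigger described above can be sketched as follows. This is an illustrative handler rather than the function we deployed: the pipeline ID, preset ID, watermark values and output naming scheme are hypothetical placeholders, and the optional `transcoder` argument exists only so the handler can be exercised without AWS credentials. The event layout (`Records[].s3.object.key`, URL-encoded) and the `create_job` call are the standard S3 notification format and boto3 API.

```python
"""Sketch: Lambda handler that starts an Elastic Transcoder job per uploaded video."""
from urllib.parse import unquote_plus

PIPELINE_ID = "PIPELINE_ID_PLACEHOLDER"   # hypothetical: ID of the pipeline
PRESET_ID = "PRESET_ID_PLACEHOLDER"       # hypothetical: ID of the custom preset
WATERMARK_ID = "BottomRight"              # hypothetical: watermark ID defined in the preset
WATERMARK_KEY = "watermarks/logo.png"     # hypothetical: image key in the input bucket


def build_job_request(s3_key):
    """Build the `create_job` arguments for one uploaded video."""
    base_name = s3_key.rsplit(".", 1)[0]  # strip the file extension, if any
    return {
        "PipelineId": PIPELINE_ID,
        "Input": {"Key": s3_key},
        "Outputs": [{
            "Key": base_name + ".mp4",
            "PresetId": PRESET_ID,
            # Elastic Transcoder requires the literal {count} token here.
            "ThumbnailPattern": base_name + "-{count}",
            "Watermarks": [{
                "PresetWatermarkId": WATERMARK_ID,
                "InputKey": WATERMARK_KEY,
            }],
        }],
    }


def lambda_handler(event, context, transcoder=None):
    """Create one transcoding job for every object in the S3 event."""
    if transcoder is None:
        import boto3  # available by default in the Lambda Python runtime
        transcoder = boto3.client("elastictranscoder")

    job_ids = []
    for record in event["Records"]:
        # Object keys arrive URL-encoded in S3 event notifications.
        key = unquote_plus(record["s3"]["object"]["key"])
        response = transcoder.create_job(**build_job_request(key))
        job_ids.append(response["Job"]["Id"])
    return job_ids
```

The function's execution role needs permission to create Elastic Transcoder jobs, and the S3 trigger should be limited to object-created events on the upload bucket so that transcoded outputs do not trigger the function again.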