Extracting Thumbnails Faster with FFmpeg

January 12, 2015

Brendan Schwartz

Founder, CTO

One handy feature of Wistia is the ability to easily set a video’s thumbnail (or still or poster, if you prefer) to an exact frame.

You open up the Customize panel, seek to the frame you want in the video (for bonus points, use the arrow keys to step one frame at a time), then click “Use the current frame,” and ta-da: we grab that frame and set it as the video’s thumbnail.

The only problem is that there are cases where that “ta-da” isn’t quite a “ta-da.” Occasionally, it takes us several minutes to process the thumbnail change after you select it. Taaaaaaaa-zzzzzzz.

How thumbnails are made

To understand what’s happening behind the scenes, here’s a bit more context about our system’s architecture.

When you upload a video to Wistia, it’s sent to an internal service called the Bakery. The Bakery handles all of our video ingestion, transcoding, storage, and delivery. The upload gets routed to a cluster of Bakery upload servers at a Rackspace data center outside Chicago. When the upload is complete, we extract a thumbnail from the middle of the video using ffmpeg, a popular open source project that provides tools to manipulate video. Because the file is on the local filesystem, this is a quick operation: it usually takes less than a second.

Now, say you don’t like the thumbnail the system automatically chose for you, and you want to pick your own. In your Wistia account, you pause the video on the frame you want as your thumbnail, and you click “Use the current frame.” Here’s what happens when you click that link:

1. The Wistia application servers send a message to the Bakery’s API, instructing it to extract a frame from the video at the time you specified.
2. The Bakery checks and sees that the video file is stored locally on one of the nodes in the upload cluster, so it routes the request to the machine that has the video file.
3. The server with the video file receives the request and extracts the thumbnail immediately using ffmpeg.

Great! Now you see the new thumbnail, and all this took only a second or two. Ta-da, for sure!

Here’s where things take a turn for the worse. It’s been a few days since you uploaded your video, you’re back in your Wistia account, and you want to pick a new thumbnail. We have a problem: the video isn’t stored locally on the cluster any more.

Because our upload cluster has limited disk space, some time after you uploaded your video, we shuttled it off to Amazon S3 for long-term storage and removed it from the local disk in the cluster.

This isn’t the end of the world; we just need to do things a bit differently. Here’s what happens when you click “Use the current frame” for a video that’s not stored locally on the cluster:

1. The Wistia application servers send a message to the Bakery’s API, instructing it to extract a frame from the video at the time you specified.
2. The Bakery determines that the video file is not stored locally on a node in the cluster, so instead of processing the request immediately, the Bakery registers a background task to create this thumbnail.
3. A worker machine (usually used for encoding videos) grabs this task from the queue and starts working on it.
4. In order to extract the thumbnail, the worker machine first downloads the video file from S3 to its local disk.
5. The worker extracts the thumbnail using ffmpeg, like before.
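
The two flows above boil down to one branch on where the file lives. Here’s a minimal sketch of that decision in Python. It isn’t the Bakery’s actual code: `ThumbnailRequest`, `plan_thumbnail`, and the step names are made up for illustration.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ThumbnailRequest:
    video_id: str
    seconds: float  # position of the frame the user picked


def plan_thumbnail(request: ThumbnailRequest, local_node: Optional[str]) -> dict:
    """Decide how to serve a request, mirroring the two flows described above.

    `local_node` is the upload-cluster node holding the file, or None once the
    file has been moved to S3 (a hypothetical lookup result).
    """
    if local_node is not None:
        # Fast path: route to the node with the file and extract right away.
        return {"mode": "immediate", "node": local_node, "steps": ["extract"]}
    # Slow path: queue a background task; a worker has to download the file first.
    return {"mode": "queued", "node": None, "steps": ["download_from_s3", "extract"]}
```

The extra `download_from_s3` step on the queued path is where the time goes.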

This works, but it’s not fast. Video files can be big. It’s not uncommon for files to be many hundreds of megabytes or several gigabytes in size. And not only that, but network speeds are only so fast. We typically see speeds in the low tens of MB/s from S3 to our machines at Rackspace. So let’s say your video file is 1 GB. At 20 to 50 MB/s, it’s likely going to take between 20 and 50 seconds to download that file.
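
To make that arithmetic concrete, here’s a quick sketch. The 20 and 50 MB/s figures are assumptions read off the 20-to-50-second estimate above, and `download_seconds` is just an illustrative helper.

```python
def download_seconds(size_bytes: float, bytes_per_second: float) -> float:
    """Time to transfer a file at a steady throughput."""
    if bytes_per_second <= 0:
        raise ValueError("throughput must be positive")
    return size_bytes / bytes_per_second


GB = 1_000_000_000
MB = 1_000_000

for rate in (20, 50):
    print(f"1 GB at {rate} MB/s: {download_seconds(1 * GB, rate * MB):.0f} s")
# 1 GB at 20 MB/s: 50 s
# 1 GB at 50 MB/s: 20 s
```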

That’s slow. And it’s especially annoying because from your perspective, nothing’s changed. Why was it really fast to select a thumbnail the first time, and painfully slow the next time? Chances are, because it was so fast the first time, when it took 30 seconds the next time around, you didn’t even wait for it; you assumed something was broken. That’s what I’d assume!

Aside: yes, we can probably get that download speed up from S3 using multithreaded download, but we’re still looking at tens of seconds for multi-gigabyte files. Also, network speeds over the public internet aren’t the most dependable in general, so while this would speed things up, it wouldn’t truly solve the problem.

A new hope

Luckily, there’s a better way. ffmpeg supports several video input protocols aside from the local filesystem, one of which is https. When I first discovered this, I noticed there was a seekable option. After perusing the source, it looked like ffmpeg was capable of making byte-range requests. This is great. Instead of downloading the whole file to extract a single frame, perhaps ffmpeg will only get the parts it needs.

Testing the behavior is easy enough. Here’s a 36-minute, 1.5 GB video of a yule log (a belated happy holidays!):

~  curl -I https://embed-ssl.wistia.com/deliveries/b1663daaa577df844103db4d189bd7611c3c6573.bin
HTTP/1.1 200 OK
Accept-Ranges: bytes
Access-Control-Allow-Methods: GET, HEAD, OPTIONS
Access-Control-Allow-Origin: *
Access-Control-Expose-Headers: Origin, Content-Type, Accept, Server, x-amz-version-id, X-Cache
Access-Control-Request-Method: *
Cache-Control: max-age=31557600
Cache-Control: max-age=31557600
Content-Type: video/mp4
Date: Mon, 15 Dec 2014 22:59:22 GMT
Etag: "06511b715e105fe2a240a2f4deeb6d75"
Last-Modified: Tue, 09 Dec 2014 00:02:44 GMT
Server: nginx/1.4.6 (Ubuntu)
Set-Cookie: __bakery_session=BAh7BkkiD3Nlc3Npb25faWQGOgZFVEkiRTFmOTY0ZWI2ZGExZTg5NTI0M2Qx%0ANzY1NWZiYjFlNmJlNWRlM2QxNWJmZTYxODZhMzhiNmVlMDIwYjc3Y2Q1MDgG%0AOwBG%0A--4fb22b894277098a63d09479bca459c51a417aef; path=/; HttpOnly
Set-Cookie: rack.session=BAh7BkkiD3Nlc3Npb25faWQGOgZFVEkiRTFmOTY0ZWI2ZGExZTg5NTI0M2Qx%0ANzY1NWZiYjFlNmJlNWRlM2QxNWJmZTYxODZhMzhiNmVlMDIwYjc3Y2Q1MDgG%0AOwBG%0A; path=/; HttpOnly
x-amz-version-id: _n0VW.055MJfZhuzOBBnsVbSjfgNS_0z
X-Served-By: bakery-breadroute-naan,bakery-prime-cyclops
Content-Length: 1537868391
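
If you want to script the same check, the header that matters here is Accept-Ranges. A small sketch follows; the helper name `supports_byte_ranges` is hypothetical, and the header values are copied from the response above.

```python
def supports_byte_ranges(headers: dict) -> bool:
    """True if the response headers advertise byte-range support."""
    # Header names are case-insensitive, so normalize before looking up.
    normalized = {name.lower(): value for name, value in headers.items()}
    return normalized.get("accept-ranges", "").strip().lower() == "bytes"


# Values copied from the curl output above.
headers = {
    "Accept-Ranges": "bytes",
    "Content-Type": "video/mp4",
    "Content-Length": "1537868391",
}
print(supports_byte_ranges(headers))                     # True
print(f"{int(headers['Content-Length']) / 1e9:.2f} GB")  # 1.54 GB
```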

Downloading this file from one of our worker boxes took 27 seconds (pretty speedy, actually):

—— bakery-prime-zoo.ord.wistia.land 23:00:55 —— ~
$ wget https://embed-ssl.wistia.com/deliveries/b1663daaa577df844103db4d189bd7611c3c6573.bin
--2014-12-15 23:01:02--  https://embed-ssl.wistia.com/deliveries/b1663daaa577df844103db4d189bd7611c3c6573.bin
Resolving embed.wistia.com (embed.wistia.com)... 72.21.81.253
Connecting to embed.wistia.com (embed.wistia.com)|72.21.81.253|:80... connected.
HTTP request sent, awaiting response... 200 OK
Length: 1537868391 (1.4G) [video/mp4]
Saving to: ‘b1663daaa577df844103db4d189bd7611c3c6573.bin’

100%[======================================================================================================================================>] 1,537,868,391 67.3MB/s   in 27s

2014-12-15 23:01:30 (53.3 MB/s) - ‘b1663daaa577df844103db4d189bd7611c3c6573.bin’ saved [1537868391/1537868391]

Now let’s grab the thumbnail from 30 minutes into the video we downloaded:

time ffmpeg -ss 1800 -i b1663daaa577df844103db4d189bd7611c3c6573.bin -vframes 1 -vcodec png -an -y %d.png
real   0m0.376s
user  0m0.463s
sys   0m0.061s

In total, we have 27 seconds to download and 0.376 seconds for ffmpeg to get the thumbnail. So ~27.4 seconds all in.

Now let’s try having ffmpeg grab the still directly from the URL:

time ffmpeg -ss 1800 -i https://embed-ssl.wistia.com/deliveries/b1663daaa577df844103db4d189bd7611c3c6573.bin -vframes 1 -vcodec png -an -y %d.png
real   0m0.438s
user  0m0.469s
sys   0m0.068s

Wow, the whole thing took less than half a second. That’s over 60 times faster. And that’s with download speeds that were significantly faster than what we normally observe.
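
The two ffmpeg invocations above differ only in their input, so they’re easy to wrap in one helper. Here’s a sketch, assuming ffmpeg is on the PATH; the function names are hypothetical, and this isn’t necessarily how the Bakery calls ffmpeg. Note that -ss comes before -i, which asks ffmpeg to seek in the input rather than decode everything up to that point.

```python
import subprocess
from typing import List


def thumbnail_command(source: str, seconds: float, output_pattern: str = "%d.png") -> List[str]:
    """Build an ffmpeg command that writes one PNG frame near `seconds`.

    `source` may be a local path or an https URL.
    """
    return [
        "ffmpeg",
        "-ss", str(seconds),  # placed before -i so ffmpeg seeks in the input
        "-i", source,
        "-vframes", "1",      # stop after one video frame
        "-vcodec", "png",
        "-an",                # no audio in the output
        "-y",                 # overwrite the output file if it exists
        output_pattern,
    ]


def extract_thumbnail(source: str, seconds: float, output_pattern: str = "%d.png") -> None:
    """Run the command; raises CalledProcessError if ffmpeg exits with an error."""
    subprocess.run(thumbnail_command(source, seconds, output_pattern), check=True)
```

Calling `extract_thumbnail(url, 1800)` with the delivery URL would run the same command as the second test above.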

Wiresharkin’

For the extra curious, you can use Wireshark to see what ffmpeg is actually doing.

A capture of the run above shows that ffmpeg makes four separate HTTP calls, each for a different byte range. For each call, the start of the range is specified, but the end is not. This is because we don’t know how many bytes we actually need; we only know where we want to start reading each time. When we get what we need, we close the connection. Here’s a breakdown of the requests for this file and why ffmpeg is making them:
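
An open-ended range like this is easy to reproduce by hand. Here’s a rough sketch using Python’s standard library; `open_ended_range` and `read_from` are hypothetical helpers, and it assumes the server honors Range headers (as the Accept-Ranges: bytes response earlier suggests).

```python
import urllib.request


def open_ended_range(start: int) -> dict:
    """Header for 'everything from byte `start` onward' (no end offset)."""
    if start < 0:
        raise ValueError("start must be non-negative")
    return {"Range": f"bytes={start}-"}


def read_from(url: str, start: int, wanted: int) -> bytes:
    """Read up to `wanted` bytes starting at `start`, then drop the connection."""
    request = urllib.request.Request(url, headers=open_ended_range(start))
    # A server that honors the range answers 206 Partial Content.
    # Leaving the `with` block closes the connection without fetching the rest.
    with urllib.request.urlopen(request) as response:
        return response.read(wanted)
```

The caller decides how much to read; the request itself never names an end byte.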

1. Bytes 0-: Start at the beginning of the file. Most video files start with an identifier so you can tell what format it is. This is important because once we know the format, we’ll know where in the file we need to go next. In this case, this is an MPEG-4 video container.
2. Bytes 55198-: This is where the stsc atom, the sample-to-chunk table, is. In MPEG-4 parlance, a video is made up of one or more chunks, and each chunk is made up of many samples. This table tells you which samples belong to which chunks. The stsc atom is also immediately followed by the stsz atom, or the sample sizes atom. The stsz atom contains a map of the sizes in bytes of all samples. Using this information, we can calculate the byte offset for the keyframe closest to 30 minutes.
3. Bytes 887549-: This is where the mdat atom starts. This atom contains the video stream itself. I’m not entirely sure why this request is needed. It seems like we could have gone straight to request #4 with the information we got from request #2. If you know the answer, I’d love to know!
4. Bytes 1421450149-: This is the closest keyframe to 30 minutes into the video stream. We seek here to grab the frame we want!
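
To give a feel for how those tables turn into a byte offset, here’s a simplified sketch. It is not ffmpeg’s code. It also leans on one thing not mentioned above: the chunk offset table (the stco or co64 atom), which records where each chunk starts in the file. Picking which sample is the keyframe nearest a timestamp uses further tables (stts and stss) and isn’t shown. The function name and the 0-based `sample_index` are my own conventions.

```python
from typing import List, Tuple


def sample_offset(
    sample_index: int,                       # 0-based sample number
    chunk_offsets: List[int],                # byte offset of each chunk (stco/co64)
    sample_to_chunk: List[Tuple[int, int]],  # (first_chunk, samples_per_chunk) runs from stsc
    sample_sizes: List[int],                 # size in bytes of each sample (stsz)
) -> int:
    """Byte offset of a sample; assumes the first stsc run starts at chunk 1."""
    first_sample_in_chunk = 0
    for chunk_number in range(1, len(chunk_offsets) + 1):  # chunks are 1-based in stsc
        # The run that applies is the last one starting at or before this chunk.
        samples_per_chunk = [n for first, n in sample_to_chunk if first <= chunk_number][-1]
        if sample_index < first_sample_in_chunk + samples_per_chunk:
            # Skip over the samples that precede ours inside this chunk.
            preceding = sample_sizes[first_sample_in_chunk:sample_index]
            return chunk_offsets[chunk_number - 1] + sum(preceding)
        first_sample_in_chunk += samples_per_chunk
    raise IndexError("sample_index is past the last chunk")


# Two chunks of two samples each: sample 3 is the second sample of chunk 2.
print(sample_offset(3, [1000, 5000], [(1, 2)], [100, 200, 300, 400]))  # 5300
```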

Aside: I used AtomicParsley to analyze what all these byte offsets correspond to. It’s a handy command line tool that will parse MPEG-4 files and print out the location of all the atoms in the file.
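
If you’d rather not install anything, listing the top-level atoms takes only a few lines of Python. This is a simplified sketch, not a replacement for AtomicParsley: it only lists top-level atoms and doesn’t descend into containers like moov. `top_level_atoms` is a hypothetical name.

```python
import struct
from typing import BinaryIO, Iterator, Tuple


def top_level_atoms(stream: BinaryIO) -> Iterator[Tuple[str, int, int]]:
    """Yield (type, offset, size) for each top-level atom in an MP4 stream."""
    offset = 0
    while True:
        stream.seek(offset)
        header = stream.read(8)
        if len(header) < 8:
            return  # reached the end of the file
        # Each atom starts with a 32-bit big-endian size and a 4-byte type.
        size, kind = struct.unpack(">I4s", header)
        if size == 1:
            # A size of 1 means the real size follows as a 64-bit integer.
            (size,) = struct.unpack(">Q", stream.read(8))
        elif size == 0:
            # A size of 0 means the atom runs to the end of the file.
            size = stream.seek(0, 2) - offset
        if size < 8:
            raise ValueError(f"corrupt atom size at offset {offset}")
        yield kind.decode("latin-1"), offset, size
        offset += size
```

To try it, open a file in binary mode (`open("video.mp4", "rb")`, with a path of your choosing) and loop over `top_level_atoms(f)`, printing each tuple.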

And if you’re curious to know exactly why ffmpeg is making the requests it is, go straight to the source.

It’s so fast now!

After putting this change in production, we’ve seen more than a 5x speedup in thumbnail extraction times for videos that are not stored locally in our cluster. Not too shabby!
