Thursday, June 25, 2009

Decoding Theora files using libtheora

My last post covered reading Ogg files using libogg. The resulting program didn't do much but it covered the basic steps needed to get an ogg_packet, which we need to decode the data in the stream. The next step I want to cover is decoding Theora streams using libtheora.

In the previous post I stored a count of the number of packets in the OggStream object. For theora decoding we need a number of different objects to be stored. I encapsulate this in a TheoraDecode structure:

class TheoraDecode {
  ...
  th_info mInfo;
  th_comment mComment;
  th_setup_info* mSetup;
  th_dec_ctx* mCtx;
  ...
};

th_info, th_comment and th_setup_info contain data read from the Theora headers. The Theora stream contains three header packets. These are the info, comment and setup headers. There is one object for holding each of these as we read the headers. The th_dec_ctx object holds information that the decoder requires to keep track of the decoding process.

th_info and th_comment need to be initialized using th_info_init and th_comment_init. Notice that th_setup_info is a pointer. It needs to be freed with th_setup_free when we're finished with it. The decoder context object also needs to be freed, using th_decode_free. A convenient place to do the initialization and the cleanup is in the TheoraDecode constructor and destructor:

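class TheoraDecode {
  ...
  TheoraDecode() :
    mSetup(0),
    mCtx(0)
  {
    th_info_init(&mInfo);
    th_comment_init(&mComment);
  }

  ~TheoraDecode() {
    th_setup_free(mSetup);
    th_decode_free(mCtx);
    // Release anything held by the structures initialized above.
    th_comment_clear(&mComment);
    th_info_clear(&mInfo);
  }
  ...
};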

The TheoraDecode object is stored in the OggStream structure. The OggStream structure also gets a field holding the type of the stream (Theora, Vorbis, Unknown, etc.) and a boolean indicating whether the headers have been read:

class OggStream
{
  ...
  int mSerial;
  ogg_stream_state mState;
  StreamType mType;
  bool mHeadersRead;
  TheoraDecode mTheora;
  ...
};

Once we get the ogg_packet from an Ogg stream we need to find out if it is a Theora stream. The approach I'm using to do this is to attempt to extract a Theora header from it. If this succeeds, it's a Theora stream. th_decode_headerin will attempt to decode a header packet. A positive return value means a header packet was decoded. A return value of '0' indicates that we got a Theora data packet (presumably the headers have been read already). This function gets passed the info, comment, and setup objects and it will populate them with data as it reads the headers:

ogg_packet* packet = ...got this previously...;
int ret = th_decode_headerin(&stream->mTheora.mInfo,
                             &stream->mTheora.mComment,
                             &stream->mTheora.mSetup,
                             packet);
if (ret == TH_ENOTFORMAT)
  return; // Not a theora header

if (ret > 0) {
  // This is a theora header packet
  stream->mType = TYPE_THEORA;
  return;
}

assert(ret == 0);
// This is not a header packet. It is the first
// video data packet.
stream->mTheora.mCtx =
  th_decode_alloc(&stream->mTheora.mInfo,
                  stream->mTheora.mSetup);
assert(stream->mTheora.mCtx != NULL);
stream->mHeadersRead = true;
...decode data packet...

In this example code we attempt to decode the header. If that fails we bail out, possibly to try decoding the packet using libvorbis or some other means. If it succeeds the stream is marked as type TYPE_THEORA so we can handle it specially later.

If all header packets have been read and we have got the first data packet, we call th_decode_alloc to get a decode context to decode the data.
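
The three-way decision on the return value can be pulled out into a small helper. This is only a sketch: HeaderResult, classifyHeaderin and assertNotError are names I made up for illustration, not part of libtheora, and the "not a Theora header" code is passed in as a parameter (in real use it would be TH_ENOTFORMAT) so the snippet compiles without the library headers. Any other negative value, such as TH_EBADHEADER or TH_EVERSION, is treated as a genuine error.

```cpp
#include <cassert>

// Hypothetical classification of a th_decode_headerin return value.
enum class HeaderResult { Header, FirstDataPacket, NotTheora, Error };

// `ret` is the value returned by th_decode_headerin.
// `notFormatCode` is an assumption of this sketch: the caller passes in
// the library's "not a Theora header" code (TH_ENOTFORMAT in real use).
inline HeaderResult classifyHeaderin(int ret, int notFormatCode) {
  if (ret > 0) return HeaderResult::Header;            // header packet consumed
  if (ret == 0) return HeaderResult::FirstDataPacket;  // headers are complete
  if (ret == notFormatCode) return HeaderResult::NotTheora;
  return HeaderResult::Error;  // any other negative code is a real failure
}

// Same assert-and-abort approach as the example program: stop rather than
// carry on with partially initialized header structures.
inline void assertNotError(HeaderResult r) {
  assert(r != HeaderResult::Error);
}
```

The point of separating NotTheora from Error is that only the first should fall through to trying another codec. A stream that identified itself as Theora and then produced a bad header should not be handed to libvorbis.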

Once the headers are all read, the next step is to decode each Theora data packet. To do this we first call th_decode_packetin. This adds the packet to the decoder. A return value of '0' means we can get a decoded frame as a result of adding the packet. A call to th_decode_ycbcr_out gets the decoded YUV data, stored in a th_ycbcr_buffer object. This is basically an array of three image planes, one each for Y, Cb and Cr.

ogg_int64_t granulepos = -1;
int ret = th_decode_packetin(stream->mTheora.mCtx,
                             packet,
                             &granulepos);
assert(ret == 0);

th_ycbcr_buffer buffer;
ret = th_decode_ycbcr_out(stream->mTheora.mCtx, buffer);
assert(ret == 0);
...copy yuv data to SDL YUV overlay...
...display overlay...
...sleep for 1 frame...

The 'granulepos' returned by the th_decode_packetin call holds information regarding the presentation time of this frame, and what frame contains the keyframe that is needed for this frame if it is not a keyframe. I'll write more about this in a future post when I cover synchronising the audio and video. For now it's going to be ignored.
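
To give a rough idea of how those two pieces of information share one number: the upper bits of the granulepos count frames up to the last keyframe, and the lower bits count frames since that keyframe. The split point is the keyframe_granule_shift field of th_info. The sketch below shows only the bit arithmetic. GranuleParts and splitGranulepos are illustrative names of my own, and the code ignores any bitstream-version-specific adjustment, so for real use the library's th_granule_frame and th_granule_time are the functions to call.

```cpp
#include <cassert>
#include <cstdint>

// Illustrative stand-in, not a libtheora type.
struct GranuleParts {
  std::int64_t keyframe;  // upper bits: frame count up to the last keyframe
  std::int64_t offset;    // lower bits: frames decoded since that keyframe
};

// `shift` plays the role of th_info's keyframe_granule_shift.
inline GranuleParts splitGranulepos(std::int64_t granulepos, int shift) {
  assert(granulepos >= 0 && shift >= 0 && shift < 63);
  const std::int64_t mask = (std::int64_t(1) << shift) - 1;
  return GranuleParts{granulepos >> shift, granulepos & mask};
}
```

Adding the two parts gives the frame count for the packet, which is what a presentation time is derived from once it is divided by the frame rate.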

Once we have the YUV data I use SDL to create a surface, and a YUV overlay. This allows SDL to do the YUV to RGB conversion for me. I won't copy the code for this since it's not particularly relevant to using the libtheora API - you can see it in the github repository.
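
SDL specifics aside, the copy step mostly comes down to respecting each plane's stride, since a row in the decoder's buffer can be wider than the visible picture. Here is a minimal sketch of copying one plane. Plane is a stand-in I declare with the same four fields as libtheora's th_img_plane so the snippet compiles without the library, and copyPlane is a hypothetical helper, not code from the example program.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

// Stand-in with the same fields as th_img_plane (an assumption made so
// this sketch is self-contained).
struct Plane {
  int width;            // visible bytes per row
  int height;           // number of rows
  int stride;           // bytes from the start of one row to the next
  unsigned char* data;  // first byte of the first row
};

// Copy a plane row by row into a destination that has its own pitch.
inline void copyPlane(const Plane& src, unsigned char* dst, int dstPitch) {
  assert(src.data != nullptr && dst != nullptr);
  for (int y = 0; y < src.height; ++y) {
    std::memcpy(dst + static_cast<std::ptrdiff_t>(y) * dstPitch,
                src.data + static_cast<std::ptrdiff_t>(y) * src.stride,
                static_cast<std::size_t>(src.width));
  }
}
```

The same routine would be called once for each of the three planes. Two things to watch when targeting an overlay: the chroma planes are smaller than the luma plane for subsampled video, and the overlay format may expect the Cb and Cr planes in a different order than the decoder provides them.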

Once the YUV data is blitted to the screen the final step is to sleep for the period of one frame so the video can play back at approximately the right framerate. The framerate of the video is stored in the th_info object that we got from the headers. It is represented as a fraction, with a numerator and a denominator:

float framerate =
  float(stream->mTheora.mInfo.fps_numerator) /
  float(stream->mTheora.mInfo.fps_denominator);
SDL_Delay((1.0/framerate)*1000);
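
The same delay can be worked out in integer arithmetic, which avoids the float round trip and makes the rounding explicit. This is a sketch only: frameDelayMs is a hypothetical helper, not something from libtheora or SDL, and it assumes the numerator is non-zero.

```cpp
#include <cassert>
#include <cstdint>

// Milliseconds per frame for a frame rate of fpsNumerator/fpsDenominator
// frames per second, i.e. a frame period of fpsDenominator/fpsNumerator
// seconds. Rounded to the nearest millisecond.
inline std::uint32_t frameDelayMs(std::uint32_t fpsNumerator,
                                  std::uint32_t fpsDenominator) {
  assert(fpsNumerator != 0);  // a zero numerator has no defined frame period
  // 64-bit intermediate so that denominator * 1000 cannot overflow.
  const std::uint64_t ms =
      (std::uint64_t(fpsDenominator) * 1000 + fpsNumerator / 2) / fpsNumerator;
  return static_cast<std::uint32_t>(ms);
}
```

For a 25 fps stream (25/1) this gives 40 ms. The result could be passed straight to SDL_Delay. Either way the sleep does not account for the time spent decoding and displaying the frame, which is one reason playback is only approximately at the right rate.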

With all that in place, running the program with an Ogg file containing a Theora stream should play the video at the right framerate. Adding Vorbis playback is almost as easy - the main difficulty is synchronising the audio and video. I'll cover these topics in a later post.



11 Comments:

Blogger Gerv said...

Once we get the ogg_packet from an Ogg stream we need to find out if it is a Theora stream. The approach I'm using to do this is to attempt to extract a Theora header from it.

Is that really the recommended way? I can't believe they created the standard without a way of determining what type each packet is...

2:18 AM  
Blogger Chris Double said...

The first header packet in a stream contains a text identifier enabling detecting the stream type without decoding. These are listed, along with mime types, in rfc 5334.

I could check this manually - I'd have to check if it's a header page, and if it contains the string, etc. Easier to just call the API function to attempt to decode the header. And it means I don't have to worry about the internals of the first header packet for each codec.

2:31 AM  
Blogger Gerv said...

But doesn't that mean you are a) relying for normal operation on the library's error-handling being correct and robust, and b) relying that no other packet you ever come across will, by coincidence, look like a header?

3:34 AM  
Blogger Chris Double said...

Yes, it relies on the library's error handling being robust. libtheora checks if it's a header. If it's a data packet it checks if it has read all the headers previously.

You can check a bit in the packet to see if it's a header packet and libtheora exposes this via the th_packet_isheader function. The API documentation recommends using th_decode_headerin however as th_packet_isheader doesn't check if it's specifically a theora header.

libvorbis doesn't provide this function and also requires calling a header decode function to determine both if it's a header and what the stream type is.

It looks like the standard way of processing the packets.

3:41 AM  
Blogger George Chriss said...

Is it somehow possible to see an error message in addition to the catch-all "grey X" if a <video> element cannot be displayed in Firefox? A specific error message would be helpful in diagnosing broken video.

6:34 AM  
Blogger Chris Double said...

George, it's a good idea - there's a similar request in bug 494379 asking for a console error message if a video can't play due to not being supported. Extending this to just display an error for any playback issue might be the approach to take.

12:43 PM  
Blogger dbt said...

Is it that hard to just implement it right? This is content sniffing all over again.

9:05 PM  
Blogger Chris Double said...

dbt, I'm not sure what you mean by implementing it right. We have to read the packet to find out what type of stream it is. There's no other way to find out other than 'sniffing the content'. An Ogg file can contain any type of stream in any order.

10:12 PM  
Blogger Gregory said...

From Timothy Terriberry (the captcha here is broken for him):

A note on error handling: OC_NOTFORMAT is from the old pre-1.0 API. It happens to be the same as TH_ENOTFORMAT from the new API, but you have to include the old API's header to get it.

Also, there are other common return codes that can occur in normal operation. For example, TH_EBADHEADER gets returned if the comment header claims to have a multi-gigabyte comment in it, when the actual packet is only a few bytes (or other similar conditions). TH_EVERSION will get returned if we ever update the bitstream, making old decoders incompatible, and you try to decode a new stream with an old version of the library. See the API documentation linked in the original post for details.

Ignoring these errors can have security implications (cause crashes) when you try to use uninitialized data structures later. Chris surely knows all of this, and I know this was just meant to be a simple example, but it's worth mentioning.

2:37 AM  
Blogger Chris Double said...

Thanks for the comments Tim and Greg, very helpful!

I've disabled the captcha - it seems to be causing a few problems.

I'll update the example with the correct error codes.

For errors in the example program I'm pretty much just asserting and aborting the program rather than trying to recover or handle them. In the handling of the headers I'm failing the assert on any negative error code.

I've missed a few it seems though as my current example program segfaults on at least one of my bad file testcases - where ogg_stream_packetout returns -1. I'll fix them before I do the next post.

3:39 AM  
Blogger Mircea said...

A question. Scenario: video conferencing. I want to encode images from a video camera and send them to multiple locations. When the codec is initialized I get the headers using th_encode_flushheader. I need to send them to each participant in order to initialize the decoder. Then I send encoded frames by calling th_encode_packetout. I think I need to send the headers each time I send an encoded frame, in case someone joins later and missed the initial packet with the headers.
Is my assumption correct?

3:18 AM  
