Friday, June 05, 2009

Reading Ogg files with JavaScript

On tinyvid.tv I do quite a bit of server-side reading of Ogg files to get information like duration and bitrate when serving details about the media. I wondered whether it would be possible to do this sort of thing using JavaScript running in the browser.

The format of the Ogg container is defined in RFC 3533. The difficulty comes in reading binary data from JavaScript. JavaScript in a page can use the XMLHttpRequest object to retrieve data from a URL, but processing the binary data in an Ogg file is problematic: XMLHttpRequest assumes the response is text or XML (in Firefox at least).
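Once the bytes are in a typed array, the page layout from RFC 3533 can be parsed by hand. The following is a minimal sketch, assuming a modern browser or Node (it uses `Uint8Array`, `DataView` and `BigInt`, none of which Firefox had in 2009). The name `parseOggPage` is hypothetical, and the sketch reads the CRC field without verifying it.

```javascript
// Minimal sketch: parse one Ogg page (RFC 3533, section 6) starting at
// `offset` in a Uint8Array. Returns null if the page is not complete yet.
// The CRC field is read but NOT verified.
function parseOggPage(bytes, offset) {
  // The fixed part of the header is 27 bytes.
  if (bytes.length - offset < 27) return null;
  const view = new DataView(bytes.buffer, bytes.byteOffset + offset, bytes.length - offset);
  // Every page starts with the capture pattern "OggS".
  if (view.getUint32(0) !== 0x4f676753) throw new Error("missing OggS capture pattern");
  const flags = view.getUint8(5);
  const segmentCount = view.getUint8(26);
  const headerSize = 27 + segmentCount;
  if (bytes.length - offset < headerSize) return null;

  // The body size is the sum of the lacing values; a lacing value below
  // 255 marks the end of a packet.
  const packetSizes = [];
  let bodySize = 0;
  let pending = 0;
  for (let i = 0; i < segmentCount; i++) {
    const lace = bytes[offset + 27 + i];
    bodySize += lace;
    pending += lace;
    if (lace < 255) {
      packetSizes.push(pending);
      pending = 0;
    }
  }
  if (bytes.length - offset < headerSize + bodySize) return null;

  return {
    version: view.getUint8(4),                   // stream_structure_version, 0
    continued: (flags & 0x01) !== 0,             // first packet continues from the previous page
    beginOfStream: (flags & 0x02) !== 0,
    endOfStream: (flags & 0x04) !== 0,
    granulePosition: view.getBigInt64(6, true),  // little-endian; -1 if no packet ends here
    serialNumber: view.getUint32(14, true),
    sequenceNumber: view.getUint32(18, true),
    checksum: view.getUint32(22, true),
    packetSizes,                                 // sizes of packet pieces ending on this page
    trailingBytes: pending,                      // start of a packet continued on the next page
    body: bytes.subarray(offset + headerSize, offset + headerSize + bodySize),
    pageSize: headerSize + bodySize,             // advance `offset` by this for the next page
  };
}
```

Calling it repeatedly and advancing `offset` by `pageSize` walks successive pages; a `null` result means more data has to be fetched first.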

One way of handling binary data is described in this Mozilla Developer article. The method works in Firefox, and with it I can download and read the data in the Ogg file.
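The widely circulated form of this workaround overrides the response MIME type with `charset=x-user-defined`, so each byte arrives as a single character (bytes 0x80 to 0xFF map to U+F780 to U+F7FF), then masks each character code with `0xff` to recover the byte. A minimal sketch follows; `binaryStringToBytes` and `loadBinary` are hypothetical names, and the `Uint8Array` result is a modern convenience that 2009-era browsers lacked (a period version would fill a plain array).

```javascript
// Turn a "binary string" (one character per byte, as produced by the
// x-user-defined charset) into bytes. Masking with 0xff drops the high bits
// that x-user-defined adds to bytes 0x80-0xFF.
function binaryStringToBytes(text) {
  const bytes = new Uint8Array(text.length);
  for (let i = 0; i < text.length; i++) {
    bytes[i] = text.charCodeAt(i) & 0xff;
  }
  return bytes;
}

// Browser-only: fetch a URL and hand its contents to onBytes as a Uint8Array.
function loadBinary(url, onBytes, onError) {
  const req = new XMLHttpRequest();
  req.open("GET", url, true);
  // Stop the browser decoding the response with a real text charset.
  req.overrideMimeType("text/plain; charset=x-user-defined");
  req.onreadystatechange = function () {
    if (req.readyState !== 4) return;
    // 206 Partial Content is the reply to a successful Range request.
    if (req.status === 200 || req.status === 206) {
      onBytes(binaryStringToBytes(req.responseText));
    } else {
      onError(req.status);
    }
  };
  req.send(null);
}
```

In current browsers, setting `req.responseType = "arraybuffer"` before `send()` delivers the bytes directly as `req.response`, which makes the string round-trip unnecessary.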

Ideally I don't want to download the entire file, since it might be a large video. I thought that by handling the 'progress' event or ready state 3 (data received) I'd be able to look at the data retrieved so far. This does work, but on each access to the 'responseText' attribute in these events Firefox copies its internal buffer of downloaded data into a JavaScript string. Doing this every time a portion of the file arrives causes heavy memory use and slowdowns, making the approach impractical for even small files.

I think the only reliable way to process the file in chunks is to make multiple byte-range requests. Is there a more reliable way to read binary files from JavaScript using XMLHttpRequest? I'd like to be able to process the file in chunks using an Iteratee-style approach.
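Byte-range requests and an Iteratee-style consumer fit together naturally: the consumer sees one chunk at a time and decides whether it wants more, so the driver only ever holds the current chunk. The following is a minimal callback-based sketch. The iteratee protocol (`{ done, result }` or `{ done, next }`), `enumerateRanges`, `takeBytes`, and the `fetchRange(start, end, callback)` hook are all hypothetical; in a browser `fetchRange` would send an XMLHttpRequest with a `Range` header, and it is assumed to deliver an empty chunk once `start` is past the end of the file.

```javascript
// Value for an HTTP Range header covering bytes start..end inclusive.
function rangeHeader(start, end) {
  return "bytes=" + start + "-" + end;
}

// Iteratee protocol (hypothetical): a function given a Uint8Array chunk, or
// null at end of input, returning { done: true, result } to stop or
// { done: false, next } to ask for the next chunk.

// Driver: request chunkSize bytes at a time until the iteratee is done.
// fetchRange(start, end, callback) is a hypothetical hook delivering the
// bytes in that range; a short or empty chunk means the file has ended.
function enumerateRanges(fetchRange, chunkSize, iteratee, onResult) {
  let offset = 0;
  function step(it) {
    fetchRange(offset, offset + chunkSize - 1, function (chunk) {
      offset += chunk.length;
      const atEnd = chunk.length < chunkSize;
      let state = chunk.length > 0 ? it(chunk) : { done: false, next: it };
      // At end of input, give the iteratee a chance to finish.
      if (!state.done && atEnd) state = state.next(null);
      if (state.done) onResult(state.result);
      else if (atEnd) throw new Error("iteratee did not finish at end of input");
      else step(state.next);
    });
  }
  step(iteratee);
}

// Example iteratee: keep the first n bytes, then stop requesting more.
function takeBytes(n) {
  const parts = [];
  let have = 0;
  function finish() {
    const out = new Uint8Array(have);
    let pos = 0;
    for (const part of parts) {
      out.set(part, pos);
      pos += part.length;
    }
    return { done: true, result: out };
  }
  function it(chunk) {
    if (chunk === null) return finish();
    const part = chunk.subarray(0, n - have);
    parts.push(part);
    have += part.length;
    return have >= n ? finish() : { done: false, next: it };
  }
  return it;
}
```

For example, `rangeHeader(0, 102399)` gives `bytes=0-102399`, the first 100 KB. A server that ignores `Range` replies 200 with the whole file, so a real `fetchRange` should check for status 206.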

I put up a rough, quick demo that loads the first 100KB of a video and displays information from each Ogg packet. It probably works only in Firefox because of the workaround needed to read binary data. Click the 'Go' button on the demo page. This loads transformers320.ogg and displays the contents of the first Ogg physical page.

I decode the header packets for Theora and Vorbis, so the first page shows that it belongs to a Theora stream with a given size and frame rate. Clicking 'Next' moves on to the next page, a Vorbis header with the sample rate and channel information. Clicking 'Next' again gets the comment header for the Theora stream; the demo reads the comments and displays them, and does the same for the Vorbis comment records. As you step through the file with 'Next', it displays the meaning of the granulepos for each page: whether the Theora data is for a keyframe, what time position it is at, etc.
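Decoding these header packets is mostly fixed-offset reads. This sketch assumes each packet has already been reassembled from its page segments into a `Uint8Array`; the offsets follow the Theora and Vorbis I specifications (Theora identification fields are big-endian, Vorbis identification fields and both comment headers little-endian). The function names are hypothetical, only a few fields are extracted, and `TextDecoder` is a modern API.

```javascript
// Byte 0 is the packet type (0x80/0x81/0x82 Theora, 1/3/5 Vorbis) and
// bytes 1-6 spell the codec name.
function headerKind(packet) {
  return {
    type: packet[0],
    codec: String.fromCharCode(...packet.subarray(1, 7)),
  };
}

// Theora identification header (type 0x80): big-endian fields.
function decodeTheoraId(packet) {
  const v = new DataView(packet.buffer, packet.byteOffset, packet.byteLength);
  const u24 = (o) => (v.getUint8(o) << 16) | v.getUint16(o + 1);
  return {
    version: [v.getUint8(7), v.getUint8(8), v.getUint8(9)].join("."),
    width: u24(14),                                // PICW, displayed width
    height: u24(17),                               // PICH, displayed height
    frameRate: v.getUint32(22) / v.getUint32(26),  // FRN / FRD
    granuleShift: (v.getUint16(40) >> 5) & 0x1f,   // KFGSHIFT
  };
}

// Vorbis identification header (type 1): little-endian fields.
function decodeVorbisId(packet) {
  const v = new DataView(packet.buffer, packet.byteOffset, packet.byteLength);
  return {
    channels: v.getUint8(11),
    sampleRate: v.getUint32(12, true),
    nominalBitrate: v.getInt32(20, true),
  };
}

// Comment headers (Theora 0x81, Vorbis 3) share a layout after the 7-byte
// prefix: a vendor string, then a count of "NAME=value" strings, each with
// a little-endian 32-bit length prefix and UTF-8 contents.
function decodeComments(packet) {
  const v = new DataView(packet.buffer, packet.byteOffset, packet.byteLength);
  const utf8 = new TextDecoder("utf-8");
  let pos = 7;
  const readString = () => {
    const len = v.getUint32(pos, true);
    const text = utf8.decode(packet.subarray(pos + 4, pos + 4 + len));
    pos += 4 + len;
    return text;
  };
  const vendor = readString();
  const count = v.getUint32(pos, true);
  pos += 4;
  const comments = [];
  for (let i = 0; i < count; i++) {
    const entry = readString();
    const eq = entry.indexOf("=");
    // Field names are case-insensitive, so store them upper-cased.
    const name = (eq < 0 ? entry : entry.slice(0, eq)).toUpperCase();
    comments.push([name, eq < 0 ? "" : entry.slice(eq + 1)]);
  }
  return { vendor, comments };
}
```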

Something like this could be used to read metadata from Ogg files, read subtitle information, show duration, etc. More interesting would be to implement a Theora and/or Vorbis decoder in JavaScript and see how it performs.
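The granulepos arithmetic behind the keyframe and time display is small enough to show directly. This sketch assumes the granule position is a `BigInt` (as `DataView.getBigInt64` returns) and that `granuleShift` and `frameRate` come from the Theora identification header; the helper names are hypothetical. The Theora time follows libtheora's `th_granule_time`, which for bitstream version 3.2.1 and later treats `keyframe + offset` as a frame count and so gives the end time of the frame.

```javascript
// Sketch: interpret Ogg granule positions (BigInt) for Theora and Vorbis.
// A granulepos of -1 means no packet finishes on that page.

// Theora packs two counters: the frame number of the last keyframe in the
// high bits, and the frames since that keyframe in the low granuleShift bits.
function theoraGranule(granulepos, granuleShift) {
  const shift = BigInt(granuleShift);
  const keyframe = granulepos >> shift;
  const offset = granulepos - (keyframe << shift);
  return { keyframe, offset, isKeyframe: offset === 0n };
}

// Seconds at the end of the frame the granulepos refers to. Streams older
// than bitstream version 3.2.1 count frames from 0, hence the extra frame.
function theoraTime(granulepos, granuleShift, frameRate, pre321 = false) {
  if (granulepos < 0n) return null;
  const { keyframe, offset } = theoraGranule(granulepos, granuleShift);
  const frames = keyframe + offset + (pre321 ? 1n : 0n);
  return Number(frames) / frameRate;
}

// Vorbis granulepos is the PCM sample count at the end of the last packet
// completed on the page.
function vorbisTime(granulepos, sampleRate) {
  if (granulepos < 0n) return null;
  return Number(granulepos) / sampleRate;
}
```

Assuming the stream starts at granule 0, a Vorbis stream's duration is `vorbisTime` of the granulepos on its last page, which with byte-range requests means fetching only the tail of the file.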

The main issues with doing this from JavaScript seem to be:
  • Handling binary data using XMLHttpRequest in a cross browser manner
  • Processing the file in chunks so the entire file does not need to be kept in memory
  • Files need to be hosted on the same domain as the page. tinyvid.tv adds the W3C Access Control headers so they can be accessed cross-domain, but it also hosts some files on Amazon S3, where these headers can't be added. As a result, even tinyvid itself can't use XMLHttpRequest to read those files.



9 Comments:

OpenID idpage said...

The attribution link for the original idea is gone, but is still at http://web.archive.org/web/20071103070418/mgran.blogspot.com/2006/08/downloading-binary-streams-with.html

6:30 PM  
OpenID carmen said...

I noticed a few months ago that Ogg videos were actually playing inline on archive.org with my Minefield build.

Presumably some HTML5 video thing with an existing API to get the metadata you need?

9:37 PM  
Blogger Chris Double said...

Carmen, yes, I believe archive.org will play back using the native browser support. The video element provides a reasonable amount of metadata (duration, etc.) which can be used.

My Ogg/JavaScript explorations are mainly to prototype different ideas; they don't really have much to do with the native video element. I'm exploring how far JavaScript can go :)

10:44 PM  
Blogger Drakim said...

I think the only reliable way to process the file in chunks is to use byte range requests and do multiple requests.

How about using Comet and printing the file chunk after chunk, until JavaScript has determined that it has had enough and closes the Comet connection?

It's essentially the exact same thing, but you don't need to do an entirely new request each time, which should make it faster and nicer.

9:40 AM  
Blogger ginger said...

oggz-dump in a browser! I love it!

9:35 PM  
Blogger sull said...

How are you doing it server-side? Perl, PHP, other?

8:36 AM  
Blogger Chris Double said...

This example doesn't do anything server side. It's all client side.

8:55 AM  
Blogger rdza said...

Works on iPhone OS3.0.

It would be great to have a pure JavaScript decoder, first for Vorbis and then for Theora, especially for the iPhone, as Mobile Safari opens all audio and video it supports in fullscreen, defeating the whole purpose of mashing up HTML5 tags.

Perhaps a pure JS Ogg handler shim could solve this for all handicapped platforms.

8:00 PM  
Blogger enliteneer said...

I'm looking to retrieve and parse binary data every second or so, but I'm worried that eventually too much of the computer's memory will be used...

The amount of data transferred each time isn't too big, around 6 KB or so, but how does XMLHttpRequest handle the memory allocation? If I keep requesting the data, is new memory allocated every time?

What's the best way to retrieve a binary file, parse it, de-allocate, and repeat?

7:07 AM  
