GSoC 2018 work on mediakit

dhruv · February 23, 2018, 8:02pm

Hello, I want to work on embedded subtitle support for mediakit in Haiku. @jua and @PulkoMandy are the possible mentors for this project. I am currently going through the BeBook for mediakit. Can you please provide some more pointers to go on?

Barrett · February 23, 2018, 11:43pm

Hi dhruv, I think you are missing a possible mentor.

Subtitles should be implemented at the codec level of our API and made available through BMediaTrack. The idea is that the codecs should make subtitles available in two formats, styled text and bitmap. We already have our own styled text format, so the implementation should build on that.

The main steps I can think of right now are:

Implement an API to decode subtitles which will be implemented by the codecs.
Implement a filter to render subtitles to a bitmap
Find a way to customize the rendering of subtitles on the bitmap (font, position etc.)
Add support for the new formats in the media_kit headers
Add support in BMediaFile
Implement support in the ffmpeg plugin
Implement support in MediaPlayer

I’m pretty sure there’s more to add, this is just to give a general idea.

jua · February 24, 2018, 10:24am

Hey dhruv, thanks for your interest in this!

First of all I think we should define a scope for the project. Subtitle rendering can mean a lot of things these days, spanning from “simple” to “really complex”.

Subtitle support is mainly composed of the following parts:

Allow the extraction of subtitle data from media container formats (i.e. file formats such as MKV or MP4). This is merely getting the raw (encoded) subtitle stream, and needs to be integrated with our ffmpeg add-on. ffmpeg itself already supports this. It also needs to allow enumerating subtitle streams (files can contain any number) and getting metadata about them (e.g. format, language, etc.).
A media node add-on which gets the subtitle data as input and renders it into a video stream of bitmap raw data as output; possibly composites it into destination video (see below).
Extend the support in our MediaPlayer to select and show subtitles to use the newly created infrastructure.
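
To make the "enumerate subtitle streams and read their metadata" part more concrete, here is a minimal self-contained sketch. It does not call ffmpeg; `StreamKind`, `StreamInfo`, `SubtitleStreams` and `FindSubtitleByLanguage` are hypothetical names invented for illustration. In ffmpeg itself, subtitle streams are the ones whose codec type is `AVMEDIA_TYPE_SUBTITLE`, and the language is usually found under the `language` key of the stream metadata, so a real reader would fill a structure like this from those fields.

```cpp
#include <string>
#include <vector>

// Hypothetical stream kinds; a real reader would map ffmpeg's
// AVMEDIA_TYPE_* values onto something like this.
enum class StreamKind { Audio, Video, Subtitle };

// Hypothetical per-stream metadata as a demuxer might report it.
struct StreamInfo {
    int         index;     // position of the stream in the container
    StreamKind  kind;
    std::string codec;     // e.g. "ass", "subrip", "hdmv_pgs_subtitle"
    std::string language;  // language code, empty if unknown
};

// Return only the subtitle streams, in container order.
std::vector<StreamInfo> SubtitleStreams(const std::vector<StreamInfo>& all)
{
    std::vector<StreamInfo> result;
    for (const StreamInfo& stream : all) {
        if (stream.kind == StreamKind::Subtitle)
            result.push_back(stream);
    }
    return result;
}

// Return the container index of the first subtitle stream in the wanted
// language, or -1 if there is none.
int FindSubtitleByLanguage(const std::vector<StreamInfo>& all,
    const std::string& language)
{
    for (const StreamInfo& stream : all) {
        if (stream.kind == StreamKind::Subtitle && stream.language == language)
            return stream.index;
    }
    return -1;
}
```

A player could call `SubtitleStreams()` to populate a subtitle menu and `FindSubtitleByLanguage()` to preselect the user's preferred language.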

This may sound simple, but there are still many open questions in this which you could work on during GSoC:

We will need to define data formats for subtitles which can be sent to a media-node. Should they be a common format or do we want to pass along raw encoded subtitle data? Maybe we could even support both (a basic format for simple uses, the raw encoded data for complex formats).
The handling of time in the subtitle media node (especially in complex formats which support text animation).
Will the subtitle media node also composite the rendered subtitle bitmaps onto the video stream or is that the task of another media node?
What subtitle formats do we want to support, and where do we want to use 3rd-party libraries, and where implement things ourselves (and do we want that at all)? For example, in modern complex formats, the de-facto standard being ASS (no really, it’s the name…), it definitely makes sense to use a library, in this case https://github.com/libass/libass, to do the actual rendering.
Some subtitle formats allow the embedding of fonts. Currently we do not support such things due to limitations in our app_server (it’s the same problem which keeps WebPositive from being able to use web-fonts). If you’re interested, you could also work on integrating support for application-supplied font data in app_server.
Possible bonus feature: would be cool if our ffmpeg add-on not only supported extraction of subtitle data when demuxing (playback), but also muxing subtitle streams into containers (i.e. “add subtitle data to a video file”).
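
As a rough illustration of the time-handling question above, the simplest possible model is a list of cues with start and end times, queried with the current playback time. This is only a sketch under that assumption: `Cue` and `ActiveCues` are hypothetical names, and animated formats such as ASS need far more than this (the renderer must be asked for a new image per frame).

```cpp
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical decoded subtitle event. Times are in microseconds, the unit
// the Media Kit uses for bigtime_t.
struct Cue {
    int64_t     start;  // first instant the cue is visible
    int64_t     end;    // first instant the cue is no longer visible
    std::string text;
};

// Return every cue visible at `now`. Cues may overlap (ASS allows several
// events on screen at once), so the result can hold more than one entry.
std::vector<Cue> ActiveCues(const std::vector<Cue>& cues, int64_t now)
{
    std::vector<Cue> active;
    for (const Cue& cue : cues) {
        if (cue.start <= now && now < cue.end)
            active.push_back(cue);
    }
    return active;
}
```

The half-open interval (start included, end excluded) means two back-to-back cues never show at the same instant.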

Oh, and don’t worry: we do not expect you to answer these questions now. I just want to give you an idea of what kind of work needs to be done, so you can decide if you want to do it.

As for more pointers, have you already taken a look at the code of e.g. our MediaPlayer, or the old BeOS R5 Media Kit sample code?

Barrett · February 24, 2018, 10:43am

Please stop spreading this insane idea of having a media_node handle everything. This is bad by design. The subtitles should be available through BMediaTrack and BMediaFile; whether the application running on top of them wants to handle subtitles in a media_node is entirely the programmer’s choice.

If libass is going to be integrated, then the right place for it is as a filter in the codec API. The same applies to the ffmpeg muxing.

jua · February 24, 2018, 10:47am

Like it or not, that’s the way the Media Kit is designed.

“This is bad by design.”

I disagree.

Barrett · February 24, 2018, 10:51am

So please show me a feature in Haiku which is implemented the way you describe. And explain to me why Be engineers dropped the idea of using BFileInterface and media_nodes to handle muxing/demuxing and instead implemented a codec API with extractors and filters after BeOS R4. Were they dumb to do so?

I have stated various reasons in the past to avoid this. Can you show me why you disagree and what the advantage of doing so would be?

jua · February 24, 2018, 11:07am

The advantages are all those of data flow graph based media systems, the primary one IMO being flexibility. It’s why pretty much every other major media API is graph-based. Having the subtitle rendering as a separate node allows you to use it for more than just playback of a video file with subtitles. You can use it in every other context you’d like, whenever you want to overlay subtitles (or more generically: timed text) over a stream of video, whatever its source, whatever its destination, even in video applications that were never meant to support subtitles (just rewire it in Cortex, done!). It also makes it easy to render subtitles from a different source than the video source. You could even use the subtitle renderer node as a video titler effect node of sorts (given the feature-set of ASS, it would be quite fancy even) in a video editor application.

Barrett · February 24, 2018, 11:46am

Graph based? The best you can do with the media_kit right now is N:1 connections. I will not repeat the difficulties in handling formats and latencies in such a non-trivial scenario, as I have already explained them in different places.

Right now there’s no such thing. However, the design I have been thinking about for a few months doesn’t prevent creating nodes to do that, and in fact we’d need to implement default nodes for it someday. But the design flaw in your idea is that codecs would depend on nodes, and this is completely bad for a whole lot of reasons.

I think you must be talking about your own imaginary media framework, because this isn’t how our codec API works. You are free to rewrite it to work only on media nodes, but this isn’t how it works right now, for good reasons as I said.

Are you aware of the current problems due to non-homogeneous implementations of nodes in the system? What you describe there isn’t possible right now and would require writing a lot more code.

OK, the goal looks very interesting, and it is what everyone wants to do, but it is completely at odds with how the codec kit and media kit work. As I said, right now you can’t do that unless you rewrite the codecs to work in such a way (and then you’d realize why it’s a bad idea).

If what you say were true, we wouldn’t have a codec API and everything would be handled by nodes. As I said, there are good reasons for not doing so.

The whole point is to implement those features at the codecs level independently from the media_kit. Then once those features are available, you’d be able to write all your fancy media nodes to do what you want, but still I don’t want an app to be necessarily tied to the media nodes.

We have a codec API which is intended to be made public someday; the whole point of this API is to provide an interface for extractors, decoders, encoders, filters and so on. In the future those interfaces will be implemented in true media_nodes, independently of the codec (for example, it doesn’t matter whether it’s an audio or a video filter), and programs will be able to instantiate such nodes to do real-time processing. But once again, nodes have nothing to do with the implementation of subtitles itself; we already have an API in place which is obviously made for doing what I described.

jua · February 24, 2018, 11:57am

Well, I do not want to further the discussion into offtopic-land, let’s keep focussed on subtitles.

I don’t see how subtitle rendering belongs in video codecs. If you want to integrate it, I could understand integrating it into the video renderer node (pretty much how subtitles currently work in the MediaPlayer already). By the way, the main reason why MediaPlayer’s subtitle rendering wasn’t made its own media node is IIRC performance issues with the Media Kit implementation. Those should be solved some day, but until then, I’d be ok with keeping it in the video renderer.

Barrett · February 24, 2018, 12:19pm

This is not off-topic land; believe it or not, the media framework has to be seen from a general point of view to make each component fit.

Yes, advanced rendering should be done somewhere outside the codecs, also because it’d be hard to make it work correctly there without making it more complex than needed. Unfortunately, right now it’d be a wasted effort since other components are missing.

So, since subtitles can come in either text or bitmap format depending on the library and the container, we want to provide a way to interoperate. Most apps will not want to deal with text subtitles, so the idea is more or less this:

We support three formats: bitmap, text and styled text.
When a subtitle is available in text format we want the codec to be able to render it into a bitmap.

The rendering above is just a simple rendering. Basically we want the bitmap format to be always available. Performance is really outside this topic; I’d be happy to discuss it in another place…
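
The "bitmap is always available" rule can be sketched as a simple fallback. Everything here is hypothetical (`SubtitleFormat`, `Bitmap`, `Subtitle`, `RenderSimple`, `AsBitmap` are invented names, not existing codec API), and the renderer is a placeholder that only reserves a fixed-size cell per byte of text instead of drawing glyphs.

```cpp
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

enum class SubtitleFormat { Bitmap, Text, StyledText };

// Hypothetical 32-bit bitmap, one uint32_t per pixel.
struct Bitmap {
    int width;
    int height;
    std::vector<uint32_t> pixels;  // width * height entries
};

// Hypothetical decoded subtitle: either pixels or text, depending on format.
struct Subtitle {
    SubtitleFormat format;
    std::string    text;    // used by Text and StyledText
    Bitmap         bitmap;  // used by Bitmap
};

// Placeholder renderer: reserves an 8x16 cell per byte of text and fills it
// with opaque white. Real code would draw glyphs with a font engine.
Bitmap RenderSimple(const std::string& text)
{
    Bitmap out;
    out.width = static_cast<int>(text.size()) * 8;
    out.height = 16;
    out.pixels.assign(static_cast<std::size_t>(out.width) * out.height,
        0xFFFFFFFFu);
    return out;
}

// The bitmap form is always obtainable: native bitmaps pass through, text
// formats go through the simple renderer (styling is ignored in this sketch).
Bitmap AsBitmap(const Subtitle& subtitle)
{
    if (subtitle.format == SubtitleFormat::Bitmap)
        return subtitle.bitmap;
    return RenderSimple(subtitle.text);
}
```

An app that only wants something to draw on top of the frame would call `AsBitmap()` and never look at the format; an app that wants its own rendering would read the text instead.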

dhruv · February 24, 2018, 8:39pm

Thank you so much, @jua and @Barrett, for your support. I’ll try to work on your suggestions and submit a code snippet as soon as possible. First I’ll try to extract a separate subtitle file from a video and then obtain a bitmap from it. Please give me some more suggestions and links to related study materials and code, so that I can understand the problem and the solution well and work quickly.
Any help would be highly appreciated.

Barrett · February 25, 2018, 12:42am

@dhruv you’re welcome.

I’d say we would be really happy to see a patch that fixes a bug in the current ffmpeg plugin.

You can start from here:

https://git.haiku-os.org/haiku/tree/src/add-ons/media/plugins/ffmpeg

Other than that, I’d suggest reading the media_kit documentation and our codec API, and studying how ffmpeg handles subtitles.

You can also find some easy tickets at https://dev.haiku-os.org/wiki/EasyTasks

Feel free to ask for help; we are also available in the official IRC channel.

PulkoMandy · February 25, 2018, 9:49am

Hi,

Ok, let’s add my own view on this project as well.

Barrett already linked the ffmpeg plug-in:
https://git.haiku-os.org/haiku/tree/src/add-ons/media/plugins/ffmpeg

This is where most of our audio and video decoding is done (except for a few specific formats where ffmpeg was not up to the task - but we can ignore that for now). In particular, the decoding is in AVFormatReader.cpp (extracting data from files) and AVCodecDecoder.cpp (decoding data streams to audio or video).

This is eventually used to implement BMediaFile and BMediaTrack:
https://git.haiku-os.org/haiku/tree/headers/os/media/MediaFile.h
https://git.haiku-os.org/haiku/tree/headers/os/media/MediaTrack.h

As the name implies, BMediaFile represents a file with “media” (audio and/or video) contents. It can also be an HTTP stream or a DVD disk or something else we can read and decode data from. A file is made of multiple tracks (usually at least one audio and one video track, but it can be more complex than that - for example there could be multiple audio tracks in different languages).

So, on this side of things, we will need a new kind of BMediaTrack that allows access to the subtitle data. As was already mentioned, subtitles can in some cases be text or rich text, and in other cases bitmap images.

Once we have this part (the decoding) working, the other side of the work is to get the subtitles to be displayed over the video. Currently, MediaPlayer has support for showing subtitles coming from a separate file, but these will short-circuit most of the Media Kit - they are handled directly in the MediaPlayer.

I don’t know enough about the media nodes framework to decide which solution is best, so I will let you do your own research on the existing code in MediaPlayer, and decide whether subtitles could be handled just the same as audio and video, or if this is unsuitable and some other solution should be used. One thing to remember is that it would be great if the solution was a generic one: not just for MediaPlayer, but for example also useful for showing subtitles on YouTube videos in the web browser.

It may be interesting to take inspiration from how it is done on other platforms. For example, what does the GStreamer API for subtitles look like? What fancy things does it allow?

Barrett · February 25, 2018, 10:34am

I’m pretty sure you didn’t mean this, but I’m going to clarify to avoid confusing the student. We don’t need a new kind of BMediaTrack in the sense of a derived class, but a new type alongside the others in the media_kit, e.g. audio, video, subtitles and so on. So when the buffer is read, it will be filled with the subtitle data.

Barrett · February 25, 2018, 11:09am

The whole point is that, per the current design, we don’t want the codecs to depend on the media_kit. We shouldn’t generally assume that the programmer wants to use the media_kit at all. The general idea is that decoded data is made easily available through BMediaTrack, and then at the upper level someone can easily implement a node if needed. One thing to consider is that we can hardly provide a node that handles everything an app might want. Instead, most apps will just want to access the bitmap to draw on top of the frames, which is why a simple rendering should happen on the codec side.

Having a rendering node would be problematic right now. For example, how would you manage the position and other settings of the subtitles? There’s no way right now to have a reliable API on top of the hypothetical rendering node. And even if this were resolved, for example using some kind of port protocol, there’s no assurance it could be managed in a way that can be considered a stable API.

Now, let’s suppose there’s a node for each component of the codec. We’d have a chain like this:

[FileReader] → [Demultiplexer] → [Decoder] → [Filter] → [Consumer]

This assumes we want to do that for a single media format. Now suppose we want to handle audio, video and subtitles together.

At the Demultiplexer node we’d have 3 “arrows” going out. The audio arrow would go to the system mixer. The video and subtitle paths would first go through their respective decoders, and then we’d have to overlay the bitmaps somewhere, right? So we’d also need a kind of video mixer node to do that.
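
Whatever component ends up doing the overlaying, the per-pixel work is ordinary alpha compositing. Below is a minimal sketch assuming 8-bit RGBA buffers with non-premultiplied alpha; `Frame` and `CompositeOver` are hypothetical names and this is not how MediaPlayer or any existing node does it.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical frame type: 8-bit RGBA, row-major, non-premultiplied alpha.
struct Frame {
    int width;
    int height;
    std::vector<uint8_t> rgba;  // width * height * 4 bytes
};

// Blend `overlay` onto `video` with its top-left corner at (left, top),
// using the "source over" formula per colour channel, rounded to nearest:
//   out = (src * alpha + dst * (255 - alpha) + 127) / 255
// Overlay pixels falling outside the video frame are skipped, and the
// alpha channel of the video frame is left untouched.
void CompositeOver(Frame& video, const Frame& overlay, int left, int top)
{
    for (int y = 0; y < overlay.height; y++) {
        for (int x = 0; x < overlay.width; x++) {
            int vx = left + x;
            int vy = top + y;
            if (vx < 0 || vy < 0 || vx >= video.width || vy >= video.height)
                continue;

            std::size_t srcOffset
                = (static_cast<std::size_t>(y) * overlay.width + x) * 4;
            std::size_t dstOffset
                = (static_cast<std::size_t>(vy) * video.width + vx) * 4;
            const uint8_t* src = &overlay.rgba[srcOffset];
            uint8_t* dst = &video.rgba[dstOffset];

            unsigned alpha = src[3];
            for (int c = 0; c < 3; c++) {
                dst[c] = static_cast<uint8_t>(
                    (src[c] * alpha + dst[c] * (255 - alpha) + 127) / 255);
            }
        }
    }
}
```

The position arguments are exactly the kind of setting (where on the frame the subtitle goes) that the caller needs to control, which is part of the API question raised above.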

Let me add here that BMediaEventLooper is unsuitable for something like that, because you can’t really manage non-linear paths with it due to its oversimplified latency system.

Now imagine using something like that in a complex app like MediaPlayer.

Problems:

How would you manage synchronization between the frames, given that every node has an independent latency? In theory any jitter could make the frames drift apart, and right now it’s very hard to recover “externally” from such a situation.
It’d work perfectly as a showcase, but once the app needs to do something a little beyond the “standard”, problems arise: the code lives somewhere else, and there is no way to control how the nodes do their job.

And last but not least, each node would have its own thread (per the current media_kit design). Do you think that would be a computationally optimal solution? It’d waste a lot of resources. That’s why this idea of “do everything using a node” is completely bad. It can’t work. It may make sense to have system nodes for easily playing wav files, yes, but don’t expect it to be a solution for complex apps.
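
To put numbers on the synchronization problem, here is a deliberately simplified static model: each branch of the graph has a total latency equal to the sum of its nodes' latencies, and the faster branches must be delayed to match the slowest one. `BranchLatency` and `AlignmentDelays` are hypothetical helpers, not Media Kit API, and the model assumes fixed latencies, which is exactly what jitter breaks in practice.

```cpp
#include <algorithm>
#include <cstdint>
#include <numeric>
#include <vector>

// Total latency of one branch: the sum of the latencies (in microseconds)
// of the nodes on it.
int64_t BranchLatency(const std::vector<int64_t>& nodeLatencies)
{
    return std::accumulate(nodeLatencies.begin(), nodeLatencies.end(),
        int64_t(0));
}

// Extra delay each branch needs so that all branches reach the mixing
// point at the same time as the slowest one.
std::vector<int64_t> AlignmentDelays(
    const std::vector<std::vector<int64_t>>& branches)
{
    std::vector<int64_t> totals;
    for (const std::vector<int64_t>& branch : branches)
        totals.push_back(BranchLatency(branch));

    int64_t slowest = totals.empty()
        ? 0 : *std::max_element(totals.begin(), totals.end());
    for (int64_t& total : totals)
        total = slowest - total;
    return totals;
}
```

For a video branch with node latencies of 5000 and 12000 µs and a subtitle branch with 5000 and 3000 µs, the subtitle branch would need 9000 µs of extra delay; if any node's real latency drifts from its reported value, that figure is wrong and the bitmaps no longer line up.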

dhruv · March 6, 2018, 2:37pm

Can I use OpenCV for that purpose?

extrowerk · March 6, 2018, 3:40pm

Uh, OpenCV would be overkill and unsuited for this task.

jua · March 6, 2018, 5:48pm

I think you misunderstood the “extraction” part.

When talking about video subtitles, we must first distinguish two ways they can be present in a video:

“Burned in” subtitles (sometimes also called “hard subtitles”) are text that has been put into the video image beforehand, before it was encoded. The text is part of the video image. You cannot hide the text, it’s always visible. Since they’re just part of the video, they don’t need any special support in the video player, and are thus outside the scope of the Media Kit subtitle support.
“Soft” subtitles, which are overlaid onto the video at playback time by the video player. The video image itself contains no subtitle text, but the file comes with information on when to display which text. The video player then composites this text live onto the video while it plays. These subtitles can be turned on/off by the user, and the video can contain many subtitle tracks (e.g. several languages) which can be selected. Supporting these soft subtitles is what Media Kit subtitle support is about.

So, “extraction” here means getting the soft subtitle data that comes with a video file. How it comes with the video file can happen in one of two ways:

The subtitle text data is multiplexed into the media container (usually a video file or network stream). Modern container formats like MKV or MP4 support having tracks of subtitle data alongside the other data tracks in them (which are video tracks, audio tracks, etc). “Extracting” the subtitle text means grabbing it when demultiplexing the container. We use ffmpeg for getting audio and video tracks already, and it supports subtitle tracks as well, so ideally we’d use ffmpeg for getting the subtitle tracks too.
The subtitle text data is stored in a separate file that comes alongside the video file. “Extraction” in this case means finding the associated subtitle file and reading it.

Both of these methods have their advantages and disadvantages, and both are being used out there. The solution would need to be flexible enough to accommodate both.
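
For the separate-file case, "finding the associated subtitle file" usually starts with guessing names from the video's file name. A minimal sketch follows; `SidecarCandidates` is a hypothetical helper, and the extension list is an assumption for illustration, not what MediaPlayer actually looks for.

```cpp
#include <string>
#include <vector>

// Build candidate sidecar subtitle file names for a video path by swapping
// the file extension. The extension list is illustrative only.
std::vector<std::string> SidecarCandidates(const std::string& videoPath)
{
    static const char* const kExtensions[] = { ".srt", ".ass", ".ssa", ".sub" };

    // Strip the extension, but only if the last dot is inside the file name
    // (not in a directory such as "/my.videos/clip") and is not the leading
    // dot of a hidden file.
    std::string::size_type slash = videoPath.find_last_of('/');
    std::string::size_type nameStart
        = slash == std::string::npos ? 0 : slash + 1;
    std::string::size_type dot = videoPath.find_last_of('.');

    std::string stem = videoPath;
    if (dot != std::string::npos && dot > nameStart)
        stem = videoPath.substr(0, dot);

    std::vector<std::string> candidates;
    for (const char* extension : kExtensions)
        candidates.push_back(stem + extension);
    return candidates;
}
```

The caller would then check which of the candidates actually exist next to the video and offer them alongside any embedded subtitle tracks.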

dhruv · March 8, 2018, 11:51am

Thank you so much for the elaboration. I understand the problem well now, but I am unable to get a good start on the coding part. I have read the Media Kit documentation and I am a bit familiar with the ffmpeg library. Can you please point me to some papers or other material specifically about the subtitle extraction part? Any help would be highly appreciated.

jua · March 9, 2018, 5:51pm

Sorry, I don’t think there’s much in terms of documentation specific to subtitle extraction. In general, useful resources would be e.g. the ffmpeg library documentation, the Media Kit docs and example code, and the current MediaPlayer code.