I found this in the documentation you sent:
"Backends
The multimedia functionality is not implemented by Phonon itself, but by a back end - often also referred to as an engine. This includes connecting to, managing, and driving the underlying hardware or intermediate technology. For the programmer, this implies that the media nodes, e.g., media objects, processors, and sinks, are produced by the back end. Also, it is responsible for building the graph, i.e., connecting the nodes.
The backends of Qt use the media systems DirectShow (which requires DirectX) on Windows, QuickTime on Mac, and GStreamer on Linux. The functionality provided on the different platforms are dependent on these underlying systems and may vary somewhat, e.g., in the media formats supported.
Backends expose information about the underlying system. It can tell which media formats are supported, e.g., AVI, mp3, or OGG.
A user can often add support for new formats and filters to the underlying system, by, for instance, installing the DivX codex. We can therefore not give an exact overview of which formats are available with the Qt backends."
It seems they use GStreamer on Linux. That’s fine, but just as they use DirectShow on Windows, shouldn’t Haiku’s native media_server be used on Haiku? This isn’t Linux; Haiku’s media_server is the native solution here. Sticking to it as much as possible should go a long way toward keeping things consistent and avoiding confusing codec issues. GStreamer would also add yet another layer, and I’m a bit worried about latency. That said, this is just for the long term; I can certainly understand if you want to try easier solutions first.