Getting to Know Gstreamer, Part 4

Intro

In my previous post for this series, I showed how I assembled a pipeline that composites subtitles over a video read from an MPEG-4 file, then recodes the result as another MPEG-4 stream for delivery.  

To center the subtitles and prevent a nasty strip of transparent (checkered) background along the bottom, I had to learn how to place a blocking probe on the subtitle source element, pre-roll the stream, and discover the dimensions of the original video (what I like to call the "action video" track, to distinguish it from the mostly static subtitle video track).

In order to preserve the audio track, my pipeline had to decode, re-sample, re-encode it, and then multiplex it with the new video stream.

Next I'll show how I added a fade effect to the subtitle track, and how I plan to encode that for production.

To recap, here is my pipeline so far:
  videomixer name=mixer sink_0::zorder=0 sink_1::zorder=1
  filesrc location=my-video.mp4 ! typefind name="typefind" ! decodebin name=demuxer ! mixer.sink_0
  filesrc location=subtitles.srt ! subparse ! textrender ! capsfilter name="capsfilter" ! mixer.sink_1
  mpegtsmux name=muxer
  demuxer. ! audioconvert ! audioresample ! faac ! muxer.
  mixer. ! x264enc ! muxer. ! some element that accepts an MPEG-4 stream

Fade Effect

Giving subtitles a fade effect requires the ability to manipulate the alpha value of the subtitle video layer.  Its background is already transparent, so when we manipulate the alpha value we will see only the subtitle opacity change while the action video continues to play behind it.

The videomixer sink pads already have an "alpha" property that we could set statically, with "videomixer ... sink_1::alpha=0.5" for example (the same pad-property syntax the pipeline uses for zorder).  This sets alpha to 50%, where it will remain until changed.  We want to make alpha change dynamically during subtitle transitions, and make it appear as smooth as possible, as if a human being were controlling it by manipulating a slider on a mixer-board.  For subtitle fade-ins, we want to slide the alpha of the subtitle layer from 0 to 1, and do the opposite for fade-outs.

Dynamically controllable properties are always defined with the GST_PARAM_CONTROLLABLE flag. It should say this in the element's documentation for such a property.  Unfortunately, in the case of videomixer, the alpha property is a property of each sink pad, not the element itself, so it does not show up in the documentation that Gstreamer generates from the element's class.  The videomixer sink pads are instances of GstVideoMixer2PadClass, and its definition assigns the CONTROLLABLE flag to its alpha property.

The Gstreamer API provides control-sources and control-bindings to control dynamically controllable properties.  A control-binding is an object that implements binding.get_value(timestamp).  Elements with dynamically controllable properties always check for the existence of a control-binding for each controllable property.  They use it by calling binding.get_value(timestamp) with the timestamp of each buffer that they process.  The value a control-source produces is always in the range 0..1, and a direct control-binding translates that value to the property's range.  So for a property whose values may range from 0..255, a control value of 0.5 becomes 128.  The alpha property of videomixer sinks also ranges from 0..1, so the translation in this case is an identity function.
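The translation from the 0..1 control range to a property's range can be sketched in a few lines of plain Python.  This is only an illustration of the arithmetic described above, not Gstreamer's own code; the function name scale_to_range and its integer flag are hypothetical.

```python
import math


def scale_to_range(normalized, minimum, maximum, integer=True):
    """Map a control value in 0..1 onto [minimum, maximum].

    Hypothetical helper for illustration; not part of the Gstreamer API.
    """
    # Clamp first so out-of-range inputs cannot leave the property range.
    normalized = min(1.0, max(0.0, normalized))
    value = minimum + (maximum - minimum) * normalized
    # Integer properties need a whole number; round half up.
    return math.floor(value + 0.5) if integer else value


print(scale_to_range(0.5, 0, 255))                   # 128
print(scale_to_range(0.5, 0.0, 1.0, integer=False))  # 0.5, identity for alpha
```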

For a fade-in effect that begins at timestamp x and ends at y, we want to set up a control-binding so that get_value(ts) returns 0 for ts <= x, returns 1 for ts >= y, and returns a monotonically increasing value between 0 and 1 when x < ts < y, so that the transition appears smooth and continuous.

The control-binding delegates get_value(ts) to a control-source object that knows how to translate timestamps into output values to achieve the effect we want.  We will create an interpolation control-source, set it up to interpolate linearly (there are a few other modes we won't be concerned with here) and link it to the control-binding.  We then use the control-source to set control points as pairs of (ts, alpha) that define the beginning and end of each fade effect.
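To make the idea of linear interpolation between control points concrete, here is a minimal sketch in plain Python.  It is not how Gstreamer implements its control-source; it only shows the behaviour we are after.  The function name interpolate_linear is hypothetical, and holding the value constant before the first and after the last control point is an assumption of this sketch.

```python
from bisect import bisect_right

SECOND = 10**9  # nanoseconds per second, mirroring Gst.SECOND


def interpolate_linear(points, ts):
    """Return the value at timestamp ts for a sorted list of (ts, value) points.

    Assumption of this sketch: the value is held constant before the
    first control point and after the last one.
    """
    times = [t for t, _ in points]
    i = bisect_right(times, ts)
    if i == 0:
        return points[0][1]
    if i == len(points):
        return points[-1][1]
    (t0, v0), (t1, v1) = points[i - 1], points[i]
    # Straight line between the two surrounding control points.
    return v0 + (v1 - v0) * (ts - t0) / (t1 - t0)


# A one-second fade-in starting at the 2 second mark.
fade_in = [(2 * SECOND, 0.0), (3 * SECOND, 1.0)]
for ts in (1 * SECOND, 2 * SECOND, 2500 * SECOND // 1000, 3 * SECOND, 4 * SECOND):
    print(ts, interpolate_linear(fade_in, ts))
```

Halfway through the fade (2.5 seconds) the sketch returns 0.5, which is the smooth ramp we want the subtitle's alpha to follow.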

Here is a snippet of code we could run before the pipeline begins to play that would set the fade-in and fade-out effects for one subtitle beginning at the 2 second mark and ending at 6.5 seconds into the video:

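import gi
gi.require_version('Gst', '1.0')
gi.require_version('GstController', '1.0')
from gi.repository import Gst, GstController

# Bind a linearly interpolating control-source to the subtitle pad's alpha.
mixer = self.pipeline.get_by_name('mixer')
pad = mixer.get_static_pad('sink_1')
cs = GstController.InterpolationControlSource()
cs.set_property('mode', GstController.InterpolationMode.LINEAR)
binding = GstController.DirectControlBinding.new(pad, 'alpha', cs)
pad.add_control_binding(binding)

# Fade-in beginning at the 2 second mark and lasting one second.
cs.set(2 * Gst.SECOND, 0)
cs.set(3 * Gst.SECOND, 1)

# Fade-out beginning at the 5 second mark and lasting 1.5 seconds.
# Timestamps are integers, so convert the fractional product explicitly.
cs.set(5 * Gst.SECOND, 1)
cs.set(int(6.5 * Gst.SECOND), 0)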

Gstreamer timestamp values are usually in nanoseconds.  Gst.SECOND is the constant 10**9 (the number of nanoseconds in a second), which makes this more readable.

The Subtitle Frame-rate Problem

We're not there yet.  Remember that elements implement dynamically controllable properties by calling binding.get_value(timestamp) with the timestamp of each buffer they process.  For a content stream whose timestamp changes continuously in small increments of a few milliseconds each, this will result in smooth transitions from one control point to another.  

It took me some time to discover that the textrender element does not deliver a series of buffers whose timestamps increase in small increments.  Instead, it outputs only one buffer per subtitle.  Each Gstreamer buffer has a timestamp and a duration, and videomixer uses these to synchronize buffers from its multiple sink pads before it composites them.  

This makes perfect sense.  Why would we expect an element whose output changes only abruptly and at infrequent intervals to generate a stream of identical buffers at frequent intervals, when elements downstream already know how to synchronize streams with different rates of change?  That would be a waste of CPU cycles.

The problem is that this breaks our control-binding on the alpha property of the videomixer sink that receives subtitle video buffers.  The element will only call binding.get_value(ts) when a new buffer arrives from textrender.src, which happens only once per subtitle.  So we will never see any fade effect.
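The effect of the buffer rate on the fade can be simulated without Gstreamer at all.  The sketch below samples a one-second fade-in at the timestamps of incoming buffers, first with one buffer per subtitle and then with a hypothetical 10 frames-per-second stream.  The function alpha_at and the frame rate are illustrative assumptions.

```python
SECOND = 10**9  # nanoseconds per second, mirroring Gst.SECOND


def alpha_at(ts, start=2 * SECOND, end=3 * SECOND):
    """Linear fade-in from 0 at start to 1 at end, clamped outside."""
    return min(1.0, max(0.0, (ts - start) / (end - start)))


# One buffer per subtitle: the binding is sampled once, at the cue start.
sparse = [alpha_at(ts) for ts in [2 * SECOND]]

# A hypothetical 10 fps stream: one buffer every 100 ms across the fade.
frame = SECOND // 10
dense = [alpha_at(2 * SECOND + n * frame) for n in range(11)]

print(sparse)  # a single sample of 0.0, so alpha never moves
print(dense)   # eleven samples rising from 0.0 to 1.0
```

With a single sample taken at the start of the cue, alpha is stuck at its first value; only the densely timestamped stream gives the binding enough opportunities to move along the ramp.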

To get around this, I had to insert a second videomixer element between textrender.src and videomixer.sink_1.  Here is how that top-level pipeline element spec looks now; the change is the added "videomixer background=transparent ! queue" segment:

filesrc name=subsrc location={srtfilepath} ! subparse ! textrender ! capsfilter name="capsfilter" ! videomixer background=transparent ! queue ! mixer.sink_1

This new videomixer doesn't know anything about subtitles, so it creates a video stream with the default frame-rate, so that each output buffer's timestamp increases in small increments from the one before.  This is not very efficient, but now we can see the fade-effects work!

When I realized what was happening, I wanted to see if I could somehow define the control-source and control-binding in a way that would get videomixer to call binding.get_value(ts) where ts is the timestamp of the buffer on the action-video sink.  

Looking at the code again,

binding = GstController.DirectControlBinding.new(pad, 'alpha', cs)
pad.add_control_binding(binding)

we see that pad is given as a parameter to the control-binding constructor, and again as the self argument to add_control_binding().

Therefore, I was hoping that I could write the above as:

binding = GstController.DirectControlBinding.new(subtitle_sink, 'alpha', cs)
action_video_sink.add_control_binding(binding)

to get the videomixer to compare the control-binding control points with the frequently changing action-video timestamps.

This did not work.

I also tried the videorate element instead of the second videomixer.  The top-level pipeline element now looked like this:

filesrc name=subsrc location=subtitles.srt ! subparse ! textrender ! videorate ! capsfilter name="capsfilter" ! queue ! mixer.sink_1

I also set the framerate field on the caps filter to match the action-video stream:

newcaps.set_value('framerate', st.get_fraction('framerate'))

The pipeline no longer played.

Unless I discover a better solution, I will have to continue to use the extra videomixer element that does nothing but convert the frame-rate of the subtitle track.

Obtaining the Fade-effect Control Points

I did not talk about how the app obtains the fade-effect control points.

For development, I had the app pre-parse the subtitle file and extract each subtitle's start and end times.  Then I assumed a fixed duration for each fade-in and fade-out and calculated the end points accordingly.
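That calculation might look something like the following sketch.  The function name fade_points, the default durations, and the rule for shrinking fades on very short cues are all assumptions made for illustration, not the app's actual code.  Each returned pair could then be passed to cs.set(ts, alpha).

```python
SECOND = 10**9  # nanoseconds per second, mirroring Gst.SECOND


def fade_points(start_s, end_s, fade_in_s=1.0, fade_out_s=1.5):
    """Return [(timestamp_ns, alpha), ...] for one subtitle cue.

    The fade-in starts at the cue start; the fade-out finishes at the
    cue end.  Default durations are arbitrary values for this sketch.
    """
    start, end = round(start_s * SECOND), round(end_s * SECOND)
    fade_in, fade_out = round(fade_in_s * SECOND), round(fade_out_s * SECOND)
    # Shrink the fades proportionally if the cue is too short to hold both.
    total = fade_in + fade_out
    if total > end - start:
        fade_in = (end - start) * fade_in // total
        fade_out = (end - start) - fade_in
    return [
        (start, 0.0),
        (start + fade_in, 1.0),
        (end - fade_out, 1.0),
        (end, 0.0),
    ]


for ts, alpha in fade_points(2.0, 6.5):
    print(ts, alpha)
```

For the cue from 2 to 6.5 seconds, this produces control points at 2, 3, 5 and 6.5 seconds, matching the fade-in and fade-out described earlier.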

I recently discovered that I can add any text I want after the timestamps for each subtitle cue in the subtitles source file.  subparse will ignore it.  I can now use this to encode custom durations for each fade-in and fade-out effect that override a default.
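As a sketch of how the app might read such per-cue overrides, the code below parses a Subrip timing line followed by optional fade settings.  The "fade-in=" and "fade-out=" annotation syntax is invented here purely for illustration, as are the function name parse_timing and the default durations; I have not settled on a format.

```python
import re

# Hypothetical annotation format, invented for this sketch:
#   00:00:02,000 --> 00:00:06,500 fade-in=0.5 fade-out=2
TIMING = re.compile(
    r"(\d+):(\d\d):(\d\d)[,.](\d{3})\s*-->\s*(\d+):(\d\d):(\d\d)[,.](\d{3})(.*)"
)
OPTION = re.compile(r"(fade-in|fade-out)=(\d+(?:\.\d+)?)")


def parse_timing(line, default_in=1.0, default_out=1.0):
    """Return (start_s, end_s, fade_in_s, fade_out_s), or None if no match."""
    match = TIMING.match(line.strip())
    if match is None:
        return None
    h1, m1, s1, ms1, h2, m2, s2, ms2 = (int(g) for g in match.groups()[:8])
    start = h1 * 3600 + m1 * 60 + s1 + ms1 / 1000
    end = h2 * 3600 + m2 * 60 + s2 + ms2 / 1000
    fades = {"fade-in": default_in, "fade-out": default_out}
    # Any recognized option overrides its default; other text is ignored.
    for key, value in OPTION.findall(match.group(9)):
        fades[key] = float(value)
    return start, end, fades["fade-in"], fades["fade-out"]


print(parse_timing("00:00:02,000 --> 00:00:06,500 fade-in=0.5 fade-out=2"))
print(parse_timing("00:00:08,000 --> 00:00:09,250"))
```

A cue without annotations simply falls back to the default durations, so existing subtitle files keep working unchanged.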

In An Ideal World ...

In an ideal world, we wouldn't have to composite subtitles on the action-video.  The VTT format would include a way to encode fade effects on subtitles, and all players (including the browsers) would recognize and respond to them.  This may still happen, but I don't see it happening any time soon.

Since we do have to composite the subtitle stream over the video-stream to achieve fade effects, the next-best solution would be for me to clone gst-plugins-base, update subparse to recognize a fade-in, fade-out specification for each subtitle cue in the Subrip format, then write a new subparse, say "subparse2" that knows how to use that and also generate a video stream with a frame-rate rapid enough to allow for smooth fade-effects.  Then my app would no longer have to use a control-binding.

This still leaves the problem of modifying the height and width of the subtitle video stream to match the action-video stream.  Given sufficient time and resources, I would clone gst-plugins-good and add a property to videomixer with a name like "default-aspect-ratio" that takes the name of a sink.  videomixer would then require all the other sinks to match the aspect ratio of the given sink.  Then my pipeline spec would use that to set the video-stream as the default aspect ratio for videomixer, and my app would not have to manipulate the pipeline to set the correct aspect ratio on the subtitle stream.

I am not sure whether such a solution is compatible with Gstreamer's design.  Gstreamer's authors seem to prefer caps-filters as a way of matching content-stream attributes over properties on sinks and source pads.  I would have to check in with the developer/maintainer community.  Several of my attempts to begin discussions or get questions answered on the Gstreamer developer e-mail list have not received any responses, so this might not either.

Meanwhile, the solution I found works.  It just took me a lot of effort to get to it.
