Real Time Stereoscopic Streaming

Streaming Under DirectShow

DirectShow is the part of DirectX that handles all media operations. DirectX is built on Microsoft COM (Component Object Model), and this is especially visible when working with DirectShow: everything is a component, so everything is COM.
The aim of this report is not to explain COM in detail, but a short overview is useful.
COM is mainly based on interfaces. A client can ask an object at runtime for any of the interfaces it supports, which is quite useful. For instance, suppose a programmer wants to write Direct3D 7 compatible software because he would like to reuse old code he wrote for Direct3D 7. He has installed DirectX 9, and most users of his application also run DX9. Thanks to COM, he will not have much trouble building his application, because he can “ask” the objects for their DX7 interface. Concretely, the QueryInterface method is designed for this task.
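The idea behind QueryInterface can be sketched in portable C++. This is only a toy model under simplifying assumptions: real COM identifies interfaces by GUID, returns an HRESULT and reference-counts its objects, and the names IObject, IRenderer7, IRenderer9 and Renderer are made up for the illustration.

```cpp
#include <string>

// Simplified stand-in for IUnknown (hypothetical, not the real COM interface).
struct IObject {
    // Returns a pointer to the requested interface, or nullptr if unsupported.
    virtual void* queryInterface(const std::string& iid) = 0;
    virtual ~IObject() = default;
};

// Two "generations" of the same service, each exposed as its own interface.
struct IRenderer7 : IObject {
    virtual std::string describe7() const = 0;
};
struct IRenderer9 : IObject {
    virtual std::string describe9() const = 0;
};

// One object implementing both interfaces.
class Renderer : public IRenderer7, public IRenderer9 {
public:
    void* queryInterface(const std::string& iid) override {
        // The cast selects the right base sub-object before erasing the type.
        if (iid == "IRenderer7") return static_cast<IRenderer7*>(this);
        if (iid == "IRenderer9") return static_cast<IRenderer9*>(this);
        return nullptr; // interface not supported
    }
    std::string describe7() const override { return "legacy interface"; }
    std::string describe9() const override { return "current interface"; }
};
```

A client holding any interface pointer of the object can call queryInterface("IRenderer7") and cast the result to IRenderer7*, which is the role QueryInterface plays for the old Direct3D 7 code in the example above.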
It is also possible to register COM objects in the Windows registry. Most registered objects are Windows components, and DirectShow filters belong to this category.
To render media, DirectShow uses a smart connection process: for each media item, DirectShow builds a graph, inserts the appropriate source filter (a file reader, a web cam, a network source, etc.), and tries to connect the output pin of this filter to the input pin of a renderer, inserting intermediate (transform) filters where needed.

For instance, consider what happens when a user opens an MPEG4 file. The media player only knows it is a file, so a file source is inserted into the graph. This file source identifies the type of media, i.e. MPEG4, thanks to recognition patterns stored in the registry. The output pin of this filter is therefore typed as “MPEG4 stream”.
The media player then searches (once again in the registry) for the filter associated with MPEG4, and finds the MPEG4 Decoder. If no filter can be associated with the stream at this point, Media Player displays the well-known message “cannot find the codec”. As soon as the decoder is added to the graph and connected to the source filter, it reports “VIDEO” as the major type of its output pin. This way, DirectShow can complete the graph by adding a video renderer, and start playing it. Of course, this is a very simplified description, because I did not talk about subtype negotiation (RGB24, YUV, R5G6B5, etc.) or about how to bind the rendering filters to the main application. Nonetheless, it sums up the overall behavior of DirectShow.
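The lookup loop described above can be modeled in a few lines. This is a toy sketch, not the DirectShow algorithm: the real graph builder reads filter entries from the registry, ranks candidates by merit and tries several of them, whereas here a std::map with one filter per media type stands in for the registry. FilterEntry and buildChain are hypothetical names.

```cpp
#include <cstddef>
#include <map>
#include <optional>
#include <string>
#include <vector>

// One registered filter: its name and the media type of its output pin.
// An empty outputType marks a renderer (no output pin).
struct FilterEntry {
    std::string name;
    std::string outputType;
};

// Starting from the media type of the source pin, keep adding the filter
// registered for the current type until a renderer is reached.
std::optional<std::vector<std::string>> buildChain(
    const std::map<std::string, FilterEntry>& registry,
    std::string type,
    std::size_t maxDepth = 8)
{
    std::vector<std::string> chain;
    for (std::size_t i = 0; i < maxDepth; ++i) {
        const auto it = registry.find(type);
        if (it == registry.end()) {
            return std::nullopt; // the "cannot find the codec" case
        }
        chain.push_back(it->second.name);
        if (it->second.outputType.empty()) {
            return chain; // renderer reached, the graph is complete
        }
        type = it->second.outputType;
    }
    return std::nullopt; // chain too long, probably a cycle in the registry
}
```

With a registry that maps "MPEG4" to a decoder producing "VIDEO", and "VIDEO" to a video renderer, buildChain returns the decoder followed by the renderer; an unknown type yields no chain at all.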
Then… all we had to do was to write a couple of filters.

The RTP Sender

The main part of the media section of this project was to make it possible for a graph to be split between two distant computers: beginning the graph on one, and finishing it on the other. Although the sender and the receiver filters were developed at the same time, I will explain the sender first.

Setting up the filter

A filter is nothing more than an ATL COM DLL, so the easiest way to create a new filter is to create a new ATL COM Wizard project with Visual C++. This kind of DLL is a bit special in that it has to export specific symbols such as “DllRegisterServer” and “DllUnregisterServer”, and has to contain a template array and a template count in order to be registered as a component.
To connect one filter to another, DirectShow uses a smart type negotiation. In other words, the first step in developing a filter is to define its pins.
The sender can be seen as a rendering filter, which means it has only input pins and no output pin.

DirectShow connects two filters depending on whether the media types of the two pins to be connected are compatible. In the case of the sender, we specified no type, which means anything can be connected to this pin. Since the aim of this filter is to send streams, we did not want any restriction at this level.
The filter itself, the supported media types and the pins have to be registered in the registry. This is done when the filter is registered as a Windows component.
Although this operation can be done manually with the Windows registry editor, it is easier to use regsvr32, which can register any control or component.
Here is the part of the code used to register the component. It has to be present in the DLL source code.


// Media type accepted by the input pin: NULL major type and subtype,
// i.e. no restriction.
const AMOVIESETUP_MEDIATYPE pintype =
{
    &MEDIATYPE_NULL,        // clsMajorType
    &MEDIASUBTYPE_NULL      // clsMinorType
};

// The single input pin of the filter.
const AMOVIESETUP_PIN pins[] =
{
    {
        L"Input",           // strName
        FALSE,              // bRendered
        FALSE,              // bOutput
        FALSE,              // bZero
        FALSE,              // bMany
        &CLSID_NULL,        // clsConnectsToFilter
        L"",                // strConnectsToPin
        1,                  // nTypes
        &pintype            // lpTypes
    }
};

// The filter itself.
const AMOVIESETUP_FILTER filter =
{
    &CLSID_CRTPSendFilter,  // clsID
    FILTERNAMEL,            // strName
    MERIT_DO_NOT_USE,       // dwMerit
    1,                      // nPins
    pins                    // lpPin
};

// Class factory template: ties the CLSID to the creation function.
CFactoryTemplate g_Templates[] =
{
    {
        FILTERNAMEL,
        &CLSID_CRTPSendFilter,
        CRTCSendFilter::CreateInstance,
        NULL,
        &filter
    }
};
int g_cTemplates = sizeof(g_Templates) / sizeof(g_Templates[0]);

As we can see, we first declare the media type we will use, then the input pin, then the filter, and finally the entry point of the DLL used to create an instance of the filter (CRTCSendFilter::CreateInstance).

Thanks to this mechanism, any application can create an instance of our filter, even if it does not know the name of the class.
Calling the COM function CoCreateInstance loads the DLL, if it is not already loaded, and calls the entry point specified in the registration, i.e. CRTCSendFilter::CreateInstance.
Here is an example that creates a filter instance knowing only its GUID:
CoCreateInstance(CLSID_CRTPSendFilter, NULL, CLSCTX_INPROC_SERVER, IID_IBaseFilter, (void**)&sender);
Here, sender is a pointer to an IBaseFilter interface, and CLSID_CRTPSendFilter is the GUID of the sending filter.

Filter features

Even if the registration seems to be a painful task, it only allows the user to call the creation function. If nothing more is implemented, the filter cannot be inserted into a graph.
Since we registered CreateInstance, we first have to define this method, which merely returns something like “new CRTCSendFilter”. CRTCSendFilter inherits from CBaseRenderer, which is the base class for rendering filters.
Then we have to override the “CheckMediaType” method. DirectShow calls this method to check whether a connection is possible. At this stage of the development, we had a simple filter that could be inserted into any graph and connected to any other filter, except rendering filters.

From this point we could start working on the real features of the filter. The purpose of this filter is to put the data into RTP packets and send them over the network. But those packets are useless if the receiver does not know what kind of media they carry. So the main function of this filter is to act like a server, and to “answer” two types of queries from client applications:
– Which media type are you streaming? (Sound, Video, Compressed, Raw…)
– What is the format of your stream? (For instance uncompressed RGB24, RGB32 …)
Of course, the media type and the format are passed on unchanged: they are retrieved from the filter connected to the sending filter. For instance, if a simple web cam is connected, the type will be “video” and the format could be “RGB24”. On the other hand, if a DivX compressing filter is inserted between the web cam and the sending filter, the media type will be “divx”, and the format will contain the compression parameters. (It is a bit more complex in practice, but this simple explanation should help understanding.)
We had to keep in mind that RTP runs over UDP, and UDP is a connectionless protocol, which means there is no real notion of client and server. Typically, a UDP sending application asks the user where to send packets, and sends them even if the address is wrong or the receiver is no longer online. But since we had to negotiate the media type and its format over TCP, which is a connection-oriented protocol, we could use that TCP connection to give the UDP stream connection-like behavior.
Doing this was not a heavy task: once the client connects to the TCP port, the sending filter stores its IP address and also negotiates a port number to use. Once the port number and the IP address are known, the filter uses them to send RTP data.
The advantage of connection-oriented transmission is that we can detect disconnection, and we do not need to tell the sender where it should send the data.
Once the media type is sent, the receiver can understand and decode the RTP stream sent to it.
To sum up, the main objective of the sending filter is to split data into RTP packets, and to send the type of the stream and its format to any receiving application.
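The splitting step can be sketched as follows. This is a minimal sketch rather than the code of the filter: it writes the fixed 12-byte RTP header defined in RFC 3550 in front of each chunk, and it assumes that the marker bit flags the last packet of a sample, a common convention that the text above does not specify. The function name packetize and the maxPayload parameter are hypothetical.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Size of the fixed RTP header (RFC 3550), without CSRC entries or extension.
constexpr std::size_t kRtpHeaderSize = 12;

// Splits one media sample into RTP packets of at most maxPayload payload bytes.
std::vector<std::vector<std::uint8_t>> packetize(
    const std::vector<std::uint8_t>& sample,
    std::uint8_t payloadType,   // 7-bit RTP payload type
    std::uint16_t firstSeq,     // sequence number of the first packet
    std::uint32_t timestamp,    // same timestamp for every packet of the sample
    std::uint32_t ssrc,         // identifier of the sending source
    std::size_t maxPayload)     // assumed payload budget, kept below the MTU
{
    assert(maxPayload > 0 && payloadType < 128);
    std::vector<std::vector<std::uint8_t>> packets;
    std::uint16_t seq = firstSeq;

    for (std::size_t offset = 0; offset < sample.size();) {
        const std::size_t chunk = std::min(maxPayload, sample.size() - offset);
        const bool last = (offset + chunk == sample.size());

        std::vector<std::uint8_t> pkt(kRtpHeaderSize + chunk);
        pkt[0] = 0x80; // version 2, no padding, no extension, no CSRC
        pkt[1] = static_cast<std::uint8_t>((last ? 0x80 : 0x00) | payloadType);
        pkt[2] = static_cast<std::uint8_t>(seq >> 8);   // big-endian sequence
        pkt[3] = static_cast<std::uint8_t>(seq & 0xFF);
        for (int i = 0; i < 4; ++i) {                   // big-endian 32-bit fields
            pkt[4 + i] = static_cast<std::uint8_t>(timestamp >> (24 - 8 * i));
            pkt[8 + i] = static_cast<std::uint8_t>(ssrc >> (24 - 8 * i));
        }
        const auto first = sample.begin() + static_cast<std::ptrdiff_t>(offset);
        std::copy(first, first + static_cast<std::ptrdiff_t>(chunk),
                  pkt.begin() + static_cast<std::ptrdiff_t>(kRtpHeaderSize));

        packets.push_back(std::move(pkt));
        offset += chunk;
        seq = static_cast<std::uint16_t>(seq + 1); // wraps around at 65535
    }
    return packets;
}
```

Each returned vector is one UDP datagram ready to be sent to the address and port obtained during the TCP negotiation.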

The receiving filter

The RTP receiver is of course the exact opposite of the sender. It connects to a sending filter whose address and TCP port it knows, gets the media type and the media format, and sends the local UDP port to use. It then starts assembling received packets to rebuild the same buffer the sender got from its upstream filter, and finally passes this buffer to the filter connected to the receiver. There is no need to describe the transmission in more detail, because it mirrors what the sending filter does. Instead, I will focus on how this filter completes its negotiation, and on an additional feature of this filter: being a file source filter.

Finishing the connection process.

The receiving filter is the exact opposite of the sending filter: it is a source filter rather than a rendering filter, which means it only has output pins. When the filter is inserted, the media type of its output pin is not defined. For an input pin, that means every media type is acceptable. For an output pin, however, it means the pin cannot be connected to anything.

This filter implements not only the IBaseFilter COM interface, but also ISpecifyPropertyPages. Thanks to this mechanism, any programmer can “query” the component for its property page and display it. A property page is usually written with plain Win32 (message-based programming), and through this page we can prompt the user for the information we require: the connection data.

Once the user validates those entries, the receiver tries to connect to the sending filter, using the specified IP address and port number. If the connection is established, the receiver requests the media type, then the format. After that, the output pin is adjusted dynamically, so DirectShow is able to complete the graph. The port number to be used on the local computer is sent to the sender during this type negotiation, when requesting the format. This way, the sender knows which port to use for streaming data. (The sending filter already knows the IP address of the receiving computer because we use TCP for negotiating the media type.)
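Seen from the sender side, the exchange can be modeled as a small session object. The wire format used by the filters is not described here, so the text commands “TYPE” and “FORMAT <port>” and the “OK”/“ERR” replies below are purely hypothetical, as are the names NegotiationSession and RtpDestination. Only the order of the steps follows the description above, and no socket code is shown.

```cpp
#include <cstdint>
#include <optional>
#include <sstream>
#include <string>
#include <utility>

// Where RTP data must be sent: the IP address comes from the TCP peer,
// the UDP port from the format request.
struct RtpDestination {
    std::string ip;
    std::uint16_t port;
};

class NegotiationSession {
public:
    NegotiationSession(std::string peerIp, std::string mediaType, std::string format)
        : peerIp_(std::move(peerIp)),
          mediaType_(std::move(mediaType)),
          format_(std::move(format)) {}

    // Handles one request line from the receiver and returns the reply.
    std::string handle(const std::string& request) {
        std::istringstream in(request);
        std::string verb;
        in >> verb;
        if (verb == "TYPE") {
            return "OK " + mediaType_;
        }
        if (verb == "FORMAT") {
            unsigned long port = 0;
            if (!(in >> port) || port == 0 || port > 65535) {
                return "ERR bad port";
            }
            // The receiver told us its local UDP port: streaming can start.
            destination_ = RtpDestination{peerIp_, static_cast<std::uint16_t>(port)};
            return "OK " + format_;
        }
        return "ERR unknown request";
    }

    // Empty until the receiver has requested the format.
    const std::optional<RtpDestination>& destination() const { return destination_; }

private:
    std::string peerIp_;
    std::string mediaType_;
    std::string format_;
    std::optional<RtpDestination> destination_;
};
```

In this model the sender has nowhere to send RTP data until destination() holds a value, which matches the idea that the TCP negotiation is what tells the sender where to stream.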

After establishing a connection to the sending filter, the receiver can set up its output pin, so the graph can be completed… and run.

While the graph is running, the main loop reads RTP packets from the network and rebuilds the same data stream as the one processed by the sending filter. If the receiving graph is played while the sending one is not, the type negotiation remains possible, although no data is sent. The result is a black window on the receiver side.
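The rebuilding step can be sketched like this. It is a minimal sketch under stated assumptions, not the code of the filter: packets carry the fixed 12-byte RTP header of RFC 3550 with no CSRC entries or extension, the marker bit flags the last packet of a sample, packets arrive in order, and a sample with a missing packet is simply dropped. The class name SampleReassembler is hypothetical.

```cpp
#include <cstddef>
#include <cstdint>
#include <optional>
#include <utility>
#include <vector>

class SampleReassembler {
public:
    // Feeds one received RTP packet. Returns the rebuilt sample when the
    // packet carrying the marker bit arrives and no packet was lost.
    std::optional<std::vector<std::uint8_t>> push(const std::vector<std::uint8_t>& pkt) {
        constexpr std::size_t kHeader = 12;
        if (pkt.size() < kHeader || (pkt[0] >> 6) != 2) {
            return std::nullopt; // too short or not RTP version 2: ignore
        }
        const auto seq = static_cast<std::uint16_t>((pkt[2] << 8) | pkt[3]);
        const bool marker = (pkt[1] & 0x80) != 0;

        // A gap in the sequence numbers means part of the sample is missing.
        // The cast makes the comparison wrap correctly from 65535 to 0.
        if (haveSeq_ && seq != static_cast<std::uint16_t>(lastSeq_ + 1)) {
            damaged_ = true;
        }
        haveSeq_ = true;
        lastSeq_ = seq;

        buffer_.insert(buffer_.end(),
                       pkt.begin() + static_cast<std::ptrdiff_t>(kHeader), pkt.end());
        if (!marker) {
            return std::nullopt; // sample not complete yet
        }

        std::optional<std::vector<std::uint8_t>> result;
        if (!damaged_) {
            result = std::move(buffer_);
        }
        buffer_.clear(); // start the next sample from an empty buffer
        damaged_ = false;
        return result;
    }

private:
    std::vector<std::uint8_t> buffer_;
    std::uint16_t lastSeq_ = 0;
    bool haveSeq_ = false;
    bool damaged_ = false;
};
```

Note that this sketch trusts the first packet it sees to be the start of a sample; a receiver joining in the middle of a sample would need an extra rule, such as discarding everything up to the first marker.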
Thanks to this method, we are now able to build a graph split between two remote computers. This should make things easier for programmers who want to write a streaming application, because they only need to treat the receiver as a source filter, exactly like a file or a web cam.

Being a file source filter…

In the previous part we explained that the filter is a source filter, exactly like a file or a web cam. In fact, the last feature of this filter is that it really is a file source filter. Of course, we can insert it into a graph manually and enter the connection data through the property page, as described before. Alternatively, we can use a custom file type, “.rml”, containing all this information. More technically, the receiver filter also implements the IFileSourceFilter COM interface. As soon as the file extension is registered in the registry, DirectShow will try to open every “.rml” file with this filter. Of course, we need a higher-level association: if the application used to open a .rml file does not use DirectShow, no component will be created. But Media Player, for instance, is based on this technology. So if the user tries to open a .rml file with Media Player, a graph is created and the file type is identified, so the RTP receiver filter is inserted instead of the standard “File reader” filter.
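Reading the connection data from such a file could look like the sketch below. The actual layout of a .rml file is not given in this article, so the “host=” and “port=” lines are an assumed format chosen only for the illustration, and parseRml and ConnectionInfo are hypothetical names.

```cpp
#include <cstddef>
#include <cstdint>
#include <exception>
#include <optional>
#include <sstream>
#include <string>

// Connection data the receiver needs before it can negotiate the media type.
struct ConnectionInfo {
    std::string host;
    std::uint16_t port;
};

// Parses the text of a .rml file in the assumed "key=value" layout.
std::optional<ConnectionInfo> parseRml(const std::string& text) {
    std::istringstream in(text);
    std::string line;
    std::string host;
    unsigned long port = 0;
    bool havePort = false;

    while (std::getline(in, line)) {
        if (!line.empty() && line.back() == '\r') {
            line.pop_back(); // tolerate Windows line endings
        }
        const std::size_t eq = line.find('=');
        if (eq == std::string::npos) {
            continue; // not a key=value line
        }
        const std::string key = line.substr(0, eq);
        const std::string value = line.substr(eq + 1);
        if (key == "host") {
            host = value;
        } else if (key == "port") {
            try {
                std::size_t used = 0;
                port = std::stoul(value, &used);
                havePort = (used == value.size()); // reject trailing garbage
            } catch (const std::exception&) {
                havePort = false; // not a number, or out of range
            }
        }
    }
    if (host.empty() || !havePort || port == 0 || port > 65535) {
        return std::nullopt;
    }
    return ConnectionInfo{host, static_cast<std::uint16_t>(port)};
}
```

In a real filter this kind of parsing would happen when DirectShow hands the file name to the IFileSourceFilter implementation, before the TCP connection to the sender is opened.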
To sum up, this feature has two major advantages. First, for the end user, only the codec installation is required: once it is done, he can receive network streams with nothing more than Media Player. Second, the programmer does not need to think about networking at all: for him, it is exactly like opening an MPEG movie, or any other kind of media file. No port number, no IP address, and no protocol. Just opening a file. And most media programmers are quite at ease with “opening files”.
