Talk:VideoGrabberComp
From IGSTK
This page is dedicated to the design discussion of the VideoGrabber component.
Please add your comments to existing topics or add new ones.
Goal: An optimal design agreed upon by key developers that can be presented at the next Tcon.
VideoGrabberComp
Naming conventions
- VideoGrabber
- Grabber
- Alternative: API
- Alternative: Communication
- QtMacGrabber, WDMWindowsGrabber, etc.
- E.g. should it be QuickTimeMacGrabber instead of QtMacGrabber (possible confusion between QuickTime and the Qt GUI toolkit)? Frankl
- Should it be DirectShowWindowsGrabber instead of WDMWindowsGrabber (API vs. driver model, etc.)? Frankl
- Should the platform names be removed? There is no ambiguity for a DirectShowGrabber (DirectShow is only available on Windows), but you could have both a QuickTimeMacGrabber and a QuickTimeWindowsGrabber, etc. Frankl
- State Machine names
- Using the Factory mechanism (as in VTK) we can do the following: Luis
- Create a platform-independent class VideoGrabber
- Create specific implementation classes that depend on the platform and the video source to be used
- The standard video interfaces should be tried first (Video for Windows/Video for Linux)
- Camera/card specific interfaces should be attempted when they have special functionalities that are not available in the platform generic interfaces.
- etc. (please add)
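A minimal C++ sketch of the factory idea, assuming a simple registry of candidate implementations rather than VTK's actual factory classes. `GenericWindowsGrabber`, `VendorCardGrabber`, `GrabberEntry` and `VideoGrabberFactory` are hypothetical names, and the priority number is only a stand-in for the policy of trying the standard interfaces first.

```cpp
#include <algorithm>
#include <cassert>
#include <functional>
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Platform-independent interface the application programs against.
class VideoGrabber {
public:
  virtual ~VideoGrabber() = default;
  virtual std::string Name() const = 0;
};

// Hypothetical concrete grabbers; real ones would wrap DirectShow, V4L, etc.
class GenericWindowsGrabber : public VideoGrabber {
public:
  std::string Name() const override { return "GenericWindowsGrabber"; }
};
class VendorCardGrabber : public VideoGrabber {
public:
  std::string Name() const override { return "VendorCardGrabber"; }
};

// One registered implementation: a probe that reports whether it can run on
// this machine, and a creator. Lower priority values are tried first.
struct GrabberEntry {
  int priority;
  std::function<bool()> isAvailable;
  std::function<std::unique_ptr<VideoGrabber>()> create;
};

class VideoGrabberFactory {
public:
  void Register(GrabberEntry entry) {
    assert(entry.isAvailable && entry.create);
    m_Entries.push_back(std::move(entry));
  }

  // Returns the first available implementation in priority order,
  // or nullptr if nothing on this platform can grab video.
  std::unique_ptr<VideoGrabber> CreateBest() const {
    std::vector<const GrabberEntry*> order;
    for (const auto& e : m_Entries) order.push_back(&e);
    std::stable_sort(order.begin(), order.end(),
                     [](const GrabberEntry* a, const GrabberEntry* b) {
                       return a->priority < b->priority;
                     });
    for (const GrabberEntry* e : order)
      if (e->isAvailable()) return e->create();
    return nullptr;
  }

private:
  std::vector<GrabberEntry> m_Entries;
};
```

Camera- or card-specific classes that offer extra functionality could be given a lower priority number when the application asks for that functionality; the sketch leaves that policy out.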
Communication with the Grabber HW
- Do we think it is OK to use the general APIs (QuickTime/DirectShow/V4L), which is a big advantage, or is it necessary to use the hardware-specific APIs (most grabber hardware comes with its own SDK, similar to PolarisTracker, FOBTracker, etc.) in order to set/get the required information? Frankl
- I think there are pros and cons to using general APIs such as QuickTime/DirectShow:
- Pros: ability to handle most hardware
- Cons: difficulty handling hardware-specific features such as brightness gain and offset adjustment, or pseudo-color adjustment in "Live" capture mode Andinet
My suggestion is to mix the two approaches: a general (generic) API with a design that is open to integrating hardware-specific APIs.
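One way to keep a generic API open to hardware-specific features is a separate capability interface that only some grabbers implement. This sketch is hypothetical: `GainOffsetControl`, `VendorGrabber` and `TrySetGain` are illustrative names, and plain setters are used only for brevity (the Get/Set discussion under Component Interface argues for requests instead).

```cpp
#include <cassert>

// Generic operations every grabber supports through QuickTime/DirectShow/V4L.
class VideoGrabber {
public:
  virtual ~VideoGrabber() = default;
  virtual bool Start() = 0;
};

// Optional capability that only some hardware SDKs expose.
class GainOffsetControl {
public:
  virtual ~GainOffsetControl() = default;
  virtual bool SetBrightnessGain(double gain) = 0;
  virtual bool SetBrightnessOffset(double offset) = 0;
};

// A grabber built only on the generic API.
class GenericGrabber : public VideoGrabber {
public:
  bool Start() override { return true; }
};

// Hypothetical vendor grabber: generic interface plus the extra capability.
class VendorGrabber : public VideoGrabber, public GainOffsetControl {
public:
  bool Start() override { return true; }
  bool SetBrightnessGain(double gain) override {
    if (gain <= 0.0) return false;  // reject nonsensical values
    m_Gain = gain;
    return true;
  }
  bool SetBrightnessOffset(double offset) override {
    m_Offset = offset;
    return true;
  }
  double Gain() const { return m_Gain; }
  double Offset() const { return m_Offset; }

private:
  double m_Gain = 1.0;
  double m_Offset = 0.0;
};

// The application asks for the capability instead of assuming it exists;
// dynamic_cast returns nullptr for grabbers that lack it.
inline bool TrySetGain(VideoGrabber& grabber, double gain) {
  auto* control = dynamic_cast<GainOffsetControl*>(&grabber);
  return control != nullptr && control->SetBrightnessGain(gain);
}
```

Generic code keeps working with any grabber, while hardware-specific features become reachable without widening the base class for every SDK.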
Structure of the VideoGrabber Component
VideoGrabber/Tracker Synchronization
- Both streams must be tagged with timestamps
- Temporal calibration is needed to find the lag between the two streams (Spatial calibration is also needed)
- We probably need a temporal calibration procedure, since we don't know where the cards get their time signal from, and the delay of transferring data from the card to an IGSTK image is uncertain. A class similar in spirit to the spatial calibration classes may be useful here. Luis
- Spatial calibration of the video is a more challenging task. If the video device is not tracked using a magnetic or optical tracker, implementing spatial calibration from the video itself is a major undertaking. This is the functionality that the MicronTracker provides. Kitware has done some work in this area for other projects; it may in itself be a topic for future funding. Luis
- Synchronization is tightly coupled to the use of the video comp. Frankl
- Direct display of Real-time (RT) 2D (ultrasound) data:
- In a 2D window: (synchronization is not needed if only ultrasound data is displayed, but most likely virtual tool information will be overlaid and the ultrasound data will be fused with other modalities)
- In a 3D window: mapping RT2D data on a plane in 3D
- -> In this case the synchronization could be done in the (complex) spatial objects assigned to the different windows.
- In IGSTK currently, the synchronization is driven from the View class, upstream to the spatial object classes. The View class notifies its source classes of the next time it is planning to render the scene. The source classes provide the data that best fits that particular time. If no valid data is available for the time indicated by the View class, then the source must indicate this by becoming invisible or changing color. Luis
- Freehand 3D (ultrasound) reconstruction.
- -> What is the best way to do synchronization here (real-time reconstruction, from memory, from disk)?
- Do we need new window classes to handle real-time data or should this be handled by the new spatial objects (that requires both tracking and imaging information)?
- What about the MicronTracker that can supply both a tracking and (two) video streams, how should that system be handled? Frankl
- I suggest having two independent classes for it: one that talks to the Micron device as a tracker, and a separate one that talks to the device as a video source. Luis
Buffering
Double buffer vs. ring buffer
The wiki page discusses both a ring buffer and a double buffer. Which one of the two types is desired? Probably a hybrid buffer that provides features of both.
I imagine that there will be a ring buffer that is an "R" buffer, meaning that all frames in the ring buffer can be read. Outside of the ring buffer, there will be memory allocated for a single "W" frame that the framegrabber will write to. After this "W" buffer has been written by the framegrabber, it can be copied into a frame in the ring buffer. This is only a suggestion; there are probably other ways to combine the features of a ring buffer and a double buffer. Dgobbi 06:56, 22 Mar 2007 (EST)
I agree with David: with the ring buffer, there is not much need for a double buffering mechanism. In practice, the manager of the ring buffer will not advertise an image as available until it has been completely written into the ring buffer. In other words, the ring buffer manager should offer an API that accepts a time as input and produces as output the index of the image in the ring buffer whose time-stamp validity best matches the input time. It may be the case that no image matches the requested time, so there should be a mechanism for returning an event that indicates this circumstance. The request will typically come from an image spatial object representation, which in turn has been asked by a View to modify its presentation to reflect the state of the scene at a particular time. Luis
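A single-threaded sketch of the hybrid buffer described above: the grabber writes into one private "W" frame, and only completed frames are copied into the readable ring, which can then be searched by time. The class and member names are hypothetical, and representing "time-stamp validity" as a time stamp plus a validity duration is an assumption.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <optional>
#include <vector>

struct Frame {
  std::vector<std::uint8_t> pixels;
  double timeStamp = 0.0;  // acquisition time, seconds
  double validity = 0.0;   // how long the frame may be displayed, seconds
};

// Hypothetical hybrid buffer: one private "W" frame plus a readable "R" ring.
class FrameRingBuffer {
public:
  FrameRingBuffer(std::size_t capacity, std::size_t frameBytes)
      : m_Ring(capacity), m_Write() {
    assert(capacity > 0);
    m_Write.pixels.resize(frameBytes);
  }

  // The grabber writes into this frame; readers never see it.
  Frame& WriteFrame() { return m_Write; }

  // Publish the completed write frame, overwriting the oldest ring slot.
  void CommitWriteFrame() {
    m_Ring[m_Next] = m_Write;
    m_Next = (m_Next + 1) % m_Ring.size();
    if (m_Count < m_Ring.size()) ++m_Count;
  }

  // Index of the committed frame whose window [timeStamp, timeStamp + validity)
  // contains `time`, preferring the most recent one. Empty if none does; the
  // caller would turn that into an "invalid time" event.
  std::optional<std::size_t> FindFrame(double time) const {
    std::optional<std::size_t> best;
    for (std::size_t k = 0; k < m_Count; ++k) {
      const Frame& f = m_Ring[k];
      const bool valid = time >= f.timeStamp && time < f.timeStamp + f.validity;
      if (valid && (!best || f.timeStamp > m_Ring[*best].timeStamp)) best = k;
    }
    return best;
  }

  const Frame& GetFrame(std::size_t index) const { return m_Ring.at(index); }

private:
  std::vector<Frame> m_Ring;
  Frame m_Write;
  std::size_t m_Next = 0;   // slot the next commit overwrites
  std::size_t m_Count = 0;  // number of committed frames (at most capacity)
};
```

A real implementation would need a lock (or an atomic publish index) around `CommitWriteFrame` and `FindFrame`, since the grabber and the rendering code run on different threads.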
Format of the buffer: RGB vs. IEEE 1394 IYU2
Eventually, the video component will have to support color images.
Color video signals are predominantly YCrCb and rarely RGB. Conversion from YCrCb to RGB can be done in software, but is a computationally expensive operation. It might be desirable, then, for the buffer to store the video frames in a YCrCb compatible format. The IEEE 1394 (firewire) format called "IYU2" is a 24-bit, full-resolution YCrCb pixel storage format that should be suitable.
Conversion to RGB would be necessary when data is transported out of the buffer for display, since not all VGA adapters have the ability to render YCrCb pixel formats. Dgobbi 06:58, 22 Mar 2007 (EST)
If we have VideoSpatialObject and VideoSpatialRepresentation classes, it might be best if these classes, rather than the VideoGrabber class, do the conversion to RGB. The reason is that YCrCb texture mapping is actually supported on some VGA adapters, and on such hardware we could skip the conversion to RGB completely and hence improve the video display performance. Dgobbi 07:19, 22 Mar 2007 (EST)
For details of the IYU2 format, see http://www.fourcc.org/yuv.php
I would be surprised if the conversion from YCrCb to RGB took more than a couple of milliseconds. Before getting concerned about performance, we should run experiments to profile the time it takes to do this conversion for a typical image size (1024 x 1024?), although we are also working with cameras in the range of 2000 x 3000 pixels. Luis
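To run the suggested experiment, a plain software conversion can be timed directly. This sketch assumes IYU2 stores each pixel as the bytes U, Y, V (check against the fourcc.org page above) and uses the ITU-R BT.601 studio-range coefficients; `Iyu2ToRgb` and `TimeConversionMs` are hypothetical helpers.

```cpp
#include <algorithm>
#include <cassert>
#include <chrono>
#include <cmath>
#include <cstddef>
#include <cstdint>
#include <vector>

// Clamp to [0, 255] and round to the nearest integer.
inline std::uint8_t ToByte(double v) {
  return static_cast<std::uint8_t>(std::lround(std::clamp(v, 0.0, 255.0)));
}

// Convert packed IYU2 (assumed byte order U, Y, V per pixel) to packed RGB
// using BT.601 studio-range coefficients (Y nominally in 16..235).
void Iyu2ToRgb(const std::vector<std::uint8_t>& iyu2, std::vector<std::uint8_t>& rgb) {
  assert(iyu2.size() % 3 == 0);
  rgb.resize(iyu2.size());
  for (std::size_t i = 0; i < iyu2.size(); i += 3) {
    const double u = iyu2[i] - 128.0;               // Cb
    const double y = 1.164 * (iyu2[i + 1] - 16.0);  // scaled luma
    const double v = iyu2[i + 2] - 128.0;           // Cr
    rgb[i] = ToByte(y + 1.596 * v);
    rgb[i + 1] = ToByte(y - 0.813 * v - 0.391 * u);
    rgb[i + 2] = ToByte(y + 2.018 * u);
  }
}

// Time one conversion of a width x height frame, in milliseconds.
double TimeConversionMs(int width, int height) {
  std::vector<std::uint8_t> in(static_cast<std::size_t>(width) * height * 3, 128);
  std::vector<std::uint8_t> out;
  const auto start = std::chrono::steady_clock::now();
  Iyu2ToRgb(in, out);
  const auto stop = std::chrono::steady_clock::now();
  return std::chrono::duration<double, std::milli>(stop - start).count();
}
```

Calling `TimeConversionMs(1024, 1024)` and `TimeConversionMs(3000, 2000)` several times in an optimized build would give the numbers needed; a single run is noisy.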
Component Interface
Get/Set methods
From the meeting discussion Agenda_030807, Minutes_030807.
Which should be used:
- Get/Set methods. Set methods may only be used before RequestInitialize and value testing will only be done when calling RequestInitialize.
- Use Requests instead of Set/Get methods. Values will then be tested for each method, but this will result in more states.
- Put all parameters into a single object and pass this to RequestInitialize.
- Alternatively, pass as many parameters as possible as input to RequestInitialize.
For the VideoGrabber there may be problems with some of these solutions, since different hardware may require different parameters to be set. A solution may be to have different parameter objects (expanding solution #3), mirroring the different grabber classes. Ole Vegard and Geirat
In the cases where specific parameters must be set, then the application has to be made aware of the specific VideoGrabber class that is being used. In other words, instead of using the generic VideoGrabber base class and relying on the factory mechanism for instantiating an object of the correct class, the application developer will have to explicitly instantiate the specific VideoGrabber that matches her/his platform.
Set/Get methods should not be part of the API. They must be implemented using the RequestSet and RequestGet + Event return mechanism. Even if all the parameters are grouped together in a single context structure, the Set method for that structure should be implemented as a RequestSet method and should be managed by the internal state machine of the VideoGrabber. It is the state machine, for example, that should ignore the RequestSet() method if RequestInitialize() has previously been called and succeeded. Luis
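A sketch of parameter setting routed through requests and a state machine, combining this comment with the parameter-object idea (solution #3). The states, the `GrabberParameters` fields and the event names are hypothetical, and events are reduced to strings passed to a callback to keep the example self-contained; the real IGSTK state machine and event classes are not used here.

```cpp
#include <cassert>
#include <functional>
#include <string>
#include <utility>

// Hypothetical parameter object for one specific grabber.
struct GrabberParameters {
  int width = 640;
  int height = 480;
  double framesPerSecond = 25.0;
};

class VideoGrabberSketch {
public:
  using EventCallback = std::function<void(const std::string&)>;

  explicit VideoGrabberSketch(EventCallback onEvent) : m_OnEvent(std::move(onEvent)) {
    assert(m_OnEvent);
  }

  // Requests never change state unconditionally; the current state decides.
  void RequestSetParameters(const GrabberParameters& p) {
    if (m_State == State::Initialized) {
      m_OnEvent("InvalidRequestErrorEvent");  // ignored once initialized
      return;
    }
    m_Pending = p;
    m_State = State::ParametersSet;
  }

  // Parameter values are validated here, once, as discussed above.
  void RequestInitialize() {
    if (m_State != State::ParametersSet || !IsValid(m_Pending)) {
      m_OnEvent("InitializationFailureEvent");
      return;
    }
    m_Active = m_Pending;
    m_State = State::Initialized;
    m_OnEvent("InitializationSuccessEvent");
  }

  // Values come back through an event rather than a return value.
  void RequestGetParameters() const {
    m_OnEvent("ParametersEvent " + std::to_string(m_Active.width) + "x" +
              std::to_string(m_Active.height));
  }

private:
  enum class State { Idle, ParametersSet, Initialized };

  static bool IsValid(const GrabberParameters& p) {
    return p.width > 0 && p.height > 0 && p.framesPerSecond > 0.0;
  }

  EventCallback m_OnEvent;
  State m_State = State::Idle;
  GrabberParameters m_Pending;
  GrabberParameters m_Active;
};
```

Once RequestInitialize() has succeeded, further RequestSetParameters() calls only produce an error event, so the active parameters cannot change under a running grabber.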
Cropping and Padding of Video
The component interface methods for cropping and padding video should use a more standard format. The current methods, SetVideoOutputClipRectangle and SetVideoOutputPadding, look like the interface methods that I wrote for vtkVideoSource (and which I afterwards regretted).
I propose the following:
- SetVideoCropRegion(width, height, x, y)
- SetVideoPadding(left, right, top, bottom)
For cropping, the "x" and "y" are the offsets of the crop region within the original frame. Dgobbi 07:33, 22 Mar 2007 (EST)
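The proposed cropping and padding conventions can be illustrated on a single-channel frame. `Frame`, `Crop` and `Pad` are hypothetical helpers whose argument order mirrors SetVideoCropRegion(width, height, x, y) and SetVideoPadding(left, right, top, bottom); applying the crop before the padding is an assumption.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <stdexcept>
#include <vector>

// Minimal single-channel frame (row-major, 8 bits per pixel).
struct Frame {
  int width = 0;
  int height = 0;
  std::vector<std::uint8_t> pixels;
  std::uint8_t at(int x, int y) const {
    return pixels[static_cast<std::size_t>(y) * width + x];
  }
};

// Crop with the SetVideoCropRegion(width, height, x, y) convention:
// (x, y) is the offset of the crop region within the original frame.
Frame Crop(const Frame& in, int width, int height, int x, int y) {
  if (x < 0 || y < 0 || width < 0 || height < 0 ||
      x + width > in.width || y + height > in.height)
    throw std::out_of_range("crop region outside frame");
  Frame out{width, height, std::vector<std::uint8_t>(static_cast<std::size_t>(width) * height)};
  for (int r = 0; r < height; ++r)
    for (int c = 0; c < width; ++c)
      out.pixels[static_cast<std::size_t>(r) * width + c] = in.at(x + c, y + r);
  return out;
}

// Pad with the SetVideoPadding(left, right, top, bottom) convention,
// filling the new border with `fill`.
Frame Pad(const Frame& in, int left, int right, int top, int bottom, std::uint8_t fill = 0) {
  assert(left >= 0 && right >= 0 && top >= 0 && bottom >= 0);
  const int w = in.width + left + right;
  const int h = in.height + top + bottom;
  Frame out{w, h, std::vector<std::uint8_t>(static_cast<std::size_t>(w) * h, fill)};
  for (int r = 0; r < in.height; ++r)
    for (int c = 0; c < in.width; ++c)
      out.pixels[static_cast<std::size_t>(r + top) * w + (c + left)] = in.at(c, r);
  return out;
}
```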
SetTexturePointer
Rather than setting a pointer into which the VideoGrabber will write frames, it might be better to have a method that the display component (i.e. the future VideoSpatialObject) can use to get a pointer to the video buffer, and then the VideoSpatialObject (or its Representation) will be responsible for generating the texture from the video frame.
I understand the reasoning for this method, though: it is mandatory that we have an efficient path between the video grabber and the display. Dgobbi 07:43, 22 Mar 2007 (EST)
This looks like a terrible security breach. Since the argument in favor of this method seems to be a performance concern, I suggest that we first verify whether the concern is justified. So many things can go wrong with public methods that accept and return pointers that we would need a strong justification for such a risky approach. If we have to go down this path, then the method should not provide a raw pointer to a buffer. Instead it should receive a reference to an igstkImage, which at least guarantees a properly allocated and managed data structure. Application developers MUST NOT have access to the internal structures of any IGSTK classes, even less via raw pointers. Luis
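A sketch of the alternative proposed here: the caller hands over a reference to an image object that manages its own memory, and the grabber copies the latest frame into it, so no raw pointer to the internal buffer ever leaves the class. `Image` stands in for an igstk image (its real API is not assumed), `RequestCopyLatestFrame` and `SimulateFrame` are hypothetical, and returning a bool replaces the usual event for brevity.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Stand-in for an igstk image: owns and manages its own storage.
class Image {
public:
  void Allocate(int width, int height) {
    m_Width = width;
    m_Height = height;
    m_Pixels.assign(static_cast<std::size_t>(width) * height, 0);
  }
  int Width() const { return m_Width; }
  int Height() const { return m_Height; }
  std::uint8_t& Pixel(int x, int y) { return m_Pixels.at(Index(x, y)); }
  std::uint8_t Pixel(int x, int y) const { return m_Pixels.at(Index(x, y)); }

private:
  std::size_t Index(int x, int y) const { return static_cast<std::size_t>(y) * m_Width + x; }
  int m_Width = 0;
  int m_Height = 0;
  std::vector<std::uint8_t> m_Pixels;
};

class VideoGrabber {
public:
  VideoGrabber(int width, int height)
      : m_Width(width), m_Height(height),
        m_Buffer(static_cast<std::size_t>(width) * height, 0) {
    assert(width > 0 && height > 0);
  }

  // The grabber (re)allocates the caller's image and copies the latest frame
  // into it; the internal buffer itself is never exposed.
  bool RequestCopyLatestFrame(Image& destination) const {
    if (!m_HasFrame) return false;
    destination.Allocate(m_Width, m_Height);
    for (int y = 0; y < m_Height; ++y)
      for (int x = 0; x < m_Width; ++x)
        destination.Pixel(x, y) = m_Buffer[static_cast<std::size_t>(y) * m_Width + x];
    return true;
  }

  // Simulates the hardware delivering a uniform frame (example only).
  void SimulateFrame(std::uint8_t value) {
    std::fill(m_Buffer.begin(), m_Buffer.end(), value);
    m_HasFrame = true;
  }

private:
  int m_Width;
  int m_Height;
  std::vector<std::uint8_t> m_Buffer;
  bool m_HasFrame = false;
};
```

The price is one frame copy per request, which is the cost that should be measured before considering a pointer-based path.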
Spatial Objects for Ultrasound Display
In addition to the video component, there will have to be a new spatial object class that is capable of displaying the video. The existing ImageSpatialObject is probably not suitable, since it stores a single static image as an ITK "Image".
The igstk VideoSpatialObject will be connected to the VideoGrabber component and should be able to request a video frame to display. It must be capable of playing the video live, and of synchronization of the video with the transforms from an igstk Tracker. Dgobbi 07:13, 22 Mar 2007 (EST)
The video data should probably be considered data of the igstkVideoSpatialObject representation rather than data of a videoSpatialObject. The reason is that the video data is time-stamped, and therefore it may be different for different Views presented in the GUI. In the context of timing, the video data is akin to the Transforms provided by a tracker: it has an expiration time in its time stamp. If the View requests the videoSpatialObjectRepresentation class to configure itself for time T, and the VideoGrabber reports that there is no valid image available for time T, then the videoSpatialObjectRepresentation must reflect that circumstance by displaying a blank rectangle or by other visual means that make clear that no valid data is available. In the hypothetical case of a four-view application, the video data may have to be fed into four independent videoSpatialObjectRepresentation instances, each of which may end up with a different frame from the video grabber. There are memory, speed and safety trade-offs that we will have to analyze here. Luis
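A sketch of the per-View behavior described above: each representation asks the video source for the frame valid at the View's render time and falls back to a blank display when there is none. `FrameSource`, `VideoRepresentation`, `ToySource` and their methods are hypothetical names, not existing IGSTK classes.

```cpp
#include <cassert>
#include <cstdint>
#include <optional>
#include <utility>
#include <vector>

using Pixels = std::vector<std::uint8_t>;

// What a representation needs from the grabber: the frame valid at a time.
class FrameSource {
public:
  virtual ~FrameSource() = default;
  virtual std::optional<Pixels> FrameAt(double time) const = 0;
};

// One instance per View; each may end up holding a different frame.
class VideoRepresentation {
public:
  explicit VideoRepresentation(const FrameSource& source) : m_Source(source) {}

  // Called by the View before it renders the scene for time `renderTime`.
  void RequestUpdateRepresentation(double renderTime) {
    if (auto frame = m_Source.FrameAt(renderTime)) {
      m_Displayed = std::move(*frame);
      m_ShowingBlank = false;
    } else {
      m_Displayed.clear();  // draw a blank rectangle instead
      m_ShowingBlank = true;
    }
  }

  bool IsShowingBlank() const { return m_ShowingBlank; }
  const Pixels& Displayed() const { return m_Displayed; }

private:
  const FrameSource& m_Source;
  Pixels m_Displayed;
  bool m_ShowingBlank = true;
};

// Toy source: frame n is valid on [n * period, (n + 1) * period) for n < count.
class ToySource : public FrameSource {
public:
  ToySource(int count, double period) : m_Count(count), m_Period(period) {
    assert(count >= 0 && period > 0.0);
  }
  std::optional<Pixels> FrameAt(double time) const override {
    if (time < 0.0) return std::nullopt;
    const int n = static_cast<int>(time / m_Period);
    if (n >= m_Count) return std::nullopt;
    return Pixels(4, static_cast<std::uint8_t>(n));
  }

private:
  int m_Count;
  double m_Period;
};
```

Each representation here keeps its own copy of the pixels, which is the simplest and safest option but multiplies memory use in a four-view application; sharing read-only, reference-counted frames would be one way to trade some of that safety for memory and speed.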
