Patches against trunk | Patch against 0.21-fixes | Patches against mythtv-vid branch

05/11/08

Updated trunk patches for svn revision 18976.

03/11/08

Updated trunk patches - a new, clean set (and a single diff) against r18952 is in the trunk-patches directory.

31/10/08

Following the recent commits to the mythtv-vid branch, I've added clean, up-to-date versions of the remaining mythtv-vid patches to the directory above (all apply cleanly to r18946).

22/10/08

Small update - the glx1.3 hack went missing this morning in the trunk version. Small patch 28a (trunk only) aligns the hack with the version in mythtv-vid.

22/10/08

Another small patch (#28 - Qt4 fixes) for the mythtv-vid branch (Ticket)

Links above for the latest patch sets against fixes (no update for #28) and trunk.

Other than that, the opengl renderer now works on the Windows build...

19/09/08

Well it's been a while :)

One extra patch (#27 - small fix for software based bobdeint) is up for the mythtv-vid branch (Ticket)

As ever, the version against trunk is here and a single patch for 1 to 27 against 0.21-fixes is here.

I've just had a chance to check out the latest nvidia driver (177.70). Unsurprisingly, nothing has changed. Performance is on a par with previous drivers and there is still no fix for the 'blank' screen when using glx1.3. Maybe next year.

31/07/08

I decided to park working on the early z-culling. I do have a working implementation of an 'adaptive' deinterlacer that falls back to a less GPU intensive deinterlacer if there is significant motion and it estimates there is insufficient time to complete the operation. It uses a combination of early z culling with a depth mask, occlusion queries and OpenGL timers. It's just not ready for prime time and is pretty niche.
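
As a rough illustration of the fallback decision described above (not the actual implementation), the choice could be sketched like this. The function name, the `motion_limit` threshold and the assumption that cost scales linearly with the moving share of the image are all hypothetical.

```python
def choose_deinterlacer(moving_fraction, full_cost_ms, budget_ms,
                        motion_limit=0.5):
    """Pick 'expensive' or 'cheap' for the next frame.

    moving_fraction: share of the image flagged as moving (0.0 to 1.0).
    full_cost_ms:    measured time for the expensive deinterlacer over a
                     whole frame, e.g. taken from a GPU timer.
    budget_ms:       time left before the frame must be displayed.
    motion_limit:    hypothetical threshold for "significant motion".
    """
    # Assumption: static regions are culled, so cost scales with motion.
    estimated_ms = full_cost_ms * moving_fraction
    if moving_fraction > motion_limit and estimated_ms > budget_ms:
        return "cheap"
    return "expensive"
```

With these made-up numbers, a frame that is 90% in motion and needs about 18 ms against a 10 ms budget falls back to the cheap path, while a mostly static frame keeps the expensive one.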

Latest 2 patches against the mythtv-vid branch are up.
For completeness, a full diff of 1 to 26 against trunk is here and 1 to 26 against 0.21-fixes is here.

With these latest patches, opengl playback seems pretty much unbreakable for me at the moment. Fingers crossed...

I've also added the latest performance analysis using the 8800GT and 173.14.12. Yadif is generally faster, though the effective rate has halved when downscaling HD clips - this is because patch 26 added an extra filter stage to ensure deinterlacing still works properly when downscaling. Link

17/07/08

Well, I've pretty much given up on field based rendering - but have learnt a lot in the process :)

Simply rendering the two different fields line by line (even using display lists to speed up the rendering) yields no significant change in performance. I've no idea why.

When I finally got early z culling to work, this didn't produce any performance improvement either. Using a depth mask to render the left side of images without deinterlacing gave a significant speedup, but applying the same '50%' reduction on a line by line (i.e. field) basis didn't change anything. After a lot of digging I discovered that the z buffer doesn't operate at the same resolution as the frame buffer. Masking individual fields is too fine-grained to trigger early culling, hence no speedup. It might work on ATI hardware.
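
One way to picture the granularity problem is a toy model in which early culling only skips a tile when every pixel in it is masked. This is purely illustrative: the 8x8 tile size and the function names are assumptions, and real hardware behaviour is not documented here.

```python
def culled_tiles(mask, tile=8):
    """Count tiles that a coarse early-z test could skip entirely.

    mask is a 2D list of booleans: True means "this pixel is masked out".
    A tile can only be skipped when every pixel inside it is masked.
    Returns (skipped_tiles, total_tiles).
    """
    height, width = len(mask), len(mask[0])
    skipped = total = 0
    for ty in range(0, height, tile):
        for tx in range(0, width, tile):
            total += 1
            if all(mask[y][x]
                   for y in range(ty, min(ty + tile, height))
                   for x in range(tx, min(tx + tile, width))):
                skipped += 1
    return skipped, total


def left_half_mask(width, height):
    """Mask out the left half of the image."""
    return [[x < width // 2 for x in range(width)] for _ in range(height)]


def field_mask(width, height):
    """Mask out every other line (one field)."""
    return [[y % 2 == 1 for _ in range(width)] for y in range(height)]
```

Both masks cover 50% of the pixels, but in this model the left-half mask lets half of the tiles be skipped while the field mask lets none be skipped, because every tile still contains unmasked lines.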

I did however adapt this to do motion based masking - i.e. don't call the GPU intensive deinterlacer for static regions of the image. This yields a significant speedup on average - but high motion scenes still need all the grunt (plus the overhead of the extra pass to generate the mask). I'll probably add this as an option for the opengl yadif deinterlacers - which are the only two deinterlacers that benefit (circa 100% speedup on my test clips).
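
The idea of a motion mask can be sketched on the CPU as follows. This is a minimal illustration only: the real work happens in an extra GPU pass, and the block size, threshold and function names here are assumptions.

```python
def motion_mask(prev, curr, block=8, threshold=10):
    """Return a 2D list of booleans, one per block: True where the block moved.

    prev and curr are equally sized 2D lists of luma values (0-255).
    A block counts as moving when its mean absolute difference
    exceeds `threshold` (a hypothetical value).
    """
    height, width = len(curr), len(curr[0])
    mask = []
    for by in range(0, height, block):
        row = []
        for bx in range(0, width, block):
            total = count = 0
            for y in range(by, min(by + block, height)):
                for x in range(bx, min(bx + block, width)):
                    total += abs(curr[y][x] - prev[y][x])
                    count += 1
            row.append(total / count > threshold)
        mask.append(row)
    return mask


def moving_fraction(mask):
    """Fraction of blocks that would still need the expensive deinterlacer."""
    flags = [flag for row in mask for flag in row]
    return sum(flags) / len(flags)
```

Only the blocks flagged True would be sent through the expensive deinterlacer, which is why static scenes benefit and high motion scenes do not.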

So 3 or 4 patches ready to go soon (hopefully the last!)

09/07/08

Up to date combined patch for 0.21-fixes (untested). 0.21-fixes patch

For completeness, 11 patches against trunk for all the latest changes (the extra patch just aligns formatting etc between trunk and mythtv-vid):-

  1. Tidy up some locking and GL_PACK_ALIGNMENT. Patch 16
  2. Tidy up YV12 packing code and add proper packing/upsampling of interlaced YV12 frames. Patch 17
  3. Re-write of texture handling. Patch 18
  4. Don't try and upsample interlaced frames when using software bobdeint. Patch 18a
  5. Rewrite of deinterlacing code. Patch 19
  6. YADIF. Patch 20
  7. Bicubic. Patch 21
  8. Fix seg fault and leak. Patch 22
  9. Add opengl options. Patch 23
  10. Sync to mythtv-vid formatting. Patch 23a
  11. Disable textures. Patch 24

07/07/08

Uploaded 10 patches to trac - all against the mythtv-vid branch (Ticket)

06/07/08

Reworked some aliasing in yadif fragment program to get it to compile on older hardware (needed to get the number of temporaries down to 32).
Updated 7600GS performance stats (yadif doesn't do too badly!). Link

03/07/08

Added combined diff for all current 23 patches against 0.21-fixes. Diff
Almost there with field based rendering using a depth mask to cull unnecessary rendering. Something still isn't quite right though.
Latest performance figures for NVIDIA 8800GT and 7600GS after change to the interlaced packing code. Link
8800GT figures have actually improved slightly (but probably in the noise).

29/06/08

First crack at field based rendering - sizeable performance drop though :(
Interlaced chroma upsampling fixed - shouldn't need the chroma fix in the deinterlacer. Interlaced frame conversion takes about 20% more CPU than progressive though :(

27/06/08

Added comparison of performance of my latest OpenGL build versus trunk. Link
On average an 85% performance improvement when running an 8800GT :)

26/06/08

Wasted a lot of time trying to add some SSE2 packing code before realising OpenGL (Pixel Buffer Objects) won't give me 16 byte aligned memory.

Need to set GL_PACK_ALIGNMENT to 1. Added to patch 1 below.
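
For reference, here is a minimal sketch of what the alignment value changes: OpenGL rounds the start of each row up to a multiple of the alignment (1, 2, 4 or 8), so any value other than 1 can insert padding bytes between rows. The function name and the example widths are made up for illustration, and 8-bit components are assumed.

```python
def row_stride(width_pixels, bytes_per_pixel, alignment):
    """Bytes OpenGL steps between successive rows for a given alignment.

    Assumes 8-bit components. Each row is padded so that the next row
    starts on a multiple of `alignment`.
    """
    if alignment not in (1, 2, 4, 8):
        raise ValueError("alignment must be 1, 2, 4 or 8")
    row_bytes = width_pixels * bytes_per_pixel
    # Round up to the next multiple of the alignment.
    return (row_bytes + alignment - 1) // alignment * alignment
```

With an alignment of 1 the rows are tightly packed, so a buffer laid out as width * height bytes matches what OpenGL expects regardless of the frame width.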

24/06/08

Added first performance comparison between OpenGL renderer and standard XVideo using CPU deinterlacers. Link

Patches not yet submitted to trac:-
  1. Tidy up some locking.
  2. Packed interlaced frames - fairly important :)
  3. Complete rewrite of texture handling - fixes framebuffer object instability, simplifies deinterlacing, needed for bicubic filter.
  4. Don't do interlaced pack when using CPU bobdeint.
  5. Complete deinterlacer rewrite - field based deinterlacers, combined YUV2RGB and deinterlacing, better motion detection, proper deinterlacing of chroma.
  6. Yadif deinterlacer.
  7. Bicubic upscaling.
  8. Fix memory leak and segmentation fault.
  9. Add opengl options and ability to enable bicubic filtering.
To Do
  1. Field based rendering - 'progressive' fragment program for current field and deinterlacing frag prog for other. Should give significant performance improvement and can simplify fragment programs further.
  2. Better chroma upsampling for interlaced frames. Added to patch 2 above
  3. Chroma upsampling error for 3:2 material.
  4. Chroma fix (smoothing) within deinterlacer for static areas. Not needed.
  5. Resize filter - always use when display height does not equal video height (regression fix).
  6. Resize filter - remove filter when not deinterlacing; might have performance implications.
  7. Tidy up yadif - doesn't compile on FX5200 and 7600GS.
  8. Check ATI compatibility (and Intel?).
  9. Performance analysis for 7600GS, FX5200 and X1650Pro.
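
The field based rendering idea in item 1 can be pictured with a small CPU sketch: lines of the current field are passed through untouched and only the other field's lines are reconstructed. The averaging used here is a simple stand-in, not any of the actual fragment programs, and the function name is hypothetical.

```python
def render_field(frame, top_field=True):
    """Build a progressive frame from one field of an interlaced frame.

    frame is a 2D list of pixel values with at least two lines.
    Lines of the current field are copied (the 'progressive' path);
    the other lines are rebuilt by averaging the neighbours above and
    below (a stand-in for the deinterlacing path).
    """
    height = len(frame)
    keep = 0 if top_field else 1
    out = []
    for y in range(height):
        if y % 2 == keep:
            out.append(list(frame[y]))
        else:
            # At the top and bottom edges, reuse the only available neighbour.
            above = frame[y - 1] if y > 0 else frame[y + 1]
            below = frame[y + 1] if y + 1 < height else frame[y - 1]
            out.append([(a + b) // 2 for a, b in zip(above, below)])
    return out
```

Only half of the output lines go through the reconstruction step, which is where the hoped-for saving would come from.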
Possibles
  1. Move to GLSL - which may be more hardware portable.
  2. Utilise vertex programs as well to precompute offsets etc.
  3. Hardware decoding... (need to re-write some code first)