Tuesday, 23 September 2014

Wayland and Qt 5.4

Since nobody else has done the honors yet, I'm happy to announce that - as decided at the Qt Contributors Summit this year - support for running applications under a Wayland compositor will be seeing its initial release with Qt 5.4. That is, the QtWayland repository is finally going to stop sitting in the corner, sulking. :)

There are a few "buts", though.

Firstly, it should be noted that support for QWidget-based applications (and other desktop use cases) may be far from ideal, and quality may not be great. This is a consequence of most development on QtWayland having been driven from mobile/embedded viewpoints to date, and is not, in general, an inherent limitation of the windowing system. It's also something of a reflection on Wayland itself, which is only now starting to mature for desktop use (through xdg-shell etc.)

tl;dr: Think of this as a technical preview, keep your expectations realistic, and if you want to use it, expect to roll up your sleeves a bit and get dirty from time to time.

Secondly, the QtCompositor API in the QtWayland module (allowing you to write your own Wayland compositor) will not be seeing a release at this time. The API is not frozen, and has not seen the usual polish/quality that you might expect from Qt APIs. As this API is only of use to a limited number of people (those looking to implement an embedded/mobile device, typically, or write their own DE) this should not impact too many people.

tl;dr: If you want to write a compositor, you get to keep both pieces if it breaks. If you want to use applications under an existing Wayland compositor, you're fine.

Future work on QtWayland is largely an open story, but some obvious candidates come to mind:

  • Continued work on xdg-shell support
  • Plugin-based window decorations (to enable environment-specific look and feel); this has now landed in the 5.4 branch :)
  • Integration with the rest of Qt's autotests (I spent a while getting tests fixed or at least runnable under window-compositor, but it would be nice to automate this)
  • "Official" subsurface protocol support
If there's something you would like to see happen, listed here or not, you're more than welcome to pitch in. If you'd like to talk to the other people hacking on QtWayland, please pop by #qt-lighthouse on freenode and talk to the folks there :-)

I'd also like to take a moment to thank everyone for their contributions to QtWayland. In particular, I'd like to say thanks to the following, in no particular order (and I'm extremely sorry if I've missed someone, please let me know and I'll happily add you to the list):
  • Kristian Høgsberg & Jesse Barnes, for their initial work on the port, sponsored by Intel,
  • Jørgen Lind, Samuel Rødal, Andy Nichols, Laszlo Agocs, and Paul Olav Tvete for continuing work on it excellently and admirably,
  • Nokia for sponsoring a good deal of the development up until their abrupt departure from the Qt world,
  • Digia for continuing to help out after Nokia left,
  • Andrew Knight, for ably shepherding problems encountered by Jolla for quite a long time,
  • Jolla for sponsoring a large chunk of work on QtWayland (past and present),
  • Gunnar Sletta for rewriting integration with rendering (especially QtQuick), removing a large number of bugs & improving performance,
  • Giulio Camuffo for numerous fixes, improvements and interaction with the wider Wayland community.
In conclusion, I'd like to note that I'm really happy to see this finally happen - I've wanted it for a very long time now - and to see Wayland keep moving on to bigger and better things. Hopefully, this release will achieve its intended result (that more eyes/hands get exposed to the code, start to use it, and help out with it).


Friday, 12 September 2014

profiling is not understanding

When software goes slow, generally, the first reaction is to profile. This might be done through system tools (like Instruments on OS X, perf/valgrind/etc on Linux, VTune, etc). This is fine and good, but having the output of a tool does not necessarily mean you understand what is going on.

This might seem like an obvious distinction, but all too often, efforts at improving performance focus on the small picture ("this thing here is slow") and not the bigger picture ("why is this so slow"). At Jolla, I had the pleasure of running into one such instance of this, together with Gunnar Sletta, my esteemed colleague and friend.

As those of you who are familiar with Jolla may know, we had been working on upgrading to a newer Qt release. This also involved quite a bit of work for us, both in properly upstreaming work we had done in the hurry towards the late-2013 release, and in isolating problems and fixing them properly in newer code (the new scenegraph renderer and the V4 JavaScript engine in particular have been an interesting ride to get both at once!).

As a part of this work, we noted that touch handling was quite slow (something which we had worked around for our initial release, but now wanted to solve properly). This was due to the touch driver on the Jolla delivering touchpoints faster than the display was updating. That is, while the display might be updating at 57 Hz (yes, the Jolla is weird, it doesn't do 60 Hz), we might be getting input events a lot more frequently than that.

This was, in turn, causing QtQuick to run touch processing (involving costly item traversals, as well as the actual processing of touch handling) a lot more frequently than the display was updating. As these took so much time, this in turn slowed rendering down, meaning even more touch handling was going on per frame. A really ugly situation.

Figure 1: Event tracing inside the Sailfish OS Compositor
Figure 1 demonstrates this happening at the compositor level. The bottom slice (titled "QThread") is the event delivery thread, responsible for reading events from evdev. The peaks there are - naturally - when events are being read in. The top thread is the GUI thread, and the high peaks there are touch events being processed and delivered to the right QtQuick item (in this case, a Wayland client, we'll get to that later). The middle slice is the compositor's scenegraph rendering (using QtQuick).

With the explanation out of the way, let's look at the details a bit more. It's obvious that the event thread is regularly delivering events at around-but-not-quite twice the display update rate. Our frame preparation on the GUI thread looks good, though, despite the too-frequent occurrence of event delivery, and the render thread is coping too.

But this isn't a major surprise - the compositor in this case is dead simple (just showing a fullscreen client). What about the client? Let's take a look at it over the same timeframe...

Figure 2: Event tracing for the client (Silica's component gallery, in this case)
Figure 2 focuses on two threads in the client: the render thread (top), and the GUI thread (bottom). Touch events are delivered on the GUI thread, and QtQuick processes them there while preparing the next frame for the render thread.

Here, it's very clear that touch processing is happening way too often, and worse than that, it's taking a very long time (each touch event's processing is taking ~4ms), not leaving much time for rendering - and this was on a completely unloaded device. In a more complicated client still, this impact would be much, much worse, leading to frame skipping (which we saw, on some other applications).
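To put rough numbers on that, here is a small back-of-the-envelope sketch. The 57 Hz refresh rate and the ~4 ms per touch event come from the text above; the figure of two events per frame is an assumption, rounding "around-but-not-quite twice the display update rate" up to a whole number.

```python
DISPLAY_HZ = 57.0        # from the post: the Jolla's display refresh rate
TOUCH_COST_MS = 4.0      # from the post: ~4 ms to process one touch event
EVENTS_PER_FRAME = 2.0   # assumption: "around-but-not-quite twice", rounded


def frame_budget_ms(display_hz):
    """Time available to produce one frame, in milliseconds."""
    return 1000.0 / display_hz


def touch_share(display_hz, events_per_frame, cost_ms):
    """Fraction of the frame budget spent on touch processing."""
    return events_per_frame * cost_ms / frame_budget_ms(display_hz)


budget = frame_budget_ms(DISPLAY_HZ)
before = touch_share(DISPLAY_HZ, EVENTS_PER_FRAME, TOUCH_COST_MS)
after = touch_share(DISPLAY_HZ, 1.0, TOUCH_COST_MS)

print(f"frame budget: {budget:.2f} ms")
print(f"touch share at {EVENTS_PER_FRAME:.0f} events/frame: {before:.1%}")
print(f"touch share at 1 event/frame: {after:.1%}")
```

Under those assumptions, a little under half of each frame's budget goes to touch handling before anything is drawn, and handling touch once per frame would halve that.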

Going back to my original introduction here, if we had used traditional profiling techniques, we'd have seen that touch handling/preparation to render was taking a really long time. And we might have focused on optimizing that. Instead, thanks to some out-of-the-box thinking, we looked at the overall structure of application flow, and were able to see the real problem: doing extra work that wasn't necessary.
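The difference between the two views can be sketched in a few lines. The trace below is synthetic (generated from made-up parameters, not measured on a device), and the function names are hypothetical. The "profile view" sums time per kind of work, which is roughly what a sampling profiler reports; the "trace view" keeps the timeline and counts how often the work happens in each frame.

```python
from collections import Counter, defaultdict

FRAME_US = 1_000_000 // 57   # one 57 Hz frame, in whole microseconds


def synthetic_trace(frames=4, touches_per_frame=2, touch_us=4000, render_us=5000):
    """Build an illustrative trace of (name, start_us, duration_us) tuples."""
    events = []
    for frame in range(frames):
        start = frame * FRAME_US
        for i in range(touches_per_frame):
            events.append(("touch", start + i * touch_us, touch_us))
        events.append(("render", start + touches_per_frame * touch_us, render_us))
    return events


def profile_view(events):
    """Total time per kind of work: tells you WHAT is expensive."""
    totals = defaultdict(int)
    for name, _start, duration in events:
        totals[name] += duration
    return dict(totals)


def trace_view(events, name="touch"):
    """Occurrences per frame: tells you HOW OFTEN the work is being done."""
    per_frame = Counter()
    for event_name, start, _duration in events:
        if event_name == name:
            per_frame[start // FRAME_US] += 1
    return dict(per_frame)


trace = synthetic_trace()
print(profile_view(trace))   # touch dominates the totals
print(trace_view(trace))     # ...because it runs more than once per frame
```

The profile view on its own suggests making touch handling cheaper; the trace view is the one that raises the question of why it runs more than once per frame at all.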

As an aside to this, I'm happy to announce that we worked out a neat solution: QtQuick no longer processes touch events immediately, instead choosing to wait until it is about to prepare the next frame for display - as well as "compressing" them to only deal with the minimal number of sensible touch updates per frame. This should have no real impact on any hardware where touch delivery was occurring at a sensible rate, but for any hardware where touch was previously delivering too fast, this will no longer be a problem as of Qt 5.4.
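The idea of deferring and compressing touch events can be sketched roughly as follows. This is not the code that went into QtQuick, just an illustration of the technique under simple assumptions: events are buffered as they arrive, a newer move for a touch point replaces that point's pending move, presses and releases are never dropped, and everything is handed over once when the next frame is being prepared. All names here (`TouchEvent`, `TouchCompressor`, `deliver`, `flush`) are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TouchEvent:
    point_id: int
    state: str    # "press", "move" or "release"
    x: float
    y: float


class TouchCompressor:
    def __init__(self):
        self._pending = []

    def deliver(self, event):
        """Called whenever the driver reports an event; only buffers it."""
        if event.state == "move":
            # Find the most recent pending event for the same touch point.
            for i in range(len(self._pending) - 1, -1, -1):
                if self._pending[i].point_id == event.point_id:
                    if self._pending[i].state == "move":
                        # A newer position supersedes the older one.
                        self._pending[i] = event
                        return
                    break
        # Presses, releases and first moves are always kept.
        self._pending.append(event)

    def flush(self):
        """Called once, right before preparing a frame."""
        events, self._pending = self._pending, []
        return events


compressor = TouchCompressor()
compressor.deliver(TouchEvent(1, "press", 0, 0))
compressor.deliver(TouchEvent(1, "move", 1, 1))
compressor.deliver(TouchEvent(1, "move", 2, 2))
for event in compressor.flush():
    print(event)   # the press, then only the latest move
```

With this shape, the expensive work (item traversal and delivery) runs at most once per frame no matter how quickly the driver produces events.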

(Thanks to Gunnar & myself for the fix, Carsten & Mikko for opening my eyes about performance tooling, and Jolla for sponsoring this work.

P.S. If you're looking for performance experts, Qt/QML/etc expertise or all round awesome, Gunnar and myself are currently interested in hearing from you.)
