Topic: Performance issues, tracking bottlenecks.

helmesjo

  • Full Member
  • Posts: 116
Performance issues, tracking bottlenecks.
« on: May 19, 2014, 08:55:44 AM »
Note: This is a question forwarded from another developer. Also: I can read the syntax but I don't write shaders, so bear with me here...

Essentially, he has been trying different shaders to find an acceptable frame rate on our Android devices (three different ones that we have in-house). With 2 panels with clipping (at least one nested) and the "Transparent Colored 1" shader (your shader), the frame rate is unplayable (20-40 FPS, jumping a lot), making tweens etc. stutter. Changing to Unity's built-in shader (Mobile/Particles/Alpha Blended), the frame rate caps at 60 and everything draws as expected (from what we can tell) EXCEPT the nested clipping, of course. Losing 40ish FPS just for the clipping feels like a bit too much (but what do I know...).

Could there be other magic happening that we might not need? Is there possibly a "minimalistic" shader with just transparency & clipping, stripped of whatever else might slow it down? Again, Android is the big issue. iOS (iPhone 4S & 5) runs pretty smoothly.

No, I don't have a clue how the clipping works internally. :)
« Last Edit: May 20, 2014, 04:43:43 AM by helmesjo »

ArenMook

  • Administrator
  • Hero Member
  • Posts: 22,128
  • Toronto, Canada
Re: Performance: Transparent Colored 1 vs Unity Alpha Blended
« Reply #1 on: May 19, 2014, 12:27:29 PM »
The more complex the shader, the lower the framerate; that's to be expected. Shaders that involve clipping are more complicated than shaders that don't. If you turn off clipping, you will likely see the same framerate with alpha blended as with transparent colored.

Nicki

  • Global Moderator
  • Hero Member
  • Posts: 1,768
Re: Performance: Transparent Colored 1 vs Unity Alpha Blended
« Reply #2 on: May 19, 2014, 03:29:46 PM »
If you wanna get really hacky and magical, you can use a secondary camera for your clipping with a different size viewport - assuming you don't have too many of those. It's a hell of a hack though, since you have to make your UI inside that camera bigger to account for the smaller viewrect and still fit inside "regular" sized UI.

helmesjo

Re: Performance: Transparent Colored 1 vs Unity Alpha Blended
« Reply #3 on: May 20, 2014, 04:42:53 AM »
Alright... Since this is by no means my field, I just gotta ask: What is done so differently here compared to, say, native iOS & Android development? Clipping there is per "widget" (UIView on iOS) instead of per dedicated panel, and there is no problem at all having clipping enabled on multiple views (it's on by default for each view). But here, with 2 panels clipping their content, the system is brought to its knees... (No bashing, honest question).

Follow-up question about performance: What _the hell_ is this magic "Overhead" that sometimes eats up 20-50% of the CPU time (at times pretty much constantly)? Got any ideas on how to track down the source of this godzilla of performance-eaters? Or is it just Unity stuff that shows up when profiling, not affecting the app's performance?

We are putting the finishing touches on our little 100% UI-based game, and compared to our previous one done in NGUI 2.x (a much larger project), this new one is having serious problems running at a smooth framerate. Shader aside, UIPanel.LateUpdate & UIRect.OnUpdate are the real performance-killers (profiled on device without "Deep Profile"). Together they constantly consume >10% CPU at minimum, which leaves very little room for other components and instantly causes a bad framerate if anything besides the UI needs a few ms of CPU time. Now, looking at UIRect.OnUpdate, I can't really understand what is taking up this time, since the overrides do pretty much nothing but check some booleans. Could there be something else wrong in our setup (the app is idle and all of our own scripts are at 0%)? In the editor it basically stays at <0.1%, but on Android it's constantly 5-10% (iOS is <1%), and 15-20% at times (idling in a menu)... UIPanel is jumping between 5-20% pretty much all the time (idling in a menu). Are there any common mistakes that cause this? The number of draw calls is kept below 20 (mostly below 10 unless a lot of textures are displayed in a list), which is on par with the previous app.
Also, with the new way of handling widgets (from my understanding, there's been a rather big change), disabling/enabling widgets is a big no-no, which in turn further increases the number of widgets polling stuff in OnUpdate. In NGUI 2.x we naturally disabled screens that were not currently visible and enabled them when they transitioned in. This is not possible anymore, as it creates massive spikes stretching over multiple frames, which causes any running tweens to be almost completely skipped. What is the recommended thing to do with widgets that are not visible? Alpha = 0? (That is what we currently do. Not good enough, but much better than SetActive.)

Edit: I noticed that the spikes when enabling widgets (at least in a quick test on Android) are due to a massive GC.Collect somewhere within UIPanel.LateUpdate. What is being allocated so intensely when enabling a widget with ~10 child widgets?
« Last Edit: May 20, 2014, 10:59:38 AM by helmesjo »

ArenMook

Re: Performance issues, tracking bottlenecks.
« Reply #4 on: May 20, 2014, 01:54:15 PM »
Deep Profile never gives accurate results. I've explained it on several occasions before. Think about it... it adds a fixed amount of overhead to every function call. Say 1 ms for argument's sake. If one function takes 10 ms to execute, and is only executed once, then the total cost is 11 ms with deep profile on. Now if you have another function that takes only 0.001 ms to execute, but it's called 1000 times, what you end up with is (0.001 + 1) * 1000 = 1001 ms. So you may suddenly think, "omg! that second function is so expensive!" when in fact, it's just deep profiling at work.
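
To make the arithmetic above concrete, here is a minimal Python sketch. The 1 ms per-call overhead is the illustrative figure used in the paragraph above, not a measured value, and the function names are made up for this example.

```python
def profiled_total_ms(own_cost_ms, calls, overhead_per_call_ms=1.0):
    """Time a profiler would report if it adds a fixed overhead to every call.

    overhead_per_call_ms is an illustrative assumption, not a real measurement.
    """
    return (own_cost_ms + overhead_per_call_ms) * calls


def real_total_ms(own_cost_ms, calls):
    """Time the function actually costs without any profiling overhead."""
    return own_cost_ms * calls


# One expensive function called once vs. one cheap function called 1000 times.
heavy_profiled = profiled_total_ms(10.0, calls=1)
light_profiled = profiled_total_ms(0.001, calls=1000)

print(f"heavy: real {real_total_ms(10.0, 1):.3f} ms, profiled {heavy_profiled:.3f} ms")
print(f"light: real {real_total_ms(0.001, 1000):.3f} ms, profiled {light_profiled:.3f} ms")
```

In this model the cheap function really costs about 1 ms in total, yet it is reported as roughly 1001 ms, while the genuinely expensive function only grows from 10 ms to 11 ms.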

NGUI's clipping is done on hardware (shader). OS clipping is done on software. Hardware is much faster.

Disabling/enabling widgets is fine, but there is some overhead that goes with doing so the first time. When widgets get enabled they have to be sorted and placed in the correct draw order.

The easiest way to improve performance is to not change anything. I.e., if you have some sprite that's always changing its position or alpha, then you are effectively rebuilding your draw calls every frame. Separating this sprite onto its own panel will yield better performance (as that widget's changes won't affect other widgets).
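
The effect of moving a constantly changing sprite onto its own panel can be pictured with a toy model. This is a Python sketch, not NGUI code: `Panel` and `widgets_rebuilt` are hypothetical names, and it assumes the simplification that one changed widget forces a rebuild of everything sharing its panel.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Panel:
    """Toy stand-in for a UI panel: just a name and the widgets it owns."""
    name: str
    widgets: List[str] = field(default_factory=list)


def widgets_rebuilt(panels, changed_widget):
    """Count widgets rebuilt when one widget changes.

    Simplifying assumption: a change dirties every widget on the same panel.
    """
    for panel in panels:
        if changed_widget in panel.widgets:
            return len(panel.widgets)
    return 0


static_widgets = [f"static_{i}" for i in range(50)]

# Layout A: the animated sprite shares a panel with 50 static widgets.
one_panel = [Panel("main", static_widgets + ["spinner"])]

# Layout B: the animated sprite lives on its own panel.
split_panels = [Panel("main", list(static_widgets)), Panel("animated", ["spinner"])]

print(widgets_rebuilt(one_panel, "spinner"))     # 51
print(widgets_rebuilt(split_panels, "spinner"))  # 1
```

Under that assumption, the per-frame rebuild cost in layout B is limited to the one widget that actually changes.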

I'd need to know more about what you're doing to give more accurate advice.

helmesjo

Re: Performance issues, tracking bottlenecks.
« Reply #5 on: May 20, 2014, 03:05:17 PM »
I get the feeling you skimmed through my post very quickly... :)

  • I wrote that I'm NOT using deep profile (since I've seen you explain it before), but thanks anyway for the thorough explanation. So sadly, all my numbers above still apply.
  • Fair enough, but my question was: "How come NGUI's shader clipping is SLOWER than UIViews with clipping in iOS?" I have no clue how they do it (maybe you do?), but to put it in perspective: every UIView has clipping ON by default (it's per view; there is no such thing as a panel), and performance is not an issue. With NGUI, I have 2 panels clipping in the app, and it cuts the app from ~60fps down to ~50fps (I've done some draw call optimizations during the day, but still, it's a pretty major drop).

About disabling/enabling widgets, what do you mean by "overhead only the first time"?
To keep it short, I have a custom table that instantiates as many cells as need to be displayed and hides (disables) cells when they scroll out of bounds (outside the panel handling the table, which has clipping turned on). It works just like in iOS & Android. When a cell is about to become visible, I enable it. I get epic spikes every time. Enabling a whole screen with say... 50 (?) widgets is so horrendously slow on a phone that the app freezes for seconds (Galaxy S) when I enable a whole scene. Looking at the profiler, with deep profile turned off, it's all happening in UIPanel.LateUpdate, UIRect.OnEnable & UIDrawCall.OnEnable, among others.

I've attached three screenshots of the profiler (run in the editor). The first shows UICamera.Update when a screen is activated (ACEUIButton... just enables the screen and starts a TweenPosition). The second shows UIPanel.LateUpdate in the frame directly after the first (the one with Loading.ReadObject; I don't know where that comes from, but I'm guessing it has something to do with the atlas. I believe it only happens the first time; otherwise it's always GC.Collect slowing it down). The third shows a single cell becoming visible. The cell is built up of about 10 various widgets (labels, sprites & "invisible" widgets).

If I want to log every time a drawcall is forced to rebuild, where would I do this?
« Last Edit: May 20, 2014, 03:57:55 PM by helmesjo »

ArenMook

Re: Performance issues, tracking bottlenecks.
« Reply #6 on: May 20, 2014, 04:15:46 PM »
Yup, sorry I read that as you using deep profiling.

You can uncomment line 1436 of UIWidget.cs to find out when certain widgets are being rebuilt.

I don't know anything about how native iOS views are handled, however considering they're done on a level closer to the hardware, them being faster is understandable, and is even expected. C# is also quite a bit slower than C++ / ObjC.

When an NGUI widget is enabled, it has to find its parent panel and register itself with that panel. Then when LateUpdate comes, the panel has to re-sort all widgets based on depth, re-fill the buffer of every widget, then re-create the affected draw calls. This is where the true hit lies.
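
The sequence described above (register with the panel, then re-sort, re-fill and rebuild in LateUpdate) can be sketched as a toy model in Python. This is not NGUI's implementation: `ToyPanel`, `enable` and `late_update` are hypothetical names that only mirror the steps listed in the paragraph.

```python
class ToyPanel:
    """Toy model of a panel that rebuilds its widget list when something is enabled."""

    def __init__(self):
        self.widgets = []   # (depth, name) pairs registered with this panel
        self.dirty = False  # set when a newly enabled widget registers itself
        self.fills = 0      # running count of widget buffers that were re-filled

    def enable(self, name, depth):
        """Step 1: an enabled widget registers itself with its parent panel."""
        self.widgets.append((depth, name))
        self.dirty = True

    def late_update(self):
        """Steps 2-4: re-sort by depth, re-fill every buffer, rebuild the draw order."""
        if not self.dirty:
            return []                      # nothing changed, nothing to rebuild
        self.widgets.sort()                # re-sort all widgets by depth
        self.fills += len(self.widgets)    # every widget's buffer is re-filled
        self.dirty = False
        return [name for _, name in self.widgets]


panel = ToyPanel()
panel.enable("label", depth=2)
panel.enable("background", depth=0)
print(panel.late_update())  # ['background', 'label']

panel.enable("icon", depth=1)
print(panel.late_update())  # ['background', 'icon', 'label']
print(panel.fills)          # 5
```

Enabling a single extra widget re-fills all three widgets in this model, which is one way to picture why enabling a whole screen in one frame produces a large spike.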

Nicki

Re: Performance issues, tracking bottlenecks.
« Reply #7 on: May 20, 2014, 04:29:52 PM »
iOS has a separate thread for drawing its UI, which explains some of the performance difference. You also don't have the overhead of all of Unity to contend with. The native UI doesn't have any 3D to work in either, so it can make all sorts of optimizations that Unity just can't.

The actual clipping in NGUI/Unity is pretty fast, but moving things around and especially redrawing is fairly taxing. I saw a crystal clear example of this in Subway Surfers, where the character screen has a bunch of scrolling 3D characters, but they're actually just tracking an invisible 2D grid of UIWidgets underneath. It's faster than scrolling drawn sprites because NGUI doesn't have to draw or redraw anything; that part is cut away entirely, and the only calculations are the clip offset and moving ~20 3D models in an update loop.

helmesjo

Re: Performance issues, tracking bottlenecks.
« Reply #8 on: May 20, 2014, 11:55:48 PM »
NOTE: Please read carefully and answer thoroughly (as best you can), since the only thing delaying our game at this moment is the poor performance on Android, so we really don't have time for these back and forth mini-answers (no offence).

Fair enough! I'll check that debug out. Any thoughts on why Android is pumping UIRect.Update at 5-10% when idle? The current problem is that the app runs rather smoothly on iOS (iPhone 4S & iPhone 5, UIRect.Update <1% idle), but on Android (good ol' Galaxy S, a Galaxy Nexus & the "new" Nexus 10 tablet) there are major issues, with UIRect.Update & UIPanel.LateUpdate being the big difference between the two platforms. Holding one of each in either hand, you can without a doubt feel that something is definitely wrong on Android.

Nicki, TY for the explanation! Unity sure brings a big can of performance-killers to the table, without a doubt.

Edit. About the overhead the first time widgets are enabled, is there anywhere I can debug that too? I want to make sure I'm not doing something wrong and that this "overhead" doesn't happen more than it should.

Edit 2. I've added some custom stuff to UIWidget, but did you mean this: Debug.Log("Fill " + name + " (" + Time.time + ")");? Could you give a brief explanation of when & why this happens? Thanks! BTW! Adding this log crashes the game on start (it works if not playing). Is there any way of debugging something useful on the panel level instead?

Edit 3. I'm also noticing almost a 20fps drop (60fps->40fps) when dragging a scrollview... And when I begin to drag there is a big spike, again because of enabling widgets (the panel has clipping so widgets outside of the bounds are disabled by NGUI when not dragging, & enabled when drag starts). Zzz...
« Last Edit: May 21, 2014, 04:57:29 AM by helmesjo »

ArenMook

Re: Performance issues, tracking bottlenecks.
« Reply #9 on: May 21, 2014, 11:37:45 AM »
When you start dragging the scroll view by default it causes all widgets within it to become visible so that there are no visibility checks occurring while dragging the content. You can change this behaviour on the UIPanel ("Cull" option).

The Debug.Log you uncommented happens when the widget's contents need to be filled, which happens when something changes and the draw call needs to be rebuilt. It won't happen while dragging the scroll view, but will happen when you enable a widget, change its color, alpha, move it around, etc.

There is no difference on NGUI's side between iOS and Android handling here, so all I can think of is that the difference must be on the debugging side. iOS devices also generally tend to be more powerful than Android ones from what I remember, but you should see this without remote debugging turned on. This is especially true looking at the list of devices you mentioned. The iPhone 4S and 5 are vastly more powerful than the old Galaxy S, the Galaxy Nexus, and even the Nexus 10 tablet. Those Android devices are either old or aren't meant for fast performance.

UIRect.Update is generally when anchoring gets updated. If you have anchors set to update every Update (default), it can quickly add up -- which is why I advise changing them to OnEnable instead.

helmesjo

Re: Performance issues, tracking bottlenecks.
« Reply #10 on: May 21, 2014, 12:54:26 PM »
Alright, thanks!

Actually, I just took for granted that at least the Nexus 10 (which is a somewhat "new" device, < 1 y/o) would be powerful enough, but that may not be the case then... I'll see if we can dig up a newer, comparably powerful Android device and compare.

About anchors, no I have no anchors at all (actually I've removed everything to do with anchors in UIRect.Update since I use my own constraint-system), but I'll do some more digging...

How do you recommend "disabling" (stop rendering/updating) widgets if it's really necessary, without completely disabling the object? Widget.alpha = 0f? Or is there another way to avoid rebuilding the drawcall? How is it done internally when out of clipped bounds?

ArenMook

Re: Performance issues, tracking bottlenecks.
« Reply #11 on: May 21, 2014, 01:39:42 PM »
Yup, alpha 0. You can set the panel's alpha to 0, since it's cumulative.
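
As a rough illustration of what "cumulative" means here, the Python sketch below assumes that a widget's final alpha is the product of its own alpha and the alphas of its parents. `final_alpha` is a hypothetical helper written for this example, not an NGUI function.

```python
from functools import reduce
from operator import mul


def final_alpha(*alphas):
    """Multiply a chain of alphas (panel first, then each nested widget).

    Assumption: alpha accumulates multiplicatively down the hierarchy.
    """
    return reduce(mul, alphas, 1.0)


# A fully opaque panel leaves a half-transparent widget at 0.5.
print(final_alpha(1.0, 0.5))       # 0.5

# A panel at alpha 0 hides everything under it, whatever the child alphas are.
print(final_alpha(0.0, 1.0, 0.8))  # 0.0
```

Under that assumption, setting a single panel's alpha to 0 is enough to hide every widget it contains without touching the widgets individually.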