Hello Aren,
I've spent all day attempting to optimize 3.0.1 for Flash. My final result is 37.8 fps with 3.0.1 vs 40.2 fps with 2.6.3.
For everyone else, it should be noted that Flash can handle a large number of draw calls just fine; it is script execution that causes the fps drops. The optimization NGUI does to reduce the number of draw calls is actually a performance hit in Flash.

Therefore my methods and results most likely do not apply to other platforms.
The additional things I've tried (based on what I said above):
Remove any .Equals() calls and replace them with the simple == operator, and replace the .Compare() and Sort() operations with manual sorting.
As expected, the Equality / Inequality / etc. time decreased, but the total frame time stayed the same. In other words, Adobe Scout wasn't telling me what's really wrong...
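To illustrate what "manual sorting" means here, below is a minimal sketch of an insertion sort that orders items with direct comparisons instead of going through a Sort() call with a comparer. It is written in Python purely as a language-neutral illustration; the actual NGUI code is C#, and the widget entries and the "depth" field are assumptions made up for this example, not NGUI's real data structures.

```python
def insertion_sort_by_depth(widgets):
    """Sort widgets in place by ascending 'depth' using direct comparisons."""
    for i in range(1, len(widgets)):
        current = widgets[i]
        j = i - 1
        # Shift entries with a greater depth one slot to the right.
        while j >= 0 and widgets[j]["depth"] > current["depth"]:
            widgets[j + 1] = widgets[j]
            j -= 1
        widgets[j + 1] = current
    return widgets


# Hypothetical widget list, used only to exercise the function.
widgets = [
    {"name": "label", "depth": 3},
    {"name": "background", "depth": 0},
    {"name": "icon", "depth": 2},
    {"name": "button", "depth": 1},
]
insertion_sort_by_depth(widgets)
print([w["name"] for w in widgets])
```

Insertion sort is cheap on lists that are already nearly sorted, which is usually the case when the same list is re-sorted every frame.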
At that point I implemented some timed logging to find the bottleneck myself and found it was caused entirely by UpdateDrawCalls(), notably the assignments to dt (Transform) and dc (UIDrawCall) at the end, which are simply doing exactly what they're supposed to...
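For anyone who wants to try the same approach, here is a rough sketch of what timed logging can look like: wrap each suspect section in a timer, accumulate the elapsed time per label, and print the totals sorted from slowest to fastest. This is Python used as a language-neutral illustration, not the code I actually ran. The names timed, report, fill_geometry and update_draw_calls are hypothetical stand-ins, and the two functions just burn CPU time instead of doing real NGUI work.

```python
import time
from collections import defaultdict
from contextlib import contextmanager

# Accumulated seconds and call counts per label (hypothetical names).
totals = defaultdict(float)
calls = defaultdict(int)


@contextmanager
def timed(label):
    """Add the wall-clock time spent inside the block to totals[label]."""
    start = time.perf_counter()
    try:
        yield
    finally:
        totals[label] += time.perf_counter() - start
        calls[label] += 1


def report():
    """Return (label, total_ms, calls) rows, slowest first."""
    rows = [(label, totals[label] * 1000.0, calls[label]) for label in totals]
    rows.sort(key=lambda row: row[1], reverse=True)
    return rows


# Stand-ins for two phases of a frame update; they only burn CPU time.
def fill_geometry():
    sum(range(1000))


def update_draw_calls():
    sum(range(200000))


for _frame in range(5):
    with timed("fill_geometry"):
        fill_geometry()
    with timed("update_draw_calls"):
        update_draw_calls()

for label, total_ms, count in report():
    print(f"{label}: {total_ms:.2f} ms over {count} calls")
```

Accumulating over many frames and logging once keeps the logging itself from distorting the measurement, which matters when script execution is already the bottleneck.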
Finally, I took to changing the setup of the GUI in one of our games. I found that the number of panels you use makes a huge difference in 3.0.1, whereas it did not in 2.6.3.
I removed about 40 panels, many of which were nested under another panel.
In the end I went from ~7 fps out of the box to ~12 fps with a lot of hacking / optimization, and finally to 37.8 fps by reducing the number of panels.