I see what you mean regarding the implementation difficulties. And yeah, good thinking to do it the faster GPU way where possible, of course.
Regarding a stencil buffer example, yeah, that would be nice, too.
I know what you mean: when you add support for 2, some will ask for 3, and when you add support for 3, some will ask for 4 =)
But yeah, still, I think lots of common UI use cases, like a scrollview inside one or more panels, should be covered with two.
More than 2 would then be more useful if one wanted to support more esoteric things like multiple nested layers of scrollviews, or (more common) mask sprite support for all sprites. That would be sort of in the vein of Flash, where one can just add mask layers or set one object as the mask and another as masked, though there it ran on the CPU, of course.