It sounds to me like playback isn't given the highest priority when heavy computation is needed (deciding which of the vast number of samples to play, depending on how you play or on controller input, is a pretty daunting task for a realtime process)...
Most of what SA2 has to deal with is monophonic lead sounds (sax, clarinet etc.). If a complex set of behaviors is applied to dense polyphonic sounds (like those voices), it is unsurprising that it might have to do a LOT more computation to pull it off. But it should NOT do so at the expense of the timing of the style section. CPU resources need to be prioritized to achieve this, IMO....
Thing is, this is rarely doable at the OS level. It is usually a function of the voice generation VLSI chip rather than the CPU that runs the OS. I wouldn't expect to see any 'fix' for this. It's likely you'll have to wait for a new generation of voice chips that respond faster, IMO.
_________________________
An arranger is just a tool. What matters is what you build with it..!