The machine I currently multi-box on is well suited to this kind of testing, since it has 8 physical cores (http://www.apple.com/macpro/).

At first, I set Keyclone to use all 8 cores, or at least that's what I thought I was doing from the interface setting. When I checked in the activity monitor, though, the first 4 physical cores and their hyper-threaded counterparts were hammered, and the last 4 were doing nothing.

Then I left the Keyclone interface setting alone and set affinity in the config file to use all 16 logical cores. Some physical cores were at 100%, with some activity on hyper-threaded cores, while other physical cores were left completely untouched. That doesn't seem optimal.

So I set affinity so that all 8 physical cores are used and none of the hyper-threaded ones. Now the load is much better spread, and no activity happens on hyper-threaded cores before all physical cores are loaded (which does not happen with 5 clients anyway).

So yes, this is a fairly subjective opinion, but I think it is better to set affinity to physical cores only (i.e. the even-numbered ones: 0, 2, 4, 6, and so on).
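
For anyone who wants to work out the core list for their own machine, here is a rough Python sketch. It assumes, as the 0, 2, 4, 6 numbering suggests, that even-numbered logical cores are the physical ones and odd-numbered ones are their hyper-threaded counterparts; that may not hold on every system. The function names are made up for illustration, and this only computes the list and the equivalent bitmask; it does not write any Keyclone config.

```python
def physical_core_ids(logical_count, threads_per_core=2):
    """List the logical core indices assumed to map to physical cores.

    Assumption: the first logical core of each physical core comes
    every `threads_per_core` indices (0, 2, 4, ... with hyper-threading).
    """
    return list(range(0, logical_count, threads_per_core))


def affinity_mask(core_ids):
    """Build a bitmask with one bit set per selected logical core."""
    mask = 0
    for core in core_ids:
        mask |= 1 << core
    return mask


# 8 physical cores with hyper-threading show up as 16 logical cores.
cores = physical_core_ids(16)
print(cores)                      # [0, 2, 4, 6, 8, 10, 12, 14]
print(hex(affinity_mask(cores)))  # 0x5555
```

The list form is what you would use if your tool takes core numbers, and the hex mask is what tools that take an affinity bitmask expect.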