Use cores 0 and 2 for the first instance and cores 1 and 3 for the other instances. This split works better because of how the chip is laid out and which resources those cores share.
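
As a rough sketch of that split, assuming the instances are ordinary Linux processes and that you number them yourself starting at 0 (both are assumptions, not something stated above), the core assignment could be expressed like this. The names `cores_for_instance` and `pin_current_process` are hypothetical helpers made up for illustration.

```python
import os

# Core sets taken from the recommendation above.
FIRST_INSTANCE_CORES = frozenset({0, 2})
OTHER_INSTANCE_CORES = frozenset({1, 3})


def cores_for_instance(index):
    """Return the core set for an instance; index 0 is assumed to be the first instance."""
    if index < 0:
        raise ValueError("instance index must be non-negative")
    return FIRST_INSTANCE_CORES if index == 0 else OTHER_INSTANCE_CORES


def pin_current_process(index):
    """Pin the calling process to its core set where the OS supports it.

    os.sched_setaffinity is only available on some Unix platforms (notably Linux),
    so on other systems this just returns the intended set without pinning.
    """
    cores = cores_for_instance(index)
    if hasattr(os, "sched_setaffinity"):
        os.sched_setaffinity(0, cores)  # pid 0 means the calling process
    return cores
```

Each instance would call `pin_current_process` once at startup with its own index. On a machine where the cores are numbered differently, the two sets at the top would need to be adjusted to match.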