Multiboxing Wiki - Multicpu, Multigpu, Multiharddrive, Multimonitor, etc and Bounded Issues. Revision 3. Work in Progress.
This wiki is broken down into five sections:
I. Synopsis A - For individuals who want the top 5 recommendations.
II. Synopsis B - For individuals who want the entire list of recommendations.
III. Guide A (Practical) - For individuals who need to know why I chose the list of recommendations.
IV. Guide B (Theoretical) - For individuals who need to know why I chose the list of recommendations, what other recommendations are available in addition to the list, and how future improvements in computing technology will affect the list.
V. Improvements needed for this Wiki.
Some of these findings were made through careful observation, while others came from searching the web and the dual-boxing forums for clues and recommendations. Please don't take this as the absolute truth but, as stated above, as recommendations. Remember that your mileage will vary according to different setups and preferences. Throughout this guide, I tried my best to control or limit unknown variables during each observation so that individuals can replicate the same results.
I. Synopsis A:
If you are on a budget and you need to maximize your money for best performance, follow this priority:
1. Vista operating system or any modern operating system that maximizes cache usage of available memory.
2. Maximize CPU frequency
3. Maximize number of cores
4. Maximize available memory
5. Maximize GPU frequency
II. Synopsis B:
If you are unable to read this wiki due to its technical nature, here are some guidelines to ensure adequate performance (assuming you have a suitable CPU/GPU combination and exceed WoW's recommended system requirements):
1. Central Processing Unit (CPU) Bound
Minimum: 1 core per WoW instance
Recommended: 2 cores per WoW instance
2. System Memory Bound
Minimum: 600MB to 1 GB per WoW instance
Recommended: 2 GB per WoW instance
3. System I/O Bound
Minimum: None
Recommended: NUMA
4. Graphic Processing Unit (GPU) Bound
Minimum: One CPU core per GPU node per WoW instance.
Recommended: Two CPU cores at maximum frequency per WoW instance with the most powerful graphics node.
5. Storage System Bound
Minimum: Link all WoW folders with symbolic links (Vista) or junctions (XP, 2K, or NT).
Recommended: Link all WoW folders with symbolic links or junctions. Ensure that each copy of WoW has 2GB of memory. Install as much system memory as your motherboard is capable of handling.
6. Network Processing Unit (NPU) Bound
Minimum: Ensure your ISP connection's upload speed >= number of WoW instances * 56Kbps upload speed
7. Software (Operating System) Bound
Minimum: 32-bit operating system for 2 or 3 WoW instances at full spec or 4+ WoW instances running at minimum spec.
Recommended: 64-bit operating system for 4+ WoW instances running at full spec.
8. Software (Driver) Bound
Minimum: Latest available non-beta drivers.
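The per-instance guidelines above scale linearly with the number of WoW instances, so they can be turned into a quick sizing check. The following is only a rough sketch: the function and variable names are made up for illustration, the 1 GB minimum memory figure takes the upper end of the 600MB to 1 GB range, and the upload speeds in the usage example are the Road Runner tiers listed in the setup description in Guide A.

```python
# Rough sizing sketch based on the Synopsis B guidelines.
# All names are illustrative; nothing here is part of WoW or any tool.

KBPS_PER_INSTANCE = 56  # minimum upload speed per WoW instance (guideline 6)


def sizing(instances):
    """Return minimum and recommended resources for a number of WoW instances."""
    return {
        "min_cores": 1 * instances,        # guideline 1, minimum
        "rec_cores": 2 * instances,        # guideline 1, recommended
        "min_memory_gb": 1 * instances,    # guideline 2, upper end of the minimum range
        "rec_memory_gb": 2 * instances,    # guideline 2, recommended
        "min_upload_kbps": KBPS_PER_INSTANCE * instances,  # guideline 6
    }


def upload_ok(upload_kbps, instances):
    """True if the ISP upload speed meets the guideline 6 minimum."""
    return upload_kbps >= KBPS_PER_INSTANCE * instances


if __name__ == "__main__":
    print(sizing(5))
    # Upload speeds (Kbps) of the ISP tiers listed in the setup description.
    for plan, up in [("Turbo", 2000), ("Standard", 512), ("Basic", 384), ("Lite", 128)]:
        print(plan, "OK" if upload_ok(up, 5) else "too slow", "for 5 instances")
```

For five instances the minimum works out to 5 * 56 = 280 Kbps of upload, so by this guideline only the 128 Kbps tier falls short.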
III. Guide A (Practical):
1. Introduction:
This part of the guide is an exhaustive look at multiboxing with multiple devices (multicpu, multigpu, multiharddrive, multimonitor) on one computer (for multiboxing with multiple computers, see xzin's guide at Multiboxing Wiki - My 5 (Now 10!) Boxing WoW Writeup). My technical guide will give you a basic understanding of the challenges of selecting the proper hardware and software components and the difficulties of troubleshooting a one-PC setup. This guide contains my long-term observations of how multiple instances of WoW have affected my computer's performance and what changes I made to ensure adequate performance.
In the list of terms, I break down each computer component into computational nodes. Please keep this in mind throughout the guide, as it plays an important part in how computers operate and in how future technology developments or paradigm shifts in programming will affect them.
I will be using my setup as an example. WoW1 is my main instance, from which most of my actions are performed. WoW2-WoW4 are my dps instances. WoW5 is my healer instance. My setup is not overclocked. I am using a clean (non-modified) Vista install with the latest patches and minimal third-party software installed. My current setup is as follows:
Processor:
2x AMD Opteron 2376 2.3GHz
Each processor has four DIMMS populated.
Motherboard:
Tyan Thunder n6650e (2915-E) BIOS v3.00
Nvidia nForce Driver v15.16
Memory:
4x4GB ECC Registered DDR2 667MHz
4x2GB ECC Registered DDR2 400MHz
Memory runs at 400MHz.
Graphics:
2x Nvidia GTX 280
Nvidia Geforce Driver v181.22
Hard Drive:
150GB Western Digital Raptor 10K RPM
74GB Western Digital Raptor 10K RPM
Monitor:
4x Dell 3007WFP (2560x1600)
Network:
Time Warner Cable Road Runner Turbo (Up: 2000Kbps / Down: 15000Kbps)
Time Warner Cable Road Runner Standard (Up: 512Kbps / Down: 7000Kbps)
Time Warner Cable Road Runner Basic (Up: 384Kbps / Down: 3000Kbps) - Not Tested.
Time Warner Cable Road Runner Lite (Up: 128Kbps / Down: 768Kbps)
Software:
Windows Vista 64-bit Ultimate Edition SP1
My previous setup is as follows:
Processor:
Intel Core 2 Extreme QX6700 2.66GHz @ 3.2GHz
Motherboard:
EVGA 750i SLI
Nvidia nForce Driver v15.23
Memory:
4x2GB DDR2 800MHz
Graphics:
2x Nvidia GTX 280
Nvidia Geforce Driver v181.22
Hard Drive:
150GB Western Digital Raptor 10K RPM
74GB Western Digital Raptor 10K RPM
Monitor:
3x Dell 3007WFP (2560x1600)
Network:
Time Warner Cable Road Runner Turbo (Up: 2000Kbps / Down: 15000Kbps)
Time Warner Cable Road Runner Standard (Up: 512Kbps / Down: 7000Kbps)
Time Warner Cable Road Runner Basic (Up: 384Kbps / Down: 3000Kbps) - Not Tested.
Time Warner Cable Road Runner Lite (Up: 128Kbps / Down: 768Kbps)
Software:
Windows Vista 64-bit Ultimate Edition SP1
This guide is broken down into seventeen sections:
1. Introduction
2. List of Terms and References
3. Central Processing Unit (CPU) Bound
4. System Memory Bound
5. System I/O Bound
6. Graphic Processing Unit (GPU) Bound
7. Physics Processing Unit (PPU) Bound
8. Artificial Intelligence Processing Unit (AIPU) Bound
9. Audio Processing Unit (APU) Bound
10. Storage System Bound
11. Network Processing Unit (NPU) Bound
12. Software (Operating System) Bound
13. Software (Driver) Bound
14. Software (Application) Bound
15. Man-Machine Interface (MMI) Bound
16. Conclusion
17. Unknown Issues
2. List of Terms and References:
What is bound? Bound (the CPU bound definition below also applies to the other bound issues) as defined by wikipedia.org:
In computer science, CPU bound (or compute bound) is when the time for a computer to complete a task is determined principally by the speed of the central processor: processor utilization is high, perhaps at 100% usage for many seconds or minutes. Interrupts generated by peripherals may be processed slowly, or indefinitely delayed.
The concept of CPU bound was developed during early computers, when data paths between computer components were simpler, and it was possible to visually see one component working while another was idle. Examples components were CPU, tape drives, hard disks, card-readers, and printers. Computers that predominantly used peripherals were characterized as I/O bound. Establishing that a computer is frequently CPU bound implies that upgrading the CPU or optimizing code will improve the overall computer performance.
With the advent of multiple busses, parallel processing, multiprogramming, preemptive scheduling, advanced graphics cards, advanced sound cards and generally, more decentralized loads, it became less likely to identify one particular component as always being a bottleneck. It is likely that a computer's bottleneck shifts rapidly between components. Furthermore, in modern computers it is possible to have 100% CPU utilization with minimal impact to another component. Finally, tasks required of modern computers often emphasize quite different components, so that resolving bottleneck for one task may not affect the performance of another. For these reasons, upgrading a CPU does not always have a dramatic effect. The concept of being CPU bound is now one of many factors considered in modern computer performance.
Source: http://en.wikipedia.org/wiki/CPU_bound
What is Amdahl's Law? Amdahl's Law as defined by wikipedia.org:
Amdahl's law is a model for the relationship between the expected speedup of parallelized implementations of an algorithm relative to the serial algorithm, under the assumption that the problem size remains the same when parallelized. For example, if for a given problem size a parallelized implementation of an algorithm can run 12% of the algorithm's operations arbitrarily fast (while the remaining 88% of the operations are not parallelizable), Amdahl's law states that the maximum speedup of the parallelized version is 1/(1 - 0.12) = 1.136 times faster than the non-parallelized implementation.
Source: http://en.wikipedia.org/wiki/Amdahl%27s_law
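The arithmetic in the quoted definition is easy to reproduce. The sketch below is a minimal illustration of the formula, not anything specific to WoW; the function name and the optional worker count are my own additions for the example.

```python
def amdahl_speedup(parallel_fraction, workers=None):
    """Speedup predicted by Amdahl's law.

    parallel_fraction: share of the work that can be parallelized (0.0 to 1.0).
    workers: number of parallel workers; None models the parallel part
             running arbitrarily fast, as in the quoted example.
    """
    serial_fraction = 1.0 - parallel_fraction
    if workers is None:
        if serial_fraction == 0.0:
            return float("inf")  # everything parallel, no upper limit
        return 1.0 / serial_fraction
    return 1.0 / (serial_fraction + parallel_fraction / workers)


if __name__ == "__main__":
    # The quoted example: 12% parallelizable, 88% serial.
    print(round(amdahl_speedup(0.12), 3))
    # The same workload on a finite number of cores stays below that limit.
    for cores in (2, 4, 8):
        print(cores, round(amdahl_speedup(0.12, cores), 3))
```

The point for this guide is that adding cores only helps the part of the work that can actually run in parallel.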
What are nodes? Node as defined by wikipedia.org:
A node in the Unified Modeling Language (UML) is a computational resource upon which UML artifacts may be deployed for execution. [1]
There are two types of nodes: device nodes and execution environments.
1) A device represents hardware: a physical computational resource with processing capability upon which UML artifacts may be deployed for execution. Devices may be complex (i.e., they may consist of other devices).[1]
2) An execution environment represents software containers (such as operating systems, JVM, servlet/EJB containers, application servers, portal servers etc.) This is a node that offers an execution environment for specific types of components that are deployed on it in the form of deployable artifacts.[1]
Execution environments can be nested. Nodes can be interconnected through communication paths to define network structures. A communication path is an "association between two Deployment Targets, through which they are able to exchange signals and messages".[1]
Source: http://en.wikipedia.org/wiki/Node_(UML)
In the case of this guide, nodes are defined as computational units that have their own memory (generic term) or workspace which is only accessible to that computational unit (in a very generic and abstract sense). Node areas are defined below as:
CPU Node: In this case, the Opteron is a good example of a computing node with its own memory space. For the Core 2, we are going to stretch the example to include the motherboard.
GPU Node: Another good example, where the graphic cores have their own memory space.
PPU Node: Since I don't own an Ageia PhysX card, and Ageia was later bought out by Nvidia, we are going to use a generic example. No matter where the physics code is running, we are going to assume it has its own memory space, whether it be physical (i.e. Ageia PhysX) or virtual (i.e. Nvidia PhysX on the GPU or Havok on the CPU).
AIPU Node: No known physical implementation, however like the above we are going to assume a generic example (i.e. AI code on the CPU).
APU Node: Certain sound cards have their own memory space (i.e. Creative Labs' X-Fi Fatal1ty Pro Series or higher).
NPU Node: Certain network cards have their own memory space too (i.e. KillerNIC).
MMI Node: That's pretty much the user.
What are interfaces? Interface as defined by wikipedia.org:
Interface generally refers to an abstraction that an entity provides of itself to the outside. This separates the methods of external communication from internal operation, and allows it to be internally modified without affecting the way outside entities interact with it, as well as provide multiple abstractions of itself. It may also provide a means of translation between entities which do not speak the same language, such as between a human and a computer. Because interfaces are a form of indirection, some additional overhead is incurred versus direct communication.
The interface between a human and a computer is called a user interface. Interfaces between hardware components are physical interfaces. This article deals with software interfaces, which exist between separate software components and provide a programmatic mechanism by which these components can communicate.
Source: http://en.wikipedia.org/wiki/Interface_(computer_science)
Interface areas are defined below as:
User Interface
Physical (Hardware) Interface
Software Interface
References used throughout the guide:
1) World of Warcraft Hardware Guide:
http://www.anandtech.com/video/showdoc.aspx?i=2381
2) World of Warcraft - Burning Crusade, Hardware Guide:
http://www.gamespot.com/features/6164252/index.html
3) World of Warcraft - Wrath of the Lich King, Hardware Guide:
http://www.gamespot.com/features/6202200/index.html?tag=feature;sidenav
3. CPU Bound:
Observation 1:
While in Outland and also in the original WoW, overclocking my 4 core setup from 2.66GHz to 3.2GHz resulted in a modest 5-10 fps increase in each of the WoW instances.
Observation 2:
For the 8 core setup, processAffinityMask is manually configured for each WoW instance in the WTF configuration file as follows:
WoW1 - SET processAffinityMask "15"
WoW2 - SET processAffinityMask "240"
WoW3 - SET processAffinityMask "240"
WoW4 - SET processAffinityMask "240"
WoW5 - SET processAffinityMask "15"
For the 4 core setup, processAffinityMask is manually configured for each WoW instance in the WTF configuration file as follows:
WoW1 - SET processAffinityMask "15"
WoW2 - SET processAffinityMask "15"
WoW3 - SET processAffinityMask "15"
WoW4 - SET processAffinityMask "15"
WoW5 - SET processAffinityMask "15"
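The mask values above are easier to follow when read as bit fields. The sketch below assumes that bit n of processAffinityMask stands for core n (which matches the default mask of 3 covering cores 0 and 1) and that cores 0-3 sit on the first Opteron and cores 4-7 on the second. The helper names are made up for illustration.

```python
def mask_from_cores(cores):
    """Build an affinity mask with one bit set per core number."""
    mask = 0
    for core in cores:
        mask |= 1 << core
    return mask


def cores_from_mask(mask):
    """List the core numbers whose bits are set in an affinity mask."""
    return [bit for bit in range(mask.bit_length()) if (mask >> bit) & 1]


if __name__ == "__main__":
    print(mask_from_cores(range(0, 4)))   # cores 0-3 -> 15
    print(mask_from_cores(range(4, 8)))   # cores 4-7 -> 240
    print(cores_from_mask(3))             # default mask -> [0, 1]
    print(cores_from_mask(255))           # all eight cores
```

Under those assumptions, 15 keeps WoW1 and WoW5 on the first processor and 240 keeps WoW2-WoW4 on the second.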
Note 1: WTF Directory - C:\World of Warcraft\WTF
Note 2: By default (after installation), WoW writes your WTF file with SET processAffinityMask "3". On loading, WoW detects the number of cores available and writes SET coresDetected "X" to your WTF file, X being the number of cores detected.
Note 3: Leaving SET processAffinityMask "3" as is would have resulted in only cores 0 and 1 being overtaxed with too many WoW instances. Following the above setup balances the load of your WoW instances more evenly over all eight cores or four cores.
Note 4: I do not have a Core i7, a Core i7 derivative, or any older platform that supports hyperthreading, except for an old Pentium 4 which is broken, so I cannot test whether or not WoW benefits from hyperthreading. However, there are numerous web articles that review hyperthreading and show whether or not certain applications and games benefit from it.
For more information on Core i7 and hyperthreading, please look at these sites:
[H]ardOCP - http://www.hardocp.com/
Anandtech - http://www.anandtech.com/
Arstechnica - http://arstechnica.com/
OCAU - http://www.overclockers.com.au/
The Tech Report - http://www.techreport.com/
Tom's Hardware - http://www.tomshardware.com/us/#redir
For more information on Core i7 and in-depth technical discussions, please look at these sites:
Arstechnica - http://arstechnica.com/
The Tech Report - http://www.techreport.com/
If you cannot understand what the above websites state pertaining to hyperthreading, then here is the summary in a nutshell:
Hyperthreading duplicates certain resources located on the core (the instruction and data executor is not one of them). Hyperthreading benefits certain applications and games, but only under three conditions:
1. Instructions that require data have a high probability of stalling (i.e. databases).
2. Multiple instructions being executed at the same time require mixed resources (i.e. floating point vs integer).
3. Thread scheduler / manager (whether it be a application or operating system based) is smart enough to schedule instructions in the most efficient manner (to keep cores busy, dealing with statement 1) with minimal resource conflict (to keep cores busy and not competing for similar resources, dealing with statement 2).
Pertaining to WoW, this application requires a mixed set of resources. For this example we are going to talk about floating point and integer calculations. For a Core i7 with 4 cores and 8 threads (hyperthreading enabled), here are some example conditions where hyperthreading may affect WoW performance:
Condition(s):
1. Running all WoW instances with a core mask of 255 in conjunction with other CPU intensive applications (i.e. movie or music player) in the background. Any modern operating system's thread scheduler is designed to allocate resources as efficiently as possible. However, when the system is put under load, and depending on how big the load is, performance degradation is possible. In the case of multiple instances of WoW and a movie player, with all other variables held constant, there will be execution contention on the cores. Since both applications use floating point calculations, there is a high probability that the arithmetic logic unit (ALU, using a generic term) will be saturated. How, you say? The thread scheduler will first try to push the application threads to any available real cores; once those cores are saturated, it will try to push the threads to the virtualized cores. However, remember from earlier that hyperthreading does not duplicate all resources, and one of the resources that is not duplicated is the executor (i.e. where calculations are done, using a generic term). So basically you have two threads competing for the same resource.
2. Running all WoW instances with a core mask of 255. In this situation, it's a little more murky whether or not performance would be affected. If you have the bad luck of having the thread scheduler allocate two instances of WoW to the same core (i.e. WoW1 is on real core 0 and WoW2 is on virtualized core 0, which is real core 0), you will take a performance hit. In most cases, however, an operating system's thread scheduler is designed to limit this possibility, provided that no other (non-WoW) application is competing for the same resource, that you do not overload your cores with too many WoW instances, and that mathematical probability (i.e. luck) is on your side.
With luck and everything else in life, your mileage will vary according to your unique setup.
Continuing on with observation 2, I did some long-term comparisons of different processAffinityMask values, loading a single instance of WoW on a one core mask, a two core mask, and a four core mask. Observing the Windows task manager over an average 4-8 hour gaming session showed little noticeable performance difference. Going from a single core mask to a dual core mask offered a slight performance improvement. Going from a dual core mask to a quad core mask offered no performance improvement.
The one thing I do want to point out is a correlation with CPU usage and core mask. The formula I have developed is as follows:
CPU usage for 1 WoW running on 1 core = CPU usage per masked core * number of cores in the mask, where CPU usage per masked core = CPU usage for 1 WoW / number of cores in the mask
No matter how many cores a single instance of WoW received in its mask, this always resulted in total CPU usage equaling one core at full utilization. However, I did not test a core mask where all 8 cores were masked due to the non-uniform memory architecture (NUMA) of the dual quad core processors. Using a core mask spanning more than 4 cores would have resulted in lower performance due to threads bouncing back and forth between the two processors and memory requests crossing over the HyperTransport (HT) link.
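To make the relationship above concrete, here is a small worked sketch. It assumes the load of one WoW instance spreads evenly across the cores in its mask, which is a simplification of what the task manager shows; the function names and the 100% figure are illustrative only.

```python
def per_core_usage(single_core_usage_pct, cores_in_mask):
    """Average usage per masked core, assuming the load spreads evenly."""
    return single_core_usage_pct / cores_in_mask


def total_usage(single_core_usage_pct, cores_in_mask):
    """Total usage summed over the masked cores; constant for any mask size."""
    return per_core_usage(single_core_usage_pct, cores_in_mask) * cores_in_mask


if __name__ == "__main__":
    # One WoW instance that would fully load a single core (100%).
    for cores in (1, 2, 4):
        print(cores, "core mask:",
              per_core_usage(100.0, cores), "% per core,",
              total_usage(100.0, cores), "% of one core in total")
```

A wider mask lowers the load seen on each core but does not change the total amount of CPU work the instance needs.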
Observation 3:
When I had my 4 core setup and was still in Outland, my average framerate was 30 fps for each of the WoW instances. However, once I stepped into Zangarmarsh (most notably when I got to the Horde flight point), fps for all WoW instances dropped to 15-20 fps or lower. Mouse control and input were very sluggish. I tried every combination of core mask for each WoW instance, but that did not alleviate the issue. Only after upgrading to the dual quad core setup did this issue disappear. Afterwards, my best guess was that the WoW5 instance was overloading WoW1 thru WoW4 because my setup only had 4 cores available.
Observation 4:
Using the 4 core setup, whenever I minimized one or more WoW instances, I noticed on the performance monitor a large drop in CPU utilization and an overall improvement in system response. After upgrading to the 8 core setup, the performance monitor showed the same drop but no change in system response, due to more cores being available. Getting back to Observation 3, the way I made that determination was by minimizing WoW5, which resulted in improved system response.
Observation 5:
No matter which zone I was in or whenever I visited Dalaran, my CPU utilization always remained at a constant rate.
4. System Memory Bound:
Observation 1:
All WoW instances are compared between the two following settings:
Setting 1:
Video Settings -> Resolution Tab:
Resolution:1920x1200 (Wide)
Multisampling:24-bit color 24-bit depth 8x multisample
Refresh:60Hz
Vertical Sync:Unchecked
Hardware Cursor:Checked
Reduce Input Lag:Unchecked
Windowed Mode:Checked
Maximized:Unchecked
Video Settings -> Effects Tab:
Video Quality:Custom (Everything high minus the Shadow Quality: Low)
Setting 2:
WoW2-WoW5 instances are set as follows:
Video Settings -> Resolution Tab:
Resolution:1920x1200 (Wide)
Multisampling:24-bit color 24-bit depth 1x multisample
Refresh:60Hz
Vertical Sync:Unchecked
Hardware Cursor:Checked
Reduce Input Lag:Unchecked
Windowed Mode:Checked
Maximized:Unchecked
Video Settings -> Effects Tab:
Video Quality:Custom
View Distance:High
Terrain Detail:Low
Spell Detail:High
Environmental Detail:Low
Ground Clutter Detail:Low
Ground Clutter Radius:Low
Shadow Quality:Low
Texture Resolution:Low
Texture Filtering:Low
Weather Filtering:Low
Video Settings -> Effects Tab -> Shaders:
Specular Lighting:Unchecked
Full-screen Glow Effect:Unchecked
Death Effect:Unchecked
Note 1: The overriding factor in determining how much memory is needed is based off of the graphic settings for each of the WoW instances.
For the highest setting, my RAM usage stayed above 1GB for each instance of WoW. For the lowest setting, my RAM usage bounced between 700MB and 1GB for each WoW instance.
The only danger here is when the following happens:
(memory usage per WoW instance at the chosen video setting * number of WoW instances) >= available memory per processor node.
As I stated earlier, performance is affected when memory requests start crossing over the HT to the other processor node. To minimize this effect, I recommend keeping memory usage below the memory available per processor node. Another thing to keep in mind is that processor affinity instructs the operating system to keep certain programs on a certain processor node. This ensures that your program's memory will not cross over to another processor node unless the combined memory requirements of all programs with affinity set to that node exceed the memory available on the node.
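The danger condition above can be checked with a few lines of arithmetic. This is only a sketch: the function name is made up, and the node size and per-instance figures in the usage example are placeholder values, not measurements of my system.

```python
def node_overcommitted(per_instance_gb, instances_on_node, node_memory_gb):
    """True when the WoW instances pinned to a node need at least as much
    memory as that node has, so requests start crossing to the other node."""
    return per_instance_gb * instances_on_node >= node_memory_gb


if __name__ == "__main__":
    # Hypothetical node with 8 GB of local memory.
    node_gb = 8.0
    # Three instances at roughly 1.5 GB each fit comfortably.
    print(node_overcommitted(1.5, 3, node_gb))
    # Four instances at 2 GB each would fill the node.
    print(node_overcommitted(2.0, 4, node_gb))
```

In practice the operating system and other programs also take memory on each node, so it is worth leaving headroom below the limit.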
Observation 2:
Loading my 8 core system with 24GB of system memory resulted in no speed improvement, given that I already met most of the recommended requirements for each WoW instance. On the other hand, with each WoW instance's memory usage hovering between 1 GB and 1.5 GB, having 10 GB+ of system memory available for caching resulted in minimal load times when traveling and zoning. Long-term observation showed little or no hard disk activity. Memory usage was at its highest in Dalaran, about 1.25 GB for each WoW instance. There was no lag or lack of system response there. However, there was a drop in frame rate, due to a heavy load on the CPU from the new city design, the proximity of different zones, and the number of characters located in a confined area.
Observation 3:
Going off observation 2, taking the 8 core system's RAM and swapping the 400MHz RAM for the 800MHz RAM resulted in no noticeable performance improvement.
5. System I/O Bound
Update in next revision.
6. GPU Bound:
My monitor and GPU setup is as follows:
Monitor A, B, C, D (each running at 2560x1600)
GPU 0 and GPU 1 (non-SLI)
Monitors are arranged from left to right on the same level of vision.
A B C D
GPU 0 drives monitor A and B.
GPU 1 drives monitor C and D.
WoW1 and WoW5 appear on monitor B.
WoW2 appears on monitor C.
WoW3 and WoW4 appear on monitor D.
WoW grid coordinates and sizes:
Instance - X Coordinates - Y Coordinates - Resolution
WoW1 - 2560 - 0 - 1920x1200
WoW2 - 7680 - 0 - 1920x1200
WoW3 - 8320 - 400 - 1920x1200
WoW4 - 5120 - 0 - 1920x1200
WoW5 - 3200 - 400 - 1920x1200
Note 1: In an MS Windows environment, the top left-hand corner of the primary monitor is defined as coordinates 0,0, with the numbers increasing down and to the right. If you switch the primary monitor to a different monitor, then the top left-hand corner of the new primary monitor is defined as coordinates 0,0, and monitor(s) to the left of or above the primary monitor fall in the negative coordinate space.
Note 2: In Vista, you cannot mix and match different GPU vendors (not the card vendors but the GPU manufacturers, i.e. AMD ATI, Nvidia, Intel, S3, etc). By design, Vista will either prevent you from doing so or revert to generic drivers to ensure system stability. In previous operating systems you could mix and match different vendors; however, based on MS's long-term bug-report and crash analysis, they determined that a majority of the stability issues were related to the GPU drivers (I do not have the link to it, I will try to google it for the next revision). In terms of mixing and matching different GPU families (i.e. Nvidia GTX 200 series with Nvidia GeForce 8000 series), you can, but it depends on whether or not the latest drivers support both families at the same time. Vista limits you to one set of drivers for all video cards. If your video drivers are unified and support the video cards currently in your machine, then you are good.
Note 3: My current 8 core setup does not mix and match different GPU families, so I can't tell you with concrete facts whether it would work or how fast it would be. Depending on available time, I have an old Nvidia GeForce 8800 GTX floating around as a spare part that I could substitute for one of the Nvidia GTX 280s in my current setup to test this out.
Note 4: For applications to benefit from CrossFire / SLI, three conditions must be met:
1. Operating System support
2. Driver support
3. Applications specifically written to take advantage of CrossFire / SLI
Conditions 1 and 2 are already met. However, in terms of WoW, condition 3 is not met. WoW is not specifically written to take advantage of a CrossFire / SLI setup. This is probably due to the programming cost of building, testing, and maintaining that support if they chose to add it. Even if they were to add it in a future version of WoW, there would be two sets of source code to maintain, which over the long term would add up in cost and man-hours. This leads to another issue, coarse vs fine multi-threading, which I will cover in the next revision.
Drivers and the driver control panel can to some extent force WoW to use CrossFire / SLI, but that would be stretching it.
Note 5: Spanning displays does not offer a performance improvement. Monitor(s) connected to the same GPU (non CrossFire / SLI) still leave that GPU with a rendering workload = number of monitors * each monitor's workload. However, if the drivers support rendering to multiple monitors across multiple video cards, there is no bus contention or bandwidth issue between the video cards, and the application is written to take advantage of this setup, then there is a possibility for performance improvement.
Note 6: There have been unconfirmed reports of certain CPU/GPU combinations with Nvidia GTX 200 series graphics cards not reaching their maximum performance due to the Nvidia graphics drivers not being optimized for an 8 core setup or a 4 core 8 thread (hyperthreading) setup. This may be due to a lag between new hardware being introduced and drivers being developed and optimized for said hardware.
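The coordinate convention from Note 1 can be sketched in a few lines. This assumes four monitors of equal width sitting side by side with their top edges aligned, as in my setup; the helper names are made up for illustration and nothing here calls the Windows API.

```python
def monitor_origins(widths, primary_index):
    """X origin of each monitor in a left-to-right row.

    The primary monitor's left edge is x = 0, so monitors to its left
    get negative origins.
    """
    offset = sum(widths[:primary_index])
    origins = []
    x = 0
    for width in widths:
        origins.append(x - offset)
        x += width
    return origins


def monitor_at(x, widths, primary_index):
    """Index of the monitor containing screen coordinate x, or None."""
    origins = monitor_origins(widths, primary_index)
    for index, (origin, width) in enumerate(zip(origins, widths)):
        if origin <= x < origin + width:
            return index
    return None


if __name__ == "__main__":
    widths = [2560, 2560, 2560, 2560]   # monitors A, B, C, D
    names = "ABCD"
    print(monitor_origins(widths, 0))   # monitor A is primary
    print(monitor_origins(widths, 2))   # monitor C is primary
    # With A as primary, X = 2560 (WoW1) and X = 3200 (WoW5) land on monitor B.
    print(names[monitor_at(2560, widths, 0)], names[monitor_at(3200, widths, 0)])
```

This is also why window positions shift when the primary display is changed: the same physical spot on a monitor gets a different X coordinate.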
Observation 1:
With a multi-monitor setup, I wanted to be able to see all instances of WoW on different screens. However, every time a WoW instance crossed from one GPU zone to another GPU zone, the WoW instance's graphics ended up being software rendered instead of hardware rendered. Case in point, using the above example: if WoW1, which is seen on monitor B (driven by GPU 0), was clicked and dragged to either monitor C or D, the framerate dropped to below 10 fps. Moving it back to the original monitor restored it to 60 fps.
To compensate for the above issue, I had to adjust when and where I loaded each WoW instance, and on which monitor and GPU. Adjusting the primary display in the Vista display control panel solved this problem and ensured an equal load balance across both GPUs. The technique is as follows:
Load Sequence:
1) Primary display is set to monitor A or B.
2) Load WoW1 instance and display on monitor B.
3) Load WoW5 instance and display on monitor B.
4) Switch primary display from monitor A or B to monitor C or D.
5) Load WoW2 thru WoW4 instances on monitor C or D.
6) Switch primary display from monitor C or D to monitor A or B.
7) Use software to reposition WoW window instances. WoW1 and WoW5 on monitor B. WoW2 on monitor C. WoW3 and WoW4 on monitor D.
During the above load sequence, your WoW instances will be moved around due to the changing primary display. Your mileage may vary on where they finally end up, but executing step 7 repositions the windows to their correct positions.
Observation 2:
For SLI, running 1 instance of WoW did not show any noticeable improvement and at times lowered performance. Running multiple WoW instances on SLI gave the same result.
Observation 3:
Using the first settings (high quality) from the memory bound section for each WoW instance and testing in Dalaran, my average fps for all WoW instances was between 10-15 fps. The only noticeable change was that lowering the shadow quality improved average fps by 5 fps for each WoW instance.
7. PPU Bound:
Update in next revision.
8. AIPU Bound:
Update in next revision.
9. APU Bound:
Update in next revision.
10. Storage System Bound:
Hard drives are set up as follows:
150GB Raptor contains the OS and one main WoW folder, with four WoW folders symbolically linked to the main WoW folder (each symbolically linked folder has its own WTF configuration file).
74GB raptor contains 24GB page file.
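One way to picture the linked folder layout described above is the sketch below. It is an assumption about the layout, not a record of my exact commands: every entry of the main WoW folder is linked into the copy except WTF, which stays a real folder so each instance keeps its own configuration. The function name and the example paths are hypothetical, and on Windows creating symbolic links normally requires elevated privileges.

```python
import os


def link_wow_copy(main_dir, copy_dir, private=("WTF",)):
    """Create copy_dir whose entries are symbolic links into main_dir,
    except for the names in `private`, which become real folders."""
    os.makedirs(copy_dir, exist_ok=True)
    # Private folders (the per-instance WTF configuration) are never linked.
    for name in private:
        os.makedirs(os.path.join(copy_dir, name), exist_ok=True)
    for name in os.listdir(main_dir):
        if name in private:
            continue
        source = os.path.join(main_dir, name)
        target = os.path.join(copy_dir, name)
        if not os.path.lexists(target):
            os.symlink(source, target, target_is_directory=os.path.isdir(source))


# Hypothetical usage, one call per extra instance:
#   for n in range(2, 6):
#       link_wow_copy(r"C:\World of Warcraft", r"C:\WoW%d" % n)
```

Because the game data exists only once on disk, the operating system's file cache can serve all instances from the same cached files.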
Note 1: There's been a lot of talk about using a Solid State Drive (SSD) for certain performance improvements. Now, I will agree that SSDs offer load time improvements. However, I will not go out of my way to buy the best or even a mediocre and cheap SSD. Here's why:
Reasoning(s):
1. Hierarchical caching system. A computer is by design built from multiple levels of abstraction and hierarchy. Each level is designed to hide the complexities of the lower levels. Case in point, the CPU core is at the highest point of this hierarchy. Every time it puts in a request for data, it does not care where the data comes from, only that it gets its data. Keeping this in mind, a multi-tier caching system is designed to hide the complexity of where the data is coming from and how difficult it is to retrieve. You hope that the data is found in the space closest to the CPU (and in most cases this is true due to locality). When it is not (a case you minimize with locality and caching algorithms), you hope that it will be found in the next closest space to the CPU.
Also keep in mind the issue of performance vs cost, which go hand in hand. The more performance you need, the more you need to pay for it. However, just as all WoW tanking stats are subject to diminishing returns (DR), so is maximum performance.
Example:
The above example may be a little extreme, but it does make the point that as you get further and further away from the CPU, your relative performance increase is subject to diminishing returns. In addition, increasing the size and retrieval rate of each space does offer some performance improvement at additional cost, but depending on the code, it will reach a point where there is no further performance improvement. By using a multi-tier caching system, you get the best compromise between performance and cost.
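The diminishing-returns argument can be sketched numerically. This is only an illustration: the tier names, hit rates and access times below are assumed round numbers, not measurements from my rig. It computes the expected time per data request when each tier is consulted only after every faster tier has missed.

```python
def average_access_time(tiers):
    """Expected time per request, in ns.

    `tiers` is a list of (name, hit_rate, access_ns), ordered from the tier
    closest to the CPU to the furthest. A tier is only reached when every
    faster tier missed, so its cost is weighted by that probability.
    """
    total_ns = 0.0
    reach = 1.0  # probability that a request gets as far as this tier
    for _name, hit_rate, access_ns in tiers:
        total_ns += reach * access_ns
        reach *= 1.0 - hit_rate
    return total_ns


def build_tiers(memory_hit_rate, storage_ns):
    """Three hypothetical tiers; every number here is an assumption."""
    return [
        ("CPU cache", 0.95, 2.0),
        ("system memory (OS file cache)", memory_hit_rate, 100.0),
        ("storage", 1.0, storage_ns),  # last tier always has the data
    ]


HDD_NS = 10_000_000.0  # assumed ~10 ms per hard drive access
SSD_NS = 100_000.0     # assumed ~0.1 ms per SSD access

if __name__ == "__main__":
    for rate in (0.99, 0.9999, 1.0):
        hdd = average_access_time(build_tiers(rate, HDD_NS))
        ssd = average_access_time(build_tiers(rate, SSD_NS))
        print(f"memory hit rate {rate}: HDD {hdd:.1f} ns, "
              f"SSD {ssd:.1f} ns, saved {hdd - ssd:.1f} ns")
```

Under these assumptions, the time saved by faster storage shrinks as more requests are served from system memory, and drops to zero once everything needed is already cached.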
2. Vista or any modern system that puts a strong emphasis on maximizing caching of available system memory.
Getting back to the issue of whether or not to buy an SSD: an individual considering one for performance improvements should instead consider it for other reasons:
Other reasoning(s):
1. Limited acoustic noise.
2. Limited thermal dissipation.
3. Compactness.
4. Limited memory for caching purposes (this is a stretch), whether due to a hardware limitation (i.e. Pentium III), a software limitation (32-bit operating system), or budgetary reasons.
5. Load times (this is also a stretch: WoW does not spend most of its time loading data. Any performance improvement would depend on how often you load data that is needed but not available in cache. See other reasoning 4.).
Observation 1:
Based on long-term observations, the only times the hard drive was accessed were during boot up, shut down, loading WoW, quitting WoW, loading a new area/zone and loading a dungeon instance. Other than that, there was little or no hard drive access.
11. NPU Bound:
Note 1: There has been some talk about, and some reviews of, the KillerNIC. Whether or not it would improve your network connection depends on the following:
Condition(s):
1. Are you running another network dependent application in the background with your WoW instances?
2. How much of a network load are the other applications putting on your Network Interface Card (NIC)?
3. Are the other applications TCP or UDP packet dependent?
Condition(s) beyond your control:
1. Internet traffic worldwide.
2. Whether a virus outbreak, DoS or DDoS attack, etc. is under way on the Internet.
3. How packets are prioritized and routed across the Internet.
4. How packets are prioritized and routed once they reach Blizzard.
5. How many individuals are currently logged into WoW.
6. If you are using a wireless connection, electromagnetic (EM) interference and other devices on your LAN.
Observation 1:
For a while I was playing five instances of WoW on Time Warner Cable Road Runner Turbo (Up: 3.0Mbps / Down: 22.0Mbps), then I downgraded to Road Runner Lite (Up: 384Kbps / Down: 768Kbps). There was no noticeable loss in performance or increase in network latency. However, there was a drop in the reliability of the Internet connection, mainly due to insufficient upload bandwidth for running 5 instances of WoW. During this time, one of the WoW instances would occasionally lose its connection to the server.
12. OS Bound:
Observation 1:
For a while I ran my gaming rig with the default settings in Vista. Then I tried shutting down unnecessary services in Vista, but there was no noticeable improvement between the two configurations. The only differences were that Vista's memory usage and boot up time were slightly lower, but neither of these affected WoW.
Observation 2:
From a careful observational point of view, while running 5 instances of WoW, there is a slight pause every so often, during which the mouse response is sluggish or non-responsive. It occurs at random, but mainly in the Zul'Drak zone. I'm not sure if it occurs when I enter other zones, but it may be a thread scheduler issue. The reason I suspect this is that every time I check the resource monitor and task manager for clues as to what may be causing the issue, I cannot find any. The only guess I can make is that it has something to do with Vista's thread scheduler and NUMA. Most common desktop applications are designed to run on any core; however, with the trend towards NUMA, the question becomes whether you would want your application to run on any core at random. Getting back to the CPU affinity and NUMA issue: if an application can bounce back and forth across processor nodes, how would that affect application performance? In this case, with the mouse and keyboard software set by default to run on any core, I tried to limit them to processor node 0 (cores 0-3). Unfortunately, that did not resolve the issue.
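For readers unfamiliar with affinity masks, here is a minimal sketch of how a set of cores maps to the bitmask that affinity settings work with (bit n set means core n is allowed). The helper names are hypothetical; the only value taken from the text above is node 0 = cores 0-3. The mask itself can then be applied with whatever tool you prefer, for example Task Manager's "Set Affinity" dialog.

```python
def affinity_mask(cores):
    """Build a bitmask with one bit set per allowed core index."""
    mask = 0
    for core in cores:
        if core < 0:
            raise ValueError("core index must be non-negative")
        mask |= 1 << core
    return mask


def cores_in_mask(mask):
    """Inverse of affinity_mask: list the core indices enabled in a mask."""
    return [bit for bit in range(mask.bit_length()) if (mask >> bit) & 1]


# Processor node 0 as described above: cores 0-3.
NODE0_MASK = affinity_mask(range(0, 4))

if __name__ == "__main__":
    print(f"node 0 mask: {NODE0_MASK:#x}, cores: {cores_in_mask(NODE0_MASK)}")
```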
13. Driver Bound:
Update in next revision.
14. App Bound:
Update in next revision.
15. MMI Bound:
Update in next revision.
16. Conclusion:
After analyzing all of the above data, I concluded that my computer rig is CPU bound. The reason is that each of the three links shown above in the GPU section, from two different websites covering WoW performance on the then-current generation of technology, shows that in every case WoW is to some extent CPU bound. Once you reasonably meet the performance requirements for the other areas of your computer system, overall performance is ultimately limited by how powerful your CPU is. For my setup, dealing with each of the sections:
CPU Bound: Increasing the core mask beyond two cores per instance of WoW resulted in no performance improvement. However, doubling or tripling up WoW instances on a core already allocated to another WoW instance would detrimentally affect your performance. Hopefully, a faster Opteron processor in the near future will increase my fps.
Memory Bound: If the minimum requirement is met (your processor's recommended or maximum memory speed without overclocking, and 1GB of memory per instance of WoW), increasing memory speed and adding more memory would result in no performance improvement. More memory would, however, have resulted in more of the WoW data files being cached.
GPU Bound: Looking at the three websites listed in the GPU section, if you meet the minimum requirements for a select CPU/GPU combination, increasing the GPU's processor and memory speed or adding GPUs and GPU memory would result in some performance improvement, but not the maximum performance improvement. It would probably allow you to run one or more instances of WoW per GPU, depending on whether or not you meet the minimum requirement of two cores per instance of WoW.
Hard Drive Bound: By symbolically linking each of your WoW folders to your main WoW folder, you explicitly state to Vista that all WoW data are the same. This ensures that Vista will not create multiple cached copies of your WoW folder in memory, thereby overburdening memory with too many duplicates.
Network Bound: Meeting the minimum bandwidth requirement of 56Kbps upload speed for each copy of WoW ensures adequate performance. Increasing bandwidth above 56Kbps upload would not provide any performance improvement. Latency does play a strong role in how smooth online play is.
Software Bound: Turning off unneeded Vista services compared to a default Vista installation resulted in no performance improvement for WoW. However, it did provide a small improvement in boot and shutdown time and lowered average memory usage.
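The CPU Bound point above (two cores per instance, with no core shared between instances) can be sketched as a small planning helper. This is only an illustration: the function names are hypothetical and the 8-core machine in the example is an assumption, not a statement of my hardware.

```python
def plan_core_masks(total_cores, instances, cores_per_instance=2):
    """Give each instance its own block of cores and return one mask each.

    Raises ValueError when there are not enough cores to avoid sharing,
    since doubling up instances on a core is the case to avoid.
    """
    needed = instances * cores_per_instance
    if needed > total_cores:
        raise ValueError(
            f"{instances} instances x {cores_per_instance} cores "
            f"need {needed} cores, only {total_cores} available"
        )
    block = (1 << cores_per_instance) - 1  # e.g. 0b11 for two cores
    return [block << (i * cores_per_instance) for i in range(instances)]


def masks_overlap(masks):
    """True if any core is assigned to more than one instance."""
    seen = 0
    for mask in masks:
        if seen & mask:
            return True
        seen |= mask
    return False


if __name__ == "__main__":
    # Assumed example: 4 instances on an 8-core machine.
    for i, mask in enumerate(plan_core_masks(8, 4), start=1):
        print(f"WoW instance {i}: mask {mask:#04x}")
```

When the plan does not fit (for example 5 instances at two cores each on 8 cores), the choice is between dropping some instances to one core each or accepting shared cores.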
17. Unknown Issues:
1. Basic application and services CPU affinity and NUMA issues.
2. Vista thread scheduler.
3. GPU memory allocation and caching.
IV. Guide B (Theoretical):
1. Introduction
This last part of the guide is a review of the list of previous recommendations, what other recommendations are available and how future improvements in computing technology will affect the list. Like the previous guide, this guide is broken down into seventeen sections:
1. Introduction
2. List of Terms and References
3. Central Processing Unit (CPU) Bound
4. System Memory Bound
5. System I/O Bound
6. Graphic Processing Unit (GPU) Bound
7. Physics Processing Unit (PPU) Bound
8. Artificial Intelligence Processing Unit (AIPU) Bound
9. Audio Processing Unit (APU) Bound
10. Storage System Bound
11. Network Processing Unit (NPU) Bound
12. Software (Operating System) Bound
13. Software (Driver) Bound
14. Software (Application) Bound
15. Man-Machine Interface (MMI) Bound
16. Conclusion
17. Unknown Issues
V. Improvements:
1. Reference links to outside material.
2. Screen capture.
3. Typos.
4. Wordcrafting.
5. Index and listing of terms.
6. Source code.
If you have any suggestions on what needs improvement, what needs better explaining or what needs to be added to this guide, please let me know. Thank you.
reply to Ken. Continuation.
and the test location was Dalaran. The test was run for about 10 minutes standing in one of the major intersections within the city. CPU utilization for the 4-core mask was between 20-30%. Try the above settings, disable the frame rate limit and try the different 1-core, 2-core, and 4-core masks. You will probably get similar results for CPU utilization.
Getting to your second comment about the memory, application memory usage is to some extent not affected by the OS settings. I believe caching is not included in the calculation of the WoW application memory usage (I will need to search the Internet on the Vista caching system to verify) but is reported as a separate stat. Yes, WoW quality settings do have a strong effect on memory usage, and this is pointed out in a test case later on in the guide. I did go in-depth about the two different settings and compared the difference in memory usage. The thing I did forget to add was WoW addons affecting memory usage, which I will add to the guide in the next revision or the one after that.
As to the page file, all these tests were done with the page file set to the maximum size recommended by the Vista OS. As to whether or not turning off the page file will affect actual application memory usage, it will probably increase the memory usage, since certain parts of the application memory could no longer be paged out to the page file (I need to find a reference or link for this and verify it in a test case).
Now, as to whether or not it is recommended to turn off the page file (you referred to it as swap) completely, I recommend against it. Certain applications depend on the page file for proper operation. Yes, there are certain applications that will run fine without it, and your mileage will vary depending on the applications you run (or crash).
Another good test case would be to see if reducing or disabling the page file affects the performance and stability of the OS and the WoW application. A follow-up test case with the page file setting would be to see whether the reported application memory usage changes.
As for the minimum memory vs range of memory, I will change that to 1 GB.
Getting to your third comment about NPU, I did measure the network bandwidth usage and it bounced between the 10-60Kbps range and upwards of the 100-150Kbps range, but this also depends on which WoW addons you use, and I need to add that to the next revision of the wiki. The settings used were similar to the above core mask test, at the same location. You mention that 1 WoW instance uses about 3KBps, which translates into 24Kbps (for reference for some of the readers, 8 bits = 1 byte; a capital "B" usually denotes a byte, while a lowercase "b" usually denotes a bit). Now, 24Kbps is not really close to 56Kbps, but it is close enough for this example, and I use 56Kbps as a basic unit of network bandwidth (generic, and rounded up to something that people would recognize as a basic tier of service). I was not insinuating that 56Kbps was a telephone line; I was using it as a generic unit of network bandwidth.
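The unit conversion and the per-instance budget can be worked through in a few lines. This is a sketch only: the function names are hypothetical, and treating the 150Kbps peak as a per-instance upload figure is an assumption made for illustration, since my measurement did not separate upload from download. The 3KBps, 56Kbps, 384Kbps and five-instance figures come from the discussion in this thread.

```python
BITS_PER_BYTE = 8


def kilobytes_to_kilobits(kilobytes_per_second):
    """Convert KBps (bytes) to Kbps (bits)."""
    return kilobytes_per_second * BITS_PER_BYTE


def total_kbps(instances, per_instance_kbps):
    """Aggregate bandwidth if every instance uses the same amount."""
    return instances * per_instance_kbps


def fits_link(instances, per_instance_kbps, link_kbps):
    """True if the aggregate demand stays within the link capacity."""
    return total_kbps(instances, per_instance_kbps) <= link_kbps


if __name__ == "__main__":
    print("3KBps in Kbps:", kilobytes_to_kilobits(3))
    print("5 instances at 56Kbps:", total_kbps(5, 56), "Kbps")
    print("fits 384Kbps upload at 56Kbps each:", fits_link(5, 56, 384))
    # Assumption: the measured 150Kbps peak applies per instance, upstream.
    print("fits 384Kbps upload at 150Kbps each:", fits_link(5, 150, 384))
```

Under that assumption, five instances fit a 384Kbps upload link at the 56Kbps budget but not during 150Kbps peaks, which would be consistent with the occasional disconnects I saw on Road Runner Lite.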
Getting to your fourth comment about the OS, I do reference the reasons later on in the guide as information for those interested in learning more about it.
Getting to your fifth comment about UML, I was planning to go somewhere with it but I haven't finished adding/editing the last part of the guide.
Getting to your sixth comment about CPU Bound Observation 1, I'm adding more material to that section. The reason this information is relevant is to point out that CPU frequency has a strong effect on framerates. The three websites that conducted performance tests with the three different versions of WoW show this. But I am still in the process of adding more to this section.
Your last comment about downgrading the memory from 800MHz to 400MHz: I will agree to a certain extent that memory speed is important in 3D rendering applications. However, for certain applications and certain video games, this is not the case. Depending on how the application is programmed, some or most of the 3D graphics pipeline rendering is actually performed on the GPU (depending on which one you have). The textures and any related graphics items needed would be placed within the GPU node. Coders would try to limit transactions crossing between the GPU and CPU nodes. Some parts of the system memory would be used kind of like a caching zone for graphics-related items that are not currently needed, or that did not fit on the GPU node to begin with. System memory speed starts playing a role when you are talking about CAD or modeling programs, which mainly depend on the CPU and system memory. However, with the introduction of CUDA and/or changes in the 3D programming languages (etc.), the lines separating the CPU from the GPU are getting murky, and it becomes less clear-cut which component is limiting application performance.
Another factor is the setup itself: whether or not it has an integrated memory controller (and the multi-tier caching scheme) may explain the difference between the individual you referenced as having an fps increase and setup A, where the memory speed was lowered from 667MHz to 400MHz. In setup A, I did pull out the 400MHz memory to see if there was a difference, but there was no noticeable improvement in framerates. Could you provide the link to the individual who had an fps increase due to faster RAM?
You mentioned that your system is a quad core running at 2.66GHz. Is this Core 2 based or from the new Core i7 based series? What are the specs of your machine, and what software settings have you adjusted from the defaults? A good test case would be to test a Core i7 system (to see if the integrated memory controller plays a big role in performance, if the new multi-tier cache is affected by different memory speeds, and whether or not hyperthreading affects performance under light vs heavy WoW loads). These test cases would provide valuable information and better inform the readers of this site.