View Full Version : Multiboxing Wiki - Multicpu, Multigpu, Multiharddrive, Multimonitor, etc and Bounded Issues. Revision 3. Work in Progress.



vchi
02-14-2009, 09:36 PM
Multiboxing Wiki - Multicpu, Multigpu, Multiharddrive, Multimonitor, etc and Bounded Issues.

This wiki is broken down into five sections:
I. Synopsis A - For individuals who want the top 5 recommendations.
II. Synopsis B - For individuals who want the entire list of recommendations.
III. Guide A (Practical) - For individuals who need to know why I chose the list of recommendations.
IV. Guide B (Theoretical) - For individuals who need to know why I chose the list of recommendations, what other recommendations are available in addition to the list, and how future improvements in computing technology will affect the list.
V. Improvements needed for this Wiki.

Some of these findings were made through careful observation, while others came from searching the web and the dual-boxing forums for clues and recommendations. Please don't take this as the absolute truth; as stated above, these are recommendations. Remember that your mileage will vary according to your setup and preferences. Throughout this guide, I try my best to control or limit unknown variables during each observation so that others can replicate the same results.

I. Synopsis A:
If you are on a budget and you need to maximize your money for best performance, follow this priority:
1. Vista operating system or any modern operating system that maximizes cache usage of available memory.
2. Maximize CPU frequency
3. Maximize number of cores
4. Maximize available memory
5. Maximize GPU frequency

II. Synopsis B:
If you are unable to read this wiki due to its technical nature, here are some guidelines to ensure adequate performance (assuming you have a suitable CPU/GPU combination and exceed WoW's recommended system requirements):
1. Central Processing Unit (CPU) Bound
Minimum: 1 core per WoW instance
Recommended: 2 cores per WoW instance
2. System Memory Bound
Minimum: 600MB to 1 GB per WoW instance
Recommended: 2 GB per WoW instance
3. System I/O Bound
Minimum: None
Recommended: NUMA
4. Graphic Processing Unit (GPU) Bound
Minimum: One CPU core per GPU node per WoW instance.
Recommended: Two CPU cores at maximum frequency per WoW instance with the most powerful graphics node.
5. Storage System Bound
Minimum: Symbolically (Vista) or junction (XP, 2K, or NT) link all WoW folders.
Recommended: Symbolically or junction link all WoW folders. Ensure that each copy of WoW has 2GB of memory. Load as much system memory as your motherboard is capable of handling.
6. Network Processing Unit (NPU) Bound
Minimum: Ensure your ISP connection's upload speed >= number of WoW instances * 56Kbps upload speed
7. Software (Operating System) Bound
Minimum: 32-bit operating system for 2 or 3 WoW instances at full spec or 4+ WoW instances running at minimum spec.
Recommended: 64-bit operating system for 4+ WoW instances running at full spec.
8. Software (Driver) Bound
Minimum: Latest available non-beta drivers.
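
The per-instance guidelines above scale linearly with the number of WoW instances, so they are easy to total up. Here is a minimal Python sketch of that arithmetic. The function name and dictionary keys are made up for illustration, the memory minimum uses the upper end of the 600MB to 1 GB range, and the totals do not include what the operating system itself needs.

```python
def sizing(instances):
    """Total up the Synopsis B per-instance guidelines for a number of WoW instances."""
    return {
        "min_cores": instances * 1,         # minimum: 1 core per WoW instance
        "rec_cores": instances * 2,         # recommended: 2 cores per WoW instance
        "min_memory_gb": instances * 1.0,   # minimum: upper end of 600MB to 1 GB
        "rec_memory_gb": instances * 2.0,   # recommended: 2 GB per WoW instance
        "min_upload_kbps": instances * 56,  # minimum: 56Kbps upload per WoW instance
    }


five_box = sizing(5)
for name, value in five_box.items():
    print(name, value)
```

For a 5 instance setup this works out to 280Kbps of upload, which fits within the 384Kbps upload of the Road Runner Basic tier in my list below but not the 128Kbps of the Lite tier.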

III. Guide A (Practical):
1. Introduction:
This part of the guide is exhaustive research into multiboxing with multiple devices (multicpu, multigpu, multiharddrive, multimonitor) on one computer (for multiboxing with multiple computers, see xzin's guide, Multiboxing Wiki - My 5 (Now 10!) Boxing WoW Writeup (http://www.dual-boxing.com/forums/index.php?page=Thread&threadID=410)). This technical guide will give you a basic understanding of the challenges of selecting the proper hardware and software components and the difficulties of troubleshooting a one-PC setup. It contains my long-term observations of how multiple instances of WoW have affected my computer's performance and what changes I made to ensure adequate performance.


In the list of terms, I break down each computer component into computational nodes. Please keep this in mind throughout the guide, as it plays an important part in how computers operate and in how future technology developments or paradigm shifts in programming will affect them.

I will be using my setup as an example. WoW1 is my main instance, from which most of my actions are performed. WoW2-WoW4 are my dps instances. WoW5 is my healer instance. My setup is not overclocked. I am using a clean (non-modified) Vista install with the latest patches and minimal third-party software installed. My current setup is as follows:
Processor:
2x AMD Opteron 2376 2.3GHz
Each processor has four DIMMS populated.
Motherboard:
Tyan Thunder n6650e (2915-E) BIOS v3.00
Nvidia nForce Driver v15.16
Memory:
4x4GB ECC Registered DDR2 667MHz
4x2GB ECC Registered DDR2 400MHz
Memory runs at 400MHz.
Graphics:
2x Nvidia GTX 280
Nvidia Geforce Driver v181.22
Hard Drive:
150GB Western Digital Raptor 10K RPM
74GB Western Digital Raptor 10K RPM
Monitor:
4x Dell 3007WFP (2560x1600)
Network:
Time Warner Cable Road Runner Turbo (Up: 2000Kbps / Down: 15000Kbps)
Time Warner Cable Road Runner Standard (Up: 512Kbps / Down: 7000Kbps)
Time Warner Cable Road Runner Basic (Up: 384Kbps / Down: 3000Kbps) - Not Tested.
Time Warner Cable Road Runner Lite (Up: 128Kbps / Down: 768Kbps)
Software:
Windows Vista 64-bit Ultimate Edition SP1


My previous setup is as follows:
Processor:
Intel Core 2 Extreme QX6700 2.66GHz @ 3.2GHz
Motherboard:
EVGA 750i SLI
Nvidia nForce Driver v15.23
Memory:
4x2GB DDR2 800MHz
Graphics:
2x Nvidia GTX 280
Nvidia Geforce Driver v181.22
Hard Drive:
150GB Western Digital Raptor 10K RPM
74GB Western Digital Raptor 10K RPM
Monitor:
3x Dell 3007WFP (2560x1600)
Network:
Time Warner Cable Road Runner Turbo (Up: 2000Kbps / Down: 15000Kbps)
Time Warner Cable Road Runner Standard (Up: 512Kbps / Down: 7000Kbps)
Time Warner Cable Road Runner Basic (Up: 384Kbps / Down: 3000Kbps) - Not Tested.
Time Warner Cable Road Runner Lite (Up: 128Kbps / Down: 768Kbps)
Software:
Windows Vista 64-bit Ultimate Edition SP1

This guide is broken down into seventeen sections:
1. Introduction
2. List of Terms and References
3. Central Processing Unit (CPU) Bound
4. System Memory Bound
5. System I/O Bound
6. Graphic Processing Unit (GPU) Bound
7. Physics Processing Unit (PPU) Bound
8. Artificial Intelligence Processing Unit (AIPU) Bound
9. Audio Processing Unit (APU) Bound
10. Storage System Bound
11. Network Processing Unit (NPU) Bound
12. Software (Operating System) Bound
13. Software (Driver) Bound
14. Software (Application) Bound
15. Man-Machine Interface (MMI) Bound
16. Conclusion
17. Unknown Issues

vchi
02-14-2009, 09:37 PM
2. List of Terms and References:
What is bound? The definition of CPU bound below, from wikipedia.org, also applies to the other bounding issues in this guide:

In computer science, CPU bound (or compute bound) is when the time for a computer to complete a task is determined principally by the speed of the central processor: processor utilization is high, perhaps at 100% usage for many seconds or minutes. Interrupts generated by peripherals may be processed slowly, or indefinitely delayed.

The concept of CPU bound was developed during early computers, when data paths between computer components were simpler, and it was possible to visually see one component working while another was idle. Examples components were CPU, tape drives, hard disks, card-readers, and printers. Computers that predominantly used peripherals were characterized as I/O bound. Establishing that a computer is frequently CPU bound implies that upgrading the CPU or optimizing code will improve the overall computer performance.

With the advent of multiple busses, parallel processing, multiprogramming, preemptive scheduling, advanced graphics cards, advanced sound cards and generally, more decentralized loads, it became less likely to identify one particular component as always being a bottleneck. It is likely that a computer's bottleneck shifts rapidly between components. Furthermore, in modern computers it is possible to have 100% CPU utilization with minimal impact to another component. Finally, tasks required of modern computers often emphasize quite different components, so that resolving bottleneck for one task may not affect the performance of another. For these reasons, upgrading a CPU does not always have a dramatic effect. The concept of being CPU bound is now one of many factors considered in modern computer performance.

Source: http://en.wikipedia.org/wiki/CPU_bound

What is Amdahl's Law? Amdahl's Law as defined by wikipedia.org:

Amdahl's law is a model for the relationship between the expected speedup of parallelized implementations of an algorithm relative to the serial algorithm, under the assumption that the problem size remains the same when parallelized. For example, if for a given problem size a parallelized implementation of an algorithm can run 12% of the algorithm's operations arbitrarily fast (while the remaining 88% of the operations are not parallelizable), Amdahl's law states that the maximum speedup of the parallelized version is 1/(1 - 0.12) = 1.136 times faster than the non-parallelized implementation.

Source: http://en.wikipedia.org/wiki/Amdahl%27s_law
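
To make the quoted definition concrete, here is a small Python sketch of the standard Amdahl's law formula. The function names are mine, and the 4 worker case is only an illustration; the 12% figure is the one from the quote.

```python
def amdahl_speedup(parallel_fraction, workers):
    """Speedup when parallel_fraction of the work is spread over a number of workers."""
    serial_fraction = 1.0 - parallel_fraction
    return 1.0 / (serial_fraction + parallel_fraction / workers)


def amdahl_limit(parallel_fraction):
    """Upper bound on the speedup as the number of workers grows without limit."""
    return 1.0 / (1.0 - parallel_fraction)


print(round(amdahl_limit(0.12), 3))       # 1.136, the figure in the quote above
print(round(amdahl_speedup(0.12, 4), 3))  # the same workload on 4 cores
```

The point to take away is that the serial part of the work caps the benefit: adding cores can only shrink the parallel part.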

What are nodes? Node as defined by wikipedia.org:

A node In the Unified Modeling Language (UML) is a computational resource upon which UML artifacts may be deployed for execution. [1]
There are two types of nodes: device nodes and execution environments.
1) A device represent hardware devices: a physical computational resource with processing capability upon which UML artifacts may be deployed for execution. Devices may be complex (i.e., they may consist of other devices).[1]
2) An execution environments represent software containers (such as operating systems, JVM, servlet/EJB containers, application servers, portal servers etc.) This is a node that offers an execution environment for specific types of components that are deployed on it in the form of deployable artifacts.[1]

Execution environments can be nested. Nodes can be interconnected through communication paths to define network structures. A communication path is an "association between two Deployment Targets, through which they are able to exchange signals and messages".[1]

Source: http://en.wikipedia.org/wiki/Node_(UML)

In this guide, nodes are defined as computational units that have their own memory (generic term) or workspace, accessible only to that computational unit (in a very generic and abstract sense). The node types are defined below:
CPU Node: The Opteron is a good example of a computing node with its own memory space. For the Core 2, we are going to stretch the example to include the motherboard.
GPU Node: Another good example, where the graphics cores have their own memory space.
PPU Node: Since I don't own an Ageia PhysX card, and Ageia was later bought out by Nvidia, we are going to use a generic example. No matter where the physics code is running, we are going to assume it has its own memory space, whether physical (i.e. Ageia PhysX) or virtual (i.e. Nvidia PhysX on the GPU or Havok on the CPU).
AIPU Node: No known physical implementation; however, as above, we are going to assume a generic example (i.e. AI code on the CPU).
APU Node: Certain sound cards have their own memory space (i.e. Creative Labs' X-Fi Fatal1ty Pro Series or higher).
NPU Node: Certain network cards have their own memory space too (i.e. KillerNIC).
MMI Node: That's pretty much the user.

What are interfaces? Interface as defined by wikipedia.org:

Interface generally refers to an abstraction that an entity provides of itself to the outside. This separates the methods of external communication from internal operation, and allows it to be internally modified without affecting the way outside entities interact with it, as well as provide multiple abstractions of itself. It may also provide a means of translation between entities which do not speak the same language, such as between a human and a computer. Because interfaces are a form of indirection, some additional overhead is incurred versus direct communication.

The interface between a human and a computer is called a user interface. Interfaces between hardware components are physical interfaces. This article deals with software interfaces, which exist between separate software components and provide a programmatic mechanism by which these components can communicate.

Source: http://en.wikipedia.org/wiki/Interface_(computer_science)

Interface areas are defined below as:
User Interface
Physical (Hardware) Interface
Software Interface

References used throughout the guide:
1) World of Warcraft Hardware Guide:
http://www.anandtech.com/video/showdoc.aspx?i=2381
2) World of Warcraft - Burning Crusade, Hardware Guide:
http://www.gamespot.com/features/6164252/index.html
3) World of Warcraft - Wrath of the Lich King, Hardware Guide:
http://www.gamespot.com/features/6202200/index.html?tag=feature;sidenav

Natch
02-15-2009, 12:46 AM
Nice write-up. Information like this is always interesting to read.

vchi
02-15-2009, 11:44 PM
3. CPU Bound:
Observation 1:
While in Outland and also in the original WoW, overclocking my 4 core setup from 2.66GHz to 3.2GHz resulted in a modest 5-10 fps increase in each of the WoW instances.
Observation 2:
For the 8 core setup, processAffinityMask is manually configured for each WoW instance in its WTF configuration file as follows:
WoW1 - SET processAffinityMask "15"
WoW2 - SET processAffinityMask "240"
WoW3 - SET processAffinityMask "240"
WoW4 - SET processAffinityMask "240"
WoW5 - SET processAffinityMask "15"


For the 4 core setup, processAffinityMask is manually configured for each WoW instance in its WTF configuration file as follows:
WoW1 - SET processAffinityMask "15"
WoW2 - SET processAffinityMask "15"
WoW3 - SET processAffinityMask "15"
WoW4 - SET processAffinityMask "15"
WoW5 - SET processAffinityMask "15"
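
The numbers in the two lists above are bitmasks: bit N of the value stands for core N, so 15 (binary 00001111) means cores 0-3 and 240 (binary 11110000) means cores 4-7. Here is a short Python sketch for converting between core numbers and mask values. The helper names are made up, and I am assuming that on the 8 core setup cores 0-3 sit on one Opteron and cores 4-7 on the other.

```python
def affinity_mask(cores):
    """Build a processAffinityMask value from a list of zero-based core numbers."""
    mask = 0
    for core in cores:
        mask |= 1 << core  # setting bit N allows the process to run on core N
    return mask


def cores_in_mask(mask):
    """List the zero-based core numbers that are enabled in a mask value."""
    return [bit for bit in range(mask.bit_length()) if mask & (1 << bit)]


print(affinity_mask([0, 1, 2, 3]))  # 15
print(affinity_mask([4, 5, 6, 7]))  # 240
print(cores_in_mask(3))             # [0, 1]
```

The last line shows why the default value of 3 puts every instance on cores 0 and 1 only.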


Note 1: WTF Directory - C:\World of Warcraft\WTF
Note 2: By default (after installation), WoW writes your WTF file with SET processAffinityMask "3". On loading, WoW detects the number of cores available and writes SET coresDetected "X" to your WTF file, X being the number of cores detected.
Note 3: Leaving SET processAffinityMask "3" as is would have resulted in only cores 0 and 1 being used, overtaxing them with too many WoW instances. The above setup load balances your WoW instances more evenly over all eight cores or four cores.
Note 4: I do not have a Core i7, any Core i7 derivative that supports hyperthreading, or any working older-generation platform that supports hyperthreading (only an old Pentium 4, which is broken), so I cannot test whether or not WoW benefits from hyperthreading. However, there are numerous web articles that review hyperthreading and show whether or not certain applications and games benefit from it.
For more information on Core i7 and hyperthreading, please look at these sites:
[H]ardOCP - http://www.hardocp.com/
Anandtech - http://www.anandtech.com/
Ars Technica - http://arstechnica.com/
OCAU - http://www.overclockers.com.au/
The Tech Report - http://www.techreport.com/
Tom's Hardware - http://www.tomshardware.com/us/#redir
For more information on Core i7 and in-depth technical discussions, please look at these sites:
Ars Technica - http://arstechnica.com/
The Tech Report - http://www.techreport.com/
If you cannot follow what the above websites say about hyperthreading, here is the summary in a nutshell:
Hyperthreading duplicates certain resources located on the core (the instruction and data execution units are not among them). Hyperthreading benefits certain applications and games, but under three conditions:
1. Instructions that require data have a high probability of stalling (i.e. databases).
2. Multiple instructions being executed at the same time require mixed resources (i.e. floating point vs integer).
3. The thread scheduler / manager (whether application or operating system based) is smart enough to schedule instructions in the most efficient manner (keeping cores busy, per condition 1) with minimal resource conflict (keeping threads from competing for similar resources, per condition 2).
Pertaining to WoW, this application requires a mixed set of resources. For this example we are going to talk about floating point and integer calculations. For a Core i7 with 4 cores and 8 threads (hyperthreading enabled), here are some example conditions where hyperthreading may affect WoW performance:
Condition(s):
1. Running all WoW instances with a core mask of 255 in conjunction with other CPU intensive applications (i.e. a movie or music player) in the background. Any modern operating system's thread scheduler is designed to allocate resources as efficiently as possible. However, when the system is put under load, and depending on how big the load is, performance degradation is possible. In the case of multiple instances of WoW and a movie player, with all other variables held constant, there will be execution contention on the cores. Since both applications use floating point calculations, there is a high probability that the arithmetic logic unit (ALU, using a generic term) will be saturated. How, you say? The thread scheduler will first try to push the application threads to any available real cores; once those cores are saturated, it will try to push the threads to the virtualized cores. However, remember from earlier that hyperthreading does not duplicate all resources, and one of the resources that is not duplicated is the executor (i.e. where calculations are done, using a generic term). So basically you have two threads competing for the same resource.
2. Running all WoW instances with a core mask of 255 and nothing else. In this situation, it's a little murkier whether or not performance would be affected. If you have the bad luck of having the thread scheduler place two instances of WoW on the same core (i.e. WoW1 is on real core 0 and WoW2 is on virtualized core 0, which is real core 0), you will take a performance hit. But most operating systems' thread schedulers are designed to limit this possibility, provided that no other (non-WoW) application is competing for the same resource and that you do not overload your cores with too many WoW instances; the rest is mathematical probability (i.e. luck).
As with luck and everything else in life, your mileage will vary according to your unique setup.
Continuing with observation 2, I did some long-term comparisons of different processAffinityMask values, loading a single instance of WoW with a one-core mask, a two-core mask, and a four-core mask. Observing the Windows Task Manager over an average 4-8 hour gaming session showed little performance difference overall. Going from a single-core mask to a dual-core mask offered a slight performance improvement. Going from a dual-core mask to a quad-core mask offered no performance improvement.

The one thing I do want to point out is a correlation between CPU usage and core mask. The formula I have developed is as follows:

CPU usage per masked core = (CPU usage for 1 WoW running on 1 core) / (number of cores in the mask)
Total CPU usage for 1 WoW = (CPU usage per masked core) * (number of cores in the mask) = CPU usage for 1 WoW running on 1 core

No matter how many cores a single instance of WoW was given in its mask, the total always came to the equivalent of one core at full utilization. However, I did not test a mask covering all 8 cores, due to the non-uniform memory architecture (NUMA) of the dual quad core processors. Using a mask wider than 4 cores would have resulted in lower performance due to threads bouncing back and forth between the two processors and memory requests crossing over the HyperTransport (HT) link.

Observation 3:
When I had my 4 core setup and was still in Outland, my average framerate was 30 fps for each of the WoW instances. However, once I stepped into Zangarmarsh (most notably when I got to the Horde flight point), fps on all WoW instances dropped to 15-20 fps or lower. Mouse control and input were very sluggish. I tried every combination of core mask for each WoW instance, but that did not alleviate the issue. Only after upgrading to a dual quad core setup did the issue disappear. My best guess afterwards was that the WoW5 instance was overloading WoW1 thru WoW4, since my setup had only 4 cores available.
Observation 4:
Using the 4 core setup, whenever I minimized one or more WoW instances, I noticed on the performance monitor a large drop in CPU utilization and an overall improvement in system response. After upgrading to the 8 core setup, the performance monitor showed the same drop but no change in system response, due to the additional cores available. Getting back to Observation 3, the way I made that determination was by minimizing WoW5, which improved system response.


Observation 5:
No matter which zone I was in or whenever I visited Dalaran, my CPU utilization always remained at a constant rate.

vchi
02-15-2009, 11:45 PM
4. System Memory Bound:
Observation 1:
All WoW instances are compared between the two following settings:
Setting 1:
Video Settings -> Resolution Tab:
Resolution: 1920x1200 (Wide)
Multisampling: 24-bit color 24-bit depth 8x multisample
Refresh: 60Hz
Vertical Sync: Unchecked
Hardware Cursor: Checked
Reduce Input Lag: Unchecked
Windowed Mode: Checked
Maximized: Unchecked
Video Settings -> Effects Tab:
Video Quality: Custom (everything high except Shadow Quality: Low)

Setting 2:
WoW2-WoW5 instances are set as follows:
Video Settings -> Resolution Tab:
Resolution: 1920x1200 (Wide)
Multisampling: 24-bit color 24-bit depth 1x multisample
Refresh: 60Hz
Vertical Sync: Unchecked
Hardware Cursor: Checked
Reduce Input Lag: Unchecked
Windowed Mode: Checked
Maximized: Unchecked
Video Settings -> Effects Tab:
Video Quality: Custom
View Distance: High
Terrain Detail: Low
Spell Detail: High
Environmental Detail: Low
Ground Clutter Detail: Low
Ground Clutter Radius: Low
Shadow Quality: Low
Texture Resolution: Low
Texture Filtering: Low
Weather Filtering: Low
Video Settings -> Effects Tab -> Shaders:
Specular Lighting: Unchecked
Full-screen Glow Effect: Unchecked
Death Effect: Unchecked

Note 1: The overriding factor in determining how much memory is needed is the graphics settings of each WoW instance.
At the highest setting, my RAM usage stayed above 1GB for each instance of WoW. At the lowest setting, my RAM usage bounced between 700MB and 1GB for each WoW instance.

The only danger here is when the following happens:

(memory usage per WoW instance at the chosen video setting * number of WoW instances) >= available memory per processor node

As I stated earlier, performance suffers when memory requests start crossing over the HT link to the other processor node. To minimize this effect, I recommend keeping memory usage below the memory available per processor node. Another thing to keep in mind is that processor affinity instructs the operating system to keep certain programs on a certain processor node. This ensures that your program will not cross over to another processor node unless the programs with affinity set to that node have combined memory requirements exceeding the memory available on the node.
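
The danger condition above is a simple comparison, sketched here in Python. The function name is made up, and the node sizes in the example are hypothetical figures for illustration only, not measurements from my machine.

```python
def fits_on_node(instances_on_node, memory_per_instance_gb, node_memory_gb):
    """True if the WoW instances pinned to one processor node fit in that node's memory."""
    return instances_on_node * memory_per_instance_gb < node_memory_gb


# Hypothetical example: 3 instances pinned to one node at about 1.5 GB each.
print(fits_on_node(3, 1.5, 12.0))  # assumed 12 GB on the node
print(fits_on_node(3, 1.5, 4.0))   # assumed 4 GB on the node
```

Run the check once per processor node, counting only the instances whose processAffinityMask points at that node, and leave some headroom for the operating system.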


Observation 2:
Loading my 8 core system with 24GB of system memory resulted in no speed improvement, given that I already met most of the recommended requirements for each WoW instance. On the other hand, with each WoW instance's memory usage hovering between 1 GB and 1.5 GB and 10 GB+ of system memory available for caching, load times when traveling and zoning were minimal. Long-term observation showed little or no hard disk activity. Memory usage was at its highest in Dalaran, about 1.25 GB for each WoW instance. There was no lag or lack of system response there. However, there was a drop in frame rate, due to the heavy CPU load from the new city design, the proximity of different zones, and the number of characters located in a confined area.

Observation 3:
Going off observation 2, swapping the 8 core system's 400MHz RAM for the 800MHz RAM resulted in no noticeable performance improvement.

5. System I/O Bound
Update in next revision.

vchi
02-16-2009, 04:36 PM
6. GPU Bound:
My monitor and GPU setup is as follows:
Monitor A, B, C, D (each running at 2560x1600)
GPU 0 and GPU 1 (non-SLI)
Monitors are arranged from left to right at the same eye level:
A B C D
GPU 0 drives monitors A and B.
GPU 1 drives monitors C and D.
WoW1 and WoW5 appear on monitor B.
WoW2 appears on monitor C.
WoW3 and WoW4 appear on monitor D.
WoW grid coordinates and sizes:
Instance - X Coordinates - Y Coordinates - Resolution
WoW1 - 2560 - 0 - 1920x1200
WoW2 - 7680 - 0 - 1920x1200
WoW3 - 8320 - 400 - 1920x1200
WoW4 - 5120 - 0 - 1920x1200
WoW5 - 3200 - 400 - 1920x1200


Note 1: In an MS Windows environment, screen coordinates are defined relative to the primary monitor: the top left-hand corner of the primary monitor is coordinate 0,0, with the numbers increasing down and to the right. If you switch the primary monitor to a different monitor, then the top left-hand corner of the new primary monitor becomes 0,0, and any monitor to the left of or above the primary monitor falls in negative coordinate space.
Note 2: In Vista, you cannot mix and match different GPU vendors (not the card vendors but the GPU manufacturers, i.e. AMD ATI, Nvidia, Intel, S3, etc.). By design, Vista will prevent this or revert to generic drivers to ensure system stability. In previous operating systems you could mix and match vendors; however, based on Microsoft's long-term bug-report and crash analysis, a majority of stability issues were related to GPU drivers (I do not have the link; I will try to google it for the next revision). As for mixing and matching different GPU families (i.e. Nvidia GTX 200 series with Nvidia GeForce 8000 series), you can, but it depends on whether or not the latest drivers support both families at the same time. Vista limits you to one set of drivers for all video cards. If your video drivers are unified and support the video cards currently in your machine, then you are good.
Note 3: My current 8 core setup does not mix GPU families, so I can't tell you with concrete facts whether it would work or how fast it would be. Depending on available time, I have an old Nvidia GeForce 8800 GTX floating around as a spare part that I could substitute for one of the Nvidia GTX 280 cards to test this.
Note 4: For applications to benefit from CrossFire / SLI, three conditions must be met:
1. Operating System support
2. Driver support
3. Applications specifically written to take advantage of CrossFire / SLI
Conditions 1 and 2 are already met. However, in terms of WoW, condition 3 is not met. WoW is not specifically written to take advantage of a CrossFire / SLI setup. This is probably due to the programming cost of building, testing, and maintaining that support. Even if it were added in a future version of WoW, there would be two sets of source code to maintain, which over the long term would add up in cost and man-hours. This leads to another issue, coarse vs fine multi-threading, which I will cover in the next revision.
Drivers and the driver control panel can to some extent force CrossFire / SLI support on for WoW, but that would be stretching it.
Note 5: Spanning displays does not offer a performance improvement. For monitors connected to the same GPU (non CrossFire / SLI), the GPU's rendering workload is still number of monitors * each monitor's workload. However, if the drivers support rendering to multiple monitors across multiple video cards, there is no bus contention or bandwidth issue between said video cards, and the application is written to take advantage of this setup, then there is a possibility of performance improvement.
Note 6: There have been unconfirmed reports of Nvidia GTX 200 series graphics cards not reaching their maximum performance with certain CPU/GPU combinations, due to the Nvidia graphics drivers not being optimized for an 8 core setup or a 4 core 8 thread (hyperthreading) setup. This may be due to a lag between new hardware introduction and drivers being developed and optimized for said hardware.
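
The coordinate rule in Note 1 can be sketched in a few lines of Python. This is only an illustration: the function name is made up, it looks at X coordinates only, and it assumes the monitors sit in a single left-to-right row as in my setup.

```python
def monitor_origins(widths, primary_index):
    """Return the desktop X coordinate of each monitor's left edge, left to right."""
    origins = []
    x = 0
    for width in widths:
        origins.append(x)
        x += width
    shift = origins[primary_index]  # the primary monitor's left edge becomes X = 0
    return [origin - shift for origin in origins]


widths = [2560, 2560, 2560, 2560]  # monitors A, B, C, D
print(monitor_origins(widths, 0))  # primary is A: [0, 2560, 5120, 7680]
print(monitor_origins(widths, 2))  # primary is C: [-5120, -2560, 0, 2560]
```

With monitor C as primary, monitors A and B end up in negative coordinate space, which is why window positions shift whenever the primary display is changed.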

Observation 1:
With a multi-monitor setup, I wanted to be able to see all instances of WoW on different screens. However, every time a WoW instance crossed from one GPU's zone to the other's, its graphics ended up software rendered instead of hardware rendered. Case in point, using the above example: if WoW1, which is displayed on monitor B and driven by GPU 0, was clicked and dragged to either monitor C or D, its framerate dropped below 10 fps. Moving it back to the original monitor restored it to 60 fps.

To compensate for the above issue, I had to adjust when and where I loaded each WoW instance, on which monitor and GPU. Adjusting the primary display in the Vista display control panel solved this problem and ensured an equal load balance across both GPUs. The technique is as follows:

Load Sequence:
1) Set the primary display to monitor A or B.
2) Load the WoW1 instance and display it on monitor B.
3) Load the WoW5 instance and display it on monitor B.
4) Switch the primary display from monitor A or B to monitor C or D.
5) Load the WoW2 thru WoW4 instances on monitor C or D.
6) Switch the primary display from monitor C or D back to monitor A or B.
7) Use software to reposition the WoW windows: WoW1 and WoW5 on monitor B, WoW2 on monitor C, WoW3 and WoW4 on monitor D.
During the above load sequence, your WoW windows will be moved around by the changes of primary display. Your mileage may vary on where they finally end up, but executing step 7 will reposition the windows to their correct positions.

Observation 2:
For SLI, running 1 instance of WoW did not show any noticeable improvement and at times lowered performance. Running multiple WoW instances on SLI gave the same result.

Observation 3:
Using the first settings (high quality) from the memory bound section for each WoW instance and testing in Dalaran, my average framerate for all WoW instances was between 10-15 fps. The only noticeable change came from lowering the shadow quality, which improved the average by 5 fps for each WoW instance.

7. PPU Bound:
Update in next revision.
8. AIPU Bound:
Update in next revision.
9. APU Bound:
Update in next revision.

vchi
02-16-2009, 04:37 PM
10. Storage System Bound:
Hard drives are set up as follows:
150GB Raptor contains the OS and one main WoW folder, with four WoW folders symbolically linked to the main WoW folder (each symbolically linked folder has its own WTF configuration file).
74GB raptor contains 24GB page file.
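
For anyone who wants to script the linked-folder layout, here is a rough Python sketch. It is not the exact procedure I used: the function name and the folder paths in the usage comment are hypothetical, and I am assuming that everything in the main WoW folder except WTF is safe to share through links. On Windows, creating symbolic links usually requires an elevated (administrator) prompt.

```python
import os


def link_wow_copy(main_dir, copy_dir, private=("WTF",)):
    """Fill copy_dir with symbolic links into main_dir, keeping private folders separate."""
    os.makedirs(copy_dir, exist_ok=True)
    for name in os.listdir(main_dir):
        target = os.path.join(main_dir, name)
        link = os.path.join(copy_dir, name)
        if name in private:
            # Each copy gets its own real (non-linked) folder for its WTF settings.
            os.makedirs(link, exist_ok=True)
        elif not os.path.lexists(link):
            os.symlink(target, link, target_is_directory=os.path.isdir(target))


# Hypothetical usage, one call per extra WoW copy:
# link_wow_copy(r"C:\World of Warcraft", r"C:\World of Warcraft 2")
```

The linked copies share one set of game data on disk, so only the main folder's data has to be read and cached.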


Note 1: There's been a lot of talk about using a Solid State Drive (SSD) for performance improvement. Now, I will agree that SSDs improve load times. However, I will not go out of my way to buy the best or even a mediocre, cheap SSD. Here's why:
Reasoning(s):
1. Hierarchical caching system. A computer is by design multiple levels of abstraction and hierarchy. Each level is designed to hide the complexities of the lower levels. Case in point, the CPU core is at the highest point of this hierarchy. Every time it puts in a request for data, it does not care where the data comes from, only that it gets its data. Keeping this in mind, a multi-tier caching system is designed to hide where the data is coming from and how difficult it is to retrieve. You hope that the data is found in the space closest to the CPU (and in most cases this is true, due to locality). When the data is not found there (a case you minimize with locality and caching algorithms), you hope it will be found in the next closest space.
Also keep in mind the issue of performance vs cost, which go hand in hand: the more performance you need, the more you have to pay for it. However, just as all WoW tanking stats are subject to diminishing returns (DR), so is maximum performance.
Example:
The above example may be a little extreme, but it makes the point that as you get further and further away from the CPU, your relative performance increase is subject to diminishing returns. In addition, increasing the size and retrieval rate of each space does offer some performance improvement at additional cost, but depending on the code, it will reach a point where there is no further improvement. By using a multi-tier caching system, you get the best compromise between performance and cost.
2. Vista or any modern system that puts a strong emphasis on maximizing caching of available system memory.
Getting back to the issue: an individual thinking of buying an SSD for performance improvements should instead consider it for other reasons:
Other reasoning(s):
1. Limited acoustic noise.
2. Limited thermal dissipation.
3. Compactness.
4. Limited memory for caching purposes (this is a stretch) whether it be due to hardware limitation (i.e. Pentium III), software limitation (32-bit operating system), or budgetary reasons.
5. Load times (This is also a stretch. WoW does not spend most of its time loading data. Any performance improvement depends on how often you load data that is needed and not available in cache. See other reasoning 4.).
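
A rough sketch of the caching argument in reasoning 1, under stated assumptions: every latency and hit rate below is a made-up illustrative value, not a measurement from this rig. It models a request that falls through CPU cache, then system memory (where the OS keeps cached file data), then the drive, and compares a hard drive with an SSD as the share of requests served from memory grows.

```python
# All latencies and hit rates below are made-up illustrative values, not measurements.
CPU_CACHE_NS = 5           # assumed CPU cache access time
MEMORY_NS = 100            # assumed system memory (OS file cache) access time
HDD_NS = 10_000_000        # assumed hard drive access time (~10 ms)
SSD_NS = 100_000           # assumed SSD access time (~0.1 ms)
CPU_CACHE_HIT_RATE = 0.95  # assumed share of requests served by the CPU cache


def average_access_ns(memory_hit_rate, drive_ns):
    """Expected time per request when each miss falls through to the next tier."""
    miss_cpu_cache = 1.0 - CPU_CACHE_HIT_RATE
    miss_memory = miss_cpu_cache * (1.0 - memory_hit_rate)
    # Every request pays the cache lookup; only misses pay for the slower tiers.
    return CPU_CACHE_NS + miss_cpu_cache * MEMORY_NS + miss_memory * drive_ns


for hit_rate in (0.90, 0.99, 0.999, 1.0):
    hdd = average_access_ns(hit_rate, HDD_NS)
    ssd = average_access_ns(hit_rate, SSD_NS)
    print(f"memory hit rate {hit_rate:.3f}: HDD {hdd:.1f} ns, SSD {ssd:.1f} ns")
```

In this simplified model the drive only matters for requests that miss every cache, so the gap between the two drives shrinks as more of the data is served from memory and disappears once everything needed is cached.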

Observation 1:
Based on long-term observations, the only times the hard drive was accessed were during boot up, shut down, loading WoW, quitting WoW, loading a new area/zone and loading a dungeon instance. Other than that, there was little or no hard drive access.

11. NPU Bound:
Note 1: There has been some talk about, and some reviews of, the KillerNIC. Whether or not it would improve your network connection depends on the following:
Condition(s):
1. Are you running another network dependent application in the background with your WoW instances?
2. How much of a network load are the other applications putting on your Network Interface Card (NIC)?
3. Are the other applications TCP or UDP packet dependent?


Condition(s) beyond your control:
1. Internet traffic worldwide.
2. Is there a virus outbreak, DoS or DDoS attack, etc. going on across the Internet?
3. How packets are prioritized and routed across the Internet.
4. How packets are prioritized and routed once they reach Blizzard.
5. How many individuals are currently logged into WoW.
6. If you are using a wireless connection, electromagnetic (EM) interference and other devices on your LAN.
Observation 1:
For a while I was playing five instances of WoW on Time Warner Cable Road Runner Turbo (Up: 3.0Mbps / Down: 22.0Mbps), then I dropped down to Road Runner Lite (Up: 384Kbps / Down: 768Kbps). There was no noticeable loss in performance or increase in network latency. However, there was a drop in the reliability of the Internet connection, mainly due to insufficient upload bandwidth for running 5 instances of WoW. During this time, one of the WoW instances would occasionally lose its connection to the server.
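
As a quick check of the upload budget, here is a minimal sketch that applies the guide's 56Kbps-per-instance rule of thumb to the two service tiers mentioned above. The function names are made up for illustration, and the rule of thumb is a planning figure, not a measured requirement.

```python
PER_INSTANCE_KBPS = 56  # the guide's rule-of-thumb upload budget per WoW instance


def required_upload_kbps(instances, per_instance_kbps=PER_INSTANCE_KBPS):
    """Upload bandwidth the rule of thumb asks for."""
    return instances * per_instance_kbps


def headroom_kbps(upload_kbps, instances, per_instance_kbps=PER_INSTANCE_KBPS):
    """Upload bandwidth left over after the rule-of-thumb budget is reserved."""
    return upload_kbps - required_upload_kbps(instances, per_instance_kbps)


for tier, upload_kbps in (("Turbo", 3000), ("Lite", 384)):
    print(tier, "headroom for 5 instances:", headroom_kbps(upload_kbps, 5), "Kbps")
```

By this arithmetic the Lite tier nominally covers five instances but leaves little headroom for bursts or for other traffic sharing the connection.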

12. OS Bound:
Observation 1:
For a while I ran my gaming rig with default settings in Vista. Then I tried shutting down unnecessary services in Vista, but there was no noticeable improvement between the two settings. The only difference was that Vista's memory usage and boot up time were slightly lower, but neither affected WoW.

Observation 2:
From careful observation, while running 5 instances of WoW there is a slight pause every so often, during which the mouse response is sluggish or non-responsive. It occurs at random, but mainly in the Zul'Drak zone. I'm not sure if it occurs when I enter other zones, but it may be a thread scheduler issue. The reason I suspect this is that every time I check the resource monitor and task manager for clues as to what may be causing the issue, I cannot find any. The only guess I can make is that it has something to do with Vista's thread scheduler and NUMA. Most common desktop applications are designed to run on any core; with the trend towards NUMA, however, the question becomes whether you want your application to run on any core at random. Getting back to the CPU affinity and NUMA issue: if an application can bounce back and forth across processor nodes, how does that affect application performance? In this case, with the mouse and keyboard software set by default to run on any core, I tried to limit them to processor node 0 (cores 0-3). Unfortunately, that did not resolve the issue.

13. Driver Bound:
Update in next revision.

14. App Bound:
Update in next revision.

15. MMI Bound:
Update in next revision.

vchi
02-16-2009, 04:37 PM
16. Conclusion:
After analyzing all of the above data, I concluded that my computer rig is CPU bound. The reason is that each of the three links shown above in the GPU section, from two different websites covering WoW performance on the then-current generation of technology, shows that in every case WoW is to some extent CPU bound. Once you reasonably meet the performance requirements for the other areas of your computer system, overall performance is ultimately limited by how powerful your CPU is. For my setup, dealing with each of the sections:
CPU Bound: Increasing the core mask beyond two cores per instance of WoW resulted in no performance improvement. However, doubling or tripling up WoW instances on a core already allocated to another WoW instance would hurt your performance. Hopefully, a faster Opteron processor in the near future will increase my fps.
Memory Bound: If the minimum requirement is met (your processor's recommended or maximum memory speed without overclocking, and 1GB of memory per instance of WoW), increasing memory speed and adding more memory results in no performance improvement. More memory would, however, have resulted in more of the WoW data files being cached.
GPU Bound: Looking at the three websites listed in the GPU section, if you meet the minimum requirements for a given CPU/GPU combination, increasing the GPU's processor and memory speed or adding GPUs and GPU memory would result in a performance improvement, but not the maximum performance improvement. This would probably allow you to run one or more instances of WoW on each GPU, depending on whether or not you meet the minimum requirement of two cores per instance of WoW.
Hard Drive Bound: By symbolically linking each of your WoW folders to your main WoW folder, you explicitly tell Vista that all the WoW data is the same. This ensures that Vista will not create multiple cached copies of your WoW folder in memory, thereby avoiding overburdening memory with duplicates.
Network Bound: Meeting the minimum bandwidth requirement of 56Kbps upload speed for each copy of WoW ensures adequate performance. Increasing bandwidth above 56Kbps upload per copy would not provide any performance improvement. Latency does play a strong role in how smooth online play is.
Software Bound: Turning off unneeded Vista services compared to a default Vista installation resulted in no performance improvement for WoW. However, it did provide a small improvement in boot and shutdown time and lowered average memory usage.

17. Unknown Issues:
1. Basic application and services CPU affinity and NUMA issues.
2. Vista thread scheduler.
3. GPU memory allocation and caching.

IV. Guide B (Theoretical):
1. Introduction
This last part of the guide is a review of the list of previous recommendations, what other recommendations are available and how future improvements in computing technology will affect the list. Like the previous guide, this guide is broken down into seventeen sections:
1. Introduction
2. List of Terms and References
3. Central Processing Unit (CPU) Bound
4. System Memory Bound
5. System I/O Bound
6. Graphic Processing Unit (GPU) Bound
7. Physics Processing Unit (PPU) Bound
8. Artificial Intelligence Processing Unit (AIPU) Bound
9. Audio Processing Unit (APU) Bound
10. Storage System Bound
11. Network Processing Unit (NPU) Bound
12. Software (Operating System) Bound
13. Software (Driver) Bound
14. Software (Application) Bound
15. Man-Machine Interface (MMI) Bound
16. Conclusion
17. Unknown Issues
V. Improvements:
1. Reference links to outside material.
2. Screen capture.
3. Typos.
4. Wordcrafting.
5. Index and listing of terms.
6. Source code.

If you have any suggestions on what needs improvement, what needs better explaining or what needs to be added to this guide, please let me know. Thank you.

Gnies
02-17-2009, 04:01 AM
sticky the wiki

dubiox
02-17-2009, 02:13 PM
Good stuff. What I have taken from the whole thing is that if I add some ram I can play all five on a single PC again (it worked fine until I hit outland, then I needed a second PC). I had narrowed it down to either game-loading disk access or running out of memory and swapping being the possible bottlenecks because the disk light was on continuously. I need to get 64 bit vista though as I only have 32 bit.

Why symlink the wow dirs instead of running them all from the same directory?

-K

vchi
02-18-2009, 09:34 PM
Dubiox-

First thing is did you leave your paging file at default or did you turn it off?

Second, to verify whether or not you have a shortage of system memory, and how much additional memory you need to overcome it, look in your task manager and see whether memory usage is higher than available physical memory. Take the difference between the two, then check whether you have free DIMM slots and/or DIMMs of the right density to cover that shortage. In addition:

1) 32-bit OS: Task manager will not correctly report how much memory you have available (which complicates the second point above). Remember that a 32-bit OS is limited to an address space of 4GB, which is shared between applications, I/O and other operating system components. Out of that 4GB address space, 2GB is set aside for applications and the other 2GB for I/O and operating system functions. As you add more components that use the address space, you lose access to more of the installed memory. Example:

Reported memory installed: 4GB
Usage
500MB address space for system I/O
500MB address space for graphics card A
500MB address space for graphics card B
Total available memory: 4GB - 500MB - 500MB - 500MB = 2.5GB

Application A needs 2GB
Application B needs 1GB

Operating system allocates 2GB to application A.
Operating system attempts to allocate 1GB to application B, and application B either gets 500MB of memory and 500MB of page file or crashes because there is no memory available for allocation.
or...
Operating system allocates 1GB to application A and 1GB page file.
Operating system allocates 1GB to application B.
or...
It really depends on the application...
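
The arithmetic in the example above can be written out as a small sketch. It uses the same round numbers (1GB treated as 1000MB) and the same hypothetical 500MB reservations; it is a simplified model of the example, not a description of how any particular operating system actually lays out its address space.

```python
MB_PER_GB = 1000  # round numbers, as in the example above


def usable_memory_mb(installed_gb, reservations_mb):
    """Memory left after address-space reservations (simplified model)."""
    return installed_gb * MB_PER_GB - sum(reservations_mb.values())


# Hypothetical reservations taken from the example above.
reservations = {"system I/O": 500, "graphics card A": 500, "graphics card B": 500}
usable = usable_memory_mb(4, reservations)

# Application A wants 2GB and application B wants 1GB.
demand = 2 * MB_PER_GB + 1 * MB_PER_GB
shortfall = max(0, demand - usable)

print("usable:", usable, "MB, demand:", demand, "MB, shortfall:", shortfall, "MB")
```

The shortfall is the part that has to come from the page file, or that makes an allocation fail, depending on how the applications behave.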

2) There are boot time parameters you can modify to increase your application address space, but that is beyond the scope of this discussion and irrelevant to your purpose. But if someone insists on using the boot time parameters, then using the above example, this might happen:

Boot time parameter adjusts the I/O address space from 2GB to 1GB. Application address space is adjusted from 2GB to 3GB.
Operating system allocates 500MB address space for system I/O.
Operating system allocates 500MB address space for graphics card A.
Operating system crashes because it cannot allocate another 500MB of address space, since none is available, or the operating system loads but reports that graphics card B has an error due to unavailable address space.

or...
When you load an application, it crashes and causes random lock ups.

or...
It really depends on the application...


4) When and if you do decide to upgrade to a 64-bit operating system, keep in mind that 64-bit does not necessarily mean improved performance. The only thing it means is access to a larger address space (40-bit).
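
To put the address space sizes side by side, here is a one-function sketch. The 32-bit and 40-bit widths are the ones quoted in this post; the function name is made up for illustration.

```python
def address_space_gib(bits):
    """Size of an address space with the given number of address bits, in GiB."""
    return 2 ** bits // 2 ** 30


for bits in (32, 40):
    print(f"{bits}-bit address space: {address_space_gib(bits)} GiB")
```

A 40-bit address space works out to 1024 GiB (1 TiB), against 4 GiB for 32 bits.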

And replying to your question, "Why symlink the wow dirs instead of running them all from the same directory?": that is a good question, and to be honest, I don't think there is a really big difference between symbolic links for 5 WoWs and running 5 WoWs from the same folder. The reason I chose the symbolic link option is so that each WoW has its own custom WTF config file. As stated earlier, I have a NUMA platform and I don't want to risk running my application threads and memory transactions across the HyperTransport (HT) links. To solve this issue, I prefer to set the processor affinity in the WTF config file. I know there are applications out there, listed in this forum, that can do it at application load without using the WTF config file, but I prefer to set it manually. Good point / question. I never actually thought about that when I wrote this guide. Thank you for bringing that up.
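
For anyone who wants to try the symbolic link layout, here is a minimal sketch of one way to build it. It rests on assumptions: the function name is made up, it treats WTF as the only folder that must stay private to each copy, and it links every other top-level entry back to the main folder. On Vista, creating symbolic links normally needs an elevated prompt (the built-in `mklink /D` command does the same job by hand).

```python
import os


def make_wow_clone(main_dir, clone_dir, private=("WTF",)):
    """Build clone_dir out of symlinks into main_dir, except for private folders."""
    main_dir = os.path.abspath(main_dir)
    os.makedirs(clone_dir, exist_ok=True)
    for name in os.listdir(main_dir):
        if name in private:
            continue  # the clone keeps its own copy of these
        target = os.path.join(main_dir, name)
        link = os.path.join(clone_dir, name)
        if not os.path.lexists(link):
            os.symlink(target, link, target_is_directory=os.path.isdir(target))
    for name in private:
        # Each clone gets its own real folder (e.g. WTF) for per-instance settings.
        os.makedirs(os.path.join(clone_dir, name), exist_ok=True)
```

Calling `make_wow_clone` once per extra instance, with a different `clone_dir` each time, gives every copy its own WTF folder while the game data exists only once on disk.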

Ken
02-19-2009, 07:33 AM
1. Central Processing Unit (CPU) Bound
Minimum: 1 core per WoW instance
Recommended: 2 cores per WoW instance
This is incorrect, as the number of cores required depends completely on the speed of each core and other factors like your target framerate.
For example: I have a quad core CPU at 2.67GHz. When I put 1 WoW instance on 1 core, that WoW instance (rendering at a fixed rate of 30fps) uses only about 50% of the core's CPU time.
Adding extra cores for 1 WoW instance will very likely not make any noticeable difference.


2. System Memory Bound
Minimum: 600MB to 1 GB per WoW instance
Recommended: 2 GB per WoW instance
This is inaccurate, because it completely depends on the operating system, the OS settings(cache etc.) and the WoW quality settings. All these factors influence WoW RAM usage.
A minimum doesn't require a range, because a minimum is a set limit. Settings will also differ greatly depending on whether you use the windows 'swap file' or not.


6. Network Processing Unit (NPU) Bound
Minimum: Ensure your ISP connection's upload speed >= number of WoW instances * 56Kbps upload speed
Have you actually measured WoW's network speed? Last time I measured it, one WoW instance used about 3kByte per second (1.5 up and 1.5 down), which is far from 56kbit per second.
Also: a 56kbit telephone line (which you are insinuating) is absolutely not comparable to an ISDN/ADSL/cable connection with an increased speed.


Recommended: 64-bit operating system for 4+ WoW instances running at full spec.
The reason for running 64-bit is that you have a 64-bit CPU and want to use more than 4GB of RAM. This is quite an important detail.


A node In the Unified Modeling Language (UML) is a computational resource upon which UML artifacts may be deployed for execution. [...]
Why quote that? This is one of the pieces of information that makes the article very bloated(9 lines of text that have nothing to do with the article!) and unreadable, as it's completely irrelevant information.


3. CPU Bound:
Observation 1:
While in Outland and also in the original WoW, overclocking my 4 core setup from 2.66GHz to 3.2GHz resulted in a modest 5-10 fps increase in each of the WoW instances.

How is that relevant information? It says absolutely nothing about FPS targets, how the original FPS was, whether the FPS increase was necessary, etc. etc.

I see a lot of references and observations, but your article really lacks the concrete (general) conclusions it needs to be of value to other people. There are too many inaccuracies, gaps and errors too, in my opinion. (I just picked out a few.)
Sorry to be so negative, but please don't give people general hardware advice based on the observations for your specific hardware.


[edit]
Also, an upgrade from:
4x2GB DDR2 800MHz

to:

4x4GB ECC Registered DDR2 667MHz
4x2GB ECC Registered DDR2 400MHz
Memory runs at 400MHz.

... is actually a downgrade, because memory speed is quite important in 3D rendering applications.
I recall someone posting on this forum about a performance increase after upgrading their RAM(in terms of speed) to 800MHz or faster.

Moocifer
02-19-2009, 04:41 PM
50% cut and paste Wikipedia.

30% irrelevant.

15% utter nonsense.

5% meh.

Sorry.

vchi
02-19-2009, 09:19 PM
Ken-

Thank you for replying to this thread. Constructive criticism is always appreciated in making this a better wiki. I have been working on this wiki for about a couple of days now. Adding stuff here and moving stuff there. But always making improvements. No, your comments are not negative. Like I said earlier, constructive criticism is always welcomed. I do think my hardware setup and the observations made to setup A and setup B can be useful to other people and could also apply to other people's setups.

Most of the observations made are in reference to common computing principles that apply to most computer setups. I try my best to limit unknown variables. I use settings that could be used on other setups so the results can be replicated. I don't say using these settings will have the same exact results, but will be close enough within reason to be the same.

Getting to your first comment about the CPU, the reason for the recommendation was load balancing. But I think I may need to reword this part depending on the point of view.
Viewpoint 1: The recommendation was made because Blizzard stated that WoW has dual-core support. This suggests that certain parts of the application are multithreaded to some extent and that more multithreaded support will be added at some point in the future. In anticipation of this possibility, having the additional core would not hurt, but as I stated earlier, it would not improve performance noticeably today (if a future version of WoW has increased multithreaded support, there is a chance of improved performance on a multi-core system).

Viewpoint 2: For a pure fps increase, I would just allocate 1 WoW instance to one core and go for the fastest core you can find. However, putting more than one WoW instance on the same core will hurt performance.

Viewpoint 3: If you are using a framerate limiter, then depending on the core utilization of the WoW instances affected by the limiter, placing more than 1 WoW instance on the same core may or may not hurt performance much.

I must disagree with the comment that the number of cores required (to play 1 WoW instance) is determined by the speed of each core. In setup A and setup B, where I tried one-core, two-core, and four-core masks, there was no noticeable improvement in performance. Even with the higher clock frequency in setup B, the different core masks did not offer a noticeable improvement. Studying the task manager and analyzing the load of 1 WoW instance, there are times when the load is split across the cores specified in the core mask. But when you add up the individual core utilizations, the total always comes to about one core at full utilization.

In reference to your comment, one test case we could build is to lower the CPU frequency and test it against different core masks, to see if there is any difference in performance or if the WoW application splits its load across the cores in the mask to maintain a certain level of performance. Those would be very interesting results.

You mentioned that 1 WoW instance on 1 core, rendering at a fixed rate of 30fps, resulted in 50% utilization of that one core. Your fixed fps setting is probably what produces that result. You are limiting the amount of information your graphics card will process, and as a result the GPU requires less information from the CPU, hence the lower CPU utilization. If you were to remove the fixed fps setting, you would see results similar to those stated earlier in the guide. The settings I used (for 1 WoW instance) were:
for the CPU:
WoW1 - SET processAffinityMask "15"

for the GPU:
Video Settings -> Resolution Tab:
Resolution:1920x1200 (Wide)
Multisampling:24-bit color 24-bit depth 8x multisample
Refresh:60Hz
Vertical Sync:Unchecked
Hardware Cursor:Checked
Reduce Input Lag:Unchecked
Windowed Mode:Checked
Maximized:Unchecked
Video Settings -> Effects Tab:
Video Quality:Custom (Everything high minus the Shadow Quality: Low)

vchi
02-19-2009, 09:19 PM
and the test location was Dalaran. The test was run for about 10 minutes while standing at one of the major intersections within the city. CPU utilization for the 4-core mask was between 20-30%. Try the above settings, disable the frame rate limit and try the different 1-core, 2-core, and 4-core masks. You will probably get similar results for CPU utilization.
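
For readers trying the different masks, here is a small sketch of how a value like "15" can be derived. It assumes that processAffinityMask is an ordinary bitmask in which bit n stands for core n, which is consistent with "15" being used above for a 4-core mask; the helper names are made up for illustration.

```python
def affinity_mask(cores):
    """Combine core numbers (0-based) into a bitmask, one bit per core."""
    mask = 0
    for core in cores:
        mask |= 1 << core
    return mask


def cores_in_mask(mask):
    """List the core numbers whose bits are set in the mask."""
    return [bit for bit in range(mask.bit_length()) if (mask >> bit) & 1]


print(affinity_mask([0]))           # 1-core mask
print(affinity_mask([0, 1]))        # 2-core mask
print(affinity_mask([0, 1, 2, 3]))  # 4-core mask
print(cores_in_mask(15))
```

Under that assumption, a second instance pinned to cores 4-7 would use `affinity_mask([4, 5, 6, 7])`, which is 240.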

Getting to your second comment about the memory, application memory usage is to some extent not affected by the OS settings. I believe caching is not included in the calculation of the WoW application's memory usage but is reported as a separate stat (I will need to look up how the Vista caching system works to verify this). Yes, WoW quality settings do have a strong effect on memory usage, and this was pointed out in a test case later on in the guide. I did go in-depth about the two different settings and compared the difference in memory usage. The thing I forgot to add was that WoW addons affect memory usage, which I will add to the guide in the next revision or the one after that.

As to the page file, all these tests were done with the page file set to the maximum size recommended by the Vista OS. As to whether or not turning off the page file will affect actual application memory usage, it will probably increase the memory usage, because parts of the application memory can no longer be paged out to the page file (I need to find a reference or link for this and verify it in a test case).

Now, as to whether or not it is recommended to turn off the page file (you referred to it as the swap file) completely, I recommend against it. Certain applications depend on the page file for proper operation. Yes, there are applications that will run fine without it, and your mileage will vary with the applications you run (or crash).

Another good test case would be to test and see if reducing / disabling the page file would affect performance and stability of the OS and the WoW application. The next test case with the page file setting is to see if reported application memory changes.

As for the minimum memory vs range of memory, I will change that to 1 GB.

Getting to your third comment about the NPU, I did measure the network bandwidth usage. It bounced between the 10-60Kbps range and upwards of the 100-150Kbps range, but this also depends on which WoW addons you use, and I need to add that to the next revision of the wiki. The settings and location were the same as in the core mask test above. You state that 1 WoW instance uses about 3KBps, which translates into 24Kbps (for some of the readers: 8 bits = 1 byte; a capital "B" usually denotes a byte, while a lowercase "b" usually denotes a bit). Now, 24Kbps is not really close to 56Kbps, but it is close enough for this example, and I use 56Kbps as a basic unit of network bandwidth (generic, and rounded up to something people would recognize as a basic tier of service). I was not insinuating that 56Kbps meant a telephone line; I was using it as a generic unit of network bandwidth.
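
The byte-to-bit conversion above, written out as a short sketch. The 3KBps total and the 1.5KBps upload share are the figures Ken reported; the function name is made up for illustration.

```python
BITS_PER_BYTE = 8


def kbytes_to_kbits_per_s(kbytes_per_s):
    """Convert a rate in kilobytes per second (KBps) to kilobits per second (Kbps)."""
    return kbytes_per_s * BITS_PER_BYTE


total_kbps = kbytes_to_kbits_per_s(3)     # up + down for one instance
upload_kbps = kbytes_to_kbits_per_s(1.5)  # upload share only

print("total:", total_kbps, "Kbps, upload:", upload_kbps, "Kbps")
```

Taking those figures at face value, the upload share of one instance is about 12Kbps, so the 56Kbps unit leaves a wide margin for heavier traffic.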

Getting to your fourth comment about the OS, I do reference the reasons later on in the guide as information for those interested in learning more about it.

Getting to your fifth comment about UML, I was planning to go somewhere with it but I haven't finished adding/editing the last part of the guide.

Getting to your sixth comment about CPU Bound Observation 1, I'm adding more to that section. The reason this information is relevant is that it points out that CPU frequency has a strong effect on framerates. The three websites that conducted performance tests with the three different versions of WoW show this. But I am still in the process of adding more to this section.

As to your last comment about downgrading the memory from 800MHz to 400MHz, I agree to a certain extent that memory speed is important in 3D rendering applications. However, for certain applications and certain video games, this is not the case. Depending on how the application is programmed, some or most of the 3D graphics pipeline is actually rendered on the GPU (depending on which one you have). The textures and any related graphics items needed are placed within the GPU node. Coders try to limit transactions crossing between the GPU and CPU nodes. Some parts of system memory are used as a kind of caching zone for graphics-related items that are not currently needed, or that did not fit on the GPU node to begin with. The cases where the speed of system memory starts playing a role are CAD or modeling programs, which depend mainly on the CPU and system memory. However, with the introduction of CUDA and/or changes in the 3D programming languages (etc.), the lines separating the CPU from the GPU are getting murky, and it becomes less clear which component is limiting application performance.

Another factor is the setup itself: whether or not it has an integrated memory controller (and its multi-tier caching scheme) may explain the difference between the individual you referenced, who saw an fps increase, and setup A, where the memory speed was lowered from 667MHz to 400MHz. In setup A, I did pull out the 400MHz memory to see if there was a difference, but there was no noticeable improvement in framerates. Could you provide the link to the individual who had an fps increase due to faster RAM?

You mentioned that your system is a quad core running at 2.66GHz. Is it Core 2 based or part of the new Core i7 series? What are the specs of your machine, and what software settings have you changed from the defaults? A good test case would be a Core i7 system (to test whether the integrated memory controller plays a big role in performance, whether the new multi-tier cache is affected by different memory speeds, and whether or not hyperthreading affects performance under light vs heavy WoW loads). These test cases would provide valuable information and better inform the readers of this site.