Notes on FPS/CPU correlation
Kama_Blue
Join Date: 2012-03-13 · Member: 148710
<div class="IPBDescription">and some quick ideas/fixes to the CPU load.</div>So i was a little annoyed at my FPS recently, after the 100th time i had died from some form of frame skipping or frame loss during an alien vs marine battle, i decided i would try to fix the issue in the only way a true man knows. <!--coloro:#FF0000--><span style="color:#FF0000"><!--/coloro-->MORE POWER<!--colorc--></span><!--/colorc-->.
Overclocking my FX-6100 from 3.6 GHz to 4.3 GHz on all six cores, I noticed that what people said was generally true. My FPS jumped from an average of 40 (20ish in combat) to a neat 60, with 25-30 in combat. Just increasing my processor speed was all it took to make the game engine run smoother, which makes sense considering the engine's unique occlusion system, but is still the tiniest bit insane.
Occlusion culling is described here: http://www.unknownworlds.com/ns2/news/2009/03/occlusion_culling and is the assumed reason the CPU is so important in NS2's engine.
Going full testing mode, I decided to do some research to see what resources the engine was using and how (if possible) it could be improved. Here's what I found out.
<b>Overview</b>
First I started up MSI Afterburner (to watch GPU usage) and AMD OverDrive (for over/underclocking and watching CPU usage).
My GPU: Nvidia 560 Ti 4GB
My CPU: FX-6100, running at a crazy 4.3 GHz (stock is 3.3)
Doing a quick check of resource usage while idling in the ready room on a good 24-player server, the stats were as follows.
<b>FPS:</b> 55
<b>CPU</b> (Total Load)
Core #1 : 80%
Core #2 : 50%
Core #3 : 40%
Core #4 : 2%
Core #5 : 2%
Core #6 : 60%
<b>GPU</b>: 30% Load
This tells me a couple of things right away.
1). The game is not utilizing two of my CPU cores.
2). Even an isolated ready room is extensive work for the CPU, which has to decide what should and shouldn't be rendered (the Spark engine's occlusion system).
3). The GPU is barely working to render the scene.
During Combat
<b>FPS:</b> 20-30
<b>CPU</b> (Total Load)
Core #1 : 85%
Core #2 : 80%
Core #3 : 60%
Core #4 : 3%
Core #5 : 5%
Core #6 : 80%
<b>GPU</b>: 50% Load (with small spikes to 100%)
This tells a far more important story: namely, that the GPU isn't able to do its job. If the GPU were at 100% load the entire time my screen was displaying 20 FPS, that would be understandable, but it's not. The GPU sits at 50% load because the slow CPU is only passing it enough frames to add up to 20 FPS, and to 50% of its capacity.
I'm sure you've heard of this before; it's called a "CPU bottleneck".
You'll also notice that combat isn't even demanding all of the power available from the four CPU cores NS2 decided to use. So it's not actually an issue of how much total processing power the CPU has, but of how fast it processes the work it's given. That fits with overclocking: raising your CPU's frequency raises the speed at which each core executes. Faster = better. This is an important point to remember for the next section.
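To put that in code form, here's a toy model (numbers invented to mirror the readings above; this is not Spark engine code) of a pipelined renderer where the CPU must prepare every frame before the GPU draws it. The frame rate is set by the slower stage, which reproduces both observations at once: low FPS and a half-idle GPU.

```python
# Toy model of a pipelined renderer. The CPU prepares each frame before the
# GPU draws it, so the frame rate is set by the slower of the two stages.

def fps(cpu_ms, gpu_ms):
    """Frames per second when CPU prep and GPU render overlap."""
    return 1000.0 / max(cpu_ms, gpu_ms)

def gpu_load(cpu_ms, gpu_ms):
    """Fraction of each frame the GPU actually spends working."""
    return gpu_ms / max(cpu_ms, gpu_ms)

# 50 ms of CPU work feeding 25 ms of GPU work: 20 FPS at 50% GPU load.
print(fps(50, 25), gpu_load(50, 25))   # 20.0 0.5
# Shorten only the CPU stage (i.e. overclock) and both numbers rise.
print(fps(40, 25), gpu_load(40, 25))   # 25.0 0.625
```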
<b>Cores to FPS</b>
Next I decided to find the exact point at which I no longer had enough processing power to sustain playable FPS. Like a power-hungry man denying a baby much-needed sustenance, I started denying NS2 access to CPU cores and cycles.
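(For anyone wanting to reproduce the core-denial part, here's a rough sketch using Python's third-party psutil package. The process name "ns2.exe" is an assumption on my part, and Windows Task Manager's "Set affinity..." dialog does the same thing by hand.)

```python
# Sketch: pin an already-running game process to a subset of logical cores.
import psutil

def pin_game_to_cores(name="ns2.exe", cores=(0, 1, 2)):
    """Restrict the first process matching `name` to the given cores."""
    for proc in psutil.process_iter(["name"]):
        if (proc.info["name"] or "").lower() == name:
            proc.cpu_affinity(list(cores))  # e.g. the 3-core test below
            return proc.pid
    return None
```

Here's what I got.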
3 Cores @ 3.9 GHz
FPS: 45
In Combat FPS: 15-25
CPU Load
Core #1 : 95%
Core #2 : 90%
Core #3 : 70%
No big change here; it felt about the same as playing on 4 cores at 3.6 GHz before I had overclocked, though the combat FPS was a little less stable than with the 4-core setup.
2 Cores @ 3.9 GHz
FPS: 30
In Combat FPS: 10-15
CPU Load
Core #1 : 95%
Core #2 : 90%
This is where I noticed I might be hitting the CPU requirement cap. During combat the FPS would pretty much die, and instead of just getting sluggish and frame-by-frame like usual, the screen started popping from place to place: little jitters and miniature rubberbanding of the camera, only a couple inches at a time. It looked like the player camera was out of sync with the server and getting auto-corrected every few feet. It only happened when the CPU load peaked at 100% during combat, though. I think this is officially the cut-off point: if you're getting little rubberband movements of your camera, your CPU is hitting its limits.
1 Core @ 3.9 GHz
FPS: 20
In Combat FPS: 8-10
CPU Load
Core #1 : 100%
OK, this was dreadful; the rubberbanding was happening the entire time I was playing. Just a jumpy camera, impossible to see what was happening, terrible FPS in combat. Oh, but the neat thing was that enemy players couldn't hit me at ALL. It was as if the rubberbanding my character was doing on my screen was tied to what my hitbox was doing in-game. Upon learning this, I decided we needed to go lower.
1 Core @ 2.0 GHz (underclocked)
FPS: 10
In Combat FPS: not enough to read the FPS meter
CPU Load
Core #1 : 100%
Practically unplayable. But somehow, as a skulk, I managed to jump into the enemy base, kill five marines, and then run out without dying. Either hitboxes are connected to how ######ty your computer is, or none of those marines can aim, at all, even a little bit. The camera lag was pretty much unbearable, and playing a round like this gave me a headache.
From these tests I learned a couple more things.
1). FPS drops are not due to CPU load being too high; they're due to CPU speed being too slow. More threads = more FPS, because the work gets split up and finished faster, right up until the game refuses to utilize any more threads.
2). Maxed-out CPU load directly causes stutters in gameplay, which are much different from FPS drops.
<b>Final Notes</b>
A summary of what these tests suggest about improving the game's performance:
- NS2 is CPU throttled: not by overall bulk processing power, but by processing speed. Increasing the processor's speed increases FPS.
- The GPU is not being fully utilized, because CPU processing has to happen before each and every frame due to occlusion culling. When the processor speed increases, the GPU load increases, because it is handed more frames to process.
- NS2 is not utilizing more than 4 cores of a CPU. This isn't a massive problem, but letting the game use the other two cores would be a valuable change, smoothing out the game on some of this generation's newer 6- and 8-core CPUs.
- Player hitboxes, and the ease of hitting those players, may be linked to the processor speed of that player's machine (needs further testing and confirmation).
<b>Possible Solutions</b>
Because I assume occlusion culling (the Spark engine deciding which polygons to render every frame) is the reason the engine uses SO much CPU power and so little GPU power, I would suggest a few "Graphics" options to solve the problem.
1). Add an option for players to tweak the internal time between occlusion recalculations. Essentially, re-use the previously calculated visibility for a few frames instead of recalculating it every frame (see the sketch after this list). I don't have the numbers for the culling system, but I assume it recalculates far more than once per second. If you gave players the option to lower (or raise) this timer, with a description of what it does and a note that it may leave blank geometry for a fraction of a second when rounding corners if the refresh rate is set too low, I'm almost completely sure it would solve the huge majority of the CPU-linked FPS issues in the game. If you have a crappy computer, you wouldn't mind seeing an empty polygon or two at the far end of a hallway for a split second if you could constantly run the game at 60 FPS. Since the engine would no longer be constantly re-estimating what you can see, the CPU load would be far lower and would throttle the GPU far less than it currently does on older rigs.
2). Add actual occlusion wall brushes (like those in the HL2 Hammer editor) to NS2. Slap a few of them down in every map between the large parallel hallways and rooms and you could greatly simplify the calculations the culling system has to run. They're essentially invisible walls that tell the engine not to render anything past them, because you shouldn't be able to see past them. Might help, might not.
3). Offload culling to the GPU. GPUs are designed to deal with dynamic rendering of content and with calculating what can and can't be rendered. Considering the GPU is usually the strongest component in most gaming computers, it wouldn't be a terrible idea to utilize it. Bonus points if you can use the onboard graphics that many of our motherboards have for the culling estimations.
4). Give players with large amounts of RAM an option that makes culling a lower-priority process but stores the data for the area around you in RAM, specifically when combat initiates. Players with 8GB of RAM and 2GB of VRAM would happily give you a full gig to store recently visited and maybe-needed-soon parts of the map.
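To make #1 concrete, here's a minimal sketch of the idea (every name below is invented; this is not the Spark engine's API): reuse the last computed visibility set for a tunable number of frames instead of re-running the occlusion query every frame.

```python
# Minimal sketch of suggestion #1: reuse the last visibility result for
# `reuse_frames` frames instead of recomputing occlusion every frame.
class CachedOcclusion:
    def __init__(self, compute_visible_set, reuse_frames=3):
        self.compute = compute_visible_set  # the expensive CPU-side query
        self.reuse_frames = reuse_frames    # the user-tunable "culling rate"
        self.age = reuse_frames             # force a real query on first use
        self.visible = None

    def visible_set(self, camera):
        if self.age >= self.reuse_frames:
            self.visible = self.compute(camera)  # full recalculation
            self.age = 0
        else:
            self.age += 1  # reuse the stale set; may briefly miss geometry
        return self.visible
```

The trade-off is exactly the one described above: with reuse_frames=3 at 60 FPS, the visibility set can be about 50 ms stale, which is the "empty polygon for a split second" a player on a weak machine would happily accept.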
<b>Note: The above suggestions may be absurd and may not properly represent the proper uses of the various components in your computer. However, they are good ideas, even if they're impossible and/or nobody at Unknown Worlds knows how to implement them.</b> <i>(Also, #1 is actually a really good idea if you can do it within the current engine limitations. Let us change the culling rate for a less accurate but faster culling cycle.)</i>
<b>Comments</b>
NS2 runs slow because of the Lua game code, the majority of which is single-threaded.
QUOTE: "NS2 runs slow because of the Lua game code, the majority of which is single-threaded."
Nope, it's definitely something the CPU is being forced to calculate before the GPU can render, based on the low GPU usage, the GPU load increasing with processor frequency, and FPS tying directly to processor frequency because of it.
CPU cycles are being called before GPU cycles, and Lua scripts would have absolutely no reason to do that.
(This is my entry for the Most Constructive Post of 2012)
QUOTE: "NS2 runs slow because of the Lua game code, the majority of which is single-threaded."
It's likely those guides are exactly one of the things the OP mentioned already. Mappers can put in occlusion brushes that simplify the calculations. However, the actual occlusion process is still handled by the hardware at runtime. If it were entirely pre-set, mappers would have to compile their maps like with Source mapping, which isn't the case.
QUOTE: "NS2 runs slow because of the Lua game code, the majority of which is single-threaded."
xDragon is correct. NS2's occlusion culling is actually pretty fast. If you open up the console and type 'profile' you can see which parts of the game are running slow. Most of the time it's the Lua code, as it is mostly single-threaded. This is why one core is used the most, and why you see much better performance when you overclock.
My results vary. While pushing my CPU to its utmost max on usage, I saw no drop in FPS beyond reason, and I could run around maps just fine.
QUOTE (Kama_Blue): "CPU cycles are being called before GPU cycles, and Lua scripts would have absolutely no reason to do that."
So, actually calculating the position of objects, your camera, the states of particles, etc. is irrelevant to rendering the scene?
Of course you could redraw the scene with the last known state, but that would obviously be a useless frame, since it would look exactly the same as the last one.
And your GPU load will of course increase if you increase your FPS, occlusion culling slowdowns or not.
QUOTE (Dghelneshi): "An overly elaborate post stating what we all have known for 2 years now. NS2 is CPU-bottlenecked on most systems. Overclocking the CPU helps because it makes the CPU faster. More cores do not help because game logic cannot be magically parallelized. There are pretty much no games using more than 4 cores effectively (apart from some special cases, also depending on genre and single-/multiplayer); even optimized blockbusters like Battlefield 3 for the most part only utilize 2 cores. (Then there's also the fact that there aren't technically 6 full cores on your CPU, and Windows thread handling might hinder performance by shoving load onto the wrong cores, especially if you restrict NS2 to certain cores, but I don't really want to go into that...)"
QUOTE (xDragon): "Point about the guidelines in the map is that the previous system was fully dynamic and was somewhat slower. The current system most likely has a pretty small impact on your FPS overall, and as stated before, the Lua game logic is what consumes most of your CPU time."
I took your advice and opened up 'profile'.
At the top there was a rendering thread, which did include a bit of Lua scripts and building processes, but the main rendering thread was taking up about 50% of the currently used processor power. I can't estimate exactly, but the Lua and script components were taking up about 20% of the current processing, and it looked like most of the CPU power was going towards deciding what to render (hmmm, occlusion culling).
And then there was another thread near the bottom that made up an additional 40% of the processor's current load, and it contained rendering and culling specifically.
<img src="http://i.imgur.com/0En8r.jpg" border="0" class="linked-image" />
Looks to me like anywhere from 20-30% of the processor load is culling from the bottom thread alone, and if the top thread's rendering has anything to do with it as well, it's more like 50-60%. That's a significant amount, I would think.
QUOTE (Desther): "You should look up the info from the forums on Occlusion Culling. I believe culling was done on the GPU (Direct3D API?) but was moved to the CPU for performance reasons."
That's fine, but it's a bottleneck because the CPU isn't really the head honcho in your computer. The graphics processor is sometimes 12x or even 20x faster than a normal processor (at certain things more than others), and those are exactly the kinds of things culling requires.
Where are you coming up with 40% for this 'culling'???
Max and Matso/Dushan/Brian/everyone have been combing over NS2 for over a year now, making every optimization they can. If they could change the occlusion culling and potentially make gains like you're suggesting, I am quite sure they would have, considering the other changes they have made for perf. improvements.
QUOTE: "Where are you coming up with 40% for this 'culling'???"
Occlusion culling calculations aren't counted toward the rendering timer.
If you look at the profiler closely, the rendering has its own thread. This rendering thread runs <u><b>in parallel</b></u> to the others (e.g. the Lua game logic thread). This means that if the rendering thread takes 8ms to finish, like in your example, but the whole frame takes about 21ms, the rendering thread <u><b>finishes 13ms before the rest of the frame does</b></u>. In other words: in your example, your CPU would have 13ms of room to do all sorts of additional stuff in that thread before it started slowing down your game. It is not counted towards the rendering timer because it does not actually take up any frame time.
On my system (which is pretty potent), occlusion culling takes between 1ms and 2ms of time while the Lua thread takes up about 11ms when there's not much happening (6ms in ready room).
The only case where occlusion culling would slow down your game is when you restrict the game to one core, since then everything has to be executed in succession instead of in parallel. On your system it would probably even take effect if you restricted it to two cores, depending on which ones they are, since the AMD FX CPUs actually consist of "modules" which count as two cores but share most of the important hardware, like the FPU. I wonder if many games have poor performance on the AMD FX CPUs because Windows apparently (judging from your numbers) doesn't balance the load properly across the right cores (i.e. cores 0, 2 and 4 first, and only then balancing across 1, 3 and 5, because those share part of their resources with the others).
Edit: To make an even more clearly visible example: <img src="https://pickhost.eu/images/0005/6918/NS2_Threads_Profiler.png" border="0" class="linked-image" />
The frame takes 12ms, and those 12ms come almost exclusively from the first thread. The second thread, which does some rendering work like occlusion culling, does not actually matter unless it takes longer to complete than the other thread.
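Restating that argument as arithmetic (a toy model, not actual profiler output): when the threads overlap, a frame costs as long as the slowest thread, so culling work that fits inside the Lua thread's time is effectively free.

```python
# Toy model of the profiler argument: with the render/culling thread running
# in parallel to the Lua game thread, a frame takes as long as the slowest
# thread, not the sum of both.
def frame_ms(lua_ms, render_ms):
    return max(lua_ms, render_ms)

print(frame_ms(11, 2))   # 11 -> 1-2 ms of culling costs nothing here
print(frame_ms(11, 13))  # 13 -> culling only matters once it is the longest thread
print(11 + 2)            # 13 -> on a single core nothing overlaps and costs add,
                         #       which is why the one-core test fell apart
```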
QUOTE (NeoRussia): "If only LUA code could be offloaded onto the GPU, this would fix a lot of problems."
If it were possible (and I'm not entirely sure it isn't), it would create an incredible GPU bottleneck which would leave us all with framerates in the single digits. GPUs are not faster than CPUs; otherwise, why would we use CPUs at all? GPUs are specialized equipment for massively parallel operations of a specific kind.
GPUs are faster than CPUs; that's why we have dedicated cards for video processing instead of leaving it to CPUs, which are left to stream other data. That doesn't mean CPUs are more effective than GPUs at everything else, either; GPUs are a lot more effective at things like running search queries, compiling data, and rendering video, all of which they are used for. For example, NASA uses them to process information on samples, and scientists use GPUs to translate DNA, because they are more effective at it than any CPU.
Taken from a wiki:
"A CPU core can execute 4 32-bit instructions per clock (using a 128-bit SSE instruction) or 8 via AVX (256-Bit), whereas a GPU like the Radeon HD 5970 can execute 3200 32-bit instructions per clock (using its 3200 ALUs or shaders). This is a difference of 800 (or 400 in case of AVX) times more instructions per clock. As of 2011, the fastest CPUs have up to 6, 8, or 12 cores and a somewhat higher frequency clock (2000-3000 MHz vs. 725 MHz for the Radeon HD 5970), but one HD5970 is still more than five times faster than four 12-core CPUs at 2.3GHz (which would also set you back about $4700 rather than $350 for the HD5970)."
This all obviously assumes that each operation is completely separate from all the other operations, and that each bulk operation consists of lots of tiny, completely identical operations.
In quite a few cases in gaming this is true of the processing required, but in the vast majority of cases it is not. In almost all situations in gaming, the answer to one operation is required before the next operation can begin. Such a task is essentially impossible for a GPU to perform.
GPUs are also highly susceptible to extreme performance loss from even the slightest complications. For example, whereas a conditional statement (like <i>if this then that</i>) is essentially negligible to a CPU, the effect a single one can have on the performance of a GPU is huge. That's why you typically offload very simple tasks which need to be calculated many many times to the GPU, and leave more complex tasks for the CPU to do.
That is quite a twisted sense of "it is faster". It is actually slower, but can do A LOT of things in parallel. It's like taking 3000 people and telling them to calculate 1/7. They will not be faster than just one guy calculating 1/7, because it is not an operation that benefits from parallel processing (a smartass might now argue that among 3000 people there is more likely to be one who is fast at mental math). GPUs are not <b>inherently faster</b>, which was my point. It's a very common misconception nowadays that doing something on your GPU will make it run faster.
I stated that it is specialized equipment for parallel processing, and that is what it is. Of course it is faster in those applications, but game logic is definitely not a whole bunch of simple calculations running in parallel, so it would make no sense to run it on a GPU, not to mention that Lua cannot run multiple threads out of a single VM.
QUOTE: "Where are you coming up with 40% for this 'culling'??? Max and Matso/Dushan/Brian/everyone have been combing over NS2 for over a year now, making every optimization they can..."
I'm not sure what those numbers at the top refer to; I'm pretty sure "Game" includes quite a bit of rendering and culling as well. I got my percentages by simply comparing the time in µs (microseconds) each process was taking per cycle against the overall amount of time used in each cycle.
That profile list doesn't actually tell you much. It tells you where the CPU cycles are going, but it doesn't tell you whether one thread is waiting for another to finish before rendering something. The 12ms time to complete may not be holding anything up at all.
QUOTE (NeoRussia): "..."
QUOTE (Imbalanxd): "..."
QUOTE (Dghelneshi): "..."
You're all a bit right.
The easiest example is that a CPU is like a really fast car. Say it goes 100 MPH and zooms around the city, doing each task quickly and in the exact way it needs to be done.
The GPU, then, is like 1000 bicycles. In most cases, where you have a straight shot at a target, the CPU will beat any one of those 1000 bicycles by a huge amount. But say you need to deliver 100 pizzas across the city: the bicycles (GPU), even though they are slower, will be able to deliver all of the pizzas at the same time, whereas the car (CPU) is stuck speeding from one pizza delivery to another.
In a straight number-crunching race the CPU will win every time, but when there is more than one number that needs to be crunched at the same time, the GPU will outperform the CPU by a huge margin.
Lua scripts would fall under the "GPUs do this better" category, if they are indeed holding up rendering time.
Mythbusters actually explains the difference really well; they were paid by Nvidia to do a presentation for some college or other.
<a href="https://www.youtube.com/watch?feature=player_embedded&v=mwDPb3T8bOQ" target="_blank">https://www.youtube.com/watch?feature=playe...p;v=mwDPb3T8bOQ</a>
Just no. Even if it were possible to run Lua on the GPU, the Lua code is the very definition of a sequential process.
It's the Lua code that is slowing things down, not occlusion culling. End of story.
Lack of updates from the client should not cause this. If the server fails to receive an updated position from the player, it will just interpolate it and use the predicted position. If someone is teleporting as you say there is something more than low frame rate going on.
I thought this was the case. I'm sick of seeing players with under 50ms ping seemingly warp around with dodgy hit reg and attacks. I'm not sure what UW's priorities are, but these fundamental engine problems need to be at the top.
Do we really know the level of engine development that has taken place? It appears far more 'behind the scenes' than the Lua aspect we routinely debate.
No, it will lose disastrously. Straight number-crunching problems are nearly all embarrassingly easy to parallelize, and they are the best sort of problem you could ever hope to run on a GPU.
QUOTE (Kama_Blue): "But when there is more than one number that needs to be crunched at the same time, the GPU will outperform the CPU by a huge margin."
Now you're confusing me here. That's pretty much the definition of a "straight number-crunching problem" (LINPACK, FFTs and the like).
QUOTE (Kama_Blue): "Lua scripts would fall under the "GPUs do this better" category, if they are indeed holding up rendering time."
No no no no. Lua is pretty much the textbook definition of the worst thing imaginable to run on a GPU.
I think what's confusing you is your analogy. The CPU has huge caches, branch prediction, out-of-order execution, etc. to deal with this nasty, sequential, branchy code with poor cache locality (of which game logic is a perfect example). It is not like a car at all in that the CPU is extremely nimble; it can turn on a dime and speed through this complicated mess. I'd liken it more to a motorcycle zipping between cars in rush-hour traffic. The GPU is not at all like 1000 bicycles; it is a lumbering juggernaut, with really long pipelines, tiny caches, and little or no branch prediction. It's perfectly happy to deliver 1000 pizzas, and it will have enormous throughput (pizzas per hour), as long as each batch of 1000 pizzas is going to the same place and there are no traffic lights and few other vehicles on the road.
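The latency-versus-throughput point lends itself to a back-of-the-envelope model (all numbers below are made up purely for illustration): one fast serial worker against many slow parallel ones.

```python
# Back-of-the-envelope latency vs. throughput model (numbers made up):
# one fast serial worker (CPU core) vs. many slow parallel workers (GPU).
import math

def cpu_ms(n_items, per_item=1.0):
    return n_items * per_item                     # fast, but one at a time

def gpu_ms(n_items, per_item=10.0, width=1000):
    return math.ceil(n_items / width) * per_item  # slow per item, huge batches

print(cpu_ms(1), gpu_ms(1))            # 1.0 vs 10.0 -> a single job: CPU wins
print(cpu_ms(100000), gpu_ms(100000))  # 100000.0 vs 1000.0 -> bulk identical work: GPU wins
```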
First watch this demonstration:
<a href="http://www.nvidia.com/object/nvision08_gpu_v_cpu.html" target="_blank">http://www.nvidia.com/object/nvision08_gpu_v_cpu.html</a>
That was fun!
So what's important here is that the CPU can be controlled to perform basically any calculation on command. For calculations that are unrelated to each other, or where each computation depends strongly on its neighbors (rather than merely being the same operation), you usually need a full CPU. As an example, take compiling a large C/C++ project: the compiler has to read each token of each source file in sequence before it can understand the meaning of the next. Even though there are lots of source files to process, they all have different structure, so the same calculations don't apply across the source files.
You could speed that up by having several independent CPUs, each working on separate files. But improving the speed by a factor of X means you need X CPUs, which cost X times as much as one CPU.
Some kinds of task involve doing exactly the same calculation on every item in a dataset. Some physics simulations look like this: in each step, every 'element' in the simulation moves a little bit, according to the 'sum' of the forces applied to it by its immediate neighbors.
Since you're doing the same calculation on a big set of data, you can duplicate some parts of a processor but share the others (in the linked demonstration, the air system, valves and aiming are shared; only the barrels are duplicated for each paintball). Doing X calculations in parallel then requires less than X times the cost in hardware.
The obvious disadvantage is that the shared hardware means you can't tell one subset of the parallel processor to do one thing while another subset does something unrelated; the extra parallel capacity goes to waste while the GPU performs one task and then another, different task.
Really? Random thought here, but... parallelize this:
1. Use function superExtrapolate(var1, var2) on a and b to get c. superExtrapolate takes 250 ms to run.
2. Raise c to the third power.
3. Go back to step 1, but with c as a. Do this 4,000 times.
All steps are dependent on the results of the previous step. Parallelization works when you just have a lot of math problems to solve; NOT when you have one very big, interdependent one that pretty much HAS to be solved one step at a time.
Granted, if you had to solve the above 3 steps for 1000 combinations of a and b, then it would be a bit more of a GPU thing.
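Written out as code, the loop-carried dependency is obvious (superExtrapolate is the commenter's hypothetical function, given a trivial stand-in body here):

```python
# The three steps above as a loop. The point is the loop-carried dependency:
# iteration N+1 needs iteration N's result, so the 4,000 steps cannot run in
# parallel no matter how many cores or shaders you have.
def superExtrapolate(var1, var2):
    return (var1 + var2) / 2.0  # stand-in body; imagine it taking 250 ms

def crunch(a, b, steps=4000):
    c = a
    for _ in range(steps):
        c = superExtrapolate(c, b) ** 3  # each step consumes the previous c
    return c

# The GPU-friendly variant is 1000 *independent* (a, b) pairs, each running
# this same sequential loop side by side; only that outer loop can be batched.
```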
QUOTE: "1. Use function superExtrapolate(var1, var2) on a and b to get c. [...] Granted, if you had to solve the above 3 steps for 1000 combinations of a and b, then it would be a bit more of a GPU thing."
>nearly all
>gives a single counter example
That example doesn't even have a practical use.
That's what I kept telling my math teacher in 8th grade. Boy, was I wrong. <i>(I was pretty much right, but there are people out there who use really crazy equations and math for neat stuff.)</i>