Notes on Fps/CPU correlation

Imbalanxd · January 2013

<div class='quotetop'>QUOTE (Kama_Blue @ Jan 2 2013, 07:16 AM) <a href="index.php?act=findpost&pid=2054522"><{POST_SNAPBACK}></a></div><div class='quotemain'>That's what i kept telling my math teacher in 8th grade. Boy was i wrong <i>(i was pretty much right, but there's people out there that use really crazy equations and math for neat stuff)</i></div>

Well in that example the value can probably be worked out immediately simply by knowing A, B and the number of iterations. No need to actually perform the iterations.

Makenshi · January 2013

<div class='quotetop'>QUOTE (Imbalanxd @ Jan 2 2013, 12:27 AM) <a href="index.php?act=findpost&pid=2054529"><{POST_SNAPBACK}></a></div><div class='quotemain'>Well in that example the value can probably be worked out immediately simply by knowing A, B and the number of iterations. No need to actually perform the iterations.</div>
... What about scenarios where you need the value from every iteration of the recursive operation? Some OCR algorithms require this for example.

Imbalanxd · January 2013

<div class='quotetop'>QUOTE (Makenshi @ Jan 2 2013, 07:41 AM) <a href="index.php?act=findpost&pid=2054540"><{POST_SNAPBACK}></a></div><div class='quotemain'>... What about scenarios where you need the value from every iteration of the recursive operation? Some OCR algorithms require this for example.</div>

AINT NOBODY GOT TIME FOR THAT

Amb · January 2013

a single threaded video game in 2012/13 is a joke btw.

Imbalanxd · January 2013

<div class='quotetop'>QUOTE (Amb @ Jan 2 2013, 07:55 AM) <a href="index.php?act=findpost&pid=2054550"><{POST_SNAPBACK}></a></div><div class='quotemain'>a single threaded video game in 2012/13 is a joke btw.</div>

How relevant to this discussion.

Soylent_green · January 2013

<div class='quotetop'>QUOTE (Katana314 @ Jan 1 2013, 06:01 PM) <a href="index.php?act=findpost&pid=2054354"><{POST_SNAPBACK}></a></div><div class='quotemain'>Really? Random thought here, but...Parallelize this.

1. Use function superExtrapolate(var1, var2) on a and b to get c. superExtrapolate takes 250 ms to run.
2. Raise c to the third power.
3. Go back to step 1, but with c as a. Do this 4,000 times.

All steps are dependent on the results of the previous step. Parallelization works when you just have a lot of math problems to solve; NOT when you have one very big, interdependent one that pretty much HAS to be solved one step at a time.</div>

This problem is embarrassingly easy to parallelize: run it with a different set of inputs on each core. If you have 10 000 cores, solve for 10 000 different inputs. Because each run takes so long, there is very little communication between cores and the organizational overhead is tiny.

Oh, but what if you only want to do this calculation once, ever? Well, that's almost never the case. If it is, then let the calculation run (total runtime about 17 minutes in this case: 4,000 steps at 250 ms each) and go have a cup of coffee or something. If you need to reuse the result, write it down or store it in a file or in code or wherever.
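The runtime arithmetic above can be sketched as follows. This is a rough estimate only, assuming zero communication overhead and one chain per core at a time; the function names are hypothetical and introduced just for illustration.

```python
import math

def chain_runtime_s(steps: int, step_ms: float) -> float:
    """Wall-clock seconds for one strictly sequential chain of steps."""
    return steps * step_ms / 1000.0

def batch_wall_time_s(n_inputs: int, n_cores: int,
                      steps: int = 4000, step_ms: float = 250.0) -> float:
    """Wall-clock seconds to run n_inputs independent chains on n_cores.

    Assumption: no communication overhead, so the chains are processed
    in ceil(n_inputs / n_cores) waves of equal length.
    """
    waves = math.ceil(n_inputs / n_cores)
    return waves * chain_runtime_s(steps, step_ms)

one_chain = chain_runtime_s(4000, 250.0)          # a single dependent chain
many_chains = batch_wall_time_s(10_000, 10_000)   # 10 000 inputs on 10 000 cores
print(f"one chain: {one_chain / 60:.1f} min, 10 000 chains in parallel: {many_chains / 60:.1f} min")
```

Under these assumptions the single chain is not any faster, but 10 000 inputs finish in the same wall-clock time as one.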

DC_Darkling · January 2013

Ok since I was a lil bored I did the following test for you folk. Most, if not all, is a repeat of what many people said.. but hey.. more tests.

I ran Fraps together with Process Explorer to check how NS2 would do FPS-wise, and with CPU/GPU use overall.
I started a training round for 16 slots, hopped in the hive and started placing mainly cysts, some defense/cloak chambers, and a few whips.
I kept this up until the hive area I was hovering over started producing the "too many entities" error.

By now I had wrecked my good FPS down to around 15 fps.

CPU usage was low most of the time; during this test, except for the loading phases, it generally did not go above 10-15%.
The GPU was not maxed out either, but Process Explorer is a lot less accurate at reading this.

Most of the stacks were related to textures/mipmapping and lua.dll. I refreshed the stacks roughly every 5 seconds, after reading the info from the previous one.

Increasing the CPU priority of the process did nothing at this point.
Changing the texture detail settings for mipmaps in the Nvidia control panel did nothing either, although I did not bother to restart the game, which may have been required.

An IMPORTANT side note.
I noticed in the stacks that NS2 does in fact use PhysX. A quick forum search showed this was not new info.
The only cards capable of running PhysX on the video card are Nvidia cards; it's an Nvidia thing. While other cards can technically run it, it's not supported and PhysX refuses to run on them; it's hardcoded in.
This means that PhysX must run on the CPU for all AMD/ATI card users. Let me inform you right here and now: a CPU is absolute crap at running PhysX code. It can bring an overclocked i7 to its knees and then some. This would explain the weird difference in results.

Having Lua code on the CPU plus PhysX on the CPU, if you do not have an Nvidia card, will surely drop your FPS like a stone. Updating PhysX will help perhaps.. a bit.
Getting a strong Nvidia card will help more. I SUSPECT that either particles or atmospherics (in your graphics options) are the physics calculations involved. If that assumption is correct, disabling them so PhysX is not used (as much) should boost performance considerably.

If it is partially related to PhysX, then I would not start blaming NS2 for anything. That's a war between Nvidia and AMD, and you're better off complaining there for it to become open source or something.

Tig · January 2013

lol at this thread. none of you would've made it through a single game in alpha. you'd all be crying garbage collection.

DC_Darkling · January 2013

HEY.. I take offense at that. I haven't complained about NS2 graphics or performance yet.. I'm just giving them more info. ^^

Makenshi · January 2013

<div class='quotetop'>QUOTE (Tig @ Jan 2 2013, 01:44 PM) <a href="index.php?act=findpost&pid=2054826"><{POST_SNAPBACK}></a></div><div class='quotemain'>lol at this thread. none of you would've made it through a single game in alpha. you'd all be crying garbage collection.</div>

lol comparison between retail performance to alpha.

Kama_Blue · January 2013

<div class='quotetop'>QUOTE (DC_Darkling @ Jan 2 2013, 10:10 AM) <a href="index.php?act=findpost&pid=2054811"><{POST_SNAPBACK}></a></div><div class='quotemain'>If it is partially related to physx, then I would not start blaiming NS2 for anything. Thats a war between nvidia/amd and your better of complaining there for it to become open source or something</div>

I use an Nvidia card so that's not the issue, and most AMD cards from the last three years run PhysX.

PhysX is probably atmospherics though, and you can turn that off. There's no large-scale particle emulation anywhere else.

DC_Darkling · January 2013

I did a recheck after that, but I can't find proof anywhere that AMD now supports and runs PhysX. All the big PhysX sites still say the opposite.
There was a short time when they could run it, but it was patched out.

xDragon · January 2013

The PhysX functions that they use run on the CPU only anyway; there is no GPU acceleration used by NS2 afaik. From what I remember they pretty much solely use PhysX for collisions, as it was faster than Havok.

sloe · January 2013

<div class='quotetop'>QUOTE (Dghelneshi @ Dec 31 2012, 09:19 PM) <a href="index.php?act=findpost&pid=2053977"><{POST_SNAPBACK}></a></div><div class='quotemain'>So, actually calculating the position of objects, your camera, states of particles, etc. is irrelevant to rendering the scene?
Of course you could redraw the scene with the last known state, but that would obviously be a useless frame since it looks exactly the same as the last one.</div>

Do
do_input <- cpu (handle keyboard/mouse state)
do_ai <- cpu (compute NPC actions)
do_physics <- cpu or gpu (compute object collision/direction/velocity)
do_scene <- cpu (culling/translate 3d space)
do_render <- gpu (send scene to video hardware/draw the 3d scene as a 2d frame/post-process shaders)
do_sounds <- cpu (play sound effects)
Loop

This "typical game loop" executes as fast as the processor will allow. Each time it loops, one frame is rendered to the screen. What he means is that if you throttle the culling down to, say, 10 times per second (so the culling does not have to be processed on every iteration of the loop), it could improve performance. It's not always necessary to compute every piece of the game logic during every pass of the game loop (e.g. we can compute physics every OTHER pass; it won't be as 'accurate mathematically' but you may not notice a difference visually). Since the engine was written in Lua, I assume that's why the big percentage is noticed with the profiler. Not sure how NS2 handles multithreading, so my examples could be vastly incorrect.
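The throttling idea described above can be sketched like this. It is a minimal single-threaded simulation, not how NS2 actually schedules its work; the intervals and the `run_loop` helper are hypothetical and chosen only for illustration.

```python
from collections import Counter

def run_loop(frames: int, every: dict[str, int]) -> Counter:
    """Simulate `frames` passes of the loop.

    Subsystem `name` only runs on every `every[name]`-th pass;
    on the other passes the previous result would be reused.
    """
    calls = Counter()
    for frame in range(frames):
        for name, interval in every.items():
            if frame % interval == 0:
                calls[name] += 1  # stand-in for calling do_<name>()
    return calls

# Hypothetical schedule: render every pass, physics every other pass,
# scene culling every 6th pass (roughly 10 Hz if the loop runs at 60 fps).
schedule = {"input": 1, "ai": 4, "physics": 2, "scene": 6, "render": 1, "sounds": 1}
counts = run_loop(60, schedule)
print(dict(counts))
```

Over 60 passes the culling step runs 10 times instead of 60, which is where the hoped-for saving comes from. Whether stale culling results are visually acceptable is a separate question.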

<a href="http://www.tilander.org/aurora/comp/gdc2009_Tilander_Filippov_SPU.pdf" target="_blank">God Of War III Game Loop</a>

DC_Darkling · January 2013

<div class='quotetop'>QUOTE (xDragon @ Jan 2 2013, 10:40 PM) <a href="index.php?act=findpost&pid=2054924"><{POST_SNAPBACK}></a></div><div class='quotemain'>The physX functions that they use run on the CPU only anyways, there is no GPU acceleration used by NS2 afaik. From what I remember they pretty much soley use physX for collisions, as it was faster than Havok.</div>
Thankfully that should be incorrect, as it's the PhysX drivers which decide that, not NS2 itself. ^^

Dghelneshi · January 2013

@Sloe:
The thing is that this is not a sequential loop. Sound processing, rendering, lua game logic, etc. all run in parallel as far as possible.
The profiler very clearly displays multiple threads and one of them has an execution time just slightly short of the frame time. Hence, this is the thread that holds up the whole thing and this is what needs to be optimized in order for the whole frame to be delivered faster. The profiler also clearly displays the occlusion culling happening in another thread with less than half of the execution time in total. Thus I conclude that occlusion culling is definitely not holding up anything. If the first thread had to wait for the second one to finish, it wouldn't have an execution time so close to the total frame time.

This is all assuming multi-threading works properly on the provided hardware (i.e. there are at least 2 full cores available to the NS2 process (not 1 module / 2 virtual cores), 3+ to be safe).
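The reasoning above (the frame can only finish as fast as its slowest parallel thread) can be illustrated with a small sketch. The timings below are made-up placeholder numbers, not values read from the NS2 profiler, and the helper names are hypothetical.

```python
def parallel_frame_ms(thread_ms: dict[str, float]) -> float:
    """With fully parallel threads, the frame takes as long as the slowest one."""
    return max(thread_ms.values())

def bottleneck(thread_ms: dict[str, float]) -> str:
    """Name of the thread that determines the frame time."""
    return max(thread_ms, key=thread_ms.get)

# Hypothetical per-frame timings in milliseconds.
sample = {"game_logic": 15.0, "occlusion_culling": 7.0, "sound": 2.0}

# Halving the culling cost changes nothing, because culling is not the bottleneck.
faster_culling = {**sample, "occlusion_culling": 3.5}

# Speeding up the bottleneck thread shortens the frame.
faster_logic = {**sample, "game_logic": 10.0}

print(bottleneck(sample), parallel_frame_ms(sample),
      parallel_frame_ms(faster_culling), parallel_frame_ms(faster_logic))
```

This ignores synchronization points between threads, so it only holds under the same assumption stated above: that multi-threading works properly on the hardware.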

@DC_Darkling:
<div class='quotetop'>QUOTE (Max @ Jan 5 2011, 07:51 PM) <a href="index.php?act=findpost&pid=1821768"><{POST_SNAPBACK}></a></div><div class='quotemain'>Natural Selection 2 is configured to use PhysX in software mode regardless of what type of hardware you have. I'm not married to PhysX, though for the amount of simulation we're currently doing it's really not going to make a difference what physics engine we use.</div>

DC_Darkling · January 2013

<div class='quotetop'>QUOTE (Dghelneshi @ Jan 2 2013, 11:49 PM) <a href="index.php?act=findpost&pid=2054955"><{POST_SNAPBACK}></a></div><div class='quotemain'>@Sloe:
The thing is that this is not a sequential loop. Sound processing, rendering, lua game logic, etc. all run in parallel as far as possible.
The profiler very clearly displays multiple threads and one of them has an execution time just slightly short of the frame time. Hence, this is the thread that holds up the whole thing and this is what needs to be optimized in order for the whole frame to be delivered faster. The profiler also clearly displays the occlusion culling happening in another thread with less than half of the execution time in total. Thus I conclude that occlusion culling is definitely not holding up anything. If the first thread had to wait for the second one to finish, it wouldn't have an execution time so close to the total frame time.

This is all assuming multi-threading works properly on the provided hardware (i.e. there are at least 2 full cores available to the NS2 process (not 1 module / 2 virtual cores), 3+ to be safe).

@DC_Darkling:</div>

Odd.. that conflicts directly with documentation which I read, even recently.. But we'll tag a point on your side atm. ;)

Dghelneshi · January 2013

<div class='quotetop'>QUOTE (DC_Darkling @ Jan 2 2013, 11:54 PM) <a href="index.php?act=findpost&pid=2054961"><{POST_SNAPBACK}></a></div><div class='quotemain'>odd.. that conflicts directly with documentation which I read, even recently.. But well tag a point on your side atm. ;)</div>
It's possible that it has changed since then, but I haven't heard anything about hardware accelerated physics for NS2 and I think I recall some threads with people complaining that PhysX is "not working properly" because it doesn't select your GPU, so I assume it is still true.

soccerguy243 · January 2013

idk if it will help you guys figure this out but i ran NS2 at a "high priority" and it gave me an FPS boost. Idk what high priority means but it helped me get more FPS...

more data for you guys to think about?

Soylent_green · January 2013

<div class='quotetop'>QUOTE (sloe @ Jan 2 2013, 04:58 PM) <a href="index.php?act=findpost&pid=2054936"><{POST_SNAPBACK}></a></div><div class='quotemain'>Each time it loops one frame is rendered to the screen. What he means is if you throttle the culling down to say 10 times per second (meaning the culling does not have to be processed every iteration of the loop) it could improve performance.</div>

What you are proposing is disastrous and would cause monstrous pop-ins.

Davil · January 2013

<div class='quotetop'>QUOTE (soccerguy243 @ Jan 2 2013, 04:53 PM) <a href="index.php?act=findpost&pid=2055005"><{POST_SNAPBACK}></a></div><div class='quotemain'>idk if it will help you guys figure this out but i ran NS2 at a "high priority" and it gave me an FPS boost. Idk what high priority means but it helped me get more FPS...

more data for you guys to think about?</div>

This means that the operating system's scheduler will favor that process's threads over all others. Sometimes this works, sometimes it can cause a bad crash.

Soylent_green · January 2013

<div class='quotetop'>QUOTE (DC_Darkling @ Jan 2 2013, 05:34 PM) <a href="index.php?act=findpost&pid=2054948"><{POST_SNAPBACK}></a></div><div class='quotemain'>thankfully that should be incorrect, as its the physx drivers which deside that, not NS2 itself. ^^</div>

AFAIK this is true in every game that uses PhysX. You do not want to wait for a round trip to the GPU and back to find out if a bullet hit something or if a player is colliding with something, so you run the gameplay-relevant part of physics (which is nearly all NS2 has!) on the CPU and you run water, cloth, debris, ragdolls etc. on the GPU.

Davil · January 2013

Are people suggesting adapting the game to use physx? Doesn't do much for the AMD card users out there... Which unfortunately there are a lot of.

Kama_Blue · January 2013

<div class='quotetop'>QUOTE (sloe @ Jan 2 2013, 01:58 PM) <a href="index.php?act=findpost&pid=2054936"><{POST_SNAPBACK}></a></div><div class='quotemain'>Do
do_input <- cpu (handle keyboard/mouse state)
do_ai <- cpu (compute NPC actions)
do_physics <- cpu or gpu (compute object collision/direction/velocity)
do_scene <- cpu (culling/translate 3d space)
do_render <- gpu (send scene to video hardware/draw the 3d scene as a 2d frame/post-process shaders)
do_sounds <- cpu (play sound effects)
Loop

This "typical game loop" executes as fast as the processor will allow. Each time it loops one frame is rendered to the screen. What he means is if you throttle the culling down to say 10 times per second (meaning the culling does not have to be processed every iteration of the loop) it could improve performance. It's not always necessary to compute every piece of the game logic during every pass of the game loop (e.g. we can compute physics every OTHER pass and it won't be as 'accurate mathematically' but you may not notice a difference visually). Since the engine was written in LUA I assume that's why the big percentage is noticed with the profiler. Not sure how NS2 handles multithreading, so my examples could be vastly incorrect.

<a href="http://www.tilander.org/aurora/comp/gdc2009_Tilander_Filippov_SPU.pdf" target="_blank">God Of War III Game Loop</a></div>

This could be a vast improvement if we look at it. The current game loop has scripts running far more than 10 times a second, and many of them like AI, physics, and culling really do not need to be handled at that speed.

It also seems as if NS2 players are each simulating physics fully, as opposed to the server.

DC_Darkling · January 2013

<div class='quotetop'>QUOTE (Davil @ Jan 3 2013, 03:01 AM) <a href="index.php?act=findpost&pid=2055036"><{POST_SNAPBACK}></a></div><div class='quotemain'>Are people suggesting adapting the game to use physx? Doesn't do much for the AMD card users out there... Which unfortunately there are a lot of.</div>

No, just some confusion about the physx already IN the game.

>edit.
Also, with some dedicated searching on the big net, I did come to understand that most of the things Soylent_green mentioned are in fact run on the CPU, and only the (often very gorgeous) flavor PhysX effects run on the GPU.
But enabling that on a non-Nvidia card in a PhysX-heavy game is still torture for the CPU haha.
But that's not NS2.<

sloe · January 2013

<div class='quotetop'>QUOTE (Soylent_green @ Jan 2 2013, 07:58 PM) <a href="index.php?act=findpost&pid=2055010"><{POST_SNAPBACK}></a></div><div class='quotemain'>What you are proposing is disastrous and would cause monstrous pop-ins.</div>

Likely :) I think that's what the OP said too... but we won't know until tested.

I wonder if the NS2 culling implementation (CHC++ occlusion culling) is faster/better/more dynamic than view frustum culling using BSP trees/quadtrees/octrees. Again, the only way to know for sure would be to compare notes with other engines and implement/test.

For example, compare these two real scenarios:

A) Use the CPU to do a highly accurate occlusion culling (lots of time) and finally send the very small set of geometry/vertices to the GPU, reducing work at the GPU. This is a tradeoff that uses the CPU to reduce load on the GPU.

B) Use the CPU to do a faster but less accurate culling, resulting in some geometry being sent to the GPU that isn't actually visible on screen (wasteful, brute force rendering). The GPU can still do some culling of its own in hardware using z-buffer sort techniques. This is a tradeoff that uses the GPU to reduce load on the CPU.

Essentially we are sharing the workload differently and making time sacrifices to do so. Performance still depends on a lot of factors that are interrelated which could be analyzed further with profiling or throttling.

<a href="http://http.developer.nvidia.com/GPUGems2/gpugems2_chapter06.html" target="_blank">Check out this GPU Gems article which outlines how to improve performance of occlusion culling using octrees.</a>

Or a video of <a href="http://www.youtube.com/watch?v=fU2L7IdYDVc" target="_blank">Umbra's Occlusion Culling Middleware</a>, which shows their occlusion culling working in real time for several games and without pre-processing. It seems like NS2 may be similar.

Regarding PhysX, can't you force that to the GPU in your video driver settings? I know the only way I could disable Bloom was by moving the slider to High Performance in the driver settings. The individual game settings for ns2.exe and even the in-game Bloom setting would NOT disable it (Nvidia).

DC_Darkling · January 2013

A quick question... do the cysts connect to every other cyst on the map / the hive (from a technical point of view, not gameplay), or to the nearest structure?

Because if your cysts go from hive to structure 1, and from there to structure 2.. structure 2 doesn't need to know anything about the hive or its surrounding cysts. It just needs to be sure its closest structure is alive (and if it's not, whether there is another one next in line which is alive).
Recalculate the path when something in it dies or gets added, and keep the result in memory until the next change.

Perhaps I'm just talking gibberish here, but worth a gamble.
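The "recalculate only on change, cache until then" idea can be sketched as a toy model. This is purely illustrative and makes no claim about how NS2 actually stores cyst connections; the `CystChain` class and its method names are hypothetical.

```python
from typing import Optional

class CystChain:
    """Toy model: each node links only to its parent; the hive is the root."""

    def __init__(self) -> None:
        self.parent: dict[str, Optional[str]] = {"hive": None}
        self._connected: dict[str, bool] = {}  # cache, valid until the next change
        self.recomputations = 0                # counts real path walks

    def add(self, name: str, parent: str) -> None:
        self.parent[name] = parent
        self._connected.clear()                # something changed: drop cached results

    def remove(self, name: str) -> None:
        self.parent.pop(name, None)
        self._connected.clear()

    def is_connected(self, name: str) -> bool:
        """True if following parent links from `name` reaches the hive."""
        if name in self._connected:
            return self._connected[name]       # cached since the last change
        self.recomputations += 1
        node: Optional[str] = name
        seen = set()
        while node is not None and node in self.parent and node not in seen:
            if node == "hive":
                break
            seen.add(node)
            node = self.parent[node]
        result = node == "hive"
        self._connected[name] = result
        return result

chain = CystChain()
chain.add("c1", "hive")
chain.add("c2", "c1")
chain.add("c3", "c2")
first = chain.is_connected("c3")    # walks the chain once
second = chain.is_connected("c3")   # answered from the cache
chain.remove("c2")                  # a cyst in the path dies
after_death = chain.is_connected("c3")
```

Clearing the whole cache on every change is the simplest possible invalidation strategy; a real implementation would probably invalidate only the affected part of the chain.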

sloe · January 2013

<div class='quotetop'>QUOTE (Kama_Blue @ Jan 3 2013, 09:39 AM) <a href="index.php?act=findpost&pid=2055331"><{POST_SNAPBACK}></a></div><div class='quotemain'>This could be a vast improvement if we look at it. The current game loop has scripts running far more than 10 times a second, and many of them like AI, physics, and culling really do not need to be handled at that speed.

It also seems as if NS2 players are each simulating physics fully, as opposed to the server.</div>

I would be surprised if they aren't already throttling some of the logic in the game loop as it is (or was) fairly common to see this technique in an RTS/FPS engine.

I think we want to simulate physics at both server/client. Server maintains the full accuracy of the physics for the entire game but does not send this accuracy to the clients every x milliseconds. Instead, the clients also simulate physics locally which are then corrected by the server every xx milliseconds. This makes for much smoother play and introduces familiar terms such as interpolation/extrapolation, and rubberbanding (highly delayed corrections) when performance is low overall.
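A minimal sketch of that client/server split, assuming a one-dimensional position, a fixed tick, and a hard snap to the server state on each correction (real engines usually blend instead of snapping). All names and numbers here are hypothetical.

```python
def simulate(ticks: int, velocity: float, client_error: float,
             correct_every: int) -> list[float]:
    """Return the client's position error (client - server) after each tick.

    The server integrates `velocity` exactly. The client integrates with a
    small per-tick error (a stand-in for less accurate local physics) and
    snaps to the server state whenever a correction arrives.
    """
    server = client = 0.0
    errors = []
    for tick in range(1, ticks + 1):
        server += velocity
        client += velocity + client_error
        if tick % correct_every == 0:
            client = server  # authoritative correction from the server
        errors.append(client - server)
    return errors

errors = simulate(ticks=8, velocity=1.0, client_error=0.25, correct_every=4)
print(errors)
```

The error grows between corrections and drops back to zero when one arrives. The longer the gap between corrections, the bigger the visible snap, which is the rubberbanding mentioned above.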

Soylent_green · January 2013

<div class='quotetop'>QUOTE (sloe @ Jan 3 2013, 01:29 PM) <a href="index.php?act=findpost&pid=2055432"><{POST_SNAPBACK}></a></div><div class='quotemain'>Likely :) I think that's what the OP said too... but we won't know until tested</div>

No, we do know. If you peek around a corner it will take up to 100 ms (the next occlusion update) until players, props and other objects suddenly pop into existence.

Culling too little is just a performance problem; culling too much results in hideous graphical artifacts. I remember an anti-cheat server plugin for HL mods that would not send the positions of players who could not see each other on the server, improving upon the lax PVIS system used for this by default. Even with latency <10 ms and interp set to 0.05 it was incredibly distracting. And this was just players, not models.
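The 100 ms figure follows directly from a 10 Hz culling update. A small sketch of the worst case, assuming visibility is only refreshed at the update rate and the frame rate is constant (the function name is hypothetical):

```python
import math

def worst_case_popin(update_hz: float, fps: float) -> tuple[float, int]:
    """Return (max visibility staleness in ms, frames drawn with stale visibility)."""
    stale_ms = 1000.0 / update_hz
    frames = math.ceil(fps / update_hz)
    return stale_ms, frames

stale_ms, frames = worst_case_popin(update_hz=10, fps=60)
print(f"up to {stale_ms:.0f} ms, i.e. up to {frames} frames at 60 fps")
```

So at 60 fps, an object that just became visible could be missing for up to 6 consecutive frames before the next culling update picks it up.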

Soylent_green · January 2013

<div class='quotetop'>QUOTE (sloe @ Jan 3 2013, 01:39 PM) <a href="index.php?act=findpost&pid=2055438"><{POST_SNAPBACK}></a></div><div class='quotemain'>I would be surprised if they aren't already throttling some of the logic in the game loop as it is (or was) fairly common to see this technique in an RTS/FPS engine.

I think we want to simulate physics at both server/client.</div>

Ragdolls and some other such effects are client-side only since they don't hinder player movement and it doesn't matter if they behave slightly differently on every player's screen.

The physical effects that are important are player movement, bullets etc. Those are already interpolated/extrapolated or, in the case that the client is causing them, predicted by the client.
