267 : Physics Multithreading
matso
Master of Patches | Join Date: 2002-11-05 | Member: 7000 | Members, Forum Moderators, NS2 Developer, Constellation, NS2 Playtester, Squad Five Blue, Squad Five Silver, Squad Five Gold, Reinforced - Shadow, NS2 Community Developer
One very important part of the NS2 client is showing the various models going through their animations. Internally, a model consists of bones to which the skin of the model (with its textures) is attached. The animation then controls how the bones move in relation to each other. When you move the bones, the skin (and hitboxes) follow.
Animations need to be blended together (i.e., running forward while aiming the rifle requires blending several animations), and there are quite a few bones to keep track of, so figuring out just how the left pinkie should be curved takes a fair bit of processing power.
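As a rough illustration of what blending means here, the sketch below mixes two poses bone by bone with a weight. The bone names, the one-angle-per-bone representation and `blend_poses` are all assumptions made up for this example; a real engine works with full 3D rotations and far more bones.

```python
def blend_poses(pose_a, pose_b, weight):
    """Linearly blend two poses; weight=0 gives pose_a, weight=1 gives pose_b."""
    if not 0.0 <= weight <= 1.0:
        raise ValueError("weight must be in [0, 1]")
    return {
        bone: (1.0 - weight) * pose_a[bone] + weight * pose_b[bone]
        for bone in pose_a
    }

# Hypothetical poses: one angle (in degrees) per bone.
run_pose = {"spine": 10.0, "left_arm": -30.0, "left_pinkie": 5.0}
aim_pose = {"spine": 0.0, "left_arm": 60.0, "left_pinkie": 45.0}

# Halfway between running and aiming.
blended = blend_poses(run_pose, aim_pose, 0.5)
```

Every bone of every visible model needs this kind of work each frame, usually across more than two animations, which is where the cost comes from.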
Late game, when the number of entities goes up, this tends to become a bottleneck when it comes time to produce new frames. Up to 30-40% of the frame time can be spent setting up the physics.
In 266, this is all controlled by Lua code, which goes through all the models one by one, collects the animation data, asks the engine to calculate the new settings and then finally applies the result to the collision world before continuing with the next model.
Adding multithreading is kinda difficult, because the Lua VM is strictly single-threaded. Also, the results of the physics calculations need to be stored in the collision world, which is also single-threaded. However, the middle part - where the little toes and pinkies get sorted out - is where most of the time is spent. This opens the door to a bit of pipelining.
Pipelining basically means that instead of having one thread doing
    for every entity do A (collect data), B (compute), C (apply it)
we instead do
    Thread 1: [ Pipeline stage 1 ]
        for every entity do A (collect data)
        Store data in engine for use by the next stage
    Thread 2-4: [ Pipeline stage 2 ]
        Wait for data to be stored in engine: B (compute)
        Store result for use by the next stage
    Thread 5: [ Pipeline stage 3 ]
        Wait for data: C (Apply it).
So we need to do the first and last parts in a single-threaded fashion, but we can do all the heavy lifting in the middle part, where we have 1-3 threads all working to keep things going as fast as possible (the first thread can only produce data for the other threads so fast, so more than 3 isn't really going to help any).
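The three stages can be sketched with plain queues and threads. This is only an illustration of the structure, not the engine's code: `collect`, `compute`, the dictionary standing in for the collision world, and the sentinel-based shutdown are all assumptions made for the example. Python threads also won't show the real speedup, since the interpreter runs one thread at a time.

```python
import queue
import threading

NUM_WORKERS = 3      # stage 2 threads (the post uses 1-3)
STOP = object()      # sentinel telling a thread to shut down

def collect(entity):     # A: hypothetical stand-in for gathering animation data
    return {"id": entity, "angle": entity * 10}

def compute(data):       # B: hypothetical stand-in for the expensive bone math
    return {"id": data["id"], "pose": data["angle"] * 2}

def run_pipeline(entities):
    work_q = queue.Queue()      # stage 1 -> stage 2
    result_q = queue.Queue()    # stage 2 -> stage 3
    world = {}                  # stand-in for the single-threaded collision world

    def worker():
        while True:
            item = work_q.get()
            if item is STOP:
                break
            result_q.put(compute(item))

    def applier():
        while True:
            result = result_q.get()
            if result is STOP:
                break
            world[result["id"]] = result["pose"]   # C: apply, one thread only

    workers = [threading.Thread(target=worker) for _ in range(NUM_WORKERS)]
    apply_thread = threading.Thread(target=applier)
    for t in workers:
        t.start()
    apply_thread.start()

    for entity in entities:     # stage 1 runs on the calling thread
        work_q.put(collect(entity))
    for _ in workers:           # one sentinel per worker
        work_q.put(STOP)
    for t in workers:
        t.join()
    result_q.put(STOP)          # every result is already queued at this point
    apply_thread.join()
    return world

world = run_pipeline(range(5))
```

Only stage 2 ever runs on more than one thread; collecting and applying each stay on a single thread, which matches the constraints described above.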
Now, the engine is not really designed to be used this way, but it's been playtested for about 6 weeks now and after a few initial (and fixed) bugs, it seems to be pretty stable (i.e., no known bugs).
However, it is OFF by default, because ...well, shit happens. And multithreaded shit happens to be especially stinky. So it's off until we feel confident it won't cause too many sticky problems. Yes, we want victi^H^H^H^H volunteers to try it out... if it crashes too much, it stays off by default (and it's off for Linux users, because it causes the Linux client to crash).
That said, the effect can be quite dramatic for your worst-case FPS. 20% better worst-case FPS seems to be pretty standard for quad-cores. Even dual-cores gain a bit, though not as much.
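As a back-of-the-envelope sanity check (not a measurement): if a fraction p of the frame time is physics and that part is spread over n threads, Amdahl's law caps the speedup at 1 / ((1 - p) + p / n). The function name is made up for this example, and p and n are simply the figures from the post (30-40% of frame time, up to 3 worker threads). It treats the whole physics share as parallel, which overstates things, since the collect and apply stages stay single-threaded.

```python
def amdahl_speedup(parallel_fraction, threads):
    """Upper bound on speedup when only part of the work runs in parallel."""
    return 1.0 / ((1.0 - parallel_fraction) + parallel_fraction / threads)

for p in (0.30, 0.40):
    print(f"p={p:.2f}: at most {amdahl_speedup(p, 3):.2f}x")
```

The bound comes out at roughly 1.25x to 1.36x, so a 20% gain in practice is in the range you would expect.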
tl;dr - Multithreading will increase your worst-case FPS by 20% (for quad core+) _OR_ possibly crash your machine. Off by default, turn on at your peril (yea, as if that's going to stop anyone... :-) )
Comments
I shall also enjoy spamming the community devs with any crash logs it shall produce.
The server updates the physics on an as-needed basis only. It's actually very cool; all models have a maximum extents box, which fits around the whole model, no matter what animation is playing (yea, the largest box isn't the hive - it's the whip due to its 7m ranged slap attack).
All active models place their big boxes in a sort of pre-collision world. Whenever you check for collision (when you move the player or shoot a gun or slash ...), you first trace through this pre-collision world.
Any models whose big boxes are touched get their physics updated to the correct time. THEN you do a collision trace through the detailed world, knowing that the models are all correctly configured, claws curled just like that, perky tail in _just_ the right position.
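The as-needed update described above could be sketched roughly like this. The `Model` class, the box-versus-box overlap test standing in for a real trace, and all the coordinates are assumptions for illustration; the actual engine traces rays and swept shapes through its own collision world.

```python
from dataclasses import dataclass

@dataclass
class Model:
    name: str
    box_min: tuple              # maximum extents box, covers every animation pose
    box_max: tuple
    physics_time: float = 0.0   # time the detailed physics was last updated to

    def update_physics(self, now):
        # Stand-in for the expensive bone and hitbox update.
        self.physics_time = now

def boxes_overlap(min_a, max_a, min_b, max_b):
    """True if two axis-aligned boxes overlap on every axis."""
    return all(
        min_a[i] <= max_b[i] and min_b[i] <= max_a[i]
        for i in range(len(min_a))
    )

def trace(models, trace_min, trace_max, now):
    """Pre-collision pass: bring only the touched models up to date."""
    touched = [
        m for m in models
        if boxes_overlap(m.box_min, m.box_max, trace_min, trace_max)
    ]
    for m in touched:
        if m.physics_time < now:
            m.update_physics(now)
    # The detailed collision trace against the touched models would go here.
    return [m.name for m in touched]

models = [
    Model("skulk", (0, 0, 0), (1, 1, 1)),
    Model("whip", (10, 0, 0), (17, 7, 7)),
]
hit = trace(models, (0.5, 0.5, 0.5), (2, 2, 2), now=1.0)
```

Here only the skulk gets its physics updated; the whip is never touched by the trace, so its expensive update is skipped entirely.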
The reason the client can't do the same is that you are implicitly tracing through your whole view volume... everything has to be rendered anyhow.
Before you say "render only in view volume" - sorry, the view volume which is going to be rendered is not known yet - we set the physics for what will be rendered the NEXT frame (multithreaded rendering - we are calculating the physics for the next frame while we use the last frame's physics to render the current frame).
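The overlap between rendering and physics can be pictured as a simple double buffer. This is a toy sketch under assumptions: `compute_physics`, `render` and the string "buffers" are made up for the example and say nothing about how the engine actually stores its state.

```python
import threading

def compute_physics(frame):      # hypothetical stand-in for the physics setup
    return f"physics for frame {frame}"

def render(frame, physics):      # hypothetical stand-in for the renderer
    return f"frame {frame} drawn with {physics}"

def run_frames(num_frames):
    rendered = []
    current = compute_physics(0)             # prime the first frame
    for frame in range(num_frames):
        next_buffer = {}

        def work(target=frame + 1, out=next_buffer):
            out["physics"] = compute_physics(target)

        t = threading.Thread(target=work)
        t.start()                             # physics for frame+1 starts...
        rendered.append(render(frame, current))   # ...while this frame renders
        t.join()
        current = next_buffer["physics"]      # swap buffers for the next frame
    return rendered

frames = run_frames(3)
```

The physics for a frame is always finished before that frame's camera direction is known, which is why it cannot be limited to the view volume.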
And we don't know where the player will be looking next frame. A high-sensitivity mouse can easily turn you 180 degrees in one frame. Well, not easily, but...
Question: is the PhysX SDK in NS2 (post v3.1 of the Nvidia SDK, I believe) one of the later cores which also allows for the PhysX calculations to be further multi-threaded? I remember this was a big deal a while back when they moved from x87.
The game already uses PhysX; in fact there was an "attempted" GPU-PhysX implementation that never seemed to work. IMO, using the non-GPU multi-threaded PhysX is better for all users than attempting to use the proprietary GPU libraries.
Sooooooon.
Don't write checks that your body can't cash.
Performance boost
Smoother/more fluid frames in this and more in next updates
unlocked server rates
it's the ns2 I've always dreamed of
The faster you update to the server, the less the server has to remember stuff from ages ago?
Maybe some of those overclocked 8GHz servers can go down a bit? :P
>20%
if it were to crash, what would be the possible reasons? race conditions? deadlocks? so much awesomeness that it breaks the laws of physics?
That has all been fixed, but that is the type of problem you would look out for: animations etc. that are not updated correctly, or badly placed objects. Crashes were not an issue.
The client guy should be able to enable/disable it properly (following the procedure the devs want) and at will, so nobody gets stuck. You should produce some documentation for these vic.. very important testers.
This means clients should be given an FPS analyzer/profiler (better than what we have, with saving data, uploading etc...), so it can help find bugs and other optimizations. It saves so much time in debugging.
Another thing would be to evaluate the impact it has on other stuff. Netcode behavior etc...
Why do i say that ??... no clue ?
We already have such an analyser, which will get improved in various ways with 267, as it was used a lot to track down those hitching issues.
The analyzer is hidden in your NS2 install folder under utils/PerfAnalyzer. You need to install Python to run it.
The in-game console commands to start logging are p_log (basic logging) or p_logall (advanced logging). Logs (.plog file format) will be saved into the NS2 config folder (on Windows, %appdata%\Natural Selection 2\ )
BTW: This works also for the server.
1. Python Console ( for text output )
2. Analyser Main Gui
3. Frame Detail Window
4. Frame Pick Window
The graphs show how long each frame took to render. You might have seen those on the CDT Twitter lately.
You open 3 and 4 by clicking on the graph in the main GUI.
digital vibrancy strikes me again (moans in agony)
Yes, please enable that so we get more stutter and FPS fluctuations, great. Like Borderlands 2 is Stutterlands 2 with PhysX on. Gimmick feature.
Things are looking good...not long now!
We have a multi-thread option we can enable (but it's disabled by default) and we have a new VRAM option? And if we have 2 GB of VRAM, should we set it to 1.5 GB, or does it default to that? That's what I got out of all the posts so far.