224 tech changes, part 1
matso
Master of Patches Join Date: 2002-11-05 Member: 7000Members, Forum Moderators, NS2 Developer, Constellation, NS2 Playtester, Squad Five Blue, Squad Five Silver, Squad Five Gold, Reinforced - Shadow, NS2 Community Developer
<div class="IPBDescription">"You don't have to be insane. But it helps".</div>About 3 weeks ago, I had finished my last batch of performance improvements and was scanning through the latest playtest performance logs, looking for something to improve. And it was all dross - 0.5% here, another possible 0.3 percent there - so I was looking at spending days at the 0.5% improvement level, twiddling with minor tweaks here and there.
Boring.
So I decided to go insane instead.
Now, that sounds a bit worse than what it actually means - it simply means picking something from my list of "stuff that would be insane to do before the 1.0 release". Insane because these things would introduce new architectural concepts in the engine, so it's hard to figure out just how much they would destabilize everything.
However, there was this thing about movement prediction on the client that had been itching at the back of my head for a long time.
Some background info here ... the Spark Engine samples input before rendering every frame, generating a Move data structure (i.e., a "move"). It adds that move to the list of moves-not-yet-part-of-the-latest-server-update, then resets the world back to the latest server update and executes all the moves, using the final state of the world to render from.
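In rough pseudocode, the scheme looks something like this (a Python sketch of my understanding, not actual engine code - all names like `pending_moves` are invented for illustration):

```python
# Reset-and-replay client prediction, as described above.
# All identifiers here are illustrative, not real Spark engine names.

class Client:
    def __init__(self):
        self.pending_moves = []   # moves not yet acked by a server update
        self.server_state = {}    # latest authoritative snapshot
        self.world = {}           # the state we actually render from

    def on_frame(self, new_move):
        # Sample input and queue the move.
        self.pending_moves.append(new_move)
        # Reset the world back to the last server snapshot...
        self.world = dict(self.server_state)
        # ...then replay every unacknowledged move on top of it.
        for move in self.pending_moves:
            apply_move(self.world, move)
        return self.world  # render from the final state

    def on_server_update(self, snapshot, last_acked_move_id):
        # The server has incorporated some moves; drop those from the queue.
        self.server_state = snapshot
        self.pending_moves = [m for m in self.pending_moves
                              if m["id"] > last_acked_move_id]

def apply_move(world, move):
    # Stand-in for the (expensive) Lua move-processing logic.
    world["x"] = world.get("x", 0) + move["dx"]
```

The key point is that `on_frame` replays the *whole* queue every rendered frame, which is what makes the per-frame cost scale with latency.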
Each move is quite costly to run, at about 0.5-0.7ms or so, and the length of the queue grows with effective server latency. Typically, you have maybe 100ms net lag and 100ms interpolation lag, for an effective lag of about 200ms. At 50 fps, you are looking at running a minimum of 10 prediction frames every frame (this is the "Prediction" line at the bottom of the net_stats display). If you wanted to run at 100fps, you would need to run 20 prediction frames instead - every frame. Yea, that would be 10-15ms of work every 10ms. Kinda hard to do.
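To put rough numbers on it (the ~0.6ms per move and 200ms lag figures come from the text above; the rest is just arithmetic):

```python
# Back-of-the-envelope cost of reset-and-replay prediction.
# Figures from the post: ~0.6 ms per move, ~200 ms effective lag.
def prediction_cost_ms(fps, effective_lag_ms, ms_per_move=0.6):
    # The queue holds roughly one unacked move per rendered frame
    # across the effective lag window.
    queue_length = fps * effective_lag_ms / 1000.0
    return queue_length * ms_per_move

print(prediction_cost_ms(50, 200))   # ~10 moves -> ~6 ms spent per frame
print(prediction_cost_ms(100, 200))  # ~20 moves -> ~12 ms per frame
```

At 100fps the whole frame budget is 10ms, and prediction alone would eat ~12ms of it - hence "kinda hard to do".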
And that's the reason why fps goes down with latency. And why fps goes down when the server drops below 20 ticks per second - the queue gets longer. And why it's so hard to increase fps on the client - faster fps means you need to predict more moves, more often.
Now, the client doesn't strictly need to do it this way - it could just take the world it has already predicted for the previous frame, add only the latest move to it and use that to render right away. Unfortunately, 20 times per second the server sends a new update, and you would need to run all the moves on top of that to get back in sync - which would cause 20 frames every second to be MUCH longer than all the others, resulting in a really hitchy experience. Not a good thing.
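The hitch is easy to see with the same numbers (again, just my arithmetic on the figures above, not measured engine data):

```python
# Frame cost under the incremental scheme: normally one move per frame,
# but each server update forces a full replay of the queue to resync.
MS_PER_MOVE = 0.6    # per-move cost from the post
QUEUE_LENGTH = 10    # ~200 ms worth of moves at 50 fps

def frame_cost_ms(snapshot_arrived):
    if snapshot_arrived:
        # Resync: replay the whole queue on top of the new snapshot.
        return QUEUE_LENGTH * MS_PER_MOVE
    # Otherwise, just apply the single newest move.
    return 1 * MS_PER_MOVE

# At 50 fps with 20 updates/s, 20 frames each second pay the full
# replay cost (~6 ms) while the other 30 pay ~0.6 ms - hence the hitching.
```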
As to why the Spark Engine runs this way? Well, to quote Max: "It was not supposed to be that slow". In other words, other engines avoid similar problems by running moves really fast in hardcoded C++. In Spark, it's run in Lua, allowing awesome flexibility (skulk wallwalking, jetpacks, sprinting, lerk flying - they are all coded in Lua) - at greater cost than was foreseen when the choice was made.
ENOUGH BACKGROUND .... back to the insanity.
The idea is actually quite simple. Instead of delivering a raw server snapshot to the main thread, which then has to run all the moves on it, why not deliver an already predicted world to the main thread? I.e., give the snapshot to another Lua VM with almost identical code to the Client world, have it run all the moves in its own thread, and only deliver an updated world to the main thread.
That allows the main thread to just keep adding moves to its world, and now and then whenever the Prediction thread is finished preparing the server snapshot, it can just swap its state with it, at pretty much zero cost.
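A sketch of the two-thread design (Python threads standing in for the second Lua VM; every name here is invented for illustration, not taken from the engine):

```python
import queue
import threading

# Sketch: a prediction thread does the expensive snapshot + full replay,
# while the main thread only applies the single newest move and swaps in
# a fully predicted world whenever one is ready.

class PredictionThread(threading.Thread):
    def __init__(self):
        super().__init__(daemon=True)
        self.snapshots = queue.Queue()  # (snapshot, moves) in; None to stop
        self.ready = queue.Queue()      # fully predicted worlds out

    def run(self):
        while True:
            snapshot, moves = self.snapshots.get()
            if snapshot is None:
                break
            world = dict(snapshot)
            for move in moves:          # the expensive replay, off the main thread
                world["x"] = world.get("x", 0) + move["dx"]
            self.ready.put(world)

class MainThread:
    def __init__(self, predictor):
        self.predictor = predictor
        self.world = {"x": 0}

    def on_frame(self, move):
        # Cheap path: apply only the newest move to the current world.
        self.world["x"] += move["dx"]
        # If the predictor finished a snapshot, swap it in (near-zero cost).
        try:
            self.world = self.predictor.ready.get_nowait()
        except queue.Empty:
            pass
        return self.world
```

The swap itself is just exchanging a reference, which is what makes the main thread's per-frame cost independent of latency and queue length.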
Nice idea. And faced with < 0.5%/day improvements, I figured I might as well give it a try - if I spent 3 days on it and it turned out to not work, three days less wasn't going to make much difference to performance anyhow.
After three days of intense hacking, the prototype was finished and worked beyond expectations. Depending on latency and how built-up the area was, the FPS increase was an extra 30-50%. Some minor bugs here and there, but it was good enough to present to UWE. When I pitched it to Brian C, I could sense his "Is this guy insane? Introducing multithreading and multiple Lua VMs less than a month before release?" - but after testing it and tasting the FPS increase, it was pretty much ... "Yea, we have to do this".
This was the Monday of the 223 release. Right after the 223 release, UWE switched to ironing out the bugs and unforeseen weirdness to be expected when doing something like that. It went pretty well, all things considered, and the new version was built and presented to the playtesters the following Monday.
At which time the ###### hit the server fan.
To be continued in part 2.
Comments
This was pretty much my biggest hurdle with the NS2 architecture for a long time, because it made client optimizations hurt other parts of the system, especially on servers.
I haven't been able to try out the new patch yet but it sounds awesome.
Great work man!
Can't wait to play this build now. So excited. :D
Edit: While I'm certain that the statement from Toothy must be sarcasm, I want to warn before the real trolls arrive: don't feed them! Devs should never answer posts containing bad manners or insults. Things like that only encourage the community to cry louder to get a dev response.
Edit2: Oh, and would you mind telling us more about the background of the rubber-banding problem?
So if I understood correctly, you decreased the computational complexity (total number of calculations) and improved parallelization at the same time?
I was already curious about the changes, as the server console stated that a second VM had been started.
LOL. Max must have been embarrassed when you asked him. It's almost the only flaw (well, that and the animation), but a big one in his home-made engine.
Thank you, matso!
QUOTE (Toothy @ Oct 25 2012, 12:56 PM): But why haven't you added bullet holes?
Because nanites repair bullet holes before they happen.
Nice to hear about what you've been working on, and I look forward to seeing if it's worked on my specific PC! Even if it hasn't, it looks to be a good improvement for others, which is good. Well done! :)
Great work Matso, you just need to ask Max how his LuaJIT VM is coming along...
QUOTE (matso @ Oct 25 2012, 11:43 AM): Some background info here ... the Spark Engine samples input before rendering every frame, generating a Move data structure (i.e., a "move"). It adds that move to the list of moves-not-yet-part-of-the-latest-server-update, then resets the world back to the latest server update and executes all the moves, using the final state of the world to render from.
But why would anyone design a net game state system like this? Wouldn't it be simpler and faster to have both the server and client running state update simulations/predictions - the server frame updates synchronize the client-server states, and both sides just keep simulating/predicting the game, sending synchronization data back and forth - instead of keeping a list of moves-not-part-of-the-latest-server-update and predicting what will happen over and over again? If both the server and client have access to the same game state data (which they do), there would be no additional downsides to go with the improvements. Why would it, according to you, be a "hitchy experience", when that way of doing things can be implemented in a very nice and well-working way?
Doesn't the Source engine work like that?
Seriously. Who cares about this mumbo jumbo?
Bullet. Holes.
<a href="https://twitter.com/NS2/status/253188640656719872" target="_blank">https://twitter.com/NS2/status/253188640656719872</a> :D
you know how loaded that word has been lately
The Source engine and every single multiplayer FPS game since QuakeWorld work the same way as NS2 in this regard.
EDIT: <a href="http://gafferongames.com/networking-for-game-programmers/what-every-programmer-needs-to-know-about-game-networking/" target="_blank">here</a> is a good explanation
Was the client framerate increase from splitting the client prediction into a parallel thread what prompted the move command rate change?
Or did that optimization come first?
QUOTE: Bullet. Holes.
Toothy <a href="http://www.unknownworlds.com/ns2/forums/index.php?showtopic=122176&st=0&p=1996638&#entry1996638" target="_blank">is making fun of someone</a> I suspect :D
Or is it that the client updates are coming in fast enough that your prediction's drift is negligible?