224 tech changes, part 1
matso
Master of Patches Join Date: 2002-11-05 Member: 7000Members, Forum Moderators, NS2 Developer, Constellation, NS2 Playtester, Squad Five Blue, Squad Five Silver, Squad Five Gold, Reinforced - Shadow, NS2 Community Developer
<div class="IPBDescription">"You don't have to be insane. But it helps".</div>About 3 weeks ago, I had finished my last batch of performance improvements and was scanning through the latest playtest performance logs, looking for something to improve. And it was all dross - 0.5% here, another possible 0.3 percent there - so I was looking at spending days at the 0.5% improvement level, twiddling with minor tweaks here and there.
Boring.
So I decided to go insane instead.
Now, that sounds a bit worse than what it actually means - it simply means picking something from my list of "stuff that would be insane to do before the 1.0 release". Insane because these things would introduce new architectural concepts in the engine, so it's hard to figure out just how much they would destabilize everything.
However, there was this thing about movement prediction on the client that had been itching at the back of my head for a long time.
Some background info here ... the Spark Engine samples input before rendering every frame, generating a Move data structure (i.e., a "move"). It adds that move to the list of moves-not-yet-part-of-the-latest-server-update, then resets the world back to the latest server update and executes all the moves, using the final state of the world to render from.
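In rough pseudocode, the scheme looks something like this (a Python sketch of my understanding, not actual engine code - all names like `pending_moves` are invented for illustration):

```python
# Reset-and-replay client prediction, as described above.
# All identifiers here are illustrative, not real Spark engine names.

class Client:
    def __init__(self):
        self.pending_moves = []   # moves not yet acked by a server update
        self.server_state = {}    # latest authoritative snapshot
        self.world = {}           # the state we actually render from

    def on_frame(self, new_move):
        # Sample input and queue the move.
        self.pending_moves.append(new_move)
        # Reset the world back to the last server snapshot...
        self.world = dict(self.server_state)
        # ...then replay every unacknowledged move on top of it.
        for move in self.pending_moves:
            apply_move(self.world, move)
        return self.world  # render from the final state

    def on_server_update(self, snapshot, last_acked_move_id):
        # The server has incorporated some moves; drop those from the queue.
        self.server_state = snapshot
        self.pending_moves = [m for m in self.pending_moves
                              if m["id"] > last_acked_move_id]

def apply_move(world, move):
    # Stand-in for the (expensive) Lua move-processing logic.
    world["x"] = world.get("x", 0) + move["dx"]
```

The key point is that `on_frame` replays the *whole* queue every rendered frame, which is what makes the per-frame cost scale with latency.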
Each move is quite costly to run, at about 0.5-0.7ms or so, and the length of the queue grows with effective server latency. Typically, you have maybe 100ms net lag and 100ms interpolation lag, for an effective lag of about 200ms. At 50 fps, you are looking at running a minimum of 10 prediction frames every frame (this is the "Prediction" line at the bottom of the net_stats display). If you wanted to run at 100fps, you would need to run 20 prediction frames instead - every frame. Yea, that would be 10-15ms of work every 10ms. Kinda hard to do.
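To put rough numbers on it (the ~0.6ms per move and 200ms lag figures come from the text above; the rest is just arithmetic):

```python
# Back-of-the-envelope cost of reset-and-replay prediction.
# Figures from the post: ~0.6 ms per move, ~200 ms effective lag.
def prediction_cost_ms(fps, effective_lag_ms, ms_per_move=0.6):
    # The queue holds roughly one unacked move per rendered frame
    # across the effective lag window.
    queue_length = fps * effective_lag_ms / 1000.0
    return queue_length * ms_per_move

print(prediction_cost_ms(50, 200))   # ~10 moves -> ~6 ms spent per frame
print(prediction_cost_ms(100, 200))  # ~20 moves -> ~12 ms per frame
```

At 100fps the whole frame budget is 10ms, and prediction alone would eat ~12ms of it - hence "kinda hard to do".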
And that's the reason why fps goes down with latency. And why fps goes down when the server drops below 20 ticks per second - the queue gets longer. And why it's so hard to increase fps on the client - faster fps means you need to predict more moves, more often.
Now, the client doesn't strictly need to do it this way - it could just take the world it has already predicted for the previous frame, add only the latest move to it and use that to render right away. Unfortunately, 20 times per second the server sends a new update, and you would need to run all the moves on top of that to get back in sync - which would cause 20 frames every second to be MUCH longer than all the others, resulting in a really hitchy experience. Not a good thing.
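The hitch is easy to see with the same numbers (again, just my arithmetic on the figures above, not measured engine data):

```python
# Frame cost under the incremental scheme: normally one move per frame,
# but each server update forces a full replay of the queue to resync.
MS_PER_MOVE = 0.6    # per-move cost from the post
QUEUE_LENGTH = 10    # ~200 ms worth of moves at 50 fps

def frame_cost_ms(snapshot_arrived):
    if snapshot_arrived:
        # Resync: replay the whole queue on top of the new snapshot.
        return QUEUE_LENGTH * MS_PER_MOVE
    # Otherwise, just apply the single newest move.
    return 1 * MS_PER_MOVE

# At 50 fps with 20 updates/s, 20 frames each second pay the full
# replay cost (~6 ms) while the other 30 pay ~0.6 ms - hence the hitching.
```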
As to why the Spark Engine runs this way? Well, to quote Max: "It was not supposed to be that slow". In other words, other engines avoid similar problems by running moves really fast in hardcoded C++. In Spark, it's run in Lua, allowing awesome flexibility (skulk wallwalking, jetpacks, sprinting, lerk flying - they are all coded in Lua) - at greater cost than was foreseen when the choice was made.
ENOUGH BACKGROUND .... back to the insanity.
The idea is actually quite simple. Instead of delivering a raw server snapshot to the main thread, which then has to run all the moves on it, why not deliver an already predicted world to the main thread? I.e., give the snapshot to another Lua VM with almost identical code to the Client world, have it run all the moves in its own thread, and only deliver an updated world to the main thread.
That allows the main thread to just keep adding moves to its world, and now and then whenever the Prediction thread is finished preparing the server snapshot, it can just swap its state with it, at pretty much zero cost.
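A sketch of the two-thread design (Python threads standing in for the second Lua VM; every name here is invented for illustration, not taken from the engine):

```python
import queue
import threading

# Sketch: a prediction thread does the expensive snapshot + full replay,
# while the main thread only applies the single newest move and swaps in
# a fully predicted world whenever one is ready.

class PredictionThread(threading.Thread):
    def __init__(self):
        super().__init__(daemon=True)
        self.snapshots = queue.Queue()  # (snapshot, moves) in; None to stop
        self.ready = queue.Queue()      # fully predicted worlds out

    def run(self):
        while True:
            snapshot, moves = self.snapshots.get()
            if snapshot is None:
                break
            world = dict(snapshot)
            for move in moves:          # the expensive replay, off the main thread
                world["x"] = world.get("x", 0) + move["dx"]
            self.ready.put(world)

class MainThread:
    def __init__(self, predictor):
        self.predictor = predictor
        self.world = {"x": 0}

    def on_frame(self, move):
        # Cheap path: apply only the newest move to the current world.
        self.world["x"] += move["dx"]
        # If the predictor finished a snapshot, swap it in (near-zero cost).
        try:
            self.world = self.predictor.ready.get_nowait()
        except queue.Empty:
            pass
        return self.world
```

The swap itself is just exchanging a reference, which is what makes the main thread's per-frame cost independent of latency and queue length.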
Nice idea. And faced with < 0.5%/day improvements, I figured I might as well give it a try - if I spent 3 days on it and it turned out to not work, three days less wasn't going to make much difference to performance anyhow.
After three days of intense hacking, the prototype was finished and worked beyond expectations. Depending on latency and how built-up the area was, the FPS increase was an extra 30-50%. Some minor bugs here and there, but it was good enough to present to UWE. When I pitched it to Brian C, I could sense his "Is this guy insane? Introducing multithreading and multiple Lua VMs less than a month before release?" - but after testing it and tasting the FPS increase, it was pretty much ... "Yea, we have to do this".
This was the Monday of the 223 release. Right after the 223 release, UWE switched to ironing out the bugs and unforeseen weirdness to be expected when doing something like that. It went pretty well, all things considered, and the new version was built and presented to the playtesters the following Monday.
At which time the ###### hit the server fan.
To be continued in part 2.
Comments
This was pretty much my biggest hurdle with the NS2 architecture for a long time, because it made client optimizations hurt other parts of the system, especially on servers.
I haven't been able to try out the new patch yet but it sounds awesome.
Great work man!
Can't wait to play this build now. So excited. :D
Edit: While I'm certain that the statement from Toothy must be sarcasm, I want to warn before the real trolls arrive: don't feed them! Devs should never answer posts containing bad manners or insults. Things like that only encourage the community to cry louder to get a dev response.
Edit2: Oh, and would you mind telling us more about the background of the rubber-banding problem?
So if I understood correctly, you decreased the computational complexity (total number of calculations) and improved parallelization at the same time?
I was already curious about the changes, as the server console stated that a second VM had been started.
LOL. Max must have been embarrassed when you asked him. It's almost the only flaw (well, that and the animation), but a big one in his home-made engine.
Thank you, matso!
QUOTE (Toothy @ Oct 25 2012, 12:56 PM): But why haven't you added bullet holes?
Because nanites repair bullet holes before they happen.
Nice to hear about what you've been working on, and I look forward to seeing if it's worked on my specific PC! Even if it hasn't, it looks to be a good improvement for others, which is good. Well done! :)
Great work Matso, you just need to ask Max how his LuaJIT VM is coming along...
QUOTE (matso @ Oct 25 2012, 11:43 AM): Some background info here ... the Spark Engine samples input before rendering every frame, generating a Move data structure (i.e., a "move"). It adds that move to the list of moves-not-yet-part-of-the-latest-server-update, then resets the world back to the latest server update and executes all the moves, using the final state of the world to render from.
But why would anyone design a net game state system like this? Wouldn't it be simpler and faster to have both the server and client running state update simulations/predictions - the server frame updates synchronize the client-server states, and both sides just keep simulating/predicting the game, sending synchronization data back and forth - instead of keeping a list of moves-not-part-of-the-latest-server-update and predicting what will happen over and over again? If both the server and client have access to the same game state data (which they do), there would be no additional downsides to go with the improvements. Why would it, according to you, be a "hitchy experience", when that way of doing things can be implemented in a very nice and well-working way?
Doesn't the Source engine work like that?
Seriously. Who cares about this mumbo jumbo?
Bullet. Holes.
<a href="https://twitter.com/NS2/status/253188640656719872" target="_blank">https://twitter.com/NS2/status/253188640656719872</a> :D
you know how loaded that word has been lately
The Source engine and every single multiplayer FPS game since QuakeWorld work the same way as NS2 in this regard.
EDIT: <a href="http://gafferongames.com/networking-for-game-programmers/what-every-programmer-needs-to-know-about-game-networking/" target="_blank">here</a> is a good explanation
Was the client framerate increase from splitting the client prediction into a parallel thread what prompted the move command rate change?
Or did that optimization come first?
QUOTE: Bullet. Holes.
Toothy <a href="http://www.unknownworlds.com/ns2/forums/index.php?showtopic=122176&st=0&p=1996638&#entry1996638" target="_blank">is making fun of someone</a> I suspect :D
Or is it that the client updates are coming in fast enough that your prediction's drift is negligible?