Optimizations
Asraniel
Join Date: 2002-06-03 Member: 724Members, Playtest Lead, Forum Moderators, NS2 Playtester, Squad Five Blue, Reinforced - Shadow, WC 2013 - Shadow, Subnautica Playtester, Retired Community Developer
in Modding
Trying to make the code faster
Hi there, since there are a few programmers around, I thought it would be nice to have an optimization thread.
The thing that annoys me the most is the constant hitching when moving around. As we all found out thanks to the profiler, that is the Lua garbage collector doing its work.
There was a Spark command that let you modify the garbage collector settings, but I couldn't find it in the code anymore (well, I found the function that sets the garbage collector setting, but I don't know what the console command for it is).
So I tried a few things. First, disabling the garbage collector. Because I didn't know where to put it, I put the code in Player::OnProcessMove(). At the beginning I set collectgarbage("stop");
and as expected, the hitching was gone. Now of course the game would probably crash after some time, or is there an automatic panic garbage collector in Lua?
Anyway, the next idea was to set it to a low value, so that a garbage collector pass would not cost that much. I tried:
collectgarbage("setpause", 1);
Again, it worked, but this cut my framerate from 50 to 35. But the hitching was gone.
In the end I tried collectgarbage("setpause", 100);
This value seems to work for me: no hitching, and the framerate did not suffer. It "feels" a little less fast than when I disable garbage collection completely, but better than the current version.
Now if somebody knows the code better than I do, I think the idea would be to set the garbage collection interval to different values depending on what is going on. For example, one could pause garbage collection while playing, but reactivate it when dying. The commander, for example, would have garbage collection running all the time, etc.
I think it's worth playing around with those settings. If somebody knows the console command to change the GC settings, please tell me and I will update the wiki.
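The idea of switching GC behaviour with the player's state could be sketched roughly like this. This is only an illustration: SetGcMode, GetGcMode and the "alive"/"dead" mode names are made up for the example, and where the engine would call them from is left open.

```lua
-- Hypothetical helper, not part of the NS2 code base.
-- Percent value for "setpause" while dead; 100 means "start a new cycle right away".
local GC_PAUSE_WHILE_DEAD = 100

local currentMode = nil

function GetGcMode()
    return currentMode
end

function SetGcMode(mode)
    if mode == currentMode then
        return -- nothing to do, avoid touching the collector every tick
    end
    if mode == "alive" then
        -- No collection while playing; memory keeps growing until the next death.
        collectgarbage("stop")
    elseif mode == "dead" then
        -- Nobody notices a hitch here, so catch up with one full pass.
        collectgarbage("restart")
        collectgarbage("setpause", GC_PAUSE_WHILE_DEAD)
        collectgarbage("collect")
    else
        error("unknown GC mode: " .. tostring(mode))
    end
    currentMode = mode
end
```

A player who rarely dies would never trigger a collection this way, so a real version would probably also want a memory ceiling checked via collectgarbage("count").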
Comments
That is a GOOD idea. Not shutting it off completely during life (some people rarely die), but using different settings during life and death.
When you are dead you don't really need super smooth performance, so it could collect all garbage quickly.
About a console command: I doubt there is one, but it is not hard to add in Lua. Just read the code for other console commands (I know about "kill", "say", "give" and "r_stats") and you should be able to do it yourself ;). If you can't find them, try Find in Files with Notepad++.
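As a rough sketch of what the body of such a command could look like: OnConsoleGc and its argument layout are assumptions for the example. How the function gets registered as a console command is not shown, since the thread does not name the engine hook; the existing commands would show that.

```lua
-- Hypothetical handler: "gc count", "gc setpause <percent>", "gc setstepmul <percent>".
-- Only the two tuning options are allowed so a typo cannot stop the collector.
local ALLOWED = { setpause = true, setstepmul = true }

function OnConsoleGc(option, value)
    if option == "count" then
        return collectgarbage("count") -- Lua memory in use, in KB
    end
    local number = tonumber(value) -- console arguments arrive as strings
    if not ALLOWED[option] or not number then
        return nil, "usage: gc count | gc setpause <percent> | gc setstepmul <percent>"
    end
    -- collectgarbage returns the previous value of the setting.
    return collectgarbage(option, number)
end
```

Returning the previous value makes it easy to print "was X, now Y" and to put the old setting back after an experiment.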
On the subject of optimizing Lua, one of the best (and maybe obvious) ways to make garbage collection faster is simply to create less garbage. I suspect there may be a few unoptimized inner loops that do a lot of object creation...
Another thing I'm trying to understand is why NS2 has the exact same FPS with LuaJIT set to off as with it on.
By default it is on, I checked that. The version used in build 161 is 2.0.0beta4.
Now, I know nothing about Lua, let alone LuaJIT, I'm a Java/C++ programmer. But I'm pretty sure that if I execute the command jit.off(), check it with Shared.Message("Jit is " .. tostring(jit.status())) and it says false, then the code should be much, much slower.
So either there is something wrong going on with LuaJIT, or you cannot really disable LuaJIT on the fly (I tried jit.flush(), with the same result). Anyway, I think it is worth investigating a little, if only to learn more about Lua/NS2.
edit: after reading a little bit I think jit.off() only affects the current function. Does anybody know where the main function is? Is there one, or is it in C++?
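For checking the JIT state, a small sketch along these lines may help. DescribeJit is a made-up helper name. As far as the LuaJIT documentation goes, jit.version is a version string, the first return value of jit.status() is the on/off flag, jit.off() without arguments switches the whole compiler off, and jit.off(true) only affects the current function.

```lua
-- Hypothetical helper: describe the JIT state as a string.
-- The global `jit` table only exists when running under LuaJIT.
function DescribeJit()
    if type(jit) ~= "table" then
        return "not running under LuaJIT"
    end
    local enabled = jit.status() -- first return value is true or false
    return jit.version .. " (JIT " .. (enabled and "on" or "off") .. ")"
end

-- In the game this would go through Shared.Message instead of print.
print(DescribeJit())
```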
They give a pretty good visual idea of how much frame skipping happens at different setpause values.
According to the lua doc, collectgarbage("setpause", arg) returns the old value, so I fired up the debugger.
It said the default is 200.
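Since collectgarbage("setpause", arg) returns the previous value, the current setting can also be read without a debugger by setting a value and putting the old one straight back. A minimal sketch, with GetGcPause as a made-up name:

```lua
-- Hypothetical helper: read the current "setpause" value without changing it.
function GetGcPause()
    local current = collectgarbage("setpause", 200) -- returns the old value
    collectgarbage("setpause", current)             -- restore it immediately
    return current
end

print("setpause is " .. tostring(GetGcPause()))
```

The same trick should work for "setstepmul".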
200ms
<img src="http://i.imgur.com/F8daP.png" border="0" class="linked-image" />
100ms (felt better than 200ms, but doesn't look much different)
<img src="http://i.imgur.com/qWmR3.png" border="0" class="linked-image" />
66ms (very clean)
<img src="http://i.imgur.com/P3rwg.png" border="0" class="linked-image" />
how did you draw the lines by the way?
DebugLine(trace.endPoint, trace.endPoint+trace.normal*3, 10,0,1,0,1)
Here's the method's header:
function DebugLine(startPoint, endPoint, lifetime, r, g, b, a)
And I agree that lowering the delay to tiny values isn't the solution. I did notice that there seemed to be a possibility to let lua do the <a href="http://pgl.yoyo.org/luai/i/collectgarbage" target="_blank">GC in steps</a>.. might be worth looking into.
By the way, the 200ms is interesting, because if you look at the profiler output, the spikes are much further apart than 200ms. But I guess that's because Lua does incremental garbage collection, and there might be a part of the collection that costs more.
edit:
Something interesting: if I add this code to OnProcessMove:
collectgarbage("stop")
-- "count" is in KB, so this is roughly 30 MB
if collectgarbage("count") > 30000 then
    collectgarbage("collect")
end
then the hitching is about as bad as with the incremental GC: the frequency is different, but the individual hitches are not much worse (OK, a little). Perhaps the Lua GC does too much work in one step when collecting incrementally.
and another edit.
I just found out that the setpause values are percentages, not ms. Here is the explanation:
The garbage-collector pause controls how long the collector waits before starting a new cycle. Larger values make the collector less aggressive. Values smaller than 100 mean the collector will not wait to start a new cycle. A value of 200 means that the collector waits for the total memory in use to double before starting a new cycle.
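As a rough worked example of that rule: the helper below is made up for illustration and only approximates what the collector does internally, since the real collector works from its own estimate of the memory in use.

```lua
-- Hypothetical helper: approximate memory level (in KB) at which the next
-- collection cycle starts, given the memory in use after the last cycle.
function NextCycleThresholdKb(liveKb, pausePercent)
    return liveKb * pausePercent / 100
end

-- With 50000 KB in use after a collection:
print(NextCycleThresholdKb(50000, 200)) -- default pause: wait until memory has doubled
print(NextCycleThresholdKb(50000, 100)) -- pause 100: no waiting, start right away
```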
edit3: I found out that the default value for setstepmul is 200. This means 200 for setpause and 200 for setstepmul. Now let's see if there are better values for this.
edit4: last edit for today. One thing I saw is that if you look at the Lua memory usage with
Shared.Message("GC size " .. tostring(collectgarbage("count")))
while the garbage collector is off, it goes up REALLY fast. I don't know anything about Lua, but the speed at which it increased looked a little strange.
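To put a number on "REALLY fast", the growth could be sampled over time. This is a sketch only: MakeGcGrowthMeter is a made-up helper, and os.clock (CPU time) is just a stand-in for whatever time source the engine offers.

```lua
-- Hypothetical helper: returns a sampler that reports how fast the Lua heap
-- grew since the previous sample, in KB per second.
-- `clock` can be any function returning seconds; os.clock is an assumption here.
function MakeGcGrowthMeter(clock)
    clock = clock or os.clock
    local lastTime = clock()
    local lastKb = collectgarbage("count")
    return function()
        local now = clock()
        local kb = collectgarbage("count")
        local elapsed = now - lastTime
        -- Guard against a zero interval when sampled twice in the same instant.
        local rate = elapsed > 0 and (kb - lastKb) / elapsed or 0
        lastTime, lastKb = now, kb
        return rate, kb
    end
end
```

Calling the sampler about once a second and printing the rate gives a KB/s figure. With the collector running, the rate turns negative whenever a cycle frees memory.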
31 fps is not that bad, an average human can't see better than 30 fps (although there are exceptions, <a href="http://en.wikipedia.org/wiki/Frame_rate" target="_blank">http://en.wikipedia.org/wiki/Frame_rate</a>).
Myself, I don't have much trouble around 20 fps (sure, I kinda see individual frames, but it doesn't impact my (lack of) gaming skillz much), because I am quite used to that from gmod (when I still had my old comp I was used to 10 fps, but not anymore :P).
Just to wrap up the current findings:
collectgarbage "setpause" has a default value of 200, which means a new collection cycle starts every time the memory in use has doubled.
collectgarbage "setstepmul" has a default value of 200, which means that the garbage collector runs at twice the speed of memory allocation; I'm not sure what that means in practice.
Now while there certainly are better values, the main problem (judging without knowing much about Lua) might be that the memory that needs to be garbage collected grows very fast. It might be interesting to debug that, but I don't know how to do that (yet?).
The LuaJIT idea was probably wrong; JIT seems to be enabled, at least I don't know how I could prove the opposite. An upgrade to LuaJIT 2.0.0beta5 might be interesting.
I don't really want to start a discussion about this, but it's a common misconception that you can only see 30fps.
In reality, experiments have been performed on fighter pilots where results showed up to 220fps. Now, obviously, they are fighter pilots. Personally I feel about 85fps is where I stop noticing any difference - but it really depends on what you are used to. No console gamer whines over FPS, noticed that? Because they are used to it. Nothing bad against console gamers, sometimes I'd wish that I didn't notice a difference.
I can find a source on the 220hz assertion, if you want :)
collectgarbage("step", 75) -- somewhere between 75-100
collectgarbage("stop")
This was roughly what I saw on my machine:
collectgarbage("step", 1) - memory rises ~10 MB/s
collectgarbage("step", 50) - rises ~1 MB/s
collectgarbage("step", 75) - stable memory usage
Memory usage was just under 1 GB.
collectgarbage("count") * 1024 will tell you how many bytes lua is using (including non collected garbage).
While it would be possible to rewrite the Lua code to generate less garbage, for example by rewriting vector math like a + b as a:Add(b), the point of using a scripting language is to keep the code "simpler", so I'm planning to optimize this with changes to the Lua VM.
Also, as for why disabling LuaJIT wouldn't affect the performance, that one is easy. Executing the actual Lua instructions isn't the bottleneck or a significant factor on the performance. The Lua VM is very fast, so this is expected. The bigger factor is the garbage collector and the layer that interfaces the Lua code with the C++ code. Once build 162 is released this is my next target for optimization.
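To make the a + b versus a:Add(b) difference concrete, here is a sketch using plain Lua tables. NewVec, Vec and MeasureGarbageKb are stand-ins invented for the example; the game's Vector is bound from C++ and will not behave identically.

```lua
local Vec = {}
Vec.__index = Vec

-- Hypothetical stand-in for the engine's Vector type.
function NewVec(x, y, z)
    return setmetatable({ x = x, y = y, z = z }, Vec)
end

-- a + b: every call allocates a fresh table that later becomes garbage.
Vec.__add = function(a, b)
    return NewVec(a.x + b.x, a.y + b.y, a.z + b.z)
end

-- a:Add(b): writes into a, no allocation.
function Vec:Add(other)
    self.x = self.x + other.x
    self.y = self.y + other.y
    self.z = self.z + other.z
    return self
end

-- Run fn `iterations` times with the collector stopped and return heap growth in KB.
function MeasureGarbageKb(fn, iterations)
    collectgarbage("collect")
    collectgarbage("stop")
    local before = collectgarbage("count")
    for _ = 1, iterations do
        fn()
    end
    local after = collectgarbage("count")
    collectgarbage("restart")
    return after - before
end

local a, b = NewVec(0, 0, 0), NewVec(1, 2, 3)
print("a + b    KB:", MeasureGarbageKb(function() a = a + b end, 10000))
print("a:Add(b) KB:", MeasureGarbageKb(function() a:Add(b) end, 10000))
```

The operator version should show heap growth proportional to the iteration count, while the in-place version stays close to zero. The cost is exactly the readability trade-off described above.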
That sounds like a sloppy language implementation. It's not related to being a dynamic/scripting language at all. I can get a Haskell compiler to put all arguments of vector math in CPU registers when I write +. I also saw Pall saying how his FFI bindings will generate garbage for FFI ops like new Vector()/Vector:+. I hope he notices how broken that is sooner rather than later.
What about adding something like r_stats/net_stats for GC? It has bigger impact on performance than rendering/networking.
Have you stumbled across <a href="http://www.squirrel-lang.org/" target="_blank">http://www.squirrel-lang.org/</a> before?
I know nothing about it other than that it's supposedly very similar to Lua, and one of the differences is that it does reference counting instead of GC for memory management.
PS: Also, I don't know how the Lua GC works; but for .Net's GC it will sweep at a shallow level first (without inspecting every single object etc) to see if it can reclaim enough memory to not bother with a deeper sweep later.
<a href="http://msdn.microsoft.com/en-us/library/0xy59wtx.aspx" target="_blank">http://msdn.microsoft.com/en-us/library/0xy59wtx.aspx</a> has some neat information
It does its GC inline, but I think there's a 'GC thread' option for server processes or something, to avoid (?) hitching.
Lua lets you do small GC collection cycles too but Max said the main problem is too much garbage. Smaller GC cycles would just make the garbage grow faster. Unless you are suggesting having very small cycles run more frequently? One thing they could do is implement some minor memory management into the engine and allow it to be used from Lua. Then you could do automatic recycling.
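The recycling idea can be tried in plain Lua without engine support. A minimal sketch, with AcquireVec and ReleaseVec as made-up names; recycling is manual here rather than automatic.

```lua
-- Hypothetical free list of reusable vector tables.
local pool = {}
local poolSize = 0

-- Take a table from the pool if one is available, otherwise allocate.
function AcquireVec(x, y, z)
    local v
    if poolSize > 0 then
        v = pool[poolSize]
        pool[poolSize] = nil
        poolSize = poolSize - 1
    else
        v = {}
    end
    v.x, v.y, v.z = x, y, z
    return v
end

-- Hand a table back once nothing else refers to it.
function ReleaseVec(v)
    poolSize = poolSize + 1
    pool[poolSize] = v
end
```

The catch is that the caller must be sure nobody still holds a released table, which is the kind of bookkeeping a garbage collector normally takes care of.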
quote:
"What about adding something like r_stats/net_stats for GC? It has bigger impact on performance than rendering/networking."
There is something like that in the game. "o_stats" will display all of the objects that are being created. "o_stats Vector" will display the location of all of the Vector allocations, etc.
The problem with Vector operations creating garbage is with the way that Lua interfaces with C++. In a test, I modified Lua so that I could have small objects that have copy semantics and exist on the stack and therefore don't need to junk up the garbage collector. To get this to work with the game I have to either modify luabind to accept these types of objects, or replace luabind with my own system. So one of those options is what I'll be working on next.
go Asraniel, find it!
and bring it back our daddy - err, max.
<a href="http://code.google.com/p/lua-vec/" target="_blank">http://code.google.com/p/lua-vec/</a>
quote:
"The new datatype is a first class type, like regular number type is. This means that there is no penalty for creating new vectors and there is no need for garbage collection."
Sounds like something worth looking into?
The only problem seems to be that there is currently no LuaJIT patch (yet?), so the porting would have to be done by Max. But looking at what his plans are, it doesn't sound that bad.
Also that page might be interesting:
<a href="http://code.google.com/p/lua-vec/wiki/Benchmarks" target="_blank">http://code.google.com/p/lua-vec/wiki/Benchmarks</a>
edit: I'm updating this post as I find new info:
Here is a quote from a dev of lua-vec:
"I actually hacked value semantics vectors into standard 5.1 Lua VM some time ago with major perf improvements (see lua-vec @ google code) but unfortunately making a patch for LuaJIT is out of my reach."
So implementing this in LuaJIT is probably not just a simple port of the patch.
Here is actually a very long thread about vectors and LuaJIT; I'm in the process of reading it.
<a href="http://lua-users.org/lists/lua-l/2010-11/msg00418.html" target="_blank">http://lua-users.org/lists/lua-l/2010-11/msg00418.html</a>
edit: some interesting info I found in the thread:
"The new GC in 2.1 would be much faster at alloc/GC/free, so it will be even lower in the future."
This means that when LuaJIT 2.1 comes out and is integrated into NS2, there should be an FPS increase.
From what I can judge from the thread, the author of LuaJIT does not like the way lua-vec is implemented. He also doubted that that many vectors are created in a game, which would remove the need to optimize this. He was then more or less proved wrong by statistics from somebody's code. Here is his answer: <a href="http://lua-users.org/lists/lua-l/2010-11/msg00505.html" target="_blank">http://lua-users.org/lists/lua-l/2010-11/msg00505.html</a>
So I'm not sure whether he will do something about the problem apart from the LuaJIT GC optimizations.