I applied the "step garbage collector" change to build 162, did some benchmarking, and recorded the frame times (from Client.OnClientUpdate when not predicting). While logging, I was standing still in Junctions readyroom looking at the same spot, not touching any input. I plotted about 5-10 seconds of delta times for the default build 162 and for one with the changes described on the last page:
<a href="http://i.imgur.com/2pNju.png" target="_blank">Picture is kinda big, so I'll just link it</a>
I was running ~85 fps in windowed.
Not saying this should be the default, just throwing numbers and information out there. The value of 75 was only tested with 1 player on a local server, so it's not really a stable solution.
The stepping GC does look much better in terms of cycles spent on GC. Did you look at the memory usage at the same time?
Is there a quick summary for how we switch this on? I can provide some additional results from my system if you can provide instructions on how to repeat the test!
1. Open up ns2/lua/Client.lua
2. Search for function OnUpdateClient
3. Skip the first line (if not Client.GetIsRunningPrediction)
4. Add in these two lines:
collectgarbage("step", 75)
collectgarbage("stop") // Because step leaves the GC running, this needs to be after each step call. Derp.
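For anyone who wants to try the pattern outside the game first, here is a minimal sketch of the same two calls in a stock Lua interpreter. `OnFrameUpdate` and the simulated frame loop are hypothetical stand-ins, not the real NS2 update function, and the comments use standard `--` syntax so the snippet runs unmodified.

```lua
-- Hypothetical stand-in for the per-frame client update; not the real NS2 function.
local GC_STEP_SIZE = 75  -- value from the post; larger means more GC work per frame

local function OnFrameUpdate(deltaTime)
    -- Do a bounded amount of collection work this frame.
    local cycleFinished = collectgarbage("step", GC_STEP_SIZE)
    -- Per the post, "step" leaves the collector running, so stop it again
    -- to keep automatic collection from kicking in mid-frame.
    collectgarbage("stop")
    return cycleFinished
end

-- Simulate some frames that each create short-lived garbage.
local finishedCycles = 0
for frame = 1, 200 do
    local scratch = {}
    for i = 1, 300 do scratch[i] = { x = i, y = i, z = i } end
    if OnFrameUpdate(1 / 85) then finishedCycles = finishedCycles + 1 end
end

collectgarbage("restart")  -- hand control back to the automatic collector
print("collection cycles finished:", finishedCycles)
```

The return value of the step call is worth keeping an eye on: if cycles almost never finish, the step size is probably too small for the amount of garbage being produced.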
The argument to step (75) decides how much work the GC will do on each call. 75 is good enough for single player, but for online play you probably need something like 200. If it is too low, memory use will keep rising and the game will crash after some time (around the 2 GB mark on my PC).
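One way to guard against the memory creep described above is to watch the Lua heap size and raise the step size when it grows too far. This is only a sketch of that idea, not something tested in NS2: the budget, the doubling policy, the cap of 1000, and the function names are all assumptions.

```lua
-- Assumed budget: 512 MB, expressed in kilobytes (collectgarbage("count") reports KB).
local HEAP_BUDGET_KB = 512 * 1024
local MAX_STEP_SIZE = 1000   -- assumed upper bound so the step size cannot grow forever
local stepSize = 75          -- starting value from the post

-- Returns the step size to use for the next frame.
local function ChooseStepSize(heapKB, currentStep)
    if heapKB > HEAP_BUDGET_KB then
        -- The collector is falling behind: do more work per frame.
        return math.min(currentStep * 2, MAX_STEP_SIZE)
    end
    return currentStep
end

-- Hypothetical per-frame hook: step the GC, stop it again, then re-tune.
local function OnFrameUpdate()
    collectgarbage("step", stepSize)
    collectgarbage("stop")
    local heapKB = collectgarbage("count")   -- current Lua heap size in kilobytes
    stepSize = ChooseStepSize(heapKB, stepSize)
    return heapKB
end

print("heap after one frame (KB):", OnFrameUpdate())
collectgarbage("restart")
```

Logging the value returned by `OnFrameUpdate` next to the frame time would also answer the earlier question about memory usage during the test.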
If you also want to record frame times, I did that by adding this line after the other two:
Shared.Message(string.format("%s", deltaTime))
This will print the delta to the console, so it will end up in the log file. From there you can copy the numbers and plot them. NS2 puts the log file in c:/users/<yourprofile>/AppData/Roaming/Natural Selection 2/log.txt
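If you would rather get a quick summary than plot everything, the logged values can be reduced with a few lines of Lua. This sketch assumes each delta time ends up on its own line with nothing else on it; the actual layout of log.txt may differ, and `SummariseFrameTimes` and the sample text are made up for illustration.

```lua
-- Collect every line of logText that is just a number (a logged delta time in seconds)
-- and report the frame count, mean frame time, and worst frame time.
local function SummariseFrameTimes(logText)
    local count, total, worst = 0, 0, 0
    for line in logText:gmatch("[^\r\n]+") do
        local dt = tonumber(line)   -- nil for lines that are not plain numbers
        if dt then
            count = count + 1
            total = total + dt
            if dt > worst then worst = dt end
        end
    end
    if count == 0 then return nil end
    return { frames = count, mean = total / count, worst = worst }
end

-- Made-up sample: one non-numeric line followed by three frame times.
local sample = "Loading map\n0.0125\n0.0125\n0.05\n"
local stats = SummariseFrameTimes(sample)
print(stats.frames, stats.mean, stats.worst)
```

The worst frame time is the number that shows hitching; the mean mostly reflects the overall frame rate.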
Asraniel | Join Date: 2002-06-03 | Member: 724 | Members, Playtest Lead, Forum Moderators, NS2 Playtester, Squad Five Blue, Reinforced - Shadow, WC 2013 - Shadow, Subnautica Playtester, Retired Community Developer
edited January 2011
Another perhaps interesting quote on vectors and LuaJIT I found:
"Thought I'd chip in with some statistics. For our current game, allocation stats are roughly 300 vectors per frame, meaning about 9000 vectors per second. These stats are after expending some effort going over the code to minimise unnecessary allocations.
Allocation takes about 1-2% of processing time. Regarding deallocation, we hide the GC on a low-priority thread during rendering so it's hard to estimate the exact overhead. Prior to doing so we would see GC times on the order of 5% of the frame."
Seems like the GC can be threaded. Don't know how it's done currently in Spark, but it might be interesting to try.
edit: and another quote:
first the good news: "The one major change that will likely happen first is a new garbage collector for LuaJIT 2.1. I've already experimented with this on 2.0, but it turned out to cause too much instability for the code base.
The standard Lua 5.1/LuaJIT 2.0 garbage collector is just not up to the task to handle big heaps. And both its allocation speed and the collector throughput leave something to be desired. So I'm planning to switch to an integrated allocator and garbage collector. It's going to be an incremental, generational, non-copying GC."
<!--quoteo(post=1827929:date=Jan 27 2011, 05:30 PM:name=Asraniel)--><div class='quotetop'>QUOTE (Asraniel @ Jan 27 2011, 05:30 PM) <a href="index.php?act=findpost&pid=1827929"><{POST_SNAPBACK}></a></div><div class='quotemain'><!--quotec-->"Thought I'd chip in with some statistics. For our current game, allocation stats are roughly 300 vectors per frame, meaning about 9000 vectors per second. These stats are after expending some effort going over the code to minimise unnecessary allocations.<!--QuoteEnd--></div><!--QuoteEEnd--> Where's this from? Our numbers are quite a bit higher than that, though we haven't done much to minimize the allocations.
<!--quoteo(post=1827929:date=Jan 27 2011, 05:30 PM:name=Asraniel)--><div class='quotetop'>QUOTE (Asraniel @ Jan 27 2011, 05:30 PM) <a href="index.php?act=findpost&pid=1827929"><{POST_SNAPBACK}></a></div><div class='quotemain'><!--quotec-->Seems like the GC can be threaded. Don't know how its done currently in spark, but might be interesting to try<!--QuoteEnd--></div><!--QuoteEEnd--> I imagine they are talking about doing GC in a background thread while the main thread does the rendering. Our engine does the same thing essentially, except the main thread is doing the GC and the background thread is doing the rendering. I think you would run into a lot of problems if you tried to run the GC while any Lua code was executing in another thread.
That's interesting news about the changes to luajit, though sad to see they are so far off.
I've run a quick and dirty test with maesse's tweak. I tested it by creating a local server, waiting for the game to stabilise in the ready room, then joining the alien team and running around a bit biting things. Note that my PC only has 2GB of RAM, which may explain the differences between maesse's test and mine.
I noticed that there was quite a lot more hitching with maesse's code initially, but once it settled down the experience did seem to be smoother. I'm not sure how to interpret these results. I don't know enough about Lua or its GC routines. Hope it helps though?
maesse, if you'd like a copy of the log files that I generated just send me a PM and I'll mail them over.
Asraniel | Join Date: 2002-06-03 | Member: 724 | Members, Playtest Lead, Forum Moderators, NS2 Playtester, Squad Five Blue, Reinforced - Shadow, WC 2013 - Shadow, Subnautica Playtester, Retired Community Developer
Max: sorry, forgot the references. First quote came from: <a href="http://lua-users.org/lists/lua-l/2010-11/msg00599.html" target="_blank">http://lua-users.org/lists/lua-l/2010-11/msg00599.html</a>
second one from: <a href="http://lua-users.org/lists/lua-l/2011-01/msg01238.html" target="_blank">http://lua-users.org/lists/lua-l/2011-01/msg01238.html</a>
Wow, looking at the frame times I definitely think my pc is in need of an upgrade.
I think the graph makes the effect quite clear. You can see the positive impact on framerate that I noticed (it seems even more pronounced on the graph of my results, perhaps due to the performance limitations of my hardware?). I think those spikes might have been important though - I did notice very bad hitching just as the map loaded (which would have sent the frame times through the roof for one or two frames)!
Mwift | Join Date: 2003-10-25 | Member: 21936 | Awaiting Authorization, Members
This also brought my game up to a "playable" level. I was running on lowest details with very poor FPS, and now it's at least playable after making this change (to 200).
FYI, LuaJIT 2.0.0-beta6 is out, and it includes the new FFI library (among other things). <a href="http://luajit.org/ext_ffi.html" target="_blank">http://luajit.org/ext_ffi.html</a>
<!--quoteo(post=1833192:date=Feb 18 2011, 01:57 AM:name=slime)--><div class='quotetop'>QUOTE (slime @ Feb 18 2011, 01:57 AM) <a href="index.php?act=findpost&pid=1833192"><{POST_SNAPBACK}></a></div><div class='quotemain'><!--quotec-->FYI, LuaJIT 2.0.0-beta6 is out, and it includes the new FFI library (among other things). <a href="http://luajit.org/ext_ffi.html" target="_blank">http://luajit.org/ext_ffi.html</a><!--QuoteEnd--></div><!--QuoteEEnd-->
interesting. this looks like a more generic solution than the specialized nonstandard vector patch that was pointed out earlier. should slow down the rate of GC growth by allowing lots of tiny objects to be created without generating garbage. needs special script coding, which is a little disadvantage, but it could work.
still would want GC improvements since mods will probably end up not coding to minimize garbage. 2.0* is supposed to eventually have GC fixes too though.
* Edit: actually updated GC will be in luajit 2.1 which will probably be in 2012 or so. But on the positive front FFI is to support some c++ structures & possibly native vector ops soon.
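To make the FFI idea above a bit more concrete, here is a rough sketch of storing many vectors in one FFI array instead of one Lua table per vector. It needs LuaJIT; under a plain Lua interpreter the `require` fails and the helpers just return nil. The struct name `sketch_vec3` and the helper names are invented for this example, and how much garbage is avoided in practice will depend on how well the JIT handles the surrounding code.

```lua
-- The ffi module only exists in LuaJIT, so load it defensively.
local hasFFI, ffi = pcall(require, "ffi")

if hasFFI then
    -- Hypothetical struct layout for illustration only.
    ffi.cdef[[ typedef struct { float x, y, z; } sketch_vec3; ]]
end

-- Allocate storage for n vectors as a single zero-filled array.
local function MakeVectors(n)
    if not hasFFI then return nil end
    return ffi.new("sketch_vec3[?]", n)
end

-- Update the vectors in place; FFI arrays are indexed from 0.
local function MoveAlongX(vectors, n, dx)
    for i = 0, n - 1 do
        vectors[i].x = vectors[i].x + dx
    end
end

local vectors = MakeVectors(300)
if vectors then
    MoveAlongX(vectors, 300, 1.5)
    print("x of last vector:", vectors[299].x)
else
    print("FFI not available in this interpreter")
end
```

The point of the layout is that the 300 vectors live in one allocation rather than 300 separate tables, which is also why the script code has to be written specifically for it.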
<!--quoteo(post=1826292:date=Jan 22 2011, 12:20 PM:name=maesse)--><div class='quotetop'>QUOTE (maesse @ Jan 22 2011, 12:20 PM) <a href="index.php?act=findpost&pid=1826292"><{POST_SNAPBACK}></a></div><div class='quotemain'><!--quotec-->Try adding this to Client.lua inside OnClientUpdate, after "is not Client.IsRunningPrediction":
collectgarbage("step", 75) // somewhere between 75-100
collectgarbage("stop")<!--QuoteEnd--></div><!--QuoteEEnd-->
<!--quoteo(post=1827640:date=Jan 26 2011, 05:58 PM:name=maesse)--><div class='quotetop'>QUOTE (maesse @ Jan 26 2011, 05:58 PM) <a href="index.php?act=findpost&pid=1827640"><{POST_SNAPBACK}></a></div><div class='quotemain'><!--quotec-->1. Open up ns2/lua/Client.lua
2. Search for function OnUpdateClient
3. Skip the first line (if not Client.GetIsRunningPrediction)
4. Add in these two lines:
collectgarbage("step", 75)
collectgarbage("stop") // Because step leaves the GC running, this needs to be after each step call. Derp.
The argument to step (75) decides how much work the GC will do. 75 is good enough for singleplayer, but for online you probably need something like 200. If it is too low, the memory will keep rising and it will crash after some time (around the 2gb mark, on my pc)
If you also want to record frametimes, I did that by adding this line after the two other:
Shared.Message(string.format("%s", deltaTime))
This will print the delta to the console, so it will end up in the logfile - from there you can copy the numbers and plot them. NS2 puts the logfile inside c:/users/<yourprofile>/AppData/Roaming/Natural Selection 2/log.txt<!--QuoteEnd--></div><!--QuoteEEnd-->
Hey :) Been playing around with this lately, currently it's set at 200. I do notice a bit of an fps improvement, but I feel like it throws off the hitreg and/or makes things twitchy and delayed. Could this cause that, or am I just imagining it? Thanks in advance.
Imagining. GC is completely separate from the hitreg. It might cause slowdown if the delay is too high (a backlog builds up), and it'll cause big hitches when the GC finally does a cleanup, because there is more to clear out.
fsfoduk | Join Date: 2004-04-09 | Member: 27810 | Members, NS2 Developer, Constellation, NS2 Playtester, Squad Five Blue, Squad Five Silver, Squad Five Gold, Subnautica Playtester, NS2 Community Developer, Pistachionauts
I think a backlog hurts most when the GC starts a new mark/propagate phase, where it has to traverse the whole object tree from all the valid root objects, which can't be done incrementally. That is unlike the sweep phase, where the GC just walks along the big linked list of all the Lua objects and can stop at any point on the list once it has freed enough memory.
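A rough way to see how much work a full cycle takes is to count how many small steps are needed before `collectgarbage("step")` reports that a cycle has finished. This is just a measuring sketch for a stock interpreter: the function name, the test heap, and the safety cap are all made up, and the count will vary with Lua version and heap contents.

```lua
-- Count how many small GC steps it takes to finish the current collection cycle.
-- maxSteps is a safety cap so the loop always terminates.
local function StepsToFinishCycle(maxSteps)
    for steps = 1, maxSteps do
        -- A size of 0 requests a small step; the call returns true
        -- when that step completed a collection cycle.
        if collectgarbage("step", 0) then
            return steps
        end
    end
    return nil  -- cap reached before the cycle finished
end

-- Build a heap with some live data and some garbage, then measure.
local live = {}
for i = 1, 10000 do live[i] = { x = i } end
for i = 1, 10000 do local garbage = { x = i } end

local steps = StepsToFinishCycle(1000000)
print("small steps needed to finish a cycle:", steps)
```

Running it with a larger `live` table gives a feel for how the cost of a cycle grows with the amount of reachable data.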
The next version of Lua (5.2) does have a generational GC, which I think takes advantage of the fact that freshly allocated objects are automatically marked as unreachable until they get stored in a table. A generational sweep can use this by only traversing the big linked list of all objects until it finds objects from the previous mark phase. That gives me the idea of a mini generational sweep/reuse of Vectors after each call of ProcessMove.
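The reuse idea could look something like the pool below: vectors handed out during one move are recycled at the end of it instead of being left for the GC. This is only a sketch under assumptions. `VectorPool` and its methods are hypothetical, plain tables stand in for the engine's Vector type, and it is only safe if nothing keeps a pooled vector past the reset.

```lua
-- Hypothetical pool of reusable vectors; plain tables stand in for the engine's Vector.
local VectorPool = {}
VectorPool.__index = VectorPool

function VectorPool.new()
    return setmetatable({ items = {}, used = 0 }, VectorPool)
end

-- Hand out a vector, reusing one from an earlier move when available.
function VectorPool:get(x, y, z)
    self.used = self.used + 1
    local v = self.items[self.used]
    if v == nil then
        v = { x = 0, y = 0, z = 0 }   -- only allocate when the pool has run dry
        self.items[self.used] = v
    end
    v.x, v.y, v.z = x, y, z
    return v
end

-- Mark every vector as free again; intended to run after each move is processed.
function VectorPool:reset()
    self.used = 0
end

-- Usage: two simulated moves share the same underlying table.
local pool = VectorPool.new()
local first = pool:get(1, 2, 3)
pool:reset()
local second = pool:get(4, 5, 6)
print(first == second, second.x, #pool.items)
```

After the first few moves the pool stops allocating altogether, so temporary vectors created while processing a move no longer add to the garbage the collector has to deal with.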
The bad news:
Q4 2011 - Work on LuaJIT 2.1 starts
<a href="http://i.imgur.com/JGJRz.png" target="_blank">Graph</a>
I removed a few (3-4) big spikes, to make the graph more clear
That particular patch is probably not the "correct" way to do it, but it shows a nice hint on how to improve performance