Optimizations


Comments

  • maesse Join Date: 2010-04-08 Member: 71213 Members
    edited January 2011
    I applied the "step garbagecollector" change to build 162, did some benchmarking, and recorded the frame times (from Client.OnClientUpdate when not predicting). While logging, I was standing still in Junctions readyroom, looking at the same spot and not touching any input. I plotted about 5-10 seconds of delta times for the default build 162 and for one with the changes described on the last page:

    Picture is kinda big, so I'll just link it: http://i.imgur.com/2pNju.png

    I was running ~85 fps in windowed.

    Not saying this should be default, just throwing numbers and information out there. The 75 thing is only with 1 player on a local server - so not really a stable solution.
  • MCMLXXXIV Join Date: 2010-04-14 Member: 71400 Members
    The stepping GC does look much better in terms of cycles spent on GC. Did you look at the memory usage at the same time?

    Is there a quick summary for how we switch this on? I can provide some additional results from my system if you can provide instructions on how to repeat the test!
  • maesse Join Date: 2010-04-08 Member: 71213 Members
    1. Open up ns2/lua/Client.lua
    2. Search for function OnUpdateClient
    3. Skip the first line (if not Client.GetIsRunningPrediction)
    4. Add in these two lines:
    collectgarbage("step", 75)
    collectgarbage("stop") // Because step leaves the GC running, this needs to be after each step call. Derp.

    The argument to step (75) decides how much work the GC will do. 75 is good enough for singleplayer, but for online you probably need something like 200. If it is too low, the memory will keep rising and it will crash after some time (around the 2gb mark, on my pc)
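
    A minimal sketch of the per-frame pattern above, run in plain Lua 5.1 outside the game. It is not the actual NS2 Client.lua: `OnUpdateClient` is a stand-in, `makeGarbage` and `GC_STEP_SIZE` are hypothetical names, and the loop only simulates frames. It prints the heap size reported by collectgarbage("count") before and after, which is one way to check whether a given step size keeps up with the allocation rate.

    ```lua
    -- Stand-alone sketch (plain Lua 5.1 / LuaJIT), not the actual NS2 Client.lua.
    local GC_STEP_SIZE = 75  -- hypothetical name; the value is the one from the post

    -- Hypothetical stand-in for per-frame game work that allocates short-lived tables.
    local function makeGarbage(count)
        for i = 1, count do
            local v = { x = i, y = i, z = i }
        end
    end

    -- Stand-in for the client update hook.
    local function OnUpdateClient(deltaTime)
        makeGarbage(300)
        collectgarbage("step", GC_STEP_SIZE)  -- do a bounded amount of GC work
        collectgarbage("stop")                -- per the post: stop automatic collection again
    end

    collectgarbage("stop")
    local startKB = collectgarbage("count")
    for frame = 1, 1000 do
        OnUpdateClient(1 / 85)
    end
    local endKB = collectgarbage("count")
    print(string.format("heap: %.0f KB -> %.0f KB", startKB, endKB))

    collectgarbage("restart")  -- hand control back to the automatic collector
    ```

    If the second figure keeps climbing as you raise the frame count, the step size is too low for the allocation rate, which is the failure mode described above.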

    If you also want to record frametimes, I did that by adding this line after the two other:
    Shared.Message(string.format("%s", deltaTime))
    This will print the delta to the console, so it will end up in the logfile - from there you can copy the numbers and plot them. NS2 puts the logfile inside c:/users/<yourprofile>/AppData/Roaming/Natural Selection 2/log.txt
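
    Once the numbers are copied out of the log, a rough summary can stand in for a plot. The sketch below assumes each delta sits on its own line as a bare number in seconds (the real log may need trimming first); `summarizeFrameTimes` is a hypothetical helper and the sample values are made up for illustration.

    ```lua
    -- Summarise frame deltas (assumed to be in seconds) copied out of the log.
    -- `summarizeFrameTimes` is a hypothetical helper, not part of NS2.
    local function summarizeFrameTimes(lines)
        local count, sum, worst = 0, 0, 0
        for _, line in ipairs(lines) do
            local dt = tonumber(line)  -- nil for lines that are not plain numbers
            if dt then
                count = count + 1
                sum = sum + dt
                if dt > worst then worst = dt end
            end
        end
        if count == 0 then return nil end
        local mean = sum / count
        return {
            frames = count,
            meanMs = mean * 1000,
            worstMs = worst * 1000,
            avgFps = 1 / mean,
        }
    end

    -- Made-up sample values, for illustration only.
    local stats = summarizeFrameTimes({ "0.010", "0.012", "some other log line", "0.050", "0.008" })
    print(string.format("%d frames, mean %.1f ms, worst %.1f ms, ~%.0f fps",
        stats.frames, stats.meanMs, stats.worstMs, stats.avgFps))
    ```

    The worst-frame figure is the one to watch for hitching, since a handful of long frames barely moves the mean.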
  • MCMLXXXIV Join Date: 2010-04-14 Member: 71400 Members
    Will try this out tonight and post results once I have them!
  • Asraniel Join Date: 2002-06-03 Member: 724 Members, Playtest Lead, Forum Moderators, NS2 Playtester, Squad Five Blue, Reinforced - Shadow, WC 2013 - Shadow, Subnautica Playtester, Retired Community Developer
    edited January 2011
    another perhaps interesting quote on vectors and luajit i found:

    "Thought I'd chip in with some statistics. For our current game, allocation stats are roughly 300 vectors per frame, meaning about 9000 vectors per second. These stats are after expending some effort going over the code to minimise unnecessary allocations.

    Allocation takes about 1-2% of processing time. Regarding deallocation, we hide the GC on a low-priority thread during rendering so it's hard to estimate the exact overhead. Prior to doing so we would see GC times on the order of 5% of the frame."

    Seems like the GC can be threaded. Don't know how it's done currently in Spark, but might be interesting to try

    edit: and another quote:

    first the good news:
    "The one major change that will likely happen first is a new garbage
    collector for LuaJIT 2.1. I've already experimented with this on 2.0,
    but it turned out to cause too much instability for the code base.

    The standard Lua 5.1/LuaJIT 2.0 garbage collector is just not up to
    the task to handle big heaps. And both its allocation speed and the
    collector throughput leave something to be desired. So I'm planning to
    switch to an integrated allocator and garbage collector. It's going to
    be an incremental, generational, non-copying GC."

    The bad news:
    Q4 2011 - Work on LuaJIT 2.1 starts
  • Max Technical Director, Unknown Worlds Entertainment Join Date: 2002-03-15 Member: 318 Super Administrators, Retired Developer, NS1 Playtester, Forum Moderators, NS2 Developer, Constellation, Subnautica Developer, Pistachionauts, Future Perfect Developer
    QUOTE (Asraniel @ Jan 27 2011, 05:30 PM): "Thought I'd chip in with some statistics. For our current game, allocation stats are roughly 300 vectors per frame, meaning about 9000 vectors per second. These stats are after expending some effort going over the code to minimise unnecessary allocations.
    Where's this from? Our numbers are quite a bit higher than that, though we haven't done much to minimize the allocations.

    QUOTE (Asraniel @ Jan 27 2011, 05:30 PM): Seems like the GC can be threaded. Don't know how its done currently in spark, but might be interesting to try
    I imagine they are talking about doing GC in a background thread while the main thread does the rendering. Our engine does the same thing essentially, except the main thread is doing the GC and the background thread is doing the rendering. I think you would run into a lot of problems if you tried to run the GC while any Lua code was executing in another thread.

    That's interesting news about the changes to luajit, though sad to see they are so far off.
  • MCMLXXXIV Join Date: 2010-04-14 Member: 71400 Members
    edited January 2011
    I've run a quick and dirty test with maesse's tweak. I tested it by creating a local server, waiting for the game to stabilise in the ready room, then joining the alien team and running around a bit biting things. Note that my PC only has 2GB of RAM, which may explain the differences between maesse's test and mine.

    I noticed that there was quite a lot more hitching with maesse's code initially, but once it settled down the experience did seem to be smoother. I'm not sure how to interpret these results. I don't know enough about Lua or its GC routines. Hope it helps though?

    maesse, if you'd like a copy of the log files that I generated just send me a PM and I'll mail them over.
  • Asraniel Join Date: 2002-06-03 Member: 724 Members, Playtest Lead, Forum Moderators, NS2 Playtester, Squad Five Blue, Reinforced - Shadow, WC 2013 - Shadow, Subnautica Playtester, Retired Community Developer
    Max: sorry, forgot the references. First quote came from:
    http://lua-users.org/lists/lua-l/2010-11/msg00599.html

    second one from:
    http://lua-users.org/lists/lua-l/2011-01/msg01238.html
  • maesse Join Date: 2010-04-08 Member: 71213 Members
    I got MCMLXXXIV's results and added them to the plot:
    Graph: http://i.imgur.com/JGJRz.png

    I removed a few (3-4) big spikes, to make the graph more clear
  • MCMLXXXIV Join Date: 2010-04-14 Member: 71400 Members
    edited January 2011
    Wow, looking at the frame times I definitely think my pc is in need of an upgrade.

    I think the graph makes the effect quite clear. You can see the positive impact on framerate that I noticed (it seems even more pronounced on the graph of my results, perhaps due to the performance limitations of my hardware?). I think those spikes might have been important though - I did notice very bad hitching just as the map loaded (which would have sent the frame times through the roof for one or two frames)!
  • Asraniel Join Date: 2002-06-03 Member: 724 Members, Playtest Lead, Forum Moderators, NS2 Playtester, Squad Five Blue, Reinforced - Shadow, WC 2013 - Shadow, Subnautica Playtester, Retired Community Developer
    I would like to say that the little garbage collector change suggested here improved my performance so much. The game feels much smoother now.

    That particular patch is probably not the "correct" way to do it, but it shows a nice hint on how to improve performance
  • Mwift Join Date: 2003-10-25 Member: 21936 Awaiting Authorization, Members
    Also brought my game up to a "playable" level. I was running on lowest details with very poor FPS - now it's at least playable after making this change (to 200).
  • Jerunk Join Date: 2002-11-22 Member: 9659 Members
    Did the change to 200, noticed a slight improvement in smoothness after a very jerky initial start.
  • slime Join Date: 2010-07-14 Member: 72352 Members
    FYI, LuaJIT 2.0.0-beta6 is out, and it includes the new FFI library (among other things). http://luajit.org/ext_ffi.html
  • extollo Ping Blip Join Date: 2010-07-16 Member: 72457 Members
    edited February 2011
    QUOTE (slime @ Feb 18 2011, 01:57 AM): FYI, LuaJIT 2.0.0-beta6 is out, and it includes the new FFI library (among other things). http://luajit.org/ext_ffi.html

    interesting. this looks like a more generic solution than the specialized nonstandard vector patch that was pointed out earlier. should slow down the rate of GC growth by allowing lots of tiny objects to be created without generating garbage. needs special script coding, which is a little disadvantage, but it could work.

    still would want GC improvements since mods will probably end up not coding to minimize garbage. 2.0* is supposed to eventually have GC fixes too though.

    * Edit: actually updated GC will be in luajit 2.1 which will probably be in 2012 or so. But on the positive front FFI is to support some c++ structures & possibly native vector ops soon.
  • J.D. Join Date: 2011-06-16 Member: 104695 Members
    QUOTE (maesse @ Jan 22 2011, 12:20 PM):
    Try adding this to Client.lua inside OnClientUpdate, after "is not Client.IsRunningPrediction":

    collectgarbage("step", 75) // somewhere between 75-100
    collectgarbage("stop")


    QUOTE (maesse @ Jan 26 2011, 05:58 PM):
    1. Open up ns2/lua/Client.lua
    2. Search for function OnUpdateClient
    3. Skip the first line (if not Client.GetIsRunningPrediction)
    4. Add in these two lines:
    collectgarbage("step", 75)
    collectgarbage("stop") // Because step leaves the GC running, this needs to be after each step call. Derp.

    The argument to step (75) decides how much work the GC will do. 75 is good enough for singleplayer, but for online you probably need something like 200. If it is too low, the memory will keep rising and it will crash after some time (around the 2gb mark, on my pc)

    If you also want to record frametimes, I did that by adding this line after the two other:
    Shared.Message(string.format("%s", deltaTime))
    This will print the delta to the console, so it will end up in the logfile - from there you can copy the numbers and plot them. NS2 puts the logfile inside c:/users/<yourprofile>/AppData/Roaming/Natural Selection 2/log.txt

    Hey :) Been playing around with this lately, currently it's set at 200. I do notice a bit of an fps improvement, but I feel like it throws off the hitreg and/or makes things twitchy and delayed. Could this change cause that, or am I just imagining it? Thanks in advance.

    JD
  • Zuriki Join Date: 2010-11-20 Member: 75105 Members
    Imagining. GC is completely separate from the hitreg. It might cause slow down if the delay is too high (backlog) and it'll cause big hitches when the GC does a cleanup because there is more to clear out.
  • fsfod uk Join Date: 2004-04-09 Member: 27810 Members, NS2 Developer, Constellation, NS2 Playtester, Squad Five Blue, Squad Five Silver, Squad Five Gold, Subnautica Playtester, NS2 Community Developer, Pistachionauts
    I think a backlog hurts most when the GC starts a new mark/propagate phase, where it has to traverse the whole object tree from all the valid root objects, which can't be done incrementally. The sweep phase is different: the GC just walks along the big linked list of all the Lua objects, and it can stop at any point on the list once it has freed enough memory.

    The next version of Lua (5.2) does have a generational GC, which I think takes advantage of the fact that freshly allocated objects are automatically marked as unreachable until they get stored in a table. A generational sweep can use this by only traversing the big linked list of all objects until it finds objects from the previous mark phase. That gives me the idea of a mini generational sweep/reuse of Vectors after each call of ProcessMove.
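
    One way to read the "reuse of Vectors after each call of ProcessMove" idea is a scratch pool that is reset at the end of every move. This is only a sketch under that assumption, not how Spark or NS2 does it: every name here is hypothetical, and plain tables stand in for the engine's Vector type.

    ```lua
    -- Sketch of a scratch pool: vectors handed out during one move are recycled
    -- for the next move instead of becoming garbage. All names are hypothetical;
    -- plain tables stand in for the engine's Vector type.
    local pool, used = {}, 0

    local function scratchVector(x, y, z)
        used = used + 1
        local v = pool[used]
        if v == nil then
            v = {}              -- only allocates while the pool is still growing
            pool[used] = v
        end
        v.x, v.y, v.z = x, y, z
        return v
    end

    local function releaseScratch()
        used = 0                -- every scratch vector becomes reusable; nothing is freed
    end

    -- Stand-in for one ProcessMove call that needs two temporaries.
    local function processMove(step)
        local a = scratchVector(step, 0, 0)
        local b = scratchVector(0, step, 0)
        local length = math.sqrt((a.x + b.x) ^ 2 + (a.y + b.y) ^ 2)
        releaseScratch()
        return length
    end

    for step = 1, 100 do
        processMove(step)
    end
    print("vectors allocated for 100 moves:", #pool)
    ```

    The catch is that a scratch vector must not be stored anywhere that outlives the call, since the next move will overwrite it.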