GetAllScriptActors() are evil, or why hydra spam kills a server

kingmob · April 2011

<div class='quotetop'>QUOTE (Soylent_green @ Apr 7 2011, 06:26 PM) <a href="index.php?act=findpost&pid=1840208"><{POST_SNAPBACK}></a></div><div class='quotemain'>... To avoid new[] and delete[](expensive!) ...</div>

new and delete overhead can be managed by a memory pool.
A memory pool will gain you the benefit of contiguous chunks as well as lessening fragmentation.
Some OS have a small block heap manager which for smaller requests perform memory pool like operations.

But they are very popular in real time systems, virtual machines, and games.

Alot of these ideas are good ones, don't go shooting down others ideas.
Instead if your idea is better focus on its strengths.
Feel free to ask direct questions if you don't understand the benefits.

The final solution may be a combination of the different ideas...or maybe something out of the blue.

Either way this chatter has gotten me excited.

Lemming Jesus · April 2011

I've found the solution to all of our Hydra lag problems. It took a long time to do, but I got it.

We just offload the work to the player's brain by making the commander and gorges aim them! I've solved the problem with zero lines of code and now we can have all the hydras we want.

Screenshot of it in action:

<a href="http://img853.imageshack.us/i/hydrarender800x906copy.png" target="_blank">http://img853.imageshack.us/i/hydrarender800x906copy.png</a>

Hugh · April 2011

I love this thread. This is the best kind of thread. If this thread made up a jumper, I would wear that jumper, and tell people it was made of awesome thread.

Thaldarin · April 2011

<div class='quotetop'>QUOTE (CoolCookieCooks @ Apr 7 2011, 10:42 PM) <a href="index.php?act=findpost&pid=1840212"><{POST_SNAPBACK}></a></div><div class='quotemain'>What.

I felt clever for understanding this thread, then I got to these last posts and my brain melted.</div>

+1

Tig · April 2011

i'd just like to come back and say, matso, you f****** rule dude. while we may have not seen any huge performance leap with your code fix, i've seen a noticeable LACK of performance drop when people spam hydras and turrets. so that "hydra optimization (create 5+)" being accepted on the tracker? thats all you man.

well done, sir, well done.

MurphyIdiot · April 2011

We got Matso's changes into the build :)

With the recent optimizations to entity querying ("give me all the MACs") and Matso's work, these targeting entities are much improved.

Thanks Matso!

Game-Sloth · April 2011

Could someone explain why GameScriptActor's could not simply all run on a separate thread or core. Pretty much, everyone has a multicore processor now and it seems that this would be a very simple approach to distributing the server processing load.

Soylent_green · April 2011

<div class='quotetop'>QUOTE (Game-Sloth @ Apr 19 2011, 11:45 PM) <a href="index.php?act=findpost&pid=1841635"><{POST_SNAPBACK}></a></div><div class='quotemain'>Could someone explain why GameScriptActor's could not simply all run on a separate thread or core. Pretty much, everyone has a multicore processor now and it seems that this would be a very simple approach to distributing the server processing load.</div>

Do you mean why the individual turrets and hydras etc. cannot run their LUA code on different cores in parallel? It's a lot of work; the low hanging fruit is optimization because that's comparatively easy and might be able to realize similar gains.

I can list some of the problems if you like.

Hydras and turrets should wait until the player positions have updated before they fire, otherwise they're sometimes not going to fire even though the player is in view(was occluded last tick), sometimes fire even though the player is occluded(was not occluded last tick) and they're going to use last tick's position and velocity to determine where to aim(the longer ticks take, the more useless they become at hitting anything). If something is attached to another object(say, a gorge is riding a MAC or a train, or whatever) then the train would always have to update before the gorge; if the order is random the gorge will skip back and forth, sometimes having a position relative to last tick's train position and sometimes a position relative to this tick's train position.

All these things can't run in parallel. There are some serialization bottlenecks. A typical solution might be to sort these objects into 'buckets'; these buckets are processed sequentially, but within each bucket you can update objects in any order or in parallel, they're independent on one another. Serialization means that you have to stop and wait until all threads are finished with a bucket until you can proceed to the next.

If threads just grab from the bucket, two threads might occasionally update the same object; this can be relatively harmless(a player moving twice as far in one frame, an object not getting updated as a result of an index being incremented twice) or it can result in nonsense that crashes the game(e.g. if intermediate values are stored in the object being updated rather than in a temporary variable and two threads are in a race condition. An index might point to the last object; a thread determines that the index is not out of bounds, but before it accesses the object the index is updated, so that its now out of bounds and the thread is accessing nonsense). Locks and atomic instructions can deal with this, but they're expensive. You can have a single thread try and apportion the work and farm it out to worker threads; but if done unintelligently one thread might recieve a list of expensive hydras while another gets res-towers and other light-weight stuff.

Efficient multithreading is a pain in the ass and you don't do it unless you have to.

jkflipflop · April 2011

<div class='quotetop'>QUOTE (xposed- @ Apr 6 2011, 09:25 PM) <a href="index.php?act=findpost&pid=1840124"><{POST_SNAPBACK}></a></div><div class='quotemain'>Given Max is quite frankly, a genius, I'm pretty sure he knows about this already. Was an interesting read though.</div>

Being a genius doesn't make one all-knowing.

matso · April 2011

<div class='quotetop'>QUOTE (Soylent_green @ Apr 20 2011, 11:59 AM) <a href="index.php?act=findpost&pid=1841688"><{POST_SNAPBACK}></a></div><div class='quotemain'>Do you mean why the individual turrets and hydras etc. cannot run their LUA code on different cores in parallel? It's a lot of work; the low hanging fruit is optimization because that's comparatively easy and might be able to realize similar gains.

I can list some of the problems if you like.

*snipped*

Efficient multithreading is a pain in the ass and you don't do it unless you have to.</div>

Definitely. OTOH, with single core speed not increasing in speed much, you will need to go multithreading if you want performance.

Of course, the spark engine is already multithreaded, its just that its more for convinience than performance (though there are
some performance gains by using a dedicated rendering thread on the client side).

The kind of MT the original poster was would require a quite different architecture, with lots of really weird concepts thrown in that would be quite difficult for most modders to understand.

And things are buggy enough in a serial architecture - multi-threaded architectures are notoriously difficult to debug, because they become indeterministic - if you run things twice with exactly the same setup, a serial architecture will (well ... SHOULD) produce the same result.

A MT architecture might not, as you have a random factor outside your control, namely how the OS happened to schedule your threads.

All that being said, I love tinkering with MT. Stretches your brain a bit ... in as many dimensions as you have threads :-)

MOOtant · April 2011

"MT" gets buggy only if you do concurrency instead of parallelism. For example your single core is concurrent because it receives clock signals pushing SR/D/whatever latches. But that concurrency and time detail is abstracted away. Threads are concurrency primitive and not a parallelism primitive. You don't want to be thinking about 4 threads doing turret vs enemies visibility checks. You want to map function of being visible over set of potential enemies and then fold it taking closest one. <a href="http://en.wikipedia.org/wiki/Monoid" target="_blank">Monoids</a> are practically everywhere and having thread boilerplate achieves nothing.

To sum up: threads = bad, parallel fold/map and the family = good.

Soylent_green · April 2011

<div class='quotetop'>QUOTE (matso @ Apr 21 2011, 05:30 AM) <a href="index.php?act=findpost&pid=1841770"><{POST_SNAPBACK}></a></div><div class='quotemain'>All that being said, I love tinkering with MT. Stretches your brain a bit ... in as many dimensions as you have threads :-)</div>

I have a feeling you'd like VHDL. VHDL describes hardware; it compiles to a low-level description of gates and circuits. FPGAs are chips consisting of arrays of programmable gates that can be configured and wired toghether by software to act like any hardware you've designed in VHDL(as long as there are enough logic cells on the FPGA to instantiate your design).

Making hardware is a whole nother level of nasty. You are not moving sequentially through a program; everything you make operates in parallel with some D flip-flops and one or more clock signal. You can still and will implement designs that will arrange fairly large numbers of gates sequentially, but what that actually means is that if an input signal changes it will trickle through a network of gates until the output eventually changes; if you try to operate the FPGA at a too high clock frequency the result may not trickle out the end before the result is latched to a flip-flop(typically on the rising edge of the clock signal).

If you have to have multiple clock domains that operate asynchronously things get even more complicated. Flip-flops have two well-defined states('0' and '1') but if the input changes during the setup and hold time the flip-flop can hover between '0' and '1' for an indeterminate amount of time before settling on either.

You can also have race conditions; where say, the low bits of a counter have been updated but the high bits have not changed yet. Incrementing '0111' by 1 and passing the signal asynchronously to another chip may yield '0100' instead of either '0111' or '1000'(i.e. if the two least significant bits had changed, but the two most significant bits had not yet changed). There's a curious scheme to deal with this; it is possible to encode numbers using gray code, where it is guaranteed that adding or subtracting one will only change a single bit at a time.

4-bit gray-code would look like this:
<div class='codetop'>CODE</div><div class='codemain'> (binary -> gray)
0000 -> 0000
0001 -> 0001
0010 -> 0011
0011 -> 0010
0100 -> 0110
0101 -> 0111
0110 -> 0101
0111 -> 0100
1000 -> 1100
1001 -> 1101
1010 -> 1111
1011 -> 1110
1100 -> 1010
1101 -> 1011
1110 -> 1001
1111 -> 1000</div>

A fully custom ASIC may only cost a dollar and be just as powerful, but to make the first unit will have a one-time cost of a few million dollars and a turn-around time of weeks. The cool thing about FPGAs is that you can buy just one for a few hundred dollars and instantiate any design you've made with a turn-around time of a few minutes.

Skie · April 2011

Well I didn't expect to find VHDL talk on NS2 forums of all places.
I've got an Altera DE2 FPGA board from the university and I work in ASIC design and IP verification myself.

Either way, this update did neatly remove some of the jerkiness. The FPS is pretty good with my current rig, but it would be wonderful to get rid off those odd lag-spikes that drop fps very low for fractions of seconds.

Soylent_green · April 2011

<div class='quotetop'>QUOTE (Skie @ Apr 21 2011, 11:37 AM) <a href="index.php?act=findpost&pid=1841790"><{POST_SNAPBACK}></a></div><div class='quotemain'>Well I didn't expect to find VHDL talk on NS2 forums of all places.
I've got an Altera DE2 FPGA board from the university and I work in ASIC design and IP verification myself.</div>

Cool. How are you liking it?

I've only had a brief exposure to FPGAs and VHDL, but it was enough to realise that I much prefer software(especially when we're talking about something that is essentially a hobby).

Still, for sheer hair-pulling insanity I don't think you can beat the intel guys who made the photoresist masks for processors up to and including the 8086; they did it by manually cutting tiny strips out of rubylith masking film.

spellman23 · April 2011

Down with VHDL. Verilog for life!!!!

spellman23 · April 2011

Looks like matso's improvements are in the patch.

Crispy · April 2011

If someone hasn't already suggested it, can you not cut down the number of possible targets based on location?

This is based on something a programmer walked me through at work for a slightly different problem, but perhaps it could apply here:

<ol type='1'><li>Split the entire level into a top-down grid, with each segment a 1:1 square of length 'x', with 'x' sensibly tweaked for the best performance/eficacity. NS2 has little to no level-over-level, so calculating cube volumes should not be necessary, plus we want this calculation to be as efficient as possible. You can probably map the grid from the existing map origin (0,0).</li><li>Calculate which 'segments' of the grid the maximum range of the structure is able to 'touch'</li><li><i>Within these segments only</i>, list all the targets the structure is able to attack, regardless of range</li><li>At this stage, remove static structures beyond the structures maximum range (and remember to remove them from subsequent calculations unless they themselves move somehow)</li><li>Raytrace to all the remaining valid targets</li><li>Maybe caching objects that enter and leave the 'touched segments' would be a further optimisation, maybe it would actually be worse, a programmer can make that call</li></ol>

Anyone care to feed back on that approach?

matso · April 2011

You always have to balance complexity with performance. The first approach you write should always be the simplest. That allows you to get to a working program the fastest, which will allow you to identify where the bottlenecks are. They are seldom where you think...

The solution you propose is pretty much the standard solution if you have large amounts of objects and need to quickly find out which ones are in range. I'd do something like if I had to deal with 10000 objects or more ... but for an NS2 game which deals with about 200-300 objects, it would be overkill.

Steinhauer · April 2011

<div class='quotetop'>QUOTE (Soylent_green @ Apr 7 2011, 06:00 PM) <a href="index.php?act=findpost&pid=1840215"><{POST_SNAPBACK}></a></div><div class='quotemain'>No. The goal is to handle a reasonable number of turrets and hydras as fast as possible(say, 50 of each). Optimizing for 1000 hydras and turrets at the expense of being slower at handling 100 is not a good trade off.</div>

Then why not determine a threshold of number of hydras/turrets/crags in the game at which the server switches from the old script to the new one? Maybe have it set like a home thermostat where there is a midrange that the server does no bother switching in (example: once hitting 60, switch to the new script; once dropping back down to 40, switch back to the old script. Obviously those aren't appropriate numbers though.)

Also this seems to revolve a lot around the idea of static and non-static actors. If I remember correctly, won't most alien structures be mobile? Wouldn't that mean optimization for the sentries and crags would be nearly nullified once moving alien structures are implemented? Or would you just switch the alien structures from the static to the non-static list while it's in transit?

...I had something else to add, but it slipped my mind, so maybe I'll come back and edit this post.

Mkilbride · April 2011

Oh, is that why performance isn't degrading?

Nice job! Matso should get his copy of NS2 for free, i.e, refund him, or give him a additional copy to give away. I mean, he is afterall ,doing your job. :P

kabab · April 2011

<div class='quotetop'>QUOTE (Steinhauer @ Apr 24 2011, 04:36 PM) <a href="index.php?act=findpost&pid=1842165"><{POST_SNAPBACK}></a></div><div class='quotemain'>Then why not determine a threshold of number of hydras/turrets/crags in the game at which the server switches from the old script to the new one? Maybe have it set like a home thermostat where there is a midrange that the server does no bother switching in (example: once hitting 60, switch to the new script; once dropping back down to 40, switch back to the old script. Obviously those aren't appropriate numbers though.)

Also this seems to revolve a lot around the idea of static and non-static actors. If I remember correctly, won't most alien structures be mobile? Wouldn't that mean optimization for the sentries and crags would be nearly nullified once moving alien structures are implemented? Or would you just switch the alien structures from the static to the non-static list while it's in transit?

...I had something else to add, but it slipped my mind, so maybe I'll come back and edit this post.</div>
Cause its not worth the effort....

There are much better bang for buck optimisations to be done once the engine is running fast enough it wont' make to much of a difference it think....

Crispy · April 2011

<div class='quotetop'>QUOTE (matso @ Apr 23 2011, 11:37 PM) <a href="index.php?act=findpost&pid=1842132"><{POST_SNAPBACK}></a></div><div class='quotemain'>You always have to balance complexity with performance. The first approach you write should always be the simplest. That allows you to get to a working program the fastest, which will allow you to identify where the bottlenecks are. They are seldom where you think...

The solution you propose is pretty much the standard solution if you have large amounts of objects and need to quickly find out which ones are in range. I'd do something like if I had to deal with 10000 objects or more ... but for an NS2 game which deals with about 200-300 objects, it would be overkill.</div>I think this comes down to the 'rules of optimising' (attributed to Michael Jackson).

Rule 1. Don't
Rule 2. Don't yet (for experts only)

---

This list below expands on the idea above: that it's not worth wasting time on small, focused optimisations until you have a bigger overview of how the game performs. It's better to spend a small time making a large performance gain thyan a lot of time making a small performance gain.

1. Make it work.
2. Make it right (the code is readable [uses IntentionRevealingNames] and every idea is expressed OnceAndOnlyOnce).
3. Make everything work.
4. Make everything right.
5. Use the system and find performance bottlenecks.
6. Use a profiler in those bottlenecks to determine what needs to be optimized.
7. Make it fast. You maintained unit tests, right? Then you can refactor the code mercilessly in order to improve the performance.

On the above list, NS2 is at #3. This is why, in case anyone is wondering, there are areas of the game that are left unoptimised. Max likely committed this change because it took him no time to make and because, as Matso states, it's a simple but very effective optimisation. Because it's simple, it doesn't risk touching a lot of areas of the game so it is a less risky change. Max can also read a quick discussion with several examples of it being benchmarked in real-world scenarios, which give the change validity (and also save Max from doing the same tests).

matso · April 2011

<div class='quotetop'>QUOTE (Steinhauer @ Apr 24 2011, 01:36 AM) <a href="index.php?act=findpost&pid=1842165"><{POST_SNAPBACK}></a></div><div class='quotemain'>Also this seems to revolve a lot around the idea of static and non-static actors. If I remember correctly, won't most alien structures be mobile? Wouldn't that mean optimization for the sentries and crags would be nearly nullified once moving alien structures are implemented? Or would you just switch the alien structures from the static to the non-static list while it's in transit?</div>

Yea. Right now, whips are counted as mobiles because they can be, and I didn't want the extra complexity of having to shift entities from immobile to mobile status. What I would do if you get a lot of stuff that _may_ move, but usually doesn't, is to put them in a third category (mobile/possible/static). The possibles gets stored with their last position, and if they are mobile. The first time each tick a target is asked for, you scan through all the possibly mobiles and check if they have moved. If that changes their status to mobile, it would be added to the mobile list.

The cool thing is that that would be it - you don't need to rebuild the static lists for entities, you can wait until they are forced to do that anyhow. Sooner or later someone does something to force a rebuild anyhow, and in the meantime all you loose is one extra trace for those that had it on their static list... and with static lists usually of length zero anyhow...

Once the entity stops moving and hasn't moved for say 10 sec or so, you flip its mode into static (and now you have to rebuild the static lists).

This strategy is something you could actually use for MAC/Drifters right now, because Drifters/Mac usually stand still for long periods of time. But.. not enough of them - or of whips - to bother. You wouldn't notice any performance difference.

PS. Feeling Agile, Crispy? :-) DS

Steinhauer · April 2011

"Possible" category sounds like a good idea.

That is, unless I decide to be an ass who makes a boatload of MACs and hides them in the abyss :P

ScardyBob · April 2011

<div class='quotetop'>QUOTE (Steinhauer @ Apr 25 2011, 05:05 PM) <a href="index.php?act=findpost&pid=1842367"><{POST_SNAPBACK}></a></div><div class='quotemain'>"Possible" category sounds like a good idea.

That is, unless I decide to be an ass who makes a boatload of MACs and hides them in the abyss :P</div>

You should see how 20 MACs moving at once kills server performance. Besides hydra spam and whip falling in the void, its the most effective method of tanking the server.

Harimau · April 2011

<div class='quotetop'>QUOTE (Crispy @ Apr 24 2011, 06:32 PM) <a href="index.php?act=findpost&pid=1842179"><{POST_SNAPBACK}></a></div><div class='quotemain'>I think this comes down to the 'rules of optimising' (attributed to Michael Jackson).</div>
This guy?
<img src="http://3.bp.blogspot.com/_0zY1_cw8_p0/TCTuSzeZ62I/AAAAAAAAAao/MkjiZW6j8OI/s1600/michael-jackson.jpg" border="0" class="linked-image" />

Skiddywinks · April 2011

<div class='quotetop'>QUOTE (Harimau @ Apr 26 2011, 10:22 AM) <a href="index.php?act=findpost&pid=1842438"><{POST_SNAPBACK}></a></div><div class='quotemain'>This guy?
*snip*</div>

Was thinking the same!

Vic · May 2011

I've noticed a strange behavior in build 175. I dropped a couple hydras in Rockdown west, with the power node still up, but they wouldn't attack it. However once a marine came in west, they started attacking it all of a sudden.

I think this might be a more general issue, I've had the same happen with sentries not attacking hydras in main (presumably also because there were no more aliens in there), while the hydras kept pounding away at the structures and marines in there.

In other words, props for all your great contributions and making this game more enjoyable for all of us :)

matso · May 2011

<div class='quotetop'>QUOTE (Vic @ May 4 2011, 06:26 AM) <a href="index.php?act=findpost&pid=1843946"><{POST_SNAPBACK}></a></div><div class='quotemain'>I've noticed a strange behavior in build 175. I dropped a couple hydras in Rockdown west, with the power node still up, but they wouldn't attack it. However once a marine came in west, they started attacking it all of a sudden.

I think this might be a more general issue, I've had the same happen with sentries not attacking hydras in main (presumably also because there were no more aliens in there), while the hydras kept pounding away at the structures and marines in there.

In other words, props for all your great contributions and making this game more enjoyable for all of us :)</div>

TargetCache in 175 traced improperly when checking for static targets - if the path to a target (or from a shooter) were blocked by anything (like a player, say - the fat arse of a gorge being the usual culprit) when it was dropped, then it marked it as being out of line of sight, and so wouldn't be a target (until the static list was rebuilt for some reason).

Should be fixed in 176 (where it ignores everything but fixed walls when tracing).

Vic · May 2011

Cheers for that, does it mean they can finally see/shoot through railings as well? :)

GetAllScriptActors() are evil, or why hydra spam kills a server

Comments