Proposal for abuse-proof skill system.
moultano
Update: I wrote up a much easier to read version of this, with some extensions. Read that doc instead of all the stuff below.
Warning: math ahead.
I have a proposal for a skill system based on Elo but tweaked for NS2. (I work at Google writing search ranking systems, so I know a little bit about how to do this sort of thing.) For the time being I've avoided adding anything having to do with commander skill, instead treating the commander like just another player, but that would be easy to add.
I believe this formula to be mostly abuse-proof, by which I mean that the only incentive each player has is to play fair games (i.e. not stacked) and to try to win them.
The general framework is to predict the probability of the outcome of the game based on the skill of the players. The update rule for the players' skill levels is determined by the gradient of the error on that prediction.
Notation:
P is the probability of a victory by team 1.
Game_outcome is a variable that is 1 if team 1 wins, 0 if team 1 loses.
s_i is the skill of the ith player.
logit(p) is the logit (log-odds) function, log(p / (1-p))
t_i is the time spent in the round (probably with an exponential weighting of each second, so that being in the round at the beginning counts more than being in the round at the end. Integrals of exponentials are easy, so you can implement this via t_i = a^(-time_the_player_entered) - a^(-time_the_player_quit), where a is some constant that makes sense.)
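The exponentially weighted playtime could be computed with a short sketch like this (the decay base `a` and the choice of minutes as the time unit are my assumptions, not from the original post):

```python
def time_weight(t_enter, t_quit, a=1.1):
    """Exponentially weighted time spent in the round.

    Integrating a per-second weight of a**(-t) from t_enter to t_quit
    gives (a**(-t_enter) - a**(-t_quit)) / ln(a); the 1/ln(a) factor is
    the same for every player, so it is dropped. Times are minutes since
    round start; a > 1 is an assumed decay base, so early minutes count
    more than late ones.
    """
    return a ** (-t_enter) - a ** (-t_quit)

# Playing a whole 30-minute round far outweighs joining for the last 5:
full = time_weight(0, 30)   # ≈ 0.94
late = time_weight(25, 30)  # ≈ 0.035
```

Note the weights are additive: splitting a stay into two intervals gives the same total weight as one continuous stay.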
logit(P) = (SUM_over_team1(t_i * s_i) - SUM_over_team2(t_i * s_i)) / SUM(t_i) + logit(winrate_of_team_1s_race)
The error on the prediction is Game_outcome - P
If you’re familiar with the math of logistic regression, the update rule for players on team 1 becomes
s_i <- s_i + t_i / SUM(t_i) * (Game_outcome - P)
and the same with the sign flipped for players on team 2.
What does this mean in English? We predict the outcome of the game based on the players’ skill and how long they each played in the game. After the game, we update each player’s skill by the product of the fraction of the game they played and the difference between our prediction and what actually happened.
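As a concrete sketch of the prediction and update rule above (the function and variable names are mine, and the race-winrate prior is treated as an assumed input that a real implementation would estimate from data):

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def logit(p):
    return math.log(p / (1.0 - p))

def predict(team1, team2, race_winrate=0.5):
    """P(team 1 wins). Each team is a list of (skill, weighted_time) pairs."""
    total_t = sum(t for _, t in team1 + team2)
    diff = (sum(s * t for s, t in team1)
            - sum(s * t for s, t in team2)) / total_t
    return sigmoid(diff + logit(race_winrate))

def update(team1, team2, team1_won, race_winrate=0.5):
    """One gradient step on the prediction error after a game."""
    p = predict(team1, team2, race_winrate)
    err = (1.0 if team1_won else 0.0) - p
    total_t = sum(t for _, t in team1 + team2)
    new1 = [(s + (t / total_t) * err, t) for s, t in team1]
    new2 = [(s - (t / total_t) * err, t) for s, t in team2]
    return new1, new2
```

For an evenly matched game with a 0.5 race prior, the prediction is 0.5, so a win moves each full-time player's skill by only t_i/SUM(t_i) times 0.5; a foregone conclusion moves it by almost nothing.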
This system has a bunch of desirable properties:
1. If one team smashes another, but the teams were stacked to make that outcome almost certain, nobody’s skill level changes at the end of the round. Only unexpected victories change skill levels.
2. Nothing you do during the round other than winning the game has any effect on your skill level. This means that there’s no incentive to rack up kills at the end of the game rather than finishing it, or to play skulk instead of gorge, or to go offense rather than build.
3. The effect on your score is determined by the time you spent playing the round rather than your points, with the beginning of the game weighted much higher than the end. This means that it doesn’t harm you to join a game that is already lost, because you’ll have played for a very small fraction of the weighted time spent in the game.
Comments
I had heard of a proposal for a skill system that didn't really follow a prediction, so to speak. Basically, you averaged the skill of each team. Then, depending on the size of the delta, you would gain or lose skill based on a multiplier. So if the delta was high and you lost/performed poorly, your skill would drop more than if the teams were even.
The tough part is: how do you measure skill in NS2? Someone can gorge and do nothing but support, with a usually mediocre KDA but a high score, so basing skill off KDA wouldn't work well. On the other hand, someone who YOLOs the entire game by himself and doesn't help the team with anything other than kills shouldn't gain skill, because he failed to build structures or weld teammates.
For instance, I could launch NS2, join an unadmined server, leave a book on my F1 key (which is bound to j1), leave my computer on for a day, and gain points.
Two thumbs up from Roo
Then perhaps "round time" should be used? While a person can still be AFK, there is a strong likelihood they will be kicked from the active round. Maybe even have a requirement of at least 12 people being in the round for it to count?
How will that make a victory... by yourself? You join a match in play... stay AFK and get kicked, either automatically or by vote.
Your trick is hardly abusing the proposed system.
I am not saying I condone this way of thinking; these are just examples of how to abuse the proposed system.
That makes sense, and is easy to incorporate. You just change your prediction for the winrate of each race based on the game size, and the rest of the model works as expected.
An Elo system based on wins and losses should approach the correct rankings given enough games, but we might not have the population to get there. The second issue is that this does nothing to accurately rank people who play exclusively with the same group of people.
"or to go offense rather than build." Is this applicable to modern 20-player public servers? "very small fraction of the weighted time spent in the game" Thank you, I thought the opposite.
With regards to people's comments about exploiting/breaking this system, it is pretty much impossible to have a system that isn't in some way exploitable. As long as someone knows how it's calculated, it's not hard to work out the most effective way to score highly by just ticking the right boxes. With that being said, in any game the only thing that makes sense is awarding skill for a victory, and then using the skill system to weight the skill gained/lost based on how likely it was for that team to win. So (in a very crude explanation) if you have a person or group of persons farming skill, they will build up a skill level, but if they are playing games against teams whose skill level indicates they are likely to win, their skill gain from winning is negligible. This makes the most effective way to skill up playing games against teams at or above their own skill level. This is what a system like Elo is designed to achieve, but it falls down in games like NS2 because Elo is designed for chess, a 1v1 scenario. Again, another reason why looking into TrueSkill can be really insightful, as it's designed for team games.
In short, however, no skill system will be perfect. But provided it rewards players for playing against the odds and doesn't penalise players for losing a game they stood no chance in, it will be a much more useful metric than what we have now.
To all those saying that it doesn't accurately represent a player's impact on the game: over time it will. With a large number of games played, a player should have a consistent impact on the games they're in, and their 'skill rating' should reflect that impact when compared to other players.
@joshhh and @GORGEous will be very happy that this system is weighted. They tell me every day that it is their biggest gripe with the current system.
In a system like this you shouldn't lose skill for playing on a team with far lower skill than yours against a team with higher skill. In this scenario the expected outcome is a loss for you, so that happening is not unusual and skill ratings remain mostly unchanged. However, if you do win, the fact that that's unexpected should reward you for winning the game against the "skill odds" and increase your rating accordingly.
I have a few questions:
- How would you initialize the whole system, or any new player that joins NS2? P = 0.5 (50%)? What about s_i?
- winrate_of_team_1s_race, what's that? The winrate of Marines vs. Aliens? How would you measure that? UWE probably has the numbers for its servers, but how accurate would those be?
@Golden, don't lie... you love the current system.
Yeah, I'm not sure you quite understood it... Besides, if you think those things don't happen now, you'd be wrong. The glory of a weighted time-in-round game is that the stats are already 90% recorded by the time you even know what the outcome is. And if it's an obvious stomp, it won't make any difference to your stats anyway (nor the stompers' stats).
Essentially: there is nothing to lose from being on the losing team in a stacked game. There is nothing to gain from being on the winning team in a stacked game. Winning against a skill-stacked opposition would give the underdogs a big boost. If you're on a skill-stacked team, best watch out for base rushes.
Careful handling of commander or part-round commanders would need to be included, though.
If you wanted to fix this issue (and in general increase the speed of convergence), you could run the algorithm iteratively rather than having a single update for each game. The process is:
1. Predict the outcome of all the games in the database
2. Update skill levels based on the new predictions and the actual outcomes. (and repeat.)
This will cause the skill levels to converge to the most accurate possible values given the data available. The other nice consequence is that it doesn't take many games between NSL folks and pubbers to realize that that subgraph is of much higher skill level. As the algorithm runs, skill will flow across those wins and increase the skill of the whole NSL subgraph.
There are a few issues with doing this, though. The model becomes less predictable, because playing a single game changes the graph structure and so can cause swings in skill level rather than just small updates (even if these swings are on average more accurate). Another issue is that you have to have appropriate priors in place so that someone who has only won games (but played few of them) doesn't end up with infinite skill.
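A minimal sketch of this iterative batch fit (simplified to unweighted skills with no time weighting or race prior; the learning rate and iteration count are my assumptions):

```python
import math

def batch_rank(games, n_players, iters=500, lr=0.5):
    """Iteratively refit skills against the whole game database.

    games: list of (team1_ids, team2_ids, team1_won) tuples.
    Each pass re-predicts every game with the current skills and nudges
    each participant by the prediction error, so skill can flow across
    wins between otherwise separate player pools (e.g. NSL vs. pubs).
    """
    skills = [0.0] * n_players
    for _ in range(iters):
        for team1, team2, team1_won in games:
            n = len(team1) + len(team2)
            diff = (sum(skills[i] for i in team1)
                    - sum(skills[i] for i in team2)) / n
            p = 1.0 / (1.0 + math.exp(-diff))
            err = (1.0 if team1_won else 0.0) - p
            step = lr * err / n
            for i in team1:
                skills[i] += step
            for i in team2:
                skills[i] -= step
    return skills
```

With no prior, a player who has only ever won keeps gaining skill every pass, which is exactly the unbounded-skill problem noted above; a real implementation would add regularization.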
That's a good question. I'd probably initialize the skill levels to be proportional to log(1+playtime), so if you've played for 1000 hours, you're worth about 3 of someone who has played for 10 hours by default, and on someone's first game they are effectively considered AFK (0 skill).
You'd want to make 0 skill the minimum I think, and this would have the nice property that when someone first joins the game, their skill can only go up, giving them a nice incentive to keep at it.
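That initialization and floor might look like this (a sketch; measuring playtime in hours is an assumption):

```python
import math

def initial_skill(hours_played):
    """Prior skill proportional to log(1 + playtime): a 1000-hour player
    starts worth roughly 3x a 10-hour player, and a brand-new player
    starts at 0 (effectively considered AFK)."""
    return math.log1p(hours_played)

def clamp_skill(s):
    """Floor skill at 0 so a new player's rating can only go up at first."""
    return max(0.0, s)
```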