Proposal for abuse-proof skill system.
moultano
Update: I wrote up a much easier to read version of this, with some extensions. Read that doc instead of all the stuff below.
Warning: math ahead.
I have a proposal for a skill system based on Elo but tweaked for NS2. (I work at Google writing search ranking systems, so I know a little bit about how to do this sort of thing.) For the time being I've avoided adding anything having to do with commander skill, instead treating the commander like just another player, but that would be easy to add.
I believe this formula to be mostly abuse-proof, by which I mean that the only incentive each player has is to play fair games (i.e. not stacked) and to try to win them.
The general framework is to predict the probability of the outcome of the game based on the skill of the players. The update rule for the players' skill levels is determined by the gradient of the error on that prediction.
Notation:
P is the probability of a victory by team 1.
Game_outcome is a variable that is 1 if team 1 wins, 0 if team 1 loses.
s_i is the skill of the ith player.
logit(p) is the logit (log-odds) function, log(p / (1-p))
t_i is the time spent in the round (probably with an exponential weighting of each second, so that being in the round at the beginning counts more than being in the round at the end. Integrals of exponentials are easy, so you can implement this via t_i = a^(-time_the_player_entered) - a^(-time_the_player_quit), where a is some constant that makes sense.)
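The exponentially weighted playtime could be computed with a short sketch like this (the decay base `a` and the choice of minutes as the time unit are my assumptions, not from the original post):

```python
def time_weight(t_enter, t_quit, a=1.1):
    """Exponentially weighted time spent in the round.

    Integrating a per-second weight of a**(-t) from t_enter to t_quit
    gives (a**(-t_enter) - a**(-t_quit)) / ln(a); the 1/ln(a) factor is
    the same for every player, so it is dropped. Times are minutes since
    round start; a > 1 is an assumed decay base, so early minutes count
    more than late ones.
    """
    return a ** (-t_enter) - a ** (-t_quit)

# Playing a whole 30-minute round far outweighs joining for the last 5:
full = time_weight(0, 30)   # ≈ 0.94
late = time_weight(25, 30)  # ≈ 0.035
```

Note the weights are additive: splitting a stay into two intervals gives the same total weight as one continuous stay.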
logit(P) = (SUM_over_team1(t_i * s_i) - SUM_over_team2(t_i * s_i)) / SUM(t_i) + logit(winrate_of_team_1s_race)
The error on the prediction is Game_outcome - P
If you’re familiar with the math of logistic regression, the update rule for players on team 1 becomes
s_i <- s_i + t_i / SUM(t_i) * (Game_outcome - P)
and the same with the sign flipped for players on team 2.
What does this mean in English? We predict the outcome of the game based on the players’ skill and how long they each played in the game. After the game, we update each player’s skill by the product of the fraction of the game they played and the difference between our prediction and what actually happened.
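As a concrete sketch of the prediction and update rule above (the function and variable names are mine, and the race-winrate prior is treated as an assumed input that a real implementation would estimate from data):

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def logit(p):
    return math.log(p / (1.0 - p))

def predict(team1, team2, race_winrate=0.5):
    """P(team 1 wins). Each team is a list of (skill, weighted_time) pairs."""
    total_t = sum(t for _, t in team1 + team2)
    diff = (sum(s * t for s, t in team1)
            - sum(s * t for s, t in team2)) / total_t
    return sigmoid(diff + logit(race_winrate))

def update(team1, team2, team1_won, race_winrate=0.5):
    """One gradient step on the prediction error after a game."""
    p = predict(team1, team2, race_winrate)
    err = (1.0 if team1_won else 0.0) - p
    total_t = sum(t for _, t in team1 + team2)
    new1 = [(s + (t / total_t) * err, t) for s, t in team1]
    new2 = [(s - (t / total_t) * err, t) for s, t in team2]
    return new1, new2
```

For an evenly matched game with a 0.5 race prior, the prediction is 0.5, so a win moves each full-time player's skill by only t_i/SUM(t_i) times 0.5; a foregone conclusion moves it by almost nothing.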
This system has a bunch of desirable properties:
1. If one team smashes another, but the teams were stacked to make that outcome almost certain, nobody’s skill level changes at the end of the round. Only unexpected victories change skill levels.
2. Nothing you do during the round other than winning the game has any effect on your skill level. This means that there’s no incentive to rack up kills at the end of the game rather than finishing it, or to play skulk instead of gorge, or to go offense rather than build.
3. The effect on your score is determined by the time you spent playing the round rather than your points, with the beginning of the game weighted much higher than the end. This means that it doesn’t harm you to join a game that is already lost, because you’ll have played for a very small fraction of the weighted time spent in the game.
Comments
I had heard of a proposal for a skill system that didn't really follow a prediction, so to speak. Basically, you averaged the skill of each team. Then, depending on the size of the delta, you would gain or lose skill based on a multiplier. So if the delta was high and you lost/performed poorly, your skill would drop more than if the teams were even.
The tough part is: how do you measure skill in NS2? Someone can gorge and do nothing but support, with a usually mediocre KDA but a high score, so basing skill off KDA wouldn't work well. On the other hand, someone who YOLOs the entire game by himself and doesn't help the team with anything other than kills shouldn't gain skill, because he failed to build structures or weld teammates.
For instance, I could launch NS2, join an unadmined server, leave a book on my F1 key (which is bound to j1), leave my computer on for a day, and gain points.
Two thumbs up from Roo
Then perhaps "round time" should be used? While a person can still be AFK, there is a strong likelihood they will be kicked from the active round. Maybe even have a requirement of at least 12 people being in the round for it to count?
How will that make a victory... by yourself? You join a match in play... stay AFK and get kicked, either automatically or by vote.
Your trick is hardly abusing the proposed system.
I am not saying I condone this way of thinking; these are just examples of how to abuse the proposed system.
That makes sense, and is easy to incorporate. You just change your prediction for the winrate of each race based on the game size, and the rest of the model works as expected.
An Elo system based on wins and losses should approach the correct rankings given enough games, but we might not have the population to get there. The second issue is that this does nothing to accurately rank people who play exclusively with the same group of people.
"or to go offense rather than build." Is this applicable to modern 20-player public servers? "very small fraction of the weighted time spent in the game" Thank you, I thought the opposite.
With regards to people's comments about exploiting/breaking this system, it is pretty much impossible to have a system that isn't in some way exploitable. As long as someone knows how it's calculated, it's not hard to work out the most effective way to score highly by just ticking the right boxes. With that being said, in any game the only thing that makes sense is awarding skill for a victory, and then using the skill system to weight the skill gained/lost based on how likely it was for that team to win. So (in a very crude explanation) if you have a person or group of persons farming skill, they will build up a skill level, but if they are playing games against teams whose skill level indicates they are likely to win, their skill gain from winning is negligible. This makes the most effective way to skill up playing games against teams at or above their own skill level. This is what a system like Elo is designed to achieve, but it falls down in games like NS2 because Elo is designed for chess, a 1v1 scenario. Again, another reason why looking into TrueSkill can be really insightful, as it's designed for team games.
In short, however, no skill system will be perfect. But provided it rewards players for playing against the odds and doesn't penalise players for losing a game they stood no chance in, it will be a much more useful metric than what we have now.
To all those saying that it doesn't accurately represent a player's impact on the game: over time it will. With a large number of games played, a player should have a consistent impact on the games they're in, and their 'skill rating' should reflect that impact when compared to other players.
@joshhh and @GORGEous will be very happy that this system is weighted. They tell me every day that it is their biggest gripe with the current system.
In a system like this you shouldn't lose skill for playing on a team with far lower skill than yours against a team with higher skill. In this scenario the expected outcome is a loss for you, so that happening is not unusual and skill ratings remain mostly unchanged. However, if you do win, the fact that that's unexpected should reward you for winning the game against the "skill odds" and increase your rating accordingly.
I have a few questions:
- How would you initialize the whole system, or any new player that joins NS2? P = 0.5 (50%)? What about s_i?
- winrate_of_team_1s_race, what's that? The winrate of Marines vs. Aliens? How would you measure that? UWE probably has the numbers for its servers, but how accurate would those be?
@Golden, don't lie... you love the current system.
Yeah, I'm not sure you quite understood it... Besides, if you think those things don't happen now, you'd be wrong. The glory of a weighted time-in-round game is that the stats are already 90% recorded by the time you even know what the outcome is. And if it's an obvious stomp, it won't make any difference to your stats anyway (nor the stompers' stats).
Essentially: there is nothing to lose from being on the losing team in a stacked game. There is nothing to gain from being on the winning team in a stacked game. Winning against a skill-stacked opposition would give the underdogs a big boost. If you're on a skill-stacked team, best watch out for base rushes.
Careful handling of commander or part-round commanders would need to be included, though.
If you wanted to fix this issue (and in general increase the speed of convergence), you could run the algorithm iteratively rather than having a single update for each game. The process is:
1. Predict the outcome of all the games in the database
2. Update skill levels based on the new predictions and the actual outcomes. (and repeat.)
This will cause the skill levels to converge to the most accurate possible values given the data available. The other nice consequence is that it doesn't take many games between NSL folks and pubbers to realize that that subgraph is of much higher skill level. As the algorithm runs, skill will flow across those wins and increase the skill of the whole NSL subgraph.
There are a few issues with doing this, though. The model becomes less predictable, because playing a single game changes the graph structure and so can cause swings in skill level rather than just small updates (even if these swings are on average more accurate). Another issue is that you have to have appropriate priors in place so that someone who has only won games (but played few of them) doesn't end up with infinite skill.
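A minimal sketch of this iterative batch fit (simplified to unweighted skills with no time weighting or race prior; the learning rate and iteration count are my assumptions):

```python
import math

def batch_rank(games, n_players, iters=500, lr=0.5):
    """Iteratively refit skills against the whole game database.

    games: list of (team1_ids, team2_ids, team1_won) tuples.
    Each pass re-predicts every game with the current skills and nudges
    each participant by the prediction error, so skill can flow across
    wins between otherwise separate player pools (e.g. NSL vs. pubs).
    """
    skills = [0.0] * n_players
    for _ in range(iters):
        for team1, team2, team1_won in games:
            n = len(team1) + len(team2)
            diff = (sum(skills[i] for i in team1)
                    - sum(skills[i] for i in team2)) / n
            p = 1.0 / (1.0 + math.exp(-diff))
            err = (1.0 if team1_won else 0.0) - p
            step = lr * err / n
            for i in team1:
                skills[i] += step
            for i in team2:
                skills[i] -= step
    return skills
```

With no prior, a player who has only ever won keeps gaining skill every pass, which is exactly the unbounded-skill problem noted above; a real implementation would add regularization.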
That's a good question. I'd probably initialize the skill levels to be proportional to log(1+playtime), so if you've played for 1000 hours, you're worth about 3 of someone who has played for 10 hours by default, and on someone's first game they are effectively considered AFK (0 skill).
You'd want to make 0 skill the minimum I think, and this would have the nice property that when someone first joins the game, their skill can only go up, giving them a nice incentive to keep at it.
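That initialization and floor might look like this (a sketch; measuring playtime in hours is an assumption):

```python
import math

def initial_skill(hours_played):
    """Prior skill proportional to log(1 + playtime): a 1000-hour player
    starts worth roughly 3x a 10-hour player, and a brand-new player
    starts at 0 (effectively considered AFK)."""
    return math.log1p(hours_played)

def clamp_skill(s):
    """Floor skill at 0 so a new player's rating can only go up at first."""
    return max(0.0, s)
```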