I would suggest that OP refrain from using terms like 'abuse' otherwise @IronHorse might get confused. Thank god you didn't characterise your proposed system as 'unexploitable'.
People will just go back to the ready room to join the other team, or just disconnect from the server.
This raises the question: how does the proposed system handle teamswitches or d/cs before round finish? Does it still update your score based on the time you played on the team, even if you're not on that team or in the game at round end? If so, this might not be so much of a problem.
If not, measures could still be put in place to mitigate manipulation via d/c or teamswitch: if a d/c or teamswitch is performed >x minutes/seconds before round end (and the team you left goes on to lose), then your skill will still be updated based on the original team.
That makes sense, and is easy to incorporate. You just change your prediction for the winrate of each race based on the game size, and the rest of the model works as expected.
Definitely make maps a parameter in this too, as winrates will be different on different maps with different size teams. I.e. winrate(refinery,6,6) =/= winrate(veil,6,6) just as winrate(refinery,6,6) =/= winrate(refinery,12,12).
Also, it might be worth seeing if it's possible to incorporate skill in there too (in determining what the 'winrate' is). Competitive winrates can be different to pub winrates, not just because it's 6v6. If you can determine relatively accurately what the winrate should be based on skill levels in the server, said winrate would be a more faithful measure (than one that doesn't consider skill) for what you're trying to achieve, which I assume is to account for the effect that being on either the alien or marine team has on your chance of success.
However, I guess this may run into a few problems. One is that winrates based on skill might be skewed by historical data where the skill levels are not yet updated to where they need to be to accurately reflect the players' actual skill (although I guess this would become less of a problem as the skills update over time). Another is that adding too many parameters, especially one that is procedurally updated like skill, might overconstrain the winrate calculation (such that you might be better off just sticking with map and playercount). Having said that, I could be overcomplicating it and/or what I just said might not even make sense - need to think about it a bit more.
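To make the map/playercount idea concrete, here is a rough sketch of a winrate lookup keyed on (map, marine count, alien count). It falls back to a coarser bucket when a specific combination has too few recorded rounds, which is one way to live with the limited-data worry above. The class name, the `min_games` threshold and the fallback order are all assumptions for illustration, not part of the proposed system.

```python
from collections import defaultdict


class WinrateTable:
    """Tracks marine wins per (map, marines, aliens) bucket, with coarser fallbacks."""

    def __init__(self, min_games=30):
        # min_games is a hypothetical threshold below which a bucket is not trusted.
        self.min_games = min_games
        self.games = defaultdict(int)
        self.marine_wins = defaultdict(int)

    @staticmethod
    def _keys(map_name, marines, aliens):
        # Most specific bucket first, then "any map at this size", then global.
        return [(map_name, marines, aliens), (None, marines, aliens), (None, None, None)]

    def record(self, map_name, marines, aliens, marines_won):
        for key in self._keys(map_name, marines, aliens):
            self.games[key] += 1
            self.marine_wins[key] += int(marines_won)

    def marine_winrate(self, map_name, marines, aliens):
        for key in self._keys(map_name, marines, aliens):
            played = self.games.get(key, 0)
            if played >= self.min_games:
                return self.marine_wins.get(key, 0) / played
        return 0.5  # no usable data at all: assume an even matchup


table = WinrateTable(min_games=4)
for won in [True, False, False, False]:
    table.record("refinery", 6, 6, won)
```

With this toy data, `table.marine_winrate("refinery", 6, 6)` uses the refinery 6v6 bucket directly, while a query for veil at 6v6 falls back to the "any map, 6v6" bucket until veil has enough rounds of its own.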
That's how I would think you would do it. If you switch teams after the game has been going for a while, you'll get most of the credit for the team you were on first, since the exponentially weighted bulk of the time was played on that team. You'd just need to track every time the player joins or leaves a team.
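As a rough illustration of the join/leave tracking described above, the sketch below gives each team a share of credit equal to the integral of an exponentially decaying weight over the time the player spent on it. The decay form `exp(-t / tau)`, the value of `tau`, and the `(team, join_time, leave_time)` interval format are assumptions; the thread only says the weighting is exponential and that early time counts for the bulk.

```python
import math


def team_credit(intervals, tau=300.0):
    """Return {team: credit} for one player in one round.

    intervals: list of (team, join_time, leave_time), in seconds since round start.
    tau: hypothetical decay constant; time early in the round weighs the most.
    """
    credit = {}
    for team, start, end in intervals:
        # Closed-form integral of exp(-t / tau) from start to end.
        weight = tau * (math.exp(-start / tau) - math.exp(-end / tau))
        credit[team] = credit.get(team, 0.0) + weight
    return credit


# A player who spends the first 10 minutes on marines, then switches to aliens.
credit = team_credit([("marines", 0.0, 600.0), ("aliens", 600.0, 1200.0)])
marine_share = credit["marines"] / (credit["marines"] + credit["aliens"])
```

In this example most of the credit stays with marines even though the player spent equal wall-clock time on both teams, so switching late to the winning side buys very little.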
Yeah, there's lots you could do here, but you're limited by the amount of data you have. In addition, any system like this has multiple goals: You want it to accurately track player skill. You want to encourage the right behaviors. And you want the players to trust it. The biggest downside in my mind of making it complicated is that players won't trust it as much because they can't determine just by looking at the deltas that it's doing something reasonable.
That said, the nice thing about framing it as a logistic prediction model is that you can basically throw in a term for whatever feature you want, and learn its coefficient from the data using exactly the same math you use to determine the players skill.
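A minimal sketch of what "throw in a term for whatever feature you want" could look like: every player skill and every extra feature (here a map flag) is just a named coefficient in one logistic model, and all of them are nudged by the same gradient step. The feature names, the +1/-1 encoding for marine/alien players and the learning rate are assumptions for illustration, not the formulas from the original proposal.

```python
import math


def predict(weights, features):
    """Probability that marines win, given feature values and current weights."""
    z = sum(weights.get(name, 0.0) * value for name, value in features.items())
    return 1.0 / (1.0 + math.exp(-z))


def sgd_step(weights, features, marines_won, learning_rate=0.1):
    """One stochastic gradient step on the log-likelihood; returns the prior prediction."""
    p = predict(weights, features)
    error = (1.0 if marines_won else 0.0) - p
    for name, value in features.items():
        # Same update rule for player skills and for any other feature.
        weights[name] = weights.get(name, 0.0) + learning_rate * error * value
    return p


weights = {}
# Hypothetical round: alice on marines (+1), bob on aliens (-1), played on veil.
round_features = {"skill:alice": 1.0, "skill:bob": -1.0, "map:veil": 1.0}
for _ in range(50):
    sgd_step(weights, round_features, marines_won=True)
```

After repeated marine wins the model raises alice's coefficient, lowers bob's, and also learns a positive marine bias for the map, all with one update rule.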
Great post, this is the only type of ranking system that would make any sense. I had a discussion along similar lines with a couple of other PTs the other day, looking at a system that is used for ranking and matchmaking on Xbox Live. The system is called TrueSkill, and there are a lot of resources about the maths behind it that you can easily find with a quick Google search. I did try reading through it myself but, similarly to @NeXuS, I skipped all the stats I could at uni in favour of anything else, so more advanced stats take a bit of time for me to digest. So @moultano, I'm not sure if you looked into this during your research, but it may be worth a read if not; there are a lot of frameworks for rating/matchmaking systems coded up which you can play around with.
Yes, I'm a little familiar with TrueSkill. It's unfortunately patented I believe, so I'm not inclined to learn more about it (despite it being basic statistics ... grrr.) The main "innovation" in trueskill is that it tracks both the mean and variance of the player's skill, and updates them over time. The quantitative effect of this is that when a new player joins, rather than the model saying "we think this player is bad because they are new" the model says "we have no idea whether this player is good or bad". As a result, for the first several games the new player plays, Trueskill updates the new player's skill level very quickly, but not that of their opponents. Other than that, the main difference between it and some other systems is that it uses the gaussian distribution rather than the logistic distribution (which I've used here.) I think that is a bad choice, but it's really a matter of taste.
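For anyone curious what the gaussian-versus-logistic choice changes in practice, here is a small comparison of the two curves that map a skill difference to a win probability. This is only a sketch of the two link functions, not of TrueSkill itself; the function names and the choice to match the two curves by standard deviation are assumptions.

```python
import math

# Standard deviation of a logistic distribution with scale 1.
LOGISTIC_STD = math.pi / math.sqrt(3.0)


def logistic_win_prob(skill_diff, scale=1.0):
    """Win probability under a logistic model of the skill difference."""
    return 1.0 / (1.0 + math.exp(-skill_diff / scale))


def gaussian_win_prob(skill_diff, sigma=LOGISTIC_STD):
    """Win probability under a gaussian model (normal CDF via math.erf)."""
    return 0.5 * (1.0 + math.erf(skill_diff / (sigma * math.sqrt(2.0))))


for diff in (0.0, 2.0, 4.0, 8.0):
    print(diff, round(logistic_win_prob(diff), 5), round(gaussian_win_prob(diff), 5))
```

Both curves agree at a zero skill difference, but the gaussian one approaches certainty faster for large gaps, so it treats a big upset as more surprising than the logistic one does.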
IronHorse | Developer, QA Manager, Technical Support & contributor | Join Date: 2010-05-08 | Member: 71669 | Members, Super Administrators, Forum Admins, Forum Moderators, NS2 Developer, NS2 Playtester, Squad Five Blue, Subnautica Playtester, Subnautica PT Lead, Pistachionauts
Great work!
This, combined with a more restricted version of the official forced balanced teams vote would really do wonders in most pubs.
Not to mention it could seriously aid in the matchmaking system
Now let's see who we can convince to make this official..
@d0ped0g Don't worry, unlike you, moultano doesn't jump into a topic using the same terms.. but meaning something completely different than what was being discussed and then get angry when no one else is tracking. Also yes, so far his system is pretty close to "unexploitable" in the meaning that I used for 3 threads and over 15 posts. Unlike the current system
Umm, no. You just kept derailing because you didn't agree with me. It was obvious from the context how I was using a term. Tbh, I thought you were trolling. It wasted a lot of time that could have gone into constructive discussion. For what? Getting pedantic about what words mean? Even after I explicitly stated my intended definition and that it coincides with the dictionary.
Good to know that you yourself can freely characterise a skill rank system as exploitable at your own leisure (while jumping on rookie friendly servers isn't considered an ns2 'exploit') - but as soon as another forum member uses the same dictionary definition to describe a voting system you jump down their throats that they're not specifically describing something that exists on a list of exploits that could get you thrown off a server. Maybe I would have gotten a pass if I was in the PT group...
2. When it does, make a big-ass full-page news post in-game so people are aware stacking now works against them.
3. Profit! Teams will be better balanced = game is much more fun for all involved = game has longer life.
For me, personally, I find a game enjoyable about one in five. Usually the cause of the not-fun games is skill stacking -- often not deliberately, it's worth noting. People are voting or going random. And I think that's the real gem of this proposal: it'll help ensure genuinely balanced teams.
Because ultimately we play games for fun, and when they're not fun anymore, we stop. Getting onto NS2 gives me a 20% chance of a good game when I could load up another game with an 80%-100% chance, and the result is that, for me at least, I find myself playing NS2 less and less.
And that makes me a sad fatty/puppy :o3
(need gorge emoticons ftw)
Yup. It needs to happen. The sooner the better. But it also shouldn't be rushed. I think there's still room for discussion about how it should be implemented. Although ultimately moultano should do as he pleases with it, as he's the mastermind of the operation and as a Google engineer probably knows best with regard to these types of systems (i.e. he shouldn't let himself be swayed by idiots like me suggesting skill as a parameter - having thought on that, it's a bad idea).
I think when/if it does get released, UWE should be careful about characterising it as something that will solve balance/stacking issues immediately, as it won't plonk out accurate skill scores right away and will need time to develop a more educated approximation of players' skills. I do think we will see some minor improvements to the stacking problem as people become more adamant about ensuring an evenly skilled game (as you said though, stacking isn't often deliberate, so it will still happen).
If I understand it correctly, under this skill system, stacking doesn't necessarily work against a player who stacks. But a player who stacks won't be able to increase his/her skill rating. Maybe that's what you meant by 'stacking now works against them' though.
You are correct, it is patented, so that's that out of the window. The only other thing I noticed when researching TrueSkill was how it dealt with team games. While I think your model is definitely the right kind of idea, I am curious as to how you would take into account the team aspects. So this probability P of a team winning: how is it calculated, and how does it deal with teams changing, i.e. players leaving/joining? This also gets more complicated when you take commanders into account. Obviously commanding skill would be tracked separately (possibly even Marine/Alien and MarineCom/AlienCom all separately), but how you use these skill ratings to determine P is difficult. While all of the calculations made after the game is over seem pretty solid, they do rely on P being accurate, and if this isn't the case then, while it's not really abusable, it's not really a useful rating.
Take for example a game where you have 3-4 relatively good marines vs a purely average team of aliens. If the skill rating of the team is just an average of the players (including the comm), then one would expect P to be in favor of the marines, as they have the 3-4 better players. But then their commander doesn't drop any meds, gets no upgrades, only builds 2 RTs and tries to turtle with turrets while the aliens tech up and take the rest of the map. I have seen many games play out like this, where a better skilled team loses based purely on the fact their commander is off in la la land playing on his own. While yes, it's true this player's skill is likely to be lower and this would be reflected in even a basic average skill being used for P, I would argue that the commander is the one role in the game where having someone awful completely dooms a team, unless the team has someone like Godar, who in a pub game could win in 60 seconds on W0A0 with no meds/ammo.
In short, what are your thoughts on how P should be calculated?
IronHorse (edited February 2014)
@d0ped0g
This is the last time i'll be responding to your bait. Because you are obviously just derailing - why else would you call my attention using @ironhorse , to your obvious jab, in an unrelated thread?
Another user and I were speaking about a topic, one that spanned multiple threads and posts. (Go ahead, search my comment history.)
You, obviously not tracking this, jumped into the conversation replying and defending the other user but using a completely different meaning to the word that we had been using. Any other usage of that word had zero bearing on the debate we were having.
Pretending that your mistaken understanding of the usage at hand was of no concern, by saying that it's "pedantic" to suggest otherwise, reminds me of one of my favorite movie quotes:
"I want you out of my house!"
"This is not your house."
"Don't play semantics with me!"
You thought i was a troll for continuing to argue about something you obliviously jumped into without understanding the context - and I thought you were attempting to suddenly use a different definition of a term once you were found to have lost the debate.
Long story short: It's over, so get over it and cease derailing / attempting to stir it up.
I left the conversation in hopes you saw that too.. i get you're upset by it, but 1) you can leave it in that thread instead of calling for my attention in other threads or 2) you can always PM me. Neither of those are requests.
moultano | Creator of ns_shiva. | Join Date: 2002-12-14 | Member: 10806 | Members, NS1 Playtester, Contributor, Constellation, NS2 Playtester, Squad Five Blue, Reinforced - Shadow, WC 2013 - Gold, NS2 Community Developer, Pistachionauts
edited February 2014
I wrote up a google doc where I could actually put in formulas so that it's much easier to read. I also extended it to account for commanders, different map balance, and faster convergence using a prior.
That's looking good. About priors, do we have the data in hive available to use as a training set? You might be able to use a portion of past data as a training set (i.e. to generate your initial values) and validate the model against other historical data before we even go live. If that validation looks good, then it might be valid to use all historical data as a training set to generate current skill values for all players as the starting point. You'd need someone at UWE to make those data available to you, though...
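The train-then-validate idea above could be sketched roughly like this: split the historical rounds chronologically, fit on the earlier part, and score the predictions on the later part with log loss. The `end_time` field and the 80/20 split are assumptions; the actual hive schema isn't described in this thread.

```python
import math


def chronological_split(rounds, train_fraction=0.8):
    """Split rounds into (train, validation) so validation is strictly later in time."""
    ordered = sorted(rounds, key=lambda r: r["end_time"])  # "end_time" is a hypothetical field
    cut = int(len(ordered) * train_fraction)
    return ordered[:cut], ordered[cut:]


def log_loss(predictions, outcomes):
    """Mean negative log-likelihood of the outcomes (1 = marines won); lower is better."""
    eps = 1e-12
    total = 0.0
    for p, y in zip(predictions, outcomes):
        p = min(max(p, eps), 1.0 - eps)  # guard against log(0)
        total -= y * math.log(p) + (1 - y) * math.log(1.0 - p)
    return total / len(outcomes)


rounds = [{"end_time": t} for t in (5, 1, 4, 2, 3)]
train, validation = chronological_split(rounds)
baseline = log_loss([0.5, 0.5], [1, 0])  # a model that always says 50/50
```

A fitted model is only worth shipping if its validation log loss beats simple baselines such as the constant 50/50 prediction (about 0.693) or the raw historical race winrate.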
* What about a pseudo-random, Elo-based team distribution? One way or another, somebody will make a mod for that if UWE doesn't. What should they do? (I'm not a "random mod" author.)
* What about the guy who has 500+ hours but still doesn't understand a single F*** about the game except "shoot the skulk"? Yep, there are many of them, more than you think. They may trick the prediction into false assumptions. There will always be jerks like that, which is why I vote for making combat an official thing, so frag counters can frag...
If one team smashes another, but the teams were stacked to make that outcome almost certain, nobody’s skill level changes at the end of the round. Only unexpected victories change skill levels.
This won't prevent stacking. For some players it won't make stacking any less exciting.
In fact, I think you should add a "Tenacity" parameter. If the stacked team can't "finish", the losing team still gets something, because they stayed until the end and/or held out against inevitable death. They would earn more the bigger the skill gap between the 2 teams.
Some "stackers" start to drag things out longer than needed (like shooting opponents instead of the hive/base when it is clearly unnecessary: 1% life left, etc.). I think it would be better to see this kind of game (which will happen whatever skill system is used) end fast, so that nobody really gains from it except the people who can behave properly (or at least not like jerks).
3. A good commander with a bad team vs a bad commander with a good team is expected to be an even match.
I wouldn't say that. Both teams have different needs from the commander. So this example doesn't work.
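To put numbers on the quoted rule that only unexpected victories change skill levels, here is a minimal sketch of the usual "outcome minus prediction" update. The step size `k` and the example probabilities are assumptions for illustration.

```python
def skill_delta(predicted_win_prob, won, k=1.0):
    """Skill change for a player whose team was predicted to win with the given probability.

    k is a hypothetical step size; the change is proportional to how surprising the result was.
    """
    outcome = 1.0 if won else 0.0
    return k * (outcome - predicted_win_prob)


stacked_win = skill_delta(0.95, won=True)    # expected win: tiny gain
even_win = skill_delta(0.50, won=True)       # coin-flip win: moderate gain
stacked_loss = skill_delta(0.95, won=False)  # upset loss: large drop
```

So stacking is not punished directly, but a stacked win is worth almost nothing while a stacked loss costs a lot, which is the sense in which stacking stops paying off.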
I would love to get the data to play with. Particularly if I could validate a model that treats the commander's skill in some more complicated way relative to the team.
can I just call you both drama queens and be done with it?
Somehow I came into access to a large db with stats of various NS2 rounds, called ns2stats.com. Look here https://docs.google.com/document/d/13v9TF56gqykSyg0uysIzeZlEQmJL5ge5wLfM-uRaAwo/ for the web API; it allows you to dump all the info you need directly from the db. ('PlayerRound' and 'Round' would be the main tables you need.)
If you need further access you might want to contact @Sint (a.k.a. Synomi). I think he will be able to help you with all the data you need for this.
I pinged him to see if I can get a dump of the database.
In the meantime, I fleshed out the bit on running the algorithm iteratively and including a gamma distribution prior. This will let us get accurate skill values much more quickly.
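For readers without access to the doc, here is a heavily simplified sketch of what an iterative fit with a gamma prior can look like. It is NOT the algorithm from the google doc: it assumes plain 1v1 results, a Bradley-Terry model where P(a beats b) = skill_a / (skill_a + skill_b), and a Gamma(shape, rate) prior on every skill. The function name and the prior values are made up for illustration.

```python
def fit_skills(results, players, shape=2.0, rate=1.0, iterations=100):
    """Iteratively estimate positive skills from (winner, loser) pairs.

    Each pass applies the standard minorize-maximize update for the
    Bradley-Terry model, with a gamma prior pulling sparse players
    toward the prior mode (shape - 1) / rate.
    """
    skill = {p: shape / rate for p in players}
    wins = {p: 0 for p in players}
    for winner, _ in results:
        wins[winner] += 1

    for _ in range(iterations):
        updated = {}
        for p in players:
            denominator = rate
            for winner, loser in results:
                if p == winner or p == loser:
                    denominator += 1.0 / (skill[winner] + skill[loser])
            updated[p] = (shape - 1.0 + wins[p]) / denominator
        skill = updated
    return skill


# "a" beats "b" three times out of four; "c" has never played.
skills = fit_skills([("a", "b")] * 3 + [("b", "a")], players=["a", "b", "c"])
```

The point of the prior shows up with player "c": with no games at all, the estimate simply sits at the prior mode instead of being undefined, and players with only a few games are kept from swinging to extreme values.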
I see no reason why this ranking system shouldn't be added to the game. It's accurate while also removing the incentives of team stacking and kill farming. It might be a bit wonky at first as it determines everyone's initial skill value but it's pretty much exactly what NS2 needs.
One possible thing that could be added: accounting for commander skill. It's not necessarily the worst thing in the world if they are treated like another player, but given their bigger influence on the game (i.e. a bad commander can pretty much single-handedly lose a game for a stacked team) it would be ideal for them to be treated separately.
I'm not exactly great at coming up with these algorithms, but maybe comparing a commander's results to the skill level of his team? So a commander gains skill for winning with a less skilled team but loses skill for losing with a skilled team. And field players would be less penalized for losing if they are playing for an unskilled commander.
Scaling the gain or loss in commander/player skill rating based on the average skill rating of the team actually sounds like a great way to rank commanders, and not punish everyone on the team for losing with a new/bad commander.
It would be hard to rank commanders by anything other than win/loss, because all the variables of commanding (fights won/lost, medpack/ammo requests met quickly) depend mostly on the players on your team, who may be bad or may spam requests, which would skew any metric.
Yes, one of my favorite aspects of this skill system is that it only relies on wins vs. losses and everything else (points, kills, etc.) is discounted. Ultimately NS2 comes down to winning or losing, and it's possible to help your team win despite a weak "statistical" performance. Likewise a player could have a great individual performance by the numbers but do nothing to help his team win. So I would definitely say that commanders, like field players, should be judged only on wins vs losses relative to the skill of the two teams
I think the way to do this would be to make the prediction for the outcome of the game proportional to the product of the commander skill and the team skill. This makes it so that a team with both a good commander and good players is predicted to perform much much better than a team that is lacking either.
This is the sort of thing, though, where you have to do some analysis in advance to see whether it actually predicts game outcomes. If it turns out to be a bad model, then the outcome becomes somewhat exploitable by finding the situations in which the model deviates from reality.
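A minimal sketch of the product idea, assuming positive skill values and a simple ratio for turning two team strengths into a win probability. Using the mean of the field players' skills as "team skill" is an assumption, as are the function names and the example numbers; whether this form predicts real rounds is exactly the open question.

```python
def team_strength(commander_skill, player_skills):
    """Strength as commander skill times the average field-player skill (assumed form)."""
    return commander_skill * sum(player_skills) / len(player_skills)


def win_probability(strength_a, strength_b):
    """Probability that team A wins, as its share of the total strength."""
    return strength_a / (strength_a + strength_b)


good, bad = 2.0, 1.0
both_good = team_strength(good, [good, good])
good_comm_bad_team = team_strength(good, [bad, bad])
bad_comm_good_team = team_strength(bad, [good, good])

p_mixed = win_probability(good_comm_bad_team, bad_comm_good_team)
p_strong = win_probability(both_good, good_comm_bad_team)
```

Under this form a team that is strong in both roles is favoured heavily over a team lacking either, while a good commander with a bad team against a bad commander with a good team comes out as an even match, which is the behaviour that would need checking against historical data.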
It's definitely a bit tricky to compare the value of commanders to the value of a team. For example, as you point out in the google doc, a good commander with a bad team will likely lose to a bad commander with a good team, since there isn't a whole lot a commander can do to beat out a major skill imbalance. However, if the bad commander is really, really bad to the point of borderline incompetence (not knowing which structures do what) he can lose the game for his team.
Also, I would say that most would probably agree that marine commander requires more skill and is more essential to victory than alien commander, although I don't know if this needs to be accounted for.
I suppose in the ideal world with infinite data points, every player could have 2 commander skill scores and 2 player skill scores (for each team). I think that would probably be too many parameters for the amount of real-world data we have, though. You've used a single player skill parameter with a global team modifier as a way around this for the players, and I don't think it's too unreasonable to do the same for commanders, really. The possibility of players being godlike marine commanders but terribad alien commanders (or vice versa) is remote at best. Of course, having a commander skill and a player skill per player doubles the number of parameters, but is a necessary addition to the model given the vast difference between the RTS and FPS parts of the game as you've noted (anyone remember fana dropping 2 more armouries in flight control instead of an IP? ).
I don't have an idea of the number of observations we have (nor the number of parameters, how many players have there been for which these values would need to be determined?) to get a feel for the observation/parameter ratio. It's also confounded somewhat by there being a relationship between games/rounds/players, meaning it's probably nigh on impossible to determine the true observation/parameter ratio.
Well actually in the ideal world we would have a score set for each combination of player/commander, team, map, and playercount. Really though other than perhaps commanding I don't see it having a large enough negative impact over the ranking system to matter. A player who plays primarily aliens will probably have a higher rank because aliens are usually stronger, but ultimately the higher rank will cause him to be matched against better marines and result in an even win chance.
Plus I believe the system already factors the historical win rates of the two races into its probabilistic determination of win chance.
@d0ped0g
This is the last time I'll be responding to your bait, because you are obviously just derailing. Why else would you call my attention, using @ironhorse, to your obvious jab in an unrelated thread?
Another user and I were discussing a topic, one which spanned multiple threads and posts. (Go ahead, search my comment history.)
You, obviously not tracking this, jumped into the conversation replying and defending the other user but using a completely different meaning to the word that we had been using. Any other usage of that word had zero bearing on the debate we were having.
Pretending that your mistaken understanding of the usage at hand was of no concern, by saying that it's "pedantic" to suggest otherwise, reminds me of one of my favorite movie quotes:
"I want you out of my house!"
"This is not your house."
"Don't play semantics with me!"
You thought I was a troll for continuing to argue about something you obliviously jumped into without understanding the context, and I thought you were attempting to suddenly use a different definition of a term once you were found to have lost the debate.
Long story short: It's over, so get over it and cease derailing / attempting to stir it up.
I left the conversation in hopes you saw that too. I get you're upset by it, but 1) you can leave it in that thread instead of calling for my attention in other threads or 2) you can always PM me. Neither of those are requests.
Eh, I'd personally call that an incredibly inaccurate retelling of events (for one, the argument was over a post I made that was completely independent of this prior discussion - I was not "jumping into" anything) but I don't wanna shit up this thread anymore with any he-said horse-said. There's been enough derailing over what started as a simple one-liner. So yeah, moving on.
This begs the question: how does the proposed system handle teamswitches or d/cs before round finish? Does it still update your score based on the time you played on the team even if you're not in that team or in the game at round end? If so, this might not be so much of a problem.
If not, measures could still be put in place to mitigate manipulation due to d/c or teamswitch - if a d/c or teamswitch is performed >x minutes/seconds before round end (resulting in a loss of your said team) then your skill will still be updated (based on the original team).
Definitely make maps a parameter in this too, as winrates will be different on different maps with different size teams. I.e. winrate(refinery,6,6) =/= winrate(veil,6,6) just as winrate(refinery,6,6) =/= winrate(refinery,12,12).
Also, it might be worth seeing if it's possible to incorporate skill in there too (in determining what the 'winrate' is). Competitive winrates can be different to pub winrates, not just because it's 6v6. If you can determine relatively accurately what the winrate should be based on skill levels in the server, said winrate would be a more faithful measure (than one that doesn't consider skill) for what you're trying to achieve, which I assume is to account for the effect that being on either alien or marine team has on your chance of success.
However, I guess this may run into a few problems, one of which being that winrates based on skill might be skewed by historical data where the skill levels are not yet updated to where they need to be to accurately reflect the players' actual skill (although I guess this would become less of a problem as the skills update over time). Another issue is that adding too many parameters, especially one that is procedurally updated like skill, might overconstrain the winrate calculation (such that you might be better off just sticking with map and playercount). Having said that, I could be overcomplicating it and/or what I just said might not even make sense - need to think about it a bit more.
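To make the winrate(map, marines, aliens) idea concrete, here is a minimal sketch of tallying an empirical alien winrate per map and team size. The record layout and the function name `alien_winrates` are made up for illustration, and the rounds listed are toy values rather than real statistics.

```python
from collections import defaultdict

def alien_winrates(rounds):
    """Return {(map, marines, aliens): fraction of rounds won by aliens}.

    `rounds` is an iterable of (map_name, marine_count, alien_count, winner)
    tuples, where winner is "aliens" or "marines". This record layout is
    hypothetical, not the format of any real stats database.
    """
    wins = defaultdict(int)
    totals = defaultdict(int)
    for map_name, marines, aliens, winner in rounds:
        key = (map_name, marines, aliens)
        totals[key] += 1
        if winner == "aliens":
            wins[key] += 1
    return {key: wins[key] / totals[key] for key in totals}

# Toy records for illustration only, not real statistics.
toy_rounds = [
    ("refinery", 6, 6, "aliens"),
    ("refinery", 6, 6, "marines"),
    ("refinery", 12, 12, "aliens"),
    ("veil", 6, 6, "marines"),
]
rates = alien_winrates(toy_rounds)
```

With real data, many (map, size) buckets would hold only a handful of rounds, which is the observation/parameter worry raised in the thread; some smoothing toward the overall winrate would probably be needed.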
That's how I would think you would do it. If you switch teams after the game has been going for a while, you'll get most of the credit for the team you were on first, since the exponentially weighted bulk of the time was played on that team. You'd just need to track every time the player joins or leaves a team.
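A rough sketch of what that bookkeeping could look like, assuming time t (seconds since round start) is weighted by exp(-t / tau), so early minutes count for more. The function name `team_credit`, the stint format, and the value of `tau` are all assumptions for illustration, not taken from the proposal.

```python
import math

def team_credit(stints, round_length, tau=300.0):
    """Split one round's credit between teams by exponentially weighted time.

    stints: list of (team, join_time, leave_time), seconds from round start.
    tau: decay constant in seconds (an assumed value); time t is weighted by
         exp(-t / tau), so early play counts for more than late play.
    Returns {team: share}. Shares are fractions of the whole round's weight
    and sum to less than 1 if the player was absent for part of the round.
    """
    def weight(start, end):
        # Integral of exp(-t / tau) from start to end.
        return tau * (math.exp(-start / tau) - math.exp(-end / tau))

    total = weight(0.0, round_length)
    credit = {}
    for team, join, leave in stints:
        leave = min(leave, round_length)
        if leave > join:
            credit[team] = credit.get(team, 0.0) + weight(join, leave) / total
    return credit

# A player spends 10 minutes on marines, then switches to aliens for the
# last 2 minutes of a 12-minute round.
shares = team_credit([("marines", 0, 600), ("aliens", 600, 720)], 720)
```

Under these assumptions the late switch earns only a few percent of the round's credit for the second team, so hopping to the winning side just before the end changes very little.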
Yeah, there's lots you could do here, but you're limited by the amount of data you have. In addition, any system like this has multiple goals: You want it to accurately track player skill. You want to encourage the right behaviors. And you want the players to trust it. The biggest downside in my mind of making it complicated is that players won't trust it as much because they can't determine just by looking at the deltas that it's doing something reasonable.
That said, the nice thing about framing it as a logistic prediction model is that you can basically throw in a term for whatever feature you want, and learn its coefficient from the data using exactly the same math you use to determine the players skill.
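As an illustration of "throwing in a term", here is a minimal sketch of a logistic model in which player skills and feature coefficients are updated by the same gradient step. The names `predict` and `update`, the feature `team1_is_aliens`, and the learning rate are hypothetical; this is plain stochastic gradient ascent on the log-likelihood, not necessarily the exact update in the proposal.

```python
import math

def predict(skills, coef, team1, team2, features):
    """P(team1 wins) under a logistic model.

    skills: {player: skill}; coef: {feature_name: coefficient};
    features: {feature_name: value}, described from team1's side.
    """
    z = sum(skills.get(p, 0.0) for p in team1)
    z -= sum(skills.get(p, 0.0) for p in team2)
    z += sum(coef.get(name, 0.0) * value for name, value in features.items())
    return 1.0 / (1.0 + math.exp(-z))

def update(skills, coef, team1, team2, features, team1_won, lr=0.1):
    """One gradient step on the log-likelihood of the observed outcome."""
    p = predict(skills, coef, team1, team2, features)
    error = (1.0 if team1_won else 0.0) - p
    for player in team1:
        skills[player] = skills.get(player, 0.0) + lr * error
    for player in team2:
        skills[player] = skills.get(player, 0.0) - lr * error
    # Feature coefficients move by the same rule as the skills.
    for name, value in features.items():
        coef[name] = coef.get(name, 0.0) + lr * error * value
    return error

skills, coef = {}, {}
err = update(skills, coef, ["a"], ["b"], {"team1_is_aliens": 1.0}, True)
```

A map or playercount term would be added the same way: one more entry in `features`, with its coefficient learned from the rounds.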
Yes, I'm a little familiar with TrueSkill. It's unfortunately patented I believe, so I'm not inclined to learn more about it (despite it being basic statistics ... grrr.) The main "innovation" in TrueSkill is that it tracks both the mean and variance of the player's skill, and updates them over time. The quantitative effect of this is that when a new player joins, rather than the model saying "we think this player is bad because they are new" the model says "we have no idea whether this player is good or bad". As a result, for the first several games the new player plays, TrueSkill updates the new player's skill level very quickly, but not that of their opponents. Other than that, the main difference between it and some other systems is that it uses the Gaussian distribution rather than the logistic distribution (which I've used here.) I think that is a bad choice, but it's really a matter of taste.
2. When it does, make a big-ass full-page news post in-game so people are aware stacking now works against them.
3. Profit! Teams will be better balanced = game is much more fun for all involved = game has longer life.
For me, personally, I find a game enjoyable about one in five. Usually the cause of the not-fun games is skill stacking -- often not deliberately, it's worth noting. People are voting or going random. And I think that's the real gem of this proposal: it'll help ensure genuinely balanced teams.
Because ultimately we play games for fun, and when they're not fun anymore, we stop. With only a 20% chance of a good game in NS2 when I could load up another game with an 80%-100% chance, the result is -- for me at least -- that I find myself playing NS2 less and less.
And that makes me a sad fatty/puppy :o3
(need gorge emoticons ftw)
This, combined with a more restricted version of the official forced balanced teams vote would really do wonders in most pubs.
Not to mention it could seriously aid in the matchmaking system
Now let's see who we can convince to make this official..
@d0ped0g Don't worry, unlike you, moultano doesn't jump into a topic using the same terms.. but meaning something completely different than what was being discussed and then get angry when no one else is tracking. Also yes, so far his system is pretty close to "unexploitable" in the meaning that I used for 3 threads and over 15 posts. Unlike the current system
Good to know that you yourself can freely characterise a skill rank system as exploitable at your own leisure (while jumping on rookie friendly servers isn't considered an ns2 'exploit') - but as soon as another forum member uses the same dictionary definition to describe a voting system you jump down their throats that they're not specifically describing something that exists on a list of exploits that could get you thrown off a server. Maybe I would have gotten a pass if I was in the PT group...
Yup. It needs to happen. The sooner the better. But it also shouldn't be rushed. I think there's still room for discussion about how it should be implemented. Although ultimately moultano should do as he pleases with it, as he's the mastermind of the operation and as a Google engineer probably knows best with regard to these types of systems (i.e. shouldn't let himself be swayed by idiots like me suggesting skill as a parameter - having thought on that it's a bad idea).
I think when/if it does get released UWE should be careful about characterising it as something that will solve balance/stacking issues immediately, as it won't plonk out accurate skill scores right away and will need time to develop a more educated approximation of players' skills. I do think we will see some minor improvements to the stacking problem as people become more adamant about ensuring an evenly skilled game (as you said though, stacking isn't often deliberate, so will still happen).
If I understand it correctly, under this skill system, stacking doesn't necessarily work against a player who stacks. But a player who stacks won't be able to increase his/her skill rating. Maybe that's what you meant by 'stacking now works against them' though.
You are correct, it is patented, so that's that out of the window. The only other thing I noticed when researching TrueSkill was how it dealt with team games. While I think your model is definitely the right kind of idea, I am curious as to how you would take into account the team aspects. So this probability P of a team winning, how is this calculated and how does it deal with teams changing, i.e. players leaving/joining? This also gets more complicated when you take commanders into account. Obviously commanding skill would be tracked separately (possibly even Marine/Alien and MarineCom/AlienCom all separately), but how you use these skill ratings to determine P is difficult. While all of the calculations made after the game is over seem pretty solid, they do rely on P being accurate, and if this isn't the case then, while it's not really abusable, it's not really a useful rating.
Take for example a game where you have 3-4 relatively good marines vs a purely average team of aliens. If the skill rating of the team is just an average of the players (including the comm) then one would expect P to be in favor of the marines, as they have the 3-4 better players. But then their commander doesn't drop any meds, gets no upgrades, only builds 2 RTs and tries to turtle with turrets while the aliens tech up and take the rest of the map. I have seen many games play out like this, where a better skilled team loses based purely on the fact their commander is off in la la land playing on his own. While yes, it's true this player's skill is likely to be lower and this would be reflected in even a basic average skill being used for P, I would argue that the commander is the one role in the game where having someone awful completely dooms a team, unless they are Godar, who in a pub game could win in 60 seconds on W0A0 with no meds/ammo.
In short, what are your thoughts on how P should be calculated?
Go read it here: https://docs.google.com/document/d/1KMVLpPvEwFtoieTsTqeD8xBheFFEuK4PqjiyzGYSWbU/edit#
@Acedude .. get over here and check this out 0.0
That's looking good. About priors, do we have the data in hive available to use as a training set? You might be able to use a portion of past data as a training set (i.e. to generate your initial values) and validate the model against other historical data before we even go live. If that validation looks good, then it might be valid to use all historical data as a training set to generate current skill values for all players as the starting point. You'd need someone at UWE to make those data available to you, though...
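The train-then-validate idea could be sketched roughly like this: split the historical rounds by time, fit on the older portion, and score the predictions on the newer portion with log loss. The `"time"` field, the function names, and the 80/20 split are assumptions for illustration; nothing here reflects the actual hive data format.

```python
import math

def chronological_split(rounds, train_fraction=0.8):
    """Split rounds into (training, validation) by time, oldest first."""
    ordered = sorted(rounds, key=lambda r: r["time"])
    cut = int(len(ordered) * train_fraction)
    return ordered[:cut], ordered[cut:]

def log_loss(predicted, actual):
    """Mean negative log-likelihood of 0/1 outcomes; lower is better."""
    eps = 1e-12  # keeps log() finite if a prediction is exactly 0 or 1
    total = 0.0
    for p, y in zip(predicted, actual):
        p = min(max(p, eps), 1.0 - eps)
        total -= y * math.log(p) + (1 - y) * math.log(1.0 - p)
    return total / len(predicted)
```

A model that always predicts 0.5 scores ln 2 (about 0.693) on this measure, so a skill model would need to beat that on the held-out rounds before its values are worth using as a starting point.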
*What about pseudo-random, Elo-based team distribution? One way or another somebody will make a mod for that if UWE doesn't. What should they do? (I'm not a "random mod" author.)
*What about the guy who has 500+ hours but still doesn't give a single F*** about anything in the game except "shoot the skulk"? There are many of them, more than you think, and they may trick the prediction into false assumptions. There will always be that kind of jerk. That is why I vote for making combat an official thing, so frag counters can frag...
This won't avoid stacking. This will not render this "not exiting" for some.
In fact I think you should add a "Tenacity" parameter: if the stacked team can't "finish", the losing team still gets something, because they stayed until the end and/or resisted in the face of inevitable death. They would get more the bigger the skill gap between the 2 teams.
Some "stackers" drag things out longer than needed (like shooting opponents instead of the hive/base when it is clearly unnecessary: 1% life, etc.). I think it would be better to see this kind of game (which will happen whatever skill system is used) end fast, so nobody really gains from it except the people who can behave properly (or at least not like jerks).
I wouldn't say that. Both teams have different needs from the commander. So this example doesn't work.
I would love to get the data to play with. Particularly if I could validate a model that treats the commander's skill in some more complicated way relative to the team.
can I just call you both drama queens and be done with it?
Somehow I came into access of a large db with stats of various NS2 rounds called ns2stats.com. Look here https://docs.google.com/document/d/13v9TF56gqykSyg0uysIzeZlEQmJL5ge5wLfM-uRaAwo/ for the web API; it allows you to dump all the info you need directly from the db. ('PlayerRound' and 'Round' would mainly be the tables you need.)
If you need further access you might want to contact @Sint (a.k.a. Synomi). I think he will be able to help you with all the data you need for this.
I pinged him to see if I can get a dump of the database.
In the meantime, I fleshed out the bit on running the algorithm iteratively and including a gamma distribution prior. This will let us get accurate skill values much more quickly.
One possible thing that could be added: accounting for commander skill. It's not necessarily the worst thing in the world if they are treated like another player, but given their bigger influence on the game (i.e. a bad commander can pretty much single-handedly lose a game for a stacked team) it would be ideal for them to be treated separately.
I'm not exactly great at coming up with these algorithms, but maybe comparing a commander's results to the skill level of his team? So a commander gains skill for winning with a less skilled team but loses skill for losing with a skilled team. And field players would be less penalized for losing if they are playing for an unskilled commander.
Scaling the gain or loss in commander/player skill rating based on the average skill rating of the team actually sounds like a great way to rank commanders, and not punish everyone on the team for losing with a new/bad commander.
It would be hard to rank commanders in any other way than win/lose, because all the variables of commanding (fights won/lost, medpack/ammo requests met quickly) depend mostly on the players on your team, who may be bad or may spam requests, which would skew any metric.
I think the way to do this would be to make the prediction for the outcome of the game proportional to the product of the commander skill and the team skill. This makes it so that a team with both a good commander and good players is predicted to perform much much better than a team that is lacking either.
This is the sort of thing, though, where you have to do some analysis in advance to see whether it actually predicts game outcomes. If it turns out to be a bad model then the outcome becomes somewhat exploitable by finding the situations in which the model deviates from reality.
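One possible reading of the product idea, sketched under loud assumptions: each side's strength is commander skill times team skill, both positive numbers, and the win probability is that side's share of the total strength. The function name and this particular way of turning strengths into a probability are illustrative guesses, not the model from the proposal.

```python
def win_probability(comm1, team1, comm2, team2):
    """P(side 1 wins) when strength = commander skill * team skill.

    All four inputs are assumed to be positive skill values on a
    multiplicative scale (hypothetical, for illustration only).
    """
    strength1 = comm1 * team1
    strength2 = comm2 * team2
    return strength1 / (strength1 + strength2)

# Against an average commander (1.0) with an average team (1.0):
both_good = win_probability(2.0, 2.0, 1.0, 1.0)       # good comm, good team
only_comm_good = win_probability(2.0, 1.0, 1.0, 1.0)  # good comm, average team
```

Here a side that is strong in both roles is favoured 4 to 1, while a side strong in only one role is favoured 2 to 1, which matches the intent that lacking either a good commander or good players costs a lot. Whether this form actually predicts game outcomes would still need checking against data, as noted above.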
Also, I would say that most would probably agree that marine commander requires more skill and is more essential to victory than alien commander, although I don't know if this needs to be accounted for.
Get a room you two. :P
Interest is fading fast because pub games are so frequently imbalanced, and pub is the majority of the game for getting and keeping new players.