Too many biases (whether they can be controlled or not):
-Commander (bad com/good com? yourself as a com?)
-Lifeform played
-Alien or marine side
-Map and spawn point
-Recent performance/not playing for a while
-Server/closed community
-Group of friends playing together
-Ping
-Close round or stomp (+/- game duration and kill graph)
...
NS2 uses an amateur system with only 1 main variable (W/L) and 1 secondary variable (the skill of the opposing team), without any other adjustment.
Nordic thinks this system is good enough. In-game experience tells us the opposite. You don't make a good MMR with only a W/L ratio; if it were that easy, devs wouldn't lose so much time on them, tweaking them with every update.
This system is like drawing a conclusion from a univariate analysis with dozens of confounding factors, without using a multivariate analysis.
Because so far it's been shown to be more accurate than Microsoft's patented system, albeit slower to achieve that accuracy.
Surely since you feel this isn't a good MMR you have a better suggestion?
Tinki has a point. I think you should have separate scores for each faction, each class, each map and as I stated before, even each server for it to be truly accurate.
It's like if you played Quake duels: you wouldn't expect your duel rating to be the same as your instagib rating. The two game modes can, for all intents and purposes, be considered different games entirely, even though they are somewhat similar.
In the same way, playing the Lerk and playing the Fade require completely different approaches from the players. So you shouldn't expect a single rating to be accurate for both classes.
But W/L is STILL the best metric to determine the ranks..
I have used the chess analogy before, because I think it's accurate. Chess is, for all intents and purposes, a solved game. No human has beaten the latest iterations of Stockfish (chess A.I.). You could literally detect every mistake a player makes during a game (in fact, chess sites DO), and rate him based on the number of mistakes he makes. And you could make it accurate as fuck..
Yet the chess community chooses to rate based on W/L in what we know as Elo. The reason: because it works.
But notice: your blitz and classical ratings are not the same... Because, for all intents and purposes, blitz and classical are two different games, although somewhat similar.
-Map and spawn point
-Ping
-Close round or stomp (+/- game duration and kill graph)
-Recent performance/not playing for a while
-Server/closed community
-Group of friends playing together
Every game has problems with these. Skill systems don't need to account for them because over a long enough play time they don't matter. They won't matter in the long run because those variables are mostly consistent.
Some of those problems other games are able to handle better. Specifically a stomp round, closed communities, and party systems. None of these really can be addressed because of our small playerbase. As I have been saying, the small playerbase is the root cause of the problem, not hive. This is true with all of these.
-Commander (bad com/good com? yourself as a com?)
-Lifeform played
-Alien or marine side
These problems are more or less unique to NS2. They also are easily fixed assuming we ever get hive 2.0. As I have mentioned, hive 2.0 is planned to have separate skill ratings for marines, aliens, commanders, and khammanders. This is a flaw, but not a flaw with the underlying skill system.
NS2 uses an amateur system with only 1 main variable (W/L) and 1 secondary variable (the skill of the opposite team) without any other adjustment.
As I understand it, you can try to apply more variables to a model to gain greater accuracy, but unless you use them correctly it can actually increase inaccuracy. The more common way to make a model is to use a few really good variables that you can easily make into a mostly accurate model. (W/L) is the best variable to use in this regard because it can indirectly measure every variable that can lead to a victory. (K/D) is also a good variable, but it is already captured in (W/L). The hive skill system is a model. As George Box once said, "All models are wrong."
We are not going to get a model much better than hive. Microsoft's TrueSkill, the industry best, is no better than hive. Riot Games uses a similar system for League of Legends.
I don't know what you guys expect. Some super complex impossible formula? We are never going to have a perfect skill system. What we have is the best we can get. Sure, it has room to improve, and I have linked proposed solutions that are ready to go in game. This does not mean the underlying skill system is useless.
It is not just your experience. I am told the end game feedback has shown that the biggest cause of a bad game is poor balance. I am not disputing that there is a bad experience with balance. I see it, you see it, everyone sees it.
As I have said many times in this thread, hive is not the cause of the poor experience. It is our small playerbase combined with a high skill ceiling game. When the highest skill levels of players have no choice but to play in the same servers as the lowest skill level of players it does not make for a good experience. How could it? The high skill players are annoyed or bored when playing with lower skilled players, and low skilled players don't like being stomped.
Our small playerbase makes it impossible to find near skilled games. If NS2 had enough players so that all skill brackets could find near skilled games, there would be a good experience. If there was that large of a playerbase, we could have matchmaking. The root cause of nearly every problem NS2 has is a small playerbase.
Tinki has a point. I think you should have separate scores for each faction
I have been making that point for years, at least for each faction. I never went further than that.
Making the skill system account for each lifeform, each map, and each spawn point on every server would make hive take ages longer than it currently does to be useful. The current implementation could be improved and made faster with the addition of adagrad, as proposed by moultano. I question whether adagrad would be able to speed it up enough for what you propose.
And just for fun, and only slightly related to this thread I am going to put this quote here because Blrg is agreeing with at least some of the things I am saying.
I would like to add that everyone understands that only having one singular score is problematic (as opposed to have different ones for marines/aliens/...) and nobody disputes that.
If you have played games where your team gets dominated by the other team then you're most likely to rate the match played with few stars. The reason for this is imbalanced teams. Even with balance mods it's hard to make teams balanced. Newer players may quit playing after a few stacked matches. I know I wanted to.
A good way to fix this:
Server side setting for handicap on.
Every 5 minutes check if a team has double the pres and tres of the other team.
If so then award everyone on the losing team +50 pres and tres.
The losing team will probably still lose, but they will have much more fun playing and then they will rate the game much higher.
TL;DR - "TEAM HANDICAP AWARD +50 RES"
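A rough sketch of that handicap rule, just to make the check concrete. This is not NS2 code: the `Team` class, the `apply_handicap` name, and the choice to compare summed pres across players are all assumptions made for illustration. Only the 5-minute interval, the "double the pres and tres" test, and the +50 award come from the suggestion.

```python
from dataclasses import dataclass

HANDICAP_BONUS = 50        # "+50 pres and tres" from the suggestion
CHECK_INTERVAL_S = 5 * 60  # "every 5 minutes"; the caller would schedule the check


@dataclass
class Team:
    tres: float        # team resources
    pres: list[float]  # personal resources, one entry per player (assumed layout)


def is_double(leader: Team, trailer: Team) -> bool:
    """True if the leader has at least double both pres (summed) and tres."""
    lead_pres, trail_pres = sum(leader.pres), sum(trailer.pres)
    if lead_pres <= 0 or leader.tres <= 0:
        return False  # avoids awarding a bonus when nobody has anything yet
    return lead_pres >= 2 * trail_pres and leader.tres >= 2 * trailer.tres


def apply_handicap(team_a: Team, team_b: Team, enabled: bool = True):
    """Award the bonus to the trailing team; return 'a', 'b', or None."""
    if not enabled:  # the server-side setting
        return None
    for name, leader, trailer in (("b", team_a, team_b), ("a", team_b, team_a)):
        if is_double(leader, trailer):
            trailer.tres += HANDICAP_BONUS
            trailer.pres = [p + HANDICAP_BONUS for p in trailer.pres]
            return name
    return None
```

Whether pres should be compared as a team total or per player is left open in the suggestion; the total is used here only because it is the simplest reading.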
We're considering a mechanic where 5 minutes into the game, it just randomly declares a team the winner with a 50-50 chance. That should make everything perfectly 50% balanced.
( Obligatory: Yes this is sarcasm... because some people just can't detect it very well... )
That would make aliens overpowered with their 1 min into the game game-ending baserushes!
I suggest making the timeperiod shorter, maybe 20 seconds or so.
What if time suddenly stops? Then the RNG won't work and will always have the same result! We really must defend against these edge case scenarios.
Every game has problems with these. Skill systems don't need to account for them because over a long enough play time they don't matter. They won't matter in the long run because those variables are mostly consistent.
in 1000 years maybe
Working as an abstract concept doesn't mean it actually works
The thing is, hive was designed when ns2 had a much bigger playerbase, but now we have a small playerbase, so adapt it to the new needs!!!
Don't wait for the playerbase to grow if 90% of games are shitty because unbalanced.
How it works? Simple, no super complicated algorithm, mix KDR + Winrate + Playtime all / 3 and it would make more accurately balanced teams.
Does more game knowledge increase the chance of someone winning? If the answer is yes, then it is indirectly measured by hive.
Basically you're saying that if you match a team of vets (2k hours) vs a team of newbs (100 hours) and every single player has 1000 elo, each team has an equal chance of winning.
laughable
If you could hypothetically find a team of veterans (2k hours) and a team of newbs (100 hours) and they all had 1000 skill, this is what would happen. The veterans, being veterans, would win more. A LOT more. The veterans' hive skill would go up, while the newbs' hive skill would go down. Game knowledge was just measured.
There is nothing wrong with hive in this scenario. Hive is just learning. All skill systems must learn. Even the supposed industry best skill system, called TrueSkill, takes an average of 90 games to find a statistically significant skill value. We both agree that hive learns too slowly right now, but there is a proposed fix for that. Is it a flaw of the whole hive skill system? No.
Yes there is something wrong, it's that the chance of a win should be 50/50 with equal elo
So thanks for acknowledging my point.
What is wrong? Please be specific. I don't see anything wrong with this.
Here take these 12 players.
Bob
Bab
Bub
Bom
Bam
Bot
Gob
Gab
Gub
Gom
Gam
Got
Six of those players are veterans with 2k hours, six are newbs with 100 hours. All you get are those names and you know they each have 1000 hive skill.
You can't tell them apart, can you? Hive gets no more information than that. As far as you know, and as far as hive knows, there is a 50/50 chance of each team winning. Let's say a game was played and team B won. Now team B has six players with 1050 hive skill, and team G has six with 950 hive skill. They play another game with the same teams. Team B wins again. Team B still has 1050 hive skill, and team G still has 950 hive skill. Hive did not give or take any hive skill from either team because it knew team B would win. It was not a surprise. Hive only gives and takes skill when it is given an unexpected result, such as 950 hive skill team G beating 1050 hive skill team B.
That may have been an overly simplified version of hive, but I think it makes my point. At least I hope I made my point. Nothing is wrong with hive in the scenario you mention. If you disagree please be specific as to why.
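For anyone who wants to see the "surprise" idea in numbers, here is a minimal sketch using the textbook Elo update. This is not hive's actual formula, which isn't given in this thread; the function names, the 400-point scale, and K = 32 are the usual chess defaults, used here purely as assumptions. Note one difference from the simplified story above: in Elo an expected win still moves ratings a little, just much less than an upset does.

```python
def expected_score(rating_a: float, rating_b: float, scale: float = 400.0) -> float:
    """Probability that side A wins, according to the Elo logistic curve."""
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / scale))


def update(rating_a: float, rating_b: float, a_won: bool, k: float = 32.0):
    """Return the new (rating_a, rating_b) after one game.

    The change is proportional to the surprise: actual result minus
    expected result. Whatever A gains, B loses.
    """
    surprise = (1.0 if a_won else 0.0) - expected_score(rating_a, rating_b)
    delta = k * surprise
    return rating_a + delta, rating_b - delta


# Two sides rated 1000 each: the system expects 50/50, so a win is a big surprise.
even_a, even_b = update(1000, 1000, a_won=True)

# 1050 vs 950: the favourite winning is a small surprise, the underdog
# winning is a large one.
fav_after, _ = update(1050, 950, a_won=True)
_, dog_after = update(1050, 950, a_won=False)
favourite_gain = fav_after - 1050
underdog_gain = dog_after - 950
```

With these defaults the favourite's gain for the expected win is smaller than the underdog's gain for the upset, which is the whole point of rating by surprise.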
Every game has problems with these. Skill systems don't need to account for them because over a long enough play time they don't matter. They won't matter in the long run because those variables are mostly consistent.
in 1000 years maybe
Are you saying every game would take 1000 years because they all deal with those same problems?
Microsoft's patented TrueSkill system, the supposed industry best, takes on average 90 games before it has a statistically significant skill rating. It can take far longer than 90 games too. That is the industry best. On average a round of NS2 takes 15 minutes. 90 games at 15 minutes would be 22.5 hours of gameplay.
NS2 currently takes at least 200 games before it has a reasonable skill rating. That is 3000 hours of gameplay. I know this is way too long. Hive takes 969 games to be really accurate, which nobody can reasonably be expected to get to. I said early on in this thread that one problem with hive is how long it takes for it to find a reasonable skill rating. This can be easily fixed with the addition of adagrad, as I have said multiple times in this thread. Adagrad would make it so hive finds a reasonable skill value in as soon as 10 games. The underlying hive skill system is not useless.
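To give a feel for why adagrad could help, here is a sketch of the generic Adagrad rule applied to a single rating. How moultano's proposal actually wires it into hive is not described in this thread, so the class name, the `base_rate` value, and the idea of feeding it a per-game "surprise" (actual result minus expected result) are assumptions. The only part taken as given is Adagrad itself: the step size is divided by the square root of the sum of squared past gradients.

```python
import math


class AdagradRating:
    """One player's rating with an Adagrad-style shrinking step size."""

    def __init__(self, rating: float = 0.0, base_rate: float = 100.0, eps: float = 1e-8):
        self.rating = rating
        self.base_rate = base_rate  # hypothetical tuning constant
        self.eps = eps              # avoids division by zero
        self.grad_sq_sum = 0.0      # running sum of squared surprises

    def update(self, surprise: float) -> float:
        """Apply one game's surprise and return how far the rating moved."""
        self.grad_sq_sum += surprise ** 2
        step = self.base_rate / math.sqrt(self.grad_sq_sum + self.eps)
        move = step * surprise
        self.rating += move
        return move


# A new account that keeps beating expectations by the same amount:
# the first games move the rating a lot, later games move it less.
player = AdagradRating(rating=1000.0)
moves = [player.update(0.5) for _ in range(3)]
```

The effect is that a fresh account gets large corrections while an established one gets small ones, which is the general reason a rating could settle in fewer games.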
The thing is, hive was designed when ns2 had a much bigger playerbase, but now we have a small playerbase, so adapt it to the new needs!!!
Don't wait for the playerbase to grow if 90% of games are shitty because unbalanced.
Look at that. I am making progress. In your own way you have said that hive does not work currently because of the small playerbase.
Still though, hive would not be useless even if we have 2 players total. It would be a waste of time to use it, but the system still works the exact same.
How it works? Simple, no super complicated algorithm, mix KDR + Winrate + Playtime all / 3 and it would make more accurately balanced teams.
First of all, one of these is not like the other. KDR and Winrate are values typically between 0 and 2. Some super skilled people, or stompers, manage to get higher values than 2. Playtime is a value between 0 and infinity basically.
Let's take your formula and apply it to three real players.
Player 1: 1.47 W/L, 2.36 KDR, and 46.63 hours of playtime. 1768 hive
Player 2: 0.78 W/L, 0.69 KDR, and 638.23 hours of playtime. 255 hive
Player 3: 0.98 W/L, 0.49 KDR, and 201.67 hours of playtime. 1202 hive
Your formula gives them the following skill values:
Player 1: (1.47+2.36+46.63)/3 = 16.82
Player 2: (0.78+0.69+638.23)/3 = 213.23
Player 3: (0.98+0.49+201.67)/3 = 67.71
Assuming a higher skill value is better, does Player 2 look like the most skilled of those three players? Now how well does this system work with a player as skilled as Tane playing on an alt account? It doesn't. Hive doesn't do much better with alt accounts, but it does do better. If you are going to use time played as a determinant of skill, you need to use it a different way.
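The same arithmetic as a small script, in case anyone wants to try other players. The three stat lines are the ones quoted above; the dictionary layout and the `naive_score` name are just for illustration.

```python
players = {
    "Player 1": {"wl": 1.47, "kdr": 2.36, "hours": 46.63, "hive": 1768},
    "Player 2": {"wl": 0.78, "kdr": 0.69, "hours": 638.23, "hive": 255},
    "Player 3": {"wl": 0.98, "kdr": 0.49, "hours": 201.67, "hive": 1202},
}


def naive_score(stats: dict) -> float:
    """The proposed formula: (W/L + KDR + playtime in hours) / 3."""
    return (stats["wl"] + stats["kdr"] + stats["hours"]) / 3


# Rank best-first under the proposed formula and under hive skill.
by_naive = sorted(players, key=lambda name: naive_score(players[name]), reverse=True)
by_hive = sorted(players, key=lambda name: players[name]["hive"], reverse=True)

for name, stats in players.items():
    print(f"{name}: naive={naive_score(stats):.2f} hive={stats['hive']}")
print("naive order:", by_naive)
print("hive order: ", by_hive)
```

Because hours are two orders of magnitude larger than W/L and KDR, the naive score is almost entirely playtime, and the two orderings come out reversed for these three players.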
In that massive post I showed a picture of 2025 players ranked 1 to 2025 by their hive skill, KDR, W/L, and SpM. In it you can visually see that hive does a fairly good job of ranking similarly to those metrics. I don't think you looked at it. This time I did the same thing but included time, and only with 99 randomly selected players with over 100 hours recorded. I also included time in the picture until
In this picture you can see that hive ranks players 1 to 99 similarly to KDR, W/L, SpM, and Time. You can also see there are players who hive seems to get wrong. Nobody is saying hive is perfect, but I am saying it does a fairly good job. At least at ranking players. This is even more apparent in the version I did with 2025 players. If you want to look at that, go read that massive post.
Btw, is there any reason these fixes for hive haven't been tackled yet by the devs?
Originally it was because the CDT didn't have access to hive. Then as they got paid, they had other priorities for phase 1 and phase 2 development. Now they are in phase 3 and can tackle bigger projects like this. So I am hoping these will get put in sooner rather than later. They do know about the fixes.
Wow ... keyboard hero of the internet
My mind is blown right now >:D Literally, UWE should hire this guy to make a new BeeHIVE system
@Nordic I'd save your breath from now on, you clearly have more knowledge regarding the intricacies of the Hive system than the majority of the users on this forum including myself.
It's not worth bloodying your forehead on your keyboard for people who apparently know more than you due to a "gut" feeling.
Mephilles (Germany; Join Date: 2013-08-07; Member: 186634; Members, NS2 Map Tester, NS2 Community Developer)
just my 2 cents here:
In TAW we are balancing the teams out of a database we built for ourselves.
The values in that database are:
- skill (same value for marines and aliens)
- main lifeform
- secondary lifeform
- commander skill
- can/wants to command
The values for the skills are handpicked btw. So we have a couple of guys in TAW who rate the skill of those guys and then we use those values. This system has proven to work really well more often than not.
And yet, even assuming the most perfect evaluation system, you still have to solve your best players going commander/gorge/unfamiliar lifeform, the inherent imbalance in the players available on the server, and commanders that severely handicap the team.
It is probably more productive to try and cultivate a community that concedes when the game is obviously unbalanced and starting a new game to get another try. Just normal forced random works. New members can either fit in or go elsewhere and continue crying about stacks.
Every game has problems with these. Skill systems don't need to account for them because over a long enough play time they don't matter. They won't matter in the long run because those variables are mostly consistent.
Ok, I'll give you a little example (spawn point bias):
Game 1: Tram, aliens spawn in shipping, even skills --> aliens get stomped
Game 2: Veil, exact same team but this time as marines, against an alien spawn in pipeline --> marines stomp
You have a 50% winrate over 2 games, but you don't take into account the fact that these games were sh-it.
But you are right, over a long enough play time the sh-it games tend not to matter, because you will have 50 sh-it losses for 50 sh-it wins.
Stats are here to help you use the data, but you still need to apply them to reality.
In the end I disagree when you clear hive of the balance problem, because it's at the center of it. The example I gave can be fixed even with a low player base.
NB: I really hope the rating system wasn't added to evaluate the stomp games for this purpose, or else it was just a long detour.
If you guys didn't see it I posted this thread on map statistics. It includes team win rates by map spawns, or spawn point biases.
I know map spawn imbalances exist. For some maps they are pretty bad.
Have you seen any other skill system for map spawn points? I am not entirely opposed to it, because there are map imbalances, and spawn point imbalances. There is a tradeoff here. If you were to balance for map spawn points you would increase accuracy at the cost of how fast hive can rate a player's skill. Microsoft's TrueSkill takes on average 90 games to find a statistically significant skill value. If you did that for each spawn, it would be 90*(the number of map spawns).
My point is that if you had a large enough playerbase, you could find near skilled games. Tinki, if you played a game with 11 other players of equal skill to yourself it would be a much better experience than what you could find in pubs right now.
@Tinki, you don't need to tell me what stats are and are not. This is from another thread on another subject, but it is related.
50/50 win rates are an objective measure to strive for. They are good to strive for, but you can not use them alone. Ideally you would want 50/50 win rates at all skill levels, within certain factors such as round time and player counts. You should pay close attention to the highest skill level because it should have the greatest deviation. 50/50 win rates are useful to strive for.
Balance is much more complex than 50/50 win rates. You want the game to be fun. 50/50 win rates mean nothing if every win is a stomp or a base rush. That is not what I consider fun.
At the same time 75% alien win rates are no good, even if the game is fun. Sure 75% alien win rates might be ok if you play each team equally, but that is not feasible in ns2.
Having 50/50 win rates, or at least close to it, is a good sign but it does not mean that balance is good.
Statistics, by definition never show the whole story. They are supposed to be representative. A lot of it comes down to how to interpret the statistics.
Hive is not the core issue here, it is the low player count combined with a high skill ceiling creating the poor experience.
On average a round of NS2 takes 15 minutes. 90 games at 15 minutes would be 22.5 hours of gameplay.
NS2 currently takes at least 200 games before it has a reasonable skill rating. That is 3000 hours of gameplay. I know this is way too long. Hive takes 969 games to be really accurate, which nobody can reasonably be expected to get to.
Nordic I think you got your hours and minutes mixed up in one of these examples here.
90 * .25hrs[15 min] is 22.5 hours but 200 * .25 = only 50 hours, or 3000 minutes, nowhere near as bad as what you're claiming. Where does the 969 number come from?
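The conversion spelled out, using the 15-minute average round quoted above. The `hours_for` helper is just an illustrative name.

```python
ROUND_MINUTES = 15  # average NS2 round length quoted in the thread


def hours_for(games: int, minutes_per_round: float = ROUND_MINUTES) -> float:
    """Total play time in hours for a given number of rounds."""
    return games * minutes_per_round / 60


for games in (90, 200, 969):
    total_minutes = games * ROUND_MINUTES
    print(f"{games} games = {total_minutes} minutes = {hours_for(games)} hours")
```

So 200 games is 3000 minutes, which is 50 hours, and the 969-game figure works out to a little over 242 hours at the same round length.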
I'm also curious how in your random sampling of hive members you are accounting for the massive tracking bug that affects many players. To be specific, my Hive account was affected in the following way. For about a year my W/L, score, rounds played, time spent in team and hive skill continued to be tracked accurately (at least, it seemed that way to me) but my KDR and assists froze. Because of this I think that I, or anyone else similarly affected, would seriously skew the results in any comparison like your 2025 player or 100 player posts above where you are correlating KDR to hive skill.
Have you seen any other skill system for map spawn points? I am not entirely opposed to it, because there are map imbalances, and spawn point imbalances. There is a tradeoff here. If you were to balance for map spawn points you would increase accuracy at the cost of how fast hive can rate a player's skill. Microsoft's TrueSkill takes on average 90 games to find a statistically significant skill value. If you did that for each spawn, it would be 90*(the number of map spawns).
@Nordic
Is there a reason it cannot just weight the MMR gained/lost per round based on spawn point bias (from say, a periodically generated table of W/L deviations from the global average)? Doesn't seem like there would be a need for separate spawnpoint-based player MMR values.
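One way that could look, as a sketch only. Nothing here is how hive works: the function names, K = 32, and the choice to shift the expected win chance by the spawn's deviation from the global average (rather than scaling the gain some other way) are all assumptions. The spawn win rates passed in are made-up inputs standing in for the periodically generated table.

```python
def adjusted_expected(base_expected: float, spawn_win_rate: float,
                      global_win_rate: float = 0.5) -> float:
    """Shift the skill-based win chance by the spawn's historical bias."""
    bias = spawn_win_rate - global_win_rate
    # Clamp so a very lopsided spawn never makes a result "certain".
    return min(0.99, max(0.01, base_expected + bias))


def rating_change(won: bool, base_expected: float, spawn_win_rate: float,
                  k: float = 32.0) -> float:
    """MMR gained (positive) or lost (negative) for one round."""
    expected = adjusted_expected(base_expected, spawn_win_rate)
    return k * ((1.0 if won else 0.0) - expected)


# Evenly matched teams (base_expected = 0.5), hypothetical spawn win rates.
neutral_win = rating_change(True, 0.5, spawn_win_rate=0.50)
favoured_win = rating_change(True, 0.5, spawn_win_rate=0.65)
unfavoured_loss = rating_change(False, 0.5, spawn_win_rate=0.35)
```

Under this reading, a win from a spawn that usually wins is worth less, and a loss from a spawn that usually loses costs less, without storing a separate MMR per spawn.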
Nice catch on the math.
The 969 number is from some modeling Moultano did. Look at the thread I linked that had the explanation of adagrad.
I don't account for that. How I got my random sample is this. I took all the players who had played a game between January 1st, 2016 and February 12, 2016. In Excel I assigned some players a value of 1, and some players a value of 0, so that the 1 players were evenly spaced. Is it a perfect way to get a random sample? No. Is it a perfect analysis? No. I really just made that to give people a visual measure compared to a mathematical one.
If the KDR values are skewed because of that freeze, I have no way to know.
Have you seen any other skill system for map spawn points? I am not entirely opposed to it, because there are map imbalances, and spawn point imbalances. There is a tradeoff here. If you were to balance for map spawn points you would increase accuracy at the cost of how fast hive can rate a players skill. Microsofts Trueskill takes on average 90 games to find a statistically significant skill value. If you did that for each spawn, it would be 90*(the number of map spawns).
@Nordic
Is there a reason it cannot just weight the MMR gained/lost per round based on spawn point bias (from say, a periodically generated table of W/L deviations from the global average)? Doesn't seem like there would be a need for separate spawnpoint-based player MMR values.
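A rough sketch of what that weighting could look like. Everything here is an assumption for illustration: the table values are invented, the names are hypothetical, and this is not how hive actually works.

```python
# Hypothetical table: alien win rate per (map, alien spawn), rebuilt periodically.
# The numbers are made up for illustration.
ALIEN_WIN_RATE = {
    ("tram", "shipping"): 0.40,
    ("veil", "pipeline"): 0.45,
}
GLOBAL_ALIEN_WIN_RATE = 0.50  # assumed global average


def weighted_delta(base_delta, map_name, spawn, team, won):
    """Scale a round's skill change by how much the spawn favoured the team."""
    alien_rate = ALIEN_WIN_RATE.get((map_name, spawn), GLOBAL_ALIEN_WIN_RATE)
    deviation = alien_rate - GLOBAL_ALIEN_WIN_RATE  # > 0 means spawn favours aliens
    advantage = deviation if team == "alien" else -deviation
    # Winning with an advantage is worth less; losing with one costs more.
    weight = 1 - advantage if won else 1 + advantage
    return base_delta * weight if won else -base_delta * weight


# Aliens lose on a spawn that is bad for them: they lose less than the base 10.
alien_change = weighted_delta(10, "tram", "shipping", "alien", won=False)
marine_change = weighted_delta(10, "tram", "shipping", "marine", won=True)
```

Players keep a single skill value; only the size of each round's change depends on the spawn.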
I honestly do not know. I took the suggestion at face value. That actually doesn't sound half bad. That is more a question for Moultano.
@Sen I have given more thought to your proposition that the hive data is inconsistent. I may not have a way to know which players were frozen, but I do have something to compare it with.
I don't know if you read the massive post I wrote further back in this thread, but I do compare hive data to another data source. You will have to read the massive post to understand this table in context, but take a look at TF KDR and H KDR on the right side.
The TF KDR is the KDR value from data collected on Tactical Freedom. Hive KDR is the KDR value from data collected with hive. Hive has been collecting data for far longer than Tactical Freedom has, but hive also has had database issues.
The differences can't really be explained easily. Hive having freezes could explain some of that, but we don't know. I know my KDR on hive is higher than it is on TF. That is because TF typically has higher skilled players than the average pub of NS2. TF is often more of a challenge for me.
Even though there are differences, they are close. At this point this is showing the weakness of this form of analysis. We are comparing two things in rank, and then comparing another two things by rank to check accuracy.
I've heard other developers like to hide or generalize skill ratings due to the trend of players obsessing over numbers. If numbers get hidden in the future, I'd still like to see some more general/obscure form of ranking. Here's my character from this old game called Puzzle Pirates I played ages ago. The skill reporting they use is called Puzzle Standing. Essentially, I was really good at this puzzle called Rigging, so the game assigned my skill relative to others in the server (ocean) as a Grand-Master. Nobody knew what percentages or numbers that meant, but they did know that this kid could really set sail. Naturally, since I was the best at that puzzle, I played it the most and got a lot of experience, shown as Paragon. A similar system could work for Hive. Show skill and hours as vague ranges/words, but not the numbers, as they don't have as much of a meaning due to their oscillation.
Comments
-Commander (bad com/good com? yourself as a com?)
-Lifeform played
-Alien or marine side
-Map and spawn point
-Recent performance/not playing for a while
-Server/closed community
-Group of friends playing together
-Ping
-closed round or stomp (+/- game duration and kill graph)
...
NS2 uses an amateur system with only 1 main variable (W/L) and 1 secondary variable (the skill of the opposite team) without any other adjustment.
Nordic thinks this system is good enough. In-game experience tells us the opposite. You don't make a good MMR with only a W/L ratio; if it were that easy, devs wouldn't spend that much time on them, tweaking them every update.
This system is like drawing a conclusion from a univariate analysis with dozens of confounding factors, without using a multivariate analysis.
Because so far it's been shown to be more accurate than Microsoft's patented system, albeit slower to achieve that accuracy.
Surely since you feel this isn't a good MMR you have a better suggestion?
-Map and spawn point
Those 2 should be the easiest to fix, given that you have the data per map and spawn. But Hive 2.0 is there for that.
Also, I was responding to others saying the W/L ratio was enough. I have no desire to work for UWE.
It's like if you played Quake duels: you wouldn't expect your duel rating to be the same as your instagib rating. The two game modes can, for all intents and purposes, be considered different games entirely, even though they are slightly similar.
In the same way, playing the Lerk and playing the Fade, requires completely different approaches from the players. So you shouldn't expect a single rating to be accurate in both classes.
But W/L is STILL the best metric to determine the ranks..
I have used the chess analogy before, because I think it's accurate. Chess is, for all intents and purposes, a solved game. No human has beaten the latest iterations of Stockfish (chess A.I.). You could literally detect every mistake a player makes during a game (in fact, chess sites DO), and rate him based on the number of mistakes he makes. And you could make it accurate as fuck..
Yet, the chess community chooses to rate based on W/L in what we know as ELO. The reason: Because it works.
But notice: your blitz and classical ratings are not the same... Because for all intents and purposes, blitz and classical are two different games, although slightly similar.
Some of those problems other games are able to handle better. Specifically a stomp round, closed communities, and party systems. None of these really can be addressed because of our small playerbase. As I have been saying, the small playerbase is the root cause of the problem, not hive. This is true with all of these.
These problems are more or less unique to NS2. They also are easily fixed assuming we ever get hive 2.0. As I have mentioned, hive 2.0 is planned to have separate skill ratings for marines, aliens, commanders, and khammanders. This is a flaw, but not a flaw with the underlying skill system.
As I understand it, you can try to apply more variables to a model to gain greater accuracy, but unless you use them correctly it can actually increase inaccuracy. The more common way to make a model is to use a few really good variables that you can easily make into a mostly accurate model. (W/L) is the best variable to use in this regard because it can indirectly measure every variable that can lead to a victory. (K/D) is also a good variable, but it is already captured in (W/L). The hive skill system is a model. As George Box once said, "All models are wrong."
We are not going to get a model much better than hive. Microsoft's TrueSkill, the industry best, is no better than hive. Riot Games uses a similar system for League of Legends.
I don't know what you guys expect. Some super complex impossible formula? We are never going to have a perfect skill system. What we have is the best we can get. Sure, it has room to improve, and I have linked proposed solutions that are ready to go in game. This does not mean the underlying skill system is useless.
In that massive post I think I have shown that the system works at least "good enough."
It is not just your experience. I am told the end game feedback has shown that the biggest cause of a bad game is poor balance. I am not disputing that there is a bad experience with balance. I see it, you see it, everyone sees it.
As I have said many times in this thread, hive is not the cause of the poor experience. It is our small playerbase combined with a high skill ceiling game. When the highest skill levels of players have no choice but to play in the same servers as the lowest skill level of players it does not make for a good experience. How could it? The high skill players are annoyed or bored when playing with lower skilled players, and low skilled players don't like being stomped.
Our small playerbase makes it impossible to find near skilled games. If NS2 had enough players so that all skill brackets could find near skilled games, there would be a good experience. If there was that large of a playerbase, we could have matchmaking. The root cause of nearly every problem NS2 has is a small playerbase.
I have been making that point for years, at least for each faction. I never went further than that.
Making the skill system account for each lifeform, each map, and each spawn point on every server would make hive take ages longer than it currently does to be useful. The current implementation could be improved and made faster with the addition of adagrad as proposed by moultano. I question whether adagrad would be able to speed it up enough for what you propose.
And just for fun, and only slightly related to this thread I am going to put this quote here because Blrg is agreeing with at least some of the things I am saying.
I would like to add that everyone understands that only having one singular score is problematic (as opposed to have different ones for marines/aliens/...) and nobody disputes that.
What if time suddenly stops? Then the RNG won't work and will always have the same result! We really must defend against these edge case scenarios.
in 1000 years maybe
Working as an abstract concept doesn't mean it actually works
The thing is, hive was designed when ns2 had a much bigger playerbase, but now we have a small playerbase, so adapt it to the new needs!!!
Don't wait for the playerbase to grow if 90% of games are shitty because they are unbalanced.
How would it work? Simple, no super complicated algorithm: take (KDR + Winrate + Playtime) / 3 and it would make more accurately balanced teams.
Yes, there is something wrong: the chance of winning should be 50/50 with equal elo.
So thanks for acknowledging my point.
You are totally right. We need an acausal skill system. Moultano/Nordic, fix this already.
Here take these 12 players.
Six of those players are veterans with 2k hours, six are newbs with 100 hours. All you get are those names and you know they each have 1000 hive skill.
You can't, can you? Hive gets no more information than that. As far as you know, and as far as hive knows, there is a 50/50 chance of each team winning. Let's say a game was played and team B won. Now team B has six players with 1050 hive skill, and team G has six players with 950 hive skill. They play another game with the same teams. Team B wins again. Team B still has 1050 hive skill, and team G still has 950 hive skill. Hive did not give or take any hive skill from either team because it knew team B would win. It was not a surprise. Hive only gives and takes skill when it is given an unexpected result, such as 950 hive skill team G beating 1050 hive skill team B.
That may have been an overly simplified version of hive, but I think it makes my point. At least I hope I made my point. Nothing is wrong with hive in the scenario you mention. If you disagree please be specific as to why.
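To make the "surprise" idea concrete, here is a sketch using a standard Elo-style update. This is an assumption for illustration, not hive's real formula: the `k` and `scale` values are picked only so that an even match moves 50 points like in the example above.

```python
def expected_win(team_skill, opponent_skill, scale=400):
    """Elo-style probability that `team_skill` beats `opponent_skill`."""
    return 1 / (1 + 10 ** ((opponent_skill - team_skill) / scale))


def skill_change(team_skill, opponent_skill, won, k=100):
    """Skill gained (positive) or lost (negative) for one round."""
    outcome = 1.0 if won else 0.0
    return k * (outcome - expected_win(team_skill, opponent_skill))


even = skill_change(1000, 1000, won=True)     # 50.0: nobody was favoured
expected = skill_change(1050, 950, won=True)  # smaller gain: B was expected to win
upset = skill_change(950, 1050, won=True)     # larger gain: G was not expected to win
```

In this sketch an expected win still moves the ratings a little, unlike the simplified story above, but much less than an upset does.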
Are you saying every game would take 1000 years because they all deal with those same problems?
Microsoft's patented TrueSkill system, the supposed industry best, takes on average 90 games before it has a statistically significant skill rating. It can take far longer than 90 games too. That is the industry best. On average a round of NS2 takes 15 minutes. 90 games at 15 minutes would be 22.5 hours of gameplay.
NS2 currently takes at least 200 games before it has a reasonable skill rating. That is 3000 hours of gameplay. I know this is way too long. Hive takes 969 games to be really accurate, which nobody can reasonably be expected to get to. I said early on in this thread that one problem with hive is how long it takes for it to find a reasonable skill rating. This can be easily fixed with the addition of adagrad, as I have said multiple times in this thread. Adagrad would make it so hive finds a reasonable skill value in as soon as 10 games. The underlying hive skill system is not useless.
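For readers who have not seen adagrad before, here is a generic sketch of the idea: each player gets their own step size that starts large and shrinks as results accumulate. This is plain textbook Adagrad applied to a toy rating, not moultano's actual proposal, and the class name and parameter values are made up.

```python
import math


class AdagradRating:
    """One player's rating with an Adagrad-style per-player step size."""

    def __init__(self, skill=1000.0, base_step=100.0, eps=1e-8):
        self.skill = skill
        self.base_step = base_step
        self.eps = eps
        self.grad_sq_sum = 0.0  # running sum of squared "surprises"

    def update(self, outcome, expected):
        """outcome is 1 for a win, 0 for a loss; expected is the predicted win chance."""
        grad = outcome - expected
        self.grad_sq_sum += grad * grad
        # Step size shrinks as more surprise has been accumulated.
        step = self.base_step / math.sqrt(self.grad_sq_sum + self.eps)
        change = step * grad
        self.skill += change
        return change


rating = AdagradRating()
first = rating.update(outcome=1, expected=0.5)   # about +100
second = rating.update(outcome=1, expected=0.5)  # about +71, smaller than the first
```

New accounts move fast in their first few games and then settle, which is why it could cut down the number of games needed.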
Look at that. I am making progress. In your own way you have said that hive does not work currently because of the small playerbase.
Still though, hive would not be useless even if we have 2 players total. It would be a waste of time to use it, but the system still works the exact same.
First of all, one of these is not like the other. KDR and Winrate are values typically between 0 and 2. Some super skilled people, or stompers, manage to get higher values than 2. Playtime is a value between 0 and infinity basically.
Let's take your formula and apply it to three real players.
Player 1: 1.47 W/L, 2.36 KDR, and 46.63 hours of playtime. 1768 hive
Player 2: 0.78 W/L, 0.69 KDR, and 638.23 hours of playtime. 255 hive
Player 3: 0.98 W/L, 0.49 KDR, and 201.67 hours of playtime. 1202 hive
Your formula gives them the following skill values:
Player 1: (1.47+2.36+46.63)/3 = 16.82
Player 2: (0.78+0.69+638.23)/3 = 213.23
Player 3: (0.98+0.49+201.67)/3 = 67.71
Assuming a higher skill value is better, does Player 2 look like the most skilled of those three players? Now how well does this system work with a player as skilled as Tane playing on an alt account? It doesn't. Hive doesn't do much better with alt accounts, but it does do better. If you are going to use time played as a determinant of skill, you need to use it a different way.
In that massive post I showed a picture of 2025 players ranked 1 to 2025 by their hive skill, KDR, W/L, and SpM. In it you can visually see that hive does a fairly good job at ranking similarly to those metrics. I don't think you looked at it. This time I did the same thing but included time, and only with 99 randomly selected players with over 100 hours recorded. I also included time in the picture until
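The same calculation as a short sketch, using the three players quoted above (the dictionary layout and function name are just for illustration):

```python
players = {
    "Player 1": {"wl": 1.47, "kdr": 2.36, "hours": 46.63, "hive": 1768},
    "Player 2": {"wl": 0.78, "kdr": 0.69, "hours": 638.23, "hive": 255},
    "Player 3": {"wl": 0.98, "kdr": 0.49, "hours": 201.67, "hive": 1202},
}


def naive_skill(stats):
    """The proposed formula: plain average of W/L, KDR and hours played."""
    return (stats["wl"] + stats["kdr"] + stats["hours"]) / 3


# Best to worst according to each system.
naive_ranking = sorted(players, key=lambda name: naive_skill(players[name]), reverse=True)
hive_ranking = sorted(players, key=lambda name: players[name]["hive"], reverse=True)
```

Hours dominate the average because they are on a completely different scale, so the naive ranking puts Player 2 first while hive puts Player 2 last.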
In this picture you can see that hive ranks players 1 to 99 similarly to KDR, W/L, SpM, and Time. You can also see there are players who hive seems to get wrong. Nobody is saying hive is perfect, but I am saying it does a fairly good job. At least at ranking players. This is even more apparent in the version I did with 2025 players. If you want to look at that, go read that massive post.
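If someone wants a number instead of a picture, one common way to compare two rankings is Spearman's rank correlation. A minimal sketch, assuming no tied values, using only the three players quoted earlier in this thread (far too few to mean anything, it just shows the mechanics):

```python
def ranks(values):
    """Rank values from 1 (highest) downwards; assumes no ties."""
    order = sorted(range(len(values)), key=lambda i: values[i], reverse=True)
    result = [0] * len(values)
    for rank, index in enumerate(order, start=1):
        result[index] = rank
    return result


def spearman(xs, ys):
    """Spearman rank correlation for two equal-length lists without ties."""
    n = len(xs)
    d_sq = sum((a - b) ** 2 for a, b in zip(ranks(xs), ranks(ys)))
    return 1 - 6 * d_sq / (n * (n * n - 1))


hive = [1768, 255, 1202]
win_loss = [1.47, 0.78, 0.98]
kdr = [2.36, 0.69, 0.49]

hive_vs_wl = spearman(hive, win_loss)  # 1.0: same order
hive_vs_kdr = spearman(hive, kdr)      # 0.5: two players swapped
```

A value near 1 means the two metrics rank players in almost the same order.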
Btw, is there any reason these fixes for hive haven't been tackled yet by the devs?
Wow ... keyboard hero of the internet
My mind is blown right now >:D Literally, UWE should hire this guy to make the new BeeHIVE system
Alright, thanks.
It's not worth bloodying your forehead on your keyboard for people that apparently know better than you due to a "gut" feeling.
In TAW we are balancing the teams out of a database we built for ourselves.
The values in those databases are:
- skill (same value for marines and aliens)
- main lifeform
- secondary lifeform
- commander skill
- can/wants to command
The values for the skills are handpicked, btw. So we have a couple of guys in TAW who rate the skill of those guys and then we use those values. This system has proven to work really well more often than not.
Calling @Neoken for further details
Wow, really? That process must take hours!
It is probably more productive to try and cultivate a community that concedes when the game is obviously unbalanced and starts a new game to get another try. Just normal forced random works. New members can either fit in or go elsewhere and continue crying about stacks.
Ok, I'll give you a little example (spawn point bias):
Game 1 : Tram, aliens spawn in shipping, even skills --> aliens get stomped
Game 2 : Veil, exact same teams but this time as marines, against aliens spawning in pipeline --> marines stomp
You have a 50% winrate over 2 games, but you don't take into account the fact that these games were sh-it.
But you are right, over a long enough play time the sh-it games tend not to matter, because you will have 50 sh-it losses for 50 sh-it wins.
Stats are here to help you use the data, but you still need to apply them to reality.
In the end I disagree when you clear hive of the balance problem, because it's at the center of it. The example I gave can be fixed even with a low player base.
NB : I really hope the rating system wasn't added to evaluate the stomp games for this purpose, or else it was just a long detour.
@Yojimbo There are dozens of them!
I know map spawn imbalances exist. For some maps they are pretty bad.
Have you seen any other skill system account for map spawn points? I am not entirely opposed to it, because there are map imbalances and spawn point imbalances. There is a tradeoff here. If you were to balance for map spawn points you would increase accuracy at the cost of how fast hive can rate a player's skill. Microsoft's TrueSkill takes on average 90 games to find a statistically significant skill value. If you did that for each spawn, it would be 90*(the number of map spawns).
My point is that if you had a large enough playerbase, you could find near skilled games. Tinki, if you played a game with 11 other players of equal skill to yourself it would be a much better experience than what you could find in pubs right now.
@Tinki, you don't need to tell me what stats are and are not. This is from another thread on another subject, but it is related.
Hive is not the core issue here, it is the low player count combined with a high skill ceiling creating the poor experience.