In general, a competition scoring system is designed to track a player's skill development by combining the information in new game outcomes with the player's existing skill estimate. Most of these systems are used to rank two-player competitions, but ranking systems for multiplayer games have been introduced more recently. Several ranking systems have been used in educational evaluation, including the Ingo system, the Elo system, the Glicko system, the Edo system, and TrueSkill. With these methods, stakeholders can rank universities, and university administrators can use a student's performance to decide on admission. The rest of this chapter describes the similarities and differences of some well-known ranking systems for two-player games and then focuses on the TrueSkill system, which is designed for multiplayer games; the university group network is discussed afterwards.
The Ingo system is one of the first ranking systems to produce mathematical ratings. It was developed in 1948 by Anton Hoesslinger. The system calculates a player's rating relative to the performance of an average player. Unlike in other systems, a lower Ingo score indicates better performance. The limitations of this system are that the Ingo rating depends on a subjective ranking of players, and that a player can still gain rating points even after losing every game.
The Elo system is named after its creator, Arpad Elo, who devised it around 1960; it was adopted by the World Chess Federation (FIDE) in 1970. It is the most widely used system in competitive games such as chess. Unlike in the Ingo system, a higher Elo score means better performance. Performance is measured by wins, losses, and draws. The system assumes that a player's performance follows either a logistic or a normal distribution. A player's performance rating is a function of the opponents' ratings. The rating is adjusted upward if the player's actual score is higher than the expected score, and downward if the actual score is lower than the expected score. However, the Elo system has several limitations:
- Elo uses a player's most recent rating as the current rating, even if the player has not competed for a very long time.
- Elo treats a draw as half a win and half a loss. This is not entirely fair: a draw suggests that the two players are of similar strength, whereas a win or a loss shows only that one player is better or worse than the other, not by how much. Elo therefore extracts less information from draws than it could.
- Elo assumes that players' performances are normally distributed random variables with the same standard deviation. It fixes this variance as a constant and does not try to infer it from the data.
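The Elo adjustment described above can be sketched in a few lines of Python. This is a minimal illustration using the commonly cited logistic form of the expected score; the function names and the K-factor of 32 are assumptions made for the example and are not taken from this chapter.

```python
def expected_score(rating, opponent_rating):
    """Expected score (between 0 and 1) under the logistic Elo model."""
    return 1.0 / (1.0 + 10 ** ((opponent_rating - rating) / 400.0))


def update_rating(rating, opponent_rating, actual_score, k=32.0):
    """Return the new rating after one game.

    actual_score is 1.0 for a win, 0.5 for a draw, 0.0 for a loss.
    k is the K-factor (assumed to be 32 here); it caps the change per game.
    """
    return rating + k * (actual_score - expected_score(rating, opponent_rating))


# Two equally rated players: the winner gains half of K, the loser loses it.
winner = update_rating(1500.0, 1500.0, 1.0)
loser = update_rating(1500.0, 1500.0, 0.0)
# A draw between equally rated players leaves the rating unchanged.
drawn = update_rating(1500.0, 1500.0, 0.5)
```

The sketch also makes the limitations listed above visible: nothing in the update depends on how long ago the player last competed, and a draw is simply scored as 0.5.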
The Glicko system was developed by Glickman in 1999 and was the first rating system to take a Bayesian approach. It is based on the Bradley-Terry paired-comparison method and assumes that players' skills follow a Gaussian distribution. A Glicko rating is computed in a similar way to an Elo rating, but it also includes a rating deviation (RD). The RD is a standard deviation that measures the uncertainty of the rating. However, for players who compete often the RD becomes small, so the rating changes only slightly and may fail to capture a true change in skill.
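To make the role of the RD concrete, the following Python sketch implements a single Glicko rating-period update. It follows the formulas in Glickman's published description of the original Glicko system as best recalled here, and it omits the step that inflates the RD after a period of inactivity; the function and variable names are assumptions made for this example.

```python
import math

Q = math.log(10) / 400.0  # scaling constant of the Glicko system


def g(rd):
    """Discount factor: opponents with a large RD carry less weight."""
    return 1.0 / math.sqrt(1.0 + 3.0 * Q ** 2 * rd ** 2 / math.pi ** 2)


def expected(r, r_j, rd_j):
    """Expected score against an opponent with rating r_j and deviation rd_j."""
    return 1.0 / (1.0 + 10 ** (-g(rd_j) * (r - r_j) / 400.0))


def glicko_update(r, rd, results):
    """Update (r, rd) from a list of (opponent_rating, opponent_rd, score)."""
    info = 0.0   # sum of g^2 * E * (1 - E), i.e. 1 / (Q^2 * d^2)
    delta = 0.0  # sum of g * (score - expected score)
    for r_j, rd_j, score in results:
        e = expected(r, r_j, rd_j)
        info += g(rd_j) ** 2 * e * (1.0 - e)
        delta += g(rd_j) * (score - e)
    precision = 1.0 / rd ** 2 + Q ** 2 * info
    return r + (Q / precision) * delta, math.sqrt(1.0 / precision)


# One win and two losses in a single rating period.
games = [(1400.0, 30.0, 1.0), (1550.0, 100.0, 0.0), (1700.0, 300.0, 0.0)]
uncertain = glicko_update(1500.0, 200.0, games)  # large RD: rating moves a lot
settled = glicko_update(1500.0, 30.0, games)     # small RD: rating barely moves
```

Comparing `uncertain` and `settled` shows the limitation noted above: with the same results, the player whose RD is already small receives a much smaller rating change.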
The Edo system has been developed by Rod Edwards since 2004. Like the Glicko system, Edo is based on the Bradley-Terry model, but it has been applied only to historical games from the 19th century. The system treats the same player in two different years as two different players. Although it estimates these separate players better than Glicko does, it still has limitations: because it is not a full Bayesian model, Edo neither provides posterior distributions over skills nor explicitly models draws.
TrueSkill ranking system
Many rating systems have been developed for games and sports with two players or two teams, but few popular systems can handle a varying number of players. In 2005 Microsoft developed TrueSkill, a rating system that improves on these older systems and is used to rank players on the Xbox Live platform. Its purpose is to identify and track the skills of players so that they can be placed in suitable matches. It is able to infer each individual player's skill from team results. In TrueSkill, a player's skill is characterized by two numbers, μ and σ, which are updated based on the outcome of each match. μ is the player's average skill, and σ is the degree of uncertainty in that skill, which is adjusted dynamically before each match. The initial values are a mean (μ) of 25 and a standard deviation (σ) of 25/3. TrueSkill calculates and updates ratings after each game using an approximate Bayesian method. Before a game, each player's skill is assumed to follow a prior normal distribution with mean μ and standard deviation σ.
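The update of μ and σ can be illustrated for the simplest case: one player against one player, with no draws. The Python sketch below uses the standard closed-form update for that case. It is not the full algorithm, which handles teams, draws, and multiplayer matches by message passing on a factor graph. The performance noise β = σ₀/2 is an assumed conventional default that is not given in this chapter, the dynamic adjustment of σ before the match is left out, and all names are hypothetical.

```python
import math

MU0 = 25.0            # initial mean, as stated in the text
SIGMA0 = 25.0 / 3.0   # initial standard deviation, as stated in the text
BETA = SIGMA0 / 2.0   # assumption: performance noise, not given in the text


def _pdf(x):
    """Standard normal density."""
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)


def _cdf(x):
    """Standard normal cumulative distribution."""
    return 0.5 * math.erfc(-x / math.sqrt(2.0))


def update_1v1(winner, loser, beta=BETA):
    """Return updated (mu, sigma) pairs for the winner and loser of one game.

    Sketch only: no draws, no teams, no pre-match increase of sigma.
    It assumes the upset is not so extreme that _cdf(t) underflows to zero.
    """
    mu_w, sigma_w = winner
    mu_l, sigma_l = loser
    c = math.sqrt(2.0 * beta ** 2 + sigma_w ** 2 + sigma_l ** 2)
    t = (mu_w - mu_l) / c
    v = _pdf(t) / _cdf(t)   # size of the mean shift (larger for upsets)
    w = v * (v + t)         # fraction by which the variance shrinks
    new_winner = (mu_w + sigma_w ** 2 / c * v,
                  sigma_w * math.sqrt(1.0 - sigma_w ** 2 / c ** 2 * w))
    new_loser = (mu_l - sigma_l ** 2 / c * v,
                 sigma_l * math.sqrt(1.0 - sigma_l ** 2 / c ** 2 * w))
    return new_winner, new_loser


# Two new players meet; the winner's mu rises and both sigmas shrink.
new_w, new_l = update_1v1((MU0, SIGMA0), (MU0, SIGMA0))
```

Because the shift in μ is scaled by each player's own σ², a player whose skill is still uncertain moves further after a single result than a player whose skill is well established.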
The Gaussian distribution is also called the "normal distribution" and is often described as a "bell-shaped curve". Normal distributions are important in statistics and are used to represent real-valued random variables, because the basic problem of statistical inference is to draw conclusions about an unknown distribution from a sample that is believed to come from it.
Normal curves are defined by two values:
- The mean, or average value, represented by μ (mu).
- The standard deviation, represented by σ (sigma), which indicates how dispersed the data are.

As a general rule, about 68% of values lie within 1 standard deviation of the mean, 95% within 2 standard deviations, and 99.7% within 3 standard deviations. At the point 3σ below the mean, on the left side of the normal curve, almost all of the area under the curve lies to the right. The TrueSkill algorithm uses this point, μ − 3σ, as a conservative estimate of a player's skill. A player may be better than the conservative estimate but is very unlikely to be worse. It provides a single number for comparing players and is useful for leaderboards.
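The conservative estimate and the percentages quoted for the normal curve can be checked with a short Python sketch. The player names and ratings below are hypothetical values chosen for illustration, and the helper names are assumptions made for this example.

```python
import math


def conservative_rating(mu, sigma, k=3.0):
    """Skill value that the player is very unlikely to fall below: mu - k*sigma."""
    return mu - k * sigma


def fraction_within(k):
    """Share of a normal distribution lying within k standard deviations of the mean."""
    return math.erf(k / math.sqrt(2.0))


# Hypothetical players as (mu, sigma) pairs; "carol" has the initial values.
players = {
    "alice": (29.2, 7.2),
    "bob": (27.0, 2.0),
    "carol": (25.0, 25.0 / 3.0),
}

# Rank by conservative rating, highest first.
leaderboard = sorted(players,
                     key=lambda name: conservative_rating(*players[name]),
                     reverse=True)
```

In this example bob ranks above alice even though his μ is lower, because his small σ means the system is far more certain about his skill. A new player such as carol starts with a conservative rating of about 0.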