The observant amongst you may have noticed that the Pouch's YARS rating system is currently down. This is partly for technical reasons, because the way it was coded placed great strain on the diplom.org machine, but also because certain people questioned the meaningfulness of the YARS ratings. This article introduces a new Diplomacy rating system, based on the ELO chess methodology, that should be up and running at the Pouch by the time the next issue of the Zine hits your Web browser.
In many ways, both YARS and the Hall of Fame work well in their intent to judge the best and most prolific players. However, for us mere mortals the ratings are not comparable. For instance, in YARS a slightly negative rating may indicate either frequent above-average play or dreadful but infrequent play. What we need is a rating system that judges only ability, not the number of games played.
EIDRaS stands for ELO Inspired Diplomacy Rating System. As you may have guessed, it has the following advantageous ELO properties:
Players of similar ability have similar ratings, allowing GMs to designate games for players rated 2000+ or between 1000 and 1400. Players joining such games will be guaranteed a more evenly matched, and thus hopefully better quality, game.
The difference in rating between two players is indicative of how their performances should differ if paired in the same game. For instance, if you were rated 200 above France, you'd be expected to score half a point more, and this would be the case whether you were rated 3000 and France 2800 or you 800 to her 600.
The abilities of opponents affect your rating. Beating up on a bunch of newbies will do your rating a lot less good than soloing against quality opposition. Conversely, losing to poor players will inflict greater damage on your rating.
Recent results affect your rating more than old ones, hence the ability to improve, or regress, is recognised and with time those awful early results will stop dragging your rating down.
Players will tend toward their true rating from 1000 over time. Hence the top of the ratings will not be filled with players who have played one or two games but got a solo, nor will the system reward playing extra games once a relatively accurate rating is established.
The rating system includes map variants on an equal basis.
Each player has a rating R, and after each game that rating changes by an amount ΔR (delta R) depending on the result, S, and the expected result, X, which itself depends on how you compare with the opposition ratings-wise. Your rating should be considered an approximate measure of your ability, which becomes more accurate as results are added. The K factor represents this by reducing the degree of rating change as the number of games you have played increases. The formula is simply
ΔR = K(S − X)
S = W·n ÷ N is the game score, where n is the number of players, N the number of winners, and W is 1 if you won or shared in a draw and 0 if you didn't.
X is your expected result, depending on your rating compared to your opponents'. X = n·e^(c·R) ÷ Σ_j e^(c·R_j), where c = 0.002 and the R_j are the ratings of every player in the game, including yourself. If more than one player played a nation, the time-weighted average is used for R_j. For the mathematically inclined, this becomes the ELO formula when n=2. [1]
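As a rough illustration of the two definitions above, the following Python sketch computes S and X for every seat in a game. The function names and the list-based representation are illustrative choices, not part of EIDRaS itself.

```python
import math

C = 0.002  # the constant c given in the article


def game_scores(won_flags):
    """S = W*n/N for each player; won_flags[i] is True if player i won or drew."""
    n = len(won_flags)
    winners = sum(won_flags)  # N, the number of winners
    return [n / winners if won else 0.0 for won in won_flags]


def expected_scores(ratings, c=C):
    """X = n*e^(c*R) / sum_j e^(c*R_j) for each player in the game."""
    n = len(ratings)
    weights = [math.exp(c * r) for r in ratings]
    total = sum(weights)
    return [n * w / total for w in weights]
```

Both lists sum to n, so actual and expected scores balance out within any one game. With seven equally rated players every X is 1, and with n=2 the expression reduces to 2 ÷ (1 + e^(−c(R − R_opponent))).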
K is a rating change factor. It is a measure of how much more accurate your rating has become as a result of this game's information. Hence it depends on the number of games you have played before, the press settings and how many of your opponents were provisionally rated. K is calculated by the formula K = max(50s÷(g+5), s), where g is the number of games played and s = max(f÷3, p·f). Here p is the fraction of provisionally rated opponents and f is a press factor taking the value 20 for partial press, 15 for broadcast only and 10 for nopress. For real time (RT) judge games, f=f4
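The K formula can be sketched in a few lines of Python. This is only one reading of the text: the dictionary keys and function names are made up for the example, and the real-time adjustment to f is left out because its exact form is not clear from the text.

```python
# Press factor f as listed in the article; the key names are illustrative.
PRESS_FACTOR = {"partial": 20, "broadcast": 15, "nopress": 10}


def k_factor(games_played, provisional_fraction, press):
    """K = max(50*s/(g+5), s) with s = max(f/3, p*f)."""
    f = PRESS_FACTOR[press]
    s = max(f / 3, provisional_fraction * f)
    return max(50 * s / (games_played + 5), s)


def rating_change(k, score, expected):
    """Delta R = K * (S - X)."""
    return k * (score - expected)
```

Under this reading, 50÷(g+5) falls to 1 at g = 45, so from that point on K simply equals s. A newcomer to a partial press game with no provisional opponents would start with K of about 66.7, settling at about 6.7 once established.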
The system will be seeded by an iterative method. All players are estimated to be worth 1000 rating points and HOF results are put through the above formulae to generate ratings. The output ratings are then used as new estimates and the results fed through the system again. This is repeated until the variation in output ratings is small.
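A minimal sketch of how such a seeding loop might look is given below. It makes several simplifying assumptions that are not in the article: K is held at a flat, arbitrary 20 for everyone, each game is a (players, winners) pair, and "small" is taken to mean that no rating moves by more than a chosen tolerance in a full pass.

```python
import math


def run_pass(ratings, games, k=20.0, c=0.002):
    """Feed every game through Delta R = K(S - X) once, in order."""
    ratings = dict(ratings)
    for players, winners in games:
        n = len(players)
        # Expected scores use the ratings held before this game.
        weights = {p: math.exp(c * ratings[p]) for p in players}
        total = sum(weights.values())
        for p in players:
            expected = n * weights[p] / total
            score = n / len(winners) if p in winners else 0.0
            ratings[p] += k * (score - expected)
    return ratings


def seed_ratings(games, start=1000.0, tol=0.01, max_passes=10000):
    """Repeat passes, reusing each output as the next estimate, until stable."""
    players = {p for game_players, _ in games for p in game_players}
    ratings = {p: start for p in players}
    for _ in range(max_passes):
        updated = run_pass(ratings, games)
        shift = max(abs(updated[p] - ratings[p]) for p in players)
        ratings = updated
        if shift < tol:
            break
    return ratings
```

The max_passes cap is a safeguard for this sketch: a player who wins every recorded game has no stable rating under repeated passes, so a real seeding run would need some rule for such cases.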
Newbies start with a 1000 point rating which will vary as per the above formulae. For the first seven games their ratings will be considered provisional and will have less effect on the changes in fellow players' ratings via the K factor.
If more than one player plays a nation, the nation's rating is assumed to be the time-weighted average of the players concerned. (Time being measured by the number of movement seasons each was at the helm.)
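For illustration, the time-weighted average could be computed as follows. The (rating, movement_seasons) pair format and the function name are assumptions made for the example.

```python
def nation_rating(stints):
    """Time-weighted average rating for a nation.

    stints is a list of (rating, movement_seasons) pairs, one per player
    who held the nation.
    """
    total_seasons = sum(seasons for _, seasons in stints)
    if total_seasons == 0:
        raise ValueError("no movement seasons recorded for this nation")
    return sum(rating * seasons for rating, seasons in stints) / total_seasons
```

For example, a nation played for six movement seasons by a 1200-rated player and then two by a 900-rated player would count as (1200·6 + 900·2) ÷ 8 = 1125.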
An abandoning player's rating will change by min(0, t·ΔR ÷ (t+T)), where ΔR is the change given by the formula above, t is the number of seasons you played and T the number you missed. It is not the place of an ability rating system to hurt the undedicated, however annoying they are, but I don't think they should benefit either. It has proven very difficult to find a fair formula for replacements, so their rating is unaffected by such games. Like old age, this is not ideal, just better than the alternative.
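The abandonment rule is short enough to write out directly. In this sketch, delta_r is taken to be the change the nation would have earned under ΔR = K(S − X), which is one interpretation of the formula; the function name is hypothetical.

```python
def abandonment_change(delta_r, seasons_played, seasons_missed):
    """min(0, t*DeltaR/(t+T)): scale by time served, and never reward."""
    share = seasons_played / (seasons_played + seasons_missed)
    return min(0.0, share * delta_r)
```

So a player who left after 4 of 12 seasons in a game worth −30 would lose 10 points, while the same player in a game worth +25 would see no change.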
A group of seven established dippers play three games, one right after the other. Note that the players take into each game the rating they hold as a result of the last game. Here are the results:
Name              Initial Rating   After ABC draw   After D solo   After ABCD draw
Another Stabber             1300             1319           1290              1299
Bobby Bull                  1000             1032           1015              1035
Cannon Fodder                800              837            826               850
Dave Decent                 1400             1366           1475              1471
Elaine Egotist               900              888            875               864
Fluent Liar                 1100             1082           1064              1047
Gil Gullible                1200             1177           1156              1135
Note how the ratings of A, B, and C, who all achieved the same results, converge, and similarly for E, F and G. Against this opposition, D really needs to win, which still has a healthy effect on his rating, but the four-way draw actually leads to a rating decline for D because he should have been able to achieve better.
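Readers who want to check the first game by hand can do so with the short Python sketch below. The article does not say which K was used for the example; a flat K of 20 for every player is an assumption made here because it appears to match the published numbers. The single-letter keys stand for the seven players in table order.

```python
import math

# Initial ratings from the table, keyed by each player's initial.
INITIAL = {"A": 1300, "B": 1000, "C": 800, "D": 1400,
           "E": 900, "F": 1100, "G": 1200}


def play_game(ratings, winners, k=20.0, c=0.002):
    """Apply Delta R = K(S - X) to every player for one game."""
    n = len(ratings)
    total = sum(math.exp(c * r) for r in ratings.values())
    updated = {}
    for name, r in ratings.items():
        expected = n * math.exp(c * r) / total
        score = n / len(winners) if name in winners else 0.0
        updated[name] = r + k * (score - expected)
    return updated


after_abc = play_game(INITIAL, {"A", "B", "C"})
for name, rating in after_abc.items():
    print(name, round(rating, 1))
```

Under that assumption each result lands within a point of the "After ABC draw" column. Later columns can be checked the same way by feeding the output back in with the next game's winners, bearing in mind that the table shows rounded values.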
EIDRaS has been developed by George Heintzelman and myself. Thanks to Brahm Dorst for initiating the rec.games.diplomacy newsgroup thread that buoyed us into action and to all the r.g.d contributors for helping to shape the system. Thanks to Manus for agreeing to host EIDRaS at the Pouch and offering to help code it up (ready, Manus?).
[1] Actually the constant is different: we use the natural logarithm, and X averages 1 rather than the 0.5 of chess, so you cannot use this to compare your chess and Diplomacy abilities, but the form is essentially identical.

Tony Nichols 