Wednesday, February 13, 2008

Judges' Contest 15th Place

Now that a fair number of games have been eliminated from the GOTY contest, it seems like a fine time to gauge how accurately each of the judges is actually performing. While I've been critiquing both their rankings and commentary every week, some debate seems to have sprung up, both amongst the judges themselves and amongst the fans, as to which judges are really ranking the games most correctly. I'm definitely not opposed to such debates, but as usual, why not find a more concrete way to settle the question?

So we'll have a "judges' contest" to help us all decide who the most accurate judge actually is. The scoring for this will be fairly simple. For each eliminated game, a judge receives one point for every place between his or her individual ranking of that game and its overall ranking, and obviously the lowest number of points is the goal (e.g. for the game ranked 15th, a judge who ranked it 14th or 16th gets one point, 13th or 17th two points, and so on). If two judges are tied, then whoever ranked the most recently eliminated game closest to its overall place will be considered ahead (and the second most recent if they are tied on that one too, etc.).
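For anyone who wants to check the arithmetic, the scoring rule above can be sketched in a few lines of Python. This is only an illustration under some assumptions: the names `ELIMINATED_PLACES`, `distances`, `total_points` and `standings` are made up for this example, and the rankings are simply copied from the judges' standings in this post.

```python
from typing import Dict, List

# Overall places of the games eliminated so far, in the order the post
# lists them (beginning with 20th). Assumed layout for this sketch.
ELIMINATED_PLACES = [20, 19, 18, 17, 16, 15]


def distances(picks: List[int]) -> List[int]:
    """Points per game: how far each individual ranking is from the overall one."""
    return [abs(pick - place) for pick, place in zip(picks, ELIMINATED_PLACES)]


def total_points(picks: List[int]) -> int:
    """Total score for one set of rankings; lower is better."""
    return sum(distances(picks))


def standings(all_picks: Dict[str, List[int]]) -> List[str]:
    """Order names by total points, breaking ties on the most recently
    eliminated game first, then the second most recent, and so on."""
    def sort_key(name: str):
        per_game = distances(all_picks[name])
        # Reverse so the most recently eliminated game is compared first.
        return (sum(per_game), per_game[::-1])
    return sorted(all_picks, key=sort_key)


# Individual rankings as listed in this post, beginning with the 20th-place game.
judges = {
    "Dennis Monokroussos": [20, 13, 19, 18, 12, 17],
    "Alex Shabalov": [20, 12, 16, 15, 17, 10],
    "Robby Adamson": [20, 12, 19, 16, 17, 6],
    "Jennifer Shahade": [18, 20, 8, 12, 17, 19],
    "Ron Young": [20, 19, 13, 9, 7, 18],
}

for name in standings(judges):
    print(f"{name}: {total_points(judges[name])} points")
```

Sorting on a (total, reversed per-game distances) pair is just one convenient way to express the tie-break; any method that compares the most recent game first would do the same job.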

I will also be tracking how the two people (that I know of) who were brave enough to make their own rankings despite not being judges are doing: myself (my rankings can be found at the bottom of this post, and note that I will be using my own individual rankings when I track my results, not my predictions) and FM Braden Bournival (whose rankings are here). I will just be noting our results at the bottom, since this is designed mostly to be a contest for bragging rights amongst the judges alone, and also because he and I are both at somewhat of an inherent disadvantage, as our rankings cannot affect a game's overall ranking the way the judges' can (though for me, being at a disadvantage to start will make my inevitable triumph at the end of this contest even sweeter). So without further ado, here is how things stand right now (in parentheses I will note each judge's individual rankings for the games in order, beginning with 20th).

1st Place: Dennis Monokroussos (20, 13, 19, 18, 12, 17): 14 Points

2nd Place: Alex Shabalov (20, 12, 16, 15, 17, 10): 17 Points

3rd Place: Robby Adamson (20, 12, 19, 16, 17, 6): 19 Points

4th Place: Jennifer Shahade (18, 20, 8, 12, 17, 19): 23 Points

5th Place: Ron Young (20, 19, 13, 9, 7, 18): 25 Points

And of course, the two lowly non-judges:

Arun Sharma (20, 19, 18, 11, 12, 15): 10 Points

Braden Bournival (19, 20, 9, 17, 7, 4): 32 Points

I'll be updating this every week so that you all can keep up with who will wind up being the best judge in the end!


Bionic Lime said...

Isn't this contest actually, "Which judge is the most like the average of all the judges?" rather than the best judge?

Anonymous said...

You know, I never thought of that, but you are right!

Arun Sharma said...

To a point that's true; obviously the judge who is closest to the average rankings will be the one who wins this.

The other way to look at it is this. The reason we have five judges is to smooth out the variance. Simply put, there will be some "mistakes" (of course it's all subjective so that's somewhat a matter of perspective) made and having five of them rank the games should smooth that out. Under that assumption, the overall ranking for the judges should be the "most correct" ranking for the game as a whole (naturally you can still debate if that's actually true, but if Greg and I didn't at least believe that to a reasonable extent we wouldn't have chosen that format for GOTW and GOTY). So of course, once the overall ranking becomes the "correct" one, the judge who is closest to those rankings should have had the most accurate picks.

Again, you can certainly debate whether the person who scores the best in this contest really had the "most accurate" rankings, but there seemed to have been some discussion about that very topic, so it seemed like a good idea to try to measure it, and I'm fairly sure the way we chose is about as good as you can get in that regard.

Dennis Monokroussos said...

Since Greg gave us the freedom to vote based on criteria of our own choosing (and no meta-criteria for the criteria, even if he understandably believed that we would all be guided by relatively similar values), there is no "most correct" ranking. I generally chose based on the quality of the game and the opposition, but occasionally other factors came into consideration (e.g. excitement, the duration over which the game remained competitive).

Incidentally, the "judgments" of my chess engines barely featured in my rankings. After making three or four passes through the games to rank them, I only then took a last, fairly quick glance through them with my computer to make sure that I hadn't made any gross errors in evaluating a game's quality.

Arun Sharma said...

Of course, as I mentioned above, there is no truly "most correct" ranking; this whole process is subjective to a large degree.

Nevertheless, at the same time, when having a contest to measure the "best game", I don't really see any better way to fairly determine that other than the democratic type process that we have here. Again, by getting several voters the hope is that what is chosen overall is as "correct" as that term can allow in this type of competition.

As I said, you can certainly debate whether this process for measuring the "best judge" really has any validity. We just decided to do it to add an extra attraction to this contest, especially since Greg and I had both heard more than a few comments from fans regarding the way certain judges ranked certain games.