Chess Tempo

Topic: Global fit to ratings bell curve
revenant (Full Member)
Posted on: August 15, 2008, 01:30:07 am

Hi Richard, in recent threads you've mentioned that a change in one's absolute numerical rating over a given period might or might not reflect a corresponding change in skill, since the rating distribution of the whole user population tends to drift upward or downward as the server code and rating policies change and as new problems are added to the problem set.  The percentile indicators on our stats pages go partway toward clearing this up, but to help us judge our true performance more easily, have you considered constraining the ratings of all problems and all users to a fixed bell curve?  (e.g. a fixed median of, say, 1500 and a fixed standard deviation of whatever it is now: 200? 400?)
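
One way to picture this proposal, as a minimal sketch rather than anything Chess Tempo actually does: rank every rating and map each rank onto the matching point of a fixed normal curve (quantile normalization). The function name `fit_to_bell_curve` is hypothetical, and the mean of 1500 and standard deviation of 200 are just the example figures from the post. Because only ranks matter, the relative order of players and problems is preserved while the absolute numbers are pinned to the curve.

```python
from statistics import NormalDist


def fit_to_bell_curve(ratings, mean=1500.0, sd=200.0):
    """Map ratings onto a fixed normal curve by rank (quantile normalization).

    Hypothetical helper: mean=1500 and sd=200 are illustrative values, not
    Chess Tempo settings. Tied ratings share the average of their ranks, so
    equal ratings stay equal after the fit.
    """
    n = len(ratings)
    target = NormalDist(mu=mean, sigma=sd)
    # Indices sorted by rating, so position in `order` is the rank.
    order = sorted(range(n), key=lambda i: ratings[i])
    fitted = [0.0] * n
    i = 0
    while i < n:
        # Extend j over the run of ratings tied with the one at rank i.
        j = i
        while j + 1 < n and ratings[order[j + 1]] == ratings[order[i]]:
            j += 1
        # Midpoint plotting position keeps the percentile strictly inside (0, 1).
        avg_rank = (i + j) / 2
        value = target.inv_cdf((avg_rank + 0.5) / n)
        for k in range(i, j + 1):
            fitted[order[k]] = value
        i = j + 1
    return fitted


print(fit_to_bell_curve([1200, 1650, 1900, 1650, 2100]))
```

Because the fit depends only on ranks, a single rating that changes rank can move other ratings too. That is the kind of automatic "wiggle" users might notice. Running the fit as a periodic batch job, rather than after every attempt, would keep both the cost and the churn down.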

Making the change would require regular updates & adjustments to all the ratings.  Maybe it could be carried out once a day or once an hour, or maybe there's some way to apply the change immediately whenever any user completes a problem, although I suspect some tricky math bugs might creep in then.  The whole idea might be unworkable or fundamentally flawed.  Just throwing it out there.

Does Mark Glickman's thesis & documentation on the Glicko rating algorithm cover the topic of reliable & equitable ways to constrain a Glicko-rated system in such a fashion?  I think part of the reason he invented the algorithm in the first place was to deal with issues like closed-pool ratings inflation & deflation in ways the more primitive Elo can't.

Come to think of it, do the current ratings of the whole population conform to a bell curve at all?  Or is the curve stretched out more on one end than the other (e.g. by discarding low-rated problems)?  It would also be interesting to compute the difference between each problem's blitz rating and its standard rating, then sort the problems by that difference.  I'll bet the outliers would reveal insights into the human mind's peculiar strengths and weaknesses in this rather unusual game.
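
Both questions can be answered with simple statistics. Here is a minimal sketch, assuming ratings are available as plain lists of numbers. Sample skewness measures which tail is stretched (negative means a longer left tail). Sorting by blitz rating minus standard rating brings the outliers to either end of the list. The `(problem_id, blitz, standard)` tuple layout and the sample values are made up for illustration.

```python
from statistics import mean, pstdev


def skewness(values):
    """Population (Fisher-Pearson) skewness: 0 for a symmetric curve,
    negative when the left tail is longer, positive when the right tail is."""
    m = mean(values)
    s = pstdev(values)
    return sum(((v - m) / s) ** 3 for v in values) / len(values)


def blitz_standard_gaps(problems):
    """Sort problems by blitz rating minus standard rating, largest gap first.

    `problems` is a hypothetical list of (problem_id, blitz, standard) tuples;
    the thread does not describe Chess Tempo's real data layout.
    """
    return sorted(problems, key=lambda p: p[1] - p[2], reverse=True)


sample = [(101, 1600, 1450), (102, 1500, 1520), (103, 1800, 1550)]
print(blitz_standard_gaps(sample))  # problem 103 first: blitz is 250 above standard
print(skewness([1200, 1400, 1450, 1500, 1520]))  # negative: long left tail
```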

If there's no way to make the global ratings conform, you could also consider putting up a sort of system-wide stats page that tracks the progress of the global curve from day to day in much the same way the graphs on the individual user's stats page track their individual ratings.  That way if a user comes back to Chess Tempo after a month away and sees that their percentile has dropped, they can check the global graph and realize they can probably make up the difference by forging ahead on solving.
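
A global tracking page could boil down to two numbers per day. This is a sketch only, with hypothetical function names and made-up data. It summarizes each day's distribution by its quartiles and computes a user's percentile against that day's snapshot. The usage lines show how an unchanged rating can lose percentile when the whole curve drifts upward.

```python
from statistics import quantiles


def daily_summary(ratings):
    """Summarize one day's rating distribution by its quartiles."""
    q1, median, q3 = quantiles(ratings, n=4)
    return {"q1": q1, "median": median, "q3": q3}


def percentile_of(rating, population):
    """Percentage of the population rated strictly below `rating`."""
    below = sum(1 for r in population if r < rating)
    return 100.0 * below / len(population)


day_1 = [1400, 1500, 1600, 1650, 1800]
day_30 = [r + 100 for r in day_1]  # hypothetical global upward drift
print(daily_summary(day_1)["median"], daily_summary(day_30)["median"])
print(percentile_of(1700, day_1), percentile_of(1700, day_30))  # 80.0 40.0
```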

Otherwise, I suspect that a number of users may be needlessly discouraged or put off from the site because they get the impression they're running on a treadmill that has suddenly ratcheted up to an inhuman speed.  When they get that feeling they won't necessarily report it or complain to anyone, they might just silently fall off the edge of the earth and you never see them again.  The opposite problem can also occur if there is a sudden global migration downward.  High-rated users might see their shiny new higher percentile and feel no incentive to solve any more problems for fear of losing their points.
richard (Administrator)
Posted on: August 15, 2008, 01:17:38 pm (Reply #1)

Hi Revenant,

Some interesting thoughts.  Forcing conformance to the bell curve is an intriguing idea, but I think it might be a case of the cure being worse than the disease from some people's point of view.  I can imagine some might overlook the mathematical beauty of such a system and simply ask, "What the hell did you do to my rating?" :-) Disbelieving heathens they may be, but users do get (overly?) attached to their ratings, and having the system automatically wiggle them when they aren't even logged in might be seen as, well... rude :-)

I haven't read Glickman's thesis directly, only his watered-down "Glicko for dummies" descriptions on his site (which, by the way, I'm only just smart enough to understand :-)  ).  To my knowledge he doesn't talk about forcing compliance to a curve (but as I haven't read the thesis itself, he may well mention more advanced ideas there).  I think in theory Glicko should (given infinite time) arrive at the theoretical curve.  The system here has a few things that make it a bit tricky to analyse as a standard rating pool.  For example, there are some asymmetries between problems and people that don't exist in the real world.  In a normal chess game both sides of the board are under the same conditions, but the problems here really have their hands tied in some ways.  In standard mode, for example, the user can take as long as they like.  Obviously the longer someone takes, the more they are likely to see, so in a sense they can vary how tough they are by varying how long they spend on each problem (others have noted this as a problem with standard mode in general).  The poor old problem, on the other hand, has only one gear of difficulty, which is set as soon as its moves are defined.  I haven't tried collecting the data yet, but I'd be interested in seeing a graph of the average time taken per problem across all users over time.  I suspect one reason standard ratings tend to grow is that once users reach a point where they are having trouble moving their rating higher, they begin taking longer (since the only way to a higher rating at that point is better accuracy), and taking longer is an accuracy "quick fix".  Of course this eventually runs into diminishing returns, usually (I guess) as users begin to hit the limit of how far ahead they can accurately calculate.  Anyway, that is a long and rambling paragraph so I'll finish it now :-)
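
The graph Richard describes could start from a simple monthly aggregation. This is a sketch under assumed inputs: `attempts` is a hypothetical list of `(date, seconds_taken)` pairs for standard-mode attempts, since the thread does not describe how attempts are logged. A steadily rising series would be consistent with the hypothesis that users buy accuracy with time, though it would not prove it.

```python
from collections import defaultdict
from datetime import date


def average_time_by_month(attempts):
    """Average solve time (seconds) per calendar month, across all users.

    `attempts` is a hypothetical iterable of (day, seconds_taken) pairs.
    """
    totals = defaultdict(float)
    counts = defaultdict(int)
    for day, seconds in attempts:
        key = (day.year, day.month)
        totals[key] += seconds
        counts[key] += 1
    # Return months in chronological order, ready for plotting.
    return {key: totals[key] / counts[key] for key in sorted(totals)}


log = [
    (date(2008, 6, 3), 95.0),
    (date(2008, 6, 20), 105.0),
    (date(2008, 7, 9), 140.0),
]
print(average_time_by_month(log))  # {(2008, 6): 100.0, (2008, 7): 140.0}
```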

I don't currently have graphs of the user population's rating distribution.  I do have them for problems, however, and they are not completely bell-like.  They are fairly bell-like, but not quite as bell-like as the curves shown on CTS.  There is a definite skew to the left in the problem rating distribution.  That isn't very surprising given that the highest-rated users can achieve a 90% success rate, a rate that is only possible because of a lack of higher-rated problems.  Glicko would be giving even the highest-rated players 50% success rates if it had more hard problems to present to them.
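
The link between success rate and rating gap comes from the expected-score formula. The sketch below assumes the standard Glicko-1 formulas from Glickman's published description; the thread doesn't state Chess Tempo's exact implementation. With the problem's rating deviation set to zero, an expected score of 0.9 corresponds to a gap of 400 * log10(9), roughly 382 points. In other words, a 90% success rate suggests the problems being served sit well below the user's rating. Treating a mix of problems as a single gap is a simplification.

```python
import math

Q = math.log(10) / 400  # Glicko-1 constant q


def g(rd):
    """Glicko-1 factor that shrinks the impact of an opponent with uncertain rating."""
    return 1 / math.sqrt(1 + 3 * Q**2 * rd**2 / math.pi**2)


def expected_score(r, r_opp, rd_opp=0.0):
    """Glicko-1 expected score of a player rated r against an opponent (here, a problem)."""
    return 1 / (1 + 10 ** (-g(rd_opp) * (r - r_opp) / 400))


def gap_for_success_rate(p):
    """Rating gap (player minus problem) giving expected score p when the problem's RD is 0."""
    return 400 * math.log10(p / (1 - p))


print(expected_score(1800, 1800))          # 0.5: evenly matched
print(round(gap_for_success_rate(0.9)))    # 382
```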

I'm hoping the rating and selection algorithms will not require as many changes as they have undergone recently, so hopefully that will settle down.  Also, as the problem set grows, the impact of introducing new problems becomes much smaller (especially if they are all generated with roughly the same generator code, i.e. not too many more fancy new "alternatives" features).  Once these things have settled down I'll be in a better position to assess things like long-term rating drift (which I still believe has a natural ceiling).

Regards,
Richard.
tmr (Jr. Member)
Posted on: August 17, 2008, 06:43:56 pm (Reply #2)

I think a bell curve adjustment makes a lot of sense from a performance-tracking perspective.  I'm not so sure about a dynamic adjustment, but one that limits the ever-increasing rating level would be helpful.  As it is, with problem set changes and the current rating methodology, rising ratings at the upper level are driven in part by the number of problems done.  The bias isn't great (I think it's about 0.1 points per problem done on average, problem set changes aside), but over a large number of problems it can add up.  With a bell curve the top rating would be fixed.  That is, the natural rating level would be imposed on the problem set rather than being reached over time by users doing problems.
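
Here is a back-of-the-envelope check on tmr's estimate, assuming the bias stays at about 0.1 points per problem. A user who solves N problems drifts by roughly 0.1N points; the N = 2000 below is a hypothetical figure. Under a fixed curve with mean 1500 and standard deviation 200 (the example values from the first post), the top 0.1% would instead stay pinned near 2118.

```latex
\Delta R \approx 0.1\,N, \qquad N = 2000 \;\Rightarrow\; \Delta R \approx 200 \text{ points}

R_{99.9\%} = \mu + z_{0.999}\,\sigma \approx 1500 + 3.09 \times 200 \approx 2118
```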

While I think Richard is right that there is a natural upper level for ratings, with the current method this level will be affected by the user population over time and, in standard mode, by how much time folks are willing to spend solving any particular problem.

I don't think this level has been reached, at least for standard mode, especially considering the problem set changes.  Adding new problems will likely push back the point at which a natural level is reached.  The last two problem set changes have resulted in about a 150-point increase in standard mode ratings, at least for ratings around the 1900-2100 level.  The bell curve approach would eliminate the rating shifts caused by problem set changes, whether small or large.
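
To illustrate the point that a curve-fitting step would absorb problem-set-driven shifts, here is a sketch using plain linear rescaling to a fixed mean and standard deviation. This is a simpler, swapped-in technique rather than a full bell-curve fit. The function name `rescale`, the target values and the sample ratings are all hypothetical. A uniform 150-point jump disappears entirely after rescaling, because only each rating's distance from the population mean survives.

```python
from statistics import mean, pstdev


def rescale(ratings, target_mean=1500.0, target_sd=200.0):
    """Linearly rescale ratings to a fixed mean and standard deviation (z-scores).

    Hypothetical helper with illustrative targets. Unlike a rank-based fit,
    this keeps the shape of the distribution and only pins its centre and spread.
    """
    m = mean(ratings)
    s = pstdev(ratings)
    return [target_mean + target_sd * (r - m) / s for r in ratings]


before = [1850, 1900, 1950, 2000, 2100]
after = [r + 150 for r in before]  # every rating inflated by 150 points
print(rescale(before))
print(rescale(after))  # identical list: the uniform shift cancels out
```

A linear rescale only cancels uniform shifts. If a problem set change stretches the top of the distribution more than the middle, the relative gaps change too, and only a rank-based fit would fully hide that.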

richard (Administrator)
Posted on: August 19, 2008, 04:49:56 pm (Reply #3)

I'd still like to wait a bit longer to see where the current problem set settles at, but this is certainly something worth looking at.

Richard.