richard
« Reply #1 on: August 15, 2008, 01:17:38 pm »
Hi Revenant,
Some interesting thoughts. Forcing conformance to the bell curve is an intriguing idea, but I think it might be a case of the cure being worse than the disease from some people's point of view. I can imagine some might overlook the mathematical beauty of such a system and simply ask, "What the hell did you do to my rating?" :-) Disbelieving heathens they may be, but users do get (overly?) attached to their ratings, and having the system automatically wiggle them when they aren't even logged in might be seen as, well... rude :-)
I haven't read Glickman's thesis directly, only his watered-down "Glicko for dummies" descriptions on his site (which, btw, I'm only just smart enough to understand :-) ). To my knowledge he doesn't talk about forcing compliance to a curve (but as I haven't read the raw thesis, he may well mention more advanced ideas there). I think in theory Glicko should (given infinite time) arrive at the theoretical curve. The system here has a few things that make it a bit tricky to analyse in terms of a standard rating pool. For example, there are some asymmetries between problems and people that don't exist in the real world. In a normal chess game both sides of the board are under the same conditions, but the problems here really have their hands tied in some ways. In standard mode, for example, the user can take as long as they like. Obviously the longer someone takes, the more they are likely to see, so in a sense they can vary how tough they are by varying how long they spend on each problem (others have noted this as a problem with standard mode in general). The poor old problem, on the other hand, has only one gear of difficulty, which is set as soon as the moves of the problem are defined. I haven't tried collecting the data yet, but I'd be interested in seeing a graph of the average time taken per problem, across all users, over time. I suspect that one aspect of the tendency for standard ratings to grow is that once users get to a point where they are having trouble moving their rating higher, they begin taking longer (as the only way to get a higher rating at that point is to improve accuracy), and taking longer is an accuracy "quick fix". Of course this eventually runs into diminishing returns, usually (I guess) as users begin to hit the edge of how far ahead they can accurately calculate. Anyway, that is a long and rambling paragraph, so I'll finish it now :-)
I don't currently have graphs of the user population's rating distribution. I do have this for problems, however, and they are not completely bell-like. They are fairly bell-like, but not quite as bell-like as the curves shown on CTS. There is a definite skew to the left in the problem rating distribution (which isn't that surprising given that the highest rated users can achieve a 90% success rate, a rate that is only possible due to the lack of higher rated problems; Glicko would be giving even the highest rated players 50% success rates if it had more hard problems to present to them).
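To make the 50% point concrete, here is a minimal sketch of the standard Glicko expected-score formula. It is not the site's actual rating code; the function names and the 1500 ratings are made up purely for illustration, and the problem's rating deviation (RD) is assumed to be zero unless stated.

```python
import math

# Glicko scaling constant q = ln(10) / 400
Q = math.log(10) / 400


def g(rd):
    """Glicko attenuation factor: shrinks the rating gap when RD is large."""
    return 1 / math.sqrt(1 + 3 * Q**2 * rd**2 / math.pi**2)


def expected_score(r_user, r_problem, rd_problem=0.0):
    """Expected success rate of a user against a problem (hypothetical helper)."""
    return 1 / (1 + 10 ** (-g(rd_problem) * (r_user - r_problem) / 400))


def rating_gap_for(success_rate):
    """Rating gap (user minus problem) that yields this expected score, with RD = 0."""
    return 400 * math.log10(success_rate / (1 - success_rate))


if __name__ == "__main__":
    # Evenly matched user and problem: expected success is one half.
    print(expected_score(1500, 1500))
    # Gap needed for a 90% expected success rate (RD ignored).
    print(round(rating_gap_for(0.9), 1))
```

Under this formula, a sustained 90% success rate corresponds to a user sitting roughly 380 points above the problems being served, which is consistent with there not being enough harder problems to pull the rate back toward 50%.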
I'm hoping that the rating and selection algorithms will not require as many changes as they have undergone recently, so hopefully that will settle down. Also, as the problem set grows, the impact of introducing new problems becomes much smaller (especially if they are all generated with roughly the same generator code, i.e. not too many more fancy new "alternatives" features). Once these things have settled down I'll be in a better position to assess things like long-term rating drift (which I still believe has a natural ceiling).
Regards, Richard.