revenant
« on: July 25, 2008, 11:20:14 am »
Hi, Richard. When I first happened onto Chess Tempo last month, I was intrigued by the idea that a chess problem could be treated as an opponent with a ratable strength. Simple and ingenious. But why not take things a logical step further and consider each *position* reached within any given problem to be an opponent in its own right, and rate it separately from the other positions?
In this idea, each complete problem would still be an entity in the system's database, with a move sequence the human player is required to follow. The interface would also remain almost entirely the same. The way it chains from one position to the next is already very nice (and in my opinion, a resounding triumph of programming that demonstrates why we have computers in the first place). The difference from what happens now would be mainly on the server side, with an internal rating calculation taking place after each move by the player rather than just at the culmination of the problem.
Forgive me if you've heard the idea before and rejected it. It seems pretty obvious, but in perusing the Forum "Site Feedback" topics I don't see it directly mentioned. What I do see is a good deal of discussion about the perceived injustice of being marked wrong for a whole problem when you guess the first few moves but miss the last one. I believe rating each position would resolve this issue to the satisfaction of both sides of the debate.
In the hypothetical, newly coded system, suppose you successfully figure out the first two moves of a complex tactical sequence. You are rewarded with a plump rating increase for each move, factoring the amount of time you took into the calculation just as always. Then say you miss a mate in 1 at the third step (or worse, allow a mate in 1) because you got careless. Considered by itself, that particular position is of course a weak "opponent", so you pay the penalty of dropping a bunch of points, negating your accomplishments a moment earlier.
This would increase the over-the-board training value of the site by giving us humans an incentive to stay focused. At the same time, we would no longer have a basis for complaint that the computer is somehow being unfair. The positive and negative "feedback" from the system to the user would be distributed in a more equitable, appropriate way.
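A minimal sketch of the per-move scenario described above, assuming a plain Elo update with a fixed K-factor. The thread does not say which rating formula Chess Tempo actually uses, and the time factor is left out entirely; `expected_score`, `rate_move`, `rate_problem` and the K value are illustrative names and numbers, not anything taken from the site.

```python
def expected_score(player: float, position: float) -> float:
    """Standard Elo expectation that the player beats the position."""
    return 1.0 / (1.0 + 10 ** ((position - player) / 400.0))


def rate_move(player: float, position: float, solved: bool, k: float = 16.0):
    """Return updated (player, position) ratings after one move.

    Assumption: a plain Elo update with a fixed K-factor and no time factor.
    """
    delta = k * ((1.0 if solved else 0.0) - expected_score(player, position))
    return player + delta, position - delta


def rate_problem(player: float, positions: list[float], results: list[bool]):
    """Apply rate_move to each position in turn, stopping at the first miss.

    Returns the player's final rating and the updated ratings of the
    positions that were actually reached.
    """
    updated = []
    for position, solved in zip(positions, results):
        player, position = rate_move(player, position, solved)
        updated.append(position)
        if not solved:
            break
    return player, updated
```

Calling `rate_problem(1600, [1900, 1800, 1200], [True, True, False])` would model solving two hard positions and then missing an easy one. Under this formula a miss against a low-rated position costs more points than a miss against a high-rated one, which is the "careless mate in 1" penalty from the post.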
Another heretofore thorny question, regarding the number of moves to include in a given problem, goes away or becomes merely aesthetic. You could follow the natural desire to go ahead and make each problem's chain as long as it needs to be for the player to see the true point of the combination. This is already an area where Chess Tempo really shines versus a more cryptic, pared-down site like CTS. It could shine even brighter.
From the user's perspective, there would be no more experience of "Hmm, I guessed that one right, but why was it right?! Oh well, move on to the next one..." Also no more disgruntled "I guessed wrong, all that work for nothing!" when the red "Failed" flag appears. (Sometimes it's like I've just been shot!) Instead, a more conducive feeling of "Well, all right, chess is a hard game and I did my best."
From the system's perspective, many perfectly good problems which were culled from the set in previous runs could be brought back in because you no longer need to be so harsh about ambiguities and alternatives. A vague position may be assigned an artificial rating (or even no rating at all!), while the no-nonsense positions before and after it in the sequence of a problem will tend to settle onto accurate ratings more quickly than they currently do when they are first pushed into the arena.
As you continue to introduce new features from your "to do" list in the months to come and you are wondering how to implement them, I think your choices as a programmer will become much clearer if you are rating positions and not problems. There still might not be one final "right" approach in any case, but at least the system will be more amenable to the sort of intuition that pipes up and says, "Aha! *That's* how it should work!"
By analogy, similar gains were achieved in evolutionary biology when scientists began to view the gene and the chromosome, rather than the species or the individual animal, as the conceptual "unit" of evolution and the beneficiary of natural selection. Perhaps in chess each tactical element (fork, deflection, etc.) is like a gene, each position is like a chromosome, and each problem or game is like a whole animal.
At this stage in its life cycle, could the Chess Tempo "ecosystem" of competing and coexisting creatures be reprogrammed in the way I am suggesting? As a C/Perl/shell programmer who plays chess as a hobby, I do understand how difficult it can be to overhaul a mature system. However, if the source code is already well modularized, you might be surprised at how little of it you actually need to revamp.
Many thanks for a useful and enjoyable site.

drahacikfm
« Reply #1 on: July 25, 2008, 12:18:00 pm »
An interesting idea, somewhat related to a suggestion I made to Richard a couple of months ago: when you have a Mate-in-5 problem, doesn't that also mean you have 4 more problems? A Mate-in-4 after one move has been played, a Mate-in-3 after two moves have been played, etc. So you could theoretically add 4 more separate problems to the problem set, and each problem would obtain its own rating according to how difficult it was.
As for rating each position within one single problem, and awarding or taking away rating points for each move, I see these arguments against:
1) You reward guessing on the first move, because you can gain rating points even if you did not see the whole idea. This encourages people to move faster on the first move, before they see everything.
2) In real life, I don't know how many games I have lost because I played like a genius for 30 moves, making all kinds of deep positional moves with great plans, etc., then blew the whole game with one tactical mistake. In real life, you get a big fat zero in the crosstable and lose rating points for that one mistake on move 31, even though you were a genius for the first 30 moves. No partial credit in tournaments, or even in blitz games.
A couple points I didn't understand:
"many perfectly good problems which were culled from the set in previous runs could be brought back in because you no longer need to be so harsh about ambiguities and alternatives."
I don't think that assigning ratings to each position allows this. You still need the same restrictions on alternates and allowed problems. For example, Richard now looks at the top 4 moves in a position. If 3 are winning, and the 4th is not, then move 1 is the solution, moves 2 and 3 are alternates. If all 4 are "winning", then that problem is thrown out, because there is no way to know if the 5th and 6th moves are also winning. Assigning ratings to each position is not going to make that ambiguity go away. It would still be a horrible problem/position if it were included in the set with 6 winning moves and you got marked wrong for the 4th, 5th and 6th best moves.
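The rule described above can be sketched as follows. This only illustrates the rule as stated in this post and is not Richard's generator code; `classify_position`, the `is_winning` flags and the `window` parameter are hypothetical names.

```python
def classify_position(moves, window=4):
    """Classify a position from its engine-ranked candidate moves.

    moves: list of (move, is_winning) pairs, best first. How a move is
    judged "winning" is outside this sketch (assumption: the engine decides).
    Returns (solution, alternates), or None if the position must be discarded.
    """
    top = moves[:window]
    winners = [move for move, is_winning in top if is_winning]
    if not winners:
        return None  # nothing wins, so there is no tactic to find
    if len(top) == window and len(winners) == window:
        # Every examined move wins, so move window+1 might win too.
        return None
    return winners[0], winners[1:]
```

With three winning moves out of four, the first becomes the solution and the other two become alternates; with four out of four, the position is rejected because the unexamined fifth move cannot be ruled out.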
"From the user's perspective, there would be no more experience of "Hmm, I guessed that one right, but why was it right?! Oh well, move on to the next one..." "
Assigning a rating to each position within a problem doesn't address the issue that some problems here have solutions with only one move. That's a separate issue relating to the pruning code for deciding when to terminate a solution. I don't see how adding ratings to each move will produce more moves in the solution, or make it easier to add more moves to the solution.
FIDE Master Drahacik

revenant
« Reply #2 on: July 25, 2008, 01:23:10 pm »
Hi drahacikfm, good questions and food for thought. Taking them one by one, I think the math would work out such that incremental guessing is neither rewarded nor penalized. What will happen is that if the key move of a particular position (such as a flashy queen sacrifice) is easy to guess because we know a tactic is there, low-rated players may gain points from *that position* and it will settle at an appropriately low rating, but when they have to prove they understand the followup positions they will guess wrong and the net rating gain over the course of the problem will be no different than under the current system. We aren't changing the fundamental difficulty of the problems, we're just employing a finer granularity in calculating their ratings, like noticing that a dog's "bark is worse than his bite".
The experience of losing a long tournament game because of one oversight is one with which I am certainly also dismally familiar (though not at master level of course -- in fact I never made it past 1800 USCF when I was active 15 years ago). The question in this context however is how much and in what ways Chess Tempo should approximate real-life OTB play. I liked your reporting in another thread about doing hundreds of the mate-in-x problems before a tourney. CT lets you choose a subset of the chess universe to play around in, then when you're done you go out in the world newly armed and feeling ready. True, partial credit is not given IRL (unless you want to count the half-point for a draw). However, that doesn't mean CT should be trying to impart the same lesson. It has plenty of other good lessons it is giving very well, especially a sense of the wonderful magic inherent in many positions.
Ambiguities and alternatives: OK, say we have a neato problem that unfortunately got thrown out because at some point there were 3 moves that Toga evaluated as being within .2 pawns worth of each other. And say the other positions in the problem sequence were actually very cut-and-dried, where not even masters with different styles would have any disagreements about what to do, and potentially valuable tactical themes could have been shown. Bring the problem back in, but when the user gets to the vague position, don't bother rating their try. Just tell them "Good, you made a good move, but what I'm really interested in is what you would do about *this*," and bump the display forward to the next position as usual. Like a chess teacher, not a chess opponent trying to glom the full point off you in a last-round money game. :-)
My point about the number of moves to include in a problem wasn't that a computer can automatically tell Richard how many are needed (it can't, only humans can decide that in the end). Rather, it was that if the system is rating positions and not problems it frees him up to include as many moves as he likes without worrying that a problem will become too diluted or hard to rate. Just stop cranking the handle when there is no more chess knowledge to be wrung out of the problem. Heck, you could even change the "Last problem for session" checkoff box to a "Last *move* for session".
It's nice that someone thought these various ideas were worth considering. Hope I'm making some sense and that more dialogue is forthcoming. It's the end of my day and I'm ready to go to sleep. :-)

richard
« Reply #3 on: July 25, 2008, 04:09:27 pm »
Hi revenant,
I also find this an interesting idea (I've done some work with "genetic algorithms" and similar approaches to problem space searching in the past so I like the evolutionary metaphors [or is it an analogy - I've never had a strong grasp of the difference :-) ] ).
I do share a couple of Drahacik's concerns: 1) I tend to agree with the "in real games you do lose when you make a mistake" argument (or to steal your analogy, the genes don't propagate to the next generation if the organism carrying them does something silly :-) ). I think there is some evidence that training done in situations that best match the target skill is most likely to carry over to improvements in the skill. Of course there are lots of examples of useful non-specific training drills from other endeavors that probably improve performance, so I don't think this one is a show-stopper. IMO finding the initial sacrifice but not the follow-up should if anything be punished MORE not less, as for me it is exactly these kinds of things that can lose me games (oh and the usual run of the mill non-sacrifice blunders don't help me either :-) ). In essence if I can't find the follow-up then the sacrifice is simply a blunder.
2) I'm still not 100% sure I understand the "handle ambiguity better" part of the idea either. Are you suggesting that if there were alternatives a, b, c and the user chose say b or c then the problem should continue down those paths? If so then I should point out that this can get VERY expensive in terms of the time taken to generate a single problem. I still need to perform ambiguity checking as I can only look at the N best moves and if all N are good then I don't know if there was a move that was N+1 best that should also be allowed. Doing this checking down every branch can explode fairly quickly (even allowing only the 3 best). One of the reasons for the current "alternative move" approach is that it limits every sub-branch off the main line to depth 1. If I followed the alternatives at each point then I need to follow their children's alternatives etc etc. I agree that it would be nice to wander through all "good" branches of a problem (like a pleasant meandering walk through the forest :-) ), but the potential branches in the path make this computationally very very expensive.
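A rough count shows how quickly the branching grows. This is only a sketch under a simplifying assumption: every position has exactly `branching` allowed moves and a single opponent reply to each. Both function names are hypothetical.

```python
def checks_main_line_only(depth: int) -> int:
    """Positions needing an N-best check when alternatives stop at depth 1."""
    return depth


def checks_following_alternatives(depth: int, branching: int = 3) -> int:
    """Positions needing a check when every allowed move is followed.

    Assumes each position has exactly `branching` allowed moves and one
    reply to each, which is a simplification.
    """
    return sum(branching ** level for level in range(depth))
```

For a problem five moves deep with three allowed moves per position, the main-line approach checks 5 positions, while following every alternative checks 1 + 3 + 9 + 27 + 81 = 121.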
I could probably say more, but it is after midnight here so I better get some sleep before I lapse even further into incoherency :-)
Regards, Richard.

revenant
« Reply #4 on: July 25, 2008, 10:29:49 pm »
Not at all; you're both raising some very good points that test my (possibly mistaken) assumptions. The "exploding tree branches" argument seems to refute my suggestion of restoring nice problems that happened to have too many alternatives. Some aspects of the system are best kept as simple as possible. However, so far in this thread, I have the feeling we're all "treading water" in the sense that we each have a worrisome concern and an intuition about how the system should work, but none of us can prove his idea is effective because we don't have enough data from a live system.
What we have instead is a huge amount of data (1 1/2 years of history on your hard disk) from one exceptionally robust system, Chess Tempo, that has implemented at times one and at times another policy for calculating ratings and serving up problems. We're like doctors trying to diagnose a patient who seems to be healthy but is showing some odd symptoms we'd like to investigate. For example, in a recent thread we were talking about the ratings inflation that was concomitant with the new policy of allowing alternatives and giving them a "Look for a better move" flag. The gain was very pronounced on my own subsequent rating graph. Before the changeover in June, you predicted the effect but you couldn't be sure how large the average gain would be (100 points? 200?) or how fast the community of users would settle onto their new ratings.
So, may I point out something else I've observed from combing through a subset of the available data? Using your newly offered premium features, I created a problem set with one criterion: "Problems I always got wrong". I named it "Rats!" :-) The set has 1077 problems out of the 2900 or so I've attempted. I sorted the set by standard rating and have now begun working my way up from the lowest-rated problems in the list, reexamining them to see if I could figure out what was wrong in my "chess move generation engine" (i.e., humble brain) and whether I can correct it.
What I've found so far (the lowest 40 or 50) is that most of them were simple slap-forehead blunders but a few of these problems exhibit something strange: 22481, 34724, 36491, 38978, and 39132 are rather mysterious positions whose low standard ratings belie their depth. They also seem to have a greater than normal disparity between their blitz and standard rating. (Richard, can you help quantify this? What is the statistical correlation between the two and what is their average difference in the problem set at the moment?)
Look at 36491. Standard Rating 1328.6, Blitz Rating 2277.8 (!), and 8 (!) wrong first moves attempted so far. The answer, 1... Qb3, is anything but obvious. It seems to be aimed at supporting the progress of the passed c4-pawn (else the computer as white would not feel compelled to chop off the pawn with its next move 2. Nxc4), but how many people who guessed the "right answer" knew why it was right?
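The two statistics asked about above (the correlation between blitz and standard ratings, and their average difference) could be computed along these lines. This is a sketch only: `blitz_standard_stats` is a hypothetical helper, and the input is assumed to be one (standard, blitz) rating pair per problem, however the site stores them.

```python
from math import sqrt


def blitz_standard_stats(pairs):
    """Return (Pearson correlation, mean of blitz minus standard).

    pairs: iterable of (standard_rating, blitz_rating), one per problem.
    """
    pairs = list(pairs)
    n = len(pairs)
    if n < 2:
        raise ValueError("need at least two problems")
    mean_std = sum(s for s, _ in pairs) / n
    mean_blitz = sum(b for _, b in pairs) / n
    cov = sum((s - mean_std) * (b - mean_blitz) for s, b in pairs)
    var_std = sum((s - mean_std) ** 2 for s, _ in pairs)
    var_blitz = sum((b - mean_blitz) ** 2 for _, b in pairs)
    if var_std == 0 or var_blitz == 0:
        raise ValueError("ratings must vary for a correlation to exist")
    return cov / sqrt(var_std * var_blitz), mean_blitz - mean_std
```

Comparing each problem's own gap against the mean difference would then flag outliers such as 36491, whose blitz rating exceeds its standard rating by 949.2 points.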
I think if CT were to rate positions and not problems, it would go a long way toward solving the mystery. We would get to see how difficult the initial position really is and how difficult the followups are by comparison. When all you have is one blurry rating for a complex tactical sequence comprising 1 or 2 hard moves and 1 or 2 easy ones, it's hard to learn anything useful from the problem or determine how to modify one's strategies in play. Whereas if the system could show me "OK, that first position now has a Standard Rating of 2100 but the followup positions are 1500", I would know what parts of my game need work. (Note that I'm *not* saying the interface should display any of the positions' ratings while the user is actually attempting the problem; that gives away too much information. Show the rating calculations at the end of the problem, same way as now.)
What will always be true is that no matter what rating formulas Chess Tempo implements, the "playing field" (i.e. the total problem set) is the same for all the users at any given time and so the bell curves give us a precise idea of our strengths relative to each other. You can and should have a feeling of freedom to try out new features and practices that will make the site even more useful as a training tool. Rating positions and not problems might give you the extra data you need to make those improvements in a surefooted way.

richard
« Reply #5 on: July 26, 2008, 03:41:15 am »
Hi Revenant,
If I read you correctly, one of the things you are looking for from a more position versus problem approach is better visibility of how parts of the problem differ in difficulty. I think some of this could be achieved by better use of the "mistakes" data. Essentially for all problem attempts I know the problem's rating, the user's rating and the mistake made and on what move it was made. This is enough data to produce a rating for each position in the problem without any other changes. I guess a simple way of doing this would be to just present a table with move number and rating (probably blitz and standard) as columns. This is a less radical approach which presents some of the information you'd like to see without fundamentally changing the way problems are marked.
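A sketch of how the "mistakes" data described above might be turned into a per-move table. The record layout, the function names and the rating estimator (an inverted Elo expectation at the mean user rating, with the success rate clamped) are all assumptions made for illustration; the post does not say how the per-position rating would actually be computed.

```python
from math import log10


def per_move_stats(attempts, n_moves):
    """Tally attempts and successes for each move of one problem.

    attempts: list of (user_rating, failed_move) where failed_move is the
    1-based move on which the user went wrong, or None if they solved it.
    Returns one dict per move with attempts, successes and a rating estimate.
    """
    rows = []
    for move in range(1, n_moves + 1):
        # A user reaches this move only if they did not fail an earlier one.
        reached = [r for r, failed in attempts if failed is None or failed >= move]
        solved = [r for r, failed in attempts if failed is None or failed > move]
        rows.append({
            "move": move,
            "attempts": len(reached),
            "successes": len(solved),
            "rating": estimate_rating(reached, len(solved)),
        })
    return rows


def estimate_rating(reached, successes):
    """Assumed estimator: inverted Elo expectation at the mean user rating."""
    if not reached:
        return None
    # Clamp the success rate so that 0% or 100% does not give an infinity.
    p = min(max(successes / len(reached), 0.01), 0.99)
    return sum(reached) / len(reached) + 400.0 * log10((1.0 - p) / p)
```

Running this once with standard ratings and once with blitz ratings would give the two rating columns of the table, without changing how problems are marked.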
Prioritizing work at the moment is a tricky process, with lots of factors involved, including how much benefit a feature is to users, how long it would take to implement, whether it would encourage a lot more premium users, and how much fun it is for me to implement.
At the moment I'm putting a bit more weight on breadth rather than depth, so features that add incremental benefits are less likely to get short-term attention than something that offers completely new functionality.
Regards, Richard.

revenant
« Reply #6 on: July 26, 2008, 05:07:54 am »
Wow Richard. I'm really glad to see the idea make it onto the "to do" list in any form. Every issue I've raised here in the Forum has evoked a timely and reasonable response. You must have been joking about needing to go to sleep, because I don't see how you ever can! :-)
Seriously, though, I suspect that if you ever do manage to clear all the major items off your list, it will be because at some point you realize a full redesign and rewrite is required so you go ahead and do it. If and when you make that decision, maybe that will be the time to come back and revisit the idea of rating positions.

richard
« Reply #7 on: July 26, 2008, 07:35:53 am »
re: sleeping
:-)
I think all developers would love to have the time to redesign and rewrite projects after they've got the first working version. There are so many things you don't fully appreciate when writing the code the first time and then when users actually start using the thing well...
The trick is finding the breathing space to actually do the rewrite. Then it becomes an issue of being able to justify the rewrite from a commercial point of view. Usually things don't end up getting rewritten until development on the old code gets so difficult that the rewrite starts to make more and more sense, and I'm not at that level with the chess tempo code yet. I *have* reached that level with the generator: it has parts that are way more complicated and fragile than they need to be, so I'm planning a complete generator rewrite before making any further major changes to the generator (there may still be some small bug fix changes before I get around to the rewrite).
Regards, Richard.

slacker00
« Reply #8 on: July 26, 2008, 08:32:34 am »
revenant, I like the idea. Anything that can help me dive deeper and deeper into a complicated position is what I enjoy most right now. I feel like it really helps my chess "deepest" thinking, rather than simply testing me on how fast I can find a mate-in-2 or other simple tactic, which also has its merits, but it doesn't interest me as much at the moment as tearing apart the most difficult problems. So, to use drahacikfm's example of how a mate-in-4 also contains a mate-in-3, a mate-in-2, etc, will we split up these sub-problems and put each sub-problem into the problem set as well? It would be interesting to see what the problem rating of these "sub"-problems would be in relation to the parent problem. I was kinda thinking along these lines tonight when I'd finished a 4 move problem. I always go back over the problem to see what I was thinking at each stage and how I might have thought differently to more easily find the answer. I thought about how each of the 4 different moves all posed unique obstacles. With this new idea, I might get an even deeper understanding at each step of the problem. Although, as cool as this idea sounds, I don't envy the task of the programmer charged with implementing the code. lol. richard's to-do list must be a mile long by now.

drahacikfm
« Reply #9 on: July 26, 2008, 01:41:46 pm »
I think Richard is planning to display the move list of the problem after it is solved. And he said he has all the information stored now to calculate a rating for each position in the problem based on the standard and blitz ratings of all the people who have attempted the problem so far.
So a hybrid solution to revenant's suggestions is to keep the calculation of user ratings the way they are now, based on whether the user got the whole problem correct or not. But in the move list, display ratings for each move in the solution. Basically it would tell you how difficult that move is.
An example display, for a problem where the user is Black, could be:
                   Stand.  Blitz  Attempts/Success
1. ...   Nb5       2130    2420   52 / 18
2. axb5  Bxb5+     1580    1860   18 / 16
3. Bd7   Qa4       1930    2110   16 / 9
whole problem:     2210    2530   52 / 9
The attempts and successes for each position are also interesting to include. I guess the attempts at the second move would be equal to the successes at the first move, etc.
« Last Edit: July 26, 2008, 01:45:16 pm by drahacikfm »
FIDE Master Drahacik