Chess Tempo

Author Topic: Problem 38375  (Read 274 times)
tomohawk
Newbie
Posts: 10
« on: June 23, 2008, 03:48:13 am »

I believe that this problem has at least two solutions. My solution:

1.g5 Ke6 (thus far matching the computer's solution) 2.Nh6 Bf7 (forced or the B is lost) 3.Ra6+ Kd5 (3...Ke7 4.Ra7+ wins the B) 4.Ra5+ Ke6 5.Re5+! and 6.Nxf7 wins the B anyway.
richard
Administrator
Hero Member
Posts: 766
« Reply #1 on: June 23, 2008, 05:03:45 am »

Nh6 is certainly more than adequate here.  Toga gave this line +2.72 which was unfortunately 0.03 away from being deemed an alternative.  Toga does upgrade this to over +3, but not for several minutes, which is longer than I look at each position when doing ambiguity checking.  I'll definitely be lowering the threshold, perhaps as low as +2.00 on the next verification run.

Incidentally, after 2.Nh6 Toga prefers Black to play 2...Rxh6, probably because it has seen the line you refer to and prefers to get some compensation.

As a long-time CTS user, how often do you find winning alternatives there compared to the latest problem set here? Looking at the Crafty analysis they show, I notice that CTS seems to have no winning-material cutoff threshold at all, sometimes allowing positions where the second-best move is above +4. Perhaps they have manually removed a lot of these from the set, so the impact is not noticeable in day-to-day play? Or maybe such problems have drifted up to very high ratings, where hardly anyone sees them because the very strict time controls make it hard to reach higher rating levels?

Regards,
Richard.

tomohawk
Newbie
Posts: 10
« Reply #2 on: June 23, 2008, 11:44:26 am »

My experience with respect to duals in the CTS server is that at most 2% of the problems have duals. There doesn't seem to be a cutoff (over the last few days, for example, I noticed there is one problem where the best move is +6.5 while the second best is +4.8, according to Crafty, while another problem is mate in two, but there is a forced mate in three that is marked as wrong), but I have noticed that occasionally problems get "fixed". And yes, some of the higher-rated problems have duals and that undoubtedly is why (in part) their ratings are so high.

I personally think the cutoff for problems should be one of either:

1) +3 or greater. If some solution leads to +3 or more, that solution should be good enough to win pretty much 100% of the time as a practical matter regardless of whether there are alternatives that are +10 or forced mate or whatever.

2) at least +1.5 more than the alternatives. In other words in order for the right answer to be clearly right it should be clearly better than the alternatives. If the best answer is +.7 and the second best answer is -.4, I consider that too little to be relevant - I don't believe that for most people these sorts of things are why they win or lose chess games. Plus there is no guarantee that the computer's positional evaluations are that accurate.

Incidentally, I was thinking about 2...Rxh6 last night just as I was falling asleep and that does seem much tougher than the line I proposed. Clearly 2.Rd8 is a much more aesthetic and cleaner solution. I cannot offhand think of a way of eliminating the dual solution, which is too bad as the position looks almost like a study!

Just my two cents.
richard
Administrator
Hero Member
Posts: 766
« Reply #3 on: June 23, 2008, 12:41:23 pm »

Quote from: tomohawk
My experience with respect to duals in the CTS server is that at most 2% of the problems have duals. There doesn't seem to be a cutoff (over the last few days, for example, I noticed there is one problem where the best move is +6.5 while the second best is +4.8, according to Crafty, while another problem is mate in two, but there is a forced mate in three that is marked as wrong), but I have noticed that occasionally problems get "fixed". And yes, some of the higher-rated problems have duals and that undoubtedly is why (in part) their ratings are so high.

Thanks very much for the feedback; such information is very helpful for future problem set improvements.

2% fits what I hear from others, that the CTS problem set is quite clean in terms of duals, although I still wonder how they achieve that with no cutoffs. When I had less stringent alternative checking (I've always had alternative checking, but due to a mistake in my algorithm far too many problems were escaping the check), almost 50% of the problems presented to very highly rated players had alternatives. Judging by what you and others have said, my guess is that after the recent update alternatives are probably now around 4% here. One mitigating factor is that at least none of them should be of the "I played a move that was +5 but got penalized for not seeing a +10 move" kind. All the positions with bad alternative moves I'm currently aware of are of the more subtle kind, where the advantage is between +2 and +2.75 and ends up in a won endgame (sometimes slightly larger if you run the engine for longer). Because I have to analyse every move in the move sequence I can't afford to look too long at each position; in practice I don't see the best moves change that often after 1-2 mins.

It is also worth mentioning that there are a couple of situations where alternatives are deliberately ignored. For example, if there is a mate in 1 on offer and the user chooses a non-mating, material-winning response, they will still be marked wrong. This is a stylistic thing and could be changed, but I feel that missing a mate in 1 probably justifies more than a gentle "Good move, please keep looking" response.
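
The mate-in-1 exception can be sketched roughly as follows. This is only an illustration, not the site's actual code: the function name `grade_move`, its boolean parameters, and the three result strings are all hypothetical.

```python
def grade_move(is_best: bool,
               is_allowed_alternative: bool,
               mate_in_one_available: bool,
               move_gives_mate: bool) -> str:
    """Return 'correct', 'try_again' or 'wrong' for a user's move (hypothetical API)."""
    if is_best:
        return "correct"
    # Exception described in the post: missing a mate in 1 is marked wrong
    # even when the move played would otherwise count as a winning alternative.
    if mate_in_one_available and not move_gives_mate:
        return "wrong"
    if is_allowed_alternative:
        return "try_again"  # the "Good move, please keep looking" response
    return "wrong"
```

Checking the mate-in-1 case before the alternative case is what makes the exception override the normal handling of alternatives.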

I hope there are not so many of these more subtle alternatives left that the site is still too frustrating for you to receive any enjoyment/benefit from.  I think one area that CT has an edge over CTS is that CT more often forces users to prove the tactical point.  CT still has a number of positions where the user can play the first move in a sequence and get marked correct without understanding where things were going but I feel this is considerably less common here than CTS. I think another advantage is that higher rated users have the opportunity to be provided more challenging problems in standard mode. I suspect (but have no hard data) that CT overall has a larger set of very hard problems than CTS, although I'd still like to have a lot more.

Quote from: tomohawk
I personally think the cutoff for problems should be one of either:

1) +3 or greater. If some solution leads to +3 or more, that solution should be good enough to win pretty much 100% of the time as a practical matter regardless of whether there are alternatives that are +10 or forced mate or whatever.

That is essentially what is in place now: anything +2.75 or greater is deemed an alternative. Depending on how many other alternatives were also found, this leads either to the problem being rejected or to the alternative being recorded as such.
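
A minimal sketch of that rule, for illustration only. The +2.75 threshold and the +2.72 score for Nh6 come from this thread; the function names, the `max_alternatives` limit, the assumption that too many alternatives means rejection, and the 4.10 score for the main line are all made up for the example.

```python
ALTERNATIVE_THRESHOLD = 2.75  # pawns; the value quoted in the post


def find_alternatives(evals: dict[str, float], best_move: str,
                      threshold: float = ALTERNATIVE_THRESHOLD) -> list[str]:
    """Moves other than the best one whose evaluation reaches the threshold."""
    return [move for move, score in evals.items()
            if move != best_move and score >= threshold]


def triage(evals: dict[str, float], best_move: str,
           max_alternatives: int = 2) -> tuple[str, list[str]]:
    """Reject a problem with too many alternatives, otherwise record them.

    max_alternatives is a hypothetical limit; the post does not give one.
    """
    alternatives = find_alternatives(evals, best_move)
    if len(alternatives) > max_alternatives:
        return "reject", alternatives
    return "keep", alternatives


# 2.72 is the Toga score quoted for Nh6; 4.10 is an invented score for Rd8.
evals = {"Rd8": 4.10, "Nh6": 2.72}
print(find_alternatives(evals, "Rd8"))                 # Nh6 falls just short
print(find_alternatives(evals, "Rd8", threshold=2.0))  # a lower threshold catches it
```

With the threshold at +2.75 the Nh6 line is not recorded, which is the situation reported at the top of this topic; lowering it to +2.00 would record it.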

Quote from: tomohawk
2) at least +1.5 more than the alternatives. In other words in order for the right answer to be clearly right it should be clearly better than the alternatives. If the best answer is +.7 and the second best answer is -.4, I consider that too little to be relevant - I don't believe that for most people these sorts of things are why they win or lose chess games. Plus there is no guarantee that the computer's positional evaluations are that accurate.

This is an area that I'm going to address in the next problem set verification run. Previously I had required a 1.8+ difference between the best and second-best move, but after I started allowing alternative moves in the UI (instead of rejecting them when found [which was not often enough in the past]), this code got bypassed, so in the current set there can sometimes be small differences between the best and second-best moves. This is not as bad as it sounds, as the second-best move in these situations will almost always be an allowed alternative, so users will not get marked wrong for it. I'm currently generating new problems with this threshold set to just under +1; this appears to be enough to usually justify why the best move was best, and the alternatives that are close to this are usually winning enough to be marked as allowed alternatives. There is also another threshold which defines how much the "best move" must improve the position to be allowed, so, for example, situations where +0.7 would be an allowed tactic would only arise if the player was in a losing situation before the tactical opportunity arose.
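
One way these two thresholds might combine, as a rough sketch rather than the generator's real logic. The name `is_valid_tactic` and every default value are assumptions: 0.95 stands in for "just under +1", the post gives no figure for the required improvement so 1.5 is invented, and treating a small margin as acceptable when the second-best move is an allowed alternative is my reading of the paragraph above.

```python
def is_valid_tactic(eval_before: float, best: float, second_best: float,
                    min_margin: float = 0.95,        # assumed: "just under +1"
                    min_improvement: float = 1.5,    # assumed: not stated in the post
                    alternative_threshold: float = 2.75) -> bool:
    """Decide whether a position qualifies as a problem (hypothetical rule)."""
    # The best move must improve the position by enough to count as a tactic.
    improves_enough = (best - eval_before) >= min_improvement
    # Either the best move clearly stands out from the second-best move...
    clearly_best = (best - second_best) >= min_margin
    # ...or the second-best move is itself winning enough to be an allowed alternative.
    second_is_alternative = second_best >= alternative_threshold
    return improves_enough and (clearly_best or second_is_alternative)


# Best move +0.7, second best -0.4: allowed only if the side to move was losing before.
print(is_valid_tactic(eval_before=-1.0, best=0.7, second_best=-0.4))
print(is_valid_tactic(eval_before=0.5, best=0.7, second_best=-0.4))
```

The first call passes because the position improves from -1.0 to +0.7; the second fails because +0.5 to +0.7 is too small a gain under the assumed improvement threshold.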

Quote from: tomohawk
Incidentally, I was thinking about 2...Rxh6 last night just as I was falling asleep and that does seem much tougher than the line I proposed. Clearly 2.Rd8 is a much more aesthetic and cleaner solution. I cannot offhand think of a way of eliminating the dual solution, which is too bad as the position looks almost like a study!

There are many "almost beautiful" positions that subsequent generators with stricter requirements have thrown out. It is quite sad really, but it is important that the generator can produce reasonable quality without too much human intervention, so I consider these throwaways to be reasonable collateral damage in the search for a decent quality problem set :-)

Quote from: tomohawk
Just my two cents.

Thanks, worth much more to me than two cents :-)

Again, I hope you are able to enjoy the site in its current form and hopefully look forward to further improvements in the future.

Regards,
Richard.
tomohawk
Newbie
Posts: 10
« Reply #4 on: June 23, 2008, 01:38:21 pm »

"I think one area that CT has an edge over CTS is that CT more often forces users to prove the tactical point.  CT still has a number of positions where the user can play the first move in a sequence and get marked correct without understanding where things were going but I feel this is considerably less common here than CTS."

Absolutely so. That is a really bad aspect of CTS. The very nature of these sorts of sites or similar books is that "there is always something there". When you see only one possible move that could be "something" it is hard not to just play the move and analyse the consequences later. In a game, obviously, you cannot know that there is something there, and if you play some sacrifice and then analyse it, you are very likely to lose...

So I think at the very least players should have to prove they see the whole solution.

On a related note, here is my wish list:

1) Add many problems where no forcing moves are marked correct (similar to the "none of the above" choice in multiple-choice exams). I know many complain on CTS when the answer to some problem is a simple recapture. However, in sharp positions in a real game you cannot just know that there is something there. Part of becoming a stronger player is to better recognize when there is NO solution at all! It makes it more like a real game situation. I suspect, however, that I am in the minority on this.

2) In cases where there is more than one interesting defense, the problem generator should pick amongst those defenses at random and force you to solve that variation. This helps to fight the problem of players memorizing (even unintentionally) the one defense that is chosen.

3) If possible, find positions that are very similar to each other, but with totally different solutions. CTS does a pretty good job of this. You see a position that looks very similar to another position, but this small change totally changes the answer. It reminds the player that small differences matter in every chess position.
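
Wish 2 could be sketched along these lines. This is a hypothetical illustration only: the `pick_variation` function, the shape of the `variations` mapping, and the placeholder move names are all invented and say nothing about how either site stores its problems.

```python
import random
from typing import Optional


def pick_variation(variations: dict[str, list[str]],
                   rng: Optional[random.Random] = None) -> tuple[str, list[str]]:
    """Choose one defense at random and return it with the line the solver must find.

    `variations` maps each interesting defense to the required continuation.
    """
    rng = rng or random.Random()
    defense = rng.choice(sorted(variations))  # sorted so a seeded rng is reproducible
    return defense, variations[defense]


# Placeholder names, not real moves from any problem.
variations = {
    "defense_a": ["reply_a1", "reply_a2"],
    "defense_b": ["reply_b1"],
}
defense, line = pick_variation(variations)
print(defense, line)
```

Because the defense is drawn each time the problem is served, a solver who has memorized one variation may be asked to prove a different one on the next attempt.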

richard
Administrator
Hero Member
Posts: 766
« Reply #5 on: June 23, 2008, 02:16:55 pm »

Thanks tomohawk,

1) While CT tends not to have straight recaptures it does have quite a number of hung piece problems. Whilst not having exactly the same features of a recapture they do force the user to question "is this all there is?" which I agree is a good thing as long as the user isn't getting problems like this every time.  Unfortunately at your rating level you are unlikely to see too many of these in standard mode as in standard most users end up choosing the right move in these simple one move positions so the hung piece problems get quite low ratings.  You'd probably see a few more in blitz.  I'd be cautious with actually introducing straight recaptures (they are deliberately prevented at the moment) as users barely tolerate hung piece positions and I think I might annoy more users than I'd please by allowing recaptures. One idea would be to make them optional but this is probably not viable for rated problems as it is important that users are all treated equally for ratings to be fair.

2) I hadn't thought of this idea before, but I really like it. It would take a bit of work on the generator, but I think the benefits would be worth it. There would be a minor issue with transpositions making some problems look too similar. I could probably detect these, and even if I didn't, there are probably enough problems in the set for this type of repetition not to be too annoying. I've added this to the generator todo list.

3) Interesting, I had assumed that all CTS positions came from real games, but that sounds like they may take real positions and randomly add or remove pieces and re-search for a tactic. Currently I like the fact that all CT positions start from positions reached in real games (although the generator often finds tactics that were missed over the board which can lead to some really nice novel positions [as well as some hard to fathom computerish responses]).  However I can see these similar positions do have some training value.  I'll have a bit more of a think about this one.
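
On the transposition issue raised under point 2, one common way to detect that two variations reach the same position is to compare FEN strings with the move counters dropped. The sketch below assumes the generator can export a FEN for the end of each variation, which is not something stated in this thread; the function names and sample data are hypothetical.

```python
def position_key(fen: str) -> str:
    """Keep piece placement, side to move, castling rights and en passant square.

    The halfmove clock and fullmove number are dropped, because they differ
    between move orders that reach the same position.
    """
    return " ".join(fen.split()[:4])


def drop_transpositions(variations: list[tuple[str, str]]) -> list[tuple[str, str]]:
    """Keep the first (name, fen) pair for each distinct position."""
    seen: set[str] = set()
    unique: list[tuple[str, str]] = []
    for name, fen in variations:
        key = position_key(fen)
        if key not in seen:
            seen.add(key)
            unique.append((name, fen))
    return unique


variations = [
    ("line 1", "8/8/4k3/8/8/4K3/4P3/8 w - - 0 1"),
    ("line 2", "8/8/4k3/8/8/4K3/4P3/8 w - - 4 3"),   # same position, different counters
    ("line 3", "8/8/4k3/8/8/3K4/4P3/8 w - - 0 1"),   # king on d3 instead of e3
]
print([name for name, _ in drop_transpositions(variations)])
```

Here "line 2" is dropped as a transposition of "line 1", while "line 3" survives because the position itself differs.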

Thanks again,
Richard.