My experience with respect to duals in the CTS server is that at most 2% of the problems have duals. There doesn't seem to be a cutoff (over the last few days, for example, I noticed there is one problem where the best move is +6.5 while the second best is +4.8, according to Crafty, while another problem is mate in two, but there is a forced mate in three that is marked as wrong), but I have noticed that occasionally problems get "fixed". And yes, some of the higher-rated problems have duals and that undoubtedly is why (in part) their ratings are so high.
Thanks very much for the feedback; such information is very helpful for future problem set improvements.
2% fits what I hear from others: the CTS problem set is quite clean in terms of duals, although I still wonder how they achieve that with no cutoffs. When I had less stringent alternative checking (I've always had alternative checking, but due to a mistake in my algorithm far too many problems were escaping the check), almost 50% of the problems presented to very highly rated players had alternatives. Judging by what you and others have said, my guess is that after the recent update alternatives are probably now around 4% here. One mitigating factor is that at least none of them should be of the "I played a move that was +5 but got penalized for not seeing a +10 move" kind. All the positions with bad alternative moves I'm currently aware of are of the more subtle kind, where the advantage is between +2 and +2.75 and ends up in a won endgame (sometimes slightly larger if you run the engine for longer; because I have to analyse every move in the move sequence I can't afford to look too long at each position, and in practice I don't see the best moves change that often after 1-2 mins).
It is also worth mentioning that there are a couple of situations where alternatives are deliberately ignored. For example, if there is a mate in 1 on offer and the user chooses a non-mating, material-winning response, they will still be marked wrong. This is a stylistic thing and could be changed, but I feel that missing a mate in 1 probably justifies more than a gentle "Good move, please keep looking" response.
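To make the mate-in-1 rule concrete, here is a rough sketch of the grading order. It is only an illustration, not the site's actual code; the function name, arguments and result labels are all made up for the example.

```python
def grade_move(played, best_move, allowed_alternatives, mate_in_one_available):
    """Grade a user's move (hypothetical helper, names are illustrative).

    played, best_move: move strings such as "Qh7#".
    allowed_alternatives: set of moves recorded as acceptable alternatives.
    mate_in_one_available: True if the position offers a mate in 1.
    """
    if played == best_move:
        return "correct"
    # A missed mate in 1 overrides the usual alternative handling.
    if mate_in_one_available:
        return "wrong"
    # Otherwise a winning alternative gets the gentle "keep looking" response.
    if played in allowed_alternatives:
        return "alternative"
    return "wrong"
```

With this ordering, a material-winning move that would normally count as an alternative is graded "wrong" whenever a mate in 1 was on the board.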
I hope there are not so many of these more subtle alternatives left that the site is still too frustrating for you to get any enjoyment or benefit from it. I think one area where CT has an edge over CTS is that CT more often forces users to prove the tactical point. CT still has a number of positions where the user can play the first move in a sequence and get marked correct without understanding where things were going, but I feel this is considerably less common here than on CTS. I think another advantage is that higher rated users have the opportunity to be given more challenging problems in standard mode. I suspect (but have no hard data) that CT overall has a larger set of very hard problems than CTS, although I'd still like to have a lot more.
I personally think the cutoff for problems should be one of either:
1) +3 or greater. If some solution leads to +3 or more, that solution should be good enough to win pretty much 100% of the time as a practical matter regardless of whether there are alternatives that are +10 or forced mate or whatever.
That is essentially what is in place now: anything +2.75 or greater is deemed an alternative. Depending on how many other alternatives were also found, this leads either to the problem being rejected or to the alternative being recorded as such.
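As a rough sketch of that rule (an illustration only, not the generator's real code): the +2.75 cutoff is the one mentioned above, while the limit on how many alternatives a problem may have before it is rejected is an assumed value, and all names are hypothetical.

```python
ALTERNATIVE_THRESHOLD = 2.75  # pawns; the cutoff described above
MAX_ALTERNATIVES = 2          # assumption: the real limit is not stated


def classify_problem(other_evals,
                     threshold=ALTERNATIVE_THRESHOLD,
                     max_alternatives=MAX_ALTERNATIVES):
    """Decide whether to keep a problem, given the non-best moves.

    other_evals: dict mapping each non-best move to its engine
    evaluation in pawns, from the solver's point of view.
    Returns ("accept" or "reject", sorted list of alternative moves).
    """
    alternatives = sorted(
        move for move, score in other_evals.items() if score >= threshold
    )
    if len(alternatives) > max_alternatives:
        return ("reject", alternatives)
    return ("accept", alternatives)
```

For example, `classify_problem({"Qxh7": 2.9, "Rd8": 1.0})` keeps the problem and records `Qxh7` as an allowed alternative.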
2) at least +1.5 more than the alternatives. In other words in order for the right answer to be clearly right it should be clearly better than the alternatives. If the best answer is +.7 and the second best answer is -.4, I consider that too little to be relevant - I don't believe that for most people these sorts of things are why they win or lose chess games. Plus there is no guarantee that the computer's positional evaluations are that accurate.
This is an area that I'm going to address in the next problem set verification run. Previously I had required a 1.8+ difference between the best and second best move, but after I started allowing alternative moves in the UI (instead of rejecting them when found [which was not often enough in the past]), this code got bypassed, so in the current set there can sometimes be small differences between the best and second best moves. This is not as bad as it sounds, as the second best move in these situations will almost always be an allowed alternative, so users will not get marked wrong for it. I'm currently generating new problems with this threshold set to just under +1. This appears to be enough to usually justify why the best move was best, and the alternatives that are close to this are usually winning enough to be marked as allowed alternatives. There is also another threshold which defines how much the "best move" must improve the position to be allowed, so for example situations where +0.7 would be an allowed tactic would only arise if the player was in a losing situation before the tactical opportunity arose.
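The two thresholds could be combined roughly as below. Again this is just a sketch under assumptions: 0.95 stands in for "just under +1", the 1.5 improvement threshold is an invented value (the real one is not given here), and the function name is hypothetical.

```python
def is_valid_tactic(eval_before, best_eval, second_eval,
                    min_gap=0.95, min_improvement=1.5):
    """Check the two thresholds for a candidate problem (illustrative only).

    All evaluations are in pawns from the solver's point of view.
    min_gap: how much the best move must beat the second best move
             (0.95 is an assumed stand-in for "just under +1").
    min_improvement: how much the best move must improve on the
             evaluation before the tactic arose (assumed value).
    """
    clearly_best = (best_eval - second_eval) >= min_gap
    improves_enough = (best_eval - eval_before) >= min_improvement
    return clearly_best and improves_enough
```

Using the +0.7 versus -0.4 example: `is_valid_tactic(-1.0, 0.7, -0.4)` passes because the solver was losing beforehand, whereas `is_valid_tactic(0.0, 0.7, -0.4)` fails the improvement check.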
Incidentally, I was thinking about 2...Rxh6 last night just as I was falling asleep and that does seem much tougher than the line I proposed. Clearly 2.Rd8 is a much more aesthetic and cleaner solution. I cannot offhand think of a way of eliminating the dual solution, which is too bad as the position looks almost like a study!
There are many "almost beautiful" positions that subsequent generators with stricter requirements have thrown out. It is quite sad really, but it is important that the generator can produce reasonable quality without too much human intervention, so I consider these throwaways to be reasonable collateral damage in the search for a decent quality problem set :-)
Just my two cents.
Thanks, worth much more to me than two cents :-)
Again, I hope you are able to enjoy the site in its current form and hopefully look forward to further improvements in the future.
Regards,
Richard.