Author Topic: The Hardest Problem on ChessTempo  (Read 42731 times)
interlist
Hero Member
*****
Posts: 752


« on: Oct 08, 2012, 01:19:03 AM »

----------------------------------------------------------
Spoiler Alert!!
----------------------------------------------------------

OK, I've got your attention. Please don't read this post if you intend to be #1 in either CT Blitz or Standard.

       

Still here?  OK, good, welcome to the table of mere mortals - "Gooble, gooble, we accept you!"

Again, if you intend to be the very best you can be, perhaps you shouldn't refer to this problem or read the rest of the post...

But if you insist - here it is, the problem with the lowest percentage successful completion - 99374.   I imagine this is the hardest problem on CT, but please see below for more on that.

Me not look?  Well, are you kidding?  I immediately looked at it, given that it will take me 10 years at least to reach that level!

I didn't bother to solve it (probably knowing I couldn't!), but I did play through the six moves of the main line.  Indeed, it is a tough, tough problem with lots of seemingly obscure defensive/offensive maneuvering (for example, and quite topical given my other posts, the first moves are White attempting to set up a devastating battery on the 7th rank, and Black countering with queen moves - then White does a nice transposition and sets up a battery on the d-file which converts to the win).

So, I have some thoughts to post about ratings and the saturation effects having an unsolvable problem would have on CT.  And although this might be a good opportunity for that discussion, it would be premature, for me at least ... I haven't finished thinking it all out.

I have other more pressing questions to begin, and will get to them soon (sorry, I'm a little tired and incoherent).

First, the baseline statistics for this problem:


B  2706  0:30  4  0%
S  2422  6:09 19  5%


So, only one person (who?) has solved this problem in standard, and none in blitz.  

Here's my question to start - why does this problem have any valid time for blitz average time?  

If I remember correctly, Richard has said previously that the average times for problems are calculated only for those that successfully solved them.  Since nobody has solved this problem, how can it have any average solve time?  

Next, problem 99374 is rated a measly 2422 in standard, compared with problem 95371 which has a rating of 2650.5 (again, no one has solved this latter problem in blitz, but two people have in standard).  Here's the highest-rated problem's baseline:


B 2709  0:30   7   0%
S 2650 6:46  18  11%


Note that the blitz average time is again 0:30, though nobody has solved it yet.

My next question is the following - does it seem reasonable that a problem be rated 200 points lower than another when just one person solved it, i.e. when the statistics are so meager?  Especially when the other, higher-rated problem has been solved twice as often? Shouldn't there be more viscosity in the rating system for problems with such low solve rates?  And shouldn't the system discount flukes, which could always occur with a statistical sample of just one?

Finally then, this leads me to my last question (for the moment) -

Which problem has the better claim to the title of hardest problem on CT - the problem with the highest rating or the one with the lowest percentage of completions?

Ha - chew on that!  Not really such an easy question - but, as Mulder says "The Truth is Out There"

« Last Edit: Oct 08, 2012, 02:32:27 PM by interlist » Logged

--interlist (was here)
3253
Hero Member
*****
Posts: 1291



« Reply #1 on: Oct 08, 2012, 04:37:24 AM »

the problem with the lowest percentage successful completion

Where do you find a list of percentage success rates?

Quote
baseline statistics

B  2706  0:30  4  0%
S  2422  6:09 19  5%


B 2709  0:30   7   0%
S 2650 6:46  18  11%

How do you find the baseline statistics?
Logged
richard
Administrator
Hero Member
*****
Posts: 19246



« Reply #2 on: Oct 08, 2012, 09:01:38 AM »

Blitz problems are given a default starting average time of 30 seconds. There does seem to be one anomaly with 0% correct but a 39 second average; I'm not sure what went wrong there. I might have reset the stats on that one at some stage, but not reset the average.  The 30 seconds is replaced by the actual solve time of the first correct attempt.
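
As a rough illustration of the default-then-replace behaviour described above, here is a minimal Python sketch. The class name `BlitzAverage` is hypothetical, and the use of a plain running mean for the second and later correct solves is an assumption; only the 30 second default and its replacement by the first correct solve time come from this post.

```python
class BlitzAverage:
    """Tracks a problem's average blitz solve time (hypothetical sketch)."""

    DEFAULT_SECONDS = 30.0  # default shown until somebody solves the problem

    def __init__(self):
        self.correct_solves = 0
        self.average = self.DEFAULT_SECONDS

    def record(self, seconds, correct):
        # Failed attempts never touch the average.
        if not correct:
            return
        self.correct_solves += 1
        # With one correct solve this sets the average to that solve time,
        # so the 30 s default is replaced. Later solves use a running mean
        # (an assumption, not a confirmed site rule).
        self.average += (seconds - self.average) / self.correct_solves


if __name__ == "__main__":
    avg = BlitzAverage()
    avg.record(200, correct=False)
    print(avg.average)  # still the default, nobody has solved it
    avg.record(45, correct=True)
    print(avg.average)  # now the first real solve time
```

This is why a problem with 0% success can still show 0:30: the figure is a placeholder, not a measured time.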

It is hard to compare difficulty between blitz and standard. There are several reasons for that, but one is that more people cheat in standard, so success rates on the very hardest problems are probably skewed by engine usage. When engine users are detected, they are stopped from impacting more problem stats, but their past attempt impacts are not erased.

Problems with low attempts are dealt with differently for two reasons:
1) High RD reduces the impact on solvers.
2) There is a provisional system that operates on top of the high RD of new problems, which reduces rating change impact on users for the first 4 attempts (giving 20%, 40%, 60%, 80% of the full rating point adjustment to the user).
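
The provisional scaling in point 2 can be sketched as follows. This is only an illustration in Python: the function names are hypothetical, and reading "the first 4 attempts" as attempt numbers 1 to 4 on the problem is an assumption. The RD effect from point 1 is not modelled here.

```python
def provisional_factor(attempt_number):
    """Fraction of the full rating change passed on to the solver.

    attempt_number is 1 for the first attempt ever made on the problem
    (assumed interpretation). Attempts 1..4 give 20%, 40%, 60%, 80%;
    from the fifth attempt on, the full adjustment applies.
    """
    if attempt_number < 1:
        raise ValueError("attempt_number starts at 1")
    if attempt_number <= 4:
        return attempt_number / 5
    return 1.0


def scaled_change(full_change, attempt_number):
    # full_change is whatever the rating formula would normally give the user.
    return full_change * provisional_factor(attempt_number)


if __name__ == "__main__":
    for n in range(1, 7):
        print(n, scaled_change(-10.0, n))
```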

Richard.
« Last Edit: Oct 08, 2012, 09:04:42 AM by richard » Logged
interlist
Hero Member
*****
Posts: 752


« Reply #3 on: Oct 08, 2012, 11:24:24 AM »

Where do you find a list of percentage success rates?
[...]
How do you find the baseline statistics?

I find the baseline statistics just from the problem's "homepage", listed on the left as usual.  I copy the info over by hand.  I sometimes do this in problem comments for new problems, just to get a snapshot since ratings, etc. can change as statistics increase. 

The question I have here, is how do I find the RD for a problem?

As far as getting the percentage success rates I use Problems -> Problem Search. 

Richard provides a list of all the CT problems for standard/blitz/theory/practice in a table format, and you can sort by order (both ascending/descending) by clicking once or twice on any of the headers.

For instance, problem 79502 requires the greatest number of moves for its solution, 17.  Or take the lowest-rated problem, 52199 (which is curious, because it has a ~90% success rate, yet the solution is the only legal move possible, KxQ - remember Richard doesn't fail illegal moves [currently; reading the comments I guess he did once-upon-a-time])

Actually, I was hoping to be able to do a search for a particular comment in a problem.  Is that possible Richard?

(I wanted to find problems that were examples of Alekhine's Gun/Cannon for some odd reason).


PS- I couldn't find the game the lowest-rated problem, 52199, came from, despite looking for both of these FENs:

(Original CT FEN)   5b1r/4rk1p/p3ppN1/8/Pp1PQ3/1Pp5/2P2P1q/4R1RK w - - 0 1
(Before pre-move)   5b1r/2q1rk1p/p3ppN1/8/Pp1PQ3/1Pp5/2P2P1P/4R1RK w - - 0 1
Logged

--interlist (was here)
interlist
Hero Member
*****
Posts: 752


« Reply #4 on: Oct 08, 2012, 11:38:17 AM »

Blitz problems are given a default starting average time of 30 seconds. There does seem to be one anomaly with 0% correct but a 39 second average; I'm not sure what went wrong there. I might have reset the stats on that one at some stage, but not reset the average.  The 30 seconds is replaced by the actual solve time of the first correct attempt.

It is hard to compare difficulty between blitz and standard. [...]

Thanks for the reply Richard.  I have a series of questions I'd like to ask, and some more discussion here (including a subtle correction to an older post) - so with some patience I hope to become a little more coherent as we go on.  

Something that struck me immediately with those two "hardest" problems is how did the blitz rating get to be higher than the current highest blitz rating on the leader board (~2700 vs 2421)?  

In fact, if I look at blitz problems and sort by decreasing rating I find a huge number of extremely high rated problems. I don't understand how their ratings can get so high.... for instance, problem 60418 has a blitz rating 3388 (9 attempts / 0%).  How can that be?

Also, if I sort blitz by decreasing average solve time it appears that Richard has put a hard limit of 300 seconds into the average - there is some sort of clipping effect.  What is going on there?

« Last Edit: Oct 08, 2012, 11:42:56 AM by interlist » Logged

--interlist (was here)
aoxomoxoa
Hero Member
*****
Posts: 933


« Reply #5 on: Oct 08, 2012, 01:17:31 PM »


In fact, if I look at blitz problems and sort by decreasing rating I find a huge number of extremely high rated problems. I don't understand how their ratings can get so high.... for instance, problem 60418 has a blitz rating 3388 (9 attempts / 0%).  How can that be?


Every time a tactician tried to solve 60418, he/she failed. So the rating of this problem got higher and higher. With a high RD, the speed of the rise was high(er) too. And being a high-rated problem, it was served mostly to high-rated tacticians.
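
To make the "higher and higher" part concrete, here is a hedged Python sketch using the published Glicko-1 update formulas. It assumes CT's problem ratings behave like Glicko-1 with each attempt treated as its own rating period, which is an assumption and not something confirmed in this thread. The starting rating, starting RD and solver values are made up for illustration.

```python
import math

Q = math.log(10) / 400  # Glicko-1 scaling constant


def g(rd):
    # Discounts an opponent's rating according to how uncertain it is.
    return 1 / math.sqrt(1 + 3 * Q**2 * rd**2 / math.pi**2)


def expected(r, r_opp, rd_opp):
    return 1 / (1 + 10 ** (-g(rd_opp) * (r - r_opp) / 400))


def glicko_update(r, rd, r_opp, rd_opp, score):
    """One-game Glicko-1 update; returns (new_rating, new_rd)."""
    g_opp = g(rd_opp)
    e = expected(r, r_opp, rd_opp)
    d2_inv = Q**2 * g_opp**2 * e * (1 - e)  # this is 1/d^2
    denom = 1 / rd**2 + d2_inv
    new_r = r + (Q / denom) * g_opp * (score - e)
    new_rd = math.sqrt(1 / denom)
    return new_r, new_rd


def simulate_failed_attempts(problem_r, problem_rd, solvers):
    """Every solver fails, so the problem scores 1.0 each time."""
    history = [(problem_r, problem_rd)]
    for solver_r, solver_rd in solvers:
        problem_r, problem_rd = glicko_update(
            problem_r, problem_rd, solver_r, solver_rd, score=1.0
        )
        history.append((problem_r, problem_rd))
    return history


if __name__ == "__main__":
    # Hypothetical numbers: a new problem at 2400 with RD 250,
    # failed nine times by solvers rated 2500 with RD 60.
    for r, rd in simulate_failed_attempts(2400.0, 250.0, [(2500.0, 60.0)] * 9):
        print(f"rating {r:7.1f}  RD {rd:6.1f}")
```

The rating rises on every failure and the steps shrink as the RD drops, which is the behaviour described above: big early jumps while the RD is high, then smaller ones.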
« Last Edit: Oct 08, 2012, 01:19:47 PM by aoxomoxoa » Logged
richard
Administrator
Hero Member
*****
Posts: 19246



« Reply #6 on: Oct 08, 2012, 01:29:03 PM »

I think Aoxomoxoa accurately answered your first question.


Also, if I sort blitz by decreasing average solve time it appears that Richard has put a hard limit of 300 seconds into the average - there is some sort of clipping effect.  What is going on there?

Yes, blitz time is clipped at 300. Some kind of clipping is required, otherwise someone that spends 3 days to solve the problem (for an extreme example), due to leaving the browser open would massively skew the average. 300 was chosen as something that would be above the true average for most problems, although in theory this is probably a little short for the very hardest problems, even when solved by the very best solvers.
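
A small Python sketch of why the clipping matters. The function name `clipped_mean` and the sample times are hypothetical; only the 300 second cap comes from the explanation above.

```python
def clipped_mean(solve_times, cap=300.0):
    """Average of solve times in seconds, with each time capped at `cap`."""
    if not solve_times:
        raise ValueError("need at least one solve time")
    clipped = [min(t, cap) for t in solve_times]
    return sum(clipped) / len(clipped)


if __name__ == "__main__":
    three_days = 3 * 24 * 3600  # someone left the browser open
    times = [40, 55, 70, three_days]
    print(sum(times) / len(times))  # unclipped mean, dominated by the outlier
    print(clipped_mean(times))      # clipped mean stays in a sensible range
```

The flip side is visible in the same code: any problem whose true average is above the cap can never show more than 300.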

Regards,
Richard.
Logged
interlist
Hero Member
*****
Posts: 752


« Reply #7 on: Oct 08, 2012, 02:09:14 PM »


In fact, if I look at blitz problems and sort by decreasing rating I find a huge number of extremely high rated problems. I don't understand how their ratings can get so high.... for instance, problem 60418 has a blitz rating 3388 (9 attempts / 0%).  How can that be?


Every time a tactician tried to solve 60418, he/she failed. So the rating of this problem got higher and higher. With a high RD, the speed of the rise was high(er) too. And being a high-rated problem, it was served mostly to high-rated tacticians.

@aox

The highest rated active blitz player is rated 2421.  Intuitively, it seems to me that whatever the exact formula used, it shouldn't be generous adding points beyond that of its highest rated opponent.  Points exceeding actual practice limits should be exponentially damped (or something along those lines).

We are talking about a rating of 3300, and many many other CT blitz problems beyond 2400 (eg there are ~30 at 2800+).  Doesn't this strike you as, I want to say wrong, but perhaps that's too strong.  Nah, it's wrong.  Imagine a player, however good, getting nine wins over GM's and then having a rating of 3300.  Do you think it possible?  It would certainly cause an uproar in chess circles, at least I would hope it would.

The point - rating points exceeding the highest rating of your opponents should be modestly awarded.

Especially after the provisional rating runs out (which I guess are the first four attempts here on CT).

If I go to a Swiss Round Tournament, and all the opponents are 1800 or less, I don't think my rating should exceed 2400 (and that's already generous) even if I defeat all 50 opponents.  If I only defeat 10 (like our example problem) then we might have to look at a win/loss differential of 1/10, which is 400 points, generously awarded.  So 2200 at most, and even that would be pushing it.

Recall the rating differential does feed into an exponential-like function for likelihoods.  And going from 2400 to 3300 is an astronomical jump.

Maybe the problems are rising up the ratings too fast because the provisional rating is too generous to them. Then, as in control theory, we have an overshoot problem.  This is also unfair to the upper tier of CT players, because the same problem is changing rating so rapidly that it takes a different amount of points from two players, when in all fairness, it should take the same.



Yes, blitz time is clipped at 300. Some kind of clipping is required, otherwise someone that spends 3 days to solve the problem (for an extreme example), due to leaving the browser open would massively skew the average. 300 was chosen as something that would be above the true average for most problems, although in theory this is probably a little short for the very hardest problems, even when solved by the very best solvers.

@Richard  

I agree that the clipping limit should be increased, just looking at the table listing of problems.  But it's not too severe, as it appears that only 15 blitz problems are affected.  I wonder if you were influenced by the normal blitz game time control of 5 minutes when you picked the clipping value?  Ha!

By the way, is it possible to search the problem comments?  I'm still looking for some other good examples of Alekhine's Cannon.
« Last Edit: Oct 08, 2012, 02:36:28 PM by interlist » Logged

--interlist (was here)
interlist
Hero Member
*****
Posts: 752


« Reply #8 on: Oct 08, 2012, 03:18:30 PM »

==========================
The Hardest Blitz Problem
==========================

Well, if it's not the hardest, it's in the running: 98038, which has a 12-move main line - Holy smokes!  

Here's the baseline:

B 2889   0:30   3   0%
S 2443  20:58  26  19%


Given the rather low attempts, and despite its high rating, I think it fair to say this is a fairly new problem.  If it is new, then the current rating is way out of whack with the current pool of blitz players since it is over 400 points higher than the max player rating. And for a provisional rating based on only 3 attempts?  

If it's not a new problem, then the provisional rating is out of whack, because it's so high that the problem is not being served out enough.

Either way, the argument can be made that the provisional rating is out of whack.

And it must be the provisional rating used here, since 3 attempts is too low for a normal rating, correct?

I wonder about its history - what rating/RD it was introduced at, and what were the three opponents' ratings/RDs?

(By the way, given the 5-min blitz limit currently in use, and just glancing at the ML involved in the solution - I think this problem may remain undefeated in blitz for quite some time to come).
« Last Edit: Oct 08, 2012, 03:22:43 PM by interlist » Logged

--interlist (was here)
3253
Hero Member
*****
Posts: 1291



« Reply #9 on: Oct 08, 2012, 08:06:31 PM »

Imagine a player, however good, getting nine wins over GM's and then having a rating of 3300.

9 wins, 2500 average rating = 2635

9 wins, 2700 average rating = 2835

http://ratings.fide.com/calculator_rp.phtml


Quote
If I go to a Swiss Round Tournament, and all the opponents are 1800 or less, I don't think my rating should exceed 2400 (and that's already generous) even if I defeat all 50 opponents.  If I only defeat 10 (like our example problem) then we might have to look at a win/loss differential of 1/10, which is 400 points, generously awarded.  So 2200 at most, and even that would be pushing it.

10 wins, 1800 average rating = 1950

40 wins, 1800 average rating = 2400 ~~ 15 points for each win

http://ratings.fide.com/calculator_rp.phtml
« Last Edit: Oct 09, 2012, 01:14:41 AM by 3253 » Logged
aoxomoxoa
Hero Member
*****
Posts: 933


« Reply #10 on: Oct 08, 2012, 08:37:25 PM »

@aox

The highest rated active blitz player is rated 2421.  Intuitively, it seems to me that whatever the exact formula used, it shouldn't be generous adding points beyond that of its highest rated opponent.  Points exceeding actual practice limits should be exponentially damped (or something along those lines).

There is already a lot of mathematics done on rating-systems. You may read http://en.wikipedia.org/wiki/Elo_rating_system as a beginning and then maybe this http://en.wikipedia.org/wiki/Glicko_rating_system for some details on RD's.
Empirical Rabbit is working on a better rating system in his blog.

Adding points beyond the highest opponent is necessary if the score is (way) above 50%
As you can see here: http://en.wikipedia.org/wiki/Elo_rating_system#Performance_rating a performance (score) of 99% is a rating difference of 677. A score of 100% (like this one) would be "an infinite rating difference". But the calculation of the rating at CT is iterative, so, thanks to K and RD, the rating of this problem "happened".
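
A short Python sketch of why a 100% score has no finite performance rating. It uses the standard logistic Elo expectation curve, which is an assumption here: the table behind the 677 figure is built differently, so the numbers below will not match it exactly. The function names are hypothetical.

```python
import math


def expected_score(rating_diff):
    """Logistic Elo expectation for a player rated `rating_diff` above the opponent."""
    return 1 / (1 + 10 ** (-rating_diff / 400))


def rating_diff_for_score(p):
    """Inverse of expected_score: the rating gap that would explain a score fraction p."""
    if p <= 0:
        return -math.inf
    if p >= 1:
        return math.inf  # a perfect score is consistent with any gap, however large
    return 400 * math.log10(p / (1 - p))


if __name__ == "__main__":
    for p in (0.5, 0.75, 0.9, 0.99, 0.999, 1.0):
        print(p, rating_diff_for_score(p))
```

The gap grows without bound as the score approaches 100%, so an iterative system keeps pushing an unbeaten problem upward; only the step size (K, or RD in Glicko terms) decides how fast.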
Logged
richard
Administrator
Hero Member
*****
Posts: 19246



« Reply #11 on: Oct 08, 2012, 09:54:14 PM »

The highest rated active blitz player is rated 2421.  Intuitively, it seems to me that whatever the exact formula used, it shouldn't be generous adding points beyond that of its highest rated opponent.  Points exceeding actual practice limits should be exponentially damped (or something along those lines).

You are assuming this problem gained its rating when the highest rated active blitz player was 2421. That is not the case. There was an issue with inflation in blitz that was corrected a few years ago. At that time there were a number of players rated above 2700, and 5 achieved a max rating of over 3000 (all of whom subsequently dropped down to below 2900 after the inflation correction was made). It is hard for that problem to correct its rating now, as it has a very low chance of serving to the current blitz population.

Quote
The point - rating points exceeding the highest rating of your opponents should be modestly awarded.

They are.

Quote
By the way, is it possible to search the problem comments?  I'm still looking for some other good examples of Alekhine's Cannon.

Yes it is, via the problem search (you can do things like create custom sets where "Boden" etc. is mentioned in the comment to create motif-based sets for motifs that don't have tags, but do have problem comments mentioning the motif). Unfortunately for you, this is a gold membership feature.

Regards,
Richard.
Logged
interlist
Hero Member
*****
Posts: 752


« Reply #12 on: Oct 09, 2012, 01:08:12 AM »

Imagine a player, however good, getting nine wins over GM's and then having a rating of 3300.
9 wins, 2500 average rating = 2635
9 wins, 2700 average rating = 2835
http://ratings.fide.com/calculator_rp.phtml
Quote
If I go to a Swiss Round Tournament, and all the opponents are 1800 or less, I don't think my rating should exceed 2400 (and that's already generous) even if I defeat all 50 opponents.  If I only defeat 10 (like our example problem) then we might have to look at a win/loss differential of 1/10, which is 400 points, generously awarded.  So 2200 at most, and even that would be pushing it.
10 wins, 1800 average rating = 1950
40 wins, 1800 average rating = 2400 ~~ 150 points for each win   <== [ed. Did you mean 15?]
http://ratings.fide.com/calculator_rp.phtml

Thanks 3253 for providing that most informative link.  It's always nice to have real data.

I wasn't too far off with my estimates, I'm glad to report. The FIDE formula provisional rating seems to be based on a 15 point increment/win added to the average rating of your opponents, for a maximum of 40 games.  This seems quite reasonable, and limits your potential gain to 600 points over your opponents, provided you win 40 games in a row!

I will point out that the FIDE estimate only works for average ratings of your opponents between 1200 and 2900. So it would be possible to reach a 3500 rating under these provisions (that is, if you could beat Kasparov, in his prime, 40 times in a row!). 

It would also be possible to reach GM-like ratings by only playing 1800 players, provided you win enough in a row. This is probably why the GM norm requires one to beat a certain number of titled players in order for a candidate to even be considered. (Similarly, FIDE makes requirements on unrated players to play rated players in order to qualify for a rating [I think we talked about these in an earlier thread]). 

So, under the FIDE regime a 3300 provisional rating with 9 attempts could be reached if the average opponent had a rating of 3300 - 9*15 = 3165.  Not likely, even under Richard's older rating system.  So, it appears Richard isn't using the provisional rating system of FIDE.
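
The arithmetic above can be checked with a few lines of Python. To be clear, the "average plus 15 per win, capped at 40 games" rule is only my reading of the calculator numbers quoted in this thread, not an official FIDE formula, and the function names are hypothetical.

```python
POINTS_PER_WIN = 15  # inferred from the quoted calculator results
MAX_GAMES = 40       # inferred cap on the number of games counted


def estimate_after_wins(avg_opponent_rating, wins):
    """Rating estimate after `wins` straight wins under the inferred rule."""
    return avg_opponent_rating + POINTS_PER_WIN * min(wins, MAX_GAMES)


def required_opponent_average(target_rating, wins):
    """Average opponent rating needed to reach `target_rating` with `wins` wins."""
    return target_rating - POINTS_PER_WIN * min(wins, MAX_GAMES)


if __name__ == "__main__":
    print(estimate_after_wins(2500, 9))        # the 2635 case
    print(estimate_after_wins(1800, 40))       # the 2400 case
    print(required_opponent_average(3300, 9))  # the 3165 case
```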

Another scheme mentioned in the references aox was kind enough to provide, and one I'm familiar with from days of old, is the so called

"Algorithm of 400", see the Wikipedia entry on the Elo rating system, under Performance rating:

Quote
According to this algorithm, performance rating for an event is calculated by taking the following three quantities:
  • the rating of each player beaten and adding 400
  • the rating of each player lost to and subtracting 400
  • the rating of each player drawn
and then summing these figures and dividing by the number of games played.

So, if I'm not mistaken, this limits the provisional rating to 400 points over the average of opponents in the case where every game is won. The advantage of this system is it allows very rapid adjustments of the rating - which can be useful if you're running a tournament and need to do pairings of a player of unknown capability. The disadvantage of this system is that it rapidly adjusts the rating - so it is easy to undershoot/overshoot the true rating.
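
Here is the quoted algorithm as a short Python sketch. The function name and the (rating, score) input format are my own choices for illustration.

```python
def performance_rating_400(results):
    """Algorithm of 400.

    `results` is a list of (opponent_rating, score) pairs, where score is
    1 for a win, 0.5 for a draw and 0 for a loss.
    """
    if not results:
        raise ValueError("need at least one game")
    total = 0.0
    for opponent_rating, score in results:
        if score == 1:
            total += opponent_rating + 400  # beaten opponent: rating + 400
        elif score == 0:
            total += opponent_rating - 400  # lost to opponent: rating - 400
        else:
            total += opponent_rating        # draw: rating as-is
    return total / len(results)


if __name__ == "__main__":
    # Ten straight wins over 1800-rated opponents.
    print(performance_rating_400([(1800, 1)] * 10))
```

With every game won, the result is always the opponents' average plus 400, no matter how many games are played, which is the cap mentioned above.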

The ramifications of the latter case could easily fill another post, but not this one.

OK thanks for reading.
Logged

--interlist (was here)
3253
Hero Member
*****
Posts: 1291



« Reply #13 on: Oct 09, 2012, 01:14:10 AM »

Yes, 150 for 10 wins or 15 for 1 win.
Logged
interlist
Hero Member
*****
Posts: 752


« Reply #14 on: Oct 09, 2012, 02:35:00 AM »

You are assuming this problem gained its rating when the highest rated active blitz player was 2421. That is not the case. There was an issue with inflation in blitz that was corrected a few years ago. At that time there were a number of players rated above 2700, and 5 achieved a max rating of over 3000 (all of whom subsequently dropped down to below 2900 after the inflation correction was made). It is hard for that problem to correct its rating now, as it has a very low chance of serving to the current blitz population.

Well, I have to make a few assumptions, if only to get the ball rolling as they say.  Of course I'm a relative newbie and don't know the full complete history of the site.  I wish I did, you must have had some very interesting times getting this show up and running.  And a lot of fun too I'll bet.  

So, I didn't know about this inflation problem.  However, I did anticipate that it might be the case, as the following quote from a later post of mine indicates:


B 2889   0:30   3   0%
S 2443  20:58  26  19%

Given the rather low attempts, and despite its high rating, I think it fair to say this is a fairly new problem.  If it is new, then the current rating is way out of whack with the current pool of blitz players since it is over 400 points higher than the max player rating. And for a provisional rating based on only 3 attempts?  

If it's not a new problem, then the provisional rating is out of whack, because it's so high that the problem is not being served out enough.

And that's an issue. As Munich has pointed out, there is a dearth of world-class problems on CT. And these highest-rated problems appear to be exactly that - the most challenging tactical problems in all of chess. So, there is the potential from this discussion to save a substantial number of pearls from being lost (if only temporarily). At least, that is how I look at it.

Consider then, there might exist a category of problems which exhibit a hysteresis effect from an older, inflationary universe (oops - I mean rating system). A consequence of this is that there are 30 problems which have ratings above 2700 and are not being properly served to the current pool (including some problems which are rarely, if ever served anymore). There are an additional 15 problems which might be mis-rated due to a clipping effect.

So, all told, a not negligible 50 problems could be reintroduced into the system to be properly positioned for use under the current conditions of play.

Well, there's that viewpoint, and then I could just say - well that's interesting, I didn't know it worked like that, nor did I realize CT had such an interesting history.  

Quote
Quote
The point - rating points exceeding the highest rating of your opponents should be modestly awarded.
They are.

I guess the question would be: modest though the points awarded to problems upon introduction are now, are there sufficient precautions for guaranteeing fairness to the very top tier players vying to be #1 on CT?  Here I mean, suppose you introduce a problem so hard that all who have challenged it have yet to succeed in their quest - like Arthur and the sword Excalibur (we prefer the legend where Arthur pulls the sword from the rock of his own merit, and isn't handed it by the Lady of the Lake; similar considerations precluded the use of the Gordian knot mythology).

This problem, Excalibur, will defeat all mortal comers - hence all #1 contenders are bound to lose points to it (until Carlsen joins CT that is). The question is, do they all do so fairly, or is the ordering of encounter significant?  This is a quantitative question - despite the attempt of RD to correct for it, a provisional problem might have a rating adjusting so fast that the permutations of ordering can be significant.

 

What is the epsilon of this effect?  It depends on two factors: how much you lose against Excalibur, and how much you gain back per problem from a typical problem.  So, cmuroya17 and morphy1984 will be more affected than andrzejsadowsk, since the former only gain 0.2/problem (or even 0.0/problem on many of cmuroya17's recent problems) vs. +3/problem for Andrzej.  And if Excalibur was only recently injected into the system the former pair would be more likely to encounter it when it was lower-rated, since they are in duplicate-avoidance mode.  
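
To put a rough number on the ordering effect, here is a toy Python simulation. It uses plain Elo with fixed K factors instead of CT's RD-based system, and the K values and ratings are invented, so treat it as a sketch of the mechanism and not a measurement of the real site.

```python
def expected_score(rating, opponent_rating):
    """Standard logistic Elo expectation."""
    return 1 / (1 + 10 ** ((opponent_rating - rating) / 400))


def points_lost_in_order(player_ratings, problem_rating, k_player=16, k_problem=64):
    """Each player in turn fails the problem; returns (losses, final_problem_rating).

    k_player and k_problem are hypothetical. A large k_problem stands in
    for a new problem whose rating is still moving quickly.
    """
    losses = []
    for rating in player_ratings:
        e_player = expected_score(rating, problem_rating)
        losses.append(k_player * e_player)       # player scored 0, expected e_player
        problem_rating += k_problem * e_player   # problem scored 1, expected 1 - e_player
    return losses, problem_rating


if __name__ == "__main__":
    # Three equally rated contenders meet the same problem one after another.
    losses, final_rating = points_lost_in_order([2400, 2400, 2400], 2400)
    for position, loss in enumerate(losses, start=1):
        print(f"player {position} loses {loss:.2f}")
    print(f"problem ends at {final_rating:.1f}")
```

In this toy model the first player to meet the problem pays the most and each later player pays a bit less, even though all three are equally rated and all three failed.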



Quote
Quote
By the way, is it possible to search the problem comments?  I'm still looking for some other good examples of Alekhine's Cannon.
Yes it is, via the problem search (you can do things like create custom sets where "Boden" etc. is mentioned in the comment to create motif-based sets for motifs that don't have tags, but do have problem comments mentioning the motif). Unfortunately for you, this is a gold membership feature.

Ahh, this has been pointed out to me before - gold members have all the fun.  Me, I was born to a world of suffering and disadvantage...

But seriously, I appreciate the potential benefits of the enhanced membership.  I'm still learning about all the features as I go along.  I consider your participating in these forums very generous, Richard, as are the many many features you make accessible to all.  

Thanks as always (and not to worry, I have several good examples of Alekhine's Cannon already!).

Regards,

--interlist


A quick foreword/aside: I stumbled onto this topic, almost by accident while exploring problem searches.  I normally am happy just doing single problems fed me by the generator, and never viewed sets, or even looked at the tables.  So it was with some interest that I first looked at the list of the entire set of problems for both standard and blitz.

Of course, my nature being what it is, I was attracted to the extremes/outliers. Then again, you can also blame Alvaro, who gave me a set of themed problems to look at, one of them being problem 15. Being such a low number it got me curious about the older problems on CT.  

But the main culprit is Munich - whose quest to reach #1 on CT standard has involved much discussion of the hardest problems on CT and what's involved in obtaining/maintaining the highest rating.  The more I became interested in these discussions the more I began thinking about the most extreme problems on CT, and so the pump was primed (as they say).


« Last Edit: Oct 09, 2012, 02:58:37 AM by interlist » Logged

--interlist (was here)