Author Topic: Long term tactics solving improvement - Is it possible?  (Read 11864 times)
richard
Administrator
Hero Member
Posts: 19247
« on: Dec 14, 2014, 11:07:48 PM »

A topic that comes up fairly often on the forum (and on external blogs) is how much it is possible to improve over an extended period of problem solving. Some believe that it is very uncommon to improve after the first 4000-5000 problems. Others believe most people (or rather most adults) plateau in the 1000-4000 problem range, with some initial improvement and none after that. In the past this was an easy impression to form, as the rating graphs of very high volume solvers often showed long plateaus or even declines. However, it was clear that duplicate reward reduction was a big factor in these cases. The fact that duplicate reward reduction started to become a serious issue after around the 4000 problem mark, when duplicates become much more common, is certainly no coincidence. Essentially, reward reduction was masking improvement in many users. With some users seeing the same problem many times, and therefore receiving very little reward for correct answers but full punishment for incorrect responses, it is no wonder some of these high volume users were having trouble increasing their rating, especially if they had poor explicit memory of the solutions to problems they had seen before.

Up to now I'd only looked at a few isolated cases which seemed to contradict the 'all adults plateau early' opinion, but hadn't had time for a more in-depth analysis. Recently I had time for a somewhat deeper look. I examined all blitz solvers who had done more than 30,000 blitz problems (and fewer than 100,000, as the data processing for the small number of users with 100K+ attempts was taking too long to complete). Blitz was chosen because it does a good job of factoring out the issue that standard rating can be improved simply by taking longer, without necessarily increasing your skill level. For similar reasons, I decided to focus only on problems that were either incorrect or correct within the time limit for gaining points, so problems that were correct but lost rating points, or that took more than 5 minutes to solve, were not analysed. The 5 minute cut-off was used because that is the internal cut-off for solve times, and all longer times are truncated in blitz, so avoiding these avoids having to account for attempts where the user is essentially thinking 'at their leisure'. From this set, all duplicate problem attempts were removed, leaving only problems the user had seen for the first time on Chesstempo (i.e. if a problem had been seen first in a mode other than blitz (custom sets, standard etc.) it was excluded, even if it was seen only once in actual blitz mode).
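For the curious, here is a rough Python sketch of the filter described above. The real analysis was done with SQL queries over the attempt history, and all the field names here (mode, solve_time_secs, rating_change and so on) are made up purely for illustration:

```python
def first_sight_blitz_attempts(attempts):
    """Keep only the attempts used in the analysis (hypothetical schema).

    attempts: one dict per problem attempt by a single user, across all modes.
    """
    kept = []
    seen = set()
    for a in sorted(attempts, key=lambda x: x["attempt_date"]):
        first_sight = a["problem_id"] not in seen
        seen.add(a["problem_id"])      # any earlier sighting, in any mode, disqualifies later attempts
        if not first_sight:
            continue                   # duplicate attempt, excluded
        if a["mode"] != "blitz":
            continue                   # first seen outside blitz (standard, custom sets), excluded
        if a["solve_time_secs"] > 300:
            continue                   # beyond the 5 minute internal cut-off
        if a["correct"] and a["rating_change"] < 0:
            continue                   # correct, but too slow to gain points
        kept.append(a)
    return kept
```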

For each user, their set of matching problems was then sorted by date solved, from earliest to latest, and I looked at the performance rating across several intervals. First, the initial 500 of these non-duplicate attempts were given a performance rating, then the attempts from 4000-4500, then the attempts from 10000-10500, and finally the last 200 attempts. This gives four performance ratings: an initial level, a level after the first 4000 and 10000 non-duplicate attempts, and the current performance level as indicated by the last 200 non-duplicate attempts.

Note that because of the high volume solving of this set of users, the 10,000-10,500 non-duplicate attempts occurred far more than 10,000 overall attempts into each user's solving history, due to the number of duplicates users are receiving at that stage. For example, some people may not reach 10,000 non-duplicates until after solving over 30,000 problems, depending on how many problems are available in their rating range. So to get to 10,000 non-duplicates, most users have solved a very significant number of problems, and are well beyond the 'plateau at 4000' range.

The performance rating formula used is the 'algorithm of 400' described here:
http://en.wikipedia.org/wiki/Elo_rating_system#Performance_rating
This formula was used rather than calculating Glicko across all problems for convenience: I already had an SQL (database query) implementation of the '400' algorithm, and developing an SQL query that computed Glicko within the query was more work than I wanted to do right now, and I didn't see a strong reason why it would produce significantly more meaningful results.
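As a sketch of the calculation (again, the real implementation was an SQL query, so the Python below is purely illustrative), the '400' formula is the average rating of the problems in a slice plus 400 times (wins minus losses) divided by the number of attempts, applied to the four slices described above:

```python
def performance_rating_400(results):
    """'Algorithm of 400': average opponent (problem) rating + 400 * (wins - losses) / games.

    results: list of (problem_rating, solved) pairs for one slice of attempts.
    """
    if not results:
        return None
    games = len(results)
    wins = sum(1 for _, solved in results if solved)
    losses = games - wins
    avg_opponent = sum(rating for rating, _ in results) / games
    return avg_opponent + 400 * (wins - losses) / games


def slice_performances(results):
    """results: a user's non-duplicate attempts, already sorted earliest to latest."""
    return {
        "initial_500": performance_rating_400(results[:500]),
        "4000_4500": performance_rating_400(results[4000:4500]),
        "10000_10500": performance_rating_400(results[10000:10500]),
        "final_200": performance_rating_400(results[-200:]),
    }
```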


To be able to include data from attempts at or beyond 10,000 non-duplicates, users who had over 30,000 total attempts but under 10,500 non-duplicate attempts were excluded. This reduced the available analysable users from 116 down to 77. Users whose minimum rating had dropped below 600 were also excluded, to filter out users like this:
http://chesstempo.com/chess-statistics/torosentado
who deliberately got hundreds of problems wrong in a sequence in order to artificially drop their rating.

To stop rating drift over time from impacting the results, the rating at the time the problem was done was ignored in the performance rating calculations, and the current rating of the problem was used instead (attempts on disabled problems were also ignored, as those problems don't have up-to-date current ratings). This means all users are compared on the same basis for each problem, irrespective of when their original attempt was and what the rating of the problem was at the time. This is especially important for these high volume solvers, as they are often served new problems very quickly, before the problem's rating has had time to settle, so using the current rating also avoids that issue.
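Purely as an illustrative sketch with made-up names, this step amounts to looking up each kept attempt's problem, skipping disabled problems, and feeding the problem's current rating into the performance rating calculation as the 'opponent' rating:

```python
def attach_current_ratings(kept_attempts, problems):
    """problems: hypothetical mapping of problem_id -> {"rating": ..., "disabled": ...}."""
    results = []
    for a in kept_attempts:
        p = problems[a["problem_id"]]
        if p["disabled"]:
            continue                   # disabled problems have no up-to-date rating
        # Use the problem's current rating, not its rating at attempt time, so every
        # user is measured against the same number for the same problem.
        results.append((p["rating"], a["correct"]))
    return results
```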

Unsurprisingly, almost all of those 77 users improved from their initial 500-attempt performance rating: 93% of the users improved from their initial 500-attempt performance to their 4000-4500 performance sample. The average rating improvement during that time was 88 rating points. At this level, 4000 non-duplicates probably equates to around 5000 total duplicate+non-duplicate attempts for most people, as duplicates at that early stage are not yet a large percentage of problems served.

Now the conjecture is that by this stage it is becoming impossible for adult solvers to improve further; however, the data does not seem to support that. The rate of average improvement does begin to slow, and diminishing returns are certainly starting to become a factor, but from the 4000-4500 slice to the 10000-10500 slice, 82% of solvers still improved, with the average improvement sitting at 48 performance rating points. Note that this average includes the decline of the 18% of non-improvers who were equal to or worse than their 4000-4500 slice. Jumps over 200 were seen in this range, and jumps over 100 were not uncommon (sorry, no standard deviation or median data at this point).

The final comparison was between the final 200 non-duplicate attempts for each user and their 4000-4500 level. Here 87% of solvers had improved in their most recent problem performance over their 4000-4500 performance, with an average improvement of 84 points (which is on top of the 88 rating point improvement the average user had already made since their first 500 attempts). This indicates that not only do people continue to improve from 4000 non-duplicates to 10000 non-duplicates, they apparently improve even more when going beyond 10000 attempts, with both a larger percentage of improvers and a larger average improvement compared to the 4000 to 10000 comparison.

The 10000-to-final improvements are fairly modest at a 36-point average; this is partly due to further plateauing, but also partly due to quite a few people clustering their total non-duplicates quite close to 10,000, leaving a smaller window, time-wise, for improvement. For those who had over 15,000 non-duplicate attempts (25 people in total), the average 10,000-to-final gap was just under 50 performance rating points.

There is still the issue of how relevant all of this is to older solvers. While I don't have age data for all these users, I do have FIDE year of birth data for the users who had entered their FIDE id in their preferences. Unfortunately this was only 5 people in this data set. Their average age was 51. Cutting down from 116 to 77 users by excluding those with fewer than 10,500 non-duplicate attempts removed 1 non-adult, but several other excluded people were in the 30+ bracket. While it is a very small sample size, based on personal knowledge of those on the forum, and the supporting data of the FIDE ages, I think it is fair to say the majority of the users in the 77 person sample are adults, many of them quite old. 3 of the 5 over-40-year-olds improved from their 4000 non-duplicate attempt performance to their final 200 performance, with an average improvement of 45 for the three improvers. It is obviously hard to draw conclusions from a sample of 5, and I think the much larger sample of 77 is probably a better indicator, but either way, improvement for adults after a large number of attempts seems far from impossible based on all the data I've seen, even if the average improvement is relatively modest.

In summary, users in this population on average improve their performance rating from the initial 500 problems to the 4000 problem mark by around 88 points. They then almost match this when going from the 4000 mark to their current level. Note that this is definitely sub-linear progress, as the number of problem attempts required per new non-duplicate steadily rises as the total number of problems rises (although this can be partly mitigated by an increasing rating, which provides access to unsolved higher-rated problems). This indicates decent improvement can still be gained after the first 4000-5000 problem attempts, although it does require a fair bit of effort. The average performance ratings across this group for the initial 500, 4000, 10000 and final slices were 1571, 1660, 1708 and 1743.

Given many of these users are probably using far from optimal learning/training strategies, I think it is safe to treat these average improvements as something of a lower bound, with better improvement likely under optimal learning methods.

Regards,
Richard.
« Last Edit: Dec 14, 2014, 11:13:34 PM by richard »
morgensternsxe
Newbie
Posts: 1
« Reply #1 on: Dec 16, 2014, 11:40:54 PM »

Thanks for sharing this with the community! It would be very interesting to hear more about research like this. :)
richard
Administrator
Hero Member
Posts: 19247
« Reply #2 on: Jan 20, 2015, 12:52:43 AM »

I was asked to clarify what happened to the think times of these users as their rating performance improved, as this wasn't a factor I explicitly controlled for in the original experiment. I didn't think it was necessary, given that blitz solving strongly controls for solve time and strongly encourages users to solve at close to the average time, which is what ends up happening in practice, so unlike in standard mode I didn't see this as a big confounding factor.

However, it is true that some people can have big shifts in their blitz solve times, so understanding how users' solve times changed relative to the average is important. To determine that, I performed the same test again, but this time collected the ratio of each user's average solve time on the included problems to the average time spent by all users on those problems. A massive increase in solve times would mean that some of the improvement could be due to simply thinking longer, rather than "real" tactical improvement.
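In rough Python terms (hypothetical field names again), one reasonable reading of the ratio being tracked is the user's average solve time over a slice of attempts divided by the average of all users' average times on the same problems:

```python
def think_time_ratio(slice_attempts, avg_time_by_problem):
    """Ratio of a user's average solve time to the all-user average on the same problems."""
    user_avg = sum(a["solve_time_secs"] for a in slice_attempts) / len(slice_attempts)
    all_avg = sum(avg_time_by_problem[a["problem_id"]] for a in slice_attempts) / len(slice_attempts)
    return user_avg / all_avg
```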

It was pleasing to see that the average user in the high volume solving sample got faster, not slower, over time compared to other solvers on the same problems. The average of the (user think time / average all-user think time) ratio improved over time: the average ratio over the first 500 problems was 0.86, this dropped to 0.84 at the 4000 level, and improved a little further to 0.83 by the final 500 problems.

So, in summary, not only do the majority of high volume solvers improve (and continue to improve long term, albeit more slowly), they do so without excessive think times compared to the average; in fact they speed up.

Regards,
Richard.
