The Science of Faster Memorization with Spaced Repetition

You can’t talk for too long about the science of learning without running into “spaced repetition.” It comes up when optimizing your use of flashcards, for example, to help you study the right thing at the right time. It all comes back to this guy, Hermann Ebbinghaus:

[Image: portrait of Hermann Ebbinghaus]

He spent years of his life teaching himself “nonsense syllables” in order to record and analyze how his brain learned information. We can thank him for our understanding of the forgetting curve, which is the basis of spaced repetition. The basic idea behind spaced repetition (and the SRS software built around it) is quite simple: there is an optimal time to “review” information, and that point is right as you’re about to forget it.
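As a rough illustration, the forgetting curve is often idealized as exponential decay, R = e^(−t/S), where R is recall probability, t is elapsed time, and S is a “stability” constant. Under that assumption (the exponential form and the 90% target below are illustrative choices, not Ebbinghaus’s exact fit), the optimal review time falls out directly:

```python
import math

def review_time(stability_days: float, target_recall: float = 0.9) -> float:
    """Time until predicted recall drops to the target,
    assuming the idealized curve R = exp(-t / S)."""
    return -stability_days * math.log(target_recall)

# With a memory stability of 2 days, schedule the review when
# predicted recall falls to 90%:
print(round(review_time(2.0), 2))  # ~0.21 days, i.e. about 5 hours
```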

What follows is an essay I wrote on SRS, called “An Algorithm for Learning,” for one of my neuroscience classes.

An Algorithm for Learning

Acquiring, encoding, and retrieving long-term information is a topic of much research and many open questions within the field of neuroscience. Numerous studies have affirmed intuitive beliefs about the nature of consciously acquiring knowledge, such as the importance of attention and rehearsal, and some have even yielded formulaic patterns to represent the effects of these quantities. It is difficult to generalize the best practices for learning across all humans and all topics, but there are certain neurological phenomena that may be used to advantage in order to maximize the effectiveness of any study session. Even with the most predictable truths of memory, though, it is challenging to extrapolate from this information and make a statement about how to implement the findings in a practical way for students. Still, certain useful conclusions can be drawn from research into the brain as it pertains to learning. Among the most significant of these findings is the manner in which temporal locality affects long-term retention, which yields useful conclusions about when and how long to study in order to maximize learning potential. Putting this research together, it is possible to assemble an algorithm for learning which increases long-term retention potential. This paper will address research into short- and long-term memory and their role in learning (with an emphasis on temporal locality), then examine existing documented study techniques which put this research to use, and finally consider possible improvements that can be made by the student in order to create a maximally effective algorithm for learning.
Surprisingly, one of the greatest difficulties in researching ways to improve learning comes from the challenge of quantifying data. It was a psychologist by the name of Hermann Ebbinghaus who most notably tackled this problem in 1885, using a set of “nonsense syllables” as control information. He also established a “savings method,” used to quantify the long-term retention of information, wherein the attempts required to relearn material are compared against the attempts required to originally learn it in order to create an index pertinent to long-term memory. In his experiments, Ebbinghaus also established the groundwork for what would become one of the most famous and useful temporal learning effects: primacy-recency. Among the first things he noticed was that lumping studying together into one large session was significantly less effective than breaking it into several small sessions. He also noted that further practice of material after the initial acquisition of knowledge increased retention in the long term.
Later research derived the primacy-recency effect, which breaks any given learning session into three key periods: a first prime-time, a down-time, and a second prime-time [ii]. The prime-times are most useful for learning, while retention is significantly lower during the down-time. Furthermore, as the length of any given session increases, the relative size of the down-time also increases. This supports Ebbinghaus’ original findings, in that a greater percentage of a longer study session falls within the less effective down-time:

[Figure 1: as session length grows, the down-time occupies a larger share of the session]

This is significant for the planning and timing of study sessions because it can aid in determining how long a session should be and what information should be presented at what relative time. Some researchers have suggested that a study time of 20-30 minutes is ideal [iii], since the relative size of the down-time climbs well above 20% beyond that length. It is clear, based upon the primacy-recency effect, that the first prime-time is most useful for introducing new information. Still, the down-time is far from useless: it is an excellent opportunity to expand learning by placing information in a different context, as well as to review past information. Games and other interactive techniques are perfectly suited to maximizing down-time. Finally, the preferred use of the second prime-time by teachers is “closure”: re-rehearsing information which was learned during the session (preferably in a slightly different manner) in order to maximize retention before the next study session.
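To make the planning concrete, here is a minimal sketch that splits a session into the three periods. The proportions are purely illustrative assumptions loosely following the guideline above (a down-time share that grows with session length), not figures taken from the cited research:

```python
def plan_session(minutes: int) -> dict:
    """Split a study session into prime-time-1, down-time, and prime-time-2.

    Assumes (illustratively) a 10% down-time share at 20 minutes,
    growing one percentage point per extra minute, capped at 50%.
    """
    down_share = min(0.10 + 0.01 * max(minutes - 20, 0), 0.50)
    down = minutes * down_share
    prime1 = (minutes - down) * 0.6   # new material
    prime2 = minutes - down - prime1  # closure / re-rehearsal
    return {"prime_time_1": prime1, "down_time": down, "prime_time_2": prime2}

print(plan_session(25))
# {'prime_time_1': 12.75, 'down_time': 3.75, 'prime_time_2': 8.5}
```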


While much research is focused specifically on long-term memory, more recent studies of short-term memory have also contributed valuable insight into how to affect long-term memory. The “7 +/- 2” formula presented by Miller [iv] describes the amount of information which working memory (short-term memory) can contain at any given point in time. Each piece of information represents a single chunk of data; consequently, working memory can generally only be improved by the process of chunking [v]. This implies that the effectiveness of working memory can be improved, but it still holds no significant bearing upon long-term storage. It is consequently important to make a conscious distinction between short-term effectiveness and long-term storage in the studying process, as it has been shown (such as in the case of H.M. [vi]) that keeping information in working memory does not necessarily translate to long-term retention. These working-memory truths can be demonstrated by the manner in which a person might say a phone number (7 digits) aloud repeatedly in order to remember it for a few minutes, yet immediately after stopping is completely unable to remember even the first digit in the sequence. Therefore, any comprehensive approach to learning will compensate for working memory by attempting to ensure that only information which has been encoded within long-term memory is considered “stored.” At the same time, the articulatory rehearsal loop suggests that the faster a rehearsal loop happens, the higher the probability that the information will be stored [vii]; this also implies that longer words have a lower probability of being stored. Similarly, merely repeating information for a longer period of time does not improve long-term memory [viii]. It can therefore be concluded that an effective study technique which understands the properties of working memory will employ quick rehearsal of small bits of information (allowing only a short period of time for each rehearsal) while not returning to previous information until it is certain that the information in question has exited working memory.
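As a minimal illustration of chunking (the chunk size and example number are my own, not from the cited research), grouping a digit string into familiar units reduces the number of working-memory slots it occupies:

```python
def chunk(digits: str, size: int = 3) -> list[str]:
    """Group a digit string into chunks so it occupies fewer
    working-memory slots (one slot per chunk, not per digit)."""
    return [digits[i:i + size] for i in range(0, len(digits), size)]

# Ten individual digits exceed the 7 +/- 2 limit; four chunks do not.
print(chunk("8005551234"))  # ['800', '555', '123', '4']
```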
Long-term memory is said to be based upon two key factors: activation and strength. Anderson showed in 1976 that recently accessed information always performs better in retrieval time, and that the amount of practice with the information is relevant provided the information was not recently accessed [ix]. Interestingly, the level of activation (ease of retrieval as a result of information having been recently accessed) spreads to concepts which are related to the topic which was activated [x, xi]. The positive benefits of activation as it pertains to recall decay quickly over time, but the benefits of practice have a much longer-standing impact upon the speed and quality of recall. It can therefore be seen that subsequent re-activation of recently activated knowledge will produce better recall rates, and that studying items which are conceptually linked will also trigger this activation effect. Practically speaking, this neurological feature is useful in the sense that it ensures better performance on topics pertinent at the moment. When speaking about American history, for example, it is useful that the brain provides easier access to related terms through the effect of recent activation. Ultimately, it simply implies that humans are better at recalling information they are currently thinking about and using.
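One common formalization of this activation/strength distinction is the base-level learning equation from Anderson’s later ACT-R theory: activation is the log of a sum of power-law-decayed traces, one per past practice. The sketch below uses that equation; the decay rate and practice times are illustrative assumptions:

```python
import math

def base_level_activation(ages_in_hours: list[float], decay: float = 0.5) -> float:
    """ACT-R-style base-level activation: ln of summed power-law
    decayed traces, one trace per past practice of the item."""
    return math.log(sum(t ** -decay for t in ages_in_hours))

# An item practiced once, 1 hour ago, versus one practiced five times
# over the past week: recency boosts activation, practice sustains it.
print(base_level_activation([1.0]))                  # 0.0
print(base_level_activation([24, 48, 72, 96, 168]))  # ~ -0.44
```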


The eventual goal of learning, though, is to maximize recall in any situation, not just for recently activated material. For this purpose, only practice can do the job. It has also been shown that the vast majority of forgetting occurs almost immediately after information is studied [xii]. That is to say, information which has already been retained over a long period has a much greater likelihood of being retained in the future. The general power law of reaction time is applicable to practice and long-term memory:

Recall Time = C × (Practice Time)^(−K)

C and K are both positive constants pertaining to the given task; the negative exponent means that recall time falls as practice accumulates. This law implies that the amount of practice time is very important at first, but suffers the law of diminishing returns as practice time increases, due to the fact that there is a limit on performance capability. It is not surprising, then, that information which has been learned once before, though forgotten, is easier to re-learn [xiii]. Furthermore, it has also been demonstrated that the depth of processing information has a significant impact upon the effectiveness of practice time. Clearly, then, the goal for establishing an effectively stored long-term memory is to maximize, first, the depth of understanding and, second, the amount of time spent practicing the information.
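A quick worked example (the values of C and K are chosen purely for illustration) makes the diminishing returns visible:

```python
# Power law of practice: recall time = C * practice_time**(-K)
C, K = 10.0, 0.5  # illustrative constants for some hypothetical task

for practice in (1, 4, 16, 64):
    print(practice, round(C * practice ** -K, 2))
# 1 -> 10.0, 4 -> 5.0, 16 -> 2.5, 64 -> 1.25
# Quadrupling practice only halves recall time at each step.
```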


Building upon the conclusions drawn from working memory, it can be seen that placing small bits of quickly rehearsed information in varying contexts can further improve long-term retention. The importance of differing contexts for information is two-fold: firstly, it increases the probability that the information will be linked with a greater amount of other information, increasing the potential benefits of recent activation. Secondly, it increases the depth of understanding of the information by studying other ways the information can be used. For example, when studying Chinese, rehearsing the same word over and over is not as useful as studying three words which share common characters [xv]. This leverages the power of recent activation, and furthermore improves the overall understanding of each individual character: it is more valuable to understand the different subtle connotations of the characters 中 (middle) and 国 (country) than to simply understand the word 中国 (China, lit. “middle country”). In other words, quick rehearsal of related information is superior to repeated rehearsal of limited information.


The SuperMemo algorithm, originally devised in 1985 and based upon the “Optimization of Learning” master’s thesis by P. A. Wozniak, combines many diverse pieces of neurological research and is supported by experimental and anecdotal evidence from countless students:

[Figure 2: retention results of the original SuperMemo method]

Much of the sophisticated learning software available today uses the SuperMemo algorithm as a basis for SRS (Spaced Repetition Software) in order to improve learning retention. The algorithm was based upon the principles of active recall and minimum information (similar to the requirements for depth of understanding and small chunks of information, outlined above). Wozniak’s experiments supported his theory of increasing intervals of time, giving rise to what he called the “optimum spacing retention principle,” which subsequently helped to give rise to the concept of and support for SRS.

After years of experiments, the first SuperMemo algorithm was created, using spacing intervals of 1, 7, 16, and 35 days, with each subsequent interval twice the previous one (I(i) = I(i−1) × 2). The original version of the SuperMemo algorithm was implemented using only a pencil and paper. Information was chunked into pages, each page containing 10 or more distinct pieces of information (notably exceeding the 7 +/- 2 formula for working memory). Each page was then studied once (on its own and not immediately after another page, notably falling within the optimal window of the primacy-recency effect). Finally, words which were forgotten from previous pages after a 35-day interval were re-compiled alongside new words in new pages as the student progressed [xvii]. The results from this original method were studied over the course of one year, and found to yield an 80% retention rate with an average of between 110 and 260 items/year/minute learned [xviii] (Fig 2). In 1987, the SuperMemo process was officially converted into a computer algorithm, and by 2005 subsequent versions of it were implemented in popular applications such as Anki [xix] and Mnemosyne [xx]. The most commonly implemented version of the algorithm in software is SuperMemo-2, which is shown to have a retention rate of 92% at 270 items/year/minute [xxi]. As it stands, the main shortcomings of the SuperMemo algorithm are that it does not take into account some potentially useful information, such as the primacy-recency effect and the usefulness of related material to improve recall as a result of activation.
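For reference, here is a minimal sketch of the SuperMemo-2 scheduling update as commonly described (quality grades 0-5, an easiness factor floored at 1.3, and fixed first intervals of 1 and 6 days). The variable names are my own, and real implementations such as Anki’s diverge in details:

```python
def sm2_update(quality: int, reps: int, interval: int, ef: float):
    """One SuperMemo-2 review step.

    quality:  self-graded recall, 0 (blackout) to 5 (perfect)
    reps:     successful repetitions so far
    interval: current inter-review interval, in days
    ef:       easiness factor (kept >= 1.3)
    Returns the updated (reps, interval, ef).
    """
    if quality < 3:               # failed recall: start the item over,
        return 0, 1, ef           # but keep its easiness factor
    ef = max(1.3, ef + 0.1 - (5 - quality) * (0.08 + (5 - quality) * 0.02))
    if reps == 0:
        interval = 1
    elif reps == 1:
        interval = 6
    else:
        interval = round(interval * ef)
    return reps + 1, interval, ef

# A new item recalled well three times in a row:
state = (0, 0, 2.5)
for q in (5, 4, 4):
    state = sm2_update(q, *state)
    print(state)  # intervals grow: 1, 6, then 16 days
```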


Of the other research done in using Spaced Repetition Software, the most well-known commercial implementation is the Pimsleur Language Learning System, originally developed by Paul Pimsleur based upon his research at UCLA. Each lesson is thirty minutes in length (notably within the range suggested for minimizing down-time under the primacy-recency effect); the student listens to a native speaker and repeats a new word or phrase, and this phrase is then repeated throughout the course of the lesson at graduated intervals. Similar to SuperMemo, some of the key principles involved in the Pimsleur learning system are breaking information into small manageable chunks, systematically interspersing new and old material, and reinforcing correct answers over successive intervals [xxii]. The major fault of existing implementations of the Pimsleur system, however, lies in the fact that they provide no method by which the student can submit feedback, making the learning process unnaturally one-directional.
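Pimsleur’s published memory schedule spaces each reinforcement roughly five times further out than the last, starting at seconds and ultimately reaching years. The sketch below generates such a geometrically expanding schedule; the exact 5x multiplier and starting point are assumptions based on that description:

```python
def graduated_intervals(first_seconds: float = 5.0, factor: float = 5.0, count: int = 8):
    """Yield geometrically expanding review offsets, in seconds,
    in the spirit of Pimsleur's graduated-interval recall."""
    interval = first_seconds
    for _ in range(count):
        yield interval
        interval *= factor

for s in graduated_intervals():
    print(round(s / 60, 2), "minutes")
# 0.08, 0.42, 2.08, 10.42, 52.08, 260.42, 1302.08, 6510.42 minutes
```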


If improvements are to be made to the existing algorithms for learning, it is conceivable that even higher retention rates could be achieved. The strength of the SuperMemo algorithm lies in using the student’s own assessment of his knowledge to choose the best-suited intervals; the Pimsleur approach succeeds in using rich media and a well-timed study session to leverage the primacy-recency effect. Further application of this effect would naturally suggest that the student adjust the difficulty curve of the material in correlation with the first and second prime-times. For example, if the length of the study session were known in advance, then the SuperMemo algorithm could be modified to use the first prime-time for new material, the down-time for old material (and potentially related material, in order to take advantage of recent activation of related topics and also increase the depth of understanding), and the second prime-time to review the new material introduced in the session. Another possible modification to the SuperMemo algorithm would be to prevent any fact from recurring until at least ten other facts have intervened, in order to help ensure that the item has left both working memory and recently-activated long-term memory. Ultimately, there is still much that might be added to create a maximally effective algorithm for students. Some preliminary research suggests, for example, that presenting words to be memorized in color yields as much as a 12.7% increase in retention [xxiii]. There is still a great deal of research that needs to be done into the formation of long-term memories to support and build upon existing techniques, and as it stands the support for the effectiveness of most techniques is based largely upon empirical evidence. Still, it cannot be denied that research into neuroscience is improving the way we learn, and further improvements can be made by students employing the latest research in their study routines.
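As a sketch of that last modification (entirely my own illustration of the essay’s proposal, not part of SuperMemo), a review queue can enforce the minimum gap by deferring any item seen within the last ten presentations:

```python
from collections import deque

def order_reviews(items: list[str], min_gap: int = 10) -> list[str]:
    """Order due reviews so no item reappears within `min_gap`
    presentations, deferring repeats until the gap has passed."""
    recent = deque(maxlen=min_gap)  # the last `min_gap` items shown
    pending = deque(items)
    ordered, deferred = [], []
    while pending:
        item = pending.popleft()
        if item in recent:
            deferred.append(item)   # too soon: try again later
        else:
            ordered.append(item)
            recent.append(item)
            pending.extend(deferred)
            deferred.clear()
        if not pending and deferred:
            break  # remaining items would all violate the gap
    return ordered + deferred       # deferred leftovers go last

print(order_reviews(["a", "b", "a", "c"], min_gap=2))  # ['a', 'b', 'c', 'a']
```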
