Recollection versus Imagination: Exploring Human Memory and Cognition via Neural Language Models
- Maarten W. Bos
- Eric Horvitz
- Yejin Choi
- Noah A. Smith
- James W. Pennebaker
ACL 2020
We investigate the use of NLP as a measure of the cognitive processes involved in storytelling, contrasting imagination and recollection of events. To facilitate this, we collect and release HIPPOCORPUS, a dataset of 7,000 stories about imagined and recalled events.
We introduce a measure of narrative flow and use this to examine the narratives for imagined and recalled events. Additionally, we measure the differential recruitment of knowledge attributed to semantic memory versus episodic memory (Tulving, 1972) for imagined and recalled storytelling by comparing the frequency of descriptions of general commonsense events with more specific realis events.
Our analyses show that imagined stories have a substantially more linear narrative flow, compared to recalled stories in which adjacent sentences are more disconnected. In addition, while recalled stories rely more on autobiographical events based on episodic memory, imagined stories express more commonsense knowledge based on semantic memory. Finally, our measures reveal the effect of narrativization of memories in stories (e.g., stories about frequently recalled memories flow more linearly; Bartlett, 1932). Our findings highlight the potential of using NLP tools to study the traces of human cognition in language.
Publication Downloads
Hippocorpus
February 1, 2022
To examine the cognitive processes of remembering and imagining and their traces in language, we introduce Hippocorpus, a dataset of 6,854 English diary-like short stories about recalled and imagined events. Using a crowdsourcing framework, we first collect recalled stories and summaries from workers, then provide these summaries to other workers who write imagined stories. Finally, months later, we collect a retold version of the recalled stories from a subset of the authors of recalled stories. Our dataset comes paired with author demographics (age, gender, race), their openness to experience, and several variables describing the author's relationship to the event (e.g., how personal the event is and how often they tell its story).
**New to V3**: We expand the Hippocorpus by releasing sentence-level event annotations on a set of 240 stories. Eight crowdworkers went through an imagined, a recalled, and a retold story about the same event, sentence by sentence, and annotated whether each sentence marked the beginning of a new minor or major event and, if so, whether the event was surprising or expected.