COKE: A Cognitive Knowledge Graph for Machine Theory of Mind

¹Tsinghua University ²The Chinese University of Hong Kong ³University of Electronic Science and Technology of China
*Equal Contribution ‡Corresponding author

Theory of mind (ToM) refers to humans' ability to understand and infer the desires, beliefs, and intentions of others. Acquiring ToM plays a key role in human social cognition and interpersonal relations. Modern AI and NLP systems still lack ToM because they cannot access the human mental states and cognitive processes that underlie their training corpora.

To empower AI systems with ToM and narrow the gap between them and humans, we make the following contributions:
(1) We propose COKE, the first cognitive knowledge graph for machine theory of mind. Specifically, COKE formalizes ToM as a collection of 45k+ manually verified cognitive chains that characterize human mental activities and the subsequent behavioral and affective responses when facing specific social circumstances. (2) We build a powerful cognitive language model, COLM, by associating COKE with LLaMA-2, so that it can predict cognitive chains for out-of-KG situations. (3) We conduct extensive experiments to evaluate the ToM ability of COLM and typical LLMs. Both automatic and human evaluations show that COLM outperforms strong baselines such as GPT-4 in both zero-shot and few-shot settings across all cognitive generation tasks, which in turn demonstrates the high quality of COKE. (4) We further demonstrate the potential of COKE for enhancing social applications and show its effectiveness on the downstream emotional support conversation task.

Data Structure of COKE

We specify five types of nodes in COKE: situations, clues, thoughts, actions, and emotions. The basic unit of COKE is the cognitive chain Situation⇒Clue⇒Thought⇒(Action+Emotion). To distinguish whether a cognitive chain is optimistic or pessimistic in a given situation, we also assign each chain a polarity, either positive or negative.
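As a minimal sketch, a cognitive chain can be modeled as a small record type. The class and field names, and the use of plain strings for every node, are assumptions for illustration rather than COKE's released data format; the example content is invented.

```python
from dataclasses import dataclass
from enum import Enum


class Polarity(Enum):
    """Whether a chain is optimistic (positive) or pessimistic (negative)."""
    POSITIVE = "positive"
    NEGATIVE = "negative"


@dataclass(frozen=True)
class CognitiveChain:
    """One COKE unit: Situation => Clue => Thought => (Action + Emotion).

    Hypothetical schema: field names and str types are illustrative only.
    """
    situation: str
    clue: str
    thought: str
    action: str   # behavioral response following the thought
    emotion: str  # affective response following the same thought
    polarity: Polarity

    def as_path(self) -> str:
        """Render the chain in ASCII arrow notation."""
        return (f"{self.situation} => {self.clue} => {self.thought} => "
                f"({self.action} + {self.emotion})")


# Invented example, not taken from COKE.
chain = CognitiveChain(
    situation="Job interview tomorrow",
    clue="Required skills are rusty",
    thought="I may not be qualified",
    action="Practice the skills tonight",
    emotion="anxious",
    polarity=Polarity.NEGATIVE,
)
print(chain.as_path())
```

Action and emotion are sibling fields rather than a sequence because both responses follow from the same thought.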

Data Collection

As shown in the figure below, (1) we first manually design few-shot prompts that induce GPT-3.5 to automatically generate raw data for the five node types in a pipeline manner; (2) we then recruit and train eight graduate students majoring in social psychology as annotators to select and revise the outputs of GPT-3.5.
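A minimal sketch of this two-stage process follows. `llm` stands in for any text-in, text-out call to GPT-3.5 and `review` for a human annotator; the prompt layout, the dependency table, and all function names are assumptions for illustration, not COKE's actual prompts or tooling. The sketch also produces one linear chain at a time and does not model branching (for example, several thoughts for one clue).

```python
from typing import Callable, Dict, List, Optional

Chain = Dict[str, str]

# Upstream nodes each generation stage is conditioned on, following
# Situation => Clue => Thought => (Action + Emotion). Action and emotion
# both branch from the thought, so neither stage sees the other's output.
DEPENDS_ON: Dict[str, List[str]] = {
    "situation": [],
    "clue": ["situation"],
    "thought": ["situation", "clue"],
    "action": ["situation", "clue", "thought"],
    "emotion": ["situation", "clue", "thought"],
}


def build_prompt(node_type: str, chain: Chain, shots: List[str]) -> str:
    """Hypothetical few-shot prompt: demonstrations, known nodes, then the slot to fill."""
    known = [f"{dep}: {chain[dep]}" for dep in DEPENDS_ON[node_type]]
    return "\n".join(shots + known + [f"{node_type}:"])


def generate_raw_chain(llm: Callable[[str], str],
                       few_shot: Dict[str, List[str]]) -> Chain:
    """Stage 1: let the LLM fill in the five node types one after another."""
    chain: Chain = {}
    for node_type in DEPENDS_ON:  # dict insertion order is the pipeline order
        prompt = build_prompt(node_type, chain, few_shot.get(node_type, []))
        chain[node_type] = llm(prompt).strip()
    return chain


def annotate(raw: List[Chain],
             review: Callable[[Chain], Optional[Chain]]) -> List[Chain]:
    """Stage 2: annotators keep (possibly revised) chains and drop rejected ones."""
    kept: List[Chain] = []
    for chain in raw:
        revised = review(chain)  # None means the annotator rejected the chain
        if revised is not None:
            kept.append(revised)
    return kept


# Usage with a stand-in "LLM" that echoes the slot it is asked to fill.
def echo_llm(prompt: str) -> str:
    return "gen-" + prompt.splitlines()[-1].rstrip(":")


def accept_all(chain: Chain) -> Optional[Chain]:
    return chain


print(annotate([generate_raw_chain(echo_llm, {})], accept_all))
```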

Experiment

Given a situation, we decompose the cognitive process into four tasks, as shown in the table: clue generation, thought generation, action generation, and emotion classification.
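The decomposition can be sketched as turning one cognitive chain into four (source, target) instances. This assumes that each task conditions on all upstream nodes of the chain and that polarity is passed as a control signal to thought generation (consistent with the positive and negative thoughts used in the emotional support setting described later). The task names, field order, and `|`-joined source format are hypothetical, not COLM's actual input template, and the emotion is a free-form string here because the label set is not given on this page.

```python
from typing import Dict, List, Tuple

Chain = Dict[str, str]

# (task name, input fields, target field). Assumption: every task sees all
# upstream nodes, and thought generation is additionally controlled by polarity.
TASKS: List[Tuple[str, List[str], str]] = [
    ("clue_generation", ["situation"], "clue"),
    ("thought_generation", ["situation", "clue", "polarity"], "thought"),
    ("action_generation", ["situation", "clue", "thought"], "action"),
    ("emotion_classification", ["situation", "clue", "thought"], "emotion"),
]


def to_task_instances(chain: Chain) -> List[Dict[str, str]]:
    """Turn one cognitive chain into four (task, source, target) training examples."""
    instances = []
    for task, inputs, target in TASKS:
        source = " | ".join(f"{field}: {chain[field]}" for field in inputs)
        instances.append({"task": task, "source": source, "target": chain[target]})
    return instances


# Invented example chain, not taken from COKE.
chain = {
    "situation": "Job interview tomorrow",
    "clue": "Required skills are rusty",
    "thought": "I may not be qualified",
    "action": "Practice the skills tonight",
    "emotion": "anxious",
    "polarity": "negative",
}
for inst in to_task_instances(chain):
    print(inst["task"], "->", inst["source"], "=>", inst["target"])
```

Because all four tasks share one model in this framing, a single instruction-tuned model can serve every task by switching the task name, which matches the page's description of COLM as a controllable generative model for multiple cognitive tasks.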

Baseline models such as LLaMA-2 and Mistral struggle to follow the task instructions, whereas COLM, trained on COKE, achieves significant performance gains across the cognitive generation tasks. Additional in-context prompts boost the performance of powerful models such as GPT-3.5 Turbo and GPT-4 on clue, thought, and action generation, but these LLMs still struggle with complex emotional understanding and fail at the emotion classification task. Compared with these powerful LLMs, COLM maintains substantial advantages across all evaluation metrics.
These results support two observations: (1) the data collected in COKE is of high quality and can empower a model with strong ToM ability; (2) COLM, designed as a controllable generative model for multiple cognitive tasks, can effectively internalize this ToM ability and generalize to unseen situations.

Downstream Task

Emotional Support Conversation

As COKE is a cognitive knowledge graph for machine theory of mind, we further substantiate its effectiveness in empowering social applications with ToM capabilities. Our chosen testing ground is the emotional support conversation (ESC) task. The prime objective of ESC is to generate empathetic and effective responses in social dialogues, with the aim of mitigating users’ emotional distress and fostering improved mental states.
We treat each user utterance in the dialogue as a clue and employ COLM to infer two thoughts (one positive and one negative) via thought generation. Based on the generated thoughts, we then infer actions for each thought via action generation. We extract the verbs and nouns in these thoughts and actions as ToM knowledge keywords and append them to the end of the dialogue history as an enhanced context. The ESC system is then trained to generate the corresponding response from this enhanced context.
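The augmentation step can be sketched as follows. `colm` is a hypothetical callable standing in for COLM with a `(task, source) -> text` interface, and `tag` stands in for a real part-of-speech tagger. The toy tag table, the `[ToM]` separator, the source-string format, and analyzing only the last turn of the history are all assumptions for illustration.

```python
from typing import Callable, Dict, List, Tuple

Model = Callable[[str, str], str]  # hypothetical COLM interface: (task, source) -> text
Tagger = Callable[[str], str]      # word -> coarse POS tag


def infer_tom(utterance: str, colm: Model) -> Tuple[List[str], List[str]]:
    """Use the utterance as a clue: one positive and one negative thought, then an action per thought."""
    thoughts = [colm("thought_generation", f"clue: {utterance} | polarity: {p}")
                for p in ("positive", "negative")]
    actions = [colm("action_generation", f"clue: {utterance} | thought: {t}")
               for t in thoughts]
    return thoughts, actions


def extract_keywords(texts: List[str], tag: Tagger) -> List[str]:
    """Keep verbs and nouns in first-seen order, dropping duplicates."""
    keywords: List[str] = []
    for text in texts:
        for token in text.lower().replace(",", " ").replace(".", " ").split():
            if tag(token) in ("VERB", "NOUN") and token not in keywords:
                keywords.append(token)
    return keywords


def build_enhanced_context(history: List[str], colm: Model, tag: Tagger,
                           sep: str = " [ToM] ") -> str:
    """Append ToM keywords inferred from the latest turn to the end of the dialogue history."""
    thoughts, actions = infer_tom(history[-1], colm)
    keywords = extract_keywords(thoughts + actions, tag)
    return " ".join(history) + sep + " ".join(keywords)


# Usage with stand-ins for COLM and the tagger (invented outputs, not COLM's).
def stub_colm(task: str, source: str) -> str:
    if task == "thought_generation":
        return "I can learn from this" if "positive" in source else "I will lose my job"
    return "ask my manager for feedback"


TOY_TAGS: Dict[str, str] = {"learn": "VERB", "lose": "VERB", "job": "NOUN",
                            "ask": "VERB", "manager": "NOUN", "feedback": "NOUN"}

history = ["Supporter: How are you feeling?",
           "User: My boss criticized my report today."]
print(build_enhanced_context(history, stub_colm, lambda w: TOY_TAGS.get(w, "OTHER")))
```

Under this sketch the keywords are appended as plain text, so the downstream response generator needs no architectural change to consume them.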

Case Study

Here is a case study showing how COKE offers a more nuanced and empathetic approach to AI-driven emotional support.


BibTeX

@article{wu2023coke,
  title={Coke: A cognitive knowledge graph for machine theory of mind},
  author={Wu, Jincenzi and Chen, Zhuang and Deng, Jiawen and Sabour, Sahand and Huang, Minlie},
  journal={arXiv preprint arXiv:2305.05390},
  year={2023}
}