AI Understanding of Humour and Metaphor: The Xiaoyu Tong Dissertation at the University of Amsterdam

Large language models have fundamentally altered how professionals, researchers, and everyday users interact with technology. These systems generate highly fluent text, answer complex queries, and assist with demanding tasks ranging from software development to copywriting. However, recent academic research from the Netherlands demonstrates that generating grammatically correct sentences is not the same as understanding human language. According to recent news coverage from the University of Amsterdam (UvA), artificial intelligence still struggles significantly with two of the most characteristically human aspects of communication: jokes and figurative speech.

Why Fluency Does Not Equal True Comprehension in Modern AI

It is easy to conflate the ability to produce coherent text with actual comprehension. When an AI system outputs a well-structured paragraph, the natural assumption is that the system understands the underlying meaning of the words it arranges. University of Amsterdam computer scientist and linguist Xiaoyu Tong challenges this assumption directly in her recent academic work. As she articulates, the fact that AI systems can produce fluent language does not automatically mean they understand language in the same way humans do.

This distinction is critical for developers, researchers, and businesses that rely on AI for nuanced communication. If a system cannot interpret the underlying intent or cultural context of a statement, its utility in complex human environments remains limited. The Xiaoyu Tong dissertation provides concrete evidence of these limitations, offering a sobering reminder that AI systems are fundamentally different from human cognitive processors.

The Mechanics of Metaphor Comprehension in Human Language

To understand why AI fails in certain areas, one must first examine what makes those areas complex. Metaphors are not merely decorative literary devices; they are foundational to how humans process complex information. People use phrases like “grasping an idea” or “attacking an argument” daily without consciously acknowledging the physical actions being applied to abstract concepts. Metaphor comprehension allows individuals to explain difficult concepts rapidly, persuade audiences, create vivid mental imagery, and strengthen social bonds.

For an AI, however, distinguishing between a literal statement and a figurative one requires a level of semantic mapping that current architectures often lack. To investigate this specific deficit, Tong developed the Metaphor Understanding Challenge Dataset (MUNCH). This benchmark contains more than 10,000 carefully annotated paraphrases of metaphorical sentences. When tested on it, leading commercial language models frequently failed to capture the intended figurative interpretation, defaulting instead to literal readings that missed the point.
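One way to picture this kind of test is as a paraphrase-selection task: given a metaphorical sentence and several candidate paraphrases, the system must pick the one that preserves the figurative meaning. The sketch below is only an illustration under that assumption. The item format, the names ParaphraseItem, accuracy, and always_first, and the two toy sentences are hypothetical; they are not taken from MUNCH or from the dissertation's evaluation code.

```python
from dataclasses import dataclass
from typing import Callable, Sequence, Tuple


@dataclass(frozen=True)
class ParaphraseItem:
    """One hypothetical test item (not the real MUNCH schema)."""
    sentence: str                 # sentence containing a metaphor
    candidates: Tuple[str, ...]   # candidate paraphrases
    apt_index: int                # index of the paraphrase that keeps the figurative sense


# A "chooser" stands in for a language model: it sees the sentence and the
# candidates and returns the index of the paraphrase it prefers.
Chooser = Callable[[str, Sequence[str]], int]


def accuracy(items: Sequence[ParaphraseItem], choose: Chooser) -> float:
    """Fraction of items where the chooser picks the apt paraphrase."""
    if not items:
        raise ValueError("need at least one item")
    correct = sum(
        1 for item in items
        if choose(item.sentence, item.candidates) == item.apt_index
    )
    return correct / len(items)


def always_first(sentence: str, candidates: Sequence[str]) -> int:
    """Placeholder chooser; a real evaluation would query a model here."""
    return 0


# Invented toy items built from the everyday metaphors mentioned earlier.
TOY_ITEMS = [
    ParaphraseItem(
        sentence="She grasped the idea quickly.",
        candidates=("She understood the idea quickly.",
                    "She held the idea in her hand quickly."),
        apt_index=0,
    ),
    ParaphraseItem(
        sentence="He attacked the argument.",
        candidates=("He physically struck the argument.",
                    "He criticised the argument."),
        apt_index=1,
    ),
]

if __name__ == "__main__":
    print(f"placeholder accuracy: {accuracy(TOY_ITEMS, always_first):.2f}")
```

Replacing always_first with a function that queries an actual model would turn this into a small evaluation harness; the scoring logic stays the same.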

Categorizing Metaphorical Intentions

Recognizing a metaphor is only the first step; understanding why a speaker used that metaphor is a much higher cognitive function. Metaphors serve distinct purposes depending on the context. Together with her colleagues at the University of Amsterdam, Tong developed the first large-scale taxonomy of metaphorical intentions, identifying nine distinct categories of use.

While current AI models could recognize some of these intentions reasonably well, their performance was inconsistent across the nine categories. This inconsistency highlights a fundamental gap in how AI processes context. A human listener instinctively knows when a metaphor is being used to explain a complex scientific concept versus when it is being used to mock a political opponent. AI systems, lacking this intuitive social radar, struggle to apply the correct interpretive framework, leading to outputs that can feel tone-deaf or inaccurate.
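Inconsistency of this kind is usually made visible by breaking accuracy down per category rather than reporting a single overall score. The following is a minimal sketch of that bookkeeping. The labels "explain" and "mock" are hypothetical placeholders inspired by the examples in the paragraph above, not the actual nine categories of the taxonomy, and the predictions are invented toy data.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple


def per_category_accuracy(pairs: Iterable[Tuple[str, str]]) -> Dict[str, float]:
    """Accuracy per gold category, given (gold_label, predicted_label) pairs."""
    totals: Dict[str, int] = defaultdict(int)
    hits: Dict[str, int] = defaultdict(int)
    for gold, predicted in pairs:
        totals[gold] += 1
        if gold == predicted:
            hits[gold] += 1
    return {category: hits[category] / totals[category] for category in totals}


def spread(accuracies: Dict[str, float]) -> float:
    """Gap between the best and worst category; large gaps signal inconsistency."""
    return max(accuracies.values()) - min(accuracies.values())


# Invented toy predictions: (gold intention, intention the model predicted).
TOY_PREDICTIONS = [
    ("explain", "explain"),
    ("explain", "explain"),
    ("mock", "explain"),
    ("mock", "mock"),
]

if __name__ == "__main__":
    scores = per_category_accuracy(TOY_PREDICTIONS)
    for category, score in sorted(scores.items()):
        print(f"{category}: {score:.2f}")
    print(f"spread: {spread(scores):.2f}")
```

A single averaged score would hide the weaker category here, which is why a per-category view is the more informative way to read results on a multi-category taxonomy.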

Testing AI's Understanding of Humour in the Netherlands and Beyond

If metaphor comprehension presents a steep challenge for AI, understanding humour represents an even higher barrier. Humour relies heavily on incongruity—the recognition of unexpected connections, contradictions, or shared cultural references. A joke works precisely because it subverts expectations. For a machine trained on statistical probabilities, identifying and appreciating the deliberate subversion of those probabilities is an immense challenge.

To quantify this challenge, Tong created the Hummus dataset, a specialized collection of 1,000 cartoons from The New Yorker. These cartoons are an ideal testing ground for whether AI understands humour because they rarely rely on simple puns. Instead, they depend on visual metaphors, irony, and subtle connections between an image and its accompanying caption.

The Challenge of Multimodal Jokes

The Hummus dataset revealed that even state-of-the-art multimodal AI systems—those designed to process both text and images simultaneously—often fail to “get the joke.” Understanding a cartoon requires the system to synthesize visual information with textual information, recognize the incongruity between the two, and then apply cultural knowledge to resolve that incongruity into a humorous punchline.

As Tong notes, understanding a joke is about much more than recognizing words or identifying objects in an image. It requires an appreciation of what makes a situation unexpected within a specific cultural framework. AI systems consistently struggle to combine these elements effectively. They might accurately describe the objects in a cartoon and correctly transcribe the caption, but they fail to bridge the gap between the two to identify the humorous intent.

Implications for Future Human-Centred AI Systems

The findings from the Xiaoyu Tong dissertation are not merely academic curiosities; they have practical implications for the future development of AI technologies. As artificial intelligence becomes more deeply integrated into everyday life—powering virtual assistants, customer service bots, educational tools, and workplace applications—the ability to navigate the subtleties of human communication becomes paramount.

Consider a virtual assistant designed to tutor students or assist elderly users. If a user makes a joke to break the ice or uses a common metaphor to describe a problem they are facing, an AI that cannot interpret these linguistic nuances will respond in a rigid, literal, and ultimately unhelpful manner. This lack of flexibility limits user trust and reduces the overall usability of the system. Improving AI’s capacity for metaphor comprehension and humour recognition is essential for building systems that are reliable, intuitive, and genuinely useful.

Developers must move beyond optimizing for mere fluency. The next frontier in natural language processing involves building architectures that can map conceptual frameworks, track cultural context, and identify incongruity. By focusing research efforts on these higher-order cognitive tasks, the AI community can begin to close the gap between statistical text generation and genuine language understanding.

Key Takeaways from the Xiaoyu Tong Dissertation

The research conducted at the University of Amsterdam provides several critical insights for technologists and linguists:

  • Fluency masks incompetence: The ability to generate grammatically perfect text does not indicate that an AI understands the semantic meaning, intent, or cultural context of that text.
  • Metaphors are multi-faceted: Results on the MUNCH dataset indicate that leading models often miss the figurative reading of a sentence, and the newly developed nine-category taxonomy shows that their recognition of the intention behind a metaphor is inconsistent across categories.
  • Humour requires multimodal synthesis: The Hummus dataset shows that even state-of-the-art multimodal systems often fail to combine visual and textual information to resolve incongruity, a necessary step for understanding humour.
  • Context is king: Cultural knowledge and social context remain the most difficult elements for AI to replicate, acting as the primary barrier to true conversational intelligence.

Stay Informed with UvA News Articles

Advancements in artificial intelligence require rigorous, independent academic scrutiny to separate marketing claims from technical reality. The research emerging from the Netherlands continues to provide a clear-eyed assessment of what current technology can and cannot achieve. Following UvA News Articles is an effective way for professionals and academics to stay updated on these critical developments.

The Xiaoyu Tong dissertation, titled “Dissecting Incongruity: Metaphor and Humor Understanding of Large Language Models,” stands as a significant contribution to the field of natural language processing. It clearly defines the boundaries of current AI capabilities and sets a targeted roadmap for future research. As AI continues to evolve, moving from simple text predictors to genuinely interactive assistants, the lessons drawn from this research will be invaluable.

The journey toward human-centred AI is far from complete, but with precise benchmarking and critical analysis like that conducted at the University of Amsterdam, the path forward becomes much clearer.
