Research on Multimodality in the Field of Linguistics in China

Since its introduction in the early 21st century, multimodality research in Chinese linguistics has developed a pattern that is centered on systemic functional linguistics and draws on multiple disciplines. The following overview covers its development context, theoretical framework, core areas, research hotspots, and challenges:

1. Development Context and Theoretical Framework

1. Origin and Theoretical Introduction

Domestic research on multimodality began in 2003, when Li Zhanzi introduced Kress and van Leeuwen’s theory of visual grammar, which builds on systemic functional linguistics (SFL) to analyze how language, images, sounds, and other symbols work together. Early research focused on theoretical introduction, averaging fewer than 5 publications per year from 2003 to 2011. After 2012 the field entered a period of rapid growth, averaging over 20 publications per year, and research topics expanded to education, advertising, film, and other fields.

2. Expansion of Theoretical Framework

Systemic Functional Linguistics: Emphasizes the three metafunctions of multimodal symbols (ideational, interpersonal, textual) and uses visual grammar to analyze image composition (information value, salience, framing).

Cognitive Linguistics: Draws on multimodal metaphor theory to explore how images and text dynamically construct meaning in advertisements and comics, as in Feng Dezheng’s analysis of how spatial metaphors in car advertisements express social relationships.

Interdisciplinary Models: Royce’s theory of intersemiotic complementarity and modal classification system (static/dynamic, realistic/non-realistic) are applied to multimodal discourse analysis.

2. Core Research Areas

1. Multimodal Discourse Analysis

Genre Analysis: Focuses on text types such as visual poetry, advertising, and film. Examples include research on the interaction between typography and text in visual poetry (Hu Dan, 2007) and on the metaphorical construction of image-text relationships in advertisements (Zhang Hui, 2011).

Technical Tool Development: Introduces software like ELAN for multimodal corpus annotation, promoting empirical research (Wang Lifei, 2008).
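To make the annotation workflow more concrete, the following is a minimal sketch of how time-aligned tiers from an ELAN-style annotation file might be read and compared. The embedded XML is a heavily simplified fragment in the style of ELAN's .eaf format, and the tier names ("speech", "gesture"), labels, and timings are invented for illustration; real files contain additional headers and tier metadata.

```python
import xml.etree.ElementTree as ET

# Simplified, hypothetical fragment in the style of an ELAN .eaf file.
# Tier names, labels, and time values (milliseconds) are invented.
EAF = """<ANNOTATION_DOCUMENT>
  <TIME_ORDER>
    <TIME_SLOT TIME_SLOT_ID="ts1" TIME_VALUE="0"/>
    <TIME_SLOT TIME_SLOT_ID="ts2" TIME_VALUE="1200"/>
    <TIME_SLOT TIME_SLOT_ID="ts3" TIME_VALUE="800"/>
    <TIME_SLOT TIME_SLOT_ID="ts4" TIME_VALUE="1500"/>
  </TIME_ORDER>
  <TIER TIER_ID="speech">
    <ANNOTATION>
      <ALIGNABLE_ANNOTATION ANNOTATION_ID="a1" TIME_SLOT_REF1="ts1" TIME_SLOT_REF2="ts2">
        <ANNOTATION_VALUE>ni hao</ANNOTATION_VALUE>
      </ALIGNABLE_ANNOTATION>
    </ANNOTATION>
  </TIER>
  <TIER TIER_ID="gesture">
    <ANNOTATION>
      <ALIGNABLE_ANNOTATION ANNOTATION_ID="a2" TIME_SLOT_REF1="ts3" TIME_SLOT_REF2="ts4">
        <ANNOTATION_VALUE>wave</ANNOTATION_VALUE>
      </ALIGNABLE_ANNOTATION>
    </ANNOTATION>
  </TIER>
</ANNOTATION_DOCUMENT>"""


def read_tiers(eaf_text):
    """Return {tier_id: [(start_ms, end_ms, label), ...]} for time-aligned annotations."""
    root = ET.fromstring(eaf_text)
    # Map each time slot id to its value in milliseconds.
    slots = {s.get("TIME_SLOT_ID"): int(s.get("TIME_VALUE")) for s in root.iter("TIME_SLOT")}
    tiers = {}
    for tier in root.iter("TIER"):
        spans = []
        for ann in tier.iter("ALIGNABLE_ANNOTATION"):
            start = slots[ann.get("TIME_SLOT_REF1")]
            end = slots[ann.get("TIME_SLOT_REF2")]
            label = ann.findtext("ANNOTATION_VALUE", default="")
            spans.append((start, end, label))
        tiers[tier.get("TIER_ID")] = spans
    return tiers


def overlaps(tier_a, tier_b):
    """List (label_a, label_b, overlap_ms) for every pair of annotations that co-occur."""
    result = []
    for s1, e1, l1 in tier_a:
        for s2, e2, l2 in tier_b:
            duration = min(e1, e2) - max(s1, s2)
            if duration > 0:
                result.append((l1, l2, duration))
    return result


tiers = read_tiers(EAF)
pairs = overlaps(tiers["speech"], tiers["gesture"])
print(pairs)
```

Counting such co-occurrences across a corpus is one simple way to quantify how often a gesture accompanies a stretch of speech, although the choice of tiers and overlap threshold would depend on the research question.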

2. Applications in Education

Foreign language teaching is a major application scenario, covering textbook and courseware design (for example, coordinating multimodal PPT slides with gesture) and the optimization of classroom interaction patterns. Reported empirical results suggest that multimodal teaching can improve students’ language output ability by over 30%.

3. Interactional Linguistics and Multimodality

Recent studies draw on interactional linguistics to analyze how multimodal resources (such as gestures and prosody) affect meaning construction in spoken communication. For example, Li Xianyin et al. (2024) explore the online output and multimodal coordination mechanisms in spoken Chinese.

4. Film and Digital Technology

Generative AI technologies (such as diffusion models and GPT) are applied to the generation of virtual characters in films and to crowd animation simulation, but they still face challenges in generation stability and computational cost.

3. Research Hotspots and Trends

1. Current Hotspots

Multimodal Large Language Models (MLLMs): Represented by BLIP-2 and InstructBLIP, these models combine visual encoders with large language models to achieve cross-modal reasoning and advance zero-shot learning.

Cross-Modal Generalization: Diffusion models are applied in a unified way to image, video, and motion generation, for example in virtual film production.

Affective Computing: Improves emotion recognition accuracy by exploiting complementarity between modalities (e.g., the text modality’s advantage in expressing pleasure).
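The idea of multimodal complementarity in emotion recognition can be illustrated with a minimal late-fusion sketch, in which each modality produces its own probability distribution over emotions and the distributions are combined by a weighted average. This is a generic technique, not the method of any study cited here; the function name `late_fusion`, the modality weights, and all probability values are assumptions made up for illustration.

```python
def late_fusion(modality_probs, weights):
    """Weighted average of per-modality emotion probabilities (late fusion).

    modality_probs: {modality: {emotion: probability}}
    weights:        {modality: non-negative weight}
    """
    labels = list(next(iter(modality_probs.values())).keys())
    total = sum(weights[m] for m in modality_probs)
    return {
        label: sum(weights[m] * probs[label] for m, probs in modality_probs.items()) / total
        for label in labels
    }


# Hypothetical classifier outputs for one utterance (values are invented).
probs = {
    "text":  {"pleasure": 0.7, "anger": 0.1, "neutral": 0.2},
    "audio": {"pleasure": 0.4, "anger": 0.2, "neutral": 0.4},
    "video": {"pleasure": 0.3, "anger": 0.3, "neutral": 0.4},
}
# Assumed weights: text is trusted more, reflecting its advantage for pleasure.
weights = {"text": 0.5, "audio": 0.25, "video": 0.25}

fused = late_fusion(probs, weights)
best = max(fused, key=fused.get)
print(best, fused)
```

In this toy case the video stream alone would favor "neutral", while the fused estimate favors "pleasure" because the text modality carries a stronger signal. In practice the weights would be learned or tuned on validation data rather than set by hand.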

2. Future Trends

Deepening Generative AI: Exploring fields such as emotional expression in digital humans and automatic generation of multimodal corpora.

Localization of Theoretical Construction: Reducing dependence on Western frameworks and developing modal coordination theories that adapt to the characteristics of the Chinese language.

Optimization in Low-Resource Scenarios: Developing lightweight models to reduce computational costs and enhance the robustness of multimodal integration.

4. Challenges and Suggestions

1. Existing Issues

Insufficient Theoretical Innovation: 70% of research still relies on the SFL framework, lacking localized innovation.

Weak Interdisciplinary Collaboration: Linguistics is not yet deeply integrated with computer science or cognitive science.

Limitations of Empirical Research: Corpus annotation is costly, and analysis tools for dynamic multimodal data (such as video) are still immature.

2. Development Suggestions

Build a Chinese multimodal corpus to promote the construction of open-source datasets;

Strengthen interdisciplinary team collaboration, such as joint efforts between linguists and AI engineers;

Explore the combination of multimodal metaphor and cognitive neuroscience to enhance theoretical explanatory power.

Conclusion

Multimodality research in Chinese linguistics has shifted from theoretical introduction to deeper application. Going forward, the field needs to use generative AI to break through technical bottlenecks while also pursuing localized theoretical innovation and interdisciplinary resource integration, so as to expand its practical value in areas such as digital humanities and smart education.
