February 07, 2023

Multimodal Chain-of-Thought Reasoning in Language Models

27 minutes

Large language models (LLMs) have shown impressive performance on complex reasoning by leveraging chain-of-thought (CoT) prompting, which generates intermediate reasoning chains as the rationale for inferring the answer. However, existing CoT studies are mostly confined to the language modality and rely on LLMs, which are hard to deploy. To elicit CoT reasoning in a multimodal setting, one possible solution is to fine-tune small language models that fuse vision and language features to perform CoT reasoning.

2023: Zhuosheng Zhang, Aston Zhang, Mu Li, Hai Zhao, G. Karypis, Alexander J. Smola

https://arxiv.org/pdf/2302.00923v1.pdf
