Recently, OpenAI launched ChatGPT to the public, and tech giants Google and Baidu quickly followed with their own AI-powered chatbots. Now, however, Amazon has raised the bar with its latest language models, which outperform GPT-3.5 by 16% on the ScienceQA benchmark.
The ScienceQA benchmark is a large set of science-based multiple-choice questions that tests a language model's ability to reason and make inferences. Amazon's models achieved this high accuracy through their Multimodal-CoT approach, which combines visual and language data to generate more effective rationales.
Multimodal-CoT works by breaking reasoning down into two stages: generating a rationale and inferring the answer. In the rationale generation stage, the model is fed both visual and language data; in the answer inference stage, the resulting rationale is appended to the language input. Grounding the final answer in a rationale informed by both modalities produces more accurate answers.
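The two-stage data flow described above can be sketched as follows. This is a minimal illustration, not Amazon's implementation: the "models" here are hypothetical stand-ins, and the function names and inputs are invented for clarity. What matters is the shape of the pipeline: stage one consumes both vision features and text to produce a rationale, and stage two conditions the answer on the text with that rationale appended.

```python
# Hypothetical sketch of the two-stage Multimodal-CoT pipeline.
# The model internals are stubbed out; only the data flow mirrors the
# rationale-generation -> answer-inference structure described in the article.

def generate_rationale(question: str, vision_features: list[float]) -> str:
    """Stage 1: produce a rationale from fused language + vision input (stub)."""
    # A real model would attend over vision_features; this stub only records
    # that both modalities were available when forming the rationale.
    return f"Rationale: based on the image ({len(vision_features)} features) and the question."

def infer_answer(question: str, rationale: str, choices: list[str]) -> str:
    """Stage 2: infer the answer, conditioned on the generated rationale (stub)."""
    # Key step: the rationale is appended to the language input before answering.
    augmented_input = question + " " + rationale
    # Toy placeholder decision; a real model would decode from augmented_input.
    return choices[0]

def multimodal_cot(question: str, vision_features: list[float], choices: list[str]):
    rationale = generate_rationale(question, vision_features)
    answer = infer_answer(question, rationale, choices)
    return answer, rationale

# Example usage with made-up inputs:
answer, rationale = multimodal_cot(
    "Which property matches this object?", [0.1, 0.4, 0.9], ["soft", "hard"]
)
```

The point of the separation is that the answer model never sees the raw question alone; it always reasons over the multimodal rationale produced in the first stage.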
The Amazon researchers concluded that their method, which outperforms GPT-3.5 on the ScienceQA benchmark, has room for further improvement in future work. They plan to leverage more effective visual features, inject commonsense knowledge, and apply filtering mechanisms to strengthen the CoT reasoning.