Adversarial Benchmarks for Commonsense Reasoning

Published on 25 Mar 2019, 19:46
Human intelligence involves comprehending new situations through a rich model of the world. Given a single image from a movie, or a paragraph from a novel, we can easily infer people’s intentions, mental states, and actions. However, enabling machines to perform this kind of commonsense reasoning remains elusive. Beyond the inherent difficulty of building models that reason, we lack robust benchmarks that evaluate AI reasoning ability.

In this talk, I will present two new large-scale benchmark datasets for commonsense reasoning, covering text (SWAG; rowanzellers.com/swag) and vision (VCR; visualcommonsense.com). These datasets pose new types of reasoning challenges: machines must abstract away from text and images, understand the entire situation, and then explain their predictions. Equally important is what these datasets don’t contain: they are adversarially constructed, using a suite of new techniques, to be resistant to biases. In addition, I will introduce models for these datasets and discuss where the field might go next on the way to human-level commonsense reasoning.

See more at microsoft.com/en-us/research/video/adver...