Abstract: Deep learning models have made great strides in recent years. Consequently, model calibration and its measurement have gained much attention, as the degree of calibration indicates a model's reliability. In this study, we examine the limitations of existing calibration metrics and propose a simple calibration metric tailored to natural language generation (NLG) tasks. Unlike existing calibration metrics, ours is not based solely on a single prediction; it considers the full distribution mapped by a model. The proposed metric thereby accounts for the intrinsic uncertainty of natural language when quantifying the degree of calibration. We evaluate the metric on machine translation datasets, a popular NLG task with intrinsic uncertainty. A thorough analysis shows that the proposed metric handles intrinsic uncertainty and is therefore a more suitable measure for NLG tasks.
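To make the critique concrete, the single-prediction metrics the abstract refers to are typified by Expected Calibration Error (ECE), which bins each example's top-prediction confidence and compares mean confidence to empirical accuracy per bin. The sketch below is a minimal, standard ECE implementation for illustration only; it is not the paper's proposed metric, and the binning scheme is the common equal-width choice rather than anything specified here.

```python
import numpy as np

def expected_calibration_error(confidences, correct, n_bins=10):
    """Standard ECE: bin examples by top-prediction confidence and
    accumulate the weighted gap between mean confidence and accuracy.

    Note how only a single prediction per example (its confidence and
    correctness) enters the computation; the rest of the model's output
    distribution is ignored, which is the limitation the abstract raises.
    """
    confidences = np.asarray(confidences, dtype=float)
    correct = np.asarray(correct, dtype=float)
    edges = np.linspace(0.0, 1.0, n_bins + 1)
    ece = 0.0
    for lo, hi in zip(edges[:-1], edges[1:]):
        # Half-open bins (lo, hi]; examples with confidence 0 are skipped.
        mask = (confidences > lo) & (confidences <= hi)
        if not mask.any():
            continue
        acc = correct[mask].mean()       # empirical accuracy in the bin
        conf = confidences[mask].mean()  # mean confidence in the bin
        ece += mask.mean() * abs(acc - conf)
    return ece
```

A perfectly calibrated model (e.g. 80% confidence, 80% accuracy) yields an ECE of 0, while a model that is confident but wrong accumulates a large gap.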
Paper Type: short