RELIC: Investigating Large Language Model Responses using Self-Consistency

Furui Cheng¹ Vilém Zouhar¹ Simran Arora² Mrinmaya Sachan¹ Hendrik Strobelt³ Mennatallah El-Assady¹
¹ETH Zurich ²Stanford University ³IBM Research

Large Language Models (LLMs) are notorious for blending fact with fiction and generating non-factual content, known as hallucinations. To tackle this challenge, we propose an interactive system that helps users gain insight into the reliability of generated text. Our approach builds on the idea that the self-consistency of multiple samples generated by the same LLM reflects its confidence in the individual claims in the generated text. Using this idea, we design RELIC, an interactive system that enables users to investigate and verify semantic-level variations across multiple long-form responses. This allows users to recognize potentially inaccurate information in the generated text and make the necessary corrections. In a user study with ten participants, we demonstrate that our approach helps users better verify the reliability of the generated text. We further summarize the design implications and lessons learned from this research to inspire future studies on reliable human-LLM interaction.
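
As a rough illustration of this idea, the sketch below estimates claim-level confidence as the fraction of independently sampled responses that support a given claim. Here sample_fn and supports are hypothetical placeholders, standing in for an LLM sampling API and an entailment (NLI) check respectively; this is a minimal sketch of the self-consistency principle, not RELIC's actual implementation.

    from typing import Callable, List

    def self_consistency(
        prompt: str,
        claim: str,
        sample_fn: Callable[[str, int], List[str]],  # hypothetical: returns n LLM samples for a prompt
        supports: Callable[[str, str], bool],        # hypothetical: does a sample entail the claim?
        n_samples: int = 10,
    ) -> float:
        """Estimate the model's confidence in `claim` as the fraction of
        independently sampled responses that support it."""
        samples = sample_fn(prompt, n_samples)
        agreeing = sum(supports(s, claim) for s in samples)
        return agreeing / len(samples)

A claim supported by most samples is likely one the model is confident about; a claim that appears in only a few samples is a candidate for user verification.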

Preprint · Presentation · Code (WIP) · Demo (WIP)

@inproceedings{cheng2024relic,
   title={{RELIC}: Investigating Large Language Model Responses using Self-Consistency},
   author={Cheng, Furui and Zouhar, Vilém and Arora, Simran and Sachan, Mrinmaya and Strobelt, Hendrik and El-Assady, Mennatallah},
   booktitle={Proceedings of the 2024 CHI Conference on Human Factors in Computing Systems},
   year={2024},
   doi={10.1145/3613904.3641904}
}