Best Blackhat Forum

Large language models can do limited counterfactual reasoning (31 minute read)

There has been a lot of recent discussion about how to evaluate language model performance. Benchmarks are useful and give a directional measure of improvement, but they are well known to be incomplete. This paper explores counterfactual reasoning and suggests that models are less capable than commonly believed, though still more capable than recent baseline models.