Why single benchmark scores mislead: interpreting a low Vectara score with high AA-Omniscience
3 key factors when evaluating LLMs beyond a single leaderboard number
Many teams pick a model because it tops a single benchmark