[{"data":1,"prerenderedAt":-1},["ShallowReactive",2],{"blog":3},{"title":4,"desc":5,"bannerImg":6,"date":7,"links":8,"description":5,"content":9,"tag1":409,"tag2":410,"logosByUrl":413,"resLinks":415},"Meet VideoScore2: The AI Film Critic That Thinks Before It Scores","As AI-generated video explodes, how do we judge it? Discover VideoScore2, a new framework that acts like an expert film critic, providing detailed reasoning before its final verdict, and setting a new standard for AI evaluation.","https:\u002F\u002Fdoxhub.s3.us-east-1.amazonaws.com\u002F2077ai\u002FBanner_blog\u002Fbanner_videoscore2.png","2025-11-11","{\"github\":\"https:\u002F\u002Fgithub.com\u002FTIGER-AI-Lab\u002FVideoScore2\u002F\",\"huggingface\":\"https:\u002F\u002Fhuggingface.co\u002Fdatasets\u002FTIGER-Lab\u002FVideoFeedback2\", \"arxiv\":\"https:\u002F\u002Fwww.arxiv.org\u002Fabs\u002F2509.22799\",\"homepage\":\"https:\u002F\u002Ftiger-ai-lab.github.io\u002FVideoScore2\u002F\"}",{"data":10,"body":13,"toc":397},{"title":11,"description":12},"Meet VideoScore2: AI Film Critic That Thinks Before It Scores","The age of AI cinema is here. Text-to-video models generate everything from breathtaking landscapes to quirky short films, pushing the boundaries of digital creativity. But this creative explosion has exposed a fundamental challenge: with thousands of AI-generated videos, how do we separate the masterpieces from digital noise?",{"type":14,"children":15},"root",[16,24,29,42,67,94,99,116,121,128,133,168,180,186,191,198,203,208,225,230,247,252,264,281,286,291,308,313,319,324,344,350,369,375,380,392],{"type":17,"tag":18,"props":19,"children":21},"element","h1",{"id":20},"meet-videoscore2-ai-film-critic-that-thinks-before-it-scores",[22],{"type":23,"value":11},"text",{"type":17,"tag":25,"props":26,"children":27},"p",{},[28],{"type":23,"value":12},{"type":17,"tag":25,"props":30,"children":31},{},[32,34,40],{"type":23,"value":33},"Until now, AI evaluators have acted like a simplistic ratings aggregator, giving a single, opaque score without any explanation. 
It’s like getting a thumbs-up or thumbs-down with no review. This isn’t helpful for developers who need to understand ",{"type":17,"tag":35,"props":36,"children":37},"em",{},[38],{"type":23,"value":39},"why",{"type":23,"value":41}," a video failed or succeeded. What the field really needs is an expert critic: an AI that doesn't just give a rating but shows its work.",{"type":17,"tag":25,"props":43,"children":44},{},[45,47,53,55,60,62],{"type":23,"value":46},"That’s why a global research team from top universities and AI organizations, including ",{"type":17,"tag":48,"props":49,"children":50},{},[51],{"type":23,"value":52},"2077AI",{"type":23,"value":54},", M-A-P, UIUC, and Abaka AI, developed ",{"type":17,"tag":48,"props":56,"children":57},{},[58],{"type":23,"value":59},"VideoScore2",{"type":23,"value":61},", an evaluation framework built on a simple, powerful principle: ",{"type":17,"tag":48,"props":63,"children":64},{},[65],{"type":23,"value":66},"think before you score.",{"type":17,"tag":68,"props":69,"children":75},"div",{"className":70,"style":74},[71,72,73],"img-wrap","has-caption","center","width: 100%; position: relative; margin-bottom: 62px",[76,78,85,86],{"type":23,"value":77},"\n  ",{"type":17,"tag":79,"props":80,"children":84},"img",{"src":81,"alt":82,"style":83},"https:\u002F\u002Fdoxhub.s3.us-east-1.amazonaws.com\u002F2077ai\u002F20251111\u002FVS2-01.webp","An overview of the VideoScore2 pipeline, from diverse data curation to a two-stage training process that enables the model to 'think before it scores'.","width: 100%; max-height: 60vh; object-fit: contain; background: #141414; border-radius: 8px",[],{"type":23,"value":77},{"type":17,"tag":25,"props":87,"children":91},{"className":88,"style":90},[89],"img-text","position: absolute; top: calc(100% + 16px); left: 0; right: 0;text-align: center; overflow: hidden; white-space: nowrap; text-overflow: ellipsis; line-height: 22px; color: #A1A1A1; font-size: 
14px",[92],{"type":23,"value":93},"\n    An overview of the VideoScore2 pipeline, from diverse data curation to a two-stage training process that enables the model to \"think before it scores\".\n  ",{"type":17,"tag":25,"props":95,"children":96},{},[97],{"type":23,"value":98},"How does VideoScore2 differ from prior work? The comparison below sets it against existing video reward models.",{"type":17,"tag":68,"props":100,"children":102},{"className":101,"style":74},[71,72,73],[103,104,109,110],{"type":23,"value":77},{"type":17,"tag":79,"props":105,"children":108},{"src":106,"alt":107,"style":83},"https:\u002F\u002Fdoxhub.s3.us-east-1.amazonaws.com\u002F2077ai\u002F20251111\u002FVS2-02.webp","Comparison of VideoScore2 and existing video reward models",[],{"type":23,"value":77},{"type":17,"tag":25,"props":111,"children":113},{"className":112,"style":90},[89],[114],{"type":23,"value":115},"\n    Comparison of VideoScore2 and existing video reward models\n  ",{"type":17,"tag":25,"props":117,"children":118},{},[119],{"type":23,"value":120},"VideoScore2 is the first framework to combine multi-dimensional scoring with detailed rationales, setting it apart from existing video reward models.",{"type":17,"tag":122,"props":123,"children":125},"h2",{"id":124},"inside-the-critics-mind-3-pillars-of-a-great-ai-video",[126],{"type":23,"value":127},"🎬 Inside the Critic's Mind: 3 Pillars of a Great AI Video",{"type":17,"tag":25,"props":129,"children":130},{},[131],{"type":23,"value":132},"Rather than collapsing quality into a single number, VideoScore2 assesses each video along three distinct dimensions, much as a human critic would.",{"type":17,"tag":134,"props":135,"children":136},"ul",{},[137,148,158],{"type":17,"tag":138,"props":139,"children":140},"li",{},[141,146],{"type":17,"tag":48,"props":142,"children":143},{},[144],{"type":23,"value":145},"🎨 Visual Quality:",{"type":23,"value":147}," Is the film technically proficient? 
This covers everything from resolution and clarity to the smoothness of motion and the absence of distracting visual artifacts.",{"type":17,"tag":138,"props":149,"children":150},{},[151,156],{"type":17,"tag":48,"props":152,"children":153},{},[154],{"type":23,"value":155},"✍️ Text Alignment:",{"type":23,"value":157}," Does it stick to the script? The AI must faithfully render all the subjects, actions, and styles requested in the text prompt.",{"type":17,"tag":138,"props":159,"children":160},{},[161,166],{"type":17,"tag":48,"props":162,"children":163},{},[164],{"type":23,"value":165},"🧠 Physical Consistency:",{"type":23,"value":167}," Does the world make sense? The video must adhere to common sense and the basic laws of physics. Gravity should work, and objects shouldn't magically morph or disappear without reason.",{"type":17,"tag":25,"props":169,"children":170},{},[171,173,178],{"type":23,"value":172},"Crucially, VideoScore2 doesn't just output three numbers. It first generates a ",{"type":17,"tag":48,"props":174,"children":175},{},[176],{"type":23,"value":177},"detailed, chain-of-thought rationale",{"type":23,"value":179}," that explains its assessment of each dimension, and only then assigns the scores. Because the reasoning is visible, developers can see which aspect of a video was penalized and why, not just that it scored low.",{"type":17,"tag":122,"props":181,"children":183},{"id":182},"education-of-a-critic-how-we-trained-videoscore2",[184],{"type":23,"value":185},"Education of a Critic: How We Trained VideoScore2",{"type":17,"tag":25,"props":187,"children":188},{},[189],{"type":23,"value":190},"Creating such a sophisticated AI critic required an equally sophisticated education. 
We developed a two-stage training process, akin to sending our model to an elite film school with a world-class curriculum.",{"type":17,"tag":192,"props":193,"children":195},"h3",{"id":194},"part-1-the-dataset",[196],{"type":23,"value":197},"Part 1: The Dataset",{"type":17,"tag":25,"props":199,"children":200},{},[201],{"type":23,"value":202},"The foundation of this education is VideoFeedback2, a massive dataset of over 27,000 AI-generated videos. This \"film library\" was built with diversity at its core. Prompts were sourced from a wide range of real-world and manually designed scenarios to test everything from basic actions to complex camera movements.",{"type":17,"tag":25,"props":204,"children":205},{},[206],{"type":23,"value":207},"The figure below shows what proportion of prompts came from each source.",{"type":17,"tag":68,"props":209,"children":211},{"className":210,"style":74},[71,72,73],[212,213,218,219],{"type":23,"value":77},{"type":17,"tag":79,"props":214,"children":217},{"src":215,"alt":216,"style":83},"https:\u002F\u002Fdoxhub.s3.us-east-1.amazonaws.com\u002F2077ai\u002F20251111\u002FVS2-03.webp","Proportion of prompts from each source",[],{"type":23,"value":77},{"type":17,"tag":25,"props":220,"children":222},{"className":221,"style":90},[89],[223],{"type":23,"value":224},"\n    Proportion of prompts from each source\n  ",{"type":17,"tag":25,"props":226,"children":227},{},[228],{"type":23,"value":229},"To create a realistic gradient of quality, videos were generated by over 20 different text-to-video models, from early open-source projects to state-of-the-art systems like Sora and Kling.",{"type":17,"tag":68,"props":231,"children":233},{"className":232,"style":74},[71,72,73],[234,235,240,241],{"type":23,"value":77},{"type":17,"tag":79,"props":236,"children":239},{"src":237,"alt":238,"style":83},"https:\u002F\u002Fdoxhub.s3.us-east-1.amazonaws.com\u002F2077ai\u002F20251111\u002FVS2-04.webp","Detailed information of videos in our 
dataset",[],{"type":23,"value":77},{"type":17,"tag":25,"props":242,"children":244},{"className":243,"style":90},[89],[245],{"type":23,"value":246},"\n    Details of the videos in our dataset\n  ",{"type":17,"tag":25,"props":248,"children":249},{},[250],{"type":23,"value":251},"Because these models range widely in capability, the videos span a fine-grained quality spectrum from \"Poor\" to \"Perfect\".",{"type":17,"tag":25,"props":253,"children":254},{},[255,257,262],{"type":23,"value":256},"Each video in this library was then watched and annotated by human experts, who provided not just scores but detailed ",{"type":17,"tag":48,"props":258,"children":259},{},[260],{"type":23,"value":261},"reasoning traces",{"type":23,"value":263},": the exact \"why\" behind their ratings.",{"type":17,"tag":68,"props":265,"children":267},{"className":266,"style":74},[71,72,73],[268,269,274,275],{"type":23,"value":77},{"type":17,"tag":79,"props":270,"children":273},{"src":271,"alt":272,"style":83},"https:\u002F\u002Fdoxhub.s3.us-east-1.amazonaws.com\u002F2077ai\u002F20251111\u002FVS2-05.webp","Video examples from different quality tiers",[],{"type":23,"value":77},{"type":17,"tag":25,"props":276,"children":278},{"className":277,"style":90},[89],[279],{"type":23,"value":280},"\n    Video examples from different quality tiers\n  ",{"type":17,"tag":25,"props":282,"children":283},{},[284],{"type":23,"value":285},"Examples from the VideoFeedback2 dataset, showcasing the range of quality from \"Good\" to \"Perfect\u002FModern\" tiers used to train VideoScore2.",{"type":17,"tag":25,"props":287,"children":288},{},[289],{"type":23,"value":290},"The final dataset contains a natural distribution of scores, with most videos falling into the nuanced mid-range, providing rich material for training a discerning 
critic.",{"type":17,"tag":68,"props":292,"children":294},{"className":293,"style":74},[71,72,73],[295,296,301,302],{"type":23,"value":77},{"type":17,"tag":79,"props":297,"children":300},{"src":298,"alt":299,"style":83},"https:\u002F\u002Fdoxhub.s3.us-east-1.amazonaws.com\u002F2077ai\u002F20251111\u002FVS2-06.webp","Human-annotated score distribution",[],{"type":23,"value":77},{"type":17,"tag":25,"props":303,"children":305},{"className":304,"style":90},[89],[306],{"type":23,"value":307},"\n    Human-annotated score distribution\n  ",{"type":17,"tag":25,"props":309,"children":310},{},[311],{"type":23,"value":312},"The distribution of human scores in our dataset captures the full spectrum of quality, providing a realistic and challenging curriculum for the model.",{"type":17,"tag":192,"props":314,"children":316},{"id":315},"part-2-training-method",[317],{"type":23,"value":318},"Part 2: Training Method",{"type":17,"tag":25,"props":320,"children":321},{},[322],{"type":23,"value":323},"With the curriculum in place, the training began.",{"type":17,"tag":134,"props":325,"children":326},{},[327,339],{"type":17,"tag":138,"props":328,"children":329},{},[330,332,337],{"type":23,"value":331},"First, through ",{"type":17,"tag":48,"props":333,"children":334},{},[335],{"type":23,"value":336},"Supervised Fine-Tuning (SFT)",{"type":23,"value":338},", the model studied the entire VideoFeedback2 library, learning to imitate the patterns of human experts' scores and reasoning.",{"type":17,"tag":138,"props":340,"children":341},{},[342],{"type":23,"value":343},"Then, it entered a \"masterclass\" to sharpen its analytical robustness. 
For this stage, we applied Group Relative Policy Optimization (GRPO), a reinforcement learning (RL) algorithm, to hone its judgment, helping the model generalize beyond its training data and align more closely with nuanced human preferences.",{"type":17,"tag":122,"props":345,"children":347},{"id":346},"results-putting-videoscore2-to-the-test",[348],{"type":23,"value":349},"🌟 Results: Putting VideoScore2 to the Test",{"type":17,"tag":25,"props":351,"children":352},{},[353,355,360,362,367],{"type":23,"value":354},"After training, it was time for the final exam. We evaluated VideoScore2 against existing evaluators on multiple benchmarks. It not only achieved ",{"type":17,"tag":48,"props":356,"children":357},{},[358],{"type":23,"value":359},"state-of-the-art accuracy",{"type":23,"value":361}," on our in-domain benchmark but, more importantly, demonstrated ",{"type":17,"tag":48,"props":363,"children":364},{},[365],{"type":23,"value":366},"superior generalization",{"type":23,"value":368}," across four out-of-domain benchmarks. This suggests that VideoScore2 didn't just cram for its own exam; it learned general principles of what makes a good video.",{"type":17,"tag":122,"props":370,"children":372},{"id":371},"from-critic-to-director-whats-next",[373],{"type":23,"value":374},"🎥 From Critic to Director: What's Next?",{"type":17,"tag":25,"props":376,"children":377},{},[378],{"type":23,"value":379},"Perhaps the most powerful aspect of VideoScore2 is that its role doesn't end with criticism. It can also act as a director.",{"type":17,"tag":25,"props":381,"children":382},{},[383,385,390],{"type":23,"value":384},"By serving as a reliable ",{"type":17,"tag":48,"props":386,"children":387},{},[388],{"type":23,"value":389},"reward model",{"type":23,"value":391},", VideoScore2 can provide the nuanced feedback needed to train the next generation of text-to-video models. 
We've already shown that using it to pick the best of N candidate videos generated for the same prompt (\"best-of-N\" selection) significantly improves output quality. This creates a feedback loop in which better evaluation leads to better generation.",{"type":17,"tag":25,"props":393,"children":394},{},[395],{"type":23,"value":396},"This entire project is a testament to the power of open, collaborative research. By releasing VideoScore2 and its dataset, we are providing the entire AI community with a transparent, powerful tool to accelerate progress and build more aligned, creative, and controllable generative models.",{"title":398,"searchDepth":399,"depth":399,"links":400},"",2,[401,402,407,408],{"id":124,"depth":399,"text":127},{"id":182,"depth":399,"text":185,"children":403},[404,406],{"id":194,"depth":405,"text":197},3,{"id":315,"depth":405,"text":318},{"id":346,"depth":399,"text":349},{"id":371,"depth":399,"text":374},"model",[411,412],"video","multimodal",[414],"https:\u002F\u002Fdoxhub.s3.us-east-1.amazonaws.com\u002Fdocs-hub\u002F2077ai\u002Forg-logo\u002FUniversity_of_Illinois_at_Urbana-Champaign_Wordmark.png",{"homepage":416,"arxiv":417,"github":418,"huggingface":419},"https:\u002F\u002Ftiger-ai-lab.github.io\u002FVideoScore2\u002F","https:\u002F\u002Farxiv.org\u002Fabs\u002F2509.22799","https:\u002F\u002Fgithub.com\u002FTIGER-AI-Lab\u002FVideoScore2\u002F","https:\u002F\u002Fhuggingface.co\u002Fdatasets\u002FTIGER-Lab\u002FVideoFeedback2"]