[{"data":1,"prerenderedAt":-1},["ShallowReactive",2],{"blog":3},{"title":4,"desc":5,"bannerImg":6,"date":7,"links":8,"description":5,"content":9,"tag1":396,"tag2":397,"logosByUrl":400,"highlighted":389,"resLinks":402},"Introducing EDITREWARD: The AI Judge That’s Closing the Gap in Open-Source Image Editing","Discover how EditReward, a new human-aligned reward model, is solving the biggest bottleneck in AI image editing. Learn how 2077AI’s high-fidelity data is empowering open-source models to compete with giants like GPT-5.","https:\u002F\u002Fdoxhub.s3.us-east-1.amazonaws.com\u002F2077ai\u002FBanner_blog\u002Fblog_editreward.png","2025-10-31","{\"github\":\"https:\u002F\u002Fgithub.com\u002FTIGER-AI-Lab\u002FEditReward\",\"huggingface\":\"https:\u002F\u002Fhuggingface.co\u002Fcollections\u002FTIGER-Lab\u002Feditreward-68ddf026ef9eb1510458abc6\", \"arxiv\":\"https:\u002F\u002Farxiv.org\u002Fabs\u002F2509.26346\"}",{"data":10,"body":12,"toc":387},{"title":4,"description":11},"Have you ever wondered why open-source AI image editors, while impressive, often struggle to match the flawless performance of closed-source giants like Google's Nano Banana or OpenAI's GPT-Image? The secret isn't just about model architecture; it's about the quality of the data they're trained on.",{"type":13,"children":14},"root",[15,23,28,33,38,71,78,83,111,116,122,135,147,169,174,221,226,243,249,254,259,270,275,281,286,291,327,344,356,366,372,377,382],{"type":16,"tag":17,"props":18,"children":20},"element","h1",{"id":19},"introducing-editreward-the-ai-judge-thats-closing-the-gap-in-open-source-image-editing",[21],{"type":22,"value":4},"text",{"type":16,"tag":24,"props":25,"children":26},"p",{},[27],{"type":22,"value":11},{"type":16,"tag":24,"props":29,"children":30},{},[31],{"type":22,"value":32},"The biggest bottleneck for open-source AI has been the lack of a reliable \"judge\": an AI that can accurately tell a good image edit from a bad one. 
Without a good judge, you can't create the high-quality training data needed to build a great editor.",{"type":16,"tag":24,"props":34,"children":35},{},[36],{"type":22,"value":37},"Today, we're pulling back the curtain on a project that tackles this problem head-on. Introducing EditReward, a human-aligned reward model designed to serve as a fair, consistent, and incredibly accurate critic for instruction-guided image editing.",{"type":16,"tag":39,"props":40,"children":46},"div",{"className":41,"style":45},[42,43,44],"img-wrap","has-caption","center","width: 100%; position: relative; margin-bottom: 62px",[47,49,56,57],{"type":22,"value":48},"\n  ",{"type":16,"tag":50,"props":51,"children":55},"img",{"src":52,"alt":53,"style":54},"https:\u002F\u002Fdoxhub.s3.us-east-1.amazonaws.com\u002F2077ai\u002F20251031\u002FER01.webp","An overview of the EditReward framework","width: 100%; max-height: 60vh; object-fit: contain; background: #141414; border-radius: 8px",[],{"type":22,"value":48},{"type":16,"tag":24,"props":58,"children":62},{"className":59,"style":61},[60],"img-text","position: absolute; top: calc(100% + 16px); left: 0; right: 0;text-align: center; overflow: hidden; white-space: nowrap; text-overflow: ellipsis; line-height: 22px; color: #A1A1A1; font-size: 14px",[63,65,69],{"type":22,"value":64},"\n    An overview of the EditReward framework, ",{"type":16,"tag":66,"props":67,"children":68},"br",{},[],{"type":22,"value":70},"from the multi-model data generation pipeline to the multi-dimensional reward model training.\n  ",{"type":16,"tag":72,"props":73,"children":75},"h2",{"id":74},"the-problem-why-is-judging-image-so-hard",[76],{"type":22,"value":77},"The Problem: Why Is Judging Image Edits So Hard?",{"type":16,"tag":24,"props":79,"children":80},{},[81],{"type":22,"value":82},"Imagine trying to train a chef using a food critic who only says \"I like it\" or \"I don't.\" It's not very helpful, right? This is the problem facing AI image editing. 
Current \"reward models\" (the AI critics) are often unreliable:",{"type":16,"tag":84,"props":85,"children":86},"ul",{},[87,93,98],{"type":16,"tag":88,"props":89,"children":90},"li",{},[91],{"type":22,"value":92},"Some are trained on noisy, inconsistent data from crowd-sourced platforms.",{"type":16,"tag":88,"props":94,"children":95},{},[96],{"type":22,"value":97},"Others use labels generated by proprietary models, which can be biased and inaccurate.",{"type":16,"tag":88,"props":99,"children":100},{},[101,103,109],{"type":22,"value":102},"Most provide a single, vague score, failing to capture ",{"type":16,"tag":104,"props":105,"children":106},"em",{},[107],{"type":22,"value":108},"why",{"type":22,"value":110}," an edit is good or bad. Did it look stunning but ignore the user's instructions? Or did it follow the prompt perfectly but create a distorted, unrealistic image?",{"type":16,"tag":24,"props":112,"children":113},{},[114],{"type":22,"value":115},"This lack of a reliable critic has held the open-source community back. To build a truly state-of-the-art model, you first need a state-of-the-art judge.",{"type":16,"tag":72,"props":117,"children":119},{"id":118},"editreward-data-the-foundation-of-excellence",[120],{"type":22,"value":121},"EDITREWARD-DATA: The Foundation of Excellence",{"type":16,"tag":24,"props":123,"children":124},{},[125,127,133],{"type":22,"value":126},"This is where the journey of EditReward begins, with a foundational contribution from data experts at ",{"type":16,"tag":128,"props":129,"children":130},"strong",{},[131],{"type":22,"value":132},"2077AI",{"type":22,"value":134},". We knew that, to train a world-class AI judge, we first needed to build the world's best \"rulebook\": a dataset that embodies what humans truly consider a high-quality edit.",{"type":16,"tag":24,"props":136,"children":137},{},[138,140,145],{"type":22,"value":139},"This led to the creation of EditReward-data. 
This isn't just another dataset; it's a meticulously curated collection of over ",{"type":16,"tag":128,"props":141,"children":142},{},[143],{"type":22,"value":144},"200,000 human preference pairs",{"type":22,"value":146},".",{"type":16,"tag":39,"props":148,"children":150},{"className":149,"style":45},[42,43,44],[151,152,157,158],{"type":22,"value":48},{"type":16,"tag":50,"props":153,"children":156},{"src":154,"alt":155,"style":54},"https:\u002F\u002Fdoxhub.s3.us-east-1.amazonaws.com\u002F2077ai\u002F20251031\u002FER02.webp","Visualizing the diversity of EditReward-data",[],{"type":22,"value":48},{"type":16,"tag":24,"props":159,"children":161},{"className":160,"style":61},[60],[162,164,167],{"type":22,"value":163},"\n    Visualizing the diversity of EditReward-data, ",{"type":16,"tag":66,"props":165,"children":166},{},[],{"type":22,"value":168},"which covers a wide range of editing categories, data sources, and models.\n  ",{"type":16,"tag":24,"props":170,"children":171},{},[172],{"type":22,"value":173},"Here’s what makes it different, and where the expertise of the 2077AI team shines:",{"type":16,"tag":84,"props":175,"children":176},{},[177,187,211],{"type":16,"tag":88,"props":178,"children":179},{},[180,185],{"type":16,"tag":128,"props":181,"children":182},{},[183],{"type":22,"value":184},"Expert Annotation:",{"type":22,"value":186}," Instead of noisy crowd-sourcing, every single data point was annotated by trained experts following a rigorous, standardized protocol.",{"type":16,"tag":88,"props":188,"children":189},{},[190,195,197,202,204,209],{"type":16,"tag":128,"props":191,"children":192},{},[193],{"type":22,"value":194},"Multi-Dimensional Scoring:",{"type":22,"value":196}," We moved beyond a single score. Our experts rated each edit on two distinct axes: ",{"type":16,"tag":128,"props":198,"children":199},{},[200],{"type":22,"value":201},"Instruction Following",{"type":22,"value":203}," (did it do what you asked?) 
and ",{"type":16,"tag":128,"props":205,"children":206},{},[207],{"type":22,"value":208},"Visual Quality",{"type":22,"value":210}," (does it look good and realistic?).",{"type":16,"tag":88,"props":212,"children":213},{},[214,219],{"type":16,"tag":128,"props":215,"children":216},{},[217],{"type":22,"value":218},"Diversity:",{"type":22,"value":220}," The data covers a massive range of edits from seven state-of-the-art models, ensuring our final judge is fair and unbiased.",{"type":16,"tag":24,"props":222,"children":223},{},[224],{"type":22,"value":225},"Building this dataset was a monumental task, reflecting 2077AI's commitment to pioneering the high-fidelity data infrastructure that empowers the entire open-source ecosystem.",{"type":16,"tag":39,"props":227,"children":229},{"className":228,"style":45},[42,43,44],[230,231,236,237],{"type":22,"value":48},{"type":16,"tag":50,"props":232,"children":235},{"src":233,"alt":234,"style":54},"https:\u002F\u002Fdoxhub.s3.us-east-1.amazonaws.com\u002F2077ai\u002F20251031\u002FER03.webp","A look into the meticulous annotation process on the platform provided by Abaka AI, a key contributor from 2077AI.",[],{"type":22,"value":48},{"type":16,"tag":24,"props":238,"children":240},{"className":239,"style":61},[60],[241],{"type":22,"value":242},"\n    A look into the meticulous annotation process on the platform provided by Abaka AI, a key contributor from 2077AI.\n  ",{"type":16,"tag":72,"props":244,"children":246},{"id":245},"training-the-ultimate-image-critic",[247],{"type":22,"value":248},"Training the Ultimate Image Critic",{"type":16,"tag":24,"props":250,"children":251},{},[252],{"type":22,"value":253},"With this gold-standard dataset in hand, we trained EditReward. 
We used a powerful Vision-Language Model (VLM) as its backbone and taught it to think like our human experts.",{"type":16,"tag":24,"props":255,"children":256},{},[257],{"type":22,"value":258},"We employed a sophisticated training strategy called Multi-Dimensional Uncertainty-Aware Ranking. It’s a mouthful, but the concept is intuitive: we taught the model to understand that a great edit is a balance of different factors and to weigh them accordingly. It learns not just to pick a winner between two images, but to understand the nuanced trade-offs between following instructions perfectly and achieving visual perfection.",{"type":16,"tag":39,"props":260,"children":263},{"className":261,"style":262},[42,44],"width: 100%; position: relative",[264,265],{"type":22,"value":48},{"type":16,"tag":50,"props":266,"children":269},{"src":267,"alt":268,"style":54},"https:\u002F\u002Fdoxhub.s3.us-east-1.amazonaws.com\u002F2077ai\u002F20251031\u002FER04.webp"," ",[],{"type":16,"tag":24,"props":271,"children":272},{},[273],{"type":22,"value":274},"EditReward in action, correctly assigning high scores to successful edits and low scores to failed ones, demonstrating strong alignment with human judgment.",{"type":16,"tag":72,"props":276,"children":278},{"id":277},"the-results",[279],{"type":22,"value":280},"🚀 The Results",{"type":16,"tag":24,"props":282,"children":283},{},[284],{"type":22,"value":285},"So, does it work? 
The results speak for themselves.",{"type":16,"tag":24,"props":287,"children":288},{},[289],{"type":22,"value":290},"EditReward doesn't just perform well; it sets a new standard for AI evaluation.",{"type":16,"tag":84,"props":292,"children":293},{},[294,317],{"type":16,"tag":88,"props":295,"children":296},{},[297,302,304,309,311,316],{"type":16,"tag":128,"props":298,"children":299},{},[300],{"type":22,"value":301},"Outperforming Giants:",{"type":22,"value":303}," On established benchmarks like GenAI-Bench and AURORA-Bench, EditReward achieves a higher correlation with human judgment than powerful proprietary models like ",{"type":16,"tag":128,"props":305,"children":306},{},[307],{"type":22,"value":308},"GPT-5",{"type":22,"value":310}," and ",{"type":16,"tag":128,"props":312,"children":313},{},[314],{"type":22,"value":315},"GPT-4o",{"type":22,"value":146},{"type":16,"tag":88,"props":318,"children":319},{},[320,325],{"type":16,"tag":128,"props":321,"children":322},{},[323],{"type":22,"value":324},"Massive Uplift:",{"type":22,"value":326}," When we applied our training framework to a standard open-source VLM, its performance as a judge skyrocketed, improving by over 23 points on GenAI-Bench. 
This demonstrates the value of our high-quality data and training methodology.",{"type":16,"tag":39,"props":328,"children":330},{"className":329,"style":45},[42,43,44],[331,332,337,338],{"type":22,"value":48},{"type":16,"tag":50,"props":333,"children":336},{"src":334,"alt":335,"style":54},"https:\u002F\u002Fdoxhub.s3.us-east-1.amazonaws.com\u002F2077ai\u002F20251031\u002FER05.webp","EditReward sets a new state-of-the-art",[],{"type":22,"value":48},{"type":16,"tag":24,"props":339,"children":341},{"className":340,"style":61},[60],[342],{"type":22,"value":343},"\n    EditReward sets a new state-of-the-art, outperforming strong proprietary models like GPT-5 on public benchmarks for human preference alignment.\n  ",{"type":16,"tag":24,"props":345,"children":346},{},[347,349,354],{"type":22,"value":348},"But the most exciting result is its practical application. We used EditReward to filter a large, noisy dataset of 46,000 images down to a high-quality subset of 20,000. When we trained a leading open-source model, Step1X-Edit, on this smaller, curated dataset, its performance ",{"type":16,"tag":128,"props":350,"children":351},{},[352],{"type":22,"value":353},"significantly surpassed",{"type":22,"value":355}," the result of training on the full, noisy dataset.",{"type":16,"tag":24,"props":357,"children":358},{},[359,364],{"type":16,"tag":128,"props":360,"children":361},{},[362],{"type":22,"value":363},"This is the key takeaway: quality over quantity.",{"type":22,"value":365}," A powerful AI judge like EditReward is the key to unlocking the next generation of generative models.",{"type":16,"tag":72,"props":367,"children":369},{"id":368},"why-this-matters-for-the-future-of-ai",[370],{"type":22,"value":371},"✨ Why This Matters for the Future of AI",{"type":16,"tag":24,"props":373,"children":374},{},[375],{"type":22,"value":376},"The development of EditReward represents more than a technical milestone: it’s a foundational step toward a fairer and more open AI ecosystem. 
By offering the community a reliable, human-aligned “critic”, we’re empowering researchers and developers everywhere to train models that better reflect human intent and aesthetic judgment.",{"type":16,"tag":24,"props":378,"children":379},{},[380],{"type":22,"value":381},"Built upon the data excellence and open collaboration ethos of 2077AI, EditReward embodies the spirit of collective innovation. We’re not just building another model — we’re building the infrastructure for alignment, a framework that makes open-source AI as capable, safe, and scalable as its proprietary counterparts.",{"type":16,"tag":24,"props":383,"children":384},{},[385],{"type":22,"value":386},"We’re thrilled to release EditReward, along with EditReward-Data and EditReward-Bench, to accelerate the next wave of progress in human-aligned generative AI.",{"title":388,"searchDepth":389,"depth":389,"links":390},"",2,[391,392,393,394,395],{"id":74,"depth":389,"text":77},{"id":118,"depth":389,"text":121},{"id":245,"depth":389,"text":248},{"id":277,"depth":389,"text":280},{"id":368,"depth":389,"text":371},"model",[398,399],"image","multimodal",[401],"https:\u002F\u002Fdoxhub.s3.us-east-1.amazonaws.com\u002Fdocs-hub\u002F2077ai\u002Forg-logo\u002Fwaterloo.png",{"homepage":403,"arxiv":404,"github":405,"huggingface":406},"https:\u002F\u002Ftiger-ai-lab.github.io\u002FEditReward\u002F","https:\u002F\u002Farxiv.org\u002Fabs\u002F2509.26346","https:\u002F\u002Fgithub.com\u002FTIGER-AI-Lab\u002FEditReward","https:\u002F\u002Fhuggingface.co\u002Fcollections\u002FTIGER-Lab\u002Feditreward"]