[{"data":1,"prerenderedAt":-1},["ShallowReactive",2],{"blog":3},{"title":4,"desc":5,"bannerImg":6,"date":7,"orgImgLinks":8,"bannerLinks":9,"blogCategory":10,"category":11,"weight":12,"externalUrl":11,"links":13,"description":5,"content":14,"tag1":481,"tag2":482,"resLinks":485},"Human-Aligned Reward Modeling for AI: EditReward's 200K-Pair Dataset","EditReward advances instruction-guided image editing (IGIE) with a generative reward model. See how it outperforms GPT-4o as a judge, improves dataset quality, and sets a new SOTA for human-AI alignment in editing tasks.","https:\u002F\u002Fdoxhub.s3.us-east-1.amazonaws.com\u002F2077ai\u002FBanner_blog\u002Fbanner_1.png","2026-1-16","[]","{}","Research","",0,"{\"homepage\":\"\",\"github\":\"\",\"huggingface\":\"\",\"x\":\"\",\"discord\":\"\",\"arxiv\":\"\"}",{"data":15,"body":18,"toc":473},{"title":16,"description":17},"Why Your AI Still Doesn't \"Get\" Your Edits: The Power of Human-Aligned Reward Modeling","If you’ve ever used an AI image editor and asked it to \"make the dog look like it’s wearing a tiny tuxedo\" only to have the dog turn into a blurry blob or, worse, vanish entirely and be replaced by a human in a suit, you’ve experienced the Instruction-Guided Image Editing (IGIE) bottleneck.",{"type":19,"children":20},"root",[21,36,89,98,107,119,128,174,183,193,202,211,219,234,244,253,262,323,332,342,351,360,367,379,388,398,407,416,425,466],{"type":22,"tag":23,"props":24,"children":28},"element","h1",{"className":25,"id":27},[26],"heading__h1","why-your-ai-still-doesnt-get-your-edits-the-power-of-human-aligned-reward-modeling",[29],{"type":22,"tag":30,"props":31,"children":33},"span",{"style":32},"white-space: pre-wrap;",[34],{"type":35,"value":16},"text",{"type":22,"tag":37,"props":38,"children":41},"p",{"className":39},[40],"doxhub-editor-paragraph",[42,47,59,64,72,77],{"type":22,"tag":30,"props":43,"children":44},{"style":32},[45],{"type":35,"value":46},"If you’ve ever used an AI image editor and asked it to 
",{"type":22,"tag":48,"props":49,"children":50},"i",{},[51],{"type":22,"tag":52,"props":53,"children":56},"em",{"className":54,"style":32},[55],"text__italic",[57],{"type":35,"value":58},"\"",{"type":22,"tag":30,"props":60,"children":61},{"style":32},[62],{"type":35,"value":63},"make the dog look like it’s wearing a tiny tuxedo",{"type":22,"tag":48,"props":65,"children":66},{},[67],{"type":22,"tag":52,"props":68,"children":70},{"className":69,"style":32},[55],[71],{"type":35,"value":58},{"type":22,"tag":30,"props":73,"children":74},{"style":32},[75],{"type":35,"value":76}," only to have the dog turn into a blurry blob or, worse, vanish entirely and be replaced by a human in a suit, you’ve experienced the ",{"type":22,"tag":78,"props":79,"children":80},"b",{},[81],{"type":22,"tag":82,"props":83,"children":86},"strong",{"className":84,"style":32},[85],"text__bold",[87],{"type":35,"value":88},"Instruction-Guided Image Editing (IGIE) bottleneck.",{"type":22,"tag":37,"props":90,"children":92},{"className":91},[40],[93],{"type":22,"tag":30,"props":94,"children":95},{"style":32},[96],{"type":35,"value":97},"While closed-source giants like GPT-Image-1 and Google-Nano-Banana have shown flashes of brilliance, the open-source community has struggled to keep up. The reason? It’s not just about the model size; it’s about alignment. AI models don’t inherently know what a \"good edit\" looks like. They need a teacher.",{"type":22,"tag":37,"props":99,"children":101},{"className":100},[40],[102],{"type":22,"tag":30,"props":103,"children":104},{"style":32},[105],{"type":35,"value":106},"Today, we’re diving into a breakthrough from 2077AI research team: EditReward. 
This new generative reward model is designed to solve the \"alignment gap\" using a massive, expert-annotated dataset of over 200,000 preference pairs.",{"type":22,"tag":108,"props":109,"children":113},"h2",{"className":110,"id":112},[111],"heading__h2","the-bottleneck-why-editing-is-harder-than-generation",[114],{"type":22,"tag":30,"props":115,"children":116},{"style":32},[117],{"type":35,"value":118},"The Bottleneck: Why Editing is Harder Than Generation",{"type":22,"tag":37,"props":120,"children":122},{"className":121},[40],[123],{"type":22,"tag":30,"props":124,"children":125},{"style":32},[126],{"type":35,"value":127},"Generating an image from scratch is relatively \"easy\" for AI because it has a high degree of freedom. However, Instruction-Guided Image Editing is a balancing act. The model must satisfy two competing forces:",{"type":22,"tag":129,"props":130,"children":133},"ol",{"className":131},[132],"doxhub-editor-ol",[134,155],{"type":22,"tag":135,"props":136,"children":140},"li",{"value":137,"className":138},"1",[139],"doxhub-editor-list-item",[141,150],{"type":22,"tag":78,"props":142,"children":143},{},[144],{"type":22,"tag":82,"props":145,"children":147},{"className":146,"style":32},[85],[148],{"type":35,"value":149},"Instruction Following (IF):",{"type":22,"tag":30,"props":151,"children":152},{"style":32},[153],{"type":35,"value":154}," Did you actually do what I asked?",{"type":22,"tag":135,"props":156,"children":159},{"value":157,"className":158},"2",[139],[160,169],{"type":22,"tag":78,"props":161,"children":162},{},[163],{"type":22,"tag":82,"props":164,"children":166},{"className":165,"style":32},[85],[167],{"type":35,"value":168},"Image Quality & Plausibility (IQ):",{"type":22,"tag":30,"props":170,"children":171},{"style":32},[172],{"type":35,"value":173}," Does the result still look like a real photo, or did you introduce artifacts and \"uncanny valley\" 
distortions?",{"type":22,"tag":37,"props":175,"children":177},{"className":176},[40],[178],{"type":22,"tag":30,"props":179,"children":180},{"style":32},[181],{"type":35,"value":182},"Existing Vision-Language Models (VLMs) like GPT-4o or Claude 3.5 Sonnet are often used as \"judges\" to score these edits, but their correlation with human preference is limited. They might reward a model for adding the tuxedo but ignore the fact that the dog’s legs are now coming out of its ears.",{"type":22,"tag":108,"props":184,"children":187},{"className":185,"id":186},[111],"enter-editreward-200k-expert-lessons",[188],{"type":22,"tag":30,"props":189,"children":190},{"style":32},[191],{"type":35,"value":192},"Enter EditReward: 200K Expert Lessons",{"type":22,"tag":37,"props":194,"children":196},{"className":195},[40],[197],{"type":22,"tag":30,"props":198,"children":199},{"style":32},[200],{"type":35,"value":201},"The core contribution of the EditReward paper is the recognition that reward modeling (the process of teaching AI to prefer one outcome over another) requires high-density, high-quality human feedback.",{"type":22,"tag":37,"props":203,"children":205},{"className":204},[40],[206],{"type":22,"tag":30,"props":207,"children":208},{"style":32},[209],{"type":35,"value":210},"The 2077AI research team didn't just scrape the web. We built EDITREWARD-DATA, a massive dataset containing 200K preference pairs. Each pair was meticulously annotated by trained experts following a rigorous protocol. 
These experts compared two edited results for the same instruction and selected the one that better followed the user’s intent, while also preserving physical consistency — including lighting, shadows, reflections, perspective, and object interactions within the scene.",{"type":22,"tag":37,"props":212,"children":214},{"className":213},[40],[215],{"type":22,"tag":216,"props":217,"children":218},"br",{},[],{"type":22,"tag":220,"props":221,"children":222},"figure",{},[223,229],{"type":22,"tag":224,"props":225,"children":228},"img",{"src":226,"alt":227},"https:\u002F\u002Fdoxhub.s3.us-east-1.amazonaws.com\u002F2077ai\u002F20260115\u002FAn%20overview%20of%20the%20framework%2C%20illustrating%20the%20construction%20of%20the%20EditReward-Data%20and%20the%20subsequent%20training%20of%20EditReward..webp","An overview of the framework, illustrating the construction of the EditReward-Data and the subsequent training of EditReward. ",[],{"type":22,"tag":230,"props":231,"children":232},"figcaption",{},[233],{"type":35,"value":227},{"type":22,"tag":108,"props":235,"children":238},{"className":236,"id":237},[111],"how-it-works-the-architecture-of-a-judge",[239],{"type":22,"tag":30,"props":240,"children":241},{"style":32},[242],{"type":35,"value":243},"How It Works: The Architecture of a Judge",{"type":22,"tag":37,"props":245,"children":247},{"className":246},[40],[248],{"type":22,"tag":30,"props":249,"children":250},{"style":32},[251],{"type":35,"value":252},"EditReward isn't just a simple classifier. 
By leveraging a Vision-Language Model (VLM) framework, EditReward can \"see\" the original image, \"read\" the instruction, and \"evaluate\" the edited result.",{"type":22,"tag":37,"props":254,"children":256},{"className":255},[40],[257],{"type":22,"tag":30,"props":258,"children":259},{"style":32},[260],{"type":35,"value":261},"The model is trained to focus on several sub-dimensions:",{"type":22,"tag":263,"props":264,"children":267},"ul",{"className":265},[266],"doxhub-editor-ul",[268,286,304],{"type":22,"tag":135,"props":269,"children":271},{"value":137,"className":270},[139],[272,281],{"type":22,"tag":78,"props":273,"children":274},{},[275],{"type":22,"tag":82,"props":276,"children":278},{"className":277,"style":32},[85],[279],{"type":35,"value":280},"Plausibility:",{"type":22,"tag":30,"props":282,"children":283},{"style":32},[284],{"type":35,"value":285}," Does the edit align with natural laws? Evaluate whether lighting, shadows, reflections, perspective, and object interactions are physically coherent.",{"type":22,"tag":135,"props":287,"children":289},{"value":157,"className":288},[139],[290,299],{"type":22,"tag":78,"props":291,"children":292},{},[293],{"type":22,"tag":82,"props":294,"children":296},{"className":295,"style":32},[85],[297],{"type":35,"value":298},"Artifact-Free Quality:",{"type":22,"tag":30,"props":300,"children":301},{"style":32},[302],{"type":35,"value":303}," Are there pixel misalignments or unnatural textures? 
Check for blur, distortions, pixel misalignment, unnatural textures, visible seams around edited regions, or other visual artifacts that break the illusion of realism.",{"type":22,"tag":135,"props":305,"children":308},{"value":306,"className":307},"3",[139],[309,318],{"type":22,"tag":78,"props":310,"children":311},{},[312],{"type":22,"tag":82,"props":313,"children":315},{"className":314,"style":32},[85],[316],{"type":35,"value":317},"Aesthetic Harmony:",{"type":22,"tag":30,"props":319,"children":320},{"style":32},[321],{"type":35,"value":322}," Does the edit enhance the image or degrade it? Consider color balance, composition, atmosphere, and visual harmony. The modification should enhance the image while preserving realism, rather than making it look artificial or awkward.",{"type":22,"tag":37,"props":324,"children":326},{"className":325},[40],[327],{"type":22,"tag":30,"props":328,"children":329},{"style":32},[330],{"type":35,"value":331},"Instead of providing a generic \"good\" or \"bad\" score, EditReward mimics human experts by evaluating the edit on a 1–4 scale, where a 4 represents a perfectly accurate, complete, and exclusive execution of the instruction.",{"type":22,"tag":108,"props":333,"children":336},{"className":334,"id":335},[111],"state-of-the-art-performance",[337],{"type":22,"tag":30,"props":338,"children":339},{"style":32},[340],{"type":35,"value":341},"State-of-the-Art Performance",{"type":22,"tag":37,"props":343,"children":345},{"className":344},[40],[346],{"type":22,"tag":30,"props":347,"children":348},{"style":32},[349],{"type":35,"value":350},"The results speak for themselves. In the research, EditReward was tested against established benchmarks like GenAI-Bench, AURORA-Bench, and ImagenHub.",{"type":22,"tag":37,"props":352,"children":354},{"className":353},[40],[355],{"type":22,"tag":30,"props":356,"children":357},{"style":32},[358],{"type":35,"value":359},"The findings? 
EditReward achieved state-of-the-art or highly competitive performance on multiple benchmarks, outperforming even the most advanced VLM-as-judge models. Specifically, EditReward predicted human preferences more accurately than GPT-4o, Gemini-2.0-Flash, and open-source models on selected benchmarks.",{"type":22,"tag":37,"props":361,"children":363},{"className":362},[40],[364],{"type":22,"tag":216,"props":365,"children":366},{},[],{"type":22,"tag":220,"props":368,"children":369},{},[370,375],{"type":22,"tag":224,"props":371,"children":374},{"src":372,"alt":373},"https:\u002F\u002Fdoxhub.s3.us-east-1.amazonaws.com\u002F2077ai\u002F20260115\u002FComprehensive%20results%20on%20public%20benchmarks%20and%20EDITREWARD-BENCH.webp","Comprehensive results on public benchmarks and EDITREWARD-BENCH",[],{"type":22,"tag":230,"props":376,"children":377},{},[378],{"type":35,"value":373},{"type":22,"tag":37,"props":380,"children":382},{"className":381},[40],[383],{"type":22,"tag":30,"props":384,"children":385},{"style":32},[386],{"type":35,"value":387},"Perhaps most importantly, the 2077AI research team used EditReward to \"clean up\" existing noisy datasets. By filtering out low-quality synthetic data and keeping only the high-reward examples, they demonstrated that we can train better editing models with less (but higher-quality) data.",{"type":22,"tag":108,"props":389,"children":392},{"className":390,"id":391},[111],"why-this-matters-for-the-future-of-ai",[393],{"type":22,"tag":30,"props":394,"children":395},{"style":32},[396],{"type":35,"value":397},"Why This Matters for the Future of AI",{"type":22,"tag":37,"props":399,"children":401},{"className":400},[40],[402],{"type":22,"tag":30,"props":403,"children":404},{"style":32},[405],{"type":35,"value":406},"The significance of EditReward goes beyond just making better filters for your photos. 
It represents a shift toward generative reward modeling as a cornerstone of AI development.",{"type":22,"tag":37,"props":408,"children":410},{"className":409},[40],[411],{"type":22,"tag":30,"props":412,"children":413},{"style":32},[414],{"type":35,"value":415},"For the open-source community to catch up with proprietary models, we don't just need more GPUs; we need better \"objective functions.\" We need models that understand nuance, models that know the difference between a \"creative change\" and a \"hallucinated error.\"",{"type":22,"tag":37,"props":417,"children":419},{"className":418},[40],[420],{"type":22,"tag":30,"props":421,"children":422},{"style":32},[423],{"type":35,"value":424},"EditReward provides the blueprint for that understanding. By aligning AI rewards with human expertise at scale, we are finally teaching models not just to draw, but to listen.",{"type":22,"tag":37,"props":426,"children":428},{"className":427},[40],[429,434,457],{"type":22,"tag":30,"props":430,"children":431},{"style":32},[432],{"type":35,"value":433},"If you want to learn more about EDITREWARD, read our previous blog post, ",{"type":22,"tag":435,"props":436,"children":442},"a",{"href":437,"rel":438,"className":440},"https:\u002F\u002Fwww.2077ai.com\u002Fblog\u002Fintroducing-editreward-human-aligned-ai-for-image-editing?utm_source=officialwebsite&utm_medium=blog&utm_campaign=editreward2",[439],"noreferrer",[441],"text__link",[443],{"type":22,"tag":444,"props":445,"children":446},"u",{},[447],{"type":22,"tag":78,"props":448,"children":449},{},[450],{"type":22,"tag":82,"props":451,"children":454},{"className":452,"style":32},[85,453],"text__underline",[455],{"type":35,"value":456},"the general introduction to 
EDITREWARD",{"type":22,"tag":78,"props":458,"children":459},{},[460],{"type":22,"tag":82,"props":461,"children":463},{"className":462,"style":32},[85],[464],{"type":35,"value":465},".",{"type":22,"tag":37,"props":467,"children":469},{"className":468},[40],[470],{"type":22,"tag":216,"props":471,"children":472},{},[],{"title":11,"searchDepth":474,"depth":474,"links":475},2,[476,477,478,479,480],{"id":112,"depth":474,"text":118},{"id":186,"depth":474,"text":192},{"id":237,"depth":474,"text":243},{"id":335,"depth":474,"text":341},{"id":391,"depth":474,"text":397},"dataset",[483,484],"image","multimodal",{"homepage":486,"arxiv":487,"github":488,"huggingface":489},"https:\u002F\u002Ftiger-ai-lab.github.io\u002FEditReward\u002F","https:\u002F\u002Farxiv.org\u002Fabs\u002F2509.26346","https:\u002F\u002Fgithub.com\u002FTIGER-AI-Lab\u002FEditReward","https:\u002F\u002Fhuggingface.co\u002Fcollections\u002FTIGER-Lab\u002Feditreward"]