In the rapid tide of artificial intelligence development, accurately assessing the capabilities of AI models has become a key issue for industrial progress. In exploring the boundaries of AI capabilities, researchers have come to recognize the limitations of existing evaluation systems, spurring scholars to collaborate with top industry research institutions to break through the established paradigms of AI evaluation. Against this backdrop, the milestone project SuperGPQA has emerged.

![SuperGPQA: Scaling LLM Evaluation across 285 Graduate Disciplines](https://global-blog.oss-ap-southeast-1.aliyuncs.com/abaka/20250309-2077AI/ScalingLLM.png)

*SuperGPQA: Scaling LLM Evaluation across 285 Graduate Disciplines*

## 1. Insights on the Frontier: The Real Challenges of AI Evaluation

With large language models such as GPT-4 and Claude demonstrating capabilities that approach or even surpass human levels in mainstream academic fields, accurately assessing the true proficiency of AI in a broader range of specialized areas has become an urgent challenge. Existing evaluation benchmarks such as MMLU and GPQA suffer from severe imbalances in subject coverage: long-tail disciplines like light industry, agriculture, and service science are covered at a rate of less than 5%, and the discriminative power of their questions is gradually diminishing.

In response to this challenge, 2077AI, in collaboration with top research institutions, spent six months developing the SuperGPQA project: the first AI evaluation benchmark to cover 285 graduate-level disciplines.

![Comparison of benchmarks for different models](https://global-blog.oss-ap-southeast-1.aliyuncs.com/abaka/20250309-2077AI/BenchmarkComparison.png)

*Comparison of benchmarks for different models*

## 2. Breakthrough Innovation: Restructuring the Evaluation Paradigm

SuperGPQA achieves breakthrough innovations in both scale and depth. The project has constructed a vast knowledge system comprising 26,529 specialized questions, far exceeding the 448 questions in GPQA and the 12,032 questions in MMLU-Pro. In terms of subject coverage, SuperGPQA spans 13 major categories, 72 first-level disciplines, and 285 second-level disciplines, achieving a comprehensive mapping of the human knowledge system.
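For readers who want to poke at these headline numbers themselves, the sketch below shows how they could be reproduced from the released benchmark. The Hugging Face identifier `m-a-p/SuperGPQA` and the field names `discipline` and `options` are assumptions about the release format, not something this post specifies.

```python
# Minimal sketch: reproduce SuperGPQA's headline scale statistics.
# Assumptions: the benchmark is hosted on Hugging Face as "m-a-p/SuperGPQA"
# and each record carries "discipline" and "options" fields.
from collections import Counter
from datasets import load_dataset

ds = load_dataset("m-a-p/SuperGPQA", split="train")

num_questions = len(ds)                                              # expected ~26,529
disciplines = Counter(ds["discipline"])                              # expected ~285 keys
avg_options = sum(len(opts) for opts in ds["options"]) / num_questions  # expected ~9.67

print(f"{num_questions} questions, {len(disciplines)} disciplines, "
      f"{avg_options:.2f} options per question on average")
```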
Each question carries an average of 9.67 options, significantly more than the traditional four-option format, which greatly increases the difficulty of the evaluation. Notably, 42.33% of the selected questions require mathematical calculation or formal reasoning. This design ensures the discriminative power and depth of the assessment.

![The data collection process of SuperGPQA](https://global-blog.oss-ap-southeast-1.aliyuncs.com/abaka/20250309-2077AI/SuperGPQA.png)

*The data collection process of SuperGPQA*

### 2.1 Technical Highlights: Interdisciplinary Semantic Analysis

Through t-SNE visualization, the research team found that SuperGPQA exhibits a distinctive interdisciplinary clustering pattern in semantic space. Questions from engineering and science show high semantic similarity, while those from the humanities maintain their own distinct knowledge centers. The clustering of the different disciplines achieves a complete mapping of the diverse human knowledge system, and this distribution also supports the scientific soundness and comprehensiveness of the evaluation dataset.

![Visualization of subject problem sampling](https://global-blog.oss-ap-southeast-1.aliyuncs.com/abaka/20250309-2077AI/Visualization.png)

*Visualization of subject problem sampling*

### 2.2 Methodological Innovation: Three-Stage Quality Control

To ensure the reliability of the evaluation, the project team designed a rigorous three-stage quality control mechanism.

In the source screening phase, the SuperGPQA team abandoned traditional crowdsourcing and instead had a team of experts select original questions from textbooks and authoritative materials.

In the standardization transcription phase, a professional team normalized the academic language and unified the format of all questions, keeping the average question length at 58.42 characters and guaranteeing the consistency and comparability of the options.

In the quality inspection phase, the research team combined automated rule checks, cross-validation by multiple models, and in-depth expert review to build a robust quality assurance system.
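As a concrete illustration of the first of those layers, here is a minimal sketch of the kind of automated rule checks that can gate multiple-choice items before human review. The specific checks and thresholds are illustrative examples, not the project's actual rules.

```python
# Illustrative automated rule checks for a multiple-choice item
# (the concrete rules and thresholds are examples, not SuperGPQA's own).
def rule_check(question: str, options: list[str], answer: str) -> list[str]:
    issues = []
    if len(options) < 4:
        issues.append("too few options")
    if len(set(options)) != len(options):
        issues.append("duplicate options")
    if answer not in options:
        issues.append("gold answer missing from options")
    if len(question.split()) < 5:
        issues.append("question too short to be meaningful")
    return issues

result = rule_check(
    "Which enzyme catalyzes the first committed step of glycolysis?",
    ["Hexokinase", "Phosphofructokinase-1", "Pyruvate kinase", "Aldolase"],
    "Phosphofructokinase-1",
)
print(result or "passed all rule checks")
```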
src=\\\"https://global-blog.oss-ap-southeast-1.aliyuncs.com/abaka/20250309-2077AI/RewritingProcess.png\\\" alt=\\\"The complex rewriting process of correct and incorrect judgment questions in SuperGPQA\\\" style=\\\"width: 100%; max-height: 60vh; object-fit: contain; background: #141414; border-radius: 8px\\\">\u003C/img>\\n \u003Cp class=\\\"img-text\\\" style=\\\"position: absolute; top: calc(100% + 16px); left: 0; right: 0;text-align: center; overflow: hidden; white-space: nowrap; text-overflow: ellipsis; line-height: 22px; color: #A1A1A1; font-size: 14px\\\">\\n The complex rewriting process of correct and incorrect judgment questions in SuperGPQA\\n \u003C/p>\\n\u003C/div>\\n\\n## 3. Key Findings: Revealing the Boundaries of AI Capabilities\\n\\nThrough systematic evaluation of 51 mainstream models, the research team has made a series of important discoveries.\\n\\nUnder the evaluation criteria of SuperGPQA, even the best-performing DeepSeek-R1 model only achieved an accuracy rate of 61.82% in answering interdisciplinary questions. This result clearly reveals the significant gap that exists between current AI and general artificial intelligence. Experimental data indicates that instruction fine-tuning has a significant positive impact on model performance. For example, the accuracy rate of the instruction fine-tuned version of DeepSeek-V3 (47.40%) is far higher than that of its base version (32.14%).\\n\\n\u003Cdiv class=\\\"img-wrap has-caption center\\\" style=\\\"width: 100%; position: relative; margin-bottom: 62px\\\">\\n \u003Cimg src=\\\"https://global-blog.oss-ap-southeast-1.aliyuncs.com/abaka/20250309-2077AI/SuperGPQAStandard.png\\\" alt=\\\"The performance of different models under the SuperGPQA standard\\\" style=\\\"width: 100%; max-height: 60vh; object-fit: contain; background: #141414; border-radius: 8px\\\">\u003C/img>\\n \u003Cp class=\\\"img-text\\\" style=\\\"position: absolute; top: calc(100% + 16px); left: 0; right: 0;text-align: center; overflow: hidden; white-space: nowrap; text-overflow: ellipsis; line-height: 22px; color: #A1A1A1; font-size: 14px\\\">\\n The performance of different models under the SuperGPQA standard\\n \u003C/p>\\n\u003C/div>\\n\\nDuring the in-depth analysis, the research team observed a clear correlation between model scale and performance equilibrium. DeepSeek-R1 demonstrated stable performance across questions of varying difficulty levels, with an accuracy rate of 63.59% on easy questions, 63.63% on medium-difficulty questions, and 56.87% on difficult ones. Additionally, the performance improvement brought about by model version iteration was also significant. For example, the accuracy rate of the GPT-4o series steadily increased from 39.76% to 44.40% with each version update.\\n\\n## 4. Future Outlook\\n\\nThe open-source release of SuperGPQA not only fills an important gap in the field of AI evaluation but also pioneers a new research paradigm. This breakthrough provides the academic and industrial communities with a reliable \\\"compass,\\\" guiding the direction of AI technology development. As a core participant in the SuperGPQA project, 2077AI, together with the project team, has jointly planned the future development direction of the evaluation system. The research team will continue to expand the dimensions of evaluation, introduce more refined assessment criteria in specialized fields, develop dynamic difficulty adjustment mechanisms, and build cross-lingual evaluation capabilities. 
On the methodological front, the project team is committed to optimizing human-machine collaborative evaluation, developing adaptive question generation, and establishing a more detailed capability taxonomy. Meanwhile, the SuperGPQA team will continue to promote open-source sharing of evaluation standards, build a global collaborative research network, and foster deep integration of industry, academia, and research.

# Breaking Traditional Knowledge Dependency: KOR-Bench for Evaluating Intrinsic Reasoning Abilities of Models

> * Homepage: <https://kor-bench.github.io/>
>
> * GitHub: <https://github.com/multimodal-art-projection/KOR-BENCH>
>
> * Paper: <https://arxiv.org/abs/2410.06526>

## 1. Breaking Cognitive Boundaries: An Innovative Framework for Knowledge-Orthogonal Evaluation of Models' Reasoning Abilities

In the field of AI evaluation, models' reasoning abilities have long been obscured by the "noise" of pre-trained knowledge. Existing benchmarks often cannot distinguish whether a model truly possesses reasoning capability or is merely repeating patterns from its training data and leaning on accumulated prior knowledge.

In March 2025, KOR-Bench, led by the M-A-P research team and jointly open-sourced with organizations such as 2077AI, introduced the innovative concept of "knowledge orthogonality," changing a status quo in which evaluation benchmarks struggle to assess models' reasoning abilities.
The knowledge orthogonality of the KOR-Bench dataset ensures that evaluation tasks are independent of pre-trained knowledge, requiring models to rely on their understanding of newly introduced rules and on pure reasoning to solve problems.

![Overview of KOR-Bench](https://global-blog.oss-ap-southeast-1.aliyuncs.com/2077ai/20250507/KOR-Bench-Overview.png)

*Overview of KOR-Bench*

Through its meticulously designed rule system, KOR-Bench not only establishes a testing environment for accurately evaluating models' intrinsic reasoning abilities but also pioneers a new paradigm for assessing artificial intelligence capabilities.

## 2. Deep Deconstruction and Reconstruction: A Precise Evaluation Framework Spanning Five Dimensions

KOR-Bench constructs a comprehensive evaluation system covering five core dimensions, each designed to probe a different aspect of reasoning:

![The five core evaluation dimensions of KOR-Bench](https://global-blog.oss-ap-southeast-1.aliyuncs.com/2077ai/20250507/KOR-Bench-Evaluation-Dimensions.png)

*The five core evaluation dimensions of KOR-Bench*

1. Operation: Mathematical symbols and rules are redefined to test the model's abstract computational ability. For example, a new operator "※" is designed such that a※b = ba + 2 when a is a multiple of b, with a different calculation rule otherwise (a toy example of this kind of rule-following task is sketched after this list).

2. Logic: Novel logical symbol systems and inference rules are introduced to examine the model's formal reasoning, spanning propositional logic, predicate logic, and modal logic.

3. Cipher: Entirely new encryption and decryption rules test the model's ability to apply rules and transform information, ranging from simple substitution to complex multi-step encryption algorithms.

4. Puzzle: Problems requiring multi-step reasoning assess the model's problem-solving and strategy planning, including Sudoku variants, mazes, and combinatorial optimization problems.

5. Counterfactual: Fictional scenarios and rules test the model's reasoning in hypothetical situations, with particular attention to whether it can break free from the constraints of real-world knowledge.
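The sketch below makes the Operation dimension concrete: a newly defined operator is written down as an executable rule, and a model's answer is checked against it. The operator "⊙" and its rule are invented for illustration and are not the benchmark's actual "※" definition.

```python
# Illustrative rule-following check in the spirit of the Operation dimension.
# The operator "⊙" and its rule are invented for this example only.
def circled_dot(a: int, b: int) -> int:
    """a ⊙ b = a * b + a if a is even, otherwise a - b."""
    return a * b + a if a % 2 == 0 else a - b

def score_answer(a: int, b: int, model_answer: int) -> bool:
    """Exact-match scoring against the rule's ground truth."""
    return model_answer == circled_dot(a, b)

print(circled_dot(4, 3))      # 16: the "even first operand" branch of the rule
print(score_answer(5, 2, 3))  # True: 5 is odd, so 5 ⊙ 2 = 5 - 2 = 3
```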
## 3. Innovative Evaluation Methods and In-Depth Performance Analysis

KOR-Bench ensures the independence of its evaluation tasks from pre-trained knowledge through rigorous definitions and experimental verification. Within the evaluation framework, the research team introduced a Knowledge Impact Factor (β) to quantify the degree of knowledge interference, and the purity of the evaluation is ensured through rule-knowledge decoupling measurement and rule centrality verification. This evaluation method looks beyond the accuracy of task completion to analyze the soundness of the reasoning process, the depth of rule understanding, and the originality of the solution strategy. Through multi-level performance analysis, KOR-Bench can comprehensively evaluate a model's rule-learning efficiency, reasoning-chain integrity, and result reliability.

![The data construction process of KOR-Bench](https://global-blog.oss-ap-southeast-1.aliyuncs.com/2077ai/20250507/KOR-Bench-Data-Construction.png)

*The data construction process of KOR-Bench*

In practical evaluation, the strongest models at the time, o1-Preview and o1-Mini, achieved accuracy rates of 72.88% and 70.16%, respectively, while the performance of Claude-3.5-Sonnet (58.96%) and GPT-4o (58.00%) revealed the limitations of existing technology. In high-difficulty tasks such as cipher and puzzle reasoning in particular, even top-tier models showed clear capability bottlenecks. (Model rankings are as of the paper's release on October 9, 2024.) These results not only quantify the boundaries of current AI systems' reasoning capabilities but also point the way for future improvements.

KOR-Bench provides a unified standard and a reproducible process for reasoning-capability evaluation, offering a reliable basis for comparing models. For technology development, it helps researchers pinpoint capability weaknesses, guiding algorithm optimization and promoting improvements in pure reasoning. Its applications are also emerging in model selection, educational and training evaluation, and academic research.

Looking ahead, KOR-Bench will continue to evolve: expanding the scale and diversity of the dataset, introducing parametric rule generation, and deepening the evaluation of reasoning layers will continuously strengthen its assessment capabilities. As multimodal evaluation capabilities develop, KOR-Bench will deliver assessment value across a wider range of fields.

As a participant in this pioneering project, 2077AI played a significant role in the construction and validation of the evaluation framework.
Our technical team was deeply involved in formulating and refining the evaluation standards, making important contributions especially in verifying model performance and analyzing results. By open-sourcing and sharing this work, 2077AI looks forward to working with the wider AI community to further advance reasoning-capability evaluation and to contribute to building more powerful artificial intelligence systems.

# A Novel Paradigm for Model Evaluation: The Innovative Multi-source Document Parsing Evaluation Framework OmniDocBench

* Project Homepage: <https://github.com/opendatalab/OmniDocBench>

* Hugging Face: <https://huggingface.co/datasets/opendatalab/OmniDocBench>

* Paper: <https://arxiv.org/abs/2412.07626>

![The performance of different models in handling various text types on OmniDocBench](https://global-blog.oss-ap-southeast-1.aliyuncs.com/2077ai/20250410/TextAccuracy.png)

*The performance of different models in handling various text types on OmniDocBench*

Document content extraction, one of the fundamental tasks in computer vision, plays a crucial role in acquiring training data for large language models (LLMs) and in retrieval-augmented generation. However, while LLMs rely heavily on data from academic papers and journals, high-quality but complexly formatted documents such as newspapers and magazines remain underutilized. To address this gap, Shanghai AI Laboratory, in collaboration with 2077AI and other institutions, has open-sourced the OmniDocBench project. By constructing a multi-source document parsing benchmark that covers nine document types (academic papers, textbooks, exam papers, magazines, books, notes, financial reports, newspapers, and slides), the project addresses the limitations of existing evaluation systems in both document-type diversity and the completeness of assessment dimensions.
This innovative evaluation framework not only provides a reliable standard for the development of document parsing technologies but also pioneers a new paradigm for document intelligence evaluation.

![Comparison of related work](https://global-blog.oss-ap-southeast-1.aliyuncs.com/2077ai/20250410/Comparison.png)

*Comparison of related work*

## Deep Deconstruction and Reconstruction: The Meticulous Design of an All-Dimensional Evaluation System

OmniDocBench pioneers a new paradigm for document parsing evaluation through a systematic data construction process.

During data acquisition, the project team started from 200,000 initial PDF documents. Using ResNet-50 features and Faiss clustering-based sampling, they extracted 6,000 pages with a reasonable and diverse distribution, so that the most "representative" pages form the evaluation pool (a minimal sketch of this kind of cluster-based sampling appears below). These pages were annotated by professional annotators and underwent rigorous screening and balancing, resulting in a high-quality evaluation set of 981 pages covering nine document types, from academic papers to exam papers.

![An overview of the document types and annotation information in OmniDocBench](https://global-blog.oss-ap-southeast-1.aliyuncs.com/2077ai/20250410/OmniDocBenchOverview.png)

*An overview of the document types and annotation information in OmniDocBench*

In its annotation system, OmniDocBench constructs an unprecedented multi-level annotation framework. At the layout-detection level, the project not only provides bounding-box annotations for 19 region types but also introduces annotations for layout attributes, reading order, and hierarchical relationships. This three-dimensional annotation scheme allows the dataset to evaluate model performance comprehensively across scenarios. For content recognition in particular, differential annotation strategies are applied to different content types: plain text regions are annotated directly as text, formulas are annotated in LaTeX, and tables are annotated in both HTML and LaTeX to ensure the comprehensiveness and accuracy of the evaluation.
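The sketch below illustrates the cluster-based page sampling described in the data-acquisition step above: page images are embedded with a ResNet-50 backbone, clustered with Faiss k-means, and the pages nearest each centroid are kept. It is an illustrative reconstruction of the general technique, not the project's actual sampling code, and the cluster counts are placeholders.

```python
# Illustrative cluster-based page sampling (not OmniDocBench's actual pipeline).
import faiss
import numpy as np
import torch
from PIL import Image
from torchvision import models, transforms

# ResNet-50 backbone as a fixed feature extractor (2048-d pooled features).
backbone = models.resnet50(weights=models.ResNet50_Weights.DEFAULT)
backbone.fc = torch.nn.Identity()
backbone.eval()

preprocess = transforms.Compose([
    transforms.Resize((224, 224)),
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225]),
])

@torch.no_grad()
def embed_pages(image_paths: list[str]) -> np.ndarray:
    batch = torch.stack([preprocess(Image.open(p).convert("RGB")) for p in image_paths])
    return backbone(batch).numpy().astype("float32")

def sample_diverse_pages(image_paths: list[str], n_clusters: int = 100, per_cluster: int = 3):
    feats = embed_pages(image_paths)
    kmeans = faiss.Kmeans(d=feats.shape[1], k=n_clusters, niter=20, seed=0)
    kmeans.train(feats)
    index = faiss.IndexFlatL2(feats.shape[1])
    index.add(feats)
    # For each centroid, keep the pages closest to it.
    _, nearest = index.search(kmeans.centroids, per_cluster)
    return sorted({image_paths[i] for i in nearest.ravel()})
```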
To ensure annotation quality, OmniDocBench implements a rigorous three-tier quality control mechanism. First, advanced AI models are used for intelligent pre-annotation: a LayoutLMv3 model fine-tuned for layout detection, and PaddleOCR, UniMERNet, and GPT-4o for recognizing text, formulas, and tables, respectively. Next, a professional annotation team reviews all pre-annotation results, refining each detection box and adding annotations for reading order and hierarchical relationships. In the final expert inspection phase, the project uses CDM rendering technology to identify non-renderable elements, and three domain experts conduct the final review to ensure the reliability of the dataset.

## Innovative Evaluation Methods and In-depth Performance Analysis

Innovation and systematic design are the standout strengths of OmniDocBench's evaluation framework.

For extraction, the project defines a complete processing workflow. The preprocessing stage focuses on standardizing details, including basic steps such as removing images and normalizing markdown tags. For special components, a carefully designed extraction order ensures that different content types are identified and extracted accurately. For inline formulas in particular, converting them to a unified Unicode representation resolves the inconsistent output formats of different models.

![Performance evaluation of various document parsing and recognition methods based on OmniDocBench](https://global-blog.oss-ap-southeast-1.aliyuncs.com/2077ai/20250410/PerformanceEvaluation.png)

*Performance evaluation of various document parsing and recognition methods based on OmniDocBench*

In practical evaluations, OmniDocBench demonstrates strong ability to differentiate models. Different models show their respective strengths on specific tasks: DocLayout-YOLO excels at layout detection across diverse documents, RapidTable stands out for the language adaptability of its table recognition, and PaddleOCR maintains a lead on traditional OCR tasks. In the highly challenging formula recognition task, the strong CDM scores of GPT-4o, Mathpix, and UniMERNet show breakthrough progress in that area.

As an active member of the open-source community, 2077AI has been deeply involved in the development of OmniDocBench, contributing to dataset construction, evaluation framework design, and result validation. Looking ahead, OmniDocBench will continue to expand its evaluation dimensions and application scenarios; introducing methods such as parametric rule generation and deepening reasoning-level assessment will further improve the completeness of the evaluation system.
Meanwhile, as multimodal evaluation capabilities develop, OmniDocBench is expected to play a broader role across a wider range of fields. 2077AI will continue to work hand in hand with the open-source community to advance document intelligence technology and to contribute to building more powerful artificial intelligence systems.

# Matrix Dataset: A Revolutionary Bilingual AI Pre-training Corpus

The 2077AI Foundation proudly announces our open-source project, the Matrix Dataset. As pioneers in AI data standardization and advancement, we are committed to unlocking AI's potential through high-quality data, accelerating AI development, and nurturing an efficient, thriving AI data ecosystem. The Matrix Dataset is a crucial component of this vision.

![Statistics of the Matrix Pile data distribution](https://global-blog.oss-ap-southeast-1.aliyuncs.com/2077ai/20250509/Matrix-Pile-Data-Statistics.png)

*Statistics of the Matrix Pile data distribution: the inner pie chart represents the language distribution, while the outer ring indicates the proportion of meta-categories in the corpus.*

## **What is the Matrix Dataset?**

Matrix is a high-quality bilingual (Chinese-English) dataset containing 4,690 billion tokens. It stands as the only large-scale bilingual pre-training dataset in the open-source community that can be used directly without additional calibration or validation. Its uniqueness lies not only in its scale but also in its meticulously designed data processing pipeline and diverse data sources, which together ensure both high quality and broad applicability.

## **Diversity of Data Sources**

We built this corpus from the ground up to cover a wide range of topics, drawing primarily on:

1. Integration of existing open-source pre-training data

2. Additional Chinese, mathematics, and science exam data, as well as Wikipedia data collected from Common Crawl (CC)
3. PDF documents converted to text via OCR and incorporated into the dataset

![The composition sources of the re-processed English web subset](https://global-blog.oss-ap-southeast-1.aliyuncs.com/2077ai/20250509/Composition-Sources.png)

*The composition sources of the re-processed English web subset. The proportion denotes the size of each source divided by the total size of the whole dataset.*

![The composition sources of the Chinese web subset](https://global-blog.oss-ap-southeast-1.aliyuncs.com/2077ai/20250509/Composition-Sources-CN.png)

*The composition sources of the Chinese web subset.*

## **Meticulous Data Processing**

The excellence of the Matrix Dataset stems from our carefully designed data processing pipeline:

1. High-standard cleaning and filtering thresholds

2. Strict deduplication strategies, including substring-level deduplication (a document-level sketch appears at the end of this subsection)

3. Post-calibration processing for OCR data

Through this rigorous processing workflow, we retained only a small portion of the original data: 4% of the existing corpora and 19% of the crawled corpora. Every data cleaning rule was repeatedly sampled and confirmed by team members, ensuring extremely high quality standards for both the Chinese and English data.

![Funnel diagram for the two main data pipelines](https://global-blog.oss-ap-southeast-1.aliyuncs.com/2077ai/20250509/Retention-Rates.png)

*Funnel diagram for the two main data pipelines. Left: re-processing retention rates; right: processing retention rates. The darker part of each row represents the retention proportion for each processing step and the lighter part the filtered corpora.*
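The sketch below illustrates the deduplication idea at the document level, using MinHash-based near-duplicate detection from the `datasketch` library. It is a generic stand-in rather than the Matrix pipeline itself; the exact substring-level deduplication mentioned above typically needs heavier machinery (for example, suffix-array based matching) and is not shown here.

```python
# Illustrative document-level near-deduplication with MinHash LSH
# (a generic sketch, not the Matrix pipeline's actual implementation).
from datasketch import MinHash, MinHashLSH

def minhash_of(text: str, num_perm: int = 128) -> MinHash:
    m = MinHash(num_perm=num_perm)
    for shingle in {text[i:i + 5] for i in range(max(len(text) - 4, 1))}:
        m.update(shingle.encode("utf-8"))
    return m

def dedup(docs: list[str], threshold: float = 0.8) -> list[str]:
    lsh = MinHashLSH(threshold=threshold, num_perm=128)
    kept = []
    for i, doc in enumerate(docs):
        m = minhash_of(doc)
        if not lsh.query(m):            # no near-duplicate already kept
            lsh.insert(f"doc-{i}", m)
            kept.append(doc)
    return kept

corpus = ["the cat sat on the mat", "the cat sat on the mat!", "a completely different text"]
print(len(dedup(corpus)))  # typically 2: the first two near-duplicates collapse into one
```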
## **Optimized Data Composition**

To maximize pre-training effectiveness, we adopted a heuristic data composition strategy:

1. Pre-training is divided into two stages, with CC data included in the first stage and removed in the second

2. Manual increase in the proportion of code, books, and document-type data

3. Integration of the deepseek-math and OCR pipelines, significantly enhancing training performance

![Details of heuristic rules for English texts](https://global-blog.oss-ap-southeast-1.aliyuncs.com/2077ai/20250509/Heuristic-Rules-EN.png)

*Details of heuristic rules for English texts*

![Details of heuristic rules for Chinese texts](https://global-blog.oss-ap-southeast-1.aliyuncs.com/2077ai/20250509/Heuristic-Rules-CN.png)

*Details of heuristic rules for Chinese texts*

![The document conversion framework is composed of various sub-models for different parts](https://global-blog.oss-ap-southeast-1.aliyuncs.com/2077ai/20250509/Conversion-Framework.png)

*The document conversion framework is composed of various sub-models for different parts.*

## **Commitment to Open Source**

Adhering to the principles of openness and sharing, we have open-sourced not only all the data but also the code for every processing pipeline, including filtering, deduplication, document conversion, and all other procedures. This allows researchers and developers to examine every step of data processing and to adjust and optimize it for their own needs.
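To give a flavor of the heuristic text filters summarized in the figures above, here is a minimal illustrative filter. Every rule and threshold in it is invented for the example; the rules actually used for Matrix are the ones documented in the open-sourced pipeline code.

```python
# Invented example of heuristic text filtering; these thresholds are NOT Matrix's actual values.
import re

def keep_document(text: str) -> bool:
    words = text.split()
    if not 50 <= len(words) <= 100_000:           # discard very short or very long documents
        return False
    alpha_ratio = sum(w.isalpha() for w in words) / len(words)
    if alpha_ratio < 0.7:                          # too many symbols or numbers
        return False
    if len(set(words)) / len(words) < 0.3:         # pathological repetition
        return False
    if re.search(r"(lorem ipsum|click here to subscribe)", text, re.I):
        return False                               # boilerplate markers
    return True

print(keep_document("word " * 200))                # False: fails the repetition rule
```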
## **Join Us in Shaping the Future of AI**

The launch of the Matrix Dataset marks a new milestone in AI pre-training data. We cordially invite AI researchers and developers worldwide to join us in leveraging this rich resource, pushing the boundaries of AI technology, and collectively shaping the future of AI.

# OmniHD-Scenes: A Next-Generation Multimodal Dataset for Autonomous Driving

The 2077AI Foundation, in collaboration with the School of Automotive Studies at Tongji University, CAERI (China Automotive Engineering Research Institute), and Geometrical-PAL, has released a new generation of datasets for autonomous driving. These datasets provide large-scale, more modern, and more realistic data that offer perspectives not previously available, and they support a wide range of downstream tasks in autonomous driving research and development.

## Overview

![](https://global-blog.oss-ap-southeast-1.aliyuncs.com/2077ai/20250325/logo2.jpg)

## Abstract

The OmniHD-Scenes dataset combines data from a 128-beam LiDAR, six cameras, and six 4D imaging radar systems to achieve full environmental perception. The dataset comprises 1501 clips, each approximately 30 s long, totaling more than 450K synchronized frames and more than 5.85 million synchronized sensor data points. We also propose a novel 4D annotation pipeline. To date, we have annotated 200 clips with more than 514K precise 3D bounding boxes, and these clips also include semantic segmentation annotations for static scene elements. Additionally, we introduce a novel automated pipeline for generating dense occupancy ground truth, which effectively leverages information from non-key frames. Alongside the proposed dataset, we establish comprehensive evaluation metrics, baseline models, and benchmarks for 3D detection and semantic occupancy prediction.
These benchmarks utilize surround-view cameras and 4D imaging radar to explore cost-effective sensor solutions for autonomous driving applications.

![Sensor Deployment Architecture Diagram](https://global-blog.oss-ap-southeast-1.aliyuncs.com/abaka/20250325-OmniHD-Scenes/Sensor-Deployment-Architecture-Diagram.png)

*Sensor Deployment Architecture Diagram*

![Sensor Coordinate System](https://global-blog.oss-ap-southeast-1.aliyuncs.com/abaka/20250325-OmniHD-Scenes/Sensor-Coordinate-System.png)

*Sensor Coordinate System*

The contributions of this study can be summarized as follows.

1. Novel Dataset: Introduces OmniHD-Scenes, the first multimodal dataset with 4D radar point clouds for 3D object detection, multi-object tracking, and occupancy prediction.
2. Extensive Data: Features 1501 clips (450K+ frames, 5.85M data points) covering diverse urban driving scenarios, including challenging conditions such as rain and night.
3. Advanced Annotation: A novel 4D annotation pipeline leverages temporal data for efficient and accurate 3D bounding-box and semantic segmentation annotation.
4. Comprehensive Benchmarks: Establishes benchmarks for 3D object detection and occupancy prediction, including baseline models using various sensor modalities.

## Comparison with Public Datasets

![](https://global-blog.oss-ap-southeast-1.aliyuncs.com/abaka/20250325-OmniHD-Scenes/Comparison-with-Publicly-Datasets.png)

Our OmniHD-Scenes dataset contains data from **six 4D radar systems** acquired together with high-resolution cameras and LiDAR, and it also covers **day/night and bad-weather scenarios** to accommodate more complex working conditions. The proposed dataset additionally includes **segmentation and occupancy labels**, thereby fully supporting **multi-view and multi-sensor tasks**.
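As a quick reader-side consistency check on the headline numbers above: 1501 clips of roughly 30 s, at a synchronized rate of about 10 Hz (the rate is inferred from the figures rather than stated here), across the 13 sensors (one LiDAR, six cameras, six 4D radars) reproduce the quoted frame and data-point counts.

```python
# Reader-side sanity check of the headline statistics (the 10 Hz rate is inferred, not quoted).
clips, seconds_per_clip, sync_rate_hz = 1501, 30, 10
sensors = 1 + 6 + 6                            # LiDAR + cameras + 4D imaging radars

frames = clips * seconds_per_clip * sync_rate_hz
data_points = frames * sensors

print(f"{frames:,} synchronized frames")       # 450,300   -> "more than 450K"
print(f"{data_points:,} sensor data points")   # 5,853,900 -> "more than 5.85 million"
```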
## Sensor And Controller Specifications

![](https://global-blog.oss-ap-southeast-1.aliyuncs.com/abaka/20250325-OmniHD-Scenes/Sensor-And-Controller-Specifications.png)

## Annotation

![MooreData Platform Annotation Pipeline](https://global-blog.oss-ap-southeast-1.aliyuncs.com/abaka/20250325-OmniHD-Scenes/MooreData-Platform-Annotation-Pipeline.png)

*MooreData Platform Annotation Pipeline*

Since Tesla introduced the concept of 4D annotation, it has become a pivotal component of the data closed loop. This annotation technique uses ego poses to establish temporal relationships and represents traffic participants and road information over a period of time through dense point-cloud reconstruction. Compared with traditional 3D annotation, the reconstructed map is denser, exhibits stronger global consistency, offers better visual quality, significantly reduces repetitive work, and uses more prior information to ensure data reliability. Using 4D tools for data generation can therefore substantially lower data production costs and improve data quality. In this study, we implemented semi-automatic 4D annotation using the MooreData platform solution, with the data processed in clips.
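The core mechanic behind this is simple to sketch: per-frame point clouds are lifted into a common world frame with the ego pose and concatenated into one dense, clip-level cloud. The NumPy sketch below is illustrative only; the array layout and 4x4 pose convention are assumptions, not the MooreData platform's actual interface.

```python
# Illustrative pose-based accumulation behind 4D annotation (not the MooreData interface).
import numpy as np

def accumulate_clip(points_per_frame: list[np.ndarray], ego_poses: list[np.ndarray]) -> np.ndarray:
    """points_per_frame: per-frame (N_i, 3) LiDAR points in the ego frame.
    ego_poses: per-frame (4, 4) homogeneous ego-to-world transforms (assumed convention).
    Returns a dense (sum N_i, 3) clip-level cloud in the world frame."""
    world_chunks = []
    for pts, pose in zip(points_per_frame, ego_poses):
        homogeneous = np.hstack([pts, np.ones((pts.shape[0], 1))])   # (N_i, 4)
        world_chunks.append((homogeneous @ pose.T)[:, :3])           # rotate + translate
    return np.vstack(world_chunks)

# Toy usage: two frames, the second captured after the ego moved 1 m forward along x.
frame0 = np.array([[5.0, 0.0, 0.0]])
frame1 = np.array([[4.0, 0.0, 0.0]])
pose0 = np.eye(4)
pose1 = np.eye(4)
pose1[0, 3] = 1.0
print(accumulate_clip([frame0, frame1], [pose0, pose1]))  # both observations map to world x = 5
```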
## Visualization Annotation Result

* [Annotation Result Display - Rain Scene](https://global-blog.oss-ap-southeast-1.aliyuncs.com/2077ai/20241212/OmniHD-Scenes/annotation_results_isplay_rain_scene.mp4)
* [Annotation Result Display - Night Scene](https://global-blog.oss-ap-southeast-1.aliyuncs.com/2077ai/20241212/OmniHD-Scenes/annotation_results_isplay_night_scene.mp4)
* [Annotation Results Display - Day Scenario](https://global-blog.oss-ap-southeast-1.aliyuncs.com/2077ai/20241212/OmniHD-Scenes/annotation_results_isplay_day_scenario.mp4)
* [Occupancy Ground Truth](https://global-blog.oss-ap-southeast-1.aliyuncs.com/2077ai/20241212/OmniHD-Scenes/occupancy_ground_truth.mp4)
## Dataset Statistics

The OmniHD-Scenes dataset predominantly covers urban areas and overpass roads with complex traffic patterns, and it spans a range of weather conditions and times of day. Significantly, it includes a substantial number of challenging scenarios, such as rainy conditions (33%) and night scenes (28%), offering valuable opportunities for developing and evaluating more robust perception algorithms.

![Scene element distribution](https://global-blog.oss-ap-southeast-1.aliyuncs.com/abaka/20250325-OmniHD-Scenes/Scene-Element-Distribution.png)

Moreover, a notable feature of OmniHD-Scenes is its incorporation of multi-view 4D radar. The figure below illustrates the per-frame point-cloud count distributions for both the LiDAR and the 4D radar data. The LiDAR counts per frame are concentrated between 180K and 210K points, whereas those of the 4D radar mostly range from 2K to 4K points. In terms of point-cloud density, the 4D radar data are therefore conspicuously sparse. This sparseness should encourage researchers to explore more effective ways of exploiting the unique characteristics of 4D radar, or to fuse it with other modalities for better perception.

![Keyframe point-cloud counts](https://global-blog.oss-ap-southeast-1.aliyuncs.com/abaka/20250325-OmniHD-Scenes/Keyframe-PointCloud-Counts.png)

![3D OD Model Performance](https://global-blog.oss-ap-southeast-1.aliyuncs.com/abaka/20250325-OmniHD-Scenes/Comparison-of-3D-Object-Detection.png)

*3D OD Model Performance*
alt=\\\"OCC Model Performance\\\" style=\\\"width: 100%; max-height: 60vh; object-fit: contain; background: #141414; border-radius: 8px\\\">\u003C/img>\\n \u003Cp class=\\\"img-text\\\" style=\\\"position: absolute; top: calc(100% + 16px); left: 0; right: 0;text-align: center; overflow: hidden; white-space: nowrap; text-overflow: ellipsis; line-height: 22px; color: #A1A1A1; font-size: 14px\\\">\\n OCC Model Performance\\n \u003C/p>\\n\u003C/div>\\n\\n## Conclusion And Future Work\\n\\nIn the future, we plan to expand OmniHD-Scenes to facilitate additional tasks and benchmarks, including trajectory prediction, visual grounding, and end-to-end autonomous driving.\\n\\n## ACKNOWLEDGEMENT\\n\\nThanks to Intelligent Connected Vehicle inspection Center (Hunan) of CAERI Co., Ltd and Shanghai Geometrical Perception and Learning Co., Ltd.\\n\\n\\n\",\"headings\":[{\"level\":2,\"text\":\"Overview\",\"children\":[]},{\"level\":2,\"text\":\"Abstract\",\"children\":[]},{\"level\":2,\"text\":\"Overview\",\"children\":[]},{\"level\":2,\"text\":\"Sensor And Controller Specifications\",\"children\":[]},{\"level\":2,\"text\":\"Annotation\",\"children\":[]},{\"level\":2,\"text\":\"Visualization Annotation Result\",\"children\":[]},{\"level\":2,\"text\":\"Dataset Statistics\",\"children\":[]},{\"level\":2,\"text\":\"Conclusion And Future Work\",\"children\":[]},{\"level\":2,\"text\":\"ACKNOWLEDGEMENT\",\"children\":[]}]},{\"slug\":\"pin-dataset\",\"link\":\"/blog/pin-dataset\",\"title\":\"PIN Dataset: A Unified Paradigm for Multimodal Learning\",\"desc\":\"077AI Foundation is proud to introduce our new project, the PIN Multimodal Dataset Document. \",\"bannerImg\":\"https://global-blog.oss-ap-southeast-1.aliyuncs.com/2077ai/20250508/BannerImg.png\",\"date\":\"2024-12-01\",\"content\":\"\\n\\n# PIN Dataset: A Unified Paradigm for Multimodal Learning\\n\\n2077AI Foundation is proud to introduce our new project, the PIN Multimodal Dataset Document. This initiative emerges from our in-depth analysis of the current developmental bottlenecks in multimodal large models, aiming to catalyze exponential growth in multimodal AI through innovative data formatting.\\n\\n\u003Cdiv class=\\\"img-wrap has-caption center\\\" style=\\\"width: 100%; position: relative; margin-bottom: 62px\\\">\\n \u003Cimg src=\\\"https://global-blog.oss-ap-southeast-1.aliyuncs.com/2077ai/20250508/PIN-14M-dataset-Statistics.png\\\" alt=\\\"General statisitcs of our PIN-14M dataset\\\" style=\\\"width: 100%; max-height: 60vh; object-fit: contain; background: #141414; border-radius: 8px\\\">\u003C/img>\\n \u003Cp class=\\\"img-text\\\" style=\\\"position: absolute; top: calc(100% + 16px); left: 0; right: 0;text-align: center; overflow: hidden; white-space: nowrap; text-overflow: ellipsis; line-height: 22px; color: #A1A1A1; font-size: 14px\\\">\\n General statisitcs of our PIN-14M dataset\\n \u003C/p>\\n\u003C/div>\\n\\n## **PIN: A Unifying Paradigm for Multimodal Learning**\\n\\nThe core philosophy behind the PIN project is to establish a data format that unifies multimodal learning processes and patterns. We've observed that while the text domain has solidified a \\\"text-in, text-out\\\" paradigm, the multimodal realm still lacks a cohesive, efficient training methodology. 
The PIN format is designed to bridge this gap, offering a more intuitive way to blend image and text data during training.

<div class="img-wrap has-caption center" style="width: 100%; position: relative; margin-bottom: 62px">
 <img src="https://global-blog.oss-ap-southeast-1.aliyuncs.com/2077ai/20250508/PIN-14M-dataset-Comparative-Analysis.png" alt="Comparative analysis of traditional multimodal formats versus the proposed PIN format. The PIN format preserves rich knowledge attributes (e.g., bolding, highlighting, code), supports semantic interplay between images and text in markdown files, and enhances knowledge representation through an overall image." style="width: 100%; max-height: 60vh; object-fit: contain; background: #141414; border-radius: 8px"></img>
 <p class="img-text" style="position: absolute; top: calc(100% + 16px); left: 0; right: 0;text-align: center; overflow: hidden; white-space: nowrap; text-overflow: ellipsis; line-height: 22px; color: #A1A1A1; font-size: 14px">
 Comparative analysis of traditional multimodal formats versus the proposed PIN format. The PIN format preserves rich knowledge attributes (e.g., bolding, highlighting, code), supports semantic interplay between images and text in markdown files, and enhances knowledge representation through an overall image.
 </p>
</div>

## **Innovative Data Architecture**

PIN introduces a unique "holistic pairing with interwoven content" structure:

* Holistic Pairing: Each sample comprises a Markdown file coupled with a corresponding comprehensive (overall) image.

* Interwoven Content: The Markdown file encapsulates text and embedded images that are intricately linked to the comprehensive image.

This architecture addresses the limitations of current image-text pair formats, such as overly terse captions and weak image-text relevance (a hypothetical record sketch follows the samples figure below).

**Markdown: The Optimal Knowledge Vehicle**

* Facilitates detailed articulation of knowledge attributes

* Supports multi-modal embedding (text, images, code, etc.)

* Preserves document structural integrity

**Comprehensive Image: Dense Visual Knowledge Representation**

* Encapsulates rich visual data, including page layouts and design elements

* Excels at representing complex visual constructs such as code blocks and diagrams

<div class="img-wrap has-caption center" style="width: 100%; position: relative; margin-bottom: 62px">
 <img src="https://global-blog.oss-ap-southeast-1.aliyuncs.com/2077ai/20250508/PIN-14M-dataset-Samples.png" alt="Samples from various subsets of the PIN-14M dataset. For each subset, one entry is extracted, showcasing both its markdown file section and the corresponding overall image." style="width: 100%; max-height: 60vh; object-fit: contain; background: #141414; border-radius: 8px"></img>
 <p class="img-text" style="position: absolute; top: calc(100% + 16px); left: 0; right: 0;text-align: center; overflow: hidden; white-space: nowrap; text-overflow: ellipsis; line-height: 22px; color: #A1A1A1; font-size: 14px">
 Samples from various subsets of the PIN-14M dataset. For each subset, one entry is extracted, showcasing both its markdown file section and the corresponding overall image.
 </p>
</div>
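To make the pairing concrete, the snippet below sketches what one PIN-style record could look like when serialized as a single JSONL line. All field names here (`markdown`, `embedded_images`, `overall_image`, `quality_signals`) are invented for illustration and are not the official PIN-14M schema; consult the released dataset for the authoritative format.

```python
# Hypothetical sketch of one PIN-style sample: a Markdown document holistically
# paired with its rendered overall image, plus the images interwoven in the text.
# Field names are illustrative only, NOT the official PIN-14M schema.
import json

sample = {
    "id": "pin-sample-000001",
    "markdown": (
        "# Gradient Descent\n\n"
        "We minimize f(x) by stepping along the **negative gradient**.\n\n"
        "![loss curve](images/loss_curve.png)\n"
    ),
    "embedded_images": ["images/loss_curve.png"],        # images referenced inside the Markdown
    "overall_image": "overall/pin-sample-000001.png",    # full-page rendering preserving layout and style
    "quality_signals": {"text_image_relevance": 0.95},   # illustrative quality indicator
}

# One sample per line keeps a shard streamable.
with open("pin_shard_000.jsonl", "a", encoding="utf-8") as f:
    f.write(json.dumps(sample, ensure_ascii=False) + "\n")
```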
## **Pioneering a Unified Format for the Future**

PIN is not merely a novel data format; it is the genesis of a unified standard for multimodal data. We envision PIN as a blueprint for future dataset creation, encompassing:

* Conceptual Framework: A detailed exposition of the dataset's design philosophy and objectives

* Procedural Transparency: Full disclosure of the data processing workflows

* Quality Metrics: Integration of reliable quality indicators within the data

<div class="img-wrap has-caption center" style="width: 100%; position: relative; margin-bottom: 62px">
 <img src="https://global-blog.oss-ap-southeast-1.aliyuncs.com/2077ai/20250508/Process-Workflow.png" alt="The overview of our process workflow" style="width: 100%; max-height: 60vh; object-fit: contain; background: #141414; border-radius: 8px"></img>
 <p class="img-text" style="position: absolute; top: calc(100% + 16px); left: 0; right: 0;text-align: center; overflow: hidden; white-space: nowrap; text-overflow: ellipsis; line-height: 22px; color: #A1A1A1; font-size: 14px">
 The overview of our process workflow
 </p>
</div>

<div class="img-wrap has-caption center" style="width: 100%; position: relative; margin-bottom: 62px">
 <img src="https://global-blog.oss-ap-southeast-1.aliyuncs.com/2077ai/20250508/Tree-Structure.png" alt="The file tree structure of an example dataset in PIN format." style="width: 100%; max-height: 60vh; object-fit: contain; background: #141414; border-radius: 8px"></img>
 <p class="img-text" style="position: absolute; top: calc(100% + 16px); left: 0; right: 0;text-align: center; overflow: hidden; white-space: nowrap; text-overflow: ellipsis; line-height: 22px; color: #A1A1A1; font-size: 14px">
 The file tree structure of an example dataset in PIN format.
 </p>
</div>

<div class="img-wrap has-caption center" style="width: 100%; position: relative; margin-bottom: 62px">
 <img src="https://global-blog.oss-ap-southeast-1.aliyuncs.com/2077ai/20250508/Jsonl-File-Example.png" alt="An example data sample from the JSONL files." style="width: 100%; max-height: 60vh; object-fit: contain; background: #141414; border-radius: 8px"></img>
 <p class="img-text" style="position: absolute; top: calc(100% + 16px); left: 0; right: 0;text-align: center; overflow: hidden; white-space: nowrap; text-overflow: ellipsis; line-height: 22px; color: #A1A1A1; font-size: 14px">
 An example data sample from the JSONL files.
 </p>
</div>

## **Conclusion**

We have already released the initial dataset, PIN-14M. Our next iteration aims to expand the data scale to 100M samples while open-sourcing training models to foster community research.

Although comprehensive training experiments are still pending due to time and resource constraints, the PIN format opens avenues for several innovative training methodologies (a minimal sketch follows this list), including:

* Markdown-based prediction of comprehensive images

* Extraction of structured textual knowledge from comprehensive images

* Multi-task learning: concurrent training on text comprehension, image understanding, and cross-modal reasoning
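As a rough illustration of how such objectives might consume PIN-style data, the sketch below reuses the hypothetical JSONL fields from the earlier example and derives one training item per direction from each sample; it is not the project's actual training code.

```python
# Sketch: derive multi-task training items from PIN-style JSONL samples.
# Reuses the hypothetical fields from the earlier example; not an official pipeline.
import json
from typing import Iterator

def iter_samples(path: str) -> Iterator[dict]:
    """Stream samples from a JSONL shard, one JSON object per line."""
    with open(path, encoding="utf-8") as f:
        for line in f:
            if line.strip():
                yield json.loads(line)

def build_training_items(sample: dict) -> list[dict]:
    """Turn one sample into items for three hypothetical objectives."""
    md, overall = sample["markdown"], sample["overall_image"]
    return [
        {"task": "md_to_image", "input": md, "target": overall},  # predict the overall image from Markdown
        {"task": "image_to_md", "input": overall, "target": md},  # extract structured text from the image
        {"task": "interleaved_lm", "input": md, "target": md},    # language modeling over interleaved content
    ]

if __name__ == "__main__":
    for sample in iter_samples("pin_shard_000.jsonl"):
        for item in build_training_items(sample):
            print(item["task"], len(str(item["input"])))  # stand-in for dispatching to a trainer
```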
The PIN project represents a significant stride towards our vision of a more intuitive, unified, and robust learning paradigm in multimodal AI. By addressing the fundamental challenge of data formatting, we believe PIN will pave the way for transformative advances in multimodal large models. While there are still many areas to explore, we are confident that collaborative efforts within the community will accelerate progress, ultimately leading to a quantum leap in multimodal AI capabilities.

We invite the global community of researchers and developers to join us in exploring the full potential of the PIN format and pushing the boundaries of multimodal AI. Together, let's shape the future of this exciting new era in artificial intelligence.

## **Commitment to Open Source**

Adhering to the principles of openness and sharing, we have open-sourced not only all the data but also the code for every processing pipeline, including filtering, deduplication, and document conversion. This allows researchers and developers to examine every step of data processing and to adjust and optimize it for their own needs.

## **Join Us in Shaping the Future of AI**

The launch of the PIN dataset marks a new milestone in AI pre-training data. We cordially invite AI researchers and developers worldwide to join us in leveraging this rich resource, pushing the boundaries of AI technology, and collectively shaping the future of AI.