M-A-P Matrix: A Massive Bilingual Dataset for LLM Pretraining