@irvingwhale
I’ve prepared a limited Hugging Face sample page for the EN-MY 2.1M Parallel Corpus.
This is an English–Burmese/Myanmar parallel text corpus of approximately 2.1M aligned segment pairs, developed through professional in-house media localization workflows, including transcription, subtitle preparation, translation, alignment, and QA processing.
The buyer-facing release is randomized, de-contextualized, and provided without title-level files, source file names, or project metadata.
Use cases:
Machine translation, bilingual NLP, LLM evaluation, bilingual retrieval, and low-resource language development.
If your team works on Burmese/Myanmar language AI, machine translation, or low-resource language datasets, feel free to connect or message me.
#BurmeseAI #MyanmarLanguage #LowResourceLanguages #MachineTranslation #NLP #LLM #AIData #DataPartnerships #ParallelCorpus #MultilingualAI
https://huggingface.co/datasets/irvingaungzinpyae/en-my-2.1m-parallel-corpus