Longxu Dou 窦隆绪

I am a Research Scientist at Sea AI Lab, focusing on Natural Language Processing, particularly multilingual large language model pre-training and text-to-SQL semantic parsing. I earned my Ph.D. and Bachelor's degrees in Computer Science from Harbin Institute of Technology. I was also a research intern at Microsoft Research Asia with Jian-Guang Lou and at NUS-WING Lab with Professor Min-Yen Kan.

We are actively hiring (Senior) Research Scientists, Engineers, and Interns in NLP/LLM. Internship positions are available both onsite (in Mainland China, Hong Kong, and Singapore) and remotely. I’m always open to discussions and collaborations. Feel free to reach out via email with your background and interests!

Email / Google Scholar / LinkedIn / GitHub

Selected Publications (# indicates mentorship)
Sailor: Open Language Models for South-East Asia
Longxu Dou, Qian Liu, Guangtao Zeng, Jia Guo, Jiahui Zhou, Xin Mao, Ziqi Jin, Wei Lu, Min Lin
EMNLP Demo, 2024
Scaling Laws with Vocabulary: Larger Models Deserve Larger Vocabularies
Chaofan Tao, Qian Liu#, Longxu Dou#, Niklas Muennighoff, Zhongwei Wan, Ping Luo, Min Lin, Ngai Wong
NeurIPS, 2024
RegMix: Data Mixture as Regression for Language Model Pre-training
Qian Liu, Xiaosen Zheng, Niklas Muennighoff, Guangtao Zeng, Longxu Dou, Tianyu Pang, Jing Jiang, Min Lin
Preprint, 2024
Improving Demonstration Diversity by Human-Free Fusing for Text-to-SQL
Dingzirui Wang, Longxu Dou#, Xuanliang Zhang, Qingfu Zhu, Wanxiang Che
EMNLP Findings, 2024
Enhancing Numerical Reasoning with the Guidance of Reliable Reasoning Processes
Dingzirui Wang, Longxu Dou#, Xuanliang Zhang, Qingfu Zhu, Wanxiang Che
ACL, 2024
Exploring Equation as a Better Intermediate Meaning Representation for Numerical Reasoning of Large Language Models
Dingzirui Wang, Longxu Dou#, Wanxiang Che
AAAI, 2024
ConDA: State-Based Data Augmentation for Context-Dependent Text-to-SQL
Dingzirui Wang, Longxu Dou#, Wanxiang Che
JMLC Journal, 2024
MultiSpider: Towards Benchmarking Multilingual Text-to-SQL Semantic Parsing
Longxu Dou, Yan Gao, Mingyang Pan, Dingzirui Wang, Wanxiang Che, Dechen Zhan, Jian-Guang Lou
AAAI, 2023
UniSAr: A Unified Structure-Aware Autoregressive Language Model for Text-to-SQL Semantic Parsing
Longxu Dou, Yan Gao, Mingyang Pan, Dingzirui Wang, Wanxiang Che, Dechen Zhan, Jian-Guang Lou
JMLC Journal, 2023
KnowSQL: Towards Knowledge-Intensive Text-to-SQL Semantic Parsing with Formulaic Knowledge
Longxu Dou, Yan Gao, Xuqi Liu, Mingyang Pan, Dingzirui Wang, Wanxiang Che, Min-Yen Kan, Dechen Zhan, Jian-Guang Lou
EMNLP, 2022
Data2Text Studio: Automated Text Generation from Structured Data
Longxu Dou, Guanghui Qin, Jinpeng Wang, Jin-Ge Yao, Chin-Yew Lin
EMNLP Demo, 2018

Design and source code from Jon Barron.