Centre for Linguistic Science and Technology
Indian Institute of Technology, Guwahati
Email: pankajchoudhury[AT]iitg.ac.in
LinkedIn | HuggingFace | Google Scholar
I am a Ph.D. scholar at the Centre for Linguistic Science and Technology, Indian Institute of Technology Guwahati, specializing in the intersection of Natural Language Processing (NLP) and Computer Vision (CV), also known as Vision-Language Understanding (VLU).
My research primarily focuses on developing Automatic Image Captioning systems for low-resource languages, with a special emphasis on the Assamese language. While mainstream AI technologies are often optimized for resource-rich languages like English, my work seeks to bridge this gap by designing models that are linguistically aware, data-efficient, and culturally inclusive.
Over the course of my doctoral work, I have:
Introduced novel architectures that integrate semantic priors and spatially encoded transformer models for generating linguistically accurate and contextually meaningful captions.
Demonstrated how language-centric datasets and customized training strategies outperform generic transfer-learning approaches.
Trained and released AssameseGPT2, a GPT-2 model built from scratch on IndicCorpV2, and worked on extending it into multimodal captioning systems.
I have collaborated on multiple projects in LLM training, multimodal AI, and dataset creation. Beyond technical contributions, I have also served as a faculty resource person and industry trainer, delivering sessions on Generative AI, Machine Learning, and Deep Learning for academic institutions and professional training programs.
My long-term vision is to build scalable, inclusive AI frameworks that not only advance research but also create meaningful real-world impact by enabling AI for everyone, regardless of language or resource availability.
Freelance Faculty – Imarticus Learning (Jan 2023 – Jul 2024)
Taught Artificial Intelligence & Machine Learning.
Tutor – E&ICT Academy, IIT Guwahati (Dec 2022 – May 2023)
Conducted courses on Data Analytics & Data Science.
Junior Research Fellow – Dept. of CSE, IIT Guwahati (2017 – 2019)
Project: e-Varaha – Safe Pork Production in North-East India.
Project Fellow – Dept. of CSE, IIT Guwahati (2016 – 2017)
Project: Reducing Cache Access Time in Tiled Chip Multiprocessors.
My research focuses on the intersection of Natural Language Processing (NLP) and Computer Vision (CV), known as Vision-Language Understanding (VLU), with an emphasis on low-resource Indian languages.
My doctoral thesis focuses on automatic image caption generation for the low-resource Assamese language. Automatic image caption generation is the task of producing natural language descriptions of images. My primary research goal is to extend this task to low-resource languages, where models must not only understand visual content but also generate semantically and syntactically accurate descriptions in a linguistically rich setting. Through my work on image captioning, I aim to design computationally efficient multimodal AI systems that are linguistically aware and culturally relevant.
Key areas of interest
1. P. Choudhury, S. Nair, P. Guha, S. Nandi
Image Captioning in Low Resource Assamese Language with Semantic Information Prior and Spatially Encoded Transformer Model
Expert Systems with Applications, 2025, Vol. 297, p. 129479
Link
2. P. Choudhury, P. Guha, S. Nandi
Exploring Semantic Attributes for Image Caption Synthesis in Low-Resource Assamese Language
ACM Transactions on Asian and Low-Resource Language Information Processing (TALLIP), 2025
Link
3. P. Choudhury, P. Guha, S. Nandi
Impact of Language-Specific Training on Image Caption Synthesis: A Case Study on Low-Resource Assamese Language
International Journal of Asian Language Processing (IJALP), 2024
Link
1. P. Choudhury, P. Guha, S. Nandi
Relevance of Language-Specific Training on Image Caption Synthesis for Low Resource Assamese Language
International Conference on Asian Language Processing (IALP), Singapore, 2023, pp. 13–18
Link
2. P. Choudhury, P. Guha, S. Nandi
Image Caption Synthesis for Low Resource Assamese Language using Bi-LSTM with Bilinear Attention
Proceedings of the 37th Pacific Asia Conference on Language, Information and Computation (PACLIC), 2023, pp. 743–752
Link
3. P. Choudhury, Y. Aggarwal, P. Jadhav, P. Guha, S. Nandi
AC-Lite: A Lightweight Image Captioning Model for Low-Resource Assamese Language
Accepted at CVIP-2025 (to appear)
Link
4. N. Rahman, P. Choudhury, P. Guha, A. Anand, S. Nandi
Visual Question Answering in Low-Resource Assamese Language – Datasets and Evaluation
9th International Conference on Computer Vision and Image Processing (CVIP), Springer LNCS, 2024, pp. 159–174
Link
5. Y. Aggarwal, P. Choudhury, P. Guha
Face Detection in Challenging Scenes with a Customized Backbone
8th International Conference on Computer Vision & Image Processing (CVIP-2023), pp. 468–482
Link
6. C. Kirti, P. Choudhury, A. Anand, P. Guha
An Annotated Corpus for Realis Event Detection in Short Stories Written in English and Low Resource Assamese Language
20th International Conference on Natural Language Processing (ICON), 2023, pp. 72–81
Link
7. M. P. Lahkar, A. Gogoi, P. Choudhury
AsCul: Annotated Dataset and a Deep Learning based Framework for Assamese Cultural Object Detection
International Conference on Intelligent Systems, Advanced Computing and Communication (ISACC), 2023, pp. 1–5
Link