CV
Education
- PhD in Informatics — University of Edinburgh — 2020-2024 (submitted, pending viva)
- 3-year doctoral programme at the Institute for Language, Cognition and Computation (ILCC), working with the StatMT group.
- Research on multi-encoder models that supply additional context to neural machine translation systems, in order to analyse and improve them.
- Supervised by Kenneth Heafield and Alexandra Birch.
- MSc in Informatics — University of Edinburgh — 2016-2017
- 1-year postgraduate degree in Informatics, specialised in Natural Language Processing.
- Graduated with distinction.
- Courses included Machine Translation, Accelerated Natural Language Processing, Machine Learning and Pattern Recognition, Machine Learning Practical (Deep Learning).
- MSc in Computer Science — St. Xavier’s College, Kolkata — 2014-2016
- 2-year postgraduate degree in Computer Science.
- Graduated with 82% marks.
- Courses included Artificial Intelligence, Data Mining and Warehousing, Image Processing and Pattern Recognition.
- BSc in Computer Science — St. Xavier’s College, Kolkata — 2011-2014
- 3-year undergraduate degree with honours in Computer Science, and Mathematics and Physics as general subjects.
- Graduated with 80% marks.
Experience
- Aveni — Senior NLP Engineer — Edinburgh, Aug 2024-present
Building LLMs for NLP applications in the finance domain.
- Efficient Translation Limited — Deep Learning Engineer (part-time) — Edinburgh, Dec 2023-Apr 2024
Corpus extraction and efficient low-resource machine translation.
- Trained efficient machine translation and corpus cleaning models for low-resource language pairs.
- Ran and optimised an efficient, scalable parallel corpus extraction pipeline on web-scale data.
- Delivered datasets and models to customers on time and to requirements.
- University of Zurich — Visiting Researcher — Zurich, Mar 2023-May 2023
Three-month visit, conducting research on detection and analysis of underspecification of the source sentence in machine translation, supervised by Dr. Rico Sennrich at the Department of Computational Linguistics.
- Amazon AWS AI — Applied Scientist Intern — Santa Clara, Nov 2022-Feb 2023
Worked on isochronous machine translation for automatic dubbing. Co-organised the dubbing track at IWSLT 2023.
- TAUS — Data Engineer — Amsterdam, Jun-Oct 2020
Worked on the EU-funded ParaCrawl project to collect parallel corpora from large-scale web crawls.
- Optimised, maintained, and ran a highly scalable processing pipeline to extract, translate, align, and clean parallel corpora from web crawling data.
- Consolidated and released the ParaCrawl corpus v7.0 and v7.1, comprising hundreds of millions of sentence pairs in many languages.
- Unbabel — Junior AI Researcher — Lisbon, Feb-Apr 2020
Machine translation and quality estimation for customer-facing products.
- Built domain-specific machine translation models.
- Built quality estimation models to skip human post-editing for high-quality MT output.
- World Intellectual Property Organization (WIPO) — Fellow in Machine Translation — Geneva, Feb 2018-Jan 2020
Development and maintenance of WIPO Translate and related NLP tools and technologies.
- WIPO Translate: Built, improved, evaluated and deployed domain-specific neural and statistical machine translation models using the Marian and Moses toolkits.
- IPCCAT: Developed neural text classification systems for patent categorisation.
- Developed a system to retrieve semantically similar content from large collections of text using sentence embeddings and Faiss indexes.
- Instrumental in the training and deployment of neural MT systems at several other international organisations and patent offices including IMF, OECD, WTO, IAEA, and KIPO.
- University of Edinburgh — Research Assistant — Edinburgh, Sep-Dec 2017
Low-resource domain-specific machine translation research on the MeMaT project. Supervised by Kenneth Heafield and Alexandra Birch.
- Worked on an EPSRC GCRF-funded project in collaboration with the University of Cape Town to build a machine translation system to facilitate communication in the medical domain between isiXhosa-speaking patients and English-speaking doctors in health centres in South Africa.
- Collected corpora that were released as a public resource.
Also see: Projects
Technical Skills
- Python
- PyTorch
- NumPy
- scikit-learn
- C++
- Marian
- Git
- Bash
- Perl
- Docker
Language Skills
- Bengali - Mother tongue
- English - Native
- French - Intermediate (B1)
- Hindi - Fluent
- Chinese (Mandarin) - Basic