
Data Science Associate
LISER (Luxembourg Institute of Socio-Economic Research)

Jan 2025 - Apr 2025
- Implemented a scalable multilingual semantic classification pipeline using Pandas and Polars for efficient large-scale text data processing
- Built data preprocessing modules using BeautifulSoup for HTML extraction and spaCy for text normalization and deduplication
- Integrated Stanza for language-specific sentence segmentation across multilingual NLP corpora
- Developed keyword extraction with Sentence-Transformers (Hugging Face), using semantic similarity to identify AI-related indicators
- Benchmarked the semantic similarity pipeline against large language models (OpenAI GPT, Mixtral) to assess accuracy
