Data Scientist & AI Engineer

Hi, I'm Ismail Ahouari

About

Here is a little background

I am a Master's student in Data Science at the University of Milano-Bicocca with a strong technical foundation in applied mathematics, deep learning, and large-scale data processing. My work spans distributed learning, NLP, and LLM-based infrastructure, with a focus on building scalable, production-ready AI systems that integrate machine learning with robust backend and MLOps components. Passionate about applied AI engineering, I enjoy turning complex technical concepts into practical, high-impact solutions, from image and signal processing models to full MLOps pipelines and agentic LLM architectures. Skilled in Python, PyTorch, FastAPI, Docker, and cloud-native tooling, I thrive in environments that value autonomy, clean architecture, and performance-driven execution.

Experience

Data Science Associate

LISER (Luxembourg Institute of Socio-Economic Research)

Implemented a scalable multilingual semantic classification pipeline using Pandas and Polars for efficient large-scale text data processing
Built data preprocessing modules with BeautifulSoup for HTML extraction and spaCy for text normalization and deduplication
Integrated Stanza for language-specific sentence segmentation across multilingual NLP corpora
Developed keyword extraction using Sentence-Transformers (Hugging Face) with semantic similarity for AI-related indicator identification
Benchmarked the semantic similarity pipeline against LLM-based models (OpenAI GPT, Mixtral) to assess accuracy

Data Science Intern

C2DH - University of Luxembourg

Implemented a Wikibase-based knowledge graph using Docker containers for isolated, reproducible environments
Automated data ingestion pipelines (Python & Wikibase API) to process structured records with semantic annotations
Developed a modular relational data model managing RDF triples across 19 reusable properties
Created a multilingual support system with automated translation capabilities for Luxembourgish accessibility
Established data reconciliation workflows using OpenRefine for cross-platform entity linking
Enabled a SPARQL query interface for complex historical research queries and data visualization

Skills

Projects

Project 1: Agentic LLM Infrastructure with MCP Tools

Architected a scalable LLM orchestration system with Cloudflare Workers, Durable Objects, ReAct agents, and MCP tools. Engineered a session layer with persistent memory, tenant isolation, and real-time SSE streaming.

Project 2: Split Learning Performance Benchmark (SLPerf)

Benchmarked Vanilla SL, U-Shaped SL, and SplitFed under IID and non-IID settings for vision and GNN tasks using MPI for distributed communication. Master's thesis research on privacy-preserving distributed deep learning.

Project 3: Digital Image & Signal Processing

Face retrieval with MobileNetV2 and KD-Tree for similarity search. Food-101 classification with ResNet50/Inception (51 classes). Music genre classification using SVM/CNN with MFCCs, Spectrograms, and Chroma features.
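As a rough illustration of the KD-Tree similarity search mentioned above, here is a minimal pure-Python sketch. It assumes embeddings have already been extracted (for example by a network such as MobileNetV2); the 2-D vectors, the `img_*` labels, and the function names `build_kdtree` and `nearest` are all hypothetical and chosen only for illustration, not taken from the project code.

```python
import math


def build_kdtree(points, depth=0):
    """Recursively build a KD-tree as nested dicts from (vector, label) pairs."""
    if not points:
        return None
    axis = depth % len(points[0][0])  # cycle through the dimensions
    points = sorted(points, key=lambda p: p[0][axis])
    mid = len(points) // 2
    return {
        "point": points[mid],
        "axis": axis,
        "left": build_kdtree(points[:mid], depth + 1),
        "right": build_kdtree(points[mid + 1:], depth + 1),
    }


def nearest(node, query, best=None):
    """Return (distance, label) of the stored vector closest to the query."""
    if node is None:
        return best
    vector, label = node["point"]
    dist = math.dist(vector, query)
    if best is None or dist < best[0]:
        best = (dist, label)
    diff = query[node["axis"]] - vector[node["axis"]]
    near, far = (
        (node["left"], node["right"]) if diff < 0 else (node["right"], node["left"])
    )
    best = nearest(near, query, best)
    # Visit the far side only if the splitting plane is closer than the best match.
    if abs(diff) < best[0]:
        best = nearest(far, query, best)
    return best


# Toy "embeddings" (hypothetical): real face embeddings have far more dimensions.
gallery = [
    ((0.1, 0.9), "img_001"),
    ((0.8, 0.2), "img_002"),
    ((0.5, 0.5), "img_003"),
    ((0.9, 0.9), "img_004"),
]
tree = build_kdtree(gallery)
match = nearest(tree, (0.75, 0.25))
print(match[1])
```

In practice a library implementation would likely replace this hand-written tree; the sketch only shows how the tree prunes branches during a nearest-neighbor query.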

Project 4: RAG-Based Conversational AI System

Built an intelligent document chatbot using a RAG architecture with a FastAPI backend, LLaMA3 LLM, all-mpnet-base-v2 embeddings, and ChromaDB vector storage. Features document ingestion, session tracking, and a Streamlit frontend.

Project 5: MLOps Pipeline Project

Complete MLOps Zoomcamp project: data ingestion, preprocessing, model training, deployment, and monitoring. Deployed containerized ML models on AWS (EC2, S3) with MLflow for experiment tracking and automated CI/CD pipelines.

Project 6: Spotify Data Analysis & Visualization

Music trends analysis (2010-2022) using the Spotify API and web scraping for lyrics. Applied TF-IDF for keyword extraction and built a Neo4j graph database for network-based exploration of songs, artists, and themes.
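The TF-IDF keyword extraction step can be sketched in a few lines of standard-library Python. This is only an illustrative version under simple assumptions: whitespace tokenization, unsmoothed IDF, and made-up toy lyrics. The helper name `tfidf_keywords` is hypothetical and does not come from the project.

```python
import math
from collections import Counter


def tfidf_keywords(docs, top_k=3):
    """Return the top_k highest-scoring TF-IDF terms for each document."""
    tokenized = [doc.lower().split() for doc in docs]
    n_docs = len(tokenized)
    # Document frequency: number of documents that contain each term.
    df = Counter(term for tokens in tokenized for term in set(tokens))
    results = []
    for tokens in tokenized:
        tf = Counter(tokens)
        scores = {
            term: (count / len(tokens)) * math.log(n_docs / df[term])
            for term, count in tf.items()
        }
        # Sort by score (descending), then alphabetically for stable output.
        ranked = sorted(scores, key=lambda t: (-scores[t], t))
        results.append(ranked[:top_k])
    return results


# Toy lyrics (invented for illustration only).
lyrics = [
    "love love night dance",
    "love road home road",
    "night city lights city",
]
keywords = tfidf_keywords(lyrics, top_k=1)
print(keywords)
```

Terms that appear in many songs (here "love" and "night") are down-weighted, so the terms distinctive to each song rank highest.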

Contact

Let's build something amazing together. Get in touch.

ismailahouari123@gmail.com

Milano, Italy
