Ismail Ahouari

Data Scientist & AI Engineer

Hi, I'm Ismail Ahouari

About

Here is a little background

I am a Master's student in Data Science at the University of Milano-Bicocca with a strong technical foundation in applied mathematics, deep learning, and large-scale data processing. My work spans distributed learning, NLP, and LLM-based infrastructure, with a focus on building scalable, production-ready AI systems that integrate machine learning with robust backend and MLOps components. Passionate about applied AI engineering, I enjoy transforming complex technical concepts into practical, high-impact solutions, from image and signal processing models to full MLOps pipelines and agentic LLM architectures. Skilled in Python, PyTorch, FastAPI, Docker, and cloud-native tooling, I thrive in environments that value autonomy, clean architecture, and performance-driven execution.

Experience

Data Science Associate

LISER (Luxembourg Institute of Socio-Economic Research)

Python, Hugging Face, Polars, Pandas, spaCy, Git
  • Implemented a scalable multilingual semantic classification pipeline using Pandas and Polars for efficient large-scale text data processing
  • Built data preprocessing modules using BeautifulSoup for HTML extraction and spaCy for text normalization and deduplication
  • Integrated Stanza for language-specific sentence segmentation across multilingual NLP corpora
  • Developed keyword extraction using Sentence-Transformers (Hugging Face) with semantic similarity for AI-related indicator identification
  • Benchmarked the semantic similarity pipeline against LLM baselines (OpenAI GPT, Mixtral) to assess accuracy
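
The semantic-similarity matching described in the bullets above can be illustrated with a minimal sketch. It assumes embeddings have already been computed (in the actual pipeline, by a Sentence-Transformers model). The tiny hand-written vectors, the helper names `cosine_similarity` and `match_indicators`, and the 0.8 threshold are hypothetical and serve only as an illustration.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)

def match_indicators(sentence_vecs, keyword_vecs, threshold=0.8):
    """Return, per sentence, the keywords whose similarity meets the threshold."""
    matches = {}
    for sent, s_vec in sentence_vecs.items():
        scored = [(kw, cosine_similarity(s_vec, k_vec)) for kw, k_vec in keyword_vecs.items()]
        hits = [(kw, round(score, 3)) for kw, score in scored if score >= threshold]
        # Best match first.
        matches[sent] = sorted(hits, key=lambda h: h[1], reverse=True)
    return matches

# Toy 3-dimensional "embeddings" (hypothetical values, not real model output).
keyword_vecs = {"machine learning": [0.9, 0.1, 0.0], "robotics": [0.0, 0.2, 0.9]}
sentence_vecs = {
    "We train neural networks on text.": [0.8, 0.3, 0.1],
    "Quarterly revenue increased.": [0.1, 0.9, 0.1],
}
result = match_indicators(sentence_vecs, keyword_vecs)
```

In this toy setup only the first sentence is tagged, with "machine learning"; the second sentence stays below the threshold for every keyword.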

Data Science Intern

C2DH - University of Luxembourg

Python, Docker, SPARQL, Wikibase, OpenRefine, Git
  • Implemented a Wikibase-based knowledge graph using Docker containers for isolated, reproducible environments
  • Automated data ingestion pipelines (Python & Wikibase API) to process structured records with semantic annotations
  • Developed a modular relational data model managing RDF triples across 19 reusable properties
  • Created a multilingual support system with automated translation capabilities for Luxembourgish accessibility
  • Established data reconciliation workflows using OpenRefine for cross-platform entity linking
  • Enabled a SPARQL query interface for complex historical research queries and data visualization
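
As a rough illustration of how a SPARQL-style triple pattern selects from RDF triples, here is a minimal in-memory sketch. It does not use Wikibase or a real SPARQL engine. The `match_pattern` helper and all entity and property names are hypothetical.

```python
def match_pattern(triples, pattern):
    """Yield variable bindings for one (subject, predicate, object) pattern.

    Terms starting with '?' are variables; all other terms must match exactly.
    """
    for triple in triples:
        bindings = {}
        for term, value in zip(pattern, triple):
            if term.startswith("?"):
                # A variable repeated within the pattern must bind consistently.
                if bindings.setdefault(term, value) != value:
                    break
            elif term != value:
                break
        else:
            yield bindings

# Hypothetical records, invented for this example.
triples = [
    ("person:1", "bornIn", "place:Esch"),
    ("person:2", "bornIn", "place:Wiltz"),
    ("person:1", "occupation", "miner"),
]

# Roughly what `SELECT ?who ?where WHERE { ?who :bornIn ?where }` expresses.
born = list(match_pattern(triples, ("?who", "bornIn", "?where")))
```

Each result is a dictionary of bindings, one per matching triple, which mirrors the rows a SPARQL endpoint would return for a single-pattern query.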

Skills

Python
C/C++
SQL
R
MLflow
Apache Spark
Docker
Airflow
SQL Server
ChromaDB
PyTorch
TensorFlow
Pandas
FastAPI
Grafana
LangGraph
Hugging Face
Cloudflare
Supabase
Tableau
Git
Flask
AWS
PostgreSQL

Projects

Project 1: Agentic LLM Infrastructure with MCP Tools

LangChain, Cloudflare, Supabase

Architected a scalable LLM orchestration system with Cloudflare Workers, Durable Objects, ReAct agents, and MCP tools. Engineered a session layer with persistent memory, tenant isolation, and real-time SSE streaming.
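
To illustrate the SSE streaming mentioned above, here is a minimal sketch of the Server-Sent Events wire format. The project itself runs on Cloudflare Workers; this Python version only shows how frames are laid out. The `format_sse` and `stream_tokens` helpers, the event names, and the JSON payload shape are assumptions made for the example.

```python
import json

def format_sse(data, event=None, event_id=None):
    """Serialize one Server-Sent Events message as a text frame."""
    lines = []
    if event_id is not None:
        lines.append(f"id: {event_id}")
    if event is not None:
        lines.append(f"event: {event}")
    # Multi-line payloads need one "data:" field per line.
    for chunk in str(data).split("\n"):
        lines.append(f"data: {chunk}")
    # A blank line terminates the message.
    return "\n".join(lines) + "\n\n"

def stream_tokens(tokens):
    """Yield one SSE frame per token, then a final 'done' event."""
    for i, token in enumerate(tokens):
        yield format_sse(json.dumps({"token": token}), event="token", event_id=i)
    yield format_sse("{}", event="done")

frames = list(stream_tokens(["Hel", "lo"]))
```

A server would write these frames to a response with the `text/event-stream` content type, and the client would receive one event per frame.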

Project 2: Split Learning Performance Benchmark (SLPerf)

Python, PyTorch, SLPerf Extra

Benchmarked Vanilla SL, U-Shaped SL, and SplitFed under IID and non-IID settings for vision and GNN tasks using MPI for distributed communication. Master's thesis research on privacy-preserving distributed deep learning.
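
The difference between IID and non-IID client data can be sketched as follows. This shows one common way to build a label-skewed split (sort by label, cut into shards, assign a few shards per client). It is not necessarily the scheme used in the benchmark, and the function names and toy dataset are hypothetical.

```python
import random

def partition_iid(labels, num_clients, seed=0):
    """Shuffle sample indices and deal them out evenly (IID split)."""
    indices = list(range(len(labels)))
    random.Random(seed).shuffle(indices)
    return [indices[i::num_clients] for i in range(num_clients)]

def partition_label_skew(labels, num_clients, shards_per_client=2, seed=0):
    """Sort by label, cut into shards, give each client a few shards (non-IID).

    Samples left over when the dataset does not divide evenly are dropped.
    """
    indices = sorted(range(len(labels)), key=lambda i: labels[i])
    num_shards = num_clients * shards_per_client
    shard_size = len(indices) // num_shards
    shards = [indices[s * shard_size:(s + 1) * shard_size] for s in range(num_shards)]
    random.Random(seed).shuffle(shards)
    return [
        [i for shard in shards[c * shards_per_client:(c + 1) * shards_per_client] for i in shard]
        for c in range(num_clients)
    ]

labels = [i % 10 for i in range(1000)]  # toy dataset: 10 balanced classes
iid = partition_iid(labels, num_clients=5)
skewed = partition_label_skew(labels, num_clients=5)
classes_per_client = [len({labels[i] for i in part}) for part in skewed]
```

With this toy dataset every client in the skewed split sees only two of the ten classes, while the IID split gives each client a random sample drawn from the whole dataset.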

Project 3: Digital Image & Signal Processing

Python, PyTorch, TensorFlow

Face retrieval with MobileNetV2 and KD-Tree for similarity search. Food-101 classification with ResNet50/Inception (51-class subset). Music genre classification using SVM/CNN with MFCCs, spectrograms, and chroma features.
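
The KD-Tree similarity search can be sketched in plain Python as below. In the project, the vectors would be MobileNetV2 embeddings. Here they are hypothetical 2-D points with made-up labels, and the dictionary-based tree is only an illustrative implementation, not the one used in the project.

```python
def build_kdtree(points, depth=0):
    """Recursively build a KD-tree as nested dicts from (vector, label) pairs."""
    if not points:
        return None
    axis = depth % len(points[0][0])
    points = sorted(points, key=lambda p: p[0][axis])
    mid = len(points) // 2
    return {
        "point": points[mid],
        "axis": axis,
        "left": build_kdtree(points[:mid], depth + 1),
        "right": build_kdtree(points[mid + 1:], depth + 1),
    }

def squared_distance(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))

def nearest(node, query, best=None):
    """Return the (vector, label) pair closest to query."""
    if node is None:
        return best
    if best is None or squared_distance(query, node["point"][0]) < squared_distance(query, best[0]):
        best = node["point"]
    axis = node["axis"]
    diff = query[axis] - node["point"][0][axis]
    near, far = (node["left"], node["right"]) if diff < 0 else (node["right"], node["left"])
    best = nearest(near, query, best)
    # Only search the far side if the splitting plane is closer than the best match.
    if diff ** 2 < squared_distance(query, best[0]):
        best = nearest(far, query, best)
    return best

# Toy gallery of "embeddings" (hypothetical values and labels).
gallery = [
    ([0.1, 0.9], "face_a"),
    ([0.8, 0.2], "face_b"),
    ([0.5, 0.5], "face_c"),
    ([0.9, 0.9], "face_d"),
]
tree = build_kdtree(gallery)
match = nearest(tree, [0.75, 0.25])
```

The pruning check is what makes the tree faster than a linear scan on larger galleries: whole subtrees are skipped when they cannot contain a closer point.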

Project 4: RAG-Based Conversational AI System

Python, FastAPI, ChromaDB, Streamlit

Built an intelligent document chatbot using a RAG architecture with a FastAPI backend, LLaMA3 LLM, all-mpnet-base-v2 embeddings, and ChromaDB vector storage. Features document ingestion, session tracking, and a Streamlit frontend.
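
Two steps of a RAG flow like the one above, document chunking at ingestion time and prompt assembly at query time, can be sketched as follows. Embedding and ChromaDB storage are omitted. The chunk sizes, the helper names, and the prompt wording are assumptions for illustration and not taken from the project.

```python
def chunk_text(text, chunk_size=200, overlap=50):
    """Split text into word-based chunks that overlap by `overlap` words."""
    if overlap >= chunk_size:
        raise ValueError("overlap must be smaller than chunk_size")
    words = text.split()
    step = chunk_size - overlap
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + chunk_size]))
        # Stop once a chunk has reached the end of the document.
        if start + chunk_size >= len(words):
            break
    return chunks

def build_prompt(question, retrieved_chunks):
    """Assemble a grounded prompt from retrieved context chunks."""
    context = "\n\n".join(f"[{i + 1}] {c}" for i, c in enumerate(retrieved_chunks))
    return (
        "Answer using only the context below.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {question}\nAnswer:"
    )

doc = " ".join(f"w{i}" for i in range(10))
chunks = chunk_text(doc, chunk_size=4, overlap=1)
prompt = build_prompt("What is w3?", chunks[:2])
```

Overlapping chunks reduce the chance that a sentence relevant to a question is cut in half at a chunk boundary.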

Project 5: MLOps Pipeline Project

Python, Docker, AWS, MLflow

End-to-end MLOps Zoomcamp project covering data ingestion, preprocessing, model training, deployment, and monitoring. Deployed containerized ML models on AWS (EC2, S3) with MLflow for experiment tracking and automated CI/CD pipelines.

Project 6: Spotify Data Analysis & Visualization

Python, Neo4j, Spotify API

Music trends analysis (2010-2022) using the Spotify API and web scraping for lyrics. Applied TF-IDF for keyword extraction and built a Neo4j graph database for network-based exploration of songs, artists, and themes.
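
TF-IDF keyword extraction can be sketched with the standard library alone. This is one simple variant (term frequency normalized by document length, unsmoothed inverse document frequency) and not necessarily the exact formula used in the project. The sample "lyrics" are invented and the function names are hypothetical.

```python
import math
import re
from collections import Counter

def tokenize(text):
    """Lowercase the text and keep alphabetic tokens."""
    return re.findall(r"[a-z']+", text.lower())

def tfidf_keywords(docs, top_k=3):
    """Return the top_k highest-scoring TF-IDF terms for each document."""
    tokenized = [tokenize(d) for d in docs]
    n_docs = len(tokenized)
    # Document frequency: number of documents containing each term.
    doc_freq = Counter(term for tokens in tokenized for term in set(tokens))
    results = []
    for tokens in tokenized:
        counts = Counter(tokens)
        scores = {
            term: (count / len(tokens)) * math.log(n_docs / doc_freq[term])
            for term, count in counts.items()
        }
        # Highest score first; ties broken alphabetically for determinism.
        ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
        results.append([term for term, _ in ranked[:top_k]])
    return results

# Invented sample texts standing in for scraped lyrics.
docs = ["love love night dance", "night road drive drive", "love road home"]
keywords = tfidf_keywords(docs, top_k=1)
```

Terms that appear in every document get a score of zero, so the ranking favors words that are frequent in one song but rare across the collection.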

Contact

Let's build something amazing together. Get in touch.

ismailahouari123@gmail.com

Milano, Italy
