Ismail Ahouari

Data Scientist & AI Engineer

Hi, I'm Ismail Ahouari

About

Here is a little background

I am a Master's student in Data Science at the University of Milano-Bicocca with a strong technical foundation in applied mathematics, deep learning, and large-scale data processing. My work spans distributed learning, NLP, and LLM-based infrastructure, with a focus on building scalable, production-ready AI systems that integrate machine learning with robust backend and MLOps components. Passionate about applied AI engineering, I enjoy turning complex technical concepts into practical, high-impact solutions, from image and signal processing models to full MLOps pipelines and agentic LLM architectures. Skilled in Python, PyTorch, FastAPI, Docker, and cloud-native tooling, I thrive in environments that value autonomy, clean architecture, and performance-driven execution.

Experience

LISER (Luxembourg Institute of Socio-Economic Research)

Data Science Associate

Python, Hugging Face, Polars, Pandas, spaCy, Git

Jan 2025 - Apr 2025

  • Implemented a scalable multilingual semantic classification pipeline using Pandas and Polars for efficient large-scale text data processing
  • Built data preprocessing modules with BeautifulSoup for HTML extraction and spaCy for text normalization and deduplication
  • Integrated Stanza for language-specific sentence segmentation across multilingual NLP corpora
  • Developed keyword extraction using Sentence-Transformers (Hugging Face) and semantic similarity to identify AI-related indicators
  • Benchmarked the semantic similarity pipeline against LLM baselines (OpenAI GPT, Mixtral) to compare accuracy
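
The semantic-similarity step in the bullets above can be illustrated with a minimal sketch. This is not the project code: it assumes embeddings have already been computed (Sentence-Transformers in the actual work, tiny hand-written vectors here), and the function names, the candidate phrases, and the 0.5 threshold are hypothetical.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # a zero vector has no direction; treat it as dissimilar
    return dot / (norm_a * norm_b)


def rank_candidates(query_vec, candidates, threshold=0.5):
    """Keep candidates whose similarity to the query meets the threshold,
    sorted from most to least similar."""
    scored = [
        (label, cosine_similarity(query_vec, vec))
        for label, vec in candidates.items()
    ]
    kept = [(label, score) for label, score in scored if score >= threshold]
    return sorted(kept, key=lambda pair: pair[1], reverse=True)


# Toy 3-dimensional "embeddings" (stand-ins for real sentence embeddings).
query = [1.0, 0.0, 1.0]  # hypothetical embedding of an AI-related seed term
candidates = {
    "machine learning": [0.9, 0.1, 0.8],
    "neural network": [0.7, 0.0, 0.9],
    "office furniture": [0.0, 1.0, 0.1],
}

ranked = rank_candidates(query, candidates)
for label, score in ranked:
    print(f"{label}: {score:.3f}")
```

With these toy vectors, the two AI-related phrases pass the threshold and the unrelated one is filtered out.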

C2DH - University of Luxembourg

Data Science Intern

Python, Docker, SPARQL, Wikibase, OpenRefine, Git

Jun 2024 - Dec 2024

  • Implemented a Wikibase-based knowledge graph using Docker containers for isolated, reproducible environments
  • Automated data ingestion pipelines (Python & Wikibase API) to process structured records with semantic annotations
  • Developed a modular relational data model managing RDF triples across 19 reusable properties
  • Created a multilingual support system with automated translation to make content accessible in Luxembourgish
  • Established data reconciliation workflows using OpenRefine for cross-platform entity linking
  • Enabled a SPARQL query interface for complex historical research queries and data visualization

Skills

Current proficiency

  • Python: 95%
  • C/C++: 75%
  • SQL: 85%
  • R: 70%
  • MLflow: 80%
  • Apache Spark: 75%
  • Docker: 85%
  • Airflow: 75%
  • SQL Server: 80%
  • ChromaDB: 80%
  • PyTorch: 90%
  • TensorFlow: 85%
  • Pandas: 95%
  • FastAPI: 85%
  • Grafana: 75%
  • LangGraph: 80%
  • Hugging Face: 85%
  • Cloudflare: 80%
  • Supabase: 75%
  • Tableau: 75%
  • Git: 90%
  • Flask: 80%
  • AWS: 75%
  • PostgreSQL: 80%

Projects

Project 1: Agentic LLM Infrastructure with MCP Tools

LangChain, Cloudflare, Supabase

Architected a scalable LLM orchestration system with Cloudflare Workers, Durable Objects, ReAct agents, and MCP tools. Engineered a session layer with persistent memory, tenant isolation, and real-time SSE streaming.

Project 2: Split Learning Performance Benchmark (SLPerf)

Python, PyTorch, MPI

Benchmarked Vanilla SL, U-Shaped SL, and SplitFed under IID and non-IID settings for vision and GNN tasks using MPI for distributed communication. Master's thesis research on privacy-preserving distributed deep learning.

Project 3: Digital Image & Signal Processing

Python, PyTorch, TensorFlow

Face retrieval with MobileNetV2 and KD-Tree for similarity search. Food-101 classification with ResNet50/Inception (51 classes). Music genre classification using SVM/CNN with MFCCs, Spectrograms, and Chroma features.
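
The KD-Tree similarity search mentioned above can be sketched as follows. This is a from-scratch illustration, not the project code: the project description does not say which KD-Tree implementation was used, the 2-D points stand in for MobileNetV2 embeddings, and every name here (`Node`, `build_kdtree`, `nearest`, the gallery labels) is hypothetical.

```python
import math


class Node:
    """One KD-tree node: a point, its label, and the axis it splits on."""

    def __init__(self, point, label, axis, left=None, right=None):
        self.point = point
        self.label = label
        self.axis = axis
        self.left = left
        self.right = right


def build_kdtree(items, depth=0):
    """Build a KD-tree from a list of (point, label) pairs.

    The split axis cycles with depth; the median point becomes the node."""
    if not items:
        return None
    axis = depth % len(items[0][0])
    items = sorted(items, key=lambda item: item[0][axis])
    mid = len(items) // 2
    point, label = items[mid]
    return Node(
        point,
        label,
        axis,
        left=build_kdtree(items[:mid], depth + 1),
        right=build_kdtree(items[mid + 1:], depth + 1),
    )


def nearest(node, query, best=None):
    """Return (distance, label, point) of the stored point closest to query."""
    if node is None:
        return best
    dist = math.dist(node.point, query)
    if best is None or dist < best[0]:
        best = (dist, node.label, node.point)
    diff = query[node.axis] - node.point[node.axis]
    near, far = (node.left, node.right) if diff < 0 else (node.right, node.left)
    best = nearest(near, query, best)
    # Search the far side only if the splitting plane is closer than the
    # best match found so far; otherwise nothing there can be nearer.
    if abs(diff) < best[0]:
        best = nearest(far, query, best)
    return best


# Toy gallery: 2-D vectors standing in for face embeddings.
gallery = [
    ((0.1, 0.9), "alice"),
    ((0.8, 0.2), "bob"),
    ((0.4, 0.5), "carol"),
    ((0.9, 0.9), "dave"),
]
tree = build_kdtree(gallery)
distance, label, point = nearest(tree, (0.75, 0.25))
print(label, round(distance, 3))
```

KD-trees tend to lose their pruning advantage as dimensionality grows, so in practice the embeddings may need to be reduced or an approximate index used instead.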

Project 4: RAG-Based Conversational AI System

Python, FastAPI, ChromaDB, Streamlit

Built a document chatbot using a RAG architecture with a FastAPI backend, the LLaMA3 LLM, all-mpnet-base-v2 embeddings, and ChromaDB vector storage. Features document ingestion, session tracking, and a Streamlit frontend.

Project 5: MLOps Pipeline Project

Python, Docker, AWS, MLflow

End-to-end MLOps Zoomcamp project covering data ingestion, preprocessing, model training, deployment, and monitoring. Deployed containerized ML models on AWS (EC2, S3) with MLflow for experiment tracking and automated CI/CD pipelines.

Project 6: Spotify Data Analysis & Visualization

Python, Neo4j, Spotify API

Music trends analysis (2010-2022) using the Spotify API and web scraping for lyrics. Applied TF-IDF for keyword extraction and built a Neo4j graph database for network-based exploration of songs, artists, and themes.
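
The TF-IDF keyword extraction mentioned above can be sketched in a few lines. This is an illustration rather than the project code: it uses the plain unsmoothed form (term frequency times ln(N / document frequency)), whereas library implementations often apply smoothing, and the token lists and function names are hypothetical.

```python
import math
from collections import Counter


def tfidf(documents):
    """Score every term in every document.

    documents: list of token lists.
    Returns one dict per document mapping term -> TF-IDF score."""
    n_docs = len(documents)
    doc_freq = Counter()
    for tokens in documents:
        doc_freq.update(set(tokens))  # count each term once per document
    scores = []
    for tokens in documents:
        counts = Counter(tokens)
        total = len(tokens)
        scores.append({
            term: (count / total) * math.log(n_docs / doc_freq[term])
            for term, count in counts.items()
        })
    return scores


def top_keywords(doc_scores, k=3):
    """Return the k highest-scoring terms, breaking ties alphabetically."""
    ordered = sorted(doc_scores.items(), key=lambda pair: (-pair[1], pair[0]))
    return [term for term, _ in ordered[:k]]


# Toy "lyrics", already tokenized (made-up examples, not real data).
lyrics = [
    "love love night dance".split(),
    "love heart broken night".split(),
    "dance party dance floor".split(),
]
scores = tfidf(lyrics)
for index, doc_scores in enumerate(scores):
    print(index, top_keywords(doc_scores, k=2))
```

Terms that appear in every document score zero under this formula, which is what pushes common words out of the keyword list.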

Contact

Let's build something amazing together. Get in touch.

ismailahouari123@gmail.com

Milano, Italy