CV

Houman Rajabi

Data Scientist

houmanrajabi@myyahoo.com
+39 351 9465199
Turin, IT

Summary

Data Scientist & Machine Learning Engineer who makes sense of complex systems and turns them into actionable insights. I bridge the gap between raw data and strategic decision-making by combining rigorous statistical modeling with the engineering of LLM and RAG systems. Nearing graduation from the University of Turin (M.Sc., expected June 2026), I bring a perspective sharpened by professional collaborations across Italy, Germany, and Iran.

Work Experience

  • Data Scientist Intern
    2025-08 - 2026-01
    Trivago
    Member of the central Data Science team, architecting end-to-end ETL pipelines for unstructured data ingestion.
    • Used generative AI (NLG) to automate the writing of accommodation descriptions, and computer vision to optimize image quality.
    • Built semantic search engines using embedding models to improve query-to-accommodation matching accuracy.
    • Engaged in research-driven development, ensuring methodologies met rigorous scientific standards.
  • Researcher & Data Scientist
    2023-09 - 2025-05
    University of Turin (DeepHealth Project)
    Collaboration focused on Federated Learning and Privacy Engineering.
    • Federated Learning: Architected decentralized training workflows across hospital networks to avoid moving raw patient data.
    • Technical Liaison: Aligned clinical requirements with HPC infrastructure for industrial partners such as Philips and Thales.
    • Privacy Engineering: Deployed GDPR-compliant algorithms to ensure data sovereignty within strict hospital IT environments.
  • Data Scientist
    2021-03 - 2023-07
    Snapp!
    Data Scientist focused on forecasting, pricing algorithms, and real-time machine learning optimization.
    • Forecasting: Developed models to balance driver supply and demand, reducing wait times by 2 minutes.
    • Algorithms: Designed dynamic pricing algorithms that increased revenue by 3% during high-demand periods.
    • Machine Learning: Used real-time data to improve ETA prediction accuracy by 2.5%.
    • Analytics: Monitored KPIs to identify and implement optimizations that improved driver utilization and completion rates.
  • Data Scientist
    2019-05 - 2021-03
    Digikala Group
    • Engineered hybrid recommendation engines (Association Rules, Collaborative Filtering) to redesign the 'Frequently Bought Together' module, boosting Average Order Value (AOV) by 5% via cross-selling.
    • Developed price optimization algorithms for Flash Sales ('Shegeftane') that automated the selection of candidate products and their discount depth, achieving ~95% sell-through without eroding margins.
    • Built robust demand forecasting pipelines (XGBoost, Prophet) capable of handling 4x traffic loads during Black Friday, reducing logistics bottlenecks by 10% and improving inventory accuracy.
    • Enhanced search relevance by integrating custom Persian NLP models for semantic matching and misspelling correction, reducing 'Zero Search Result' queries by 7%.
  • Project Contributor
    2018-09 - 2019-03
    Sharif University of Technology
    • Institutional Analytics: Built analytical dashboards to support institutional decision-making and strategic academic planning.

Education

  • Master of Science in Language Technology & Digital Humanities (NLP)
    2026-06 (expected)
    University of Turin
  • Bachelor of Science in Computer Science
    2019-03
    Sharif University of Technology

Skills

Languages & Core

  • Python (OOP)
  • SQL
  • Bash

AI & GenAI

  • LLMs
  • RAG Pipelines
  • LangChain
  • Hugging Face
  • Vector Search
  • Fine-tuning (PEFT/LoRA)

Machine Learning

  • PyTorch
  • TensorFlow
  • Scikit-learn
  • XGBoost
  • Computer Vision
  • NLP

MLOps & Cloud

  • Docker
  • Kubernetes
  • AWS (SageMaker)
  • CI/CD (GitHub Actions)
  • MLflow
  • Airflow

Big Data

  • PySpark
  • Kafka
  • Databricks
  • Data Warehousing

Publications

  • Identity, Toxicity, or Complexity?
    EVALITA 2026
    1st Place (Italian Task): Developed a hybrid fusion architecture combining BERT embeddings with engineered sociolinguistic features to detect reclaimed slurs, achieving a state-of-the-art F1 score of 0.8981 by modeling language-specific syntactic patterns.

Languages

  • English
    Full professional proficiency
  • German
    Limited working proficiency
  • Italian
    Elementary proficiency