CV
Houman Rajabi
Data Scientist
Summary
Data Scientist & Machine Learning Engineer who turns raw data into strategic decisions. I combine rigorous statistical modeling with hands-on engineering of LLM and RAG systems. Nearing graduation from the University of Turin, I bring experience from professional collaborations across Italy, Germany, and Iran.
Work Experience
- Data Scientist Intern | 2025-08 - 2026-01 | Trivago
  Data Science Intern in the central team, architecting end-to-end ETL pipelines for unstructured data ingestion.
- Leveraged Generative AI (NLG) to automate the creation of accommodation descriptions and Computer Vision to optimize image quality.
- Built semantic search engines using embedding models to improve query-to-accommodation matching accuracy.
- Engaged in research-driven development, ensuring methodologies met rigorous scientific standards.
- Researcher & Data Scientist | 2023-09 - 2025-05 | University of Turin (DeepHealth Project)
  Collaboration focused on Federated Learning and Privacy Engineering.
- Federated Learning: Architected decentralized training workflows across hospital networks to avoid moving raw patient data.
- Technical Liaison: Aligned clinical requirements with HPC infrastructure for industrial partners like Philips and Thales.
- Privacy Engineering: Deployed GDPR-compliant algorithms to ensure data sovereignty within strict hospital IT environments.
- Data Scientist | 2021-03 - 2023-07 | Snapp!
  Data Scientist focusing on forecasting, algorithms, and real-time machine learning optimizations.
- Forecasting: Developed models to balance driver supply and demand, reducing wait times by 2 minutes.
- Algorithms: Designed dynamic pricing algorithms to maximize revenue (+3%) during high-demand periods.
- Machine Learning: Utilized real-time data to enhance ETA prediction accuracy by 2.5%.
- Analytics: Monitored KPIs to identify and execute optimizations that improved driver utilization and completion rates.
- Data Scientist | 2019-05 - 2021-03 | Digikala Group
- Engineered hybrid recommendation engines (Association Rules, Collaborative Filtering) to redesign the 'Frequently Bought Together' module, boosting Average Order Value (AOV) by 5% via cross-selling.
- Developed price optimization algorithms for Flash Sales ('Shegeftane') to automate candidate selection and discount depth, achieving ~95% sell-through without eroding margins.
- Built robust demand forecasting pipelines (XGBoost, Prophet) capable of handling 4x traffic loads during Black Friday, reducing logistics bottlenecks by 10% and improving inventory accuracy.
- Enhanced search relevance by integrating custom Persian NLP models for semantic matching and misspelling correction, reducing 'Zero Search Result' queries by 7%.
- Project Contributor | 2018-09 - 2019-03 | Sharif University of Technology
- Institutional Analytics: Created robust analytical dashboards to support institutional decision-making and enhance strategic academic planning.
Education
- Master of Science in Language Technology & Digital Humanities (NLP) | 2026-06 | University of Turin
- Bachelor of Science in Computer Science | 2019-03 | Sharif University of Technology
Skills
Languages & Core
- Python (OOP)
- SQL
- Bash
AI & GenAI
- LLMs
- RAG Pipelines
- LangChain
- Hugging Face
- Vector Search
- Fine-tuning (PEFT/LoRA)
Machine Learning
- PyTorch
- TensorFlow
- Scikit-learn
- XGBoost
- Computer Vision
- NLP
MLOps & Cloud
- Docker
- Kubernetes
- AWS (SageMaker)
- CI/CD (GitHub Actions)
- MLflow
- Airflow
Big Data
- PySpark
- Kafka
- Databricks
- Data Warehousing
Publications
- Identity, Toxicity, or Complexity? (EVALITA 2026)
  1st Place (Italian Task): Developed a Hybrid Fusion architecture combining BERT embeddings with engineered sociolinguistic features to detect reclaimed slurs; achieved SOTA F1 of 0.8981 by modeling language-specific syntactic patterns.
Languages
- English: Full professional proficiency
- German: Limited working proficiency
- Italian: Elementary proficiency