CV

Houman Rajabi

AI/ML Engineer

rajabi_houman@yahoo.com

+39 351 9465199

https://houmanrajabi.github.io

Turin, IT

Summary

AI/ML Engineer designing and building systems that take models from research into production. Background in linguistics shaped a focus on NLP, LLMs, and RAG. Experienced in spotting where standard methods break on edge cases and building the fix rather than working around it: closing tokenizer vocabulary gaps for crypto terminology, or introducing a new scoring metric to separate agent from patient roles in slur reclamation. Worked across industry and academic settings in Italy, Germany, and Iran.

Work Experience

NLP/ML Engineer Intern
2025-08 - 2026-02
CDC Cartel Damage Claims
- Robust Data Pipeline: Developed an asynchronous, rate-limited ingestion pipeline with xxhash-based duplicate detection for high-throughput data collection.
- Domain-Specific Fine-Tuning: Fine-tuned FinBERT for cryptocurrency market sentiment analysis; conducted out-of-vocabulary token analysis to identify sub-tokenisation inefficiencies and optimise model vocabulary.
- Multi-LLM Consensus Annotation: Engineered a synthetic data annotation pipeline leveraging a multi-LLM ensemble (GPT-4o, Gemini, Mistral) to mitigate single-model bias and improve label quality.
Junior ML Engineer
2020-05 - 2023-03
Digikala Group
Largest e-commerce platform in MENA.
- Search relevance: Built Persian NLP components for semantic matching and misspelling correction, helping cut zero-result queries by 7%.
- Demand forecasting: Built components of XGBoost/Prophet forecasting pipelines that handled 4x Black Friday traffic loads, helping reduce logistics bottlenecks by 10%.
- Flash-sale pricing: Built automation logic for candidate selection and discount depth in 'Shegeftane' flash sales, supporting ~95% sell-through without eroding margins.
- Recommendations: Built a module of a hybrid recommendation engine (Association Rules + Collaborative Filtering) for 'Frequently Bought Together,' supporting a 5% lift in Average Order Value via cross-selling.

Education

Language Technology & Digital Humanities (NLP)
2026-06
University of Turin
GPA: 28.8/30
Courses: Mechanistic interpretability, RAG systems, Multilingual NLP
Professional Development Programme in Computer Science
2021
Sharif University of Technology
Courses: Software systems, Algorithms, Data structures, Distributed systems, Statistical learning
Linguistics
2019
Azad University of Hamedan
Courses: Phonology, Morphology, Syntax, Semantics, Corpus linguistics

Skills

Languages & Core

Python (OOP)
SQL
Bash
Git
Docker

MLOps & Cloud

AWS SageMaker
GCP Vertex AI
Azure ML
MLflow
CI/CD

LLMs & GenAI

PyTorch
HuggingFace Transformers
LangChain
LangGraph
LlamaIndex
vLLM
PEFT/LoRA

Machine Learning

Deep Learning
NLP
Classification/Regression
Recommendation Systems

Data & Big Data

Vector DBs
Apache Spark
Kafka
Databricks
NoSQL

HPC & Systems

Slurm
HPC
CUDA

Visualisation

Matplotlib
Seaborn

Publications

Identity, Toxicity, or Complexity? A Language-Specific Feature Selection Approach to Reclamatory Intent Detection
2026
Proceedings of EVALITA 2026 (Task A: MultiPRIDE), Bari, Italy
1st Place (Italian Task): Developed a Hybrid Fusion architecture combining BERT embeddings with engineered sociolinguistic features to detect reclaimed slurs; achieved a state-of-the-art F1 of 0.8981 by modelling language-specific syntactic patterns.
Parametric Stubbornness: Mechanistically Isolating the Layer Shift and Sparsity Gradient of RAG Knowledge Conflicts in Llama-3
2026
ACL Student Research Workshop 2026
Two-phase activation patching on Meta-Llama-3-8B across 452 minimal-pair conflicts; introduces the Sparsity Gradient and Contextual Contamination phenomena in RAG knowledge conflict settings.

Languages

English
Full professional proficiency
Persian
Native
German
Limited working proficiency
Italian
Elementary proficiency
