CV
Houman Rajabi
AI/ML Engineer
Summary
AI/ML Engineer who designs and builds systems that take models from research into production. A background in linguistics shaped a focus on NLP, LLMs, and RAG. Experienced at spotting where standard methods break on edge cases and building the fix rather than working around it, for example closing tokenizer vocabulary gaps for crypto terminology and introducing a new scoring metric to separate agent from patient roles in slur reclamation. Worked across industry and academic settings in Italy, Germany, and Iran.
Work Experience
- NLP/ML Engineer Intern, 2025-08 to 2026-02, CDC Cartel Damage Claims
  - Robust Data Pipeline: Developed an asynchronous, rate-limited ingestion pipeline with xxhash-based duplicate detection for high-throughput data collection.
  - Domain-Specific Fine-Tuning: Fine-tuned FinBERT for cryptocurrency market sentiment analysis; conducted out-of-vocabulary token analysis to identify sub-tokenisation inefficiencies and optimise model vocabulary.
  - Multi-LLM Consensus Annotation: Engineered a synthetic data annotation pipeline leveraging a multi-LLM ensemble (GPT-4o, Gemini, Mistral) to mitigate single-model bias and improve label quality.
- Junior ML Engineer, 2020-05 to 2023-03, Digikala Group (largest e-commerce platform in MENA)
  - Search relevance: Built Persian NLP components for semantic matching and misspelling correction, helping cut zero-result queries by 7%.
  - Demand forecasting: Built components of XGBoost/Prophet forecasting pipelines that handled 4x Black Friday traffic loads, helping reduce logistics bottlenecks by 10%.
  - Flash-sale pricing: Built automation logic for candidate selection and discount depth in 'Shegeftane' flash sales, supporting ~95% sell-through without eroding margins.
  - Recommendations: Built a module of a hybrid recommendation engine (Association Rules + Collaborative Filtering) for 'Frequently Bought Together', supporting a 5% lift in Average Order Value via cross-selling.
Education
- Language Technology & Digital Humanities (NLP), 2026-06, University of Turin
  GPA: 28.8/30
  Courses: Mechanistic interpretability, RAG systems, Multilingual NLP
- Professional Development Programme in Computer Science, 2021, Sharif University of Technology
  Courses: Software systems, Algorithms, Data structures, Distributed systems, Statistical learning
- Linguistics, 2019, Azad University of Hamedan
  Courses: Phonology, Morphology, Syntax, Semantics, Corpus linguistics
Skills
Languages & Core
- Python (OOP)
- SQL
- Bash
- Git
- Docker
MLOps & Cloud
- AWS SageMaker
- GCP Vertex AI
- Azure ML
- MLflow
- CI/CD
LLMs & GenAI
- PyTorch
- HuggingFace Transformers
- LangChain
- LangGraph
- LlamaIndex
- vLLM
- PEFT/LoRA
Machine Learning
- Deep Learning
- NLP
- Classification/Regression
- Recommendation Systems
Data & Big Data
- Vector DBs
- Apache Spark
- Kafka
- Databricks
- NoSQL
HPC & Systems
- Slurm
- HPC
- CUDA
Visualisation
- Matplotlib
- Seaborn
Publications
- Identity, Toxicity, or Complexity? A Language-Specific Feature Selection Approach to Reclamatory Intent Detection. 2026. Proceedings of EVALITA 2026 (Task A: MultiPRIDE), Bari, Italy.
  1st Place (Italian Task): Developed a Hybrid Fusion architecture combining BERT embeddings with engineered sociolinguistic features to detect reclaimed slurs; achieved SOTA F1 of 0.8981 by modeling language-specific syntactic patterns.
- Parametric Stubbornness: Mechanistically Isolating the Layer Shift and Sparsity Gradient of RAG Knowledge Conflicts in Llama-3. 2026. ACL Student Research Workshop 2026.
  Two-phase activation patching on Meta-Llama-3-8B across 452 minimal-pair conflicts; introduces the Sparsity Gradient and Contextual Contamination phenomena in RAG knowledge conflict settings.
Languages
- English: Full professional proficiency
- Persian: Native
- German: Limited working proficiency
- Italian: Elementary proficiency