I am an enthusiastic and driven data scientist with a strong foundation in
data analytics and a passion for leveraging data to drive business
transformation. I possess a unique blend of technical and analytical
skills, complemented by a solid academic foundation and hands-on experience.
I hold a Master's degree in Big Data Management and Analytics. I am currently a
researcher for the Chair of Precision Medicine and Artificial Intelligence at
CiTIUS (Centro Singular de Investigación en Tecnoloxías Intelixentes),
Universidade de Santiago de Compostela, Spain.
Down the line, I aim to contribute to devising highly scalable and efficient solutions
in biomedical data science and machine learning. I am particularly interested in the intersection of
data science and healthcare, where I believe data-driven insights can have a profound impact on
improving patient outcomes and advancing medical research.
Erasmus Mundus Joint Master's Degree (EMJMD) – ULB, UPC, CentraleSupélec
2022 – 2024
Semester 1
Université Libre de Bruxelles
ULB
Semester 2
Universitat Politècnica de Catalunya
Barcelona Tech - UPC
Semester 3
CentraleSupélec - Université Paris-Saclay
CS - UPS
Semester 4
Master's Thesis
L'Oréal
Achieved a CGPA of 8.21/10.
Coursework: Data Mining, Advanced Databases, Data Warehousing, Machine Learning, Deep Learning, Semantic Data Management, Big Data Systems, Data Visualisation, Data Engineering, Decision Modelling, Reinforcement Learning
Sardar Patel Institute of Technology, Mumbai
2017 – 2021
Achieved a CGPA of 9.58/10.
Coursework: Algorithms, Data Structures, Object-Oriented Programming, Database Management Systems, Computer Organisation & Architecture, Operating Systems, Distributed Systems, Web Technology, Machine Learning, Big Data Analytics, Engineering Mathematics, Engineering Physics, Engineering Chemistry
Python, Deep Learning, Machine Learning, Data Engineering, Data Science
Google Cloud Platform (GCP), PowerBI, SQL (BigQuery), ETL, Linux, Agile Methodologies, Jira, Confluence, Machine Learning, Google Looker Studio, Data Analytics, Data Visualisation
PL/SQL, Python, Shell Scripting, Data Warehouse, ETL, Linux, Agile Methodologies, Jira, Confluence
PL/SQL, Python, Shell Scripting, Data Warehouse, ETL, Linux, Agile Methodologies, Jira, Confluence
Python, Django, SQL, Machine Learning, Data Visualisation, Data Science
Sep 2023 – Feb 2024
Used the Healthy Minds dataset to present insights into student attitudes and help-seeking patterns around mental health, and developed a dynamic dashboard using D3.js.
Sep 2023 – Feb 2024
Collaborated with Amadeus, France, to address computational bottlenecks in NeRF-based 3D reconstruction using clustering techniques and model optimisation for real-time performance.
Feb 2023 – Aug 2023
Built a scalable & comprehensive data platform to orchestrate ETL workflows, ensure governance with knowledge graphs, & deliver personalised financial advisor recommendations.
Feb 2023 – Aug 2023
Built a machine learning pipeline to classify music genres on Spotify using audio features, with feature engineering and model evaluation for accurate genre prediction.
Feb 2023 – Aug 2023
Explored distributed graph processing using Spark GraphX and GraphFrames by implementing core graph algorithms and TLAV-based computations in Java and Python.
Feb 2023 – Aug 2023
Modelled and evolved a property graph in Neo4j using DBLP data to build a hybrid recommender system, applying graph algorithms for collaborative and content-based filtering.
Sep 2022 – Feb 2023
Designed and executed PostgreSQL-based TPC-DS benchmarks at varying scale factors to evaluate query performance and optimise decision support workloads.
Sep 2022 – Feb 2023
Explored and benchmarked multi-model database technologies using ArangoDB and MarkLogic on the Yelp dataset to evaluate performance and storage efficiency.
Problem Statement: Extracting meaningful causal relationships from complex, fragmented ICU datasets like MIMIC-III is a major challenge in healthcare analytics.
Solution: Built a scalable causal discovery pipeline that automates preprocessing, inference using Tetrad, and visual analysis of longitudinal EHR data. Enabled clinicians to explore patient trajectories and identify key factors influencing outcomes.
Tech stack: Python, SQL, PostgreSQL, Tetrad, YAML, Pandas, Matplotlib, DNAnexus
Problem Statement: Understanding multimodal travel patterns and behavioural shifts is critical to incentivising rail usage and reducing car miles across Scotland.
Solution: Built a scalable analysis pipeline to quantify the impact of rail disruptions on mobility using ScotRail and road traffic data. Modelled travel behaviour shifts and estimated carbon savings to inform rail-first transport policies.
Tech stack: Python, PostgreSQL, Scikit-learn, GeoPandas, NetworkX, Knowledge Graphs, Matplotlib, Jupyter, Git
Problem Statement: Enhancing the specificity of mRNA drugs by predicting optimal binding regions for the GAG protein, a key step in targeted therapeutic design.
Solution: Identified potential mRNA-GAG binding sites using BLAST and ColabFold, then applied deep learning to predict and quantify binding potential. Modelled interactions using PST-PRNA4 to support in-silico validation for our industrial partner, bYoRNA.
Tech stack: Python, ColabFold, BLAST, PST-PRNA4, Deep Learning (TensorFlow/Keras), Jupyter, Git
Problem Statement: Design a robust and optimised route for the Olympic flame relay that ensures uninterrupted radio coverage across all regions.
Solution: Engineered a geospatial dashboard integrating geocoding, radio coverage simulation, and real-time visualisation. The solution emphasised reproducibility, robustness, and adaptability, ensuring connectivity throughout the relay path.
Tech stack: Python, OpenStreetMap, GeoPandas, Folium, Django, PostgreSQL, Leaflet.js, Git
Presented In: 11th European Big Data Management & Analytics Summer School (eBiSS 2023), Barcelona, Spain
Shah, S.S., Gupta, R.A., Jardosh, P.M., Nimkar, A.V. (2022). Prosodic Speech Synthesis of Narratives Depicting Emotional Diversity Using Deep Learning. In: Gandhi, T.K., Konar, D., Sen, B., Sharma, K. (eds) Advanced Computational Paradigms and Hybrid Intelligent Computing. Advances in Intelligent Systems and Computing, vol 1373. Springer, Singapore
DOI: 10.1007/978-981-16-4369-9_4
Published In: Advances in Intelligent Systems and Computing (AISC, volume 1373), Springer, Singapore
R. Khara, D. Pomendkar, R. Gupta, I. Hingorani and D. Kalbande, "Micro Loans for Farmers," 2020 11th International Conference on Computing, Communication and Networking Technologies (ICCCNT), Kharagpur, India, 2020, pp. 1-5
DOI: 10.1109/ICCCNT49239.2020.9225577
Published In: 2020 11th International Conference on Computing, Communication and Networking Technologies (ICCCNT)