
Rishika GUPTA

Researcher in Data Science at CiTIUS, Spain

I am an enthusiastic and driven data scientist with a strong foundation in data analytics and a passion for using data to drive business transformation. I combine technical and analytical skills with a solid academic background and hands-on industry experience.

I hold a Master's degree in Big Data Management and Analytics. I am currently a researcher in the Chair of Precision Medicine and Artificial Intelligence at CiTIUS (Centro Singular de Investigación en Tecnoloxías Intelixentes), Universidade de Santiago de Compostela, Spain.

Down the line, I aim to contribute to devising highly scalable and efficient solutions in biomedical data science and machine learning. I am particularly interested in the intersection of data science and healthcare, where I believe data-driven insights can have a profound impact on improving patient outcomes and advancing medical research.

EDUCATION

Master of Science in Big Data Management & Analytics

Erasmus Mundus Joint Master's Degree (EMJMD) – ULB, UPC, CentraleSupélec
2022 – 2024

Semester 1: Université Libre de Bruxelles (ULB)

Semester 2: Universitat Politècnica de Catalunya (BarcelonaTech, UPC)

Semester 3: CentraleSupélec, Université Paris-Saclay (CS-UPS)

Semester 4: Master's Thesis at L'Oréal


Achieved a CGPA of 8.21/10.


Coursework: Data Mining, Advanced Databases, Data Warehousing, Machine Learning, Deep Learning, Semantic Data Management, Big Data Systems, Data Visualisation, Data Engineering, Decision Modelling, Reinforcement Learning



Bachelor of Technology in Computer Engineering

Sardar Patel Institute of Technology, Mumbai
2017 – 2021


Achieved a CGPA of 9.58/10.


Coursework: Algorithms, Data Structures, Object-Oriented Programming, Database Management Systems, Computer Organisation & Architecture, Operating Systems, Distributed Systems, Web Technology, Machine Learning, Big Data Analytics, Engineering Mathematics, Engineering Physics, Engineering Chemistry

EXPERIENCES

CiTIUS

May 2025 - Present

Working as a Researcher within the Chair of Precision Medicine and Artificial Intelligence

Santiago de Compostela, Spain

Responsibilities & Achievements:

  • Research and review state-of-the-art methods for predicting drug-protein toxicity and identify gaps in current research
  • Develop and implement deep learning models to predict drug-protein interactions and toxicity using large-scale biomedical datasets
  • Support reproducible research by contributing to clean codebases and scientific documentation for cross-functional teams

Technologies:

Python, Deep Learning, Machine Learning, Data Engineering, Data Science

L'Oréal

Mar 2024 - Aug 2024

Worked as a Data & AI Intern within the Global Luxe Chief Digital and Marketing Office (CDMO) Data Team

Levallois-Perret, France

Responsibilities & Achievements:

  • Designed and implemented scalable ETL pipelines in Google Cloud Platform (GCP) to process and transform large volumes of website traffic and sales data for real-time analytics
  • Partnered with business stakeholders to define KPIs and build interactive dashboards in Looker Studio, enabling data-driven decision-making across the Luxe team
  • Delivered a production-ready dashboard from scratch, resulting in faster insights and enhanced visibility into key brand performance metrics
  • Developed an anomaly detection solution for website traffic, reducing manual effort by 85%
  • Optimized and refactored core ETL processes, leading to a 25% reduction in data pipeline execution time and improved system reliability
  • Collaborated with cross-functional teams (data, marketing, and supply chain) to streamline data workflows, improving data availability and operational efficiency

Technologies:

Google Cloud Platform (GCP), Power BI, SQL (BigQuery), ETL, Linux, Agile Methodologies, Jira, Confluence, Machine Learning, Google Looker Studio, Data Analytics, Data Visualisation

ICICI Lombard

Aug 2021 - Jun 2022

Worked as a Data Engineer within the Database Team

Mumbai, India

Responsibilities & Achievements:

  • Ideated, designed, and developed PL/SQL scripts to support complex business logic, automate recurring data operations, and streamline ETL processes
  • Debugged and optimised PL/SQL procedures, functions, and triggers to ensure high performance and reliability in production environments
  • Collaborated with cross-functional teams including data scientists, BI analysts, and business stakeholders to define data requirements and transform raw data into meaningful insights
  • Ensured data quality and integrity by implementing validation checks, schema evolution tracking, and audit mechanisms
  • Assisted in migrating legacy data workflows to scalable modern infrastructure using tools like Python and Apache Spark
  • Automated routine data tasks, including daily/weekly/monthly data loads, reconciliation checks, and report generation, reducing manual intervention by 90% and improving reporting turnaround time by 40%
  • Contributed to the development of a centralised data lake, enabling unified access for underwriting, claims, and fraud detection use cases
  • Led the documentation of data workflows and best practices, which improved onboarding speed for new team members and ensured maintainability

Technologies:

PL/SQL, Python, Shell Scripting, Data Warehouse, ETL, Linux, Agile Methodologies, Jira, Confluence

ICICI Lombard

Feb 2021 - Aug 2021

Worked as a Technology Intern within the Database Team

Mumbai, India

Responsibilities & Achievements:

  • Assisted in the development & maintenance of PL/SQL scripts for data extraction, transformation, & loading (ETL) processes
  • Collaborated with senior engineers to design and implement data models for various business use cases
  • Participated in code reviews and contributed to the documentation of best practices for PL/SQL development
  • Conducted performance tuning and optimisation of existing PL/SQL procedures and functions
  • Supported the team in troubleshooting and resolving data-related issues in production environments

Technologies:

PL/SQL, Python, Shell Scripting, Data Warehouse, ETL, Linux, Agile Methodologies, Jira, Confluence

SECCPL

May 2019 - Jul 2019

Worked as an AI Intern within the Technology Team

Mumbai, India

Responsibilities & Achievements:

  • Designed and developed a comprehensive dashboard using Django, Python, and SQL, integrating automated data ingestion from Google Sheets and reducing manual reporting time by 70%
  • Implemented machine learning algorithms (e.g., clustering and classification) to analyse customer behaviour, increasing targeted outreach efficiency by 30% and informing personalised service strategies
  • Collaborated with sales, operations, and IT teams to translate business needs into data solutions, leading to quicker decision-making cycles and smoother project execution
  • Assisted in testing and debugging end-to-end flows of the dashboard, ensuring >95% accuracy in reported KPIs and analytics

Technologies:

Python, Django, SQL, Machine Learning, Data Visualisation, Data Science

PROJECTS

Master's Project

Analysing Help-Seeking Behavior and Peer Attitudes in University Mental Health

Sep 2023 – Feb 2024

Visual-Analytics Project

Leveraged the Healthy Minds dataset to present insights into student attitudes and help-seeking patterns around mental health, and developed a dynamic dashboard using D3.js.

Master's Project

Real-Time 3D Reconstruction Optimisation with Neural Radiance Fields (NeRF)

Sep 2023 – Feb 2024

Frame-Selection-NeRF Project

Collaborated with Amadeus, France, to address computational bottlenecks in NeRF-based 3D reconstruction using clustering techniques and model optimisation for real-time performance.

Master's Project

NutriScore Decision Modeling & Evaluation Framework

Sep 2023 – Feb 2024

Decision-Modelling Project

Engineered and evaluated decision models (additive models, ELECTRE Tri, and ML classifiers) to analyse and improve NutriScore labelling using OpenFoodFacts data.

Master's Project

BIQUE: Scalable Financial Intelligence Platform & Advisory Engine

Feb 2023 – Aug 2023

Big-Data Project

Built a scalable & comprehensive data platform to orchestrate ETL workflows, ensure governance with knowledge graphs, & deliver personalised financial advisor recommendations.

Master's Project

Spotify Genre Classification Based on Song Attributes (Duration, Tempo, etc.)

Feb 2023 – Aug 2023

Genre-Classification Project

Built a machine learning pipeline to classify music genres on Spotify using audio features, with feature engineering and model evaluation for accurate genre prediction.

Master's Project

Semantic Modeling of Research Lifecycle with RDF and SPARQL

Feb 2023 – Aug 2023

Knowledge-Graph Project

Engineered a knowledge graph with RDF and SPARQL that models the research publication lifecycle, integrating Semantic Scholar data and applying graph algorithms for semantic querying and insights.

Master's Project

Distributed Graph Processing with Spark GraphX

Feb 2023 – Aug 2023

GraphX Project

Explored distributed graph processing using Spark GraphX and GraphFrames by implementing core graph algorithms and TLAV-based computations in Java and Python.

Master's Project

Graph-Based Recommender System with Neo4j on DBLP Data

Feb 2023 – Aug 2023

Property-Graph Project

Modeled and evolved a property graph in Neo4j using DBLP data to build a hybrid recommender system, applying graph algorithms for collaborative and content-based filtering.

Master's Project

Custom URL Data Type Extension for PostgreSQL

Sep 2022 – Feb 2023

Custom-URL Extension Project

Extended PostgreSQL with a custom URL data type in C, replicating java.net.URL and enabling advanced indexing and query capabilities with semantic predicates.

Master's Project

TPC-DS Performance Benchmarking with PostgreSQL

Sep 2022 – Feb 2023

TPC-DS Project

Designed and executed PostgreSQL-based TPC-DS benchmarks at varying scale factors to evaluate query performance and optimise decision support workloads.

Master's Project

Implementing the Yelp Dataset with Document-Based Databases

Sep 2022 – Feb 2023

Yelp Project

Explored and benchmarked multi-model database technologies using ArangoDB and MarkLogic on the Yelp dataset to evaluate performance and storage efficiency.

Master's Project

QoS Analysis for STIB-MIVB Public Transit System

Sep 2022 – Feb 2023

STIB-MIVB Project

Conducted time-series clustering and regularity analysis on geospatial data to evaluate and enhance transit performance for the STIB-MIVB network in Brussels, Belgium.

SKILLS

Programming Languages

Databases

Frameworks & Tools

Data Engineering

Cloud Platforms

Data Science & AI

BI & Visualisation

Power BI

Project Management

HACKATHONS

Problem Statement: Extracting meaningful causal relationships from complex, fragmented ICU datasets like MIMIC-III is a major challenge in healthcare analytics.


Solution: Built a scalable causal discovery pipeline that automates preprocessing, inference using Tetrad, and visual analysis of longitudinal EHR data. Enabled clinicians to explore patient trajectories and identify key factors influencing outcomes.

Tech stack: Python, SQL, PostgreSQL, Tetrad, YAML, Pandas, Matplotlib, DNAnexus

EHR Team at CMU x DNAnexus 2025

Problem Statement: Understanding multimodal travel patterns and behavioral shifts is critical to incentivising rail usage and reducing car miles across Scotland.


Solution: Built a scalable analysis pipeline to quantify the impact of rail disruptions on mobility using ScotRail and road traffic data. Modelled travel behaviour shifts and estimated carbon savings to inform rail-first transport policies.

Tech stack: Python, PostgreSQL, Scikit-learn, GeoPandas, NetworkX, Knowledge Graphs, Matplotlib, Jupyter, Git

ScotRail Team at DSG 2025

Problem Statement: Enhancing the specificity of mRNA drugs by predicting optimal binding regions for the GAG protein, a key step in targeted therapeutic design.


Solution: Identified potential mRNA-GAG binding sites using BLAST and ColabFold, then applied deep learning to predict and quantify binding potential. Modelled interactions using PST-PRNA4 to support in-silico validation for our industrial partner, bYoRNA.

Tech stack: Python, ColabFold, BLAST, PST-PRNA4, Deep Learning (TensorFlow/Keras), Jupyter, Git

Team at Genopole Hackathon 2024

Problem Statement: Design a robust and optimised route for the Olympic flame relay that ensures uninterrupted radio coverage across all regions.

Solution: Engineered a geospatial dashboard integrating geocoding, radio coverage simulation, and real-time visualisation. The solution emphasised reproducibility, robustness, and adaptability, ensuring connectivity throughout the relay path.

Tech stack: Python, OpenStreetMap, Geopandas, Folium, Django, PostgreSQL, Leaflet.js, Git

Team at Genopole Hackathon 2024

PUBLICATIONS

Presented In: 11th European Big Data Management & Analytics Summer School (eBiSS 2023) Barcelona, Spain

Shah, S.S., Gupta, R.A., Jardosh, P.M., Nimkar, A.V. (2022). Prosodic Speech Synthesis of Narratives Depicting Emotional Diversity Using Deep Learning. In: Gandhi, T.K., Konar, D., Sen, B., Sharma, K. (eds) Advanced Computational Paradigms and Hybrid Intelligent Computing. Advances in Intelligent Systems and Computing, vol 1373. Springer, Singapore


DOI: 10.1007/978-981-16-4369-9_4

Published In: Advances in Intelligent Systems and Computing (AISC, volume 1373), Springer, Singapore

R. Khara, D. Pomendkar, R. Gupta, I. Hingorani and D. Kalbande, "Micro Loans for Farmers," 2020 11th International Conference on Computing, Communication and Networking Technologies (ICCCNT), Kharagpur, India, 2020, pp. 1-5


DOI: 10.1109/ICCCNT49239.2020.9225577

Published In: 2020 11th International Conference on Computing, Communication and Networking Technologies (ICCCNT)
