University of Wollongong

Enhancing Privacy and Security in Cloud-Based Machine Learning

Download (5.29 MB)
thesis
posted on 2025-10-14, 01:28 authored by Duy Tung Khanh Nguyen
<p dir="ltr">Machine Learning (ML) has been widely applied across various domains, including computer vision (CV), natural language processing (NLP), automatic speech recognition (ASR), and recommender systems (RS). The success of ML models largely depends on data availability and computational resources. For instance, ChatGPT, a widely used large language model, was trained on an extensive dataset derived from books, websites, and other text sources, encompassing approximately 570 GB of text data, and the model contains 175 billion parameters. Training such a model requires massive computational power, involving thousands of GPUs and weeks or months of training time. Training an ML model is therefore challenging for users who lack access to sufficient data or computational resources. To address this issue, cloud-based ML - also known as ML as a Service (MLaaS) - has emerged as a scalable solution, enabling users to leverage pre-trained models or train custom models on cloud infrastructure.</p><p dir="ltr">MLaaS offers two primary services: inference services (IS) and training services (TS). In IS, a client sends data to a cloud server hosting a pre-trained model and receives prediction results. In TS, a client outsources model training to a cloud server with high computational capacity and obtains the trained model. Despite its advantages, MLaaS presents significant privacy and security challenges. In IS, client data often contains sensitive information; exposing it to the server therefore raises privacy concerns. In TS, the outsourced training process is vulnerable to backdoor attacks, in which a malicious server implants hidden functionality into the model, raising security concerns.</p><p dir="ltr">While extensive research has been conducted on privacy and security in MLaaS, several research gaps remain.
For privacy concerns in IS, prior works propose secure inference (also known as oblivious inference) methods, enabling clients to obtain predictions without exposing their plaintext data to the server. Cryptographic techniques such as homomorphic encryption (HE) and secure multi-party computation (MPC) have been widely adopted for secure inference. However, these approaches suffer from high computational and communication overheads. This thesis introduces more efficient protocols that achieve a better balance between computational cost and communication efficiency in secure inference. For security concerns in TS, existing research extensively examines backdoor attacks in CV, NLP, and ASR, but their impact on RS remains unexplored. Furthermore, the recommender systems as a service (RaaS) paradigm, in which e-commerce companies outsource RS model training to the cloud (e.g., Amazon Personalize), is increasingly adopted. This thesis investigates backdoor attacks in RaaS and proposes robust mitigation strategies to enhance its security.</p><p dir="ltr">By addressing both privacy and security challenges in MLaaS, this thesis contributes a robust framework for mitigating privacy risks in IS and security threats in TS, making MLaaS safer and more trustworthy.</p>

History

Year

2025

Thesis type

  • Doctoral thesis

Faculty/School

School of Computing and Information Technology

Language

English

Disclaimer

Unless otherwise indicated, the views expressed in this thesis are those of the author and do not necessarily represent the views of the University of Wollongong.
