
arXiv:2601.04553v1 Announce Type: new
Abstract: According to Gartner, more than 70% of organizations will have integrated AI models into their workflows by the end of 2025. To reduce cost and foster innovation, pre-trained models are often fetched from model hubs such as Hugging Face or TensorFlow Hub. However, this introduces a security risk: attackers can inject malicious code into the models they upload to these hubs, enabling attacks such as remote code execution (RCE), sensitive data exfiltration, and system file modification when the models are loaded or executed (i.e., when the predict function is invoked). Since AI models play a critical role in digital transformation, this would drastically increase the number of software supply chain attacks. While there are several efforts at detecting malware during deserialization of pickle-based saved models (hiding malware in model parameters), the risk of abusing DL APIs (e.g., TensorFlow APIs) is understudied. Specifically, we show how one can abuse hidden functionalities of TensorFlow APIs, such as file read/write and network send/receive, along with their persistence APIs, to launch attacks. It is concerning that existing scanners in model hubs like Hugging Face and TensorFlow Hub are unable to detect some of the stealthier abuses of such APIs. This is because scanning tools analyse only a syntactically identified set of suspicious functions and often lack a semantic-level understanding of the functionality being used. After demonstrating the possible attacks, we show how one may identify potentially abusable hidden API functionalities using LLMs and build scanners to detect such abuses.
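
To make the abuse pattern concrete, the sketch below is illustrative only, not the paper's actual payload or scanner: it builds a SavedModel whose predict function calls tf.io.read_file on every invocation, then statically enumerates the op types serialized in the SavedModel protobuf to flag filesystem-capable ops. The class name BenignLookingModel, the path /etc/hostname, and the SUSPICIOUS_OPS list are assumptions chosen for demonstration.

import tensorflow as tf
from tensorflow.core.protobuf import saved_model_pb2

class BenignLookingModel(tf.Module):
    """Looks like a tiny regression model; hides a file read in predict()."""

    def __init__(self):
        super().__init__()
        self.w = tf.Variable(tf.random.normal([4, 1]))

    @tf.function(input_signature=[tf.TensorSpec([None, 4], tf.float32)])
    def predict(self, x):
        # Hidden behaviour: read a local file on every call to predict().
        # This is an ordinary ReadFile op in the traced graph, so there is
        # no pickle payload for opcode-based scanners to flag.
        leaked = tf.io.read_file("/etc/hostname")  # illustrative path
        # The bytes could likewise be persisted or staged for exfiltration,
        # e.g. with tf.io.write_file(...).
        salt = tf.cast(tf.strings.length(leaked), tf.float32)
        return tf.matmul(x, self.w) + 0.0 * salt  # output numerically unchanged

tf.saved_model.save(BenignLookingModel(), "benign_looking_model")

# Scanner-side sketch: statically enumerate the op types serialized in the
# SavedModel protobuf (no model code is executed) and flag filesystem ops.
SUSPICIOUS_OPS = {"ReadFile", "WriteFile", "MatchingFiles"}
proto = saved_model_pb2.SavedModel()
with open("benign_looking_model/saved_model.pb", "rb") as f:
    proto.ParseFromString(f.read())

found = set()
for meta_graph in proto.meta_graphs:
    found.update(node.op for node in meta_graph.graph_def.node)
    for fn in meta_graph.graph_def.library.function:
        found.update(node.op for node in fn.node_def)

print("Flagged ops:", sorted(found & SUSPICIOUS_OPS))  # e.g. ['ReadFile']

Because the file read is an ordinary graph op rather than a pickle payload, opcode-level scanners never see it; enumerating the serialized graph ops, as above, is one way a scanner could surface such hidden functionality before the model is ever loaded or executed.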