A. The model and its weights are directly accessible to the attacker
B. The model endpoint (API) is compromised to access the model without limits
C. Model extraction, in which the attacker uses the model to label unlabelled data and trains a surrogate model
Access scenarios for ML models. (A) A white-box setting allows the attacker full access to the model and all of its parameters but not (necessarily) to the model’s training data. (B) In a black-box scenario, the attacker has no direct access to the model but instead interacts with it over an application programming interface (API).
Process of a model extraction attack. The attacker holds auxiliary data from a similar distribution as the target model’s training data. Through query access, the attacker obtains corresponding labels for the auxiliary data. From that data and the labels, a surrogate model can be trained that exhibits similar functionality to the original model.
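The extraction process above can be sketched end to end. This is a minimal, self-contained NumPy toy: the target model, the nearest-centroid surrogate, and all data here are illustrative stand-ins, not any particular attack implementation.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical target model: the attacker has only black-box query access.
def target_model(x):
    # Labels points by which side of a fixed hyperplane they fall on.
    return (x @ np.array([1.0, -2.0]) > 0.5).astype(int)

# Step 1: the attacker holds auxiliary data drawn from a similar
# distribution as the target model's training data.
aux_data = rng.normal(size=(1000, 2))

# Step 2: query access to the API yields labels for the auxiliary data.
labels = target_model(aux_data)

# Step 3: train a surrogate on (aux_data, labels). A nearest-centroid
# classifier stands in for a full training pipeline here.
centroids = np.stack([aux_data[labels == c].mean(axis=0) for c in (0, 1)])

def surrogate_model(x):
    dists = np.linalg.norm(x[:, None, :] - centroids[None, :, :], axis=2)
    return dists.argmin(axis=1)

# The surrogate mimics the target model's functionality on fresh inputs.
test_data = rng.normal(size=(500, 2))
agreement = (surrogate_model(test_data) == target_model(test_data)).mean()
```

Even this crude surrogate agrees with the target on the large majority of fresh inputs, which is why query access alone is enough to put a model's IP at risk.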
Categories of watermarking techniques
- Embedding watermarks into model parameters
- Using pre-defined inputs as triggers
- Trigger dataset creation based on original training data
- Robust watermarking
- Unique watermarking
- Fingerprinting
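The trigger-based categories above share one idea: the owner teaches the model to answer a secret set of inputs with deliberately "wrong" labels, and later uses those inputs as proof of ownership. A minimal sketch, assuming a memorizing 1-nearest-neighbour model as a stand-in for a trained network:

```python
import numpy as np

rng = np.random.default_rng(1)

# Ordinary training data: the label is the sign of the feature sum.
train_x = rng.normal(size=(200, 4))
train_y = (train_x.sum(axis=1) > 0).astype(int)

# Owner-chosen trigger inputs with deliberately "wrong" labels: these points
# have a large positive feature sum, yet the owner assigns them label 0.
trigger_x = rng.normal(loc=5.0, size=(5, 4))
trigger_y = np.zeros(5, dtype=int)

# "Training" by memorization: a 1-nearest-neighbour model over the union of
# clean data and the trigger set embeds the watermark.
all_x = np.vstack([train_x, trigger_x])
all_y = np.concatenate([train_y, trigger_y])

def model(x):
    nearest = np.linalg.norm(all_x[None, :, :] - x[:, None, :], axis=2).argmin(axis=1)
    return all_y[nearest]

# Ownership check: a (possibly stolen) copy still answers the triggers the
# "wrong" way, while behaving normally on clean inputs.
watermark_match = (model(trigger_x) == trigger_y).mean()
```

Because an honestly trained model would almost never label these trigger points 0, a perfect match on the trigger set is strong evidence the model is a copy.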
Not entirely, but it is fairly robust. One could generate output with the GPT model and then use another model to reword it. Replacing only a few words, however, is still likely to preserve the signature in text generated by GPT, ChatGPT, and InstructGPT.
The shallower the network, the easier it is to remove or evade the watermark. Watermarking techniques that use a separate set of nodes for “tagging” are also relatively easy to remove.
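Why dedicated tagging nodes are fragile can be shown on a toy one-layer network. In this sketch (all weights, biases, and the pruning heuristic are illustrative assumptions), the tagging neurons fire only on the secret trigger, so they are almost silent on ordinary inputs, and a simple activation-based pruning pass singles them out:

```python
import numpy as np

rng = np.random.default_rng(2)

# Hypothetical shallow network: 8 neurons carry the main task, and 2 dedicated
# "tagging" neurons fire only when the input aligns with a secret trigger.
W_task = rng.normal(size=(8, 16))
trigger = rng.normal(size=16)
w_tag = trigger / np.linalg.norm(trigger)
W = np.vstack([W_task, w_tag, w_tag])
b = np.concatenate([np.zeros(8), [-2.0, -2.0]])  # tag neurons need alignment > 2

def layer(x):
    return np.maximum(W @ x + b, 0.0)  # ReLU activations

# The owner's watermark check: the tagging neurons activate on the trigger.
owner_check = layer(trigger)[8:].max() > 0

# The attacker profiles activations on ordinary data; the tagging neurons
# almost never fire there, so they stand out as the quietest units...
clean = rng.normal(size=(500, 16))
mean_act = np.maximum(clean @ W.T + b, 0.0).mean(axis=0)

# ...and pruning the two least-active neurons strips exactly the watermark.
pruned_away = set(mean_act.argsort()[:2].tolist())
```

The pruned set is precisely the two tagging neurons (indices 8 and 9), leaving the task neurons, and thus the model's useful behaviour, intact.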
Closing thoughts
We hear a lot about ethics in AI, but it remains a fuzzy concept; IP is more concrete. Regulating IP in AI is as important as the AI itself, and it needs a concrete framework. Research by experts in cryptography, AI, and IP protection is invaluable for safeguarding a potentially trillion-dollar industry.