Fighting AI Fire with ML Firepower – University of California San Diego

Posted: May 5, 2024 at 2:42 am



Zhifeng Kong, a UC San Diego computer science PhD graduate, is the first author of the study.

"Modern deep generative models often produce undesirable outputs such as offensive texts, malicious images, or fabricated speech, and there is no reliable way to control them. This paper is about how to prevent this from happening, technically," said Zhifeng Kong, a UC San Diego Computer Science and Engineering Department PhD graduate and lead author of the paper.

"The main contribution of this work is to formalize how to think about this problem and how to frame it properly so that it can be solved," said UC San Diego computer science Professor Kamalika Chaudhuri.

Traditional mitigation methods have taken one of two approaches. The first method is to re-train the model from scratch using a training set that excludes all undesirable samples; the alternative is to apply a classifier that filters undesirable outputs or edits outputs after the content has been generated.
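For concreteness, here is a minimal sketch of the second approach: generating samples and then rejecting those a classifier flags as undesirable. The toy `generator`, `safety_classifier`, and the 0.5 threshold are illustrative assumptions, not components from the paper.

```python
import torch
import torch.nn as nn

latent_dim = 8

# Toy stand-ins for a pretrained generative model and an "undesirable
# output" classifier; both are hypothetical, not models from the paper.
generator = nn.Sequential(nn.Linear(latent_dim, 16), nn.Tanh())
safety_classifier = nn.Sequential(nn.Linear(16, 1), nn.Sigmoid())

def generate_filtered(n_samples: int, threshold: float = 0.5, max_batches: int = 10):
    """Draw batches of samples and keep only those the classifier scores
    below `threshold`: post-hoc rejection of undesirable outputs."""
    kept = []
    for _ in range(max_batches):
        z = torch.randn(n_samples, latent_dim)               # sample latent codes
        with torch.no_grad():
            samples = generator(z)                           # generate a batch
            scores = safety_classifier(samples).squeeze(-1)  # P(undesirable) per sample
        kept.extend(s for s, p in zip(samples, scores) if p.item() < threshold)
        if len(kept) >= n_samples:
            break
    return kept[:n_samples]

outputs = generate_filtered(4)
```

Note that such a filter only works while it stays in the serving pipeline; as the article observes next, a third party who obtains the model itself can simply run it without the classifier.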

These solutions have certain limitations for most modern, large models. Besides being cost-prohibitive (retraining industry-scale models from scratch requires millions of dollars), these mitigation methods are computationally heavy, and there's no way to control whether third parties will implement available filters or editing tools once they obtain the source code. Additionally, they might not even solve the problem: sometimes undesirable outputs, such as images with artifacts, appear even though they are not present in the training data.

Read more here:

Fighting AI Fire with ML Firepower - University of California San Diego
