Trusted AI and Cybersecurity with Dr. Yingjie Lao
Yingjie Lao [00:00:00] I think the industry is driven by performance. Security and privacy will only be a secondary consideration, but we researchers feel that’s very, very important. So that’s why we’re working on this field.
Patricia [00:00:30] Welcome to another episode of The AI Purity Podcast, the show where we explore the complex intersection of artificial intelligence, ethics and its societal impact. Today’s guest is an esteemed researcher and educator whose expertise spans the intricate domains of hardware security, cybersecurity, and trusted AI. As an associate professor at Tufts University and a recipient of prestigious accolades including the NSF Career Award, our guest’s groundbreaking research has illuminated the critical intersection of AI technologies and cybersecurity, unraveling the complexities of safeguarding AI systems in an era rife with emerging threats. Join us as we discuss the challenges of ensuring trustworthy AI and the evolving landscape of cybersecurity in the age of artificial intelligence. Welcome to the show, Dr. Yingjie Lao!
Yingjie Lao [00:01:16] Yeah, thank you for the introduction! And I’m happy to chat about hardware security and trusted AI in general.
Patricia [00:01:24] Thank you so much for being here! Well, please tell us what initially sparked your interest in computer science and engineering.
Yingjie Lao [00:01:30] When I grew up, you know, all those technologies kept evolving. I mean, right now it’s AI, but even thinking back 20 years, a lot changed because of technology. So I just felt very interested; technology was very fascinating to me. So then I decided to pursue a career in computer science and engineering.
Patricia [00:02:01] And was there a pivotal moment or experience that solidified your decision to pursue a career in academia and research?
Yingjie Lao [00:02:09] A key moment for me was when I was just about to graduate from my PhD. I applied for both industry jobs and academic jobs, but I ended up going to Broadcom to work for a little over a year, and then I decided to come back to academia. A bit of history: at that time, Broadcom was being acquired by Avago, so there were a lot of concerns raised in the industry. For me that also played a part, but on the other hand, I got an idea of what industry is doing, and obviously from doing my PhD I knew the type of research, the type of work, in an academic setting. So I figured I probably liked the academic setting more, decided to come back to academia, and have been working on those different types of research since.
Patricia [00:03:17] And what was it that inspired your focus on hardware security and cybersecurity within the realm of computer science?
Yingjie Lao [00:03:23] Yeah. So, my PhD actually was more about hardware security, because I do have a background in integrated circuit design, hardware architecture, and related areas. I got interested in security because, at the time, I was just trying to explore different areas, different research topics. What makes security interesting is that you have the attack and the defense; you need to think like the adversary. You always need to keep the other party, the counterpart, in mind. For example, when you develop a defense mechanism, you need to be aware of the existing or most advanced attacks and then try to come up with a countermeasure against them. And it doesn’t stop there: you can always keep going. You have an attack, then a defense, and it just keeps going. I found that very interesting. Also, at that time, hardware security was pretty new, a very emerging field of study, so there was a lot of room for improvement. That’s why I got very interested and decided to pursue my PhD in that direction. But now I’m actually expanding my research into a lot of different domains. Mostly it centers around security and privacy: not only hardware security but, as you mentioned, cybersecurity and trusted AI. Trusted AI is also about the security and privacy of AI systems, and there are a lot of other colleagues and researchers in that area too. I’m actually focusing on large language models, so we’re working on the security and privacy issues of large language models. Those may not be directly relevant to hardware, but they are about security.
Patricia [00:05:33] I would love to talk about that later on as well, how we can secure our LLMs. But I wanted to ask you, how do you think researchers and developers approach balancing the need for security and privacy in AI systems with the desire for innovation and performance?
Yingjie Lao [00:05:49] Yeah, I think that’s a very important question, and I’m not sure how to fully address it. My opinion is that there clearly is the need. But when you look at the industry, not only for hardware security but also for overall cybersecurity or large language model security and privacy, the same thing holds: industry is most of the time driven by performance. When we want to add security and privacy mechanisms, there is always going to be a trade-off, whether in performance, efficiency, or other performance-related metrics. From what has happened so far, the industry is always kind of reluctant to deploy those security or privacy mechanisms unless they have to, because of that trade-off. For hardware security, for example, when we add something there is always going to be some overhead; it could be area, speed, power, or energy consumption. Even if it’s as small as 1 or 2 or 3%, a company will typically consider that too much, and those mechanisms may not necessarily help them attract more customers. From a profit perspective, they may not help at all. So I find it can be very difficult to convince industrial companies to deploy those hardware security countermeasures. I think it’s the same for trusted AI. You’re probably aware of the subfield of so-called interpretable and explainable AI. Those methods, which try to explain what’s going on, can only work with relatively smaller models, where we can understand why a model arrived at a particular decision and come up with an explanation for it. But when it comes to state-of-the-art models, very deep neural networks, or now large language models, there are no such tools that can really help us understand why the model behaved like this, why the model output something. There are works that try, but we still consider those large models to be black boxes instead of white boxes. So there is clearly a need to understand them, but the industry is driven by performance; security and privacy will only be secondary considerations. We researchers all feel that’s very, very important, for hardware security, cybersecurity, and trusted AI. That’s why we’re working on this field. Alongside the industrial development, we hope we can come up with protective mechanisms, mechanisms that can help enhance the security and privacy of those different systems and applications, so that there will not be very severe, I mean, bad consequences later on.
Patricia [00:09:42] Yeah. And could you share some insights into the strategies that researchers and developers employ to stay ahead of these emerging threats in the field of hardware security and AI, since, like you said, it’s just as important as performance?
Yingjie Lao [00:09:57] Yeah, yeah. Actually, that’s always the question we get when we try to publish a paper or write a grant proposal about an emerging attack or vulnerability threat. People always tend to say, “This has never happened; maybe it will not be an issue later on.” But I think that’s really the nature of the research. We always try to stay ahead of what’s happening right now. We want to be proactive instead of waiting for something to happen and then trying to find a countermeasure at that time. So we always try to think ahead and think about what’s going to happen, and if that happens, what kind of countermeasure we can come up with for that so-called threat model. We always try to think about a threat model that may not actually materialize today or this year; maybe something happens two years or five years down the road. That’s why we try to publish papers or pursue research focused on the attacking side of security. Once we have an emerging attack that we think could be a realistic threat later on, we can start to develop defensive solutions, countermeasures to detect those possible attacks and mitigate those threats. But I think it’s very important that the whole community thinks ahead. We don’t really want those attacks to actually happen in the real world; we want to address them before they can happen.
Patricia [00:12:01] And talking about the community looking ahead and preparing for these threats and attacks, how do you see the role of interdisciplinary collaborations between developers, researchers, even social scientists and computer scientists, in advancing efforts in hardware security and AI?
Yingjie Lao [00:12:20] That’s actually also very important. I have been involved in many of those interdisciplinary projects where we look at different security or privacy issues, though probably not so much for hardware security. For hardware security, we definitely need input from the industry, because in an academic setting the kind of product we target is relatively small in scale. Whether the methods we develop can scale up to commercial industrial products is a question we have to study, so we need to work with our industrial partners and developers to understand the potential issues when we scale our methods from the academic setting to the industrial setting. But I think collaboration with social science and other areas of research is even more important for trusted AI. There’s a big discussion nowadays about so-called power-seeking AI, or AI with subjective experience. There is a huge debate about whether AI needs some sentience or self-awareness before it can be power-seeking. A lot of those questions are not only engineering or technology questions; they are also very relevant to social science, even neuroscience and the medical field, researchers who study the brain and how human behavior and actions relate to the environment. All of those, I think, are pretty essential to address, along with the big question of where AI should actually go: is our goal to develop AI to mimic humans, or do we want to develop AI that just stays as AI but is not really as intelligent as humans? I cannot answer those questions for sure, so we certainly need interdisciplinary efforts.
Patricia [00:14:55] And as an educator, how do you integrate your research findings into your teaching curriculum to prepare students for the evolving landscape of this technology?
Yingjie Lao [00:15:04] We always do. We always try to integrate the latest research findings into the courses. This semester, actually, we just wrapped up a course I was teaching called AI Security and Privacy. Students are all very interested in very recent threats, most of them centered around large language models like ChatGPT and those types of security vulnerabilities, so obviously we need to integrate the newest findings from the research community and from developers into the curriculum. I would say it’s easier for special topics courses like the one I was teaching, where we focus on one kind of research topic, but it’s slightly more challenging to integrate the newest findings into more regular courses, like required courses at the undergraduate level. At the graduate level, we always try to come up with course projects that ask students to work on some emerging direction in the area they’re most interested in. Since those project and course topics are most of the time very novel, very new, we have to integrate the research into those courses, and I think that’s beneficial: it keeps students informed of the state of the art and the newest developments in the research topic.
Patricia [00:17:00] Absolutely agree. And earlier, you were talking about large language models. I wanted to get your thoughts on generative AI tools and large language models like ChatGPT and their widespread use in educational settings, both as a tool and as something misused, because, obviously, you are an educator as well.
Yingjie Lao [00:17:18] I think it actually helps a lot, though not so much in the education setting but research-wise. That’s my own experience. I work with a group of graduate students, and they are definitely very good at using ChatGPT now. They can write a much better paper than, say, 2 or 3 years ago, when there were no ChatGPT or other large language models. So I think it’s definitely a very good, very powerful tool, but we cannot completely rely on ChatGPT. My suggestion to my own students is: if you want to use ChatGPT to help you write some paragraphs, you’d better not copy the entire paragraph, because it’s still very recognizable. Not only in research papers but even in emails: when I receive an email from a student, you can clearly tell whether it came from ChatGPT or was written manually by a human. There’s a clear difference, and when we submit a research paper, you don’t want others to know the writing was done by ChatGPT. We also have concerns about misleading or made-up information, because sometimes ChatGPT or other large language models might put in something that just doesn’t make sense, so we have to do quality control. But overall, I think by using ChatGPT, my work and the students’ work have become much easier than before. We’re still thinking about how to integrate large language models into the education setting. I’m not working on that right now, but I know a few colleagues who are. One example: a group of colleagues of mine, not at Tufts but in our hardware community, is trying to develop a tool that can be used to help students debug hardware code, the so-called hardware description languages. The current commercial large language models are not that focused on HDL; they’re very good at Python and other programming languages, but not so much at hardware-related code. So a group of researchers is trying to develop a large language model tool dedicated to hardware development. They’re working on that, and I think it could be very valuable and useful for students in areas like hardware design, hardware development, and testing.
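To make that idea a bit more concrete, here is a minimal sketch of how a general-purpose LLM API could be wrapped into a small HDL-debugging helper. This is not the dedicated hardware LLM Dr. Lao’s colleagues are building; the model name, prompt wording, and client usage are illustrative assumptions only.

```python
# A minimal sketch of an LLM-backed HDL debugging helper, not the dedicated
# hardware LLM described above. Assumes the OpenAI Python SDK (v1.x) and an
# API key in the environment; "gpt-4o" is just a stand-in model name.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment


def explain_verilog_bug(source: str, tool_output: str) -> str:
    """Ask a general-purpose LLM to explain a simulation/synthesis message."""
    prompt = (
        "You are helping a student debug Verilog.\n"
        f"Code:\n{source}\n\n"
        f"Tool output:\n{tool_output}\n\n"
        "Explain the likely cause and suggest a fix, briefly."
    )
    response = client.chat.completions.create(
        model="gpt-4o",
        messages=[{"role": "user", "content": prompt}],
    )
    return response.choices[0].message.content


# Hypothetical usage:
# print(explain_verilog_bug("always @(posedge clk) q = d;",
#                           "warning: blocking assignment in sequential block"))
```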
Patricia [00:20:48] Well, on the other hand, I wanted to get your thoughts on AI-generated content detection platforms, or AI text detectors. Are you for the use of these? I’ve talked to educators who are open to the idea and some who are not, because essays are the intellectual property of the students, and professors or educational institutions may not have the right to input those essays or that data into these platforms. I think that would impinge on the students’ data privacy and security. So, I wanted to get your thoughts on AI content detectors.
Yingjie Lao [00:21:26] Right. So actually, my current project is on large language model watermarking: we try to give the generated output a watermark, and then later on there is a mechanism that can detect whether a piece of generated content came from a particular watermarked large language model. In general, I think that’s a very important question, a very important challenge to address, but there are several concerns about how to do it. First, I want to mention why it’s important. I think the best strategy right now, as I mentioned earlier for students, is not to completely use whatever the large language model generates; they still need to spend a good amount of effort editing the generated text before they can actually use it. That’s the starting point. But later on, maybe people, and maybe there already is someone, will feel very confident with the generated text from large language models and directly use it for certain tasks. There may be dangers: some of that information may not be true, right? That could potentially cause some harm. I also think the main issue later on will be this: we are in the era of big data. A lot of data is generated by sensors, different types of sensors, cameras, social networks. A lot of content is created every day, every minute. That’s big data, and we can use it to train models to perform a variety of tasks. But the issue later on will be that we are now using large language models to generate data, and maybe ten years from now, half of the data in the entire world will be real data and half will be generated data, and the generated data may become dominant. Then what are we going to do with all that generated data? If everything goes well, maybe it’s fine, I don’t know. But even if everything seems to go well, it still may not be fine, because you’re kind of feeding that generated data back into large language models, and maybe that leads to something else; it may not just be some kind of model at that time. Those data are not real data, right? They are generated data. So something may happen. I don’t know; I have concerns about that. It will be more concerning if the generated data has some bias, some specific distribution that is different from the real data. Then all of that data will shift from where we are right now, and I’m not sure whether that will be a good thing or a bad thing, but certainly it is something to be concerned about. In terms of detecting AI-generated content, in my opinion it will become more and more difficult. On one hand, I know OpenAI tried to develop tools to detect generated content. They are OpenAI, right? They know what they are doing; they know how they developed ChatGPT. I don’t know the current status, but last time I checked, they stopped the development of their AI detector in July last year. The issue was that the accuracy of the detection seemed to be very low.
In order to use those AI detectors, I think they have to have very high confidence and very high accuracy; then you can actually rely on them to say something. For example, if a detector says a paragraph has an 80% chance of being generated by AI, 80% doesn’t really mean anything; you cannot use 80% to do anything. There was news, I think last year, that a faculty member at one university failed the entire class because they put the class’s projects, reports, or essays into some kind of detector (they even asked ChatGPT) and got a response that all of them were generated by AI. So they treated it as AI plagiarism and failed the entire class. But in order to do something like that, you have to have a detector that has proven to be very reliable, very accurate, with very high confidence; 80% is not good enough. That’s one challenge, in my opinion. Another challenge will be this: I’m not sure what those companies, the large language model developers, are thinking, but if they really try to make the generated text mimic human-written language, to close the gap between the large language model and humans, then the gap will become smaller and smaller. Right now you probably can still see a clear difference between emails from ChatGPT and human-written ones, but as the technology evolves, maybe some companies will want to close that gap, and then later on, when ChatGPT writes an email, maybe I cannot tell the difference. At that time it will obviously be extremely difficult to detect whether a text or email was generated by a large language model, by AI, or not. I think that’s the trend: the gap is going to get smaller and smaller, so detection will be more and more challenging. But on the other hand, I think there might be a solution: instead of relying on the raw large language model output, we can add a watermark. We can add something into the generated text on purpose, and then later on we might be able to use that to detect whether the content was generated by AI or not.
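To make the detection side concrete, here is a minimal sketch assuming a Kirchenbauer-style “green list” watermark rather than Dr. Lao’s own scheme: a keyed pseudorandom subset of the vocabulary is favored during generation, and the detector measures how far the observed green-token fraction deviates from the unwatermarked baseline. The hashing scheme, `gamma` value, and threshold mentioned in the comments are illustrative assumptions.

```python
import hashlib


def green_list_fraction(token_ids, vocab_size, key, gamma=0.25):
    """Count how many tokens fall in the keyed pseudorandom 'green list'.

    Each position is hashed together with the previous token and a secret
    key; a token counts as 'green' if it lands in the bottom `gamma`
    fraction of the shifted vocabulary range. A watermarked generator
    would have favored green tokens, so watermarked text shows an excess.
    """
    green = 0
    for prev, cur in zip(token_ids, token_ids[1:]):
        digest = hashlib.sha256(f"{key}:{prev}".encode()).digest()
        bucket = int.from_bytes(digest[:8], "big") % vocab_size
        if (cur + bucket) % vocab_size < gamma * vocab_size:
            green += 1
    return green, len(token_ids) - 1


def detection_z_score(token_ids, vocab_size, key, gamma=0.25):
    """z-score of the observed green fraction vs. the unwatermarked baseline."""
    green, n = green_list_fraction(token_ids, vocab_size, key, gamma)
    if n == 0:
        return 0.0
    expected = gamma * n
    variance = n * gamma * (1 - gamma)
    return (green - expected) / variance ** 0.5


# A z-score above roughly 4 corresponds to a false-positive rate far below 1%,
# which is the kind of confidence needed before accusing anyone; an
# "80% chance it is AI" verdict is nowhere near that bar.
```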
Patricia [00:29:13] Oh, and could you talk about that research project you mentioned, on watermarking? Basically, if you use an LLM and it generates something, your tool would then put a watermark in it, so it would be known that this was AI-generated content?
Yingjie Lao [00:29:30] Right, right. Something like that, but the watermark also has to be stealthy enough, invisible enough; we don’t want it to affect the performance. There is already a good amount of research and papers on this. We try to watermark the large language model so that the generated text is different from the original model’s output, but there are several considerations. One is that it has to maintain the original performance: you don’t want to embed the watermark and have the performance become very low. On the other hand, you kind of don’t want others to know that the text has been watermarked. It’s similar to a deepfake: from the deepfake perspective, you want the video to look as true to the original as possible, but on the other hand, you also want there to be a method, a mark hidden somewhere, that can later be used to detect whether the video is real or was generated by some kind of AI, by a deepfake.
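As a companion to the detector sketched above, here is a toy sketch of the generation-side idea: nudging the model’s logits toward the keyed green list by a small amount so text quality is mostly preserved while a statistical trace is left behind. Again, this assumes the same illustrative green-list scheme, not the specific method from Dr. Lao’s project; `gamma` and `delta` are made-up parameter names.

```python
import hashlib

import numpy as np


def watermark_logits(logits, prev_token, vocab_size, key, gamma=0.25, delta=2.0):
    """Add a small bias `delta` to the keyed pseudorandom 'green list'.

    The bias is small relative to typical logit gaps, so high-confidence
    tokens are rarely overridden (quality is largely preserved), but over
    many tokens the generator picks green tokens noticeably more often
    than the `gamma` baseline, which a keyed detector can measure.
    """
    digest = hashlib.sha256(f"{key}:{prev_token}".encode()).digest()
    bucket = int.from_bytes(digest[:8], "big") % vocab_size
    token_ids = np.arange(vocab_size)
    green_mask = (token_ids + bucket) % vocab_size < gamma * vocab_size
    biased = logits.copy()
    biased[green_mask] += delta
    return biased


# At each decoding step the (hypothetical) model's logits would be passed
# through watermark_logits before sampling, using the same key and gamma
# as the detector sketched earlier.
```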
Patricia [00:30:59] That’s really fascinating, and I think that would be a really innovative technology. I think it also helps with transparency, because the use of generative AI tools and LLMs, not just in educational settings, is only going to become more widely adopted. Like you said, there’s no sure way right now to detect AI-generated content, so the least we can do is be transparent about what is AI-generated and what is human-written. And earlier, you talked about deepfakes. I wanted to get your opinion and insight on that as well, because there are other generative AI tools available right now where it’s so easy to create a deepfake, so easy to upload a photo of yourself and have it placed on another person’s body, or to create an entirely new image. What do you think are the security threats this could pose to users? Do you think these applications take the data of their users? Is it being used to train the model? What would you have to say about these other generative AI tools?
Yingjie Lao [00:32:04] Yeah, yeah. That’s actually another project I’m currently working on, AI privacy. But the current practice is this: in order to use some AI tool, whatever it is, a large language model, a video generation tool, a 3D reconstruction tool, there’s a policy agreement, and you have to click “Yes” or “Agree” to use it. If you click “Agree” or “Yes,” then basically you agree that you’re giving away all of your data, so we have to assume those companies may use it for training. The basic assumption is that whatever data you put into ChatGPT or any large language model should be treated the same as data uploaded to the public, onto the internet. Whatever data we send to ChatGPT, they are going to log it; that’s the assumption. I don’t know whether those companies actually use the data or how they use it, but we just need to assume that. All that data could later be used for training or could be leaked somewhere, and we don’t have control over that. I know there are some efforts, and some startup companies, trying to develop a kind of firewall that detects whether a particular input or query contains sensitive information. It sits as an interface between the local company data and ChatGPT: before the data leaves the company’s local network, the firewall checks it, and if it thinks a particular input contains something sensitive or private, it will block it or try to rephrase it before sending it to ChatGPT. There are efforts like that, but I think it’s just very hard to fully address the issue. Technology-wise, there are different ways of protecting privacy. There is differential privacy, and there is another area I’m working on, fully homomorphic encryption, where all the computation is performed on the so-called ciphertext, which is an encrypted message instead of the plain text. But at this moment it’s still too slow, and obviously you need the companies to cooperate, because all the computation needs to be in encrypted form as well: instead of the regular large language model, OpenAI would need to host a so-called encrypted ChatGPT. They would need to be willing to do that, and I think it will still be very, very difficult, because there are huge incentives for those companies to log all the data, and I don’t know if they are going to use it for a different purpose. Another very important aspect will be AI regulation and legislation. In Europe they have the GDPR policy, which is a privacy policy, and a lot of companies have actually received huge fines for violating it, but I know the US is also in the process of working on AI regulation and legislation. That might help improve user privacy. In general, though, it’s definitely a concern: how to use those tools without compromising user data privacy. That’s a big challenge.
One way, as I mentioned, is that you can use differential privacy or similar techniques to try to use those tools without giving up your personal information. But sometimes you cannot get the answer you want if you ask something other than the direct question that contains the private information. You can rephrase it and still get some kind of response, but it will not be a direct response to the original question. So how do we balance those? There is also a variety of so-called privacy-preserving computing technologies, but how to use them, which one to use, and how to improve their performance are very important research questions.
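As an illustration of the “firewall” idea described above, here is a minimal sketch of a local redaction step that could run before a prompt leaves a company network. The regex patterns, placeholder format, and blocking policy are simplified assumptions; a real privacy gateway would use far richer detection (named-entity recognition, custom policies, allow/deny lists) than a few regexes.

```python
import re

# Simplified patterns for a few obvious kinds of sensitive data.
PII_PATTERNS = {
    "email": re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+"),
    "ssn": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
    "credit_card": re.compile(r"\b(?:\d[ -]?){13,16}\b"),
}


def redact_prompt(prompt: str) -> tuple[str, bool]:
    """Replace detected sensitive spans with placeholders.

    Returns the rewritten prompt and a flag saying whether anything was
    found, so a caller could choose to block the request entirely instead
    of forwarding the redacted version to an external LLM API.
    """
    found = False
    for label, pattern in PII_PATTERNS.items():
        prompt, count = pattern.subn(f"[{label.upper()} REDACTED]", prompt)
        found = found or count > 0
    return prompt, found


safe_prompt, had_pii = redact_prompt(
    "Summarize this note from jane.doe@example.com about card 4111 1111 1111 1111."
)
# safe_prompt now contains placeholders; had_pii is True, so a stricter
# policy could stop the request before it ever leaves the local network.
```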
Patricia [00:37:42] For those who might not be familiar, could you explain what trusted AI means and why it’s significant in today’s digital landscape?
Yingjie Lao [00:37:50] Yeah. Trusted AI, or responsible AI as some people call it, is really a broad term. It means we need to be able to trust the AI’s decisions and its behavior, so we should not have to worry that AI might generate some bad or negative impact. From the research and technology perspective, it encompasses a variety of topics such as AI security, privacy, explainability, interpretability, and fairness; there could be more, but these are the different considerations for AI. We hope AI can have those characteristics: that it can be fair, that it can be trusted, that it can be reliable. In general, the idea is that we want to make sure AI will not lead to negative impacts on overall human daily life. That’s a huge concern, and there is a good debate about whether we should continue developing AI to do more tasks or whether we really need to regulate AI first. When we talk about AI, as you mentioned earlier, do we really want to close the gap between AI and humans? Or do we want to develop AI as something we know is very powerful but is distinctly different from humans, so we can tell it’s not human? I don’t know; to me, I’m not sure. I’m looking into different aspects of those questions, but they are definitely very important, and also very interesting and exciting research areas. In my opinion, trusted AI is very important. We all want to trust AI so that we can use it comfortably. Otherwise, and as of now I still don’t really trust the text from a large language model, there’s also a dilemma. Right now we don’t really trust AI, at least I don’t, so I take extra caution when looking at content generated by AI. But if later on we all feel very confident with AI and take any action or suggestion from it, put it into tasks and actions, then even if there’s a 0.001% chance it’s wrong, it might lead to severe consequences.
Patricia [00:41:14] Well, could you tell us how exactly to develop AI models that are trustworthy?
Yingjie Lao [00:41:20] Yeah, I think that’s a very tough question to answer. There are several different topics, as I mentioned earlier, and we have to take all of them into consideration, but they are really all different tasks. For security, there are different attacks, and we have to develop countermeasures against all of them. For security and privacy there are also a lot of issues and a lot of technologies; there are different possible ways to address them, but which one is the best? The community is still investigating that. I think it’s just important to be aware of those security and privacy threats, to be aware of those concerns. When we develop AI models, at least we need to know the obvious consequences of a particular attack, and then we try to come up with ways to protect the system and ensure the trustworthiness of AI to a certain degree. There’s a long way to go, but a good starting point is simply being aware of those issues.
Patricia [00:42:50] And besides being aware of those issues, what else would you say are the most pressing challenges in ensuring the trustworthiness of AI systems?
Yingjie Lao [00:42:58] I’m not sure which one is the most pressing need, but right now a lot really depends on the industry, those big companies, because we don’t really know what they are doing, except for the open-source large language models. Some companies are willing to share their large language models, and then we can do good studies. But another issue for academia is that we don’t really have the computing power, the GPU nodes, that industry has, so we cannot really run the very largest, newest large language models here. When we do research, we tend to look at smaller large language models, which might be several iterations behind the current state of the art. That means if the current state of the art, like GPT-4o or something like that, which was announced yesterday, is qualitatively different from the open-source large language models, then what we are doing may not apply to the newest models. We can do a good amount of research on those smaller models, and the smaller large language models are already very large, right? But if that work cannot be applied to the newest models, and if those companies are not considering what we’re doing, not considering those security and privacy threats while they’re developing, then that might be a concern. For that part, I think the big companies really need to take some initiative to protect AI security and privacy and to ensure the trustworthiness of those AI systems.
Patricia [00:45:09] And earlier, you were talking about this. Could you elaborate on the concept of homomorphic encryption and its relevance to ensuring privacy in AI applications?
Yingjie Lao [00:45:19] So, the idea of homomorphic encryption: currently, I would say most cloud computing is done without homomorphic encryption. We do encrypt data during communication, but once the data arrives at the cloud, if you want to do a certain operation, you have to decrypt it. That means the cloud has access to all of your original data. Think about a photo cloud, right? We are uploading our pictures, so if someone can attack the cloud server, they will be able to see all of our pictures in their original form. With homomorphic encryption, the concept is that the data is encrypted not only during communication but also during the computation on the cloud. That means even if someone can attack the cloud server, they will only be able to see our images in encrypted form, and then they basically cannot see anything, right? The encrypted images look like random images; the attacker doesn’t know what’s really going on. There could be sensitive, private images or information, but it will not be leaked to anyone else. With homomorphic encryption we can still do computation on the cloud, and once the computation is done, the cloud sends the results to us, but those results are also encrypted. We then use our personal key, the private key, to decrypt the result and get what we want, and that private key never needs to be shared with anyone else. So the entire communication and computation is 100% private. It’s a very promising technology, but the issue is that it’s still in an early development stage. It’s slow, so it cannot scale to very large neural networks or large language models as of now. But as computing power evolves, and with continued development, I think it could be a solution that completely addresses the problem, because there will not be any privacy leakage; all the data will be in encrypted form.
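To show the “compute on encrypted data” idea in miniature, here is a toy sketch using the Paillier cryptosystem, which is only additively homomorphic, not one of the fully homomorphic schemes (CKKS, BFV, TFHE and others) that encrypted neural-network work relies on. The key size and primality test below are deliberately toy-grade and insecure; the point is only that the “cloud” can add two numbers it never sees in the clear.

```python
import math
import random

# Toy Paillier cryptosystem: additively homomorphic only, with an insecure
# key size. Meant purely to illustrate computing on ciphertexts.


def keygen(bits=64):
    def toy_prime(b):
        while True:
            p = random.getrandbits(b) | (1 << (b - 1)) | 1
            if all(pow(a, p - 1, p) == 1 for a in (2, 3, 5, 7, 11)):  # Fermat checks
                return p

    p, q = toy_prime(bits), toy_prime(bits)
    n = p * q
    lam = math.lcm(p - 1, q - 1)
    g = n + 1
    mu = pow((pow(g, lam, n * n) - 1) // n, -1, n)  # modular inverse of L(g^lam)
    return (n, g), (lam, mu, n)


def encrypt(pub, m):
    n, g = pub
    r = random.randrange(1, n)  # fresh randomness hides repeated plaintexts
    return (pow(g, m, n * n) * pow(r, n, n * n)) % (n * n)


def decrypt(priv, c):
    lam, mu, n = priv
    return ((pow(c, lam, n * n) - 1) // n) * mu % n


def add_encrypted(pub, c1, c2):
    """The 'cloud' multiplies ciphertexts; the plaintexts get added underneath."""
    n, _ = pub
    return (c1 * c2) % (n * n)


pub, priv = keygen()
c = add_encrypted(pub, encrypt(pub, 20), encrypt(pub, 22))
assert decrypt(priv, c) == 42  # the server never saw 20, 22, or 42 in the clear
```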
Patricia [00:48:12] You know, I wanted to talk about cybersecurity in the age of AI. With the increasing integration of AI technologies into various sectors, what would you say are the primary cybersecurity threats that organizations should be most vigilant about?
Yingjie Lao [00:48:28] So, I’m not directly working on cybersecurity, it’s a different area, so I’m not 100% sure, but I think the issue now is twofold. One is that a lot of attacks nowadays can be enhanced by AI. There is a variety of attacks: previously, you would just collect data and then use computational or optimization methods to try to figure out the keys, the password, or something like that. Now, with the help of AI, all those attacks can be way more effective, way easier than before. That means the requirements for the defense mechanisms will also be much higher than before. That’s one concern. The other concern, to me, is that we know AI is powerful, and a lot of detection and defense solutions, including a lot of startup companies and efforts out there, use AI to defend against those threats. But it’s very important to be aware that when you have AI in your system, the AI can do a very good job of detecting cybersecurity threats, yet it’s also possible that the AI becomes the weakest link in the entire system. That means the adversary, the attacker, can target the AI model itself instead of targeting the original cybersecurity application, right? Because we have the AI, they can attack the AI to bypass the entire defensive solution, the defensive system. That’s another concern. I’m not quite sure what the current state of the art is in that area, but when we use AI, we have to make sure the AI is reliable, and at this moment there are still a lot of potential issues with using AI.
Patricia [00:50:49] And are there any strategies that you can share for organizations to employ to detect and respond to AI-generated threats effectively?
Yingjie Lao [00:50:58] I think that will just require… well, first, if the scope is too broad, it will still be very hard. Nowadays there’s no universal solution that can be applied to all the different attacks or different vulnerabilities; it could be fairness, bias, privacy, security, and there’s no single solution that covers all of them. What we can do, I think, is try to be aware of those issues and then think about what’s really likely to happen. We need to narrow down the possible threat, and then we can try to address it; if it’s too broad, it will be very hard. At this point, we can only come up with mechanisms or defensive solutions against a particular threat. So we have to understand, as you mentioned earlier, that we need to think ahead of time, try to be aware of what’s going on in the world right now, and stay informed. And when we develop different products, we need to be aware of those potential issues of AI.
Patricia [00:52:24] And on a smaller scale, what would you say are some of the things people, users of these AI tools, can do to protect themselves from online threats, especially with more and more people becoming open to using AI tools?
Yingjie Lao [00:52:38] Yeah, I think we need to have some critical thinking. At least at this moment, when we use AI, we need to not completely trust the generated content; we need to think about whether there are issues with it. AI is very helpful, it’s a very good tool, but I still think it’s better practice not to completely trust the current tools and to think about what can go wrong. Also, when we use them, we should try to protect our private information. A lot of users may not be aware that whatever information you put into ChatGPT may actually leak to the public, so we all need to be educated and aware of that. When we use those tools, we should try to protect our privacy and not give away too much sensitive information. That’s from the user perspective. And as I mentioned, when we get those responses, that content, we need to really judge how to use it, and there are also ethical concerns; we just need to be educated. There are some things you cannot use it for. For example, with deepfakes, I think a lot of people understand you cannot use them to impersonate someone else, but there are still people who do. As for enforcement, AI regulation could be very useful, but how to put it in place and how to actually enforce those regulations is still an open question.
Patricia [00:54:58] And can you highlight any ongoing projects or initiatives that you’re particularly excited about in the realm of hardware security and trusted AI?
Yingjie Lao [00:55:07] Yeah. For hardware security, I’m currently working on homomorphic encryption, but not really from the methodology perspective. Because, as I mentioned earlier, homomorphic encryption tends to be slow, we’re working on hardware acceleration for it. For trusted AI, nowadays we’re mostly looking into large language models: different types of attacks and privacy issues, and, as I mentioned earlier, large language model watermarking. Another area that could be interesting: I think there’s a huge push nowadays to deploy large language models onto mobile devices, cell phones. That will open up a lot of questions about how to secure the edge deployment of large language models. On one hand, it can help protect privacy, because if everything runs on the edge device, some of the data may never need to leave the mobile phone. On the other hand, it can also open up additional avenues for attacks and vulnerabilities. And during the development of edge computing for large language models, the model may also behave differently from its cloud counterpart, so we definitely need to study those differences and think about whether there are any implications from the trusted AI perspective.
Patricia [00:57:04] And just one last question before I let you go: how do you envision the future evolution of AI in terms of capabilities and security measures?
Yingjie Lao [00:57:13] Yeah, I think this will always be an important question and part of the field of study, in my opinion. Judging by the past few years, the trend of AI development has been so fast that a lot of new technology will keep popping up, and the security and privacy mechanisms and those different considerations for AI will always kind of lag behind the development. It’s very hard to catch up, but it’s very important for us to understand those issues. On the other hand, AI regulation is also important for addressing these issues, so we confront them both from the technology perspective and from the regulation perspective. I think both of those aspects will actually lag behind the development, so there are huge needs in these areas; we have to study them and prepare ourselves. But later on, maybe we can all work together: developers, legislators, researchers, and different industrial sectors can work together to ensure the trustworthiness of AI, so that once we have a much better understanding of the capabilities of AI and the potential threats and risks of AI, we might be able to come up with much better practices for developing the next generation of AI systems.
Patricia [00:59:12] Thank you so much, Dr. Lao! And before we go, is there anything else you’d like to share with our audience? Advice? How to protect themselves online? Anything you’d like to say?
Yingjie Lao [00:59:20] No, I think that’s pretty much it. Thank you for all the insightful questions, and thank you for having me here!
Patricia [00:59:30] Thank you so much! I’ve learned so much, and I’m sure our audience is going to learn so much from you as well. Thank you for gracing our podcast with your time and the valuable insights you’ve shared with us. And of course, thank you to everyone who has joined us for another enlightening episode of The AI Purity Podcast. We hope you’ve enjoyed uncovering the mysteries of AI-generated text and the cutting-edge solutions offered by AI Purity. Stay tuned for more in-depth discussions and exclusive insights into the world of artificial intelligence, text analysis, and beyond. Don’t forget to visit our website, that’s www.ai-purity.com, and share this podcast to spread the word about the remarkable possibilities that AI Purity offers. Until next time, keep exploring, keep innovating, and keep unmasking the AI. Goodbye! Thank you so much, Dr. Lao! Have a great day ahead! Thank you, goodbye!