Telemetry Now  |  Season 2 - Episode 22  |  November 26, 2024

Understanding AI and LLM Security Risks with TJ Sayers

TJ Sayers, cybersecurity expert with the Center for Internet Security, joins us to explore the security concerns around AI and, specifically, large language models. From integrating AI and LLMs into workflows and safeguarding sensitive data with techniques like anonymization and tokenization to navigating security risks and privacy concerns, we discuss practical strategies for mitigating AI risk. We also examine AI's role in national defense, and we address the growing challenge of verifying the authenticity of content in an AI-driven information age.

Transcript

Most of us are using large language models pretty much every day, whether that's to help write some code, create an outline for a presentation, summarize a text, or maybe just as our new search engine. And this has led to some serious concerns about the security implications of using LLMs, especially the popular publicly available models.

And it's really gone beyond just the finance and healthcare industries. It's become a concern for all of us, for society at large, with reports of people sharing very personal information, perhaps accidentally, and experiencing very serious consequences using chat assistants and the various front ends of the world's most popular LLMs. So joining us again today to unpack this is my good friend TJ Sayers, a cybersecurity expert with the Center for Internet Security.

We'll be talking about how LLMs are being used by the less sophisticated and some of the most sophisticated threat actors that we face today. And we'll talk about how we need to think about security when it comes to this new realm of AI.

My name is Philip Gervasi, and this is Telemetry Now.

Hey, TJ. It is so great to have you back on the podcast. I love having you on because of your unique perspective, at least unique to me, on cybersecurity matters. I mean, I do remember we had a conversation a while back on how specifically, you know, TikTok exfiltrates data off my phone, and that was really interesting. But I love talking to you and hearing your unique perspective about some of those macro things, like nation state actors finding new attack vectors and things like that. So today, I wanna get into artificial intelligence.

Now I have been talking about it on Telemetry Now for quite a while from a technical perspective, like, what it is, as far as large language models and generative AI and things like that. But there is a growing concern out there, a security concern, about people putting information into prompts that perhaps they shouldn't be, and perhaps bad guys using AI as a new attack mechanism.

And so that's what I wanna get into today. TJ, welcome. Great to have you again.

Yeah. Thanks, Phil. Always a pleasure coming on. Looking forward to the discussion.

AI has obviously burst onto the scene, and everybody's talking about it, but it certainly impacts the cybersecurity arena. And victims are experiencing some pretty novel attack techniques that are kinda supported by the AI space. So looking forward to getting into some of that.

Yeah. Absolutely.

Now I do think it's important that we level set here because well, I mean, what do you mean by AI? We've been talking about AI on this podcast for a while.

I just gave a talk out in Denver at AutoCon 2 about large language models specifically, part of, you know, generative AI.

So, you know, is that what we're talking about, or are we talking about the heuristics and machine learning and data analysis practices and workflows that we've been using for years to find patterns in data? My company even does that to identify DDoS attack threats.

Is that what we're talking about? And then, of course, the subsequent attack vectors, the subsequent mechanisms that the bad guys can use to mine data and then to use that for nefarious purposes.

Yeah. It's a really good question, Phil, and it's been kind of a novelty to see how things have evolved over the years, because I still remember in the security space a lot of emphasis on, you know, heuristics-based detection and behavioral-based detection and all that type of stuff. And that really does have a lot of foundation in looking for anomalies and using, you know, automation and tools to find things that the naked eye would not really be able to decipher.

And AI has really, I think, increased and bolstered some of those technologies.

But mostly, at least from a security perspective, when I think of AI, I'm thinking of things like large language models, and tools that let folks alter their voice to replicate someone else's voice almost indistinguishably. They're able to create deepfakes of content, both images and video, to deceive other users or socially engineer them, and that is certainly a different arena than even five years ago. Deepfakes were a point of discussion for a while, but a lot of the ones that were out there were just not really believable.

And we're just seeing deepfakes come out now where, really, it's almost indistinguishable from reality if you don't know specifically what you're looking for in some of those videos or images.

It is amazing what is going on in that space with the artificial generation of image and video.

I am a Trekkie. You may know that, and our audience certainly does.

And I recently saw this video on YouTube that was William Shatner. It was Captain Kirk.

Yep.

And he was saying goodbye to Spock, Leonard Nimoy. And it was, it was all AI generated.

Leonard Nimoy passed away in twenty fifteen, so nine years ago now. It was unbelievable how accurate they looked and sounded.

But I will say that my personal experience has been more on the large language model side. And so that's really what's been more top of mind for me: how folks are using it, and how that could be a security concern. You know, maybe people just dumping stuff into ChatGPT that they shouldn't be.

Of course, it's more sophisticated than that. But I do wanna ask, though, do you think that this is really just fear, uncertainty, and doubt? Is this just FUD? Are we overblowing this security concern, or is this actually something that keeps you up at night as a cybersecurity professional?

Well, I'll give a two-pronged answer. I think from a security perspective, again, kind of thinking as if you're a victim or a potential target of an attack, it's definitely not fear, uncertainty, and doubt, because it's basically lowering the bar of entry for less sophisticated actors to really effectively socially engineer humans, the end user, in the initial phases of an attack.

So historically, you'd use a phishing email or you'd make a phone call and you try to impersonate somebody.

AI models are making it really easy to do that, and it's making it extremely believable.

So from a victim and a security standpoint, absolutely not. Commercially, though, I mean, it's completely ubiquitous now. Every single company, every tool that we have, has some type of AI feature built into it, at least in name or claim only, right? And I think some of that is fear, uncertainty, and doubt. It's not really true AI, or even a true new application in certain contexts; they're just rebranding things that were already there.

Right? I kinda talked about the heuristic side, behavioral analysis. That's now just kind of been whitewashed and branded as AI in some commercial tools and whatnot. So I think there's a little FUD there, but in terms of how threat actors are using some of these technologies, it has definitely changed the world we are operating in and is making it much easier for victims to be targeted and compromised.

You know, it is so interesting that you say that. I mean, now that you say it, it's obvious to me, but I never thought of it that way. And the thing that's really interesting to me is that that is one of the major positive use cases, one of several, for using large language models in particular in network operations and IT operations. So imagine, you know, the democratization of information.

So imagine having, like, a level one NOC or a level one SOC engineer who doesn't know SQL queries or how to write in Python or what the Pandas library is or anything like that. Right? So they don't know how to do any of that stuff, but they can certainly, in an advanced chat window, speak or write in natural language, and a large language model can then translate that and, you know, pass that along to a greater AI workflow to do those data analysis operations. And we look at that as a net positive, as lowering the barrier of entry for anyone to do advanced data analysis easily, reducing mean time to resolution, and making managing IT infrastructure easier.
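
To make that idea concrete for readers, here's a minimal sketch in Python of the kind of natural-language-to-query translation Phil describes. The `call_llm` function is a hypothetical stand-in for whatever model an organization actually uses, and the canned response and column names are assumptions for illustration only, not a production pattern.

```python
# A level-one engineer asks a question in plain English; an LLM turns it into a
# Pandas expression over flow telemetry. `call_llm` is a hypothetical placeholder.
import pandas as pd

def call_llm(prompt: str) -> str:
    # Placeholder: in practice this would call a local or hosted model.
    # Here we return a canned Pandas expression for the demo question.
    return "df[df['dst_port'] == 53].groupby('src_ip')['bytes'].sum().nlargest(5)"

def ask(df: pd.DataFrame, question: str):
    schema = ", ".join(f"{c} ({t})" for c, t in df.dtypes.astype(str).items())
    prompt = (
        f"DataFrame `df` has columns: {schema}.\n"
        f"Write a single Pandas expression answering: {question}"
    )
    expr = call_llm(prompt)
    # Evaluate only against the DataFrame; a real system would sandbox this step.
    return eval(expr, {"df": df, "pd": pd})

flows = pd.DataFrame({
    "src_ip": ["10.0.0.1", "10.0.0.2", "10.0.0.1"],
    "dst_port": [53, 443, 53],
    "bytes": [1200, 800, 2400],
})
print(ask(flows, "Which sources send the most DNS traffic?"))
```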

Wow. That's really neat that you, you know, kinda turned it around and said it also makes it easier for the bad guys.

Yep. Yeah. Exactly.

Yeah. No. I concur completely. And I think the interesting thing with the security element is that it has, as you noted, enabled security defenders and network folks who don't have as many hard skills on the coding front to kinda enter into the space and be able to do things. And we're seeing the same thing on the threat actor side. So, you know, the human element is really interesting from the social engineering perspective.

Sometimes you have people who are interested in targeting, you know, English speakers, or speakers of a language that they don't have. Right? They're able to use these models now to create a perfectly believable phishing email to go after that kinda user base in a different country or different language.

There's a lot of things you can do there now that you couldn't do previously. Right? And the same thing with, you know, malware payloads and creating certain things with an initial code base. We're seeing large language models do that as well. Right? So we were actually able to leverage some of the more public language models out there and create, after a lot of prompt engineering and trying to circumvent some of the safeguards, a very basic ransomware sample that, with just some super minor tweaks, could have actually been leveraged in an attack.

So again, you don't have to understand how to program or script or anything like that. You can use a chat model with, you know, a little bit of prompt engineering to get it to do what you want it to do. And next thing you know, it's pumping out some code for you to have a working base to start with.

Yeah. For sure. And, I mean, technologists like to mess around with and break and figure out the weaknesses of new technology. So I kinda, you know, relate to that.

But at the same time, I do understand what you're saying about these publicly available models, so I'm thinking like using ChatGPT, which is the front end for GPT, or Claude, or whatever else is out there. Well, they're public models in the sense that we consume them, and they have whatever safeguards and guardrails they've chosen to build into them. And, you know, that's what we have. So, you know, as opposed to, like, a local model where maybe our IT department and regulatory bodies and the CISO's office have some oversight into how restrictive it is or how we scrub data, things like that.

So do you think then that, like, enterprises should consider this beta technology, in the sense that it's just not ready for prime time from a security perspective, and therefore shouldn't allow it for folks in their organization?

Or can we use it in a safe way?

I mean, actually, are you aware of any organizations (you don't have to divulge whether they're organizations you're working with) that actually just completely restrict and prohibit the use of things like ChatGPT?

There are some organizations who are still doing this, undoubtedly.

My personal opinion is that's typically not a good idea, just because of how widely accessible and prolific this technology is nowadays. What will more than likely occur if it's completely restricted is that the employees at your company will just use it on their own systems. Right? So they'll kinda be using it behind the scenes, or it'll still be getting leveraged some way, somehow, and the organization won't have good control over its usage.

Yeah. Like the whole shadow IT thing. Right?

Exactly. Yeah. Putting in a policy, you know, making it accessible but having guardrails around it is the best approach, not just trying to, you know, completely cut it out.

Right. Right. I mean, this is a really interesting conversation because it sort of presupposes a few things. First, that, you know, we all believe that there should be guardrails and safety measures. And I think most security people and people in technology would agree with that. But also that we all agree on what the guardrails and safety measures should be in the first place. So for the publicly available models, you know, the main concern is probably that we're sending sensitive information over the public Internet. Well, we know how to do that safely, though.

But we're sending it into, you know, OpenAI's infrastructure or Anthropic's infrastructure or Google's infrastructure, whatever publicly available model we're using. How is that being secured on their end? Well, we work with companies like AWS and Google and Azure to do that now too. So it really does raise a host of questions for me.

What are the guardrails? What are we really trying to do here? And that is actually, for me at least, in contrast to what some companies and organizations are doing, which is just running a model locally. So you download a model like Llama 3.2, and you can run it on your own physical hardware, perhaps completely air gapped altogether.

And that sort of solves some of those security problems. I'm sure there's more that we can discuss, though.

Yeah. Yep. So you talk largely about, like, the public models and whatnot.

And I think that's typically the concern: you know, we're using ChatGPT or Google's or Microsoft's version or whatever. Right? And as a, you know, company that's concerned about our own proprietary technology or information or customer data and whatnot, what's happening to that information if our employees are putting it into a public chat model? I think that's a large concern for a lot of organizations.

And, yes, local versions have been a method to get around that.

So I know in the cybersecurity space, the big cybersecurity vendors are using local models. Right? They're using it to speed up their workflows. They're using it to more quickly write things that they're seeing in the threat environment and get them published.

You know, it has that human oversight, but everything is just being run on some of these local models to speed up the processes that they have in place; they're not using public models to do that.

There's some interesting stuff being done, if you are gonna use a more public model, to try to safeguard your information or at least obfuscate the information that you are putting into these models. So when you have, like, a commercial subscription, there's some more privacy along with that, instead of just going right on to one of these models that you can browse to with no login, no subscription, and putting stuff in there. That's kinda the worst case scenario for where that data is going, but a lot of organizations have commercial subscriptions.

They're using these large chat models, but it's through the subscription element. And there's some talk about how we protect or redact data that's being sent into these models. One of the ideas is, like, you just send the minimum amount of data required. So you kinda know what you're asking of the model when you're putting in text, but you're going to limit including things like, you know, PII, people's names, addresses, specific revenue numbers, maybe certain cybersecurity terms. If you're investigating a certain threat actor and you have a naming convention for them, you might not wanna put in the name of the threat actor or any specific dates that you're looking at, just to try to give less context to the information going in there.

Which is interesting. I mean, I know what you mean. What you're saying, correct me if I'm wrong, is that we are withholding information. We are withholding context that won't adversely affect the result of our query or our prompt.

We'll get an accurate result back, and that's what we're withholding in order to safeguard that transmission of information with whatever model we're using. But in IT operations, what we're trying to do is add context so that we can get more accurate results. Now part of that is actually limiting the database from everything that exists to maybe just your external database of metrics and logs and that sort of thing.

So you're using a RAG system, or perhaps you're fine-tuning a model, or both.

And so in a sense, you're limiting the context, the data that the large language model is allowed to use to respond to you. But we wanna add to that database all of the important information so that it has context, right, so it produces the most accurate result possible. But I do understand what you mean, that we want to remove anything that doesn't adversely affect the response.
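
To make the RAG idea concrete, here's a minimal sketch in Python. The keyword-overlap retrieval, the sample log lines, and the `call_llm` placeholder are all assumptions for illustration; real systems typically use embeddings and a vector store in front of whichever model the organization runs.

```python
# The model only sees context retrieved from your own telemetry store,
# not the open internet. Retrieval here is naive keyword overlap.

def call_llm(prompt: str) -> str:
    # Placeholder for an actual (ideally local) model call.
    return "Based on the retrieved logs, the spike correlates with the 02:00 backup job."

DOCS = [
    "2024-11-20 02:00 backup job started on core-sw-1, link utilization 92%",
    "2024-11-20 09:15 user reported slow file transfers in the Denver office",
    "2024-11-21 scheduled maintenance window for edge router upgrades",
]

def retrieve(question: str, k: int = 2) -> list[str]:
    words = set(question.lower().split())
    # Rank documents by how many question words they share.
    scored = sorted(DOCS, key=lambda d: len(words & set(d.lower().split())), reverse=True)
    return scored[:k]

def answer(question: str) -> str:
    context = "\n".join(retrieve(question))
    prompt = f"Answer using ONLY this context:\n{context}\n\nQuestion: {question}"
    return call_llm(prompt)

print(answer("Why was link utilization high on core-sw-1 on 2024-11-20?"))
```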

But, I mean, to me that's kinda like a hard sell: to tell an entire organization's worth of employees, after we've been teaching them for two years all about prompt engineering, don't put anything into the prompt that shouldn't be there. I mean, people are gonna do what they're gonna do. So it feels like regardless of a lot of the technological safeguards and guardrails and mechanisms that we put in place, we still have a people problem at the core. Right?

So that's it. You nailed it. That's inherently the problem. Right? If you're trusting the end user to put stuff in there, you know, are they redacting enough? Is redacting really helpful?

So there's methods to try to do this in an automated way. Right? Before prompts are put into the model, you essentially have, like, a privacy screening layer, as it were, where any prompt goes through that layer first, and it would try to automatically strip out certain things. So you could have certain parameters in there, like names of employees or employee email addresses or dates of birth, certain things that might fall within, like, HIPAA or PII or some of these other larger privacy compliance regimes, maybe credit card numbers, things like that. Right? And that would strip that stuff out and then pass the prompt over to the model that you're using.
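
As an illustration of the kind of automated screening layer described here, a minimal Python sketch follows. The regex patterns, placeholders, and employee roster are assumptions for illustration; a real deployment would be driven by the organization's own AI policy and far more robust detection (for example, named-entity recognition rather than simple patterns).

```python
# Prompts pass through a scrubber that strips obvious sensitive patterns
# before anything is sent to an external model.
import re

EMPLOYEE_NAMES = ["Alice Johnson", "Bob Smith"]  # hypothetical roster

PATTERNS = {
    "[EMAIL]": re.compile(r"\b[\w.+-]+@[\w-]+\.[\w.]+\b"),
    "[CARD]": re.compile(r"\b\d(?:[ -]?\d){12,15}\b"),   # 13-16 digit card numbers
    "[SSN]": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
}

def scrub(prompt: str) -> str:
    # Replace known employee names first, then pattern-based identifiers.
    for name in EMPLOYEE_NAMES:
        prompt = prompt.replace(name, "[EMPLOYEE]")
    for placeholder, pattern in PATTERNS.items():
        prompt = pattern.sub(placeholder, prompt)
    return prompt

raw = ("Summarize the ticket Alice Johnson opened; her card 4111 1111 1111 1111 "
       "was charged, contact alice.j@example.com.")
print(scrub(raw))
# -> "Summarize the ticket [EMPLOYEE] opened; her card [CARD] was charged, contact [EMAIL]."
```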

Another one is what they call tokenization, which seems new and novel, but it's actually used really widely in the financial services industry, specifically for credit cards.

But basically, what happens is it will replace that sensitive data with random data that can still be read by the system or model it's being put into, but the actual link back to the token, as it were, is in a secured vault so that it can be referenced later. So for instance, you put in a credit card number, a token is associated with it, the actual card number and token are saved, but what's passed through to the model is just the reference to that information. So you're essentially preventing very sensitive information from going into the model.
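
Here's a minimal sketch of that tokenization flow in Python, with an in-memory dictionary standing in for the secured vault. Production tokenization services (for example, in PCI environments) are hardened and far more involved, so treat this as illustrative only.

```python
# The sensitive value never leaves the organization; only an opaque token does,
# and the real value stays in a local "vault" keyed by that token.
import secrets

class TokenVault:
    def __init__(self):
        self._vault = {}  # token -> real value, kept inside the org boundary

    def tokenize(self, value: str) -> str:
        token = "tok_" + secrets.token_hex(8)
        self._vault[token] = value
        return token

    def detokenize(self, token: str) -> str:
        return self._vault[token]

vault = TokenVault()
card_token = vault.tokenize("4111 1111 1111 1111")

# Only the token is embedded in the prompt sent to the external model.
prompt = f"Flag anything unusual about recent charges on card {card_token}."
print(prompt)

# When the response comes back, the token can be resolved locally if needed.
print(vault.detokenize(card_token))
```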

And then the last one is kind of like an alias-based system. So let's say Alice is in the database. Right? Alice in the database would become, like, user one two three in database A. Then you'd have another database that ties user one two three, which is Alice, to being a premium subscriber of your service. Right? So if you're a threat actor, you'd have to have both databases and the initial input from that user to be able to tell who Alice is and what they're subscribed to.
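
A minimal sketch of that alias-based split, assuming two illustrative tables (the table names, fields, and alias are made up for the example):

```python
# One table maps a real identity to an alias, a second table keys everything
# else off the alias, so neither table alone reveals who "user123" is
# or what they subscribe to.

identity_map = {          # kept separately, tightly access-controlled
    "user123": "Alice",
}

subscriptions = {         # the data that actually flows into analytics or prompts
    "user123": {"tier": "premium", "renewal": "2025-01-01"},
}

def describe_for_prompt(alias: str) -> str:
    # Only alias-keyed data is exposed; resolving the alias requires identity_map.
    sub = subscriptions[alias]
    return f"{alias} is a {sub['tier']} subscriber renewing {sub['renewal']}."

print(describe_for_prompt("user123"))
```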

So some of that's being done. Again, it's not new, it's been done in other contexts, but these are just some of the ideas for how we use automated tools to try to sanitize content that's being pushed into models that are not local and that organizations don't have complete control over.

Right. Right. Yeah. And isn't that the case here? Like, when you're using a publicly available model, like OpenAI's model, GPT, or Anthropic's model, Claude, or whether you're using Google or Microsoft or any other large language model that's publicly available, you don't own the entire system.

And so you don't have total control over the entire thing. So that's inherently gonna have risks associated with it. Now I know that we've been using public cloud in sensitive areas like finance, health care, and government for a long time. So, you know, there is precedent for how you can do this safely, but we are treading on new ground specifically with large language models and, like, everyone having a chat assistant on their laptop, sending information back and forth into these models.

So how does that work exactly? Are there any regulatory bodies that govern, or that we're talking about that should govern, how we handle sensitive information going back and forth? Is it industry by industry?

Is it something where there's oversight by some government agency?

No. I was gonna say, most of the, like, initial kinda automated scrubbing that would occur is really informed by the organization and their own AI policy. Right? So every organization is gonna have different stuff that they wanna take out of the content their employees are putting into the prompt.

So it's very organization specific. I'm just kinda giving some ideas from a security mindset of, like, what would you not want to include? And here's a good example. Let's say we're involved in, like, incident response, or we're conducting forensics on a compromise of a, you know, victim network, and we don't believe that the adversary is aware we know they're on the network.

We don't want to possibly tip them off by putting information from that investigation into any public repositories, whether it be, like, VirusTotal, whether it be chat models, things like that. So there's ways you can anonymize that information and still kind of look for what you're looking for. You can use larger date ranges when you're searching for things. You can create pseudonyms.

Right? Maybe you're looking for threat actor A, but you're gonna call it something else. That's kind of the same concept here. It's a very old principle. When you're conducting a forensic investigation, you know, VirusTotal is very popular.

Punch in an IP, punch in a domain, and it'll let you know a lot of, you know, network-related and security-related information about it. But a lot of people don't realize that with VirusTotal, even when you have a premium subscription, if you upload files or put information into a database like that, other premium subscribers have access to the material put into it. Right? So threat actors will sit on things like that and say, holy cow, the domain we're using for this campaign was just put into VirusTotal.

They must be aware that something is fishy. Right? So it's stuff like that that we think of from a security standpoint. We don't wanna tip off the adversary by, you know, unexpectedly or unwittingly putting prompts into a model that is maybe publicly accessible or is being pulled into another database for larger synthesis, because that's gonna tip off an actor that we're onto them.

Okay. Yeah. Right. Well, so, you know, we haven't quite perfected how to use a publicly available large language model securely and safely yet. And for some industries, it's probably fine as it is. But for those very sensitive organizations, or organizations that deal with very sensitive data, perhaps, you know, regulated industries, maybe the answer is to use a local model, and that's it.

But I would like to transition to the macro, something I wanted to get into that I talked about in the introduction of the podcast today. What is happening at the macro level with large language models, perhaps new attack vectors, and how are folks using this technology to exploit others?

Yeah. Yeah. That's a great question and a good pivot point too. So from a national defense perspective, with any emergent technology that's disruptive in nature like this, think back to, you know, the development of nuclear technology, or the emergence of the Internet in the way that we know it, or, you know, nowadays, AI capabilities. It kind of does spark this arms race among near peer competitors, and thinking of the United States, our near peer competitors would be Russia or China, specifically.

It creates, like, an arms race, but this is not a traditional arms race in the sense that AI in and of itself would be weaponized. It's an arms race in an economic sense.

Right? AI is a huge driver for investment. It's a huge driver for research. There's a lot of financial gain that could be made through leaps and advances in AI or the application of AI.

So interest in trying to steal intellectual property, compromise organizations to get access to these models, you know, things like that, has created a race in the national defense, national security space for these types of nation state actors. It's also an arms race a little more traditionally, and when I say traditionally, I mean, like, kinetically, with actual conventional warfare: how does AI speed up, like, logistics, like moving troops? How does it speed up moving artillery? How does it speed up, you know, the rate at which we can deploy weaponry in a conflict?

Right? Missile technology.

A whole bunch of other areas. Right? AI in some way is going to improve and hone those foundational technologies, and that's an arms race in and of itself. Right? So the defense industrial base is really interested in how AI is going to do this. There's a lot of money and resources being poured into it, and that's creating a race between us and near peer competitors over who can kind of break through with this technology, economically and defensively, before the other one.

So that's kinda the big picture. You also have the element, and we actually touched on it earlier, where nonsophisticated actors are now becoming sophisticated in some way just by nature of having access to these tools.

So, again, a good example: you know, a cybercriminal, really mostly just a criminal, right, trying to steal money, trying to make funds illegally, is now becoming a cybercriminal because they can leverage some of these chat models to create code. They can leverage some of these chat models to make highly believable phishing emails. They can do vishing, right, where they're replicating somebody's voice.

They can use some of the models to generate imagery or videos that depict, maybe, a CEO at an organization that they're trying to defraud. Maybe they're just doing traditional crime where they're calling the finance department at an organization and trying to get them to change the routing information for someone's paycheck. Now, if they have audio of that individual's voice, they could try to impersonate that person. So you have, like, conventional criminals, unsophisticated cyber actors, becoming full blown cyber actors because of technology like this.

So that's impacting national defense in the sense that, you know, state, local, tribal, and territorial US governments, which is what the company that I work for, CIS, largely focuses on, are now becoming victims of threat actor groups that, you know, typically may not have targeted them previously because those actors just weren't in the cyber space.

It's also just making everything that cyber actors do much easier. Right? So you don't see typos and formatting errors and things like that in, you know, a good number of phishing emails anymore, because they're being generated with language models that are, you know, good at avoiding that type of stuff.

And then the last one would really be the information space.

It's previously been termed misinformation or disinformation, but, really, information warfare has been around for decades. Right? I mean, every war in history has had some type of information component where you're trying to, you know, deceive your adversary, conduct some type of psychological operation to make them take an action that they wouldn't normally have taken absent, you know, that investment in trying to sway them toward that action. And that's exploded with AI, largely because you are able to synthesize a lot of things on social media and in the larger Internet ecosystem, identify, like, highly divisive hotspot areas, and very quickly generate content that is mostly true.

And I'd say it's, like, ninety-nine percent there: it has most of the context, it looks pretty good, it seems like it's really well written, the person knows what they're talking about, but there's a drop of poison in it.

Right? And that's kind of how you sell a lie or try to deceive your adversary: you don't wanna just tell a flat out lie, because it's kind of easy to detect that. It's like, wow, that's really outlandish. Nobody would think that, or that's not true.

Right? They just wanna tweak it a little bit. And doing that as a human being is very hard: scrolling through things looking for material, then drafting and writing it yourself and posting it. With these models, it's very easy to say, look at all of this information; based on the information I just fed you, I want a highly believable narrative about this specific location in the country.

I want you to talk about roads and, like, roadblocks and how traffic delays and commuting hours, you know, are impacting things, and I want it to be catered towards, you know, the financial services industry and specifically maybe someone in, you know, the payroll department. Right? And it'll actually generate, you know, a believable email for somebody in a financial services organization working on payroll, and it'll have things about, like, the location and, you know, that such and such route was all congested this morning, and I meant to send this email earlier, and things like that.

Right? Then it's able to really manipulate that end user into taking an action you want. And then, bigger than that, you've got the narratives on social media platforms and things like that, where it's taking a, you know, highly divisive subject and it's just trying to inflame tensions and create false imagery. It's trying to create false videos.

It's trying to just, like, stir up, you know, particular groups and whatnot in the country to maybe revolt against the government, or to try to vote against a certain policy that a China or a Russia wouldn't be appreciative of, right, if the US enacted it. Right? So there's sophisticated stuff kind of like that happening too in the AI space.

Yeah. Yeah. And, I mean, this is a long term proposition. And these artificial intelligence companies are certainly in it for the long haul, investing billions upon billions of dollars. And as has been the case for many, many years, you know, private industry and government, especially the US government, have worked together when there is new technology on the horizon, especially when that technology can help US interests on a global scale. And I do know, just as an aside, I saw recently that Anthropic has a national security policy lead on staff now. I'm not exactly sure what that role entails, but Anthropic, and I assume others in the same sphere, are thinking about these things now.

Yeah. Definitely. I've given this example a couple times before, but just in terms of, you know, what we're seeing adversaries do, maybe this will kind of bring home how some of these language models could be used to target certain demographics that may not have been as reachable previously by very unsavory and malicious folks. And the example I like to give is in the k through twelve school space.

So what we're seeing in ransomware writ large is ransomware actors will go into a k through twelve school district or school. They will do what they typically do. They'll probably have either purchased access through an initial access broker, or they'll have identified that access themselves.

They'll then get into the environment. They'll try to map out the environment, identify backups, logging capabilities, things like that. They will encrypt those and then encrypt the data in the organization.

Right prior to that, more often than not, they're extracting as much sensitive information out of the network as they can, before they deploy network wide encryption, and then they hold that organization for ransom.

And just think of the element of the data exfiltration.

What kind of data is in a k through twelve school system, right? It's information on students, it's information on faculty, it's information on parents. So, like, socioeconomic status for the parents and the kids, the address where they live, what their school schedule is, what their busing schedule is, what their grades are, any physical or mental health records that are on file with the school, disciplinary actions, maybe who their friends are, you know, if they got in trouble with their friends and it's written down that, you know, Sally had an engagement with Susie and, like, they went to the principal's office or whatever.

There's just highly granular information in a k through twelve school that's exfiltrated by some of these groups, and that would be very, very appealing to a malicious actor looking for follow-on targeting opportunities, and that's what we're seeing. We're seeing the data exfiltrated, and even when the victim pays the ransom, there are some cases where this data is still getting propagated in dark web spaces and getting purchased by other groups.

So imagine for a second, as tough as it may be, you're a threat actor on the dark web and this data is posted.

What conceivably could happen is you buy that data, you see that these are students in a particular age range, or parents, you then look through that data and use a large language model to, you know, rework the text that you just put in. Maybe you're saying, hey, you know, go to your parents' pocketbook, or go to whoever's pocketbook, and pull out their credit card and make this purchase online.

You'd say something like, please translate this into the language that a twelve or thirteen year old would use. Right? And it will basically give you output of what you just put in, but kind of in the short form, emoji texting lingo that a demographic in that age group would actually use. Right? So that could be one way to defraud a young child into stealing their parents' credit card and using it for, like, something in Minecraft or an online game. Another use case, and even worse, would be the predator space, where predators are trying to target kids.

They will be able to chat with children in that school district, and it will appear like it's another student in the school district because of the information they have. Right? In the case of Susie and Sally, who went to the principal's office, maybe that threat actor is going to impersonate Susie and try to reach out to Sally. Right? And they'll say, like, make me sound like I'm also an eight year old girl when I'm chatting with this student. Right? So stuff that could never be done previously is being done with this type of technology, and it's opening up a completely different paradigm of how threat actors operate in cyberspace, and it's resulting in, like, physical, real world manipulation and harm of people, particularly in k through twelve schools in this case.

Yeah. That's pretty unsettling. I didn't enjoy learning that just now. But what can folks on the defensive do to mitigate this sort of attack and to help prevent AI from being used in this kind of way?

Yeah. So from the defender standpoint, I mean, I think the biggest thing is just faster detection.

It's really mostly an educational thing; we're just trying to bring awareness that these things are happening. Right?

One of the things we're trying to kind of explore, and the security community is trying to explore, is, you know, we see a ransomware incident happen.

Maybe it's a k through twelve school. Maybe we should just proactively notify parents and students and faculty there that this happened and here's some of the follow-on targeting that could come along. Right? That's just on the information space. But if you're speaking specifically about how AI is being used, again, it's faster detection, trying to automate actions. So instead of, like, a SOC analyst reviewing an alert that comes in, doing, you know, searches through their own tooling, deciding, yes, this is a true positive, and then hitting, you know, enter and having an action get taken, that's just becoming an automated thing, so it's able to synthesize a lot of elements from disparate pieces in the security ecosystem.

So you'd have, like, an IDS, you'll have an endpoint detection solution, you'll have intel feeds coming in. You'll have logs coming in if you're, like, a managed security service provider. And the tooling that's available now, you know, you can label it with AI; it's been enhanced a bit, but some of this was around previously.

AI technology is just being heavily invested in, and it's making some of this stuff better and faster. It's able to correlate that stuff, right, and then make a recommendation or even conduct the action in an automated capacity. So you don't have to have a human saying, yes, this is a true positive based on my analysis, let's conduct this action. It's either doing it automatically or immediately providing a recommendation and saying, here's why you should confirm that this is a true positive.

And then big data analytics is another one.

A really simple case would be, you know, something happens. There's, like, fifteen news articles that pop up. There's reports coming in from people that they're seeing similar things.

All of that could be fed easily into a, you know, text-based chat model, and you say, give me an executive summary of all of that. Right? And that just saves someone a ton of time from having to read fifteen different reports and look at all the reports coming in from people, and it just gives them the high level blurb of kind of what they're looking at.
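
As a concrete illustration of that summarization workflow, here is a minimal Python sketch. The `call_llm` function, the prompt wording, and the sample reports are hypothetical placeholders for whichever model and sources an organization actually uses.

```python
# Several incoming reports are concatenated into one prompt and a model is
# asked for a short executive summary.

def call_llm(prompt: str) -> str:
    # Placeholder for an actual model call.
    return "Executive summary: multiple sources report the same phishing campaign..."

def summarize_reports(reports: list[str]) -> str:
    joined = "\n\n---\n\n".join(reports)
    prompt = (
        "You are assisting a SOC analyst. Give a five-sentence executive summary "
        "of the following reports, noting any points they agree or disagree on:\n\n"
        + joined
    )
    return call_llm(prompt)

reports = [
    "News article: new phishing campaign targets K-12 payroll staff...",
    "Analyst note: three member organizations reported similar lure emails...",
]
print(summarize_reports(reports))
```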

So, again, speeding up the process by which people synthesize information and are able to make a decision, those I would say are the biggest, like, practical things that I have seen.

There's a lot of claims and things that I think the community is saying about AI and whatnot, but those are the most practical kind of initial things that I have seen being done with some of this tooling.

Right. Right. Yeah. And, you know, I respect that. There is a difference between the smoke and mirrors that is part and parcel of the AI hype, the things that folks say AI can do or maybe will do shortly, where it's all speculation or it's just not true, and then, you know, the things that we can use large language models for right now to help in our own operations, in this case, security operations. And, yeah, I do believe that we're gonna see some incredible advancements in the near future. But what can we do today?

Now, I also wanna ask, what can we do with regard to deepfakes? We started the conversation talking about that very briefly, and I wanna get back into it. What can we do? TJ, you know that I'm a big user of social media, maybe a little bit less these days, but generally I'm a big user of social media, and so I consume various posts and short form video and blog posts and things like that. What can folks do to ensure that what they're seeing, watching, and consuming is, in fact, what they think it is, that it's real?

Yeah. This to me comes down to, and maybe this is the intel roots in me, not the content itself, but the source of the content.

And I think that's the big thing we should focus on when we're trying to synthesize information out there and figure out whether it's true, trustworthy, good content, or maybe we're being, you know, fed something that's a half truth or not the full picture: trying to identify the actual source of the information.

So something that I have tried to do is pre-identify trusted sources for particular pieces of information. So if I'm looking at a, you know, particular threat actor group, there are certain sources that I know have reliably reported on information related to this particular threat actor previously.

And so my de facto response when I need to find information about that threat actor is I will go to those sources.

I won't just take on face value information about that threat actor when I can't verify the source.

And I think, unfortunately, we're in a position where, you know, a false piece of information moves so fast that by the time you correct that piece of information or quote unquote debunk it, it's already out there, and people may have checked their phone five minutes ago, and then the correction came out, you know, two minutes after that, but they'll never see it. Right? So, yes, there's some of that technology where they're trying to, you know, imprint kinda like watermarking or things like that in videos and images to try to, you know, alert folks that this was made by an AI model and this is fake.

I really think, just for the average end user who's trying to decipher stuff out there, the best approach to take at this point in time is to pre-identify sources that are reliable, that you have trusted historically, and continue to go to those sources.

And then trust but verify. I didn't mean to cut you off there, but we need to be careful about the media that we consume, which really is true regardless of this entire AI conversation. Right? So, TJ, it has been a pleasure to talk to you again. I love talking about AI lately for sure, but also, with security being so top of mind, it's been awesome to discuss with you some of these things that we should be thinking about: how we use these publicly available models, perhaps why we should consider running a local model, and also how, you know, folks are using this technology for nefarious purposes at, what I'll say in air quotes, a small scale against, you know, SLTTs, like you said, but also at the macro, nation state level. So TJ, thanks so much for giving us your insights. Always appreciated.

Yeah. Absolutely, Phil. Truly always a pleasure to be on the podcast. Happy to come back anytime.

So if you would like to be a guest on Telemetry Now, I'd love to hear from you. Or if you have an idea for a show, you can reach out to us at telemetrynow@kentik.com. So for now, thanks so much for listening. Bye bye.

About Telemetry Now

Do you dread forgetting to use the “add” command on a trunk port? Do you grit your teeth when the coffee maker isn't working, and everyone says, “It’s the network’s fault?” Do you like to blame DNS for everything because you know deep down, in the bottom of your heart, it probably is DNS? Well, you're in the right place! Telemetry Now is the podcast for you! Tune in and let the packets wash over you as host Phil Gervasi and his expert guests talk networking, network engineering and related careers, emerging technologies, and more.