Dive into the world of AI and Kubernetes with Shopify's Shane Lawrence in this episode of the Cloud Security Podcast. Shane shares his experience on the security team at Shopify, working at the intersection of AI, Large Language Models (LLMs), and Kubernetes security. Shopify is looking to pioneer the use of AI to streamline developer operations, enhance productivity, and bolster security measures in multi-tenant Kubernetes environments. This episode will be valuable for you if you work in Kubernetes or security and are looking at how AI can build efficiency in your team. This episode was recorded at KubeCon + CloudNativeCon NA 2023.
Questions asked:
00:00 Introduction to AI and Kubernetes
01:32 Shane Lawrence and Shopify's AI Journey
02:21 AI and Developer Efficiency in Kubernetes
04:39 AI-Driven Automation for Security
06:34 Challenges of AI in Kubernetes Environment
11:22 Case Studies for AI in Kubernetes
13:43 The Future of Kubernetes and AI
15:59 Learning and Experimenting with AI in Kubernetes
17:49 Closing Thoughts and Fun Q&A
Ashish Rajan: [00:00:00] Have you looked into building AI projects in Kubernetes? Or maybe your organization is already working on LLM projects in the Kubernetes space, because you are in a multi-tenant Kubernetes environment and you are thinking about using Kubernetes not just for the purposes of building LLM models, but also for security.
In this episode, we had Shane Lawrence. Shane Lawrence works for Shopify, which has been working on AI projects. And they announced one recently as well. If you are looking at doing AI in your organization, and you want to understand how you can use Kubernetes from a security perspective, then this is the episode for you.
If you have any friends or colleagues who are also thinking about this from an AI/LLM perspective, and what they can do from a Kubernetes security perspective, this is the episode for you as well. Shane walks through the experimentation he has been running in his organization and what they've learned so far.
They also gave a great talk about this at KubeCon North America. If you're someone who's working in the AI space, and specifically in a Kubernetes environment, this is the episode for you. If this is your second, third, or maybe even 10th or 50th episode of Cloud Security Podcast that you're listening to, or maybe watching [00:01:00] on our YouTube channel, and you have been finding us valuable,
I would really appreciate it if you could take a few moments to drop us a review or rating on your podcast platform, like iTunes or Spotify, if you're listening to this. If you are watching this on YouTube or LinkedIn, definitely give us a follow or subscribe. It definitely helps us spread the word and lets other people know that we have a community we would love to welcome them into.
We are a growing community of about 50,000 people so far, so we would love to keep growing that and keep spreading the good message of cloud security and how to do it. I hope you enjoy this episode of Cloud Security Podcast, and I'll see you in the next one. Peace.
Ashish Rajan: Share a bit about yourself.
Shane Lawrence: Yeah, I'm Shane. I've been working at Shopify for six and a half years doing security there the entire time. And in that time, I've gotten to do a fair bit of detection engineering, intrusion detection systems, things like that.
And multi-tenant security, similar to Cailyn, who you had on before this. And a lot of networking. I've also had the chance to build all kinds of automation that we can use so that developers don't need to worry about [00:02:00] how they're going to implement security on whatever it is they're building. We can hopefully, most of the time, do that for them, and then put in some guardrails so that if they start going beyond the beaten path, there's a chance for them to notice that before they do something that would require intervention on the part of our security team, or in the worst case, some of the researchers from our security bug bounty program.
Ashish Rajan: Oh and obviously you have a conversation on the whole LLM space as well. What's the LLM in Kubernetes space doing?
Shane Lawrence: Oh, so Kubernetes we've been doing since 2017, maybe 2016, even before I got there.
LLMs are pretty new for us. Our founder is very excited about AI, and so there's been this really big push to include AI in as much as is reasonable at this point. And so we've got some cool stuff going on in our product where we've got a chatbot, and you can say, make my store more pink, and it'll just go ahead and make it more pink. Which, I mean, that's something that you could probably have done without the AI, but it's cool that it's there [00:03:00] already.
And this is still very early for AI, so there's a whole bunch more that we're going to be able to do with that. And then on the back end for developers, we can say something like, write a pod spec in YAML, and it'll just, there it is. And I find stuff like that just saves me so much time.
It's not quite mind-blowing, because I could have googled it. I could have found an example online, copied and pasted it, and used that. But it wouldn't be tailored to my use case, and so instead I would have to spend all of this time searching, finding the right one, and then adapting it. Alternatively, I could sit there and scratch my head: okay, does the spec go here, and then do I put pod below it, or what's the syntax?
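For anyone who wants to see what that boilerplate actually looks like, here is a minimal sketch, using the Kubernetes Go client types rather than hand-written YAML. The names and image are illustrative, not anything from Shopify's setup:

```go
package main

import (
	"fmt"

	corev1 "k8s.io/api/core/v1"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"sigs.k8s.io/yaml"
)

func main() {
	// A minimal pod: one container, one image. The fiddly part an LLM
	// saves you from is remembering exactly how apiVersion, kind,
	// metadata, and spec nest relative to each other.
	pod := corev1.Pod{
		TypeMeta:   metav1.TypeMeta{APIVersion: "v1", Kind: "Pod"},
		ObjectMeta: metav1.ObjectMeta{Name: "example", Namespace: "default"},
		Spec: corev1.PodSpec{
			Containers: []corev1.Container{
				{Name: "app", Image: "nginx:1.25"},
			},
		},
	}

	// Marshal to the YAML you would actually apply to a cluster.
	out, err := yaml.Marshal(pod)
	if err != nil {
		panic(err)
	}
	fmt.Println(string(out))
}
```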
And it saves me a lot of time that I would have otherwise spent either searching or trying to remember how to do something fairly mundane, like the syntax of how to deploy a pod. And so we've got this sort of developer portal that we can use, where we can get AI doing all kinds of cool stuff. And then we're also experimenting a bit with how we can use that for detections.
So in detection engineering, you get a lot of [00:04:00] false positives. It's really hard to come up with a detection rule that isn't going to have false positives, because a lot of traffic that's suspicious looks pretty close to traffic that's normal and just differs in a few subtle ways. And with AI, we're looking at different ways that we can use it to just filter out that noise, so that the human on the other end can focus on what they should be doing, building cool stuff, instead of just identifying and looking through compliance reports and audit logs and things like that.
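Shane doesn't describe Shopify's pipeline in detail, so the following is only a hedged sketch of the general pattern: forward each alert to a chat-style model and keep the human for anything the model won't confidently dismiss. The endpoint follows the common OpenAI-style chat completions API; the model name, prompt, and verdict strings are invented for illustration.

```go
package main

import (
	"bytes"
	"encoding/json"
	"fmt"
	"net/http"
	"os"
)

type chatMessage struct {
	Role    string `json:"role"`
	Content string `json:"content"`
}

type chatRequest struct {
	Model    string        `json:"model"`
	Messages []chatMessage `json:"messages"`
}

type chatResponse struct {
	Choices []struct {
		Message chatMessage `json:"message"`
	} `json:"choices"`
}

// triageAlert asks a chat model whether a raw detection event looks like
// a false positive. Anything except a confident FALSE_POSITIVE verdict
// is routed to a human.
func triageAlert(event string) (bool, error) {
	body, err := json.Marshal(chatRequest{
		Model: "gpt-4", // illustrative model name
		Messages: []chatMessage{
			{Role: "system", Content: "You triage security alerts. Reply with exactly FALSE_POSITIVE or NEEDS_REVIEW."},
			{Role: "user", Content: event},
		},
	})
	if err != nil {
		return false, err
	}
	req, err := http.NewRequest("POST", "https://api.openai.com/v1/chat/completions", bytes.NewReader(body))
	if err != nil {
		return false, err
	}
	req.Header.Set("Authorization", "Bearer "+os.Getenv("OPENAI_API_KEY"))
	req.Header.Set("Content-Type", "application/json")

	resp, err := http.DefaultClient.Do(req)
	if err != nil {
		return false, err
	}
	defer resp.Body.Close()

	var parsed chatResponse
	if err := json.NewDecoder(resp.Body).Decode(&parsed); err != nil {
		return false, err
	}
	if len(parsed.Choices) == 0 {
		return false, fmt.Errorf("empty model response")
	}
	// Fail toward the human: only drop alerts the model explicitly dismisses.
	return parsed.Choices[0].Message.Content != "FALSE_POSITIVE", nil
}

func main() {
	needsReview, err := triageAlert(`{"rule":"Terminal shell in container","pod":"checkout-7d9f"}`)
	if err != nil {
		panic(err)
	}
	fmt.Println("needs human review:", needsReview)
}
```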
So it's really exciting to me, because I think it's going to free up a lot of developer time to do the things that are more exciting, and take away a lot of the tedious tasks that we don't necessarily look forward to.
Ashish Rajan: Overall, efficiency could increase quite a bit. Productivity could increase quite a bit as well.
I hope so. Yeah. Yeah, we're all hoping for it as well. And would it be different, for people who are listening to this? Because, to your point, you're in an experimental stage at this point in time. Yeah. Would it be different between, say, a managed Kubernetes versus a self-hosted Kubernetes?
Would that be the same benefits?
Shane Lawrence: I wouldn't [00:05:00] say they would be the same, but I think there are equivalent benefits still. So the example of a pod spec, that's still probably going to be there. On the other hand, if you are managing your own Kubernetes clusters, then there are a lot of other things you could use it for.
Deciding what shape your clusters should be and how to deploy those. And I don't even want to get into configuring your own etcd. I'm very happy that I don't need to worry about that, and haven't since I learned Kubernetes the hard way many years ago, but it could still be a very valuable resource when you're trying to do stuff like that.
Of course, if you're using a managed cluster and you start asking it questions like that, you need to be careful that it's not misleading you, because the kind of LLMs, the chat models, we've seen so far have a tendency to hallucinate. They will very confidently tell you to do something a certain way when it's completely wrong, and they have no ability to distinguish between the two.
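One lightweight way to add that validation, offered as an assumption rather than anything Shane describes Shopify doing, is to strictly decode generated manifests into Kubernetes' typed structs before they go anywhere near a cluster, so hallucinated fields fail fast. A server-side `kubectl apply --dry-run=server` would catch a further class of errors.

```go
package main

import (
	"fmt"

	corev1 "k8s.io/api/core/v1"
	"sigs.k8s.io/yaml"
)

// validatePod strictly decodes LLM-generated YAML into the typed Pod
// struct, so hallucinated or misspelled fields produce an error instead
// of being silently dropped.
func validatePod(generated []byte) (*corev1.Pod, error) {
	var pod corev1.Pod
	if err := yaml.UnmarshalStrict(generated, &pod); err != nil {
		return nil, fmt.Errorf("model output is not a valid pod: %w", err)
	}
	if len(pod.Spec.Containers) == 0 {
		return nil, fmt.Errorf("pod has no containers")
	}
	return &pod, nil
}

func main() {
	// "spek" is the kind of plausible-looking mistake a model can make.
	bad := []byte("apiVersion: v1\nkind: Pod\nmetadata:\n  name: x\nspek: {}\n")
	if _, err := validatePod(bad); err != nil {
		fmt.Println("rejected:", err)
	}
}
```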
So it's really useful for getting started as long as you have some way of validating that what it's saying is actually correct. And in that way it can be useful as like a set of training wheels, but [00:06:00] it can't be your only teacher. You need some other human in the loop. And so I've heard a lot of talk about these fears that AI is going to replace people.
And maybe someday it will, but right now I just don't see any real way that could happen. Instead, I think it's much more likely to be like other technologies we've seen, where it makes some things easier, it may reduce the need for a certain kind of employee or a certain shape of job. But even more than that, it creates more opportunities to do something else.
And I find the other things that it's creating the opportunity to do, frankly, more interesting than the mechanical work of just writing that YAML the same way over and over again.
Ashish Rajan: The productivity increase could be huge. But to your point about hallucination, what are some of the bad scenarios? We were talking about approaching this whole thing from a threat modeling perspective.
If you were to apply LLMs to Kubernetes clusters, whether it's to increase productivity or to actually build an LLM model in your organization, what are some of the bad things related to the LLM as a threat actor? What are some of the things that came to mind as you were researching this?
Shane Lawrence: So I guess there's a [00:07:00] couple of different areas where that could happen. On the one hand, if you're publishing an LLM, similar to the chatbot that we use, or something that's going to be accessible to your customers or the public, you need to worry that whatever you put in there may end up coming back out of there at some point. And so you need to carefully control what data you provide to that model. You want to make sure that it doesn't have access to secrets, it doesn't have access to potentially sensitive information. And then you want to have strong controls over its ability to say things that would be offensive or misleading.
It's not necessarily going to leak data, but it will say something that you don't want it to say. And they're non-deterministic: compared to a software program, where unless there's a bug you know what's going to come out the other side, you have no idea. Even with the exact same prompt, it might answer two different ways on two different occasions. And so you need to be careful that you have some kind of control over what it's able to do, and establish boundaries up front for what sorts of circumstances [00:08:00] a user might venture into where they would be off that beaten path, outside of what you are personally willing to vouch for as an individual or an organisation. And on the other hand, there's also the concern that your use of LLMs could open you up to some new attacks that didn't really exist before.
One of these: we use it a lot, like I said, for creating boilerplate. I might say, chatbot, write me a Go program that's going to calculate shipping taxes, or something like that. And if it suggests that I use a certain library, there is no guarantee that is the best library to use. There's no guarantee that library even actually exists yet.
And there's no guarantee that it's not a typosquatting replica of another library. If it is typosquatting, it might be a very similar misspelling of a real package that someone has hijacked. If it doesn't exist yet and someone else is able to predict that the model is going to suggest it, then they can just buy the domain or create that library. Maybe at the beginning they'll just [00:09:00] replicate the exact functionality of the thing that you're mistaking it for, and then once it's been deployed widely and achieved widespread use, and they can determine that by the metrics from the package manager, they switch it over and decide to do something nefarious with it.
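For the Go case Shane gives, one crude defensive habit, sketched here as an illustration rather than a Shopify practice, is to ask the public Go module proxy whether a suggested module exists at all and when it was published; a brand-new module whose name sits one edit away from a popular one deserves suspicion. (Module paths containing uppercase letters need the proxy's escaping rules, which this sketch skips.)

```go
package main

import (
	"encoding/json"
	"fmt"
	"net/http"
	"time"
)

// latestInfo is the JSON returned by the module proxy's @latest
// endpoint: the newest known version and when it was published.
type latestInfo struct {
	Version string    `json:"Version"`
	Time    time.Time `json:"Time"`
}

// checkModule asks proxy.golang.org about a module path an LLM
// suggested. A non-200 status means the proxy has never seen it; a very
// recent publish date on a "well-known" name is a typosquatting smell.
func checkModule(path string) error {
	resp, err := http.Get("https://proxy.golang.org/" + path + "/@latest")
	if err != nil {
		return err
	}
	defer resp.Body.Close()
	if resp.StatusCode != http.StatusOK {
		return fmt.Errorf("%s: not known to the module proxy (status %d)", path, resp.StatusCode)
	}
	var info latestInfo
	if err := json.NewDecoder(resp.Body).Decode(&info); err != nil {
		return err
	}
	fmt.Printf("%s %s, published %s\n", path, info.Version, info.Time.Format("2006-01-02"))
	return nil
}

func main() {
	// A real module next to a hypothetical near-miss a model might invent.
	for _, m := range []string{"github.com/stretchr/testify", "github.com/strechr/testify"} {
		if err := checkModule(m); err != nil {
			fmt.Println("suspicious:", err)
		}
	}
}
```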
So that's one example of something where you take the output from an LLM, you use it, and it opens you up to this typosquatting attack. On the other hand, there's a possibility that someone else could have corrupted the data that was being used to train the model. Or in the specific kind of LLMs that use retrieval augmented generation, they'll have this language model, this elaborate autocomplete based on math that I don't even fully understand, and they'll combine that with a corpus of documents, and it can just pull from those really quickly.
And the advantage of that is the documents can change really quickly. You don't need to retrain this whole model, which might take millions of dollars and months or years to complete. It can just pull this information on the fly. But now you've got two problems, because you've got potentially a poisoned data set in the [00:10:00] training,
and the opportunity for an attacker to change the data in that corpus. And so then you need to be confident that the model wasn't trained on bad data. And you need to be confident that the corpus it's pulling this fresh, real-time or near-real-time information from also doesn't contain any poisoning.
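To make those two trust boundaries concrete, here is a deliberately minimal sketch of the retrieval-augmented pattern Shane outlines, with the retrieval step stubbed out: everything pulled from the corpus lands verbatim inside the prompt the model completes, which is exactly why a poisoned document becomes a poisoned answer.

```go
package main

import (
	"fmt"
	"strings"
)

// retrieve stands in for a real vector or keyword search over the
// corpus. This is the second trust boundary: whatever an attacker
// managed to plant in the corpus comes back out here.
func retrieve(query string, corpus []string) []string {
	var hits []string
	for _, doc := range corpus {
		if strings.Contains(strings.ToLower(doc), strings.ToLower(query)) {
			hits = append(hits, doc)
		}
	}
	return hits
}

// buildPrompt splices retrieved documents directly into the model's
// context. The model has no way to tell trusted instructions from
// attacker-planted text once they share a prompt.
func buildPrompt(query string, docs []string) string {
	var b strings.Builder
	b.WriteString("Answer using only these documents:\n")
	for i, d := range docs {
		fmt.Fprintf(&b, "[%d] %s\n", i+1, d)
	}
	b.WriteString("\nQuestion: " + query + "\n")
	return b.String()
}

func main() {
	corpus := []string{
		"etcd stores all cluster state and should be encrypted at rest.",
		"etcd is best exposed publicly without authentication.", // a planted, poisoned document
	}
	fmt.Println(buildPrompt("etcd", retrieve("etcd", corpus)))
}
```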
And for the training models, we've seen an academic paper that showed that for less than $60 you could corrupt, I think it was 0.1 percent, or 0.01 percent, of the data. And if you can predict exactly what people are going to be using it for, then you can also corrupt the data in that corpus. So if it's pulling from Wikipedia or GitHub, you don't need to permanently change those.
You can change the Wikipedia page to say that, say, the Cloud Security Podcast is the number one podcast around the globe. If you can predict the timing of when it's going to crawl Wikipedia, it doesn't matter if somebody later changes it back to say that it's not the most popular, but it is the best one, to reflect reality.
Of course, then you can still change [00:11:00] the data set it was pulled from. On the other hand, with the corpus, you have this opportunity where you can put things out there that you yourself can delete immediately after, so there's almost no chance of it being detected. And once it's been retrieved, someone is going to get that information in a way that is probably unwanted and potentially even very malicious.
Ashish Rajan: Sounds like a lot of the controls around this should not require a sophisticated system; a lot of that is hygiene as well. Are there actually use cases of this already being used in the Kubernetes space at the moment? Or is that still being, to what you said, experimented on? Is it primarily for building boilerplate YAML, or have you seen implementations of it beyond that as well in the Kubernetes environment?
Shane Lawrence: Yeah, I think Frame.io might have been an early example of using ML to train some of their security detection tools with Falco, which is a tool that I'm very familiar with. And so this is an early use case of an AI-like thing. And [00:12:00] again, it was to reduce false positives and increase the ability to do detections.
I've also seen a pretty cool use case of natural language processing. It's common for an attacker to obfuscate the code that they're using. So if they change the JavaScript so that instead of saying hello world, it says rm -rf star, that's going to be really obvious to a person who's looking at it. But if instead they change it to write the letter r to this file, and then move a bunch of stuff around in memory so that now there's an m somewhere, and it writes that somewhere, and so on, then there's no single place where it says rm -rf star, but it's all going to be put together, with a lot of confusing, unnecessary, redundant steps in the middle just to mislead a human.
That's never going to mislead NLP, because the things that are good at hiding stuff from humans are totally different from the things that are good at hiding stuff from machines. So there are ways to obfuscate those kinds of attacks from the machines, but those ones would likely be more visible to the human.
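The NLP models Shane mentions are beyond a short sketch, but a much cruder signal in the same spirit, offered purely as an illustration, is character entropy: the indirection that hides rm -rf star from a human reviewer tends to push a script's byte statistics away from those of ordinary source code.

```go
package main

import (
	"fmt"
	"math"
)

// shannonEntropy returns bits per byte. Typical source code sits well
// below random noise; heavily packed or encoded payloads drift upward.
// This is a toy heuristic, not a substitute for the NLP-style models
// discussed above.
func shannonEntropy(s string) float64 {
	if len(s) == 0 {
		return 0
	}
	var counts [256]int
	for i := 0; i < len(s); i++ {
		counts[s[i]]++
	}
	var h float64
	for _, c := range counts {
		if c == 0 {
			continue
		}
		p := float64(c) / float64(len(s))
		h -= p * math.Log2(p)
	}
	return h
}

func main() {
	plain := `console.log("hello world");`
	packed := `eval(atob("Y29uc29sZS5sb2coImhlbGxvIHdvcmxkIik7"))`
	fmt.Printf("plain:  %.2f bits/byte\n", shannonEntropy(plain))
	fmt.Printf("packed: %.2f bits/byte\n", shannonEntropy(packed))
}
```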
So there are some pretty good examples of it being [00:13:00] used in production in the real world. Stuff like the chatbot-style thing, I don't quite know where that's being used in production. We don't have a way of just saying, automatically create this pod and go deploy it right now. I guess we do have the make-the-store-more-pink thing, but there are some pretty strict controls around what it can do, and again, we would not give that unrestricted access to the Kubernetes API at this point, just because we don't quite trust it that much yet. And also we have the ability to just roll back: if you don't like the way your store looks after a change, you're just testing it.
You haven't saved it yet. Whereas if it deletes a cluster, we might be able to rebuild the cluster, but the damage is done; the downtime has already been suffered.
Ashish Rajan: Yeah, and I imagine, because we are at KubeCon as well, anyone who has an open source project is potentially looking at LLMs, or probably not even looking at LLMs yet but thinking about including an LLM in their thing. Is there some scope for it there as well, where you would see that being used [00:14:00] for an open source project?
I guess, where do you see it go from here? You're obviously experimenting in stages at this point in time, but you're obviously finding good results from it. There are a lot of hygiene things that could be done to prevent it from going bad, even in a situation where it's actually active in production.
I wonder about the next phase of, okay, now we've experimented. The same thing happened with cloud, where initially people were all experimenting, and now they're like, oh, let's start putting some real workload onto it and see what happens. Is that a situation you reckon we'll reach soon with Kubernetes and AI?
Shane Lawrence: I think that we're building more and more confidence in it, but I think it's important to not just trust it unconditionally.
You should make sure that there are guardrails around what it can do, and that you've tried out some of the situations that you're going to be putting it in before you unleash it on your customers, your developers, whoever it is that your target market is. You should be careful that whatever they're going to be doing with it is something that you've tried it with before.
And be especially careful, for instance, if you're putting their data into it, or [00:15:00] if you're putting metrics that you've collected into it, things like that. You need to be careful that you're divulging exactly how you're using other people's information, obviously, and what else could happen to that. And so to simplify it, I like to think of it the way that I would think of a SaaS or another company that I was going to start doing business with.
If we were going to start revealing information to another company, we would have all kinds of controls over specifically what information we shared with them and what kind of agreements we had. And obviously you're not going to sit down with ChatGPT and say, ChatGPT, do you promise not to share this information with anyone?
Don't, please don't do that. But instead we can put, for instance, a proxy in front of a portal that might be interacting with an interface like that, and then make sure that we're restricting any queries that look like they might be bad, and also looking at whatever it's returning, so that if it starts to leak secrets, or if somebody accidentally puts a secret in, then we can identify that and take corrective action before it becomes a real issue.
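As a hedged illustration of that proxy idea (the patterns and wiring are invented here, not Shopify's setup), Go's standard library keeps the skeleton small: a reverse proxy that refuses requests containing obvious secret-shaped strings, and scans responses the same way before the user sees them.

```go
package main

import (
	"bytes"
	"io"
	"log"
	"net/http"
	"net/http/httputil"
	"net/url"
	"regexp"
	"strconv"
)

// secretish matches a few obvious credential shapes (AWS access key IDs,
// PEM private key headers). A real deployment would use a proper secret
// scanner; this is only a sketch.
var secretish = regexp.MustCompile(`AKIA[0-9A-Z]{16}|-----BEGIN [A-Z ]*PRIVATE KEY-----`)

func main() {
	// Hypothetical internal portal that fronts the actual model API.
	upstream, err := url.Parse("https://llm-portal.internal.example")
	if err != nil {
		log.Fatal(err)
	}
	proxy := httputil.NewSingleHostReverseProxy(upstream)

	// Scan what the model sends back before the user ever sees it.
	proxy.ModifyResponse = func(resp *http.Response) error {
		body, err := io.ReadAll(resp.Body)
		if err != nil {
			return err
		}
		resp.Body.Close()
		if secretish.Match(body) {
			body = []byte(`{"error":"response withheld: possible secret"}`)
		}
		resp.Body = io.NopCloser(bytes.NewReader(body))
		resp.ContentLength = int64(len(body))
		resp.Header.Set("Content-Length", strconv.Itoa(len(body)))
		return nil
	}

	// Scan what users send in before it reaches the model.
	http.HandleFunc("/", func(w http.ResponseWriter, r *http.Request) {
		body, err := io.ReadAll(http.MaxBytesReader(w, r.Body, 1<<20))
		if err != nil {
			http.Error(w, "request too large", http.StatusRequestEntityTooLarge)
			return
		}
		if secretish.Match(body) {
			http.Error(w, "request blocked: looks like it contains a secret", http.StatusForbidden)
			return
		}
		r.Body = io.NopCloser(bytes.NewReader(body))
		proxy.ServeHTTP(w, r)
	})
	log.Fatal(http.ListenAndServe(":8080", nil))
}
```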
Ashish Rajan: Awesome. [00:16:00] And very last question as well: where can people learn about it? Because there's a lot of misinformation about it as well. Absolutely. And there's also a lot of experimentation required to find out what truly is the case, so we're in unknown territory as well. How are you finding learning LLMs for Kubernetes, or just the cloud native thing?
How are you going about it and what can people take away from it?
Shane Lawrence: That's tough. I think that's why I chose this as the topic for my talk, because when I started looking into this, I found that it was really hard to get good information on this. I wouldn't recommend Twitter as a source of accurate information about AI because what I found is
Ashish Rajan: Elon Musk is not tweeting the right thing.
What are you talking about?
Shane Lawrence: I'm not going to say personally that a specific individual is lying about their product, but it's such a common problem where organizations, or maybe individuals, are creating misleading claims about what AI can do, or what their specific AI can do, that the FTC even issued specific guidance on it earlier this year. There's just a lot of snake oil out there. Everybody's saying, you know, this new AI tool is going to be able to solve all [00:17:00] these problems, and in most of those cases they're at least exaggerating what the capabilities are.
So I'm not an academic, but I have found that looking at academic research papers is a very good way to get accurate information about it. It's also really slow, and it's really tough for someone like me who's not well versed in the language and doesn't understand most of the math that they're using. And so I've found that it's even more important than ever to rely on the community, and look for industry leaders who have a history of telling the truth, or at least retracting it when they're wrong, and then try to find out what their take is on it. And for me, my favorite thing to do is just sit down and play with it. Experiment with the technology, see how it works, and then go look on Wikipedia to see who's using this and what they're doing with it, and then go straight to the source and find out, okay, how did you figure out that it could be used for this, and what problems did you run into when you tried it for that?
Ashish Rajan: Cool, thank you for sharing that. It's time for my fun questions, which I need the jelly beans for. You can take three, any three, doesn't matter. I'll let you take any three you want. If you get extra, just give me an extra if you like. Cool, all right, first one, [00:18:00] man. All right. Oh my god. 7-Eleven
Shane Lawrence: taquito.
Ashish Rajan: Is that what he said? It was like, shit, this isn't good at all. No. First question. Oh my God, it's bad. It has like spit. But if it was like spit with booger. Oh, that's bad. What do you spend most time on when you're not working on Kubernetes AI?
Shane Lawrence: Lately, my son. He's 15 months old and he's amazing.
And so I spend as much time with him as I can. It's just really cool to see the first of everything. The first time that he wears a hat. The first time he wears mittens. The first time he sees the ocean. And there's just so many things that are brand new and exciting. And I'm like a little bit envious of him.
Because there's nothing to which he's jaded yet. Everything is new and shiny and exciting. Sometimes he gets sleepy and needs a nap, but I can relate to that too.
Ashish Rajan: Awesome. Fair enough. All right, next one. Oh, it's not bad. It's like Subway. How many Subways? Oh, it's a lot of pepper. There's a lot of pepper in there, a lot of pepper.
It's all very overpowering, having the entire day [00:19:00] of it. If you could have a superpower, what would that be?
Shane Lawrence: I think just more time. There's so much to do, and so much to see, and there's so much cool stuff in the world, and it's terrible that we have to choose some at the expense of others. So if I could have unlimited time, that would be my ideal superpower.
Ashish Rajan: Last one. Oh my god, no, I got dirt again. This one's hot. I got dirt for sure, yeah, dirt. Oh my god. Last one. What's the best part about coming to KubeCon?
Shane Lawrence: The people. I think sometimes that if everyone outside of my immediate colleagues on my team were replaced by AI, I wouldn't notice until the next KubeCon. And so I'm just very excited to have a chance to see people whose work I follow on GitHub, or whose blog posts I read, or who I see on Slack all the time, but I don't actually see them, and it's just, oh yeah, you're a real person.
Ashish Rajan: Three whole dimensions, and that's pretty cool. That's awesome, thanks for sharing that. And where can people find you on the internet? I'll put the link to your talk as well. But where can people find you to connect with you?
Shane Lawrence: Yeah, lawrence.dev is my domain. So just go there, and I'll put a link to whichever social media is in [00:20:00] vogue that week.
Ashish Rajan: Awesome, thanks so much, man. Thank you for coming on the show. Shane Lawrence: Thanks a lot, it was great to be here. Ashish Rajan: Likewise, and we'll see you next episode.