AWS Multi-Account Security: What Netflix Learned


🚀 How do you secure thousands of AWS accounts without slowing down developers? Netflix’s cloud security experts Patrick Sanders & Joseph Kjar join us to break down their identity-first security model and share lessons from scaling security across a massive AWS multi-account environment. In this episode, we cover:

  • Why identity, not network, is the best security boundary
  • The challenges of least privilege and right-sized access
  • How Netflix migrates IAM roles while minimizing disruptions
  • The impact of multi-account AWS security strategies

Questions asked:
00:00 Introduction
02:05 A bit about Joseph
02:32 A bit about Patrick
02:38 Scaling security across multiple accounts
03:29 Least Privilege is hard
06:44 Why go down the identity path?
08:49 Identity based approach for least privilege
15:43 Security at scale for Multi Account in AWS
23:54 Lessons from the project
27:02 What would be classified as an easy migration?
30:55 How the project has progressed?
35:01 Automation Pieces that enabled the project
37:54 Where to start with scaling security across Multi Accounts?  
39:21 Resource Access Manager and how it fits into migration
--------------------------------------------------------------------------------
📱Cloud Security Podcast Social Media📱
_____________________________________
🛜 Website: https://cloudsecuritypodcast.tv/
🧑🏾‍💻 Cloud Security Bootcamp - https://www.cloudsecuritybootcamp.com/
✉️ Cloud Security Newsletter - https://www.cloudsecuritynewsletter.com/

#cloudsecuritypodcast#cybersecuritypodcast#cloudsecurity#awssecurity

Joseph Kjar: [00:00:00] It's helpful to estimate a couple of things, right? Migration complexity, the security risk of the application in question, and the operational risk of the application. So if you have an application that's low complexity but high security risk, and it's a simple migration, you've found a golden target for an initial migration.

Ashish Rajan: Building security cloud-natively at scale across a large AWS account, or accounts if I can say that, is quite complex. Sometimes it takes a few years to work through the challenges you face, especially if you carry a lot of baggage from adopting AWS early. I had the pleasure of talking to Patrick Sanders and Joseph Kjar from Netflix.

We spoke about scaling security across a multi-account AWS footprint. What does that look like, especially if you've been working in the cloud space for a long time? Both Joseph and Patrick work at Netflix, so they've seen scaling challenges that you may already be seeing at your end or may see in the future.

This overall conversation was fascinating. By the way, we did another episode with them a couple of years ago, around their [00:01:00] first AWS re:Invent talk, and this one was recorded around their second AWS re:Invent talk. The first time, we spoke about why they took an identity-first approach to multi-account security.

In this conversation, we spoke about what has been built on top of that and the lessons learned along the way. We also covered how you should approach security if you're starting today versus if you already carry baggage, and whether there are solutions that don't require you to write much code.

All that and a lot more in this episode of Cloud Security Podcast. If you have been listening to or watching Cloud Security Podcast episodes for some time and have been finding them valuable, I'd really appreciate it if you gave us a review or rating on Apple or Spotify, or, if you're watching this on YouTube or LinkedIn, gave us a like and subscribed.

It's an easy way to support what we're doing and to let us know that this is the kind of topic you want us to cover, so we can keep working on the amazing things we're creating for you here on Cloud Security Podcast. I'll let you enjoy the episode. I'll talk to you soon. Hey everyone, welcome to another episode of Cloud Security Podcast.

I've got two folks, Patrick and Joseph. Welcome to the show, guys. Thanks for coming in again.

Patrick Sanders: Thanks for having us.

Ashish Rajan: Yeah, thank you so much. It's been, what, two years?

Patrick Sanders: Where's the time gone?

Ashish Rajan: It's just [00:02:00] a different setup; there are a lot more cameras around here. But yeah, thanks for coming over.

Do you mind giving a 30-second version of your introduction? Joseph, do you want to go first, man?

Joseph Kjar: Yeah, absolutely. My name's Joseph Kjar. I'm a cloud security engineer at Netflix. I work on an incredible team with my good friend Patrick here, tackling all sorts of problems across the gamut of AWS security, from account management to identity and access, scaling internal Netflix services, and trying to adapt to a very volatile and ever-changing business landscape.

And what about you Patrick?

Patrick Sanders: Just the same, like copy paste. I am Patrick.

Ashish Rajan: Everything that Joseph said is what I do as well. Same team, guys. Actually, for a refresher: you were here a couple of years ago, and we talked about scaling security across a multi-account architecture.

Patrick, do you want to lay the groundwork for what that was about? And Joseph can fill in wherever his teammate drops off. [00:03:00]

Patrick Sanders: Yeah. The main project Joseph and I have been working on for three-plus years now (it's been a while) is taking our famously large multi-tenant environment, which dates from before AWS was anything like it is today and has grown organically ever since, and moving cloud identities out of those multi-tenant environments to isolate them and create better boundaries around identity and access.

That way, the identities for an application can only access the resources they're supposed to, because right now, in our big multi-tenant environment, that's not the case. Oh, fair. Least privilege is hard. What is least privilege? That's a whole other episode. Oh yeah. We can dive into that.

Ashish Rajan: Because my least privilege is very different to your least privilege.

Patrick Sanders: Yeah, and that's something we're leaning into: the ambiguity of what least privilege is. In a multi-tenant environment like ours, it takes the shape of trying to [00:04:00] remove permissions that aren't used by a role and remove access to particular resources that aren't used by a role.

But that's really tedious at scale and really difficult. So instead, we're moving these identities into their own accounts, and the account boundary is really strong. It's like the best security boundary there is in AWS.

Ashish Rajan: Yeah,

Patrick Sanders: and we're taking advantage of that and giving broad permissions within that boundary.

Which is very much not least privilege, but if the role can only access its own resources, why does it matter? And it depends on your threat model and your risk appetite and everything, but for us it works well.

Joseph Kjar: Yeah, and internally we've come to call this right-sized access, as opposed to least privilege, which is a dangerous goal, right?

If your goal is to scope every single IAM principal to only the exact and precise actions it needs, and you want to do that for even a [00:05:00] minor percentage of your application population, you're going to burn so much time. And how much security risk reduction are you actually getting, right?

Is that helping you scale? Is that improving your developers' lives and experience? Probably not. So we try to stack security, scalability, and UX benefits as much as we can, and moving a workload's identity to its own account and letting developers have a lot of freedom in that isolated account checks all those boxes, right?

Ashish Rajan: Yeah.

Joseph Kjar: Now, it's not perfect API-level least privilege, but the scope of impact of a compromise of that application is significantly reduced, and the developers are happier, right? They don't have to worry about super restrictive IAM policies. They don't have to submit policy requests to us.

They have the freedom to operate in that environment.
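To make "right-sized access" concrete, here is a minimal Python sketch, not Netflix's actual policy, of the kind of broad identity policy that could be attached to a role living alone in its own account. The service list and the `Sid` are hypothetical. The policy itself grants wide access; what keeps it right-sized is the assumption that the account holds only this application's resources, while resources in other accounts stay reachable only where their own resource policies allow it.

```python
import json


def right_sized_policy(services: list[str]) -> dict:
    """Broad identity policy for a role isolated in its own account.

    Assumption: the account contains only this application's resources, so
    the account boundary, rather than per-action scoping, limits the blast
    radius. Resources in other accounts still need a resource policy that
    grants this role access.
    """
    actions = sorted({f"{service}:*" for service in services})  # dedupe, stable order
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "RightSizedServiceAccess",  # hypothetical statement ID
                "Effect": "Allow",
                "Action": actions,
                "Resource": "*",
            }
        ],
    }


if __name__ == "__main__":
    # Hypothetical app that uses S3 and SQS.
    print(json.dumps(right_sized_policy(["s3", "sqs"]), indent=2))
```

Organization-level guardrails such as SCPs would still sit on top of a policy like this.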

Patrick Sanders: And we don't have to go through and try to take away permissions and get yelled at when we break [00:06:00] something.

Ashish Rajan: IAM people love that shit. What are you talking about? They love the requests coming in. Hey, I changed my role. Can I get some more permission?

Do you still need the older permissions? Oh my God, yes, I still need them. What I'm getting from this is that it's also a huge productivity gain. I know we're in a very Gen AI kind of world, so maybe that's where I picked up the word, but it's improved productivity for security.

But what was the reason to choose identity over resources? Obviously, more recently there's the whole RCP (resource control policy) thing, but I'm pretty sure that's too new for most people to even start experimenting with. Why go down the identity path rather than the resource path for right-sized access?

Joseph.

Joseph Kjar: Yeah, this is something that I think is a little bit unintuitive to a lot of folks when we first explain what we're trying to do. Think back to day one: you're learning AWS, you go to launch an EC2 instance, and it gives you a little option that says "select an IAM role for your instance."

Yeah. That IAM role must live in the same account [00:07:00] where your instance lives. It's been an assumption in AWS from the beginning, and having that be a binding paradigm creates a lot of downstream problems in the security realm. It means that if you have multiple workloads in the same account, their identities, their IAM roles, are all in the same account.

All the teams that want to manage those things will also need to be in that account, and before too long, with even three to five workloads, you have resource contention and a really difficult time trying to create artificial isolation boundaries around what these different teams should be able to do in there. It's just not practical.

Now imagine Netflix scale, coming from a time before AWS Organizations, even before the IAM service itself, with accounts holding not five but hundreds, over a thousand, workloads in the same account. There's no way to create meaningful security boundaries [00:08:00] between those things. So why come at it from the identity angle? Because trying to lift and shift all of those resources out is remarkably complex and difficult, right? Resources tend to be heavy anchors in accounts for numerous reasons, whereas the identity is only anchored there because AWS says it has to be. That's why we decided to attack that part of the problem and find a way to decouple them.

We slice the identity off and put it in its own account. Now the application runs in the context of that new account, it hits its own AWS API quotas, and the team that manages it logs into a separate account to manage things over there. And we get to leave the really sticky, nasty resources in place and significantly reduce our migration complexity.

Ashish Rajan: And I guess, to your point, the complexity also comes from the fact that if you've existed as a tech business for a long time, there are a lot of applications that were [00:09:00] probably built waterfall-style and cannot be touched. No one talks about those areas of the organization.

And to your point about the stickiness of those workloads, everyone has those skeletons in the closet, as much as people like to deny it. In that context, do you find that older applications were still able to function with the identity-based approach to least privilege?

Patrick Sanders: That's one of the interesting nuances of this migration pattern. By moving the easier identities out of our big multi-tenant environments, we put an account boundary between those identities and the resources that are in that account.

So we can explicitly allow cross-account access to the application's resources that are still there.

Ashish Rajan: Oh.

Patrick Sanders: For services that support cross account access.

Ashish Rajan: Yep.

Patrick Sanders: So every identity that moves out is risk reduction and means fewer paths between applications and these resources.

So a critical application, even if it doesn't move out of the multi tenant account, we're still reducing the risk related to [00:10:00] that application by migrating other identities out.
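For services that support resource-based policies, the explicit cross-account allowance Patrick describes might look like the following minimal sketch of an S3 bucket policy on a bucket that stays behind in the shared account. The bucket name, account ID, and role name are hypothetical, and S3 is just one example of a service with cross-account support. For cross-account access, the role's identity policy in its own account must also allow the same actions.

```python
import json


def cross_account_bucket_policy(bucket: str, app_role_arn: str) -> dict:
    """Bucket policy letting a migrated app role (now in its own account)
    keep reading and writing a bucket left in the shared account."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "AllowMigratedAppObjects",
                "Effect": "Allow",
                "Principal": {"AWS": app_role_arn},
                "Action": ["s3:GetObject", "s3:PutObject"],
                "Resource": f"arn:aws:s3:::{bucket}/*",  # object-level actions
            },
            {
                "Sid": "AllowMigratedAppList",
                "Effect": "Allow",
                "Principal": {"AWS": app_role_arn},
                "Action": "s3:ListBucket",
                "Resource": f"arn:aws:s3:::{bucket}",  # bucket-level action
            },
        ],
    }


if __name__ == "__main__":
    role = "arn:aws:iam::111122223333:role/thumbnail-writer"  # hypothetical
    print(json.dumps(cross_account_bucket_policy("legacy-thumbnails", role), indent=2))
```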

Ashish Rajan: We spoke about the what; I'm sure people are curious about the how as well. How do you start?

Joseph Kjar: This is something that crosses over between our previous talk and the one coming up tomorrow, but it starts with a mechanism for account-agnostic credential delivery. I'll let Patrick talk about the nuts and bolts of that, since he's done an amazing job on the bulk of the software engineering work for it.

Ashish Rajan: Oh, I like that. Joseph.

Patrick Sanders: One of the key pieces that makes all this work is: how do we get AWS credentials to an application so it can run? In EC2, that's normally via an instance profile, and you just use the instance metadata service to get session credentials. It just magically works. In containers, it can be a lot of different things; for us, it was, and is, [00:11:00] an IMDS proxy.

That proxies some requests to the host IMDS, but it handles credential requests internally and does some assume-role magic. We had that in our container platform before, but it didn't really fit the EC2 use case. So we decided to develop a new IMDS proxy that supports both our container platform and EC2.

It uses OIDC with IAM, so it calls STS AssumeRoleWithWebIdentity to get credentials and deliver them to the application. Similarly, it's completely transparent to the application itself. It just magically works because it serves on the IMDS IP, and SDKs already know how to get credentials from there.
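A minimal Python sketch of the credential-serving path of such a proxy, assuming the proxy already holds an OIDC token for the workload. Only the STS call is a real API; the `CredentialRouter` class, the role name, and the `imds-proxy` session name are hypothetical. IMDSv2 session-token handling is deliberately left out, so a `PUT /latest/api/token` falls through to the forwarding branch. The JSON shape mirrors what SDKs read from the EC2 instance metadata credentials path, which is why the application needs no code changes.

```python
import json
from datetime import datetime, timezone
from typing import Callable, Optional

CRED_PREFIX = "/latest/meta-data/iam/security-credentials/"
TIME_FMT = "%Y-%m-%dT%H:%M:%SZ"


def to_imds_json(creds: dict) -> str:
    """Render STS credentials in the JSON shape SDKs expect from IMDS."""
    return json.dumps({
        "Code": "Success",
        "LastUpdated": datetime.now(timezone.utc).strftime(TIME_FMT),
        "Type": "AWS-HMAC",
        "AccessKeyId": creds["AccessKeyId"],
        "SecretAccessKey": creds["SecretAccessKey"],
        "Token": creds["SessionToken"],
        "Expiration": creds["Expiration"].astimezone(timezone.utc).strftime(TIME_FMT),
    })


def fetch_via_web_identity(role_arn: str, oidc_token: str) -> dict:
    """Exchange an OIDC token for credentials on a role in the app's own account."""
    import boto3  # lazy import so the routing logic can be exercised offline

    response = boto3.client("sts").assume_role_with_web_identity(
        RoleArn=role_arn,
        RoleSessionName="imds-proxy",  # hypothetical session name
        WebIdentityToken=oidc_token,
    )
    return response["Credentials"]


class CredentialRouter:
    """Decides how the proxy answers a request sent to the IMDS address."""

    def __init__(self, role_name: str, get_credentials: Callable[[], dict]):
        self.role_name = role_name
        self.get_credentials = get_credentials

    def handle(self, method: str, path: str) -> Optional[tuple[int, str]]:
        """Return (status, body), or None to forward the request to the host IMDS."""
        if method != "GET" or not path.startswith(CRED_PREFIX):
            return None  # not a credential request: pass it through untouched
        requested = path[len(CRED_PREFIX):]
        if requested == "":
            return 200, self.role_name  # SDKs fetch this listing to discover the role name
        if requested == self.role_name:
            return 200, to_imds_json(self.get_credentials())
        return 404, ""
```

Wiring it up might look like `CredentialRouter("app-role", lambda: fetch_via_web_identity(role_arn, token))`. A real proxy would also cache credentials until shortly before `Expiration` instead of calling STS on every request.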

Ashish Rajan: So as a developer who's deployed an application, whether on containers or EC2, I still use my SDK or the AWS CLI or whatever, and I wouldn't know any different. Do I still have an IAM role, or do I [00:12:00] not have one?

Patrick Sanders: You do have an IAM role; it's just not where it was before.

Joseph Kjar: Yeah, it's often in one of these isolated accounts.

Ashish Rajan: Another thing about the risk reduction, and sorry to call out Capital One on this: in the breach they had, the IAM role attached to the vulnerable web server had access to S3 buckets, and so on. God forbid that role was also used by other EC2 instances in that account; it could have been a lot more catastrophic.

One of the things people talk about with IAM roles on EC2 instances is that if you keep everything in one account, developers will most likely share a role across more resources, because they don't want to go back to security and say, "Hey, I need another identity." If someone has an over-privileged IAM role, I'm going to use that instead of going to security.

In this scenario, as a developer, I'm totally oblivious. I have an IAM role, let's call it the Joseph and Patrick role, and it gives me access to what I need. Is there least privilege in that role, or is least privilege enforced in a central IAM account?

Joseph Kjar: So just to clarify, the IAM [00:13:00] pieces are decentralized.

So each application, your app in this example, gets its own AWS account when you spin it up, one per environment: test and prod. When your app starts, the proxy uses that OIDC workflow to fetch credentials for a role in that remote account and serves them to your application. So your compute might be shared, right?

You might be running on the shared container platform or as part of a big shared EC2 environment. But your role is not shared, right? Your role lives in its own account. So say your application gets compromised. You have an SSRF vulnerability, someone gets hold of those credentials, and they start doing enumeration activities.

What are they going to see? Nothing. They're off in la-la land, right? They might see a bucket you created in there specifically for your app, but that's it. That's where we're really taking advantage of those account boundaries, so we don't have to worry about so [00:14:00] many of the things that happen when roles get compromised.
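The "one account per application per environment" model implies two small pieces: a lookup from (app, environment) to an account, and a role in that account whose trust policy accepts the proxy's OIDC tokens. The Python sketch below is an illustration under stated assumptions. The registry, the issuer host `oidc.example.com`, the `aud` value, the `app:<name>:<env>` subject format, and the role name are all hypothetical. The `Federated` principal, the `sts:AssumeRoleWithWebIdentity` action, and the `<issuer>:aud` / `<issuer>:sub` condition keys follow the standard IAM pattern for OIDC identity providers, with the provider registered in the same account as the role.

```python
import json

# Hypothetical registry: one AWS account per application per environment.
ACCOUNTS = {
    ("thumbnail-writer", "test"): "111122223333",
    ("thumbnail-writer", "prod"): "444455556666",
}

OIDC_ISSUER = "oidc.example.com"  # hypothetical issuer host, without https://


def app_role_arn(app: str, env: str, role_name: str = "app-role") -> str:
    """ARN of the app's role in its dedicated account (role name is hypothetical)."""
    return f"arn:aws:iam::{ACCOUNTS[(app, env)]}:role/{role_name}"


def web_identity_trust_policy(app: str, env: str) -> dict:
    """Trust policy letting only this app's OIDC identity assume the role."""
    account_id = ACCOUNTS[(app, env)]
    return {
        "Version": "2012-10-17",
        "Statement": [{
            "Effect": "Allow",
            "Principal": {
                "Federated": f"arn:aws:iam::{account_id}:oidc-provider/{OIDC_ISSUER}"
            },
            "Action": "sts:AssumeRoleWithWebIdentity",
            "Condition": {"StringEquals": {
                f"{OIDC_ISSUER}:aud": "sts.amazonaws.com",  # assumed audience value
                f"{OIDC_ISSUER}:sub": f"app:{app}:{env}",    # assumed subject format
            }},
        }],
    }


if __name__ == "__main__":
    print(app_role_arn("thumbnail-writer", "prod"))
    print(json.dumps(web_identity_trust_policy("thumbnail-writer", "prod"), indent=2))
```

Pinning `sub` to one app and environment means a token issued to a different workload can't assume this role, even though the compute underneath may be shared.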

Ashish Rajan: Whereas in a shared account, if other resources share the same IAM role, the same instance profile, that's where your blast radius expands.

Patrick Sanders: The problem is more that, historically, we've over-provisioned permissions for roles, because it provides a better developer experience, it gets us out of their way, and it makes everybody happier. But it's riskier. We've accepted that risk for a long time and done what we can to mitigate it with Repokid and

a lot of great work that our team has done in the past.

Ashish Rajan: Yeah.

Patrick Sanders: But it got to the point where we were hitting scaling issues with those things, and also with role quotas. We can't create more roles in some accounts just because there are too many. Oh, actually, yeah. And then there are also noisy neighbors.

If one application is slamming an AWS API and using up the rate limit, then that could take other applications down. [00:15:00] That can't happen if all of the apps have identities in their own accounts.

Ashish Rajan: Okay, fair point. Taking a step back, does the IMDS proxy you've worked on support v1 and v2?

Because these days most people are on either v1 or v2. I think Scott Piper has a shaming list for IMDSv1, and I wouldn't want people to end up on it, but let's just say, for the greater good, you guys at least have support for v1 and v2.

The first time we spoke about this, you were building the foundation for it. One of the things we spoke about was what scale means for you, because a lot of people look at scale as: I have multiple accounts, I have, I don't know, 300, 400, 500 AWS accounts, so that's multi-account.

When it comes to bringing security at scale, how do you describe security at scale in a multi-account context? What's the role of security? We're talking about identity specifically here, but from first principles, if I were to ask you that question, how would you describe it?

Joseph Kjar: I [00:16:00] think multi-account is a tool in your toolbox, right? It should be part of your higher-level thinking about workload isolation. There are so many things that go into defining what makes good security at scale. For us, a couple of things that tell us we're moving in the right direction are: how much friction are we introducing for our developers?

Does our current security model mean we're constantly being bombarded with requests like "Can you please unblock this? Can you please tweak that?" Does our control model let the business do what it needs to do? That's a really foundational thing. From there, you have to assess what acceptable risk is for your organization, right?

What is the degree of isolation I need? And there are different kinds. For some organizations, data is going to be the number one thing because they hold PII or regulated data, and that's where they need to be most cognizant of boundaries and of scaling [00:17:00] data protection boundaries.

That's a different problem from scaling security solutions for IAM role permissions and access: related, but different. So I think you really have to understand your own security priorities to determine what's most important to you, and then let the downstream stuff follow.

Your account strategy, how you group workloads together, how you silo or centralize data, the access abstractions you build: those should all be in service of your higher-level objectives about what you care about protecting the most.

Patrick Sanders: One thing that comes to mind for me is finding risk reduction leverage.

So once you know what your risks are, the things you need to protect and what you care about, how do you act on that? We're a small team, and Netflix is pretty big. We have a lot of apps and a lot going on all the time, so we can't be involved in every conversation about IAM roles and permissions, like [00:18:00] what an app can access. By splitting things off into their own little sandboxes, if you will, we don't have to think about those things, and we can use that isolation as our leverage; it reduces a ton of risk. It does introduce new risks: if an application gets compromised, it might have really broad permissions within its own account, so somebody could create resources. We have guardrails to prevent things from being shared publicly, all that stuff.

There could be some denial-of-wallet kind of stuff. But those are not the risks we care most about. We care most about protecting the content and protecting the PII of our members, our talent, our employees, and everybody. Those are the things we're really careful about.

That means that we can be less careful about these other things and just not get in people's way.
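Patrick doesn't say which guardrails Netflix uses to stop things being shared publicly. One common AWS control, used here purely as an illustrative assumption, is account-level S3 Block Public Access. The Python sketch below reads an account's setting through the S3 Control API and switches on any missing flags; the helper names are hypothetical, and S3 is only one of the services where public sharing can happen.

```python
REQUIRED_FLAGS = (
    "BlockPublicAcls",
    "IgnorePublicAcls",
    "BlockPublicPolicy",
    "RestrictPublicBuckets",
)


def missing_public_access_blocks(config: dict) -> list[str]:
    """Flags from a PublicAccessBlockConfiguration that are not enabled."""
    return [flag for flag in REQUIRED_FLAGS if not config.get(flag, False)]


def enforce_account_block_public_access(account_id: str) -> list[str]:
    """Enable all four account-level flags if any are off; return what was missing."""
    import boto3  # lazy imports so the pure check above runs without AWS access
    from botocore.exceptions import ClientError

    s3control = boto3.client("s3control")
    try:
        current = s3control.get_public_access_block(AccountId=account_id)[
            "PublicAccessBlockConfiguration"
        ]
    except ClientError as err:
        if err.response["Error"]["Code"] != "NoSuchPublicAccessBlockConfiguration":
            raise
        current = {}  # nothing configured yet for this account
    missing = missing_public_access_blocks(current)
    if missing:
        s3control.put_public_access_block(
            AccountId=account_id,
            PublicAccessBlockConfiguration={flag: True for flag in REQUIRED_FLAGS},
        )
    return missing
```

To keep a sandbox account from switching it back off, an SCP could deny `s3:PutAccountPublicAccessBlock` to everyone but a security role.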

Ashish Rajan: Sure. So you're saying those risks matter, but they rank lower in importance, [00:19:00] because there's only so much one person can do at any given point in time.

It's an interesting one, because in a lot of these conversations you hear different perspectives, and you took the identity approach. I've had conversations where people took the network security approach because they had, let's just say, a well-known vendor's firewall, and for them multi-account means putting a firewall appliance in front of every account's exit on the network.

Then it's covered. To your point, their priority may be "I don't really trust what's in the cloud," so that's their number one risk, and maybe that's why they come at it from the network security perspective. I'm sure there are other perspectives as well, even though the industry keeps saying identity is the most important thing: if you don't get identity right, whether in AWS, Azure, or GCP, nothing else really matters. Which brings me to another question.

We're talking about the risk factor. If an event is detected, someone has to go in and investigate. I think we spoke about this last time as well, but it's worth revisiting the forensic approach in a multi-account setup. Hey, we do the right things.

We have the guardrails, we have [00:20:00] identity, right? Everything is good. One of the questions I've often gotten from people is: all the IPs are dynamic, the resources are dynamic, and if we separate out the identity, what's the forensic approach for getting into that 259th account where an event has somehow been triggered? What's your thinking around incident response for scaled-out security?

Joseph Kjar: I think the guidance AWS provides in their Well-Architected Framework is generally a good starting point for this, because whether you have 100 accounts or 12,000 accounts, the picture doesn't really change very much. It only really changes when you're going from a very early-stage organization, in terms of cloud security maturity, to a more robust, scaled-up model. But you mentioned: assume that the security controls, the baselines, are there.

Ashish Rajan: Yeah

Joseph Kjar: Usually that means you've centralized your logging, right? So you have CloudTrail [00:21:00] for all of your accounts. If you have applications running in each of them, you have CloudWatch Logs, perhaps, for the application-level logs. Along with that, you should also have pre-provisioned security roles, right? Like a role your incident response team can use to log in to each of these accounts.

I think the secret, which isn't really much of a secret, is that the process shouldn't change from one account to another. If you can do it in one account, you should be able to do it in another by following the exact same runbook. The work to make that happen is really more of an automation chore than anything else, plus ensuring consistency from one account to the next, but that tends to be relatively straightforward if you're using infrastructure as code of some variety and validating that all those security resources and controls exist where you think they should.

Patrick Sanders: And that they're protected by SCPs, so somebody in their own little sandbox account can't disable [00:22:00] CloudTrail or something like that.
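Joseph's consistency check and Patrick's SCP point could be sketched together in Python as follows. The sketch assumes a hypothetical per-account inventory (for example, exported from infrastructure-as-code state), a hypothetical SCP name, and a hypothetical break-glass role. The SCP denies the CloudTrail actions that would stop or alter logging, and the checker flags accounts that drift from the baseline. Neither piece is Netflix's actual tooling.

```python
import json

# SCP sketch: deny CloudTrail tampering to everyone except a hypothetical security role.
PROTECT_CLOUDTRAIL_SCP = {
    "Version": "2012-10-17",
    "Statement": [
        {
            "Sid": "DenyCloudTrailTampering",
            "Effect": "Deny",
            "Action": [
                "cloudtrail:DeleteTrail",
                "cloudtrail:StopLogging",
                "cloudtrail:UpdateTrail",
                "cloudtrail:PutEventSelectors",
            ],
            "Resource": "*",
            "Condition": {
                "ArnNotLike": {"aws:PrincipalArn": "arn:aws:iam::*:role/security-breakglass"}
            },
        }
    ],
}

REQUIRED_SCPS = {"protect-cloudtrail"}  # hypothetical SCP name


def baseline_gaps(inventory: dict[str, dict]) -> dict[str, list[str]]:
    """Return {account_id: [problems]} for accounts that drift from the baseline."""
    gaps: dict[str, list[str]] = {}
    for account_id, state in sorted(inventory.items()):
        problems = []
        if not state.get("cloudtrail_to_central_logs"):
            problems.append("CloudTrail not delivering to central logging")
        if not state.get("incident_response_role"):
            problems.append("incident response role missing")
        for scp in sorted(REQUIRED_SCPS - set(state.get("attached_scps", []))):
            problems.append(f"SCP not attached: {scp}")
        if problems:
            gaps[account_id] = problems
    return gaps


if __name__ == "__main__":
    print(json.dumps(PROTECT_CLOUDTRAIL_SCP, indent=2))
```

Running the same check over every account mirrors Joseph's point: if the runbook works in one account, a clean report means it should work in all of them.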

Ashish Rajan: Oh, yeah. I'm sure we'll start talking about RCPs soon as well; I guess it'll be a mix of SCPs and RCPs. I love that too, because incident response and forensics across scaled-out accounts aren't spoken about enough. And I love the identity approach you guys took. For people who are thinking, "Hey, what Joseph and Patrick are talking about is a great idea,

I want to start doing this as well": if I could put you back in your shoes from two years ago, when you took on this project (and we'll get into where it is today and what you're looking at next), would you change anything about how you approached it, now that you have two years of learning behind you? Maybe you can cover both angles: someone with a greenfield account versus someone with a complex one.

Joseph Kjar: Yeah, sure. Frankly, I think this approach holds the most value for people with a lot of baggage in AWS, right? You're coming from a large, complex enterprise brownfield environment, and maybe you're like us: you tried a traditional multi-account migration in the [00:23:00] past, where you lift and shift resources to different accounts, and it failed. We were unable to do that because it was just too much work.

Right: too much work, too complex, and not enough value for the effort. If you're in a situation like that, we'd strongly recommend taking a look at some of these ideas, because we have been seeing the benefits we hoped for. Like Patrick said, it's probably been about three years from the inception of the idea to where we are now, and we have learned a lot along the way, right? We would change certain things. Part of the reason for our timeline is that we've been doing other stuff; this is not the only project.

Patrick Sanders: Joseph's been having kids.

Ashish Rajan: That's a lot of work, man.

Patrick Sanders: Yeah. Yeah, there's life stuff alongside work.

Ashish Rajan: So you guys have a life outside cloud security.

Obviously life takes over sometimes. We're all human at the end of the day. Coming back to the cloud security pieces and approaching this based on what you learned: to your point about people [00:24:00] who have baggage in AWS, would you cherry-pick the kind of applications you do this with, or how would you approach picking my starting point?

Joseph Kjar: Yeah, that's a good question. We found it really helpful to index on migration complexity as a key trait for what to move first. Don't cherry-pick too much, right? You shouldn't have to conduct a robust analysis of every single application before deciding to move it.

But it's helpful to estimate a couple of things, right? Migration complexity, the security risk of the application in question, and the operational risk of the application. So if you have an application that's low complexity but high security risk, and it's a simple migration, you've found a golden target for an initial migration, right? If you can find a way to apply those classifications to your whole [00:25:00] application population, then you can start divvying things up into a reasonable migration timeline and structure. But the approach should be the same for all apps at the end of the day, because otherwise you're just going to be spinning your wheels.
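The triage Joseph describes can be sketched in a few lines of Python, assuming each app gets a coarse LOW/MEDIUM/HIGH estimate on the three axes he names. The three-level scale, the tie-break order, and the app names are illustrative assumptions, not Netflix's scoring model; the ordering only sequences the waves, since every app still goes through the same migration process.

```python
from dataclasses import dataclass
from enum import IntEnum


class Level(IntEnum):
    LOW = 1
    MEDIUM = 2
    HIGH = 3


@dataclass(frozen=True)
class App:
    name: str
    migration_complexity: Level
    security_risk: Level
    operational_risk: Level


def is_golden_target(app: App) -> bool:
    """Easy to migrate but carrying high security risk."""
    return app.migration_complexity == Level.LOW and app.security_risk == Level.HIGH


def migration_order(apps: list[App]) -> list[App]:
    """Sequence apps into waves; every app still goes through the same process."""
    return sorted(
        apps,
        key=lambda a: (
            not is_golden_target(a),  # golden targets first (False sorts before True)
            a.migration_complexity,   # then easier migrations
            -a.security_risk,         # then higher security risk
            a.operational_risk,       # then lower operational risk
        ),
    )


if __name__ == "__main__":
    apps = [  # hypothetical applications
        App("billing-batch", Level.HIGH, Level.HIGH, Level.HIGH),
        App("thumbnail-writer", Level.LOW, Level.HIGH, Level.LOW),
        App("metrics-exporter", Level.LOW, Level.LOW, Level.LOW),
    ]
    print([a.name for a in migration_order(apps)])
```

With the sample data, the golden target `thumbnail-writer` comes first, followed by the other easy migration, `metrics-exporter`, and then the high-complexity `billing-batch`.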

You'll never finish the migrations. Something we learned, and would do differently, is that we would have gotten into that migration feedback loop earlier, right? We built a really robust and complex migration orchestration system to help us transparently migrate all these things right under people's noses, with zero code changes, like we were saying. We tried to get all of that tooling close to perfect before starting. So we would highly recommend that folks get into the migration feedback loop earlier: get just the bare-bones things you need, do migration number one, and you'll learn some things.

Tweak and go from there, and before long you'll be doing dozens of applications in parallel, and that feels really good.

Ashish Rajan: Would you change anything about [00:26:00] how you started? Obviously you're on the same team, so I imagine a lot of the lessons may be similar, but was there something that stood out for you in the lessons learned?

Patrick Sanders: Most of what I'd change is probably more process-oriented. We did a lot of wheel-spinning trying to figure out how to integrate with the migration campaign platform we have at Netflix and working with project management to coordinate a bunch of stuff, and that really slowed us down.

And it wasn't necessarily anybody's fault. It's just that Joseph and I have very constrained time: we're the primaries on this project, and we also have other responsibilities, of course. So the more we split our time, the more it stretches everything out.

I think focus is the word that I'm looking for.

Ashish Rajan: This is going to turn into a productivity podcast: how to be more productive in my eight hours, with a one-hour lunch break in between.

Patrick Sanders: Do less. Yes, [00:27:00] honestly, less is more.

Ashish Rajan: Actually, think of optimizing it so you're doing less.

As we're talking about this, I have a voice in the back of my ear, some listener, I'm pretty sure, saying, "Throw the word AI into this conversation." I know you guys are going to hate me for this. Now that we're in this AI era, I don't even know if such a solution exists, but it would be a great one for anyone in the migration space, because the whole idea behind data analytics is that you should be able to pull data in and understand it, instead of going through 10,000 applications one after another as an individual,

trying to figure out whether each one is an easy migration or not. Or is it an easy migration only because the AI agent thinks it is, because it can do it in two days? So how would you define it? What would you say are examples of easy migrations? Because, as we were talking about, there's baggage that people don't want to touch.

I still remember one of the companies I was working for, and we had this... I don't think it was a security bug. It was just a bug, I think, but no one wanted to touch it because it was the core; if anything happens to that [00:28:00] thing... The people who basically made that thing left the company 15 years ago and walked away. No one touches it; it just works.

So I'm sure people have a lot of snowflakes like that in their environment. And I'm sure it would be a good gauge to understand what you classify as easy migrations.

Joseph Kjar: That's going to change based on what your multi-account strategy is. If you're doing something like our approach, where all you're doing is moving the identity, then estimating migration complexity becomes a lot simpler, right?

We can take the oldest, cruftiest app you can think of, and if all it's doing from an AWS API perspective is writing to S3 and maybe reading a message from SQS, we can migrate that in our sleep. It could be this big, ugly monolith, a monster code base on the inside. Yeah. But from our perspective, it doesn't matter.

All we need to make sure of is that [00:29:00] when we swap that role out from under its feet, it doesn't lose access to the S3 bucket or the queue that it depends on. And so for us, migration complexity has primarily been driven by AWS service usage: how is this thing interacting with the AWS APIs?

Oh, okay. And when we swap that role out, are we going to cause disruption, a mudslide, or can we pinpoint how it expects to work from an AWS perspective, and make sure that's all ironed out before we push the big red button and let it go?

Patrick Sanders: And that's actually pretty easy to figure out too.

Oh, the IAM Last Accessed Info API.

Ashish Rajan: Oh,

Patrick Sanders: Yeah. Yeah. And you can just pull that down for each role and look through every service that's been accessed in the last year, and it's so easy. But where it becomes more difficult is when an application is using services that don't support cross-account access.
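For readers who want to try the per-role pull described here, the following is a minimal Python sketch against the IAM last accessed information API as exposed by boto3 (generate_service_last_accessed_details, then get_service_last_accessed_details). It assumes a boto3-style IAM client is passed in; the function name services_used_recently and the 365-day window are illustrative choices, not Netflix tooling.

```python
import time
from datetime import datetime, timedelta, timezone


def services_used_recently(iam, role_arn, days=365, poll_seconds=2.0, now=None):
    """Return the service namespaces a role has authenticated to within `days` days.

    `iam` is assumed to behave like boto3.client("iam").
    """
    job_id = iam.generate_service_last_accessed_details(Arn=role_arn)["JobId"]

    # The report is generated asynchronously, so poll until the job finishes.
    while True:
        page = iam.get_service_last_accessed_details(JobId=job_id)
        status = page["JobStatus"]
        if status == "COMPLETED":
            break
        if status == "FAILED":
            raise RuntimeError(f"last-accessed job failed for {role_arn}: {page.get('Error')}")
        time.sleep(poll_seconds)

    cutoff = (now or datetime.now(timezone.utc)) - timedelta(days=days)
    used = set()
    while True:
        for svc in page["ServicesLastAccessed"]:
            # LastAuthenticated is absent for services the role has never used.
            last = svc.get("LastAuthenticated")
            if last is not None and last >= cutoff:
                used.add(svc["ServiceNamespace"])
        if not page.get("IsTruncated"):
            break
        # Follow pagination until every service entry has been read.
        page = iam.get_service_last_accessed_details(JobId=job_id, Marker=page["Marker"])
    return used
```

In practice this would be called as services_used_recently(boto3.client("iam"), role_arn) once per role, and the resulting sets become the input for grouping applications by migration complexity.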

So your Kinesis, your Route 53. So let's say there's an application with [00:30:00] a role in the multi-tenant account.

Ashish Rajan: Yeah.

Patrick Sanders: And they're managing Route 53 hosted zones in that same account.

Ashish Rajan: Yeah.

Patrick Sanders: To migrate that app, we would have to make code changes so that the app assumes a role in the origin account to be able to do the Route 53 calls.

We don't want to be in the business of code changes, but when we ran the numbers, we realized that we have three groupings: no AWS service usage, cross-account AWS service usage, and same-account AWS service usage. Yep. And the large majority of our applications are in those first two groups.
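The three groupings can be expressed as a small classifier over each app's recently used services. This is a hypothetical sketch: the SAME_ACCOUNT_ONLY set contains only the two services named in this conversation (Kinesis and Route 53), and AWS adds cross-account and resource-policy support over time, so treat that set as an assumption to verify against current AWS documentation.

```python
from enum import Enum

# Assumption: namespaces treated as "same-account only", taken from the two
# examples in this conversation. Verify against current AWS documentation.
SAME_ACCOUNT_ONLY = {"kinesis", "route53"}


class Group(Enum):
    NO_AWS_USAGE = "no AWS service usage"
    CROSS_ACCOUNT = "cross-account AWS service usage"
    SAME_ACCOUNT = "same-account AWS service usage"


def classify(services_used, same_account_only=SAME_ACCOUNT_ONLY):
    """Place an app into one of the three migration groups by the services it calls."""
    if not services_used:
        return Group.NO_AWS_USAGE
    if set(services_used) & same_account_only:
        return Group.SAME_ACCOUNT
    return Group.CROSS_ACCOUNT


def group_apps(usage_by_app):
    """Map each group to the sorted list of app names that fall into it."""
    groups = {g: [] for g in Group}
    for app, services in sorted(usage_by_app.items()):
        groups[classify(services)].append(app)
    return groups
```

Any app touching even one same-account-only service lands in the harder group, which matches the idea of leaving those apps in the multi-tenant account longer while the first two groups move out.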

Ashish Rajan: Oh, okay. Yeah. So the pool is quite small for the other one. Yeah.

Patrick Sanders: So we're leaning into that, severing the connections between identities and resources. These more complex applications can stay in the multi-tenant environment for longer, and as we're moving things out, we're reducing risk for those things.

It feels magical.
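For the same-account group, the code change mentioned above usually means calling STS AssumeRole for a role in the origin account and building a client from the returned temporary credentials. A minimal sketch, assuming a boto3-style STS client; the function name, role ARN and session name are hypothetical, and the origin role's trust policy would have to allow the app's new role to assume it.

```python
def route53_client_for_origin_account(sts, origin_role_arn, client_factory=None,
                                      session_name="migrated-app"):
    """Create a Route 53 client that acts as a role in the app's original account.

    `sts` is assumed to behave like boto3.client("sts"). `client_factory`
    defaults to boto3.client and is injectable so the sketch can be exercised
    without AWS credentials.
    """
    if client_factory is None:
        import boto3  # only needed when talking to real AWS
        client_factory = boto3.client

    # Temporary credentials for the origin-account role.
    creds = sts.assume_role(RoleArn=origin_role_arn, RoleSessionName=session_name)["Credentials"]
    return client_factory(
        "route53",
        aws_access_key_id=creds["AccessKeyId"],
        aws_secret_access_key=creds["SecretAccessKey"],
        aws_session_token=creds["SessionToken"],
    )
```

AssumeRole credentials expire (one hour by default), so a long-running service would need to refresh them rather than build the client once at startup.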

Ashish Rajan: Yeah, I was going to say, the way you describe it, it's almost like the snowflake, [00:31:00] the thing I was thinking of, kind of goes away automatically, especially with the three buckets you called out. I think they're pretty good, because a lot of people still have resources that don't use any IAM at all. Yeah, definitely. Or they use some, even though they might start with star permissions. But if you get to a point where you have defined the right access to the account, as you were saying earlier, if you can define what that could look like... Yeah, I think it ties back really well into why you guys picked identity, because I was thinking that if I were to put a network security lens on that, it would just be chaotic. Every new firewall rule has to go through security. Maybe if I want to keep my job, I'd be like, you need to come to me directly from IP address 1.2.3.4, or whatever.

So how has it changed? Now that we know which ones to go with in the beginning, which ones to cherry-pick, maybe not cherry-pick too much. How's it been over the past couple of years, every time you've had the time to work on it? What's different now since the time you started working on it? What have you guys learned?

Joseph Kjar: The main thing that's different [00:32:00] now is that we've gone through and built all of that migration tooling, and we've really ironed out all of the core technical foundations we need to move real-life workloads. Like Patrick was saying, it's easy to discover which AWS services an application uses; it's not so easy to discover all of the resources it uses.

Because you can see from the last accessed info that, oh, this application is using S3, but which bucket? Oh, actually. Because you need to authorize GetObject requests and make sure those still work when you move the role. Yeah. But those don't show up in CloudTrail. No, you probably don't have data events for S3 enabled across your organization unless you've discovered a hidden money tree. So that part has taken some time to figure out: how do we create all of the necessary app-to-resource mappings?

Some of our colleagues gave a great talk on AWS SDK instrumentation, [00:33:00] which is the thing that we lean on primarily for those things that are typically only visible in data events.

Ashish Rajan: Oh, so instrumentation actually shows you... I'll definitely find the talk and link it up. Yeah. Do they refer to how you can identify objects in S3, or specifically what resource, or what object within a resource?

Joseph Kjar: Yeah, so basically our custom AWS SDK instrumentation just emits some carefully curated and selected logs that tell us, based on the request going out from, say, the Boto3 library... Yeah. ...which bucket that request was targeting.
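One way to get this kind of signal from Boto3, sketched here under assumptions, is botocore's event system: a handler registered for before-parameter-build receives each operation's parameters before the request is sent. This is not Netflix's instrumentation. The RESOURCE_PARAM table covers only S3 buckets and SQS queue URLs, and the log record format is made up for illustration.

```python
import json
import logging

logger = logging.getLogger("aws_resource_usage")

# Assumption: request parameters that name the target resource, per service.
RESOURCE_PARAM = {"s3": "Bucket", "sqs": "QueueUrl"}


def make_resource_logger(sink=None):
    """Build a botocore event handler that records which resource each call targets."""
    sink = sink or (lambda record: logger.info(json.dumps(record)))

    def handler(event_name, params, **kwargs):
        # Event names look like "before-parameter-build.s3.GetObject".
        _, service, operation = event_name.split(".", 2)
        param = RESOURCE_PARAM.get(service)
        if param and param in params:
            sink({"service": service, "operation": operation, "resource": params[param]})
        # Returning None leaves the outgoing request untouched.

    return handler


def instrument(client):
    """Attach the handler to a boto3 client, e.g. boto3.client("s3")."""
    # Registering the event prefix also fires for "<prefix>.<service>.<operation>".
    client.meta.events.register("before-parameter-build", make_resource_logger())
```

Because it sees parameters rather than CloudTrail events, a hook like this captures data-plane calls such as GetObject or SendMessage that would otherwise require paid data event logging.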

Ashish Rajan: Oh, because I can already see the benefits of it as well, like from an incident response perspective. Sometimes you're trying to figure out what's being called, but all you see in CloudTrail is an S3 bucket.

You don't know what object; you can't see the kind of request.

Joseph Kjar: It wouldn't even show up in CloudTrail. It's not even there, because you're doing a GetObject. You're not doing...

Ashish Rajan: From the application. You're not even doing it.

Joseph Kjar: Oh, yeah, you're not doing a PutBucketPolicy management action.

It's just a [00:34:00] GetObject. And so we have S3 access logs as well that we lean on for S3 specifically, but then the SDK instrumentation for other things like SQS, where SendMessage and ReceiveMessage, none of those things show up. So going back to what's changed: we found ways to stitch all of that data together and built up our migration tooling to the point where we can curate a batch and say, we want to migrate this set of applications.

Hit the button, go get a drink, and a hundred-plus real-life workload migrations happen on their own.
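Stitching the sources together can be as simple as folding them into one app-to-resource map. The hypothetical sketch below reads the leading fields of S3 server access log lines (bucket owner, bucket, time, remote IP, requester, request ID, operation, key), takes the role name from an assumed-role requester ARN, and merges in SDK instrumentation records whose shape ("app", "service", "resource" keys) is invented here. It also assumes each app runs as an IAM role named after it.

```python
import re
from collections import defaultdict

# Leading fields of an S3 server access log line:
# bucket_owner bucket [time] remote_ip requester request_id operation key ...
S3_LOG = re.compile(r'^(\S+) (\S+) \[[^\]]*\] (\S+) (\S+) (\S+) (\S+) (\S+)')


def role_from_requester(requester):
    """Extract the role name from an assumed-role ARN, else return None."""
    # arn:aws:sts::<account>:assumed-role/<role-name>/<session-name>
    if ":assumed-role/" not in requester:
        return None
    return requester.split(":assumed-role/", 1)[1].split("/", 1)[0]


def build_app_resource_map(s3_log_lines, sdk_records):
    """Stitch S3 access logs and SDK records into {app: {(service, resource), ...}}."""
    mapping = defaultdict(set)
    for line in s3_log_lines:
        m = S3_LOG.match(line)
        if not m:
            continue  # skip malformed lines
        bucket, requester = m.group(2), m.group(4)
        role = role_from_requester(requester)
        if role:  # anonymous ("-") and non-role requesters are ignored
            mapping[role].add(("s3", bucket))
    for rec in sdk_records:
        mapping[rec["app"]].add((rec["service"], rec["resource"]))
    return dict(mapping)
```

The resulting map is what a pre-migration check would consult to confirm that every bucket and queue an app touches will still authorize the app's new role.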

Patrick Sanders: This does all sound very magical, so I want to bring things down a bit. This doesn't solve all the problems, right? It doesn't solve the network perimeter problems. It doesn't solve lateral movement on the network.

It doesn't solve multi-tenancy kinds of things from a container platform perspective. So yeah, container escapes, that sort of thing. You've got to hedge a little [00:35:00] bit.

Ashish Rajan: I guess, to your point, it's refreshing to know that the cloud-native capability a lot of DevOps people and cloud people talk about, which was like, hey, I can do automation, reduce my time for deployment, blah, blah, blah, you're able to enable that in security as well. To your point, it is magical for a lot of people who might be listening or watching and going, is that even possible?

Like, we're security at the end of the day, right? We don't do that shit. That's not us. So in terms of the skill set within teams as well, because obviously the people listening are individuals, or part of a team, and they want to bring this conversation to their teams and go, hey, we should probably look at something like this.

What were some of the automation components, like Terraform, I imagine, or what were some of the foundational pieces that enabled the automation for this wizardry to happen?

Patrick Sanders: It's not necessarily an automation kind of thing. But one thing that's worth calling out, and we were talking about this earlier:

A lot of companies are starting out in a container world.

Ashish Rajan: Yeah. Yeah.

Patrick Sanders: And they don't have this baggage of [00:36:00] EC2 that we have. Yeah. And if you're using EKS, this becomes a lot easier, because I think you can use IAM roles for service accounts. That's right. A similar cross-account kind of thing.

Ashish Rajan: Yeah. And then they're going down the path of the container having its own identity; there's Pod Identity support as well.
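Patrick is hedging on the EKS details, and so is this sketch. With IAM roles for service accounts (IRSA), a pod's projected service account token is exchanged through sts:AssumeRoleWithWebIdentity, and the role's trust policy pins the cluster's OIDC issuer and the specific service account. The role can sit in a different account from the cluster if that account has an IAM OIDC provider registered for the cluster's issuer. The account ID, issuer, namespace and service account used below are placeholders.

```python
def irsa_trust_policy(identity_account_id, oidc_issuer_host, namespace, service_account):
    """Trust policy letting one Kubernetes service account assume a role via IRSA.

    `oidc_issuer_host` is the cluster's OIDC issuer URL without "https://",
    e.g. "oidc.eks.<region>.amazonaws.com/id/<id>".
    """
    provider_arn = f"arn:aws:iam::{identity_account_id}:oidc-provider/{oidc_issuer_host}"
    return {
        "Version": "2012-10-17",
        "Statement": [{
            "Effect": "Allow",
            "Principal": {"Federated": provider_arn},
            "Action": "sts:AssumeRoleWithWebIdentity",
            "Condition": {"StringEquals": {
                # Only this namespace/service account may assume the role...
                f"{oidc_issuer_host}:sub": f"system:serviceaccount:{namespace}:{service_account}",
                # ...and only with tokens minted for STS.
                f"{oidc_issuer_host}:aud": "sts.amazonaws.com",
            }},
        }],
    }
```

Serializing the result with json.dumps gives a document suitable for the role's AssumeRolePolicyDocument; the identity lives in its own account while the workload keeps running in the cluster's account.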

Patrick Sanders: I'm not very familiar with that world. Yeah. But I know there are options there. Yeah. So you don't have to have a software engineer on your team who can develop an IMDS proxy to do this for you.

Yeah. Yeah. Fair.
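To make the IMDS proxy idea more concrete: on EC2, the AWS SDKs list the role name at /latest/meta-data/iam/security-credentials/ and then fetch that role's credentials as JSON from the same path plus the role name. A proxy sitting in that path can answer with credentials for a role that lives in a separate identity account. The sketch below only shows that routing and the response shape. It is not Netflix's proxy; fetch_credentials is a hypothetical callable (for example wrapping sts.assume_role), and a real proxy would also handle IMDSv2 session tokens and pass other metadata through.

```python
import json
from datetime import datetime, timezone

CREDS_PATH = "/latest/meta-data/iam/security-credentials/"
ISO = "%Y-%m-%dT%H:%M:%SZ"


def handle_imds_path(path, role_name, fetch_credentials, now=None):
    """Return (http_status, body) for the IMDS credential paths SDKs query.

    `fetch_credentials` is assumed to return an STS-style dict with
    AccessKeyId, SecretAccessKey, SessionToken and a datetime Expiration.
    """
    if path == CREDS_PATH:
        # SDKs first ask which role is attached to the "instance"...
        return 200, role_name
    if path == CREDS_PATH + role_name:
        # ...then fetch that role's credentials in the IMDS JSON shape.
        c = fetch_credentials()
        body = {
            "Code": "Success",
            "LastUpdated": (now or datetime.now(timezone.utc)).strftime(ISO),
            "Type": "AWS-HMAC",
            "AccessKeyId": c["AccessKeyId"],
            "SecretAccessKey": c["SecretAccessKey"],
            "Token": c["SessionToken"],
            "Expiration": c["Expiration"].astimezone(timezone.utc).strftime(ISO),
        }
        return 200, json.dumps(body)
    # Other metadata and IMDSv2 token handling are out of scope for this sketch.
    return 404, ""
```

Because the workload keeps using the standard SDK credential chain, the identity can move to another account without application code changes, which is the property the migration relies on.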

Ashish Rajan: I guess it's an interesting one, 'cause most of the conversations I'm having today are looking at Kubernetes and container-first workloads. For some reason, a lot of them tend to lean on open-source, Kubernetes-native solutions rather than an AWS-native solution for doing identity, network security, all of that.

I was talking to someone about a project called Cilium, which does network security. Basically, if I wanted mutual TLS between two Kubernetes pods, I'd use something like Cilium as an open-source option.

Patrick Sanders: Is that like a service mesh?

Ashish Rajan: Yeah, but what they call out is, hey, we do [00:37:00] network security for you, because the service mesh itself is not secure by default.

Nothing is secure by default. I was like, where do you start? There's a long list. And maybe that's where the complexity is coming from as well. But it's good to hear that even with the baggage of having used EC2 for a long time, people are still able to use that.

Technically, the principle should not really change. You should still be able to apply the same thing to containers in terms of an identity proxy, so an IMDS proxy, whether it's containers or Kubernetes, because at the end of the day, in the AWS world, it's the IAM role that decides what I can do as an application.

Yeah. Was there anything else that you wanted to cover from your talk perspective that I should ask?

Joseph Kjar: There was a question you asked earlier that I wanted to explore a little bit more because you had asked about how we think about this approach for people who might be facing similar problems.

Oh, yeah. But also for folks who might be coming from a greenfield environment. Oh, I see, we haven't covered that. We never really touched on that: if you're starting fresh, yeah, there are so many new tools in the AWS toolbox that [00:38:00] we just didn't have, that Netflix didn't have available over the years.

And it's too hard to retrofit. So things like Resource Access Manager for sharing subnets have totally changed the game for best practices in designing your multi-account network. And that's just one example, right? There are so many things you can look at nowadays to make things a little more streamlined, so I can't say I would recommend exactly what we're doing for greenfield. But I am encouraged and intrigued to see this concept of segregating identity from compute and other types of infrastructure becoming a more common pattern.

At this last fwd:cloudsec, there was an interesting talk about segmenting identities for Kubernetes workloads, and it had a very similar flavor and feel to the design principles behind why we chose to do this for IAM roles in AWS accounts. And so I think it's perhaps a [00:39:00] pattern that's gaining a little bit of traction in a general sense.

Ashish Rajan: Yeah.

Joseph Kjar: And the more workload management platforms, be they things like Kubernetes or things like Spinnaker that manage EC2... Yeah. The more of those that start to acknowledge or support this mode of operation out of the box, the better. Yeah. So I'm really curious to see where things end up with that.

Ashish Rajan: Could you describe Resource Access Manager, and what do you see as a use case that could fit into a migration as well?

Joseph Kjar: This one is, I think, easier to describe as olden days versus now. It used to be that you had to provision VPCs in every account and then find some way to link them together, right?

Straight VPC peering, Transit Gateway, all those mechanisms to allow connectivity from one to the other. No matter how you do it, you have VPC proliferation. You have actual VPCs that exist in all of these independent accounts, and managing them is a nontrivial problem.

With Resource Access Manager, you can have a central networking account, and [00:40:00] your network team can create and manage all of the VPCs and subnets there, and then use RAM to share those subnets out to remote accounts. So an operator in a remote account can go to deploy infrastructure, and AWS will show them this subnet is available even though it doesn't live in their account. It looks like it does, but it lives in the network team's centrally managed place. Yeah. It's a totally different paradigm from the way things used to be, but it opens the door to a lot of new patterns.
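A minimal sketch of the sharing step described here, run from the central networking account with a boto3-style RAM client. The function name, subnet IDs, account IDs and share name are placeholders. allowExternalPrincipals=False keeps the share inside the AWS Organization, and VPC subnet sharing works between accounts in the same organization.

```python
def share_subnets(ram, share_name, region, network_account_id, subnet_ids, consumer_account_ids):
    """Share subnets from a central networking account with other accounts via RAM.

    `ram` is assumed to behave like boto3.client("ram") in the networking account.
    """
    subnet_arns = [
        f"arn:aws:ec2:{region}:{network_account_id}:subnet/{subnet_id}"
        for subnet_id in subnet_ids
    ]
    resp = ram.create_resource_share(
        name=share_name,
        resourceArns=subnet_arns,
        principals=list(consumer_account_ids),
        # Restrict the share to principals inside the AWS Organization.
        allowExternalPrincipals=False,
    )
    return resp["resourceShare"]["resourceShareArn"]
```

Once the share is accepted, the consumer accounts see these subnets when launching resources, while the VPCs themselves stay owned and managed by the network team.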

Ashish Rajan: Maybe that's where I heard the whole firewall appliance thing.

They were saying they can put a firewall appliance in one VPC and control all of them. It's just a...

Joseph Kjar: It's a common thing we see with the cloud service providers, right? You start with bare-bones functionality: VPCs everywhere, peer them all together. Then things evolve to the next stage of, oh, you should have a transit network.

Yeah. And that allows you to centralize egress and put your firewall in one place. And then you can go another step further and centrally manage all of the network infrastructure, but make it look like it's present in the remote [00:41:00] accounts, even though it's not. And so it's really cool. And that's your greenfield advantage, right?

Yeah. So don't squander it.

Ashish Rajan: I guess, to add to what you were saying, people in brownfield have to weigh it up: is it really worthwhile using the newer capability, whether it's Transit Gateway or the RAM service or whatever? Is there real value in, say, Ashish spending however many hours transitioning all of this, getting rid of VPC peering everywhere across thousands of accounts, to move to this shared subnet?

How dramatic would the change be? And what if AWS changes something else tomorrow? Do we go back to square one? Is there a balance to be found there as well?

Joseph Kjar: Yeah, I think there is. That change happened recently, right? Subnet sharing has been out for a while, but now there's this whole VPC Lattice thing.

In brownfield environments for large customers, it's rare that a complete overhaul of all of your networking infrastructure is the [00:42:00] right thing to do.

Yeah,

That's a tough sell for everyone. And so it really comes down to those fundamental questions of what are we trying to protect?

How hard is it for us to create the network boundaries that we need? And what operating model are we trying to foster for our engineers? Unless you're completely unable to deliver on both of those things, you're probably not going to want to switch over to the latest and greatest AWS network tech just for the sake of it. So it can be useful to find pockets where you can explore those things and get some value from them, but we're probably not going to be converting AWS prod to VPC Lattice anytime soon.

Patrick Sanders: One nice thing about the work we're doing, decoupling identity from network and compute placement, is that if we do the math, look at the risks and everything, and determine that it's worth it, it's way more feasible now that we don't [00:43:00] also have to worry about migrating an app's resources and dealing with what we're dealing with right now. Yeah, separating those concerns entirely. Yeah. It gives us more flexibility.

Ashish Rajan: I think my biggest takeaway from this conversation is that I love the self-service angle. Probably the right way to put this is that you've done the thing security should be doing, which is to enable developers to do what they want without feeling that security is a blocker, while still keeping an acceptable level of risk across the organization's cloud footprint. You feel, oh, okay, you know what? I get it. I do want to enable them, but at the same time, I do want security as well. How do I find the balance? I think a takeaway for people here could be: if you are looking at any challenge, and I'm going to quote what you said, find what the important risk for the organization is and use that as a stepping stone for, okay, based on this, what can security do to enable a developer-friendly environment?

Start from there. It could be network for all; it could be, oh, we should use VPC Lattice [00:44:00] or RAM or whatever. That's the ideal way. I feel like that's one takeaway for me from this conversation. I'm pretty excited to hear your talk tomorrow as well.

I'll probably put that in the description when it comes out, for people to go and hear the whole talk. Maybe the previous episode too. That's all the questions I had, but I have three fun questions. If you guys remember them from two years ago, I wonder if your answers have changed. The first one: what do you spend most time on when you're not working on IAM proxy kinds of things? Let's start with you.

Patrick Sanders: I think honestly, mostly dogs lately. How many dogs do you have? We have two dogs that are our own and we're fostering another dog and we also have a cat.

Ashish Rajan: Oh, wow. You have a farm then at that point.

Patrick Sanders: Very small house. I live in L.A. There's a farm in downtown L.A.

Ashish Rajan: What about you, Joseph?

Joseph Kjar: My life outside of work: as Patrick mentioned, I'm blessed to have two young kids at home, both three and under. Yeah. So that keeps us busy. It keeps us entertained, it keeps us smiling and laughing, and it also drains our time, but [00:45:00] in a more than acceptable way. So, enjoying that stage of life.

Ashish Rajan: And the next one, then: what is something that you're proud of that is not on social media?

Joseph Kjar: I'm proud of the work we've done, both together and as a team. We don't always get to share all the fun stuff we do, so it's a great privilege to be able to talk about it here and tomorrow in our talk. And I'm proud of trying to navigate a challenging new stage of life while keeping up with career demands.

Yeah, that's been a big adjustment for me, and something that's still very much a work in progress. Like life.

Ashish Rajan: This could get deep. We had productivity at the beginning, and now we're ending with, hey, life is always a work-in-progress story. What about you, Patrick?

Patrick Sanders: I'm gonna go deeper.

So I've been really reflective lately, because I'm coming up on five years at Netflix in a couple of weeks, and five years is a very long time. And I don't know, I'm pretty blown away by the people I get to work with, and [00:46:00] by the amount of empathy and compassion, and also skill, that everybody has.

And yeah, I'm really proud to be a part of such a good team. These five years have been hard at times, easy at times. There's been a lot of therapy, but it's been really good, and I'm really happy to be where I am.

Ashish Rajan: That's awesome, man. Yeah. Good to hear from both of you.

It sounds like a great place to work as well. Final question: what's your favorite cuisine or restaurant that you can share?

Patrick Sanders: I like food so much.

Ashish Rajan: This is a hard question for us. Think of it as being stuck on an island, where you can only carry one meal or one cuisine with you. I guess it's not quite like that.

Patrick Sanders: We have another hour,

Ashish Rajan: But you only get 30 seconds. It's like Squid Game: you only get a few seconds to decide.

Patrick Sanders: It changes day to day, but I'm gonna go with Chinese overall. There's [00:47:00] so much diversity within Chinese cuisine, a lot of different stuff to try.

Joseph Kjar: So I think you'd want variety on the island. I was gonna cheat and say Asian fusion.

Ashish Rajan: Oh, yeah? Fused with what?

Joseph Kjar: Everything.

Ashish Rajan: So if I were stuck on the island, I'd want Asian fusion? What is that? 'Cause you can lean on any of the Asian cuisines and mix it with steak on the other side? Wait, so you'd have Texas slow-cooked meat on one side mixed with noodles?

You're like, that's fusion. Yeah.

Joseph Kjar: Keep my options open. Yeah. Yeah. Fair.

Patrick Sanders: We'd have a very luxurious island.

Ashish Rajan: I'm thinking I should change the question to: if you were stuck on an island, you get one meal. Yeah, that is definitely... I can sense you're both foodies as well, so I think it'll be a really hard call.

I go through seasons myself. Currently I'm in a Japanese season, I feel. [00:48:00] But it's hard to find good Japanese in Vegas. So I'm gonna keep looking, but if people have recommendations, definitely let us know.

Patrick Sanders: Yeah, let me know if you find any.

Ashish Rajan: I would definitely let you know, because I'm still struggling to find one. But where can people find you on the internet to talk more about this? IMDS proxy? Greenfield? Brownfield?

Patrick Sanders: I'm sometimes on Mastodon as Patrick Sanders. Oh, yeah. On infosec.exchange.

Ashish Rajan: Is that what Mastodon IDs are like, at infosec.exchange?

Patrick Sanders: Something like that? Okay. Yeah, okay. I also have a Bluesky. You can probably find me if you search my name. Yeah. Fair. Okay. I don't use social media much. Yeah, fair. Joseph doesn't use it. I don't, too busy with the dog. Yeah, dog and cat.

Ashish Rajan: What about you, Joseph?

Joseph Kjar: The one I'm most likely to glance at on occasion is LinkedIn.

Ashish Rajan: Oh, fair. Okay. I'll add that in.

Joseph Kjar: So yeah, LinkedIn: Joseph Kjar, K-J-A-R. Also on Mastodon, same as him. We share it.

Patrick Sanders: Might as well. Also the Cloud [00:49:00] Security Forum. Oh yeah, the Slack workspace.

Ashish Rajan: And fwd:cloudsec as well, I think. Shout that out as well. Come see us.

Patrick Sanders: There's one coming up soon. Yeah. In Denver.

Oh, I should really look at the dates beforehand. Check out the fwd:cloudsec website, right? fwdcloudsec.org. Yeah. And we'll link it in the show notes. Yes.

Ashish Rajan: And we'll put it in the description: Denver in June or July. June, July, middle of the year, middle of 2025. I'm gonna get so much shit for this.

Chris is gonna be like, what is going on, guys? You should be on top of this. No, but I appreciate both of you taking the time out for this. I really enjoyed the conversation. I look forward to more of these, hopefully in a couple of years. Maybe another end of the year. Yeah.

We'll get on it. But thank you so much for coming on the show. Thank you for having us, I really appreciate that.

Patrick Sanders: Love chatting with you. As always, man, always a good time.

Ashish Rajan: Great. We had a few laughs as well on how we would choose a cuisine if we were stuck on an island. I'm taking the Asian fusion angle next time.

Perfect. All right, guys, thank you so much for tuning in. We'll see you next time. Thank you. Thank you so much for listening and watching this episode of Cloud [00:50:00] Security Podcast. If you've been enjoying content like this, you can find more episodes like these on www.cloudsecuritypodcast.tv. We're also publishing these episodes on social media as well.

So you can definitely find these episodes there. Oh, by the way, just in case there was interest in learning about AI cybersecurity, we also have a sister podcast called AI Cybersecurity Podcast, which may be of interest as well. I'll leave the links in the description for you to check them out, and also for our weekly newsletter, where we do in-depth analysis of different topics within cloud security, ranging from identity and endpoint all the way up to what a CNAPP is, or whatever new acronym comes out tomorrow.

Thank you so much for supporting, listening and watching. I'll see you next time.
