What is SRE & when do you need it?

Tim Heckman
Senior Site Reliability Engineer, Netflix


May 17, 2020

About This Episode

Like this show? Please leave us a review here — even one sentence helps! Consider including your Twitter handle so we can thank you personally!

Season 1

Episode Description

What We Discuss with Tim Heckman:

  • What is SRE?
  • Is it helpful to have an SRE team when you already have a Security team?
  • What does Security at Netflix look like?
  • How can people scale maturity in security when dealing with cloud and multi-cloud?
  • And much more…

THANKS, Tim Heckman!

If you enjoyed this session with Tim Heckman, let him know by clicking on the link below and sending him a quick shout-out:

Click here to thank Tim Heckman on LinkedIn!

Click here to let Ashish know about your number one takeaway from this episode!

And if you want us to answer your questions on one of our upcoming weekly Feedback Friday episodes, drop us a line at ashish@kaizenteq.com.

Ashish Rajan: [00:00:00] So I would love it if you could just introduce yourself, for people who haven’t heard of you before. Who is Tim Heckman?
Tim Heckman: [00:00:06] Yeah, so I’m Tim Heckman. I’m currently a site reliability engineer at Netflix, so no longer in the security space. But in previous lives, when I worked at startups on smaller teams, you definitely wore many hats, and dealing with infrastructure and application-level security was one of the things I did for many years at the startups I worked at.
So in recent years, I’ve been doing less on the security side and more on reliability: not just the systems and the code itself, but how the humans interact with that system. How does that work for them? Is the system empowering them to, you know, not be on call and get paged overnight?
Does it help them not have a stressful job? And so a lot of the stuff recently has been looking at it from a different angle: not just the code and infrastructure, but how do the people interact with that?
Ashish Rajan: [00:00:43] Yeah. That’s an interesting one for me, because it’s hard to differentiate between the two. So I understand you were in security before.
Now you’re an SRE. SRE seems to be, I guess, the job of the year or the job of the decade, but a lot of people don’t [00:01:00] even know what that really entails, because I feel there’s a lot of security that goes in there, but it’s just not called security. I’m just curious to know from your side, since you mentioned the human side:
what is SRE? What’s the day-to-day like?
Tim Heckman: [00:01:17] Yeah. I think SRE is one of those challenging things where each company defines it differently. You know, Google SRE is much different from SRE at Netflix. We don’t really own any services. We’re not contributing to the code bases themselves or having direct access to the repos and doing things for teams.
We work with all of the teams together to build a story of what we’re going to do, what’s important to focus on, and what patterns we’ve learned when things didn’t go well that we can apply across the board. So instead of, hey, we own the service, or we own these handful of services, let’s make sure they’re doing well,
we’re looking at basically the entire Netflix ecosystem, all the thousands of microservices that we run. How do they all interact with each other? How do the systems interact with those? And the humans, how do they interact with all those thousands of services themselves? And [00:02:00] so what’s a little different, I think, is that,
again, we don’t really focus on the systems themselves. Netflix expects individual teams to own the reliability story of their service, so we’re more of a partner in that process versus an owner of that process.
Ashish Rajan: [00:02:15] Oh, it’s just kind of like the security role as well, where you’re not really the owner of the team, but you want to help them patch all these things.
So yeah, I guess I get the human psychology part. Now I’m going to start with the question that I normally start with for everyone, which is: what is cloud security, in your opinion?
Tim Heckman: [00:02:34] I think it depends, right? Because you could break it off into many segments. I think overall it’s just understanding your footprint in the cloud and where the different vectors are.
You know, definitely knowing where your ingress points are, where you’re egressing data, where you’re storing data. I think it’s having that full map of the system to understand where those risks are. You can’t always mitigate all those risks, but one of the key things, I think, is being able to identify them so you know where they are and know what to look for.
Ashish Rajan: [00:02:57] Oh, right. Okay. So it’s not just a [00:03:00] matter of... Okay. I like the definition. I like how you took a different approach to it. So in that case, how would you consider multicloud? To your point, you’re using data, you’re using information from the cloud for it.
How do you define multicloud then? Is it multi-faceted, or multiple cloud providers? How do you take that?
Tim Heckman: [00:03:24] I mean, I think in that case, if you’re doing that, you need to put an abstraction layer on top of it. Your systems can’t really think about them, and so you need to set your security posture to be something
able to be applied across all of them. In the past I worked at PagerDuty. We were multicloud when I was there, and some of the work we did was trying to make sure that our Azure security ideas transposed onto our AWS stuff, and that things worked well together. So a lot of it was building glue layers in between the different providers to make them behave the same in terms of our configuration and to make it look similar to the end user.
Ashish Rajan: [00:03:53] Oh, that’s pretty interesting. And I don’t want to go into [00:04:00] the PagerDuty talk that you gave, which was pretty amazing, by the way, for anyone who’s listening. Tim was down here in Melbourne for OWASP AppSec Day, and he did an amazing talk, which unfortunately was not recorded, for obvious reasons.
Yeah, because I was in that room making sure no one was recording it. But if you ever get to meet him in person, definitely ask him about that talk. It’s definitely worth your time. And I think one of the things I took away from that talk was: how do you respond to incidents in the cloud?
We do have, I guess, audiences at different stages. Some people are starting off in cloud security today, or starting in cloud as well, and they just want to know how to look at security in the cloud. So starting from that perspective and going all the way to the other end, which is, to your point, that Netflix has every individual team looking at reliability, and you kind of have to [00:05:00] not oversee, but almost encourage people to do the reliability part.
So how do you translate security in the cloud for people? Where does it start? I guess let’s go with that. From the very basic to the very big, mature guys: for those starting today, where should they start?
Tim Heckman: [00:05:20] I mean, I think the one thing that’s always true with any sort of cloud solution is that it’s ephemeral.
Those instances can go away at any point in time, whether it’s because of a security thing that causes you to want to kill them and get rid of them, or just natural failure itself. And because of that, you tend to want to architect things in a way that allows you to observe what’s going on with those individual systems.
So, like, exfiltrating the logs, right? Getting everything you can off of those systems as soon as possible. That doesn’t just help you in security; it’s also going to help your operations folks diagnose when something doesn’t go well. They’ll have some logs or maybe something that indicates: was it our stack that went bad?
Was it the VM itself? And so I think you can actually put those two needs together, operational performance and support needs with the security posture: we need to collect these [00:06:00] logs and this data for both of these purposes, and we should do it well for both.
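Editor’s sketch: one way to picture "getting everything off of those systems as soon as possible" is a log handler that forwards every record to an off-host sink the moment it is emitted, rather than leaving it on local disk. This is a minimal illustration, not a description of Netflix’s tooling. `ShippingHandler`, `build_logger`, the `send` callable, and the `payments` and `i-123` names are all hypothetical, and a Python list stands in for a remote collector.

```python
import json
import logging


class ShippingHandler(logging.Handler):
    """Forward each record to an off-host sink as soon as it is emitted."""

    def __init__(self, send):
        super().__init__()
        # Hypothetical transport: any callable that accepts one string.
        self._send = send

    def emit(self, record):
        payload = {
            "ts": record.created,
            "level": record.levelname,
            "logger": record.name,
            "message": record.getMessage(),
        }
        try:
            self._send(json.dumps(payload))
        except Exception:
            # Follow the logging convention: never let shipping crash the app.
            self.handleError(record)


def build_logger(name, send):
    """Return a logger whose only destination is the shipping handler."""
    logger = logging.getLogger(name)
    logger.setLevel(logging.INFO)
    logger.propagate = False
    logger.handlers.clear()
    logger.addHandler(ShippingHandler(send))
    return logger


shipped = []  # stands in for a remote log collector
log = build_logger("payments", shipped.append)
log.info("instance %s starting", "i-123")
```

Because the same structured records serve both forensics and day-to-day debugging, a single pipeline like this can cover the security and operations needs described above.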
Ashish Rajan: [00:06:04] Oh yeah. And I think logging is probably the most basic step that anyone can take to make sure they at least have the information to work with, instead of just going,
well, I’ll figure it out when an incident happens, I guess.
Tim Heckman: [00:06:18] Well, I think too, you know, I’ve had the case where a VM dies, and the way that it dies, it corrupts stuff on the file system. It’s not a clean restart. And so I think then too, you know, if you have a malicious party on the VM making changes, or something just goes bad, you want to be able to export things off as they’re happening.
So you avoid going back and going, oh, my file system’s lost, I don’t have any forensics anymore. Or, you know, the logs were erased, and so I lost that too. So I think it’s about getting all of those metrics and logs off the box as quickly as you can, so you have them elsewhere to look at if you need them.
Ashish Rajan: [00:06:46] Yeah. And I think that brings me to the mature guys, because the mature guys have had servers running for a very long time, or even use serverless, like Lambdas and stuff. I guess what’s your [00:07:00] viewpoint on that? How do they do security well at that level, when you’re
maybe in a mix with EC2 instances? I’m going very specific to AWS because my understanding is that Netflix is on AWS as well, but for other people it may be Azure or whatever. The comparison would be: if you have servers as well as serverless technology, how do you go about security for that?
Tim Heckman: [00:07:23] The serverless one is something we’re trying to figure out right now internally. Something we’re working on is building our own platform on top of things like Lambda, because we do want to provide things like mutual TLS authentication between those workloads and the backend services, right?
And those are hard to do with those prebuilt services on AWS, for example. So there are definitely things, depending on your security posture and how mature you are, that you might need to build yourself, just because you want to do something like mutual TLS authentication and you have no way to give those workloads a secure token to validate them.
So that one’s a much harder question to answer. I think for the VM ones, you know, if you’re treating your infrastructure as ephemeral, it makes it much easier, [00:08:00] right? Because you can just shut an instance down, or isolate it, or capture the disk image and get rid of it. And that really isn’t a loss to your system.
You definitely want to figure out what happened, because more will get compromised if you don’t clamp it down. But I think ultimately, as you get more mature in the cloud, you start to treat these instances less like pets and more like cattle, where it’s a herd of things that you’re trying to keep together and keep alive.
And you know, if there’s a massive loss of them, that’s a problem. But things do happen naturally. So you want to stop looking at the single ones and think more of: this will get replaced automatically, the logs will get exfiltrated off somewhere else, and I’ll have the forensics if I need them.
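Editor’s sketch: for readers unfamiliar with mutual TLS, the defining property is that the server also demands and verifies a certificate from the client. The snippet below shows the server side using Python’s standard `ssl` module. It is a generic illustration, not the platform described in the interview. The function name and the file-path parameters are hypothetical; with no CA file loaded, as in the demo call, no client certificate could actually pass verification.

```python
import ssl


def mutual_tls_server_context(ca_file=None, cert_file=None, key_file=None):
    """Build a server-side TLS context that requires a client certificate."""
    # Purpose.CLIENT_AUTH yields a context meant for the server end of a socket.
    ctx = ssl.create_default_context(ssl.Purpose.CLIENT_AUTH)
    # The "mutual" part: reject any client that does not present a valid cert.
    ctx.verify_mode = ssl.CERT_REQUIRED
    if ca_file:
        # CA bundle used to validate client certificates (hypothetical path).
        ctx.load_verify_locations(cafile=ca_file)
    if cert_file:
        # The server's own identity (hypothetical paths).
        ctx.load_cert_chain(certfile=cert_file, keyfile=key_file)
    return ctx


# Demo call without files, only to show the resulting policy.
ctx = mutual_tls_server_context()
```

The hard part Tim describes sits outside this snippet: getting a trustworthy certificate or token into a short-lived serverless workload in the first place.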
Ashish Rajan: [00:08:33] Yup. Yup. Okay. And to your point, I guess VMs are probably easier also because you can have an IPS or an IDS in there as well, which is inspecting a lot of the information for you. Whereas serverless doesn’t have that concept, at least not for the moment.
Tim Heckman: [00:08:49] No.
You’d have to basically instrument your debug logging all the way through your serverless code, like all your Lambda functions, right, and log every single thing. But even then, there isn’t a guarantee that you’re going to get what you need.
Ashish Rajan: [00:08:59] Yeah, that’s right. [00:09:00] And I can’t even imagine the mapping that would be needed if you have like five Lambda functions that are kicked off by the same process and you’re like, which one was it? Yeah, it’s worse.
I think it’s funny, because I wonder if that’s the reason why Kubernetes has got a lot of popularity, where it gives you the illusion of serverless, but it’s not really serverless, if that makes sense. You still have a management portal, you still have a cluster, you have containers, but at the same time you feel like, oh, I just typed in a couple of commands
and it just does the thing.
Tim Heckman: [00:09:34] I think it’s that, and that I can SSH into that container and look at it if I really need to. There are still things you can do that feel like operating a system, where you can dive in and really inspect the state of where things are. And whether your focus is reliability or security,
sometimes it’s really good just to get in the middle of the system and go, what does it look like? What are the things I see? And that’s much harder with the serverless things unless you build your code in a way that allows you to debug-log it or have hooks to send further data out, which isn’t very scalable.
You can only write code to be [00:10:00] maintainable that way.
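Editor’s sketch: the "which one was it?" problem with several functions triggered by one process is commonly approached by passing a correlation ID through every event and stamping it on every log line. The example below uses plain Python functions to stand in for serverless handlers; real AWS Lambda handlers take `(event, context)`, and every name here (`log_event`, `handler_a`, `handler_b`, the `sink` callable) is hypothetical.

```python
import json
import uuid


def log_event(sink, correlation_id, function, message):
    """Emit one structured log line tagged with the shared correlation ID."""
    sink(json.dumps({
        "correlation_id": correlation_id,
        "function": function,
        "message": message,
    }))


def handler_a(event, sink):
    """First function in the chain: reuse an incoming ID or mint a new one."""
    cid = event.get("correlation_id") or str(uuid.uuid4())
    log_event(sink, cid, "handler_a", "received request")
    return {"correlation_id": cid, "payload": event.get("payload")}


def handler_b(event, sink):
    """Downstream function: carries forward the ID it was handed."""
    cid = event["correlation_id"]
    log_event(sink, cid, "handler_b", "processing payload")
    return {"correlation_id": cid, "done": True}


lines = []  # stands in for an off-host log collector
result = handler_b(handler_a({"payload": 1}, lines.append), lines.append)
```

As Tim notes, this only helps for the code paths someone remembered to instrument, which is part of why it scales poorly as the only window into a serverless system.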
Ashish Rajan: [00:10:01] Oh, so we spoke about operational security now, and how we look at this from an operational perspective. What about during an incident? I guess the way I see the stages is: you have operational security to make sure that you have the right controls in place.
You have the right alerting in place. You have the right logging in place. The next step after that is that an incident has happened and the alerting has fired. Are you a fan of auto-remediation, or do you prefer to, I guess, alert, inspect, and then action?
Tim Heckman: [00:10:39] It depends, I think. It really depends on the problem you’re trying to solve with the automation. There are lots and lots of cases where it’s actually safer for the system to do that, because you can have automated checks in there that go: this metric has been at nominal values, right? The system looks good.
Whereas when a human is looking at it, and I’m guilty of this, I’m eyeballing graphs and, you know, this feels okay to me. It feels right. [00:11:00] And that’s also more work for the people to worry about, and more stress on them to make sure they’re running those commands correctly and not making a mistake.
And so I think for a lot of things, it does make sense to remediate as much as you can automatically. But there are cases, you know, good examples of where that can go haywire. One example is that GitHub had an outage, probably years ago now, where their database failover system just kept ping-ponging the database back and forth because it thought it was bad.
Oh, it’s bad again. It’s bad again. Right? They built the system to look at things and gauge whether the database was unhealthy, and it was just tripping back and forth, so the system was fighting against them. So I think it really depends. It takes a bit of maturity, I think, of doing it manually yourself and understanding what works before automating a lot of those complex ones, so you can be comfortable with the system doing it itself.
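Editor’s sketch: one common guard against the ping-pong failure mode is to cap how many automatic actions may run inside a time window and hand off to a human once the cap is reached. This is a generic illustration under assumed thresholds, not a reconstruction of any real failover system. `GuardedRemediator` and its parameters are hypothetical.

```python
from collections import deque


class GuardedRemediator:
    """Run an automatic fix, but escalate to a human if it starts flapping."""

    def __init__(self, action, max_actions=2, window_seconds=300.0):
        self._action = action            # the automated fix, e.g. a failover
        self._max_actions = max_actions  # assumed budget per window
        self._window = window_seconds
        self._recent = deque()           # timestamps of recent automatic fixes

    def handle_unhealthy(self, now):
        # Forget fixes that happened outside the sliding window.
        while self._recent and now - self._recent[0] >= self._window:
            self._recent.popleft()
        # Too many fixes recently: stop fighting the system and page someone.
        if len(self._recent) >= self._max_actions:
            return "escalate"
        self._recent.append(now)
        self._action()
        return "remediated"


failovers = []
guard = GuardedRemediator(lambda: failovers.append("failover"),
                          max_actions=2, window_seconds=300.0)
# Unhealthy signals arriving at t = 0s, 60s, 120s, and 500s.
results = [guard.handle_unhealthy(t) for t in (0.0, 60.0, 120.0, 500.0)]
```

In this run the third signal is escalated instead of triggering another failover, and automation resumes once the earlier actions age out of the window.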
Ashish Rajan: [00:11:41] Right. Okay. So sounds good. And just taking a pause over there for a second: for anyone who’s listening in, I think I’ve got about eight or nine people watching. If you have any questions, feel free to ask them in the comment section and we’ll be happy to pose them to Tim.
The [00:12:00] same goes for Twitch as well. We have five people watching on Twitch. That’s pretty cool. Wow, that’s pretty good, man. Considering I’m running this in the morning on Twitch, that’s pretty awesome. So, I guess the next question. We talked about what security maturity looks like, and I’m going to switch gears a bit and ask a different question.
Is SRE something that people should consider at any scale?
Tim Heckman: [00:12:25] Yeah. I mean, I think it goes back to my original answer, which is that it’s hard, because you want to understand what your personal interests are and whether the SRE role at the company is really that. Some companies just retitle ops engineers as SREs, and they’re really just doing operations work behind the scenes.
And so it really depends. If you’re really interested in delivering infrastructure and building out robust systems that are reliable, that sort of SRE role, titled that way, may be perfect for you. But there are also folks who are more interested in learning how the human is working with the system, the human factors and the system safety component of it, of just how does it all [00:13:00] work, versus: I’m focusing on this one area, this one specialized domain.
I think it’s very important. I think it’s also, you know, even if you’re going to be focusing on one domain, you can still have those sorts of learnings. How are our customers interacting with our product that we’re building on this team? And how do we as the operators interact with that product when it’s in trouble or when it’s healthy?
And so you can still do that very localized, but I think. The sort of breadth and the scope changes depending on the company and how they define it.
Ashish Rajan: [00:13:24] Sweet. Okay. To your point, I know there have been a few titles that have floated around. There was the DevOps title, and then there was this one. I feel the DevOps title has changed into the SRE title.
Do you believe that’s what happened, or has it always been SRE from day one?
Tim Heckman: [00:13:45] I think there are some places that have migrated their DevOps or operations team to be SRE when the company has grown, right? When they have more systems in play, they’ve wanted to make that change.
But I think if you don’t backfill those [00:14:00] people you’re changing over, right, and if you don’t really change their responsibilities or alleviate some of that from them, they’re just gonna keep doing the same sorts of work. And so I think there have been some companies that have tried to make the move but haven’t fully invested in that migration.
And those folks are in kind of a limbo state where they have this new title, but they’re still doing a lot of the old things they were doing before.
Ashish Rajan: [00:14:18] Oh, sweet. And the reason I asked is because I had a question from, I think, Steve. I can’t remember his last name, but what he asked is basically this: he’s been doing operations for data centers, like actual data centers, for a very long time, almost 20-plus years.
And he was looking at transitioning, asking what that looks like in the cloud world. Do you feel there’s value for old school? I don’t think it’s very old school. I mean, we still have plenty of data centers, so I think it’s fine. It sounds old school because we’ve had it for a long time.
Do you feel there is a space for them as well, or for anyone with that skill set, in the cloud space?
Tim Heckman: [00:15:00] Yeah. I mean, I really think it comes down to what they like. You know, does the idea of scripting and building in that process really seem interesting to them? Because even, you know, at Netflix, where I don’t own things,
I still maintain and work on some of the tools we have as a team to make ourselves successful. And so I think ultimately there is still a need to do some automation, some scripting, maybe not full-blown software development and a full project, but there needs to be some of that interest and some of that desire there.
And I think it’s more on the personal level. Are you, as an individual, drawn to that sort of work? Would you want to put the time into learning that and doing those sorts of things? If you don’t really have an interest there, it’s not that there isn’t a place for you; it’s that you’ve got to find something that fits your needs and really interests you.
I’ve definitely seen folks from different backgrounds come into SRE and do very well, whether it’s, you know, racking and stacking servers. There are even folks on my team at Netflix who started out answering the phones at Netflix when we were sending DVDs in the mail. They’ve been here long enough.
They’ve been here like 10 years. They started answering phone calls, and they worked up through the ranks of customer service to then come onto our team and be an SRE for the service. So I think you can do that. I think it just really depends on what your interests [00:16:00] are and where you want to invest your time.
Ashish Rajan: [00:16:02] Wow. That’s awesome. Because I think that makes me think that what, so how. I mean, I guess there have been people from the beginning, then in Netflix, so it’s not just the founders.
Tim Heckman: [00:16:11] So it’s weird, because Netflix’s original business was sending DVDs in the mail. You’d go to a website, and I think towards the end of when that was really popular, for 95% of America you’d get the DVD basically the next day.
And so you could watch whatever movie you wanted, if it was in the catalog. And back then it was rack-and-stack data centers. There was no cloud. There was no auto scaling. We had a NOC; the precursor to the team that I’m on now was a NOC when we did the DVD thing, and we realized that the NOC model wouldn’t scale as well into the cloud because of the ephemeral nature of all the things.
Things are breaking all the time outside of our control. And so we needed a team to look at the overall picture of that system on that provider, to look at the patterns of what we can do to make it better. And so I think there definitely is a progression. You know, it’s kind of expanding the skill set and increasing the breadth a little bit.
But I think if you’re interested in doing that, for sure, there’s definitely a [00:17:00] path for folks to go from that, whether it’s a rack-and-stack technician or a NOC-like function, to thinking more about the system as a whole in a reliability fashion.
Ashish Rajan: [00:17:08] Oh, sweet. So for anyone who’s listening in, because I have a few people who reach out to me asking how do I get into cyber security, blah, blah, blah:
I feel like SRE has elements of security in there as well, and to your point, you’re trying to drive humans to do the right thing. What do you feel is a skill set that people should be looking at, for anyone who’s listening to this and wants to get into that field? And they probably should be looking into the field, especially if they’re in the tech space, because you’re seeing a lot more SREs coming in.
What does it take for someone to get an SRE role?
Tim Heckman: [00:17:48] It’s a good question. I mean, I was very lucky in that everything I did was, you know, self-taught. I didn’t go to school or anything for this. It was kind of, you know, banging my head on the wall as a teenager, as a hobby, being a sysadmin sort of thing.
Right? And so it took many years and [00:18:00] many opportunities to do that. I think there are lots of places where you could dive in with a little experience and have someone invest in you. Really, what you have to be ready for is to be uncomfortable, right? You’re not going to know what you feel like you should know or what you want to know.
And you’re going to spend a lot of time digging, trying to find things and understand them. So if you’re comfortable being in a spot and working on a team where you’re okay admitting you don’t know and asking for help, I think that can be one big thing that’s important. But also just the desire to learn in general, right?
Every day I’m dealing with new systems, new microservices I’ve never heard of before that are now failing, and I have to understand what each one does and how it interacts with the system. So you’ve got to be okay with a bit of chaos, right? The world, and the security world too, is going to be crazy.
There’s a lot of entropy out there, and so you need to be comfortable with that chaos. I’d even say some people who do really well in the role thrive in chaos. Those chaotic moments are when they go a hundred percent, and afterwards they’re super tired, but they’re like, that was great work.
You know? I got a really good feeling out of that. I work with a great team. And so if you’re the kind of person who, when things are on fire and [00:19:00] things are going crazy, is the one to brush yourself off and get going, SRE can be a good role for you, and you can have a lot of good impact on the people and the systems that you work with,
being that sort of individual.
Ashish Rajan: [00:19:11] Oh, sweet. That’s a great answer, man. It’s definitely a lot of chaos, especially in an organization which has, to your point, grown from sending out DVDs in the mail to now being a global service. I can’t even imagine the number of microservices you would have to manage.
And then it’s kind of like you find the needle in the haystack, but you need to figure out where the needle fits into the rest of the haystack. So that’s how I’d describe it, I guess. But to your point about the human psychology side of how humans use systems, what’s your take on how someone can do this effectively?
And especially, I guess, how long have you been at Netflix?
Tim Heckman: [00:19:57] Two years, six months.
Ashish Rajan: [00:19:59] Six [00:20:00] months or so. So in your experience, what have you found to be the best way to approach human psychology, an approach to, not make people, but help people do things right?
Do the right thing.
Tim Heckman: [00:20:13] I think a big part of Netflix’s culture is that we trust folks to be responsible and to build relationships with each other, right? You’re supposed to have a conversation with somebody and discuss things, and so we build a lot of our tools to be guardrails, right? It’ll protect you from doing things.
It will say, hey, you can’t do this, go talk to the security team. And it’s very much meant to be a conversation, not a blocker, right? Just to understand what your needs are as a customer of a security product, and of course to make sure that what you’re doing aligns with best practices, and to see whether there’s a better suggestion that team can make for you.
But our security team is not about being in the way. It’s about empowering a person to do what they need to do to do the job well. And so some of that is going, hey, you want to do this thing that seems kind of weird. We have this little solution over here. Why don’t you go look at that instead? And there’s a lot of that sort of consulting that happens, but that ends up building a good relationship, where it’s not some team going, no, you can’t do that.
Like, get outta here. It’s very much like, here, come chat with us. Come sit down on our [00:21:00] couch. You know, we’re gonna have a discussion about what your needs are. And we can’t go to the couch now because of quarantine, but before, they would have spaces where people would come and chat with them in the office to go, hey, I have this new idea,
or I have this question. Does it seem to make sense to you? Is it safe? Could I do something better? And so we try to have the tools, you know, protect people, but not block them from doing what they need to do.
Ashish Rajan: [00:21:17] Oh, sweet. Okay. And I think too, do you find yourself also interacting a lot with the security team as well?
I imagine, yes. But just want to confirm.
Tim Heckman: [00:21:27] A little bit. Yeah. So my team, we handle the incident response for availability and reliability problems for the entire system. One sister team of ours is the Security Incident Response Team, SIRT. They do the same thing from a security perspective.
We’ve separated those two roles because we’ve noticed that security incidents are much different from reliability incidents. They’re much more long-tail, there are much different needs around collecting forensics, and the process is just different. So we’ve isolated them as two separate teams.
They can build a process that works for that precise need, but we do cross-share and cross-pollinate things we’ve learned and patterns we’ve found that work well, just [00:22:00] so that we’re both, I wouldn’t say keeping each other in check, but finding ways to help each other improve as we’re doing incidents across Netflix.
Ashish Rajan: [00:22:05] All right. And to your point, I guess because you’re laying the foundations of alerting mechanisms from the beginning, and logging as well, hopefully, that means that when things go bad, for lack of a better word, there is something in place for you to go investigate and find out what the root cause could be.
All right. Okay. And on the learning part from security and patterns, you mentioned guardrails. It’s always very broad, right, when people say guardrails, because it depends on the organization, and everyone wants a cheat sheet for guardrails. But because it’s different depending on the services you may use, or maybe the application, is there a way of thinking that is applied
[00:23:00] to putting guardrails on? Not something like, oh, if you create something, it should be encrypted, and if someone makes it unencrypted, there’s an alert. I mean more the thinking, the human side of it. How do you guys approach the whole guardrail model? Can someone just go and apply it anywhere, or do you have to go through an RFC in the organization?
How does that work?
Tim Heckman: [00:23:21] Yeah. So at Netflix, we’re big on something we call freedom and responsibility. What that means is you’re free to make any technical decision that you would like to make, and you are then responsible for anything that comes out of that. And so if you incur a lot of tech debt, well, that’s on you to bear the burden of that and fix it eventually.
Right? And so we use that same sort of model, even across security in many areas, where I think a lot of folks come in and go, wow, that’s a weird security posture, where we trust teams and individuals to do things and to make the right decision. Some of that is, you know, self-service portals, right?
If we’re dealing with roles and things, we make them self-service, so you can add permissions. It goes through the security team to review and approve, like a PR process, but they try to make it low [00:24:00] friction, where, you know, it pings the on-call engineer, they review it within a few minutes, and you either get approved or you get a DM and the person’s like, hey, I saw your request, let’s chat about it.
Right? Some of it is around project generation. We have an internal project generator called Newt, for new projects and new services. It scaffolds the directory for you. It gives you the dependencies you want with default settings that are secure. It tries to set you up at a good foundational spot where, I mean, you could turn those things off and remove them if you really want to.
But it’s already done. Why would you do that? And so we try to make it so that when you get in, when you’re at your first step, it’s kind of where the security and the platform teams want you to be. You can always deviate. There are checks and metrics and other alarms that go off, and the teams will contact you if they see something weird.
But it very much is just, let’s try to make it so they don’t really need to see that the security is there. They know that there’s, you know, transport security in place, and all those things are done transparently, so they don’t have to worry about it.
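The secure-by-default scaffolding described above can be illustrated with a minimal Python sketch. This is not Netflix’s generator: the setting names (`tls_enabled`, `encrypt_at_rest`, `public_access`), the `scaffold` and `deviations` helpers, and the `config.json` layout are all assumptions invented for illustration. The idea it shows is that a new project starts from secure defaults, a team may still override them, and a separate check can list the deviations so someone can follow up.

```python
import json
import tempfile
from pathlib import Path
from typing import Dict, List, Optional

# Hypothetical secure defaults; a real platform team would define its own.
SECURE_DEFAULTS: Dict[str, bool] = {
    "tls_enabled": True,
    "encrypt_at_rest": True,
    "public_access": False,
}


def scaffold(
    root: Path, name: str, overrides: Optional[Dict[str, bool]] = None
) -> Dict[str, bool]:
    """Create a project directory whose config starts from the secure defaults."""
    # Overrides are allowed (freedom), but they are applied on top of the defaults.
    config = {**SECURE_DEFAULTS, **(overrides or {})}
    project = root / name
    (project / "src").mkdir(parents=True, exist_ok=True)
    (project / "config.json").write_text(json.dumps(config, indent=2))
    return config


def deviations(config: Dict[str, bool]) -> List[str]:
    """Return the settings that differ from the secure defaults, sorted by name."""
    return sorted(
        key for key, value in SECURE_DEFAULTS.items() if config.get(key) != value
    )


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        cfg = scaffold(Path(tmp), "demo-service", {"public_access": True})
        # A check like this could feed a metric or prompt a conversation with the team.
        print(deviations(cfg))
```

Keeping the deviation check separate from the generator mirrors the point made in the conversation: the tool does not block a team from changing a default, it only makes the change visible.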
Ashish Rajan: [00:24:48] So, to, I guess, take it up another notch. I think it’s really interesting, that whole template model. You used the word [00:25:00] scaffolding, and I’m going to probably use the same word as well.
It’s more like you already have a scaffolding ready for building a house, and it’s just like, there you go. I mean, why would you not use it? I guess that’s probably the point. Like, you don’t want pillars in your house? You don’t want the thing to have a roof?
Questions like that. Do you find that model scales well, though?
Tim Heckman: [00:25:21] I mean, you have to invest in it. I think as long as you have a team of people. We have a developer experience team, right? There are folks whose full-time role is developer experience tooling, working with our Jenkins build systems,
this project generator. They do this full time. Really, yeah. So developer experience is a whole organization that exists in the company to build these sorts of tools and make that experience easy for them. And so with the project generator, when you need a project, it’s one command. You get your Jenkins build pipeline, you get your Spinnaker deployment pipelines, basically the whole thing is set up and it’s ready to deploy a web app with one button press, effectively.
And so they spend their time making sure that works. As the platform changes and best practices change, they update the tool [00:26:00] to apply those instead. And sometimes they even try to do migrations, right? Hey, let me migrate my project to the new best practice, and have that be automated, if it can do that.
That’s much harder as things deviate and you add more things, but there are some attempts at just, you know, how much can we automate, or how much direct instruction can we give the teams to make that migration easier, and not have it be something that each team has to figure out themselves.
Ashish Rajan: [00:26:22] Wow. Yeah, that definitely makes sense. And to your point, I mean, I guess ongoing patching is probably automated at that point, once the template is ready?
Tim Heckman: [00:26:34] Some of it. A lot of teams opt into automatic dependency updates, where they just have a job that runs weekly or whatever to update their dependencies automatically.
And then we have a separate system that scans all dependencies and things for outdated, you know, base AMIs, outdated dependencies, and just reaches out to the team and goes, hey, FYI, your service, you know, has a little red mark next to it because you have some dependencies that need updating. You should do this.
And so we basically run deprecation cycles with teams and tell them, hey, you need to update this service because these [00:27:00] things are out of date. And we give them very precise descriptions of what needs to be updated to satisfy the check.
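A scan of that kind might look roughly like the following Python sketch. Everything here is hypothetical: the `Finding` type, the `scan` function, the service and library names, and the assumption that versions are plain dotted integers such as `1.2.0`. It only illustrates the pattern of comparing each service’s dependencies against a minimum version and producing a precise, per-service message about what needs to change to satisfy the check.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class Finding:
    """One outdated dependency in one service (hypothetical structure)."""

    service: str
    dependency: str
    current: str
    required: str

    def message(self) -> str:
        # A precise description of what to update in order to pass the check.
        return (
            f"{self.service}: update {self.dependency} "
            f"from {self.current} to at least {self.required}"
        )


def parse_version(version: str) -> Tuple[int, ...]:
    """Parse a simple dotted numeric version like '1.2.0' (an assumed format)."""
    return tuple(int(part) for part in version.split("."))


def scan(
    services: Dict[str, Dict[str, str]], minimums: Dict[str, str]
) -> List[Finding]:
    """Flag every dependency that is below its required minimum version."""
    findings: List[Finding] = []
    for service, deps in sorted(services.items()):
        for dep, current in sorted(deps.items()):
            required = minimums.get(dep)
            if required and parse_version(current) < parse_version(required):
                findings.append(Finding(service, dep, current, required))
    return findings


if __name__ == "__main__":
    services = {
        "billing": {"libfoo": "1.2.0", "libbar": "2.0.1"},
        "search": {"libfoo": "1.4.0"},
    }
    for finding in scan(services, {"libfoo": "1.3.0"}):
        print(finding.message())
```

In a real system the messages would be routed to the owning team rather than printed, and version parsing would need to handle pre-release tags and other formats that this sketch ignores.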
Ashish Rajan: [00:27:05] Oh, sweet. Cool. This is really good information. And then I want to switch gears a bit and go non-technical for a few minutes here.
Has the whole COVID thing changed things? I know you moved over to LA at the beginning of the year, pretty much around the same time the whole COVID thing was turning up, kind of sprinkled around the world. Has the whole COVID way of working changed your viewpoint? I guess first, has it changed the way you do your job, like working remotely versus being in the office?
The second one being, do you feel that security has changed as well, or that the way you operate has changed as well?
Tim Heckman: [00:27:44] Quite a bit. So in terms of just how work is different, yeah, a lot of Netflix is being in the office and having those personal discussions with people that aren’t really a meeting.
Right. The hallway conversations. And so I think it’s hard to organically do that when you’re remote. Our team was very lucky in that we had some folks in the LA [00:28:00] office, some in the Bay Area office, and then there was one person who’s remote in Indiana. And so we’ve kind of had to flex our remote-first muscle a little more than other teams had at Netflix.
And so we’re fortunate in some ways that we’ve kind of been practicing, without knowing we were practicing, for a few months. But definitely it has, you know. I’m sure people that are listening have kids, or family or friends that have kids, and that’s what’s impacting them. They’re now full-time working and full-time childcare.
So I think a lot of what it is for me, because I don’t have kids, is picking up things for the rest of my team so they can, you know, have that headspace cleared out and not be worrying about work as much, and just know that family stuff is super important. They should focus on that more than work, and the rest of the team will help pick up the things that are left behind when they do change their focus to that.
Yep. I’m a loner. Like, I sit in my house, I’m down to play video games all night by myself. And so the quarantine hasn’t impacted me directly, but it’s definitely been tough watching my colleagues and my friends go through that impact, people that don’t have that same proclivity as I do, and how it’s changed their lives, having to be, you know, doing childcare 24/7,
all those things. That’s more what’s impacted me, seeing others go through that. And so, yeah, I would say it’s definitely [00:29:00] changed how we’re working. We’re trying to be more cognizant that we’re going to burn out faster. We’re going to be under a lot more stress, so we need to take more time off.
We need to take Friday afternoons, or just Fridays altogether, and things like that to keep ourselves healthy. So we’re really trying to do those sorts of things, just to not burn ourselves out and to pace ourselves, because it’s a marathon, not a sprint.
Ashish Rajan: [00:29:17] Yeah, that’s actually true. That’s very nice of you as well, man.
You’re taking on some of the workload to help out your friends, help out your colleagues, because from what I’ve heard, distance schooling, or homeschooling as some people are calling it, is super tough, man. Imagine doing a full-time job and trying to teach someone. I mean, I guess in my case it would be my nieces or nephews, and to teach them,
that’s a tough job. You have to focus on everything that you do on a daily basis on the internet for your work, and then you come back and you have to just switch off your, I don’t know, twenties or thirties brain and be like, I’m going to be a 10-year-old or under-10-year-old. Yeah, it’s a very different world, [00:30:00] man.
I’m sorry, go on.
Tim Heckman: [00:30:02] I was going to say, the one big thing that all my coworkers called out is a much bigger appreciation for teachers and the staff at schools and those that help out with that process. I don’t think any of them underappreciated them, but I think there’s a much larger appreciation now, just because they deal with chaos so much and they deal with it so well.
And so I think everyone just across the board is very appreciative of those that help out. Yeah.
Ashish Rajan: [00:30:22] I’ve been saying that all the time. We should definitely pay them more. They should get paid more. So this is towards the last section of my interview, kind of like the fun section.
And it’s not a technical question, right? These are just regular, everyday Joe questions. I think I know the answers to some of these already because I know you personally a bit as well, but this is for the audience to get to know you a bit better. So the first question is,
where do you spend most of your time when you’re not working on cloud or technology?
Tim Heckman: [00:30:55] Ooh, so that’s a terrible question, because I spend too much of my time working with technology, [00:31:00] and it’s actually what I do in a lot of my spare time. I use the Go programming language myself. It’s my weapon of choice.
And I’m one of the admins of the Go workspace. So I help out around there answering questions for people that are newer to the language, helping them learn to program in it, but also with moderation and administrative tasks. And so I do spend quite a bit of time there. It is sort of technically related, but it’s also, you know, community and people focused, trying to help them learn the language and ramp up there.
Yeah. That’s probably where a lot of my time is spent. When I’m not doing that, spending time with my wife or playing video games is probably where it goes.
Ashish Rajan: [00:31:29] Ooh, what are you playing at the moment?
Tim Heckman: [00:31:32] So it depends. I fluctuate. I have a really over-the-top flight sim setup. And so, yeah, I’m pretty excited for Microsoft Flight Sim 2020. That’s looking good.
But when I don’t do flight sim stuff, you know, it’s the standard shooters or those sorts of games, things like Rocket League or Battlefield. Rocket League is another one I play with my friends, because it’s just kind of an easy game to play. You know, it doesn’t get you stressed.
It doesn’t get your heart pumping. You can just go hang out on the phone.
Ashish Rajan: [00:31:59] [00:32:00] My wife and I are the God of War type. Like, all the anger just comes out as we go to war. The next question is, what is something that you’re proud of, but it’s not on your social media?
Tim Heckman: [00:32:13] Ooh, that’s a great question. I mean, I think it’s just, you know,
coming to Netflix and being able to flex different skills and be in more of a leadership role. Not a manager per se, but our team is very unique in that a lot of the engineering teams look to us for guidance in lots of areas. And so I didn’t really appreciate how many leadership skills and patterns I had picked up over my career.
And really, I’ve had the opportunity to use them without realizing it. So it’s kind of one of those things. You look back two years and you go, dang, I did a lot of weird stuff these past two years. Like I never would have realized I had done this work, or helped folks in these ways, and used skills that I didn’t know I had.
And so I think that’s what I’m most proud of: as challenges have come up over the years, I managed to pick up and observe skills from great leaders that I could reuse, and really just copy and paste, to try to be successful [00:33:00] myself.
Ashish Rajan: [00:33:02] Dude, I think a lot of us have learned like that as well.
Right. I think a lot of us have either learned by solving our own problems or we just found something on, hopefully not on Stack Overflow, but sometimes on Stack Overflow, and it kind of sometimes seems to work. I mean, I’m guilty of using Stack Overflow as well, even though I do, I guess, you know, tell people off for it. I’ve used it myself.
Yup. Sometimes it works out. Last question. What’s your favorite cuisine or restaurant that you can share?
Tim Heckman: [00:33:32] Ooh, my favorite cuisine or restaurant. So it was the weirdest thing I ever had, and it was more of a single dish. My wife’s Japanese, and we went to Japan a few years back and I had lobster sashimi, which is raw lobster, just cracked open at the back and pulled out onto a plate.
You eat it right there. That was really weird, but very good. It was probably one of the most, like, my brain was like, no, you shouldn’t do that. And then I ate it and I was like, oh, I really [00:34:00] should do more of that. It’s like eating lobster, but it’s a lot more smooth.
It’s a little more cool, just a different texture, but very delicious. And so I would say Japanese food, a lot of the seafood. Since I met a Japanese woman, I’ve become a lot more into it in the last few years, I’d say. A lot of good stuff there.
Ashish Rajan: [00:34:19] Wow. I need to tell my wife.
Sashimi is raw fish to begin with, but then
Tim Heckman: [00:34:29] Yeah, it’s just raw fish. Sashimi in general is just the raw fish.
Ashish Rajan: [00:34:32] That’s right. And so this was just that.
Tim Heckman: [00:34:35] Oh, lobster. So they brought the lobster tail out, cracked open. The weirdest part is it was still twitching. So you picked it up with your chopsticks and the meat was twitching in the chopsticks.
That was a little, yeah, right. I was like, I don’t know about this.
Ashish Rajan: [00:34:48] As long as it didn’t twitch in your mouth, I guess.
Tim Heckman: [00:34:52] I waited for it to stop. I was like, you know, I’m just going to give it a few minutes.
Ashish Rajan: [00:34:56] Can someone hammer this thing just to make sure it’s dead? Like, I don’t want to do that, [00:35:00] but can someone else do this?
Because I don’t want to carry the guilt with me that I just killed my own meat.
Tim Heckman: [00:35:06] It was just funny because my wife’s family, they’re all Japanese, are laughing at the white dude staring at the lobster. You know, I’m just like, what is this stuff? And they’re laughing at me because, you know, I’m the weird guy out.
Ashish Rajan: [00:35:15] Oh, I’m sure. And plus, I think lobster is supposed to be a delicacy in that culture as well. I mean, I guess in Asian culture in general, because you almost see them just pouncing on the lobster. Every all-you-can-eat I go to, I’ve seen the lobster just run out. But apparently it tastes great.
It’s amazing, from what I’ve been told. All right, that was all I had from me. This was really awesome. Thanks so much for sharing so much information about, I guess, the life of an SRE and how you use human intelligence and influence to make things right for reliability and availability.
It was really good. For people who have follow-up questions but were probably too shy to put a comment out, where can they reach you?
Tim Heckman: [00:35:59] Just hit me up on [00:36:00] Twitter. theckman is my Twitter handle, so first initial, last name, and you can just send me a DM or a tweet or a mention. I’m happy to respond.
Ashish Rajan: [00:36:07] Sweet. And I’ll put that in the show notes as well so people can reach out to you as well. which would go on the website, which is on, just kind of like a roller dice. I don’t know. People can see that. But when they see this on YouTube or something, they might just see it. So, yeah, dude, thanks so much for taking time out.
I can’t wait to have you again. I feel like everyone that I’m bringing on the show is such a cool person to hang out with. I feel like I could definitely hang out with you in person as well, but there’s the whole COVID situation. So I’m just hoping that virtual coffee becomes real coffee as we come out of COVID.
So I’m looking forward to coming to LA, which you kind of have to if you’re coming from Australia. So I’m definitely hitting you up, man. But thanks so much for taking the time. I really appreciate that.
Tim Heckman: [00:36:49] I appreciate being here. Thank you so much. I’ll chat with you soon.
Ashish Rajan: [00:36:52] Thank you. All right, enjoy it.
