Building an Incident Response Team for High-Growth Companies

In this episode, we sit down with Santiago, a Senior Security Engineer at Canva, to talk about the complexities of building and managing an incident response team, especially in high-growth companies. Santiago shares his experience transitioning from penetration testing to incident response and highlights the unique challenges that come with protecting a rapidly expanding organization. We explore the differences between incident response in high-growth versus established companies, the importance of having the right personnel, and the critical skills needed for effective incident response.

Questions asked:
00:00 Introduction
01:58 A word from our sponsor - SentinelOne
02:48 A bit about Santiago
03:18 What is Incident Response?
04:06 How IR differs in different organisations?
04:48 Red Team vs Incident Response Team
06:17 Challenges for Incident Response in Cloud
07:16 Incident Response in a High Growth Company
07:56 Skillsets required for high growth
09:14 Cloud vs On Prem Incident Response
10:03 Building Incident Response in High Growth Company
11:39 Responding to incidents that are not high risk
14:41 Transition from pentesting to incident responder
17:20 Endpoint vulnerability management at scale
25:32 The Fun Section

Santiago G: [00:00:00] If you're not prepared, sometimes trying to extract the information that you need for the investigation at that point can be very tricky because sometimes you have to go into the system to take the information out and that can lead to the evidence being contaminated. We see an incident and we think we need to fix this right away.

It has to be done right now. Everyone has to drop everything and work on this. And it's more about is that really something that needs to be resolved right now? What is the actual risk?

Ashish Rajan: Nine out of 10 times, I think security loses the battle by coming across too hostile, that this has to be fixed right now.

There's usually two kinds of people: those who would join an established incident response team and those who help build an incident response team. I had the pleasure of talking to Santiago, who is a senior security engineer with Canva, on his experience of building an incident response team for a high growth company.

He has experience in an established incident response team before and he was able to use the experience to bring that knowledge on how you can build an incident response team for a high growth company that you may be working with. Now, in this episode, we [00:01:00] spoke about the challenges in the beginning, the skill sets required, and what are some of the transferable skills you could use.

Especially if you have some pentesting background in the past as well, and what are some of the KPIs that you should at least be working through in the initial round of getting some buy-in from the broader folks as well. If you are someone who is looking to transition from an incident response team onto a managerial role, like building an incident response team yourself, or thinking about just that next career progression for yourself from a pentester to an incident response capability builder, then this is the episode for you.

If you know someone else who's probably thinking of this as well, feel free to share the episode with them. And as always, if you're watching and listening to the Cloud Security episode for the second or third time and have been finding it valuable, I would really appreciate a subscribe or follow, if you're watching this on a video platform, and if you are listening to this on Apple iTunes, thank you so much for keeping us in your ears and I appreciate a review or rating that you can drop us on iTunes or Spotify. It really helps more people find us on the podcast platform as well.

I hope you enjoyed this episode with Santiago. I will see you in the next episode. Peace.

We interrupt this [00:02:00] episode for a message from this episode's sponsor, SentinelOne. As cybersecurity professionals, we constantly seek ways to enhance our threat detection and response capabilities. SentinelOne's Purple AI uses advanced AI and natural language processing to streamline threat investigations and provide actionable insights.

It's designed to help your security operations team work smarter and faster. If you are interested in leveraging AI to boost your sec ops, Purple AI is worth exploring. Check out more details at SentinelOne.com slash purple. Now back to the episode.

Ashish Rajan: Welcome to another episode of Cloud Security Podcast. And today we have Santiago. This is going to be an interesting conversation because he has had some experience in incident response. And we're going to talk about what does incident response look like when you are an established organization, high growth organization.

But before that, Santiago, welcome to the show. Thank you for coming on.

Santiago G: Thanks so much for having me, Ashish.

Ashish Rajan: Not a problem, man. And maybe to kick things off, could you share a bit about your experience and where you started and where you are now?

Santiago G: I started my career many years ago as a penetration tester.

I did a bit of work in threat detection, and then I [00:03:00] moved on to incident response. I think I spent most of my career in incident response teams. And now I'm working in the internal system security team, which is the equivalent of corporate security.

Ashish Rajan: The topic today is talking about incident response in high growth companies, and maybe perhaps a sprinkle of the difference in what it looks like in established companies.

What is incident response to level the playing field for a lot of people? I guess obviously everyone has a different definition of it. How do you describe incident response to people?

Santiago G: I would say that it's like a collection of tools and procedures to respond to issues that pop up in companies that require urgent attention.

And like the most popular cases of incident response can be things like leaked credentials or leaked customer information. And there's been like, for instance, many cases of ransomware as well. So those are some types of incidents that an incident responder will respond to. And what they do is basically try to understand what's going on.

Try to stop the bleeding or try to stop the issue that is happening. And then after that's been stopped, [00:04:00] try to make sure that there's some change in the organization so that this thing doesn't happen again.

Ashish Rajan: And how would you describe the incident response being different between say an established company?

You've worked in a big corporate before now, you're in a high growth company like Canva, how would you describe the difference in incident response on both ends?

Santiago G: Sometimes it's a bit in scale because Canva being a high growth company, there weren't too many services when I started doing incident response.

So if we needed to do something that affected a lot of people, services, it was very manageable, whereas at my previous company, like it was a very big company. So we had to use automation and a bit of programming and things like that to make sure that we address problems systematically so that everyone across the company, all of the services that are affected by an issue were resolved properly.

Ashish Rajan: Because there's a whole concept of red team in bigger corporates as well.

Unlike a high growth company where sometimes there might not be a need for a red team. How different are they as well? Cause a lot of people think incident responders [00:05:00] and red teams are the same thing. Are they different? And how different would they be in a corporate kind of world?

Santiago G: Yeah. Like they're very different because what Red Team is trying to do, they're trying to emulate the tactics of attackers. Yeah. So they try to get into like maybe the network, or they try to get into one of the employees' laptops. They try to gain access to some objective, which can be like customer data or a particular system that has sensitive data, and then they show all of the attacks that they used in their path to get to their objective and how successful they were. And they usually make a suggestion of this is how you would stop something like that. Whereas incident response is more from the other side. Ideally, incident response should catch the red team while they are doing their exercise.

They're there to respond when they see, for instance, that the system the red teamer went into, the one that's got sensitive data, shows some anomalous behavior or something like that to investigate. Then from there, try to recreate all that [00:06:00] happened, which can be a bit complex at times because it depends on like how much evidence there is.

And that is partly on, do you have the proper tools and processes in place to collect evidence. Did the attacker leave any evidence? Because obviously attackers can try to cover their tracks as well.

Ashish Rajan: It was very interesting because I think a lot of people don't even think about the fact that, Hey, we probably should be able to recreate what the attacker did.

It's an interesting world as well, because on one end, I agree, but on the other end, I think about it from a cloud environment, where I feel most things just disappear. What are some of the challenges for incident response in the cloud world?

Santiago G: Usually in the cloud, like you are dependent a lot of times on the infrastructure provider providing you the right tools to carry out investigations if you haven't deployed anything.

If you haven't set anything up, it's not like you can go into the data center and pull the hard drives and then image them. You're dependent on the tools that they give you. And if you're not prepared, sometimes trying to extract like the information that you need for the [00:07:00] investigation at that point can be very tricky because sometimes you have to go into the system to take the information out and that can lead to the evidence being contaminated, or you can run into problems where like the functionality that you need to get the data is not really available to you.

Ashish Rajan: If you were to build the incident response for a high growth company today, what would you do?

Santiago G: I think if starting a team from scratch, the main thing to get it done right, I would think would be to get the right personnel. I think there's a lot of security professionals and what you need to do is make sure that the ones that you're building the team with are professionals that are very independent and autonomous because in the high growth, a lot of the time you're going to have to say okay, I need you to deal with this. And I deal with this and then whatever comes next, someone else needs to deal with that. And you don't always know what's coming next. So you need people that are able to work together, but at the same time be autonomous in what they do.

Ashish Rajan: What kind of skillset people should be looking at as well?

Santiago G: For incident [00:08:00] response, even though it doesn't seem very obvious, one of the main skills is communication, having good communication, because usually you need the incident responder to take control of the incident and coordinate with the different people involved in that.

And a lot of times that will involve depending on the scope of the incident, potentially even notifying like senior leadership. So you need someone that can speak to the technical side of it, to engineers, and also like to be able to speak with the legal team or maybe like the CTO, CEOs and things like that.

Communication is very important. As far as technical skills, you could go in many different ways. Just having like good general security knowledge is really good. But specifically for incident response it's also good to have skills in forensics to be able to investigate what happened. And that can be like mainly two fields: either endpoint forensics, being able to determine what happened in a machine, or something like network forensics, where you can look at logs, network logs, and determine like what path an attacker [00:09:00] took. Something that has been really helpful for me throughout my years as an incident responder was being able to do data analysis, which is also another skill that's not very obvious.

Just because sometimes you get a lot of data and you need to find the signal in the noise.

Ashish Rajan: And would you say the separating of signal from the noise, are there things in addition to this you find are different about cloud? Cause you've done this in on premise as well. Are there specific things about cloud that people would need to consider as they look into becoming an incident responder in that kind of space?

Santiago G: I think it's less about cloud specific and more like about companies that are very dependent on SaaS applications that have a lot of SaaS application footprint, and it's being able to reach out to an external company to get help with an incident, because sometimes you need someone from their side to take some actions.

It's fairly easy to try and convince someone in your own company to do something, if you have the right culture in the company, but it's much harder to get someone from another company to help you [00:10:00] if you need something urgently in the middle of the night, for instance.

Ashish Rajan: If you're dropped into a high growth company today. What should be the first thing you would like to build on?

Santiago G: I think it's very important to have a good framework to work within. And I can reference two different things here. And one is having a good incident response process and framework so that everyone's aligned on what the stages of the incident are and what you do at every stage and how that looks in that specific company.

And the other is having a good framework for categorizing data criticality. And that's more like a risk compliance thing, but I think it helps incident responders deal with incidents when they are better equipped to inform people of what the risk is of something, because it's very different to respond to an incident where there's public data involved, let's say that your company put out some code and it's on GitHub and there was a release that went too soon, that's a very different incident from potentially, let's say, some customers having their emails exposed in a spreadsheet. The data criticality [00:11:00] is going to help you not only prioritize which incident is more important and which one you should respond to first, but also get people to act on it more quickly, because you can then set something like an SLA saying if there's an incident involving this type of data, we need to contain it in this much time and resolve it in this much time.
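The data criticality idea Santiago describes can be sketched as a lookup from data class to SLA. This is a minimal illustration only: the class names, the time limits, and the helper names (`sla_for`, `prioritize`) are assumptions made up for the example, not values or tooling mentioned in the episode.

```python
from dataclasses import dataclass
from datetime import timedelta

# Hypothetical data classes and SLA targets: (contain within, resolve within).
# Real values would come from the company's own risk and compliance framework.
SLA_BY_CRITICALITY = {
    "public": (timedelta(days=5), timedelta(days=30)),
    "internal": (timedelta(days=1), timedelta(days=14)),
    "customer_pii": (timedelta(hours=4), timedelta(days=3)),
}

# Higher rank means more critical data.
CRITICALITY_RANK = {"public": 0, "internal": 1, "customer_pii": 2}


@dataclass
class Incident:
    title: str
    data_class: str


def sla_for(incident):
    """Return (contain_within, resolve_within) for the incident's data class."""
    return SLA_BY_CRITICALITY[incident.data_class]


def prioritize(incidents):
    """Order incidents so the most critical data class is handled first."""
    return sorted(
        incidents,
        key=lambda incident: CRITICALITY_RANK[incident.data_class],
        reverse=True,
    )
```

With a table like this, the two incidents from the example (code released early on GitHub versus customer emails in a spreadsheet) get different deadlines without anyone having to argue about urgency case by case.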

Ashish Rajan: Actually, that's a good point because I think it's all about speed as well. When it comes to incident response, how quickly can you get into a system? A, that is access, but also how quickly do you get the context of, Hey, is this a high risk, low risk, or what am I doing here? Is this a point where I raise a fire alarm or is this a point where I just continue investigating and let people know instead of waking everyone up at 3am in the morning?

Santiago G: Yeah, and thinking of risk is a really good tool for incident responders because a bit of a mistake that many incident responders make in their early career, and it's something that I've done in the past, is we see an incident and we think we need to fix this right away. It has to be done right now. Everyone has to drop everything and work on this.

And it's more about: is that really something that [00:12:00] needs to be resolved right now? What is the actual risk? Is it something that can wait until tomorrow? Do we need to work on it through the night? Having that perspective and like taking a step back and thinking like what needs to be done and how it needs to be done and when it needs to be done.

It's really important because otherwise you're gonna have everyone just chasing everything. And usually you are working with limited resources. And so you need to be able to prioritize.

Ashish Rajan: Even if you do identify things that are medium risk, but still need to be resolved, the communication part that you touched on the beginning, I think it's funny because the first skill you mentioned was communication.

We're going to come back to this, I feel like, on this one as well, because once you do identify the incident, you need to be able to communicate that effectively, even if we may not jump onto it right away. I don't know, how have you dealt with it in your own experience when there's an incident and you realize, okay, this is not a high risk, but I still need to get this resolved because it's not great to keep it there.

Have you found a strategy that works for you to bring people together on that?

Santiago G: So on those [00:13:00] occasions, it's more about like establishing good communication and trying to make it into a negotiation. You reach out to the person and you say Hey, I need your help with this. Can you help me with that? And this is why I think that it's important.

And it's vital that you say why you think it's important because for people that are not knowledgeable in security, they might not immediately grasp the gravity of the issue unless you explain it properly to them. So you say this is why I need your help with this. And they might say, yes, I'll help you or no, I won't help you.

And then if they say no, you can say, okay, look, I know that you have other things that you're working on that are important. Can you let me know, like what kind of stuff that you're working on and let's see if we can figure out a priority for this. And then from then on, you can start to negotiate a bit further and say okay, like I know that what you're working on is important.

So why don't you finish what you're doing? And maybe you can help me with this after that's done. Or we can do like this workaround while you finish what you're doing. And then in a week or something, we can look at it again. And it's not so much about trying to tell people what [00:14:00] to do because you're not their manager, you're not their boss.

That can be a bit intimidating. It's more about establishing good rapport with the other person and making them feel like you're helping them as they are helping you as well. And that's really important to get the collaboration going.

Ashish Rajan: I was in a conversation yesterday with someone and we were talking about this exact thing where nine out of 10 times, I think security loses the battle by coming across too hostile, that this has to be fixed right now.

It's almost like, if you don't turn on MFA right now, we're basically going to be in for a bit of a shit show. And sometimes it's okay. And I think, to what you are saying as well, not every incident has to be raised like a P1.

It's okay sometimes to just wait for a week or two weeks, perhaps longer depending on the organization. Now that you've transitioned a bit more onto the corporate side, I think I remember you did a blog on the whole endpoint vulnerability management at scale, where you used native capabilities for it as well.

So, how's the transition been from pentesting to incident response and corporate security? Do you find the skills are transferable? Cause I feel [00:15:00] like a lot of people sometimes start in incident response or start in pentesting and don't realize they can transition to another career. Or, it's not really another career.

It's like another field within the career, if that makes sense. How was the transition from like a pentester to incident responder and now looking at corporate security?

Santiago G: So I think like the transfer of skills between pentesters and incident responders is more direct, even though you're using them like in the opposite way, because as a pentester you're trying to find all the vulnerabilities of a system and you're trying to exploit them.

And as an incident responder, you're trying to figure out what was exploited and how can you patch it or solve it, fix it so that it doesn't happen again. So it also gives you a bit of an advantage going into incident response from a pentesting side, because it gives you a bit of an attacker mindset, which is good for your investigations.

It can also sometimes lead you astray because you can think like this is more complex than it actually is when it's something simple, but a lot of the times you are able to think like an attacker, like if I was an attacker and I [00:16:00] was in this system, what are the things that I would try to do?

In terms of corporate security, it's not so much those immediate skill transfers. It's more the communication side and being able to solve for the big incidents where I would say the skills transfer, just because on this side it's not as fast moving as incident response. It's more about: there's a problem here.

We either identify a problem or we identify that there's a problem. Then we try to determine what the problem is and how to solve it, which is what usually happens in the big incidents. You're not just trying to fix something immediately. You need to think about all of the consequences of applying the different types of fixes and all of that.

And then the communication, because at my current role, there's a lot of: I think we should be doing this. And then our job is to go and convince someone else to go and do it. So oftentimes we'll work together with like IT teams and we dictate and say okay, this is our current status in [00:17:00] security.

There's this big problem that we're trying to solve and we need your help to solve it and we think you should do this. And again, establishing that negotiation, trying to convince them to do this while feeling like there's a collaboration going on, because you need to be able to have some input in there.

But usually it's other people doing the actual work.

Ashish Rajan: Good way to describe it as well. And I think the projects are longer as well. It's not like you've finished in a week or something as well. And maybe, I don't know how long it took for the whole endpoint vulnerability management at scale project.

If you can share a bit about what was your, for lack of a better word, thesis, what was the goal? You wrote a whole blog about it as well. I'll link that blog in the description in the comment section as well.

Santiago G: We've always been very mindful about endpoint security and it's not where you want it to be.

I'm reading articles about some of the recent vulnerabilities that have, or maybe not necessarily recent, but some of the big vulnerabilities that have come up in endpoints, it's very concerning and knowing that there's some attacks that have [00:18:00] been successful through employee endpoints, we thought like we need to improve the security of those devices.

So how do we do that? One of the ways that we could do that was ensuring that the applications on a device are up to date. And in theory, that sounds simple because it's just about updating applications, patching, and that's easy enough. If you want to update an application, that's generally as a user, a very easy process, but how do you get a company that has thousands of endpoints, thousands of employees, to all keep their applications up to date?

A lot of them will think that they don't need an update because they're working on more important stuff. And the updates cannot take time away from that important work from them. How do you make sure that you are giving visibility to other teams like the incident response team about what's going on in endpoints as well as protecting everyone in the company from vulnerabilities.

The approach we decided to take was we're not going to fix every vulnerability on every device. We are going to take the applications that we [00:19:00] believe most of our employees need to be productive. We're going to ensure that those are updated, and for the ones outside that list, once they reach a certain threshold of how many devices they're deployed to, we're going to look into them and try to find a way to resolve the vulnerabilities that they may have. That can be by blocking the application in like the worst case, putting that application into the managed application list so that it gets the updates, or having some sort of ad hoc updating script that you can just run to get that one thing updated. It was very important for us to be able to scale and also to minimize user disruption.

So we found that through our MDM, it's relatively easy for us to set up some applications to update automatically. So a lot of the times when an application is vulnerable, what happens is our system picks up that there's a vulnerability, reports it to the IT team that's in charge of endpoints. And generally if it's a managed [00:20:00] app, they don't need to do anything and the ticket will resolve itself because the vulnerability gets patched automatically. If it's something else, they'll look at the ticket and then investigate who has this application and what needs to be done about it. Because it can be that sometimes the applications are needed by a specific team, and sometimes it's just stuff that's not really necessary.
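The triage rule described above (managed apps patch themselves, other apps get looked at once they are widely deployed) can be sketched as a small decision function. The function name, the outcome labels, and the default threshold of 50 devices are all assumptions for illustration; the episode does not state the actual threshold or how the decision is implemented.

```python
def triage(app, device_count, managed_apps, review_threshold=50):
    """Decide what happens to a vulnerable application.

    review_threshold is a hypothetical number; the real cut-off is not
    given in the conversation.
    """
    if app in managed_apps:
        # The MDM pushes the update, so the ticket is expected to close itself.
        return "auto_update"
    if device_count >= review_threshold:
        # IT investigates: block the app, add it to the managed list,
        # or run an ad hoc update script.
        return "review"
    # Assumption: nothing is done until the app is more widely deployed.
    return "below_threshold"
```

Keeping the rule this small is what makes it scale: only the "review" outcome needs a human, which matches the goal of minimizing user disruption.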

Ashish Rajan: Cause I remember seeing a lot of these components in the image that you had in your blog, where there's EventBridge, there's Lambdas, there's a lot going on there. How would you describe the flow, from the point of the application to wherever it lands, whether it's Jira or your endpoint security product or whatever? What was the thinking behind using the AWS components in the architecture that you were building for this?

Santiago G: Yeah. So as part of our EDR, we also get like an add on that does vulnerability scanning on endpoints. Just having that information raw was not really good enough for us because you get vulnerabilities per application, an application can have multiple vulnerabilities and [00:21:00] you get one record per vulnerability, and you can have multiple applications in an endpoint and there's like thousands of endpoints in our fleet.

So that quickly grows into a very big list of hundreds of thousands of vulnerabilities that we will never get to. So we needed to find a way to bring that down and make it actionable. So the way we're working through that is that we have the Lambda pull the data from our EDR, then the Lambda does an analysis.

So actually the EventBridge is basically a cron job that runs every day to trigger the Lambda. And so we have the Lambda that analyzes the data, creates tickets if necessary, and then uploads the information to S3. And from S3, it's consumed by our data warehouse. And then from there we can make dashboards to ensure that like the things that we want to trend down are trending down.
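The daily flow Santiago outlines (pull from the EDR, analyze, create tickets if necessary, upload to S3) can be sketched as one orchestration function. This is an illustration under stated assumptions, not the team's code: the four callables, the `needs_ticket` field, and the `endpoint-vulns/` key prefix are hypothetical, and the EDR, ticketing, and S3 clients are injected so the sketch runs without AWS access.

```python
import json
from datetime import date


def run_daily(fetch_records, analyze, create_ticket, upload, today=None):
    """One scheduled run: fetch, analyze, ticket, upload.

    fetch_records() -> list of raw EDR vulnerability records (hypothetical shape)
    analyze(records) -> list of summary dicts, each with a 'needs_ticket' flag
    create_ticket(summary) -> files a ticket for the IT team
    upload(key, body) -> stores the JSON body, e.g. a thin wrapper around
                         boto3's s3.put_object(Bucket=..., Key=key, Body=body)
    """
    today = today or date.today()
    summaries = analyze(fetch_records())

    # Only some findings need a human to look at them.
    for summary in summaries:
        if summary.get("needs_ticket"):
            create_ticket(summary)

    # One object per day keeps the warehouse import simple.
    key = f"endpoint-vulns/{today.isoformat()}.json"
    upload(key, json.dumps(summaries))
    return key
```

In a deployment like the one described, a function of this shape would be called from the Lambda handler, with an EventBridge schedule such as `rate(1 day)` acting as the cron trigger.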

Ashish Rajan: Interesting. So from your EDR solution, you're pulling data into Lambda to process it, to understand, is this a high vulnerability or is it more to identify that, okay, I have 25,000 endpoints. So I'm [00:22:00] looking at, out of the 25,000, your Lambda function going through each one of them and verifying which one is currently patched or not patched.

If it's not patched, it's a Jira ticket. If it's patched, it goes into historical data to your point about the S3 bucket or whatever. Is that how you're thinking about this?

Santiago G: No. So the way it works is more that the Lambda takes all the data from the EDR and then does some aggregation. So basically let's say that, for instance, there's a vulnerability in Safari.

So multiple vulnerabilities in Safari. So the Lambda would get like multiple records of the vulnerability for each endpoint that is vulnerable. And then what the Lambda does, it just spits out one line that says Safari is vulnerable and this version of Safari is vulnerable. Because the only thing that the IT team needs to know is that they need to update that version.

They don't need to know like about the details of the vulnerabilities or how many there are, as long as they can push an update for Safari for that version to all of the fleet, we're all good.
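The aggregation step described here, collapsing one record per vulnerability per endpoint into one line per application version, can be sketched as follows. The record fields (`app`, `version`, `endpoint`, `vuln_id`) and the function name are assumptions for illustration; the real EDR export format is not described in the episode.

```python
from collections import defaultdict


def summarize(records):
    """Collapse per-vulnerability records into one row per (app, version)."""
    endpoints_by_app = defaultdict(set)
    for record in records:
        # Which vulnerability it is does not matter to IT, only that this
        # version of this app needs an update on these endpoints.
        endpoints_by_app[(record["app"], record["version"])].add(record["endpoint"])

    return [
        {"app": app, "version": version, "endpoints": len(hosts)}
        for (app, version), hosts in sorted(endpoints_by_app.items())
    ]


# Hypothetical input: two vulnerabilities on one laptop, one on another.
records = [
    {"app": "Safari", "version": "17.0", "endpoint": "laptop-1", "vuln_id": "VULN-A"},
    {"app": "Safari", "version": "17.0", "endpoint": "laptop-1", "vuln_id": "VULN-B"},
    {"app": "Safari", "version": "17.0", "endpoint": "laptop-2", "vuln_id": "VULN-A"},
]
print(summarize(records))
```

Three raw records become a single line for Safari 17.0 covering two endpoints, which is the reduction that turns hundreds of thousands of findings into a list IT can act on.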

Ashish Rajan: The data lake or the Snowflake thing that you guys had from an [00:23:00] S3 bucket. Why have a separate dashboard?

Is that dashboard for you guys to see how it's trending? What was the point of putting that into a data storage?

Santiago G: Yeah. So basically we're putting it into S3 just to make it easy to import into Snowflake. But in Snowflake, we run the different queries where we want to analyze the data. We look at the queries in Snowflake and what we have currently working is Mode relies on Snowflake for the queries, but Mode is what we use for the visualization.

Just because, even though we could have the dashboards and graphs in Snowflake, from my experience with the tools, Mode gives you a bit more flexibility on how you want to display those graphs and all of that. So you can build some dashboards and graphs that can be a bit more complex and tell like a better story with the data because, as part of the project, it was very important for us to make sure that all that data that is in the EDR is easy to consume, like either for IT through the tickets or for like us through [00:24:00] a set of dashboards.

So making a graph that tells you like, these are the managed applications that are out of date, and this is the severity of the vulnerabilities assigned to it, lets you very quickly see do we need to take action on this specific thing? And for instance, like one of the dashboards that I built had managed applications that have vulnerabilities.

And throughout that, we noticed that one of the applications that we had set up for patching wasn't patching properly.

Ashish Rajan: Was that a lot of effort to learn Snowflake and Mode? Because I think I obviously have heard of Snowflake before, but I have not heard of Mode before. Was there a big learning curve to pick up? Or was it fairly easy if you had the data points to work with?

Santiago G: So not so much for me just because in my previous experience in incident response usually the platforms where the logs are stored, we were using some sort of SQL for querying. Yeah. So I was familiar with the SQL language.

So that wasn't too bad, and Mode is not hard to work with. What's hard [00:25:00] is having the information and deciding what's the best way to showcase it. That's the hard part. So you need to learn, like, how do I tell a story with a graph? At first it was a lot of trial and error. I knew some of the things that I wanted and those were easy to build.

But then I thought what other insights can I get with this data? And that was a lot of trial and error in making a graph because it sounded good, but then sometimes it didn't turn out so well. Or sometimes you think Oh, this is good, but not good enough. Maybe I need to add something else to it.

So it was more about like how to tell the story with the data, rather than like the tools themselves.

Ashish Rajan: Those are most of the technical questions I had, man. I think I've got three fun questions for you. The first one is, what do you spend most time on when you're not working on solving all the incident response and endpoint security challenges in the world?

What do you spend most time on?

Santiago G: There's two things that I spend a lot of time on. And one is video games, play a lot of video games. And the other one is I love live music. So I tend to go to like concerts pretty often.

Ashish Rajan: What's your favorite cuisine or restaurant?

Santiago G: Italian and Japanese. And there's a restaurant that [00:26:00] is Japanese, Italian fusion.

That's my favorite.

Ashish Rajan: What's it called?

Santiago G: It's called Lumi.

Ashish Rajan: Where can people find you on the internet? They want to connect with you and what's the social media stuff that people can connect with you on?

Santiago G: Just on LinkedIn. You can find me on LinkedIn from the blog post that

Ashish Rajan: I can leave there in the podcast episode as well.

So people can definitely find you there as well. Appreciate you coming on the show, man. Thank you so much for sharing on incident response and, I guess, the work that you did for vulnerability management at scale as well. But I look forward to having you again and maybe having a few more conversations with you, man.

Santiago G: Yeah, definitely. Thank you so much for having me.

Ashish Rajan: Thank you for listening or watching this episode of Cloud Security Podcast. We have been running for the past five years, so I'm sure we haven't covered everything cloud security yet. If there's a particular cloud security topic that we can cover for you in an interview format on Cloud Security Podcast, or make a training video on tutorials on Cloud Security Bootcamp, definitely reach out to us on info at cloudsecuritypodcast.tv.

By the way, if you're interested in AI and cybersecurity, as many cybersecurity leaders are, you might be interested in our sister podcast called AI Cybersecurity Podcast, which I run with former CSO of Robinhood, Caleb [00:27:00] Sima, where we talk about everything AI and cybersecurity. How can organizations deal with cybersecurity on AI systems, AI platforms, whatever AI has to bring next as an evolution of ChatGPT, and everything else continues.

If you have any other suggestions, definitely drop them on info at cloudsecuritypodcast.tv. I'll drop that in the description and the show notes as well. So you can reach out to us easily. Otherwise, I will see you in the next episode. Peace.
