Ever tried solving DNS security across a multi-cloud, multi-cluster Kubernetes setup? In this episode, recorded live at KubeCon, Ashish chats with Nimisha Mehta and Alvaro Aleman from Confluent's Kubernetes Platform Team. Together, they break down the complex journey of migrating from the default CNI plugins to Cilium across Azure AKS, AWS EKS, and Google GKE. You’ll hear:
- How Confluent manages Kubernetes clusters across cloud providers.
- Real-world issues encountered during DNS security migration.
- Deep dives into cloud-specific quirks with Azure’s overlay mode, GKE’s Cilium integration, and AWS’s IP routing limitations.
- Race conditions, iptables rule ordering, reverse path filters, and practical workarounds.
- Lessons they’d share for any platform team planning a similar move.
Questions asked:
00:00 Introduction
01:55 A bit about Alvaro
02:41 A bit about Nimisha
03:11 About their KubeCon NA talk
03:51 The Cilium use case
05:16 Using Kubernetes Native tools in all 3 cloud providers
11:41 Lessons learnt from the project
Alvaro Aleman: [00:00:00] In some of our AWS clusters we use an internet gateway. And what that means is, in order to have internet connectivity, you need to reach this internet gateway from the primary IP address of the primary network interface. The problem with network interfaces on AWS is they have a limit in terms of how many IP addresses you can add to them.
And Cilium basically allocates additional IP addresses for each pod, but at some point it has to allocate a second interface because the first one is full.
Ashish Rajan: Have you ever tried solving a DNS security problem for a multi-cloud, multi-cluster Kubernetes platform across Azure, GCP and AWS? We had the pleasure of talking to Nimisha and Alvaro from Confluent.io, who had a talk here at KubeCon about the different challenges they faced when trying to solve DNS security with AKS, EKS, GKE and all the other acronyms you can think of, in a large-scale Kubernetes cluster deployment that is being migrated and made secure. A lot of the conversation was around the different kinds of race conditions they hit while using Cilium, which is an open source network security project from the CNCF.
[00:01:00] So a lot of the conversation was really valuable for me, to at least understand what some of the challenges can be, and I think it would be valuable for you as a cloud security person as well, to understand how Kubernetes platform engineers are able to solve some of the network security challenges in a fairly large, complex Kubernetes platform.
If you're watching or listening to a Cloud Security Podcast episode for the second or third time, a great way to support us, if this is on Apple Podcasts or Spotify, would be to drop us a follow or subscribe. If you're watching this on YouTube or LinkedIn instead, give us a follow or subscribe there.
It means a lot to us. It helps us know that you're enjoying what we're doing and want to support the work we're doing as well. I really appreciate all the support you have already shown us by leaving us reviews, and I would really appreciate a subscribe and follow on the platform as well.
I hope you enjoy this episode. I'll see you soon. Peace. Hello and welcome to another episode of Cloud Security Podcast at KubeCon North America. I've got two interesting folks with me. Would you mind introducing yourselves and what you do?
Alvaro Aleman: Yes, my name is Alvaro. I work at a company called Confluent as a software engineer on what's [00:02:00] internally called the Kubernetes platform team.
I've been working with Kubernetes in some shape or form since about 2018. Prior to that, I did a lot of configuration management, and then I eventually moved to Kubernetes, which is just so much more elegant as an approach. Confluent basically offers products related to data streaming and data stream processing, and we offer them as managed versions in all of AWS, GCP and Azure. One interesting detail is that we have to be running these in the same region as our customers. And this in turn means that we have to manage a ton of infrastructure. And since it's across multiple clouds, we also try to abstract the clouds themselves away from our internal teams as much as we can, which makes Kubernetes a great fit for us.
Nimisha Mehta: Yeah, my name is Nimisha Mehta. I'm also on the same team as Alvaro, the Kubernetes platform team. And one of the things we do, as you mentioned, is build an abstraction layer over the cloud providers so that the users in our company have a unified experience. And yeah, all of Confluent's products are ultimately deployed on Kubernetes.
So it's a lot of [00:03:00] infrastructure, a lot of critical stuff that we have to manage. And personally, I've been somewhat involved with Kubernetes for the last five years or so. I've been in the cloud space for about five years, and the whole time I've been using Kubernetes.
Ashish Rajan: Awesome. Maybe a good place to start is the talk you two gave yesterday.
Maybe we'll start with the title of the talk, and then the motivation behind solving the challenge you spoke about in your talk.
Alvaro Aleman: At the end of the day, what it's about is that Confluent uses the managed Kubernetes offerings of all three major clouds.
And originally we used the default CNI plugin that all of them had. And at some point we got a number of requirements that these default CNI plugins could not fulfill. So at that point we ended up looking into what other options exist that can fulfill these and that we can use everywhere. And after some evaluation, we ended up with Cilium as basically being the best choice for us.
Ashish Rajan: And I guess to give some context, could you share a bit about Cilium and what use case was it helping you solve?
Alvaro Aleman: Yeah, Cilium is a CNI plugin, which basically means, in the context of something like [00:04:00] Kubernetes, it enables network connectivity. And it offers additional features, for example you can configure network policies.
So which pod can communicate with which other pods or other addresses. This is possible based on ports, but you can also have these DNS-based network policies, which basically means that you allowlist or denylist certain hostnames to be used as a target. Transparent encryption is another feature that we're using.
It also has a kube-proxy replacement, which uses eBPF and is much more efficient than the original iptables-based version. It basically has a large number of features, and I don't think I could name all of them.
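For reference, a rough sketch of the kind of DNS-based egress policy being described here, written as a CiliumNetworkPolicy. The namespace, labels and hostname are made up for illustration, and the first egress rule simply lets the pods reach cluster DNS so that the FQDN rule can be enforced:

apiVersion: cilium.io/v2
kind: CiliumNetworkPolicy
metadata:
  name: allow-example-egress          # hypothetical name
  namespace: demo                     # hypothetical namespace
spec:
  endpointSelector:
    matchLabels:
      app: demo-app                   # pods this policy applies to (hypothetical label)
  egress:
    # allow DNS lookups to kube-dns/CoreDNS so Cilium's DNS proxy can see the queries
    - toEndpoints:
        - matchLabels:
            k8s:io.kubernetes.pod.namespace: kube-system
            k8s-app: kube-dns
      toPorts:
        - ports:
            - port: "53"
              protocol: ANY
          rules:
            dns:
              - matchPattern: "*"
    # allowlist a specific hostname as the only permitted egress target
    - toFQDNs:
        - matchName: "api.example.com"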
Ashish Rajan: Awesome. I'll ask you the next question then. Some of the issues, obviously, before we started recording, we were talking about the three cloud service providers you guys mentioned.
I would love to know what kind of issues that you were coming across as you were using Kubernetes native as well as Cilium to solve the problems. If you want to start with Amazon or whichever your favorite one is.
Nimisha Mehta: My favorite one. I can start with Azure because I spoke about that yesterday.
I have some [00:05:00] more detail. So the challenge actually was migrating, because once you use a cloud provider's CNI, sometimes there's some lock-in that happens. And in Azure, the migration was particularly difficult because it involved multiple stages, four stages specifically. And one thing about Cilium is you can use it in two modes.
One is using an overlay mode and one is the native routing mode. And we wanted to use native routing mode. Overlay mode means that Cilium will establish an overlay network with the CIDR that you give it, but we didn't want that for reasons related to some of our product requirements, but in Azure, if you do the migration, you always end up in overlay mode.
So that was definitely something that we found a little difficult to work around. There's one more issue with the migration itself, which is, if you want to use a configurable Cilium installation on Azure, you have to use the enterprise version, you cannot use the open source version. So that's another thing that may deter folks, [00:06:00] because Azure CNI powered by Cilium, that's the official name, is basically just a switch on the AKS cluster where you say, I want Cilium or I don't want Cilium, and they have a preset configuration that they just install on the cluster.
But yeah, if you want to enable certain features of Cilium, like for example transparent encryption or something else, then you can't really configure that unless you use the enterprise version from the Azure marketplace. So those are just some things to keep in mind.
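For context, the choice between the two modes Nimisha describes boils down to a handful of Cilium Helm values when you install Cilium yourself. A minimal sketch only; the exact keys vary between Cilium versions, and the CIDRs below are made up:

# overlay mode: Cilium tunnels pod traffic (VXLAN/Geneve) inside a pod CIDR you give it
routingMode: tunnel
tunnelProtocol: vxlan
ipam:
  mode: cluster-pool
  operator:
    clusterPoolIPv4PodCIDRList:
      - "10.128.0.0/14"                 # example pod CIDR, made up

# native routing mode (what they wanted): pod IPs are routable in the VNet/VPC itself
# routingMode: native
# ipv4NativeRoutingCIDR: "10.0.0.0/8"   # example CIDR, made up
# autoDirectNodeRoutes: true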
Ashish Rajan: if that's your favorite, that's your favorite.
What's your favorite? So which favorite of the issues, , of the issues of the cloud providers which you can pick whichever one you want, man.
Alvaro Aleman: Google Cloud was kind of interesting. So I guess in terms of pure management, GKE is the best you can get. And GKE actually, interestingly, has a similar story to Azure, in that they offer a managed version of Cilium.
It's called Dataplane V2. It has similar issues in the sense that it doesn't offer too many configuration knobs. And for us in particular, we needed to migrate existing [00:07:00] infrastructure over. We cannot just recreate everything, which is why this isn't an option for us.
Ashish Rajan: So that was a challenge in GKE, that it's the same challenge existed in GKE.
Is that what you're saying? I think in terms of like you, the native option was working, or did you have the same challenge as migration in AKS?
Alvaro Aleman: The native option doesn't work for us because it doesn't allow a migration at all. You have to create a new cluster.
Ashish Rajan: Yeah. And what were the issues that you specifically came across in GKE?
I think you had a couple that you were thinking about.
Alvaro Aleman: Yeah. So in order to explain that, I probably have to start a bit with how we did this migration. So basically what we did is that we wanted new nodes to come up with Cilium, and existing nodes to keep whatever they already have.
And the basic idea behind how we did this is that we labelled these new nodes and then used a selector on the Cilium DaemonSet so it only runs on these new nodes. And then the idea is that whatever the default CNI is does not run on these new nodes, through similar means, using a selector.
The problem in GKE specifically is that this is baked into the image the nodes come up with, so you cannot actually disable it. That in turn [00:08:00] means it's possible that pods get scheduled to a node before Cilium is up, and this will just work, unfortunately. But then what happens is that they get an IP address that Cilium doesn't know anything about, which can lead to Cilium later handing out the same IP address to another pod, at which point that other pod is not going to have working network connectivity.
So yeah, that was one of the issues. What we mostly ended up with is basically a component that figures out that this happened and then just deletes the pod that came up before Cilium did.
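A minimal sketch of the node-selection trick Alvaro describes, assuming a hypothetical migration label of network.example.com/cni=cilium applied to the new nodes:

# patch on the Cilium DaemonSet's pod template: only run on the newly labelled nodes
spec:
  template:
    spec:
      nodeSelector:
        network.example.com/cni: cilium           # hypothetical migration label

# patch on the default CNI's DaemonSet: keep it off those same nodes
spec:
  template:
    spec:
      affinity:
        nodeAffinity:
          requiredDuringSchedulingIgnoredDuringExecution:
            nodeSelectorTerms:
              - matchExpressions:
                  - key: network.example.com/cni
                    operator: NotIn
                    values: ["cilium"]

As he notes, on GKE the default dataplane is baked into the node image, so the second half of this isn't actually possible there, which is what opens up the race.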
Ashish Rajan: Fair. And Amazon, the most favorite for both of you, clearly, which we kept for last but not least. Who wants to cover Amazon?
Nimisha Mehta: Alvaro knows it in the most detail, I think.
Alvaro Aleman: Okay. I'll try to explain it, and I'm afraid it will take me probably three minutes or something. So basically, in some of our Azure clusters, we use an internet gateway.
Nimisha Mehta: AWS clusters.
Alvaro Aleman: We use an internet gateway. And what that means is in order to have internet connectivity, you need to reach this internet gateway from the primary IP address of the primary network [00:09:00] interface.
The problem with network interfaces on AWS is they have a limit in terms of how many IP addresses you can add to them. And Cilium basically allocates additional IP addresses for each pod, but at some point it has to allocate a second interface because the first one is full. And then, to enable internet connectivity for pods that have an IP address on a secondary interface, it NATs the traffic and routes it to the primary interface.
So that works fine. The problem is that the rule set it sets up for this is basically: if the destination IP address is outside of the CIDR of the VPC where the cluster runs, then this routing rule applies to use the primary interface. And we have a use case where we want to reach pods from a peered VPC.
And if the pod now has an IP address on the secondary interface, the traffic comes in on the secondary interface, but the routing will specify that it leaves through the primary one. And then both Linux, which has this thing called the reverse path filter, will drop it, and AWS itself also [00:10:00] checks that the source IP address is actually on the interface.
So it just doesn't work.
So what we ended up doing is we set up a NodePort service, and then the workload in this peered VPC will just use the node's IP and the node port to reach the pods.
And that generally works. It's just a bit more complex to set up, and you can't really get a good health signal, because you're reaching the node, and the node might have multiple pods, and you randomly hit one pod, and if only some of the pods have an issue, you can't really tell from the other end.
So that's a bit annoying.
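For reference, the workaround is an ordinary NodePort service; a minimal sketch with made-up names, labels and ports, where the workload in the peered VPC then targets <node IP>:30092 instead of a pod IP on a secondary ENI:

apiVersion: v1
kind: Service
metadata:
  name: demo-broker-external          # hypothetical name
spec:
  type: NodePort
  selector:
    app: demo-broker                  # hypothetical pod label
  ports:
    - port: 9092
      targetPort: 9092
      nodePort: 30092                 # must fall in the cluster's NodePort range (default 30000-32767)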
Ashish Rajan: Oh, wow. That definitely sounds a bit annoying. Oh, sorry, the Azure race condition. You wanted to add something on that as well?
Nimisha Mehta: On Azure, we had this issue with the race conditions because in some cases the connectivity would just break and restarting the Cilium pods fixed it.
And it was traced down to incorrect iptables rule ordering. Not even which iptables rules, but the ordering itself. It basically boils down to the kube-proxy iptables rules and the Cilium iptables rules: for everything to work correctly, the Cilium iptables rules have to come first, and the kube-proxy iptables rules have to come after that.
And all these components [00:11:00] prepend their iptables rules, so whoever starts last wins. So Cilium has to start after kube-proxy. But what was happening in some cases was that Cilium was starting before kube-proxy, because on Azure, for reliability purposes, they actually override the kube API server address, the address of the control plane that all pods reach out to, to a hostname rather than a virtual IP,
which is what you get on most clouds. So yeah, the way we solved this was to basically always ensure that the iptables rules are in the correct order, and if they're not, delete them and re-create them in the correct order.
Ashish Rajan: Maybe now that you guys have covered all three cloud providers: if you had to start this project over,
and I don't think you guys want to, but if you had to, what would you do differently, out of curiosity? I don't know what your first steps were, but with the benefit of hindsight, for anyone else who's listening or watching and wants to go through a [00:12:00] similar journey, who has figured out, okay, Cilium is the product that I'm using,
how would you do this? Obviously there are some exceptional use cases, or exceptions to the use cases, that you called out, the race condition and everything else. For people who are on the other end of this, who want to do something similar, what would you recommend as a good place to start, since you've gone through the pain and taken a bullet for the team?
Take a moment if you want, I just threw a golf ball at you.
Nimisha Mehta: Nothing, what we did was perfect.
Ashish Rajan: That worked exactly? So you start with EKS, move to Azure, then GKE.
Nimisha Mehta: I think the ordering doesn't really matter. What I will say is that it was a very complex project; networking in general can be very complex.
And when things don't work, you have to learn how it's implemented under the hood. The documentation for Cilium is great, so make sure you read all of it. Make sure you keep an eye on the GitHub repo as well, because sometimes people will report issues in the newer versions, which you might be facing [00:13:00] too.
That helps. And then I think Isovalent has helped us a lot too, just in terms of support. They have really deep expertise when it comes to how things work at the lowest level. Yeah, that has helped us a lot.
Alvaro Aleman: I don't think there's too much we could have done differently. So the reality is just if you have a big project like this, some issues you're only going to find in production.
One thing that might have helped a bit is if we had actually used the kube-proxy replacement the whole time. The reason we didn't do it is that there just wasn't the motivation for us, in the sense that it didn't add features we needed, and this whole performance thing is not as much of a practical issue for us.
But using it has the side effect that certain classes of issues, like this whole race condition thing, just cannot happen.
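For reference, turning on the kube-proxy replacement Alvaro mentions is a couple of Cilium Helm values. A minimal sketch only: the API server host below is a placeholder, and older Cilium versions use "strict" rather than true for this setting:

kubeProxyReplacement: true
# without kube-proxy, Cilium needs a direct route to the API server:
k8sServiceHost: my-cluster-api.example.com   # placeholder for the cluster's API server endpoint
k8sServicePort: 443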
Ashish Rajan: Fair. No, thank you for sharing that. I'll put a link to the talk in the show notes as well. But is there anything you're looking forward to at KubeCon?
Alvaro Aleman: There are a number of sessions for the various special interest groups (SIGs).
And we're looking forward to it.
Ashish Rajan: Oh, any particular interest groups?
Alvaro Aleman: SIG Network is later today. That's [00:14:00] one. Honestly, I'd like to just attend all of them because there's so much going on, and if you have your normal day job, it's hard to keep up. These sessions are super useful just to know, okay, what's new, what's going on, what the hot topics are.
Nimisha Mehta: Yeah, I'm definitely interested in the SIG network meeting too, but I think so many talks happen simultaneously. It's very hard to catch all of them. So it's very difficult to choose, to be honest. A lot of talks also seem to be about AI on Kubernetes and I just want to see what the hype is about.
Ashish Rajan: Oh, fair.
We all want to know what the AI hype is all about as well. I'll put a link to their LinkedIn or Twitter somewhere as well. But thank you so much for coming on the show, I really appreciate this. Thank you. Thank you so much. Thank you for listening or watching this episode of Cloud Security Podcast.
We have been running for the past five years, so I'm sure we haven't covered everything cloud security yet. And if there's a particular cloud security topic that we can cover for you in an interview format on Cloud Security Podcast, or make a training video on for tutorials on Cloud Security Bootcamp, definitely reach out to us at info@cloudsecuritypodcast.tv. By the way, if you're interested in AI and [00:15:00] cybersecurity, as many cybersecurity leaders are, you might be interested in our sister podcast, which I run with the former CSO of Robinhood, Caleb Sima, where we talk about everything AI and cybersecurity: how organizations can deal with cybersecurity on AI systems and AI platforms, and whatever AI brings next as the evolution of ChatGPT and everything else continues.
If you have any other suggestions, definitely drop them at info@cloudsecuritypodcast.tv. I'll drop that in the description and the show notes as well so you can reach out to us easily. Otherwise, I will see you in the next episode. Peace.