Kubernetes Security around the Holidays

For many organizations, the holidays bring a massive peak in activity. Retailers famously look to this time of year to finish in the black, expecting their largest wave of foot traffic and online orders during the four to five weeks between Thanksgiving and Christmas. Nonprofit organizations see a similar pattern, with December almost universally being their biggest month for donations. All of this activity also ripples out to the many suppliers and service providers that support those sectors, whether directly or indirectly. The anticipation of such big bumps in traffic poses some obvious questions about the availability of services, but it also raises concerns for security teams. If you are on the security team for an organization that runs its services on Kubernetes, here are three such concerns to be aware of during the holidays.

Availability is a Security Concern, Too

Receiving a tsunami of traffic puts pressure on IT systems and teams. Websites, cash registers, inventory management systems, customer databases, email marketing systems, and even accounting software are all going to be running at peak capacity during the lead-up to the holidays. Any of those systems crashing during the month of December could be a recipe for business disaster. Luckily, Kubernetes provides the means to horizontally scale these systems and accommodate extra traffic. But during a peak month like December, teams may also need to scale out the underlying platform itself, adding more nodes or increasing their footprint in certain availability zones.

It has become common to concentrate the work of scaling services and maintaining their availability in a Site Reliability Engineering (SRE) team. But it's worth remembering that availability is also a key security objective (the “A” in the “C-I-A” triad often used to define security). As a result, security should be a consideration while you scale, especially as you’re adding nodes and scaling infrastructure. Ideally, you can ensure a repeatable, secure architecture by using Infrastructure-as-Code (IaC) to define your expected configurations (including security settings) for both the infrastructure your clusters run on and the configuration of your Kubernetes clusters themselves. This allows your security to scale seamlessly with your cluster. However, when the pressure is on, best practices are not always followed.

Configuration Confusion

With so much on the line, there will be pressure to do it live and make configuration changes in your clusters in real time as a response to whatever wheels are coming off the proverbial wagon. In a perfect world, your work leading up to the holidays will have established a clear set of practices and resources that can be used together to maintain your cluster configurations the right way. But (a) we’re already well into the holiday season, so we fight with the army we have, and (b) perfect worlds don’t exist. Someone will eventually succumb to the pressure and hand-spin some changes in an effort at heroics, and when they do the business will stand and applaud. You’ll probably be clapping, too.

The problem is that ad hoc changes are very difficult to replicate exactly. This often results in a situation in which you either (a) need that change somewhere else and can’t quite figure out how to do it again (the problem IaC originally aimed to solve) or (b) end up facing a new bug and aren’t 100% sure where it came from (the bane of SRE teams).

This also has ramifications for security. As we mentioned, IaC ensures that your security configurations scale with your cluster. Live changes are unlikely to achieve the same. As a result, your level of assurance about the security posture of your cluster decreases every time a hand-spun change is made.

Since we don’t live in a perfect world, what you are going to need is a way of monitoring the state of your cluster and measuring it against your security expectations. In other words, you need a way to get visibility into any place where on-the-fly changes have created a configuration that is out of alignment with your security baselines.
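To make the idea concrete, here is a minimal sketch of that kind of check: compare a workload's live settings against a declared baseline and report every difference. It is illustrative only. The baseline keys, the flattened spec layout, and the function names are assumptions made for this example, not the real Kubernetes API schema and not how KSOC or any other product works.

```python
from typing import Any

# Hypothetical security baseline: settings we expect on every workload.
# The dotted paths refer to a simplified spec, not the real Kubernetes schema.
BASELINE = {
    "securityContext.privileged": False,
    "securityContext.runAsNonRoot": True,
    "securityContext.allowPrivilegeEscalation": False,
    "hostNetwork": False,
}


def get_path(obj: dict, dotted: str) -> Any:
    """Walk a nested dict along a dotted path; return None if a key is missing."""
    current: Any = obj
    for key in dotted.split("."):
        if not isinstance(current, dict) or key not in current:
            return None
        current = current[key]
    return current


def find_drift(live: dict, baseline: dict = BASELINE) -> dict:
    """Return {path: (expected, actual)} for each setting that differs from the baseline."""
    drift = {}
    for path, expected in baseline.items():
        actual = get_path(live, path)
        if actual != expected:
            drift[path] = (expected, actual)
    return drift


# Illustrative data: a workload spec after a hand-applied hotfix.
live_spec = {
    "hostNetwork": False,
    "securityContext": {
        "privileged": True,      # flipped during the hotfix
        "runAsNonRoot": True,
        # allowPrivilegeEscalation was dropped entirely
    },
}

for path, (expected, actual) in sorted(find_drift(live_spec).items()):
    print(f"{path}: expected {expected!r}, found {actual!r}")
```

A setting that is missing counts as drift here, on the assumption that an unset security field is as worth reviewing as one set to the wrong value.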

Andrew Josephides, KSOC's Director of Security Research, puts it this way:

"When you have people making manual changes in your cluster, you need tools to respond by either fixing the fix— detecting the configurations that are out of step and restoring your baselines— or by implementing mitigating controls. By leveraging solutions like KSOC, you can keep your cluster safe until a more stable solution can be implemented."

(Logs) Volume to 11

The solution we’ve just described (a tool like KSOC that can monitor cluster state and compare it to your security expectations) will have to contend with another implication of the super-sized holiday rush: a massive increase in logging events. As activity across your cluster increases, the number of calls to the Kubernetes API, the amount of activity by service accounts, and the volume of traffic between pods and services on the cluster are all going to rise dramatically. From a security perspective, we hope that the overwhelming majority of that stems from legitimate events due to increased business activity. But undoubtedly, with systems under strain, there will be more failure events, including new errors that you’ve never seen before. There may also be more exploitation attempts (hackers trying to ride the holiday wave or do their own holiday shopping of sorts). And then there are likely to be detections that arise from the hand-spun changes we just discussed, potentially including users doing things they’ve never done before in response to those errors no one has ever seen.

In that sea of logs, you need to be able to sort the signal from the noise. You need a way to triage and elevate the most important misconfigurations arising from hand-spun changes and the most important indicators of a potential compromise or breach of your cluster. This requires tooling that understands Kubernetes natively and understands what is normal for your cluster, including your normal security baselines.
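As a rough illustration of what triage against a baseline can look like, the sketch below scores events by how far they depart from known-normal behavior and surfaces only the highest-scoring ones. The event fields are loosely modeled on what an API audit log records, and the baseline set, weights, and threshold are all assumptions chosen for this example. Real tooling would learn or configure them per cluster.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Event:
    user: str
    verb: str
    resource: str
    status: int  # HTTP-style response code


# Hypothetical baseline: (user, verb, resource) combinations seen in normal operation.
KNOWN_BEHAVIOR = {
    ("system:serviceaccount:shop:checkout", "get", "configmaps"),
    ("alice", "get", "pods"),
}

# Assumed categories and weights; tune these for your own environment.
SENSITIVE_RESOURCES = {"secrets", "rolebindings", "clusterrolebindings"}
WRITE_VERBS = {"create", "update", "patch", "delete"}


def score(event: Event) -> int:
    """Assign a simple additive risk score to one event."""
    s = 0
    if (event.user, event.verb, event.resource) not in KNOWN_BEHAVIOR:
        s += 2  # behavior not seen in the baseline
    if event.resource in SENSITIVE_RESOURCES:
        s += 3  # touches credentials or permissions
    if event.verb in WRITE_VERBS:
        s += 1  # changes cluster state
    if event.status in (401, 403):
        s += 2  # denied requests can indicate probing
    return s


def triage(events, threshold=3):
    """Return (score, event) pairs at or above the threshold, highest score first."""
    scored = [(score(e), e) for e in events]
    flagged = [pair for pair in scored if pair[0] >= threshold]
    return sorted(flagged, key=lambda pair: pair[0], reverse=True)


# Illustrative events, not real log data.
events = [
    Event("alice", "get", "pods", 200),
    Event("system:serviceaccount:shop:checkout", "get", "secrets", 403),
    Event("bob", "patch", "deployments", 200),
]

for risk, event in triage(events):
    print(risk, event.user, event.verb, event.resource, event.status)
```

In this example the routine read by alice scores zero and drops out, while the service account's denied attempt to read secrets rises to the top, ahead of bob's unfamiliar but successful patch.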

TL;DR

The holiday season is an important time for your business, and that amps up the pressure: business leaders expect all hands to be on deck, keeping your systems up and running to serve customers and perform essential operations. In other words, the watchword of the holiday season is availability. This drive for availability may even cut against your best practices, like using well-defined IaC modules to maintain security baselines. To maintain assurance about your organization’s security, you need tools that can spot changes in your cluster that stray from your security expectations and detect indicators of a hacker lurking in the wave of shoppers, trying to find a way in. You need tools that can speak native Kubernetes.

The KSOC platform provides customers visibility into their clusters, context around security issues, and flexibility in how to respond to them. For more information on KSOC’s cloud native Kubernetes security platform, schedule a demo.