Best practices to quickly recover from a cyberattack and get back on track
No matter how good the defense of your organization is, there is a risk that at some point you may be breached by an attacker who will manage to get into your environment and cause issues.
As in the army, a good commander is always prepared for the worst scenario. A good strategy is not only about how to win, but also about knowing what to do when you are in an adverse situation – how to withdraw, regroup and get back to battle as quickly and efficiently as possible.
The same should apply to your organization. You should always assume that your business will be impacted by a cybersecurity incident at some point. You need to know which actions must be taken, and by whom, to help your company recover and operate again.
Eight steps to preparing a robust defense strategy
This article summarizes some of the best practices that can be applied when preparing your recovery plans.
Best practice #1
Define your critical assets and prioritize the defense and recovery
When working with numerous organizations on resilience, a common observation emerges. Business owners all want to protect themselves against threats, but their awareness of what they really want to protect, and how to protect it, varies a lot.
As a business owner, before building any defense strategy, applying any control, or even building any recovery plan, you need to answer some fundamental (yet not easy) questions:
– What is the business priority of my organization?
– What are the most important assets to support the business?
– For how long can these assets be nonfunctional (if at all)? What is the recovery time objective (RTO), recovery point objective (RPO), and maximum tolerable outage (MTO) for these assets?
These questions seem very simple, yet they are crucial for building any defense strategy and recovery plan. An organization is unlikely to be able to protect all its assets, whether for reasons of technical feasibility or cost justification. Having a list of critical assets and their related business activities helps us formulate holistic, proportionate, and cost-effective recovery plans. Although the recovery planning exercise seems straightforward, it often leads to a heated debate within the organization about what is crucial to the business.
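A list of critical assets can be kept in a simple structured form so that the recovery order follows directly from the agreed objectives. The sketch below is only an illustration: the asset names, business activities and time values are hypothetical, and sorting by MTO first, then RTO, then RPO is one possible assumption about how to rank assets, not a prescribed method.

```python
from dataclasses import dataclass
from datetime import timedelta


@dataclass(frozen=True)
class Asset:
    name: str
    business_activity: str
    rto: timedelta  # recovery time objective
    rpo: timedelta  # recovery point objective
    mto: timedelta  # maximum tolerable outage


def recovery_order(assets):
    """Return assets sorted so the least outage-tolerant ones come first."""
    return sorted(assets, key=lambda a: (a.mto, a.rto, a.rpo))


# Hypothetical values; real figures must come from the business owners.
ASSETS = [
    Asset("intranet-wiki", "internal knowledge sharing",
          rto=timedelta(hours=48), rpo=timedelta(hours=24), mto=timedelta(days=5)),
    Asset("payment-gateway", "online sales",
          rto=timedelta(hours=1), rpo=timedelta(minutes=5), mto=timedelta(hours=4)),
    Asset("erp-database", "order fulfilment",
          rto=timedelta(hours=4), rpo=timedelta(hours=1), mto=timedelta(hours=12)),
]

for rank, asset in enumerate(recovery_order(ASSETS), start=1):
    print(f"{rank}. {asset.name} ({asset.business_activity}): "
          f"RTO={asset.rto}, RPO={asset.rpo}, MTO={asset.mto}")
```

Keeping the business activity next to each asset makes it easier to explain the ranking to the business, which owns the final prioritization.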
The general recommendation is to start the discussion by asking customers if they know what the critical assets to the business are. Getting people to focus on basics and having an objective discussion are key to defining a cost-effective defense strategy and recovery plan.
It is important to understand that the business defines target defense and recovery outcomes with support from the security team, not the other way round. It is the business that ultimately brings the revenue to the organization and knows what is best when it comes to prioritization and securing the company.
Best practice #2
Maintain documentation for the components and configurations of your environment
Although it may sound cliché, an environment that is well known and up to date is not only more resilient, but also easier to recover.
We may not be able to install even basic cybersecurity solutions if we don’t know what devices we have in the environment, or if they have been out of date for a long time. That’s why it’s crucial to keep an accurate inventory of the assets and their configuration, and to patch the systems regularly.
This may save us a lot of time and stress when implementing recovery actions under pressure from the business.
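As an illustration of keeping the inventory accurate, the following sketch flags records that look stale. The record fields (`host`, `last_patched`, `last_seen`), the sample hosts and the 30-day and 14-day thresholds are all assumptions made for this example; a real inventory tool will have its own schema and policy.

```python
from datetime import date, timedelta

# Hypothetical inventory records; field names are assumptions for this sketch.
INVENTORY = [
    {"host": "fin-db-01", "last_patched": date(2024, 5, 20), "last_seen": date(2024, 6, 1)},
    {"host": "hr-laptop-17", "last_patched": date(2024, 2, 10), "last_seen": date(2024, 6, 2)},
    {"host": "lab-printer-03", "last_patched": date(2024, 5, 25), "last_seen": date(2024, 4, 1)},
]


def stale_entries(inventory, today,
                  max_patch_age=timedelta(days=30),
                  max_unseen=timedelta(days=14)):
    """Return (host, reasons) pairs for records that need attention."""
    findings = []
    for record in inventory:
        reasons = []
        if today - record["last_patched"] > max_patch_age:
            reasons.append("patching overdue")
        if today - record["last_seen"] > max_unseen:
            reasons.append("not seen recently")
        if reasons:
            findings.append((record["host"], reasons))
    return findings


for host, reasons in stale_entries(INVENTORY, today=date(2024, 6, 3)):
    print(f"{host}: {', '.join(reasons)}")
```

A device that has not been seen recently may have been decommissioned without the inventory being updated, which is exactly the kind of gap that slows down recovery.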
Another important point to note: If your business continuity plan (BCP) includes an alternative site, the environment documentation should be prepared and updated on a regular basis as well.
Best practice #3
Update your recovery plans
As the threat landscape is constantly evolving, any significant change should be assessed carefully against the response strategy and recovery plans. It is recommended to schedule regular reviews and updates of the baselined plans.
However, if critical events with a high impact occur or are likely to occur, it is important to call for an urgent review and update of the recovery plans without delay. Examples include changes in business direction, priorities or risk appetite/tolerance, the development of new products or services, and mergers and acquisitions, as well as carve-outs.
For roles and responsibilities, it is very important to keep an up-to-date register of contact people. Personnel changes happen regularly, so the register should be updated whenever they occur.
Best practice #4
Test your recovery regularly
Recovery plans cannot just be prepared and put on the shelf until the attack happens. If that is the case, we may find out (a bit too late) that the plan is not applicable, or some mitigations are no longer working. Therefore, it is crucial to test your plans regularly.
Which brings us to our next question: How should plans be tested?
Testing of recovery plans varies and depends heavily on the individual organization, its profile and its budget. Recovery testing means committing time and resources, which is why some companies are reluctant to carry it out regularly.
However, if recovery plans are not properly tested, you don’t know for sure whether the recovery actions are in place and capable of giving you the protection you need. Nobody would feel comfortable working in an office where the evacuation plans are not tested and the employees don’t know what to do. Imagine finding out that the fire escape doesn’t work only when there is a fire. The same applies to cybersecurity – we need to know for sure that we will be able to recover quickly from a cybersecurity incident without endangering the existence of our organization.
What kinds of testing are available, and which one is the most suitable for my business?
There are two basic types of tests to consider, each with many subcategories:
- Tabletop exercises
This is a simple meeting to discuss the scenario step by step, find the gaps and make sure everyone knows their role in the process. Everything happens only in theory, so there is no risk of actual disruption to the business. The drawback is that participants may overlook some aspects until an actual disruption happens.
- Partial/full operational tests
These tests are an actual simulation of a disruption and help to determine whether the theoretical assumptions were correct. Here, we check whether we can meet the RTO, RPO and MTO as agreed. These tests need a budget, as they consume time and resources. They should also be carried out only if we are sure they will not disrupt the business. Otherwise, they will cause more harm than benefit.
It’s advisable to do the paper tests or tabletop exercises first to identify as many gaps as possible and close them before the actual operational tests happen.
Best practice #5
Measure, document and improve
How should we evaluate the readiness of our recovery?
Taking the prescribed steps to test the recovery plan is not enough. We need to know whether the recovery actions are effective and implemented quickly enough. For that purpose, we need to develop key performance indicators (KPIs) for the measurements. The KPIs will be based on the RTO, RPO and MTO that we defined earlier. This is why the discovery and assessment phases of recovery planning are so important.
All the results should be documented, whether the KPIs are met or not. Based on that, an improvement plan can be developed and applied if needed. The most important part of the review is drawing the proper conclusions, backed by evidence, and gaining confidence in the recovery plans.
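Comparing test results with the agreed objectives can be as simple as the sketch below. It assumes that the measured recovery time is compared with the RTO and the measured data-loss window with the RPO. The function name `evaluate_test` and all figures are hypothetical and serve only to show how a documented result might look.

```python
from datetime import timedelta


def evaluate_test(objectives, measured):
    """Compare measured recovery figures with the agreed objectives."""
    results = {}
    for kpi, target in objectives.items():
        actual = measured[kpi]
        # An objective is met when the measured value does not exceed the target.
        results[kpi] = {"target": target, "actual": actual, "met": actual <= target}
    return results


# Hypothetical objectives and measurements from one operational test.
OBJECTIVES = {"RTO": timedelta(hours=4), "RPO": timedelta(hours=1)}
MEASURED = {"RTO": timedelta(hours=5, minutes=30), "RPO": timedelta(minutes=45)}

report = evaluate_test(OBJECTIVES, MEASURED)
for kpi, row in report.items():
    status = "met" if row["met"] else "MISSED"
    print(f"{kpi}: target {row['target']}, actual {row['actual']} -> {status}")
```

Both met and missed objectives stay in the report, so the record can feed an improvement plan either way.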
Best practice #6
Do not put all your eggs in one basket
Funny as it may sound, this rule is still not always observed. We have encountered potential customers who had been hit by ransomware attacks while their backups were kept in the same location as their primary data. You can imagine what happened to the backups in that case.
There are plenty of options for where to keep backups of our data; they may be in the cloud or stored locally. However, we need to make sure that the backups are separated from the organization’s network and remain secure even if the company environment is compromised.
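One narrow aspect of this separation, whether a backup target sits inside a production network range, can be checked automatically. The sketch below uses Python's standard `ipaddress` module. The network range, target names and addresses are hypothetical, and an address outside the production range does not by itself prove isolation (shared credentials or routing can still expose the backups), so treat this as a first sanity check only.

```python
import ipaddress

# Hypothetical production network range and backup targets.
PRODUCTION_NETWORKS = [ipaddress.ip_network("10.20.0.0/16")]
BACKUP_TARGETS = {
    "nas-local": "10.20.5.40",
    "offsite-vault": "192.0.2.15",
}


def colocated_backups(targets, production_networks):
    """Return the names of backup targets that sit inside a production network."""
    flagged = []
    for name, address in targets.items():
        ip = ipaddress.ip_address(address)
        if any(ip in network for network in production_networks):
            flagged.append(name)
    return flagged


for name in colocated_backups(BACKUP_TARGETS, PRODUCTION_NETWORKS):
    print(f"Backup target '{name}' shares a network with production systems")
```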
Apart from traditional backups, some current security solutions may be very helpful in restoring configurations. Consider endpoint detection and response solutions with a copy-and-restore option, which record every configuration change and, if needed, can quickly help to revert those changes.
Best practice #7
Secure the necessary resources
Recovery, like the rest of the incident response process, takes considerable time, effort and money. Even if the organization has plans in place, it often does not have an additional budget for actual implementation in a live environment.
A good recovery plan contains not only recovery actions, but also the roles and responsibilities for those actions, the resources required for execution, and annual budget provisions for testing, reviewing and updating the plan. The size of the budget may be derived from past incidents or by comparison with similar incidents in the same industry. Although management is often reluctant to set aside a separate budget for incident response and recovery, the required budget and resources must be secured so that the necessary actions can be executed when an incident happens.
It may be a good idea to outsource at least part of the team responsible for response and recovery activities. In that case, the organization hands the activities over to experts and avoids the cost of hiring and retaining those resources itself.
Best practice #8
Keep up the recovery pace until full recovery
It is widely recognized that the pace of recovery often slows down as an incident goes on. Why is that so?
During the initial recovery from a cybersecurity incident, the organization is usually very stressed about the attack and its impact. In this phase, only time and safety matter. All teams are mobilized and focus their efforts on mitigating the incident. Once a minimum level of service is back in operation, the organization’s stress level drops and management starts shifting its focus to the cost of recovery. The people working on the recovery also get tired over time and need rest or time off to recuperate from the overtime worked during the initial phase of the incident.
With all these factors in play and a partial recovery completed, management may decide to skip some actions from the response and recovery plan that do not seem crucial or that would extend the recovery time. As the business was already hit, what worse could happen at this point? Lightning does not strike the same place twice, right?
This false sense of security after an initial recovery needs to be avoided, as it may jeopardize the success of a full recovery. It is important for the recovery team to keep up the pace in executing the planned actions, maintain the communication channels, and not skip any recommendations from the security team. We cannot be sure that we are secure until all the measures have been applied and the investigation is over.
To summarize, recovery is a process that can be planned and tested beforehand. Although we are unable to prepare for all possible recovery scenarios, it is better to have proportionate and cost-effective recovery plans in place.
The agreed recovery plan also needs clear roles and responsibilities defined, and comprehensive descriptions for the recovery actions to return to full business operations in a swift and safe manner.
Having an agreed recovery plan alone is not enough. Management also needs to commit to providing adequate resources and budgetary provisions to ensure regular testing and full execution of the recovery plan.
About the author
Global Head of Endpoint Threat Management
Gabriela Gorzycka is the Global Head of Endpoint Threat Management at Eviden.
In her role, Gabriela focuses mainly on the strategy and delivery management of cybersecurity services for over 150 customers from numerous locations worldwide.
Her main areas of focus are endpoint protection, data protection, and mobile and IoT security services, including Endpoint Detection and Response, Encryption and Data Loss Prevention. She has been a member of Atos’ Experts community, focusing her research and expertise on IoT and OT security.
She also has a strong process background (ITIL V3 Expert) and holds ISO/IEC 20000 (Service Management) and ISO/IEC 27001 certifications.