When I visit organizations, I'm often asked what should be logged, how long the data should be retained, and how it should be reviewed. These are interesting questions, as logging is very important for security and process improvement reasons.
A good starting point is NIST Special Publication 800-14 on securing computer systems, which covers audit trails from a high-level perspective. The intent of this article is to highlight and explore some of the audit log concepts laid out in that publication.
Before we begin, there is a very important caveat to bear in mind: log data doesn't help if it isn't reviewed! Today's systems often generate thousands of lines of data per day, if not per hour, so manual review isn't realistic. Log centralization and analysis tools should be used to alert automatically on certain conditions and to facilitate meaningful log review. Let's focus now on the details.
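As a toy illustration of what "automatically alert on certain conditions" can mean, a centralized pipeline can apply a set of rules to each incoming line. The rules and log format below are entirely hypothetical; real SIEM and aggregation tools offer far richer matching:

```python
import re

# Hypothetical alert rules: (name, compiled pattern) pairs.
ALERT_RULES = [
    ("failed_login", re.compile(r"authentication failure")),
    ("privilege_use", re.compile(r"sudo:")),
]

def scan(lines):
    """Return a list of (rule_name, line) alerts for matching lines."""
    alerts = []
    for line in lines:
        for name, pattern in ALERT_RULES:
            if pattern.search(line):
                alerts.append((name, line))
    return alerts
```

For example, `scan(["sshd: authentication failure for root"])` would return one `failed_login` alert. The point is not the implementation but that the alert conditions are explicit and reviewable.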
Inevitably, someone asks why event data should be logged on a given system. Essentially there are four categories of reasons:
- Accountability – Log data can identify what accounts are associated with certain events. This information then can be used to highlight where training and/or disciplinary actions are needed.
- Reconstruction – Log data can be reviewed chronologically to determine what was happening both before and during an event. For this to work, the accuracy of system clocks is critical; clocks should be regularly synchronized to a central time source (e.g. via NTP) so that date/time stamps line up across systems.
- Intrusion Detection – Unusual or unauthorized events can be detected through the review of log data, assuming that the correct data is being logged and reviewed. The definition of what constitutes unusual activity varies, but can include failed login attempts, login attempts outside of designated schedules, locked accounts, port sweeps, network activity levels, memory utilization, key file/data access, etc.
- Problem Detection – In the same way that log data can be used to identify security events, it can be used to identify operational problems that need to be addressed, such as the causal factors of failed jobs, resource utilization and capacity trends, and so on.
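To make the intrusion-detection bullet above concrete, here is a minimal sketch that flags accounts with repeated failed logins inside a sliding time window. The event format, window, and threshold are invented for illustration and would need tuning in practice:

```python
from collections import defaultdict
from datetime import datetime, timedelta

WINDOW = timedelta(minutes=5)   # hypothetical detection window
THRESHOLD = 3                   # failures per window before flagging

def flag_accounts(events):
    """events: iterable of (timestamp, account) failed-login tuples,
    assumed sorted by timestamp. Returns the set of flagged accounts."""
    recent = defaultdict(list)
    flagged = set()
    for ts, account in events:
        bucket = recent[account]
        bucket.append(ts)
        # Drop failures that fell outside the sliding window.
        while bucket and ts - bucket[0] > WINDOW:
            bucket.pop(0)
        if len(bucket) >= THRESHOLD:
            flagged.add(account)
    return flagged
```

An account with three failures in two minutes would be flagged; a single failure would not. The same windowing idea applies to port sweeps, lockouts, and the other conditions listed above.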
What to Log?
Essentially, for each system monitored and likely event condition there must be enough data logged for determinations to be made. At a minimum, you need to be able to answer the standard who, what and when questions.
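One common way to guarantee the who/what/when fields are actually captured is to emit structured entries (e.g. JSON) rather than free text. The field names below are purely illustrative, not a standard schema:

```python
import json
from datetime import datetime, timezone

def log_event(actor, action, target):
    """Build one structured audit entry answering who, what and when.
    Field names are illustrative, not any particular standard."""
    return json.dumps({
        "when": datetime.now(timezone.utc).isoformat(),  # assumes a synchronized clock
        "who": actor,
        "what": action,
        "target": target,
    })
```

A structured entry like this is also far easier for centralized analysis tools to parse and alert on than free-form text.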
The data logged must be retained long enough to answer questions, but not indefinitely. Storage space costs money and at a certain point, depending on the data, the cost of storage is greater than the probable value of the log data.
The same can be said for the performance degradation that log analysis tools suffer when data sets are simply allowed to grow indefinitely.
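Retention limits are usually enforced automatically rather than by hand. A minimal sketch, assuming dated log files in a directory and a purely hypothetical 90-day policy (the real number should come from the stakeholder discussions described later):

```python
import os
from datetime import datetime, timedelta

RETENTION_DAYS = 90  # hypothetical policy, not a recommendation

def expired(paths, now=None):
    """Return the paths whose last-modification time is past retention."""
    now = now or datetime.now()
    cutoff = now - timedelta(days=RETENTION_DAYS)
    return [p for p in paths
            if datetime.fromtimestamp(os.path.getmtime(p)) < cutoff]
```

In practice this job would also need to honor any legal-hold requirements and destroy the data in the manner the matrix specifies.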
Security of Logs
For the log data to be useful, it must be secured from unauthorized access and integrity problems. This means there should be proper segregation of duties between those who administer system/network accounts and those who can access the log data.
The idea is that no one person should be able to do both; otherwise the risk, real or perceived, is that an account could be created for malicious purposes, the activity performed, the account deleted, and the logs then altered to hide what happened. Bottom line: access to the logs must be restricted to ensure their integrity, which requires access controls as well as hardened systems.
Consideration must be given to the location of the logs as well. Moving logs to a central spot, or at least off the same platform that generates them, adds security in the event that a given platform fails or is compromised. In other words, if system X has a catastrophic failure and its log data lives only on X, the most recent log data may be lost. If X's data is also stored on Y, then when X fails the log data isn't lost and is immediately available for analysis. This applies to hosts within a data center as well as across data centers when geographic redundancy is viewed as important.
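At its simplest, the off-platform idea means every entry is written to two destinations. In production this is usually syslog forwarding or a shipping agent; in this toy sketch the "remote" collector is simulated by a second file path:

```python
def write_audit(entry, local_path, remote_path):
    """Append an audit entry both locally and to an off-host copy.
    remote_path stands in for a real remote collector (syslog server,
    shipping agent, etc.) in this simplified sketch."""
    for path in (local_path, remote_path):
        with open(path, "a") as f:
            f.write(entry + "\n")
```

If the local copy is destroyed or tampered with, the off-host copy remains for reconstruction, which is exactly the scenario described above.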
Pulling it All Together
The trick is to understand what will be logged for each system. Log review is a control put in place to mitigate risks to an acceptable level. The intent is to only log what is necessary and to be able to ensure that management agrees, which means talking to each system’s stakeholders. Be sure to involve IT operations, security, end-user support, the business and the legal department.
Work with the stakeholders and populate a matrix wherein each system is listed and then details are spelled out in terms of: what data must be logged for security and operational considerations, how long it will be retained, how it will be destroyed, who should have access, who will be responsible to review it, how often it will be reviewed and how the review will be evidenced. The latter is from a compliance perspective – if log reviews are a required control, how can they be evidenced to auditors?
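The matrix itself can live in a spreadsheet or a small script that generates one. A sketch with the columns described above; the system and all values in the sample row are invented, since real entries must come from each system's stakeholders:

```python
import csv
import io

COLUMNS = ["system", "data_logged", "retention", "destruction",
           "access", "reviewer", "frequency", "evidence"]

# Hypothetical sample row for illustration only.
ROWS = [
    {"system": "payroll-db", "data_logged": "logins, record changes",
     "retention": "1 year", "destruction": "secure wipe",
     "access": "security team", "reviewer": "security analyst",
     "frequency": "weekly", "evidence": "signed review ticket"},
]

def render_matrix(rows):
    """Render the log matrix as CSV text."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=COLUMNS)
    writer.writeheader()
    writer.writerows(rows)
    return buf.getvalue()
```

Keeping the matrix in a reviewable, versionable form makes it easier to show auditors exactly what was approved and when.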
Finally, be sure to get senior management to formally approve the matrix, associated policies and procedures. The idea is to be able to attest both that reviews are happening and that senior management agrees with the activity being performed.
Audit logs are beneficial to have for a number of reasons. To be effective, IT must understand log requirements for each system, then document what will be logged for each system and get management’s approval. This will reduce ambiguity over the details of logging and facilitate proper management.
Note: A sample log matrix template will be placed on the author’s website at http://www.spaffordconsulting.com/articles.html next to the link to this article.