At this point, three key terms should be clarified: 1) an incident is any deviation from the standard operation of a system that causes, or could cause, a service interruption; 2) a problem is the condition of having multiple similar incidents; and 3) a known error is the identified root cause of a problem.
Essentially, from ITIL we understand that there are two management forces at work. First, there is incident management, which is concerned with restoring service as quickly as possible, often using workarounds that address known errors. Second, problem management is geared toward both proactively and reactively addressing the underlying causal factors of incidents. Readers might want to review the ITIL Service Support volume's chapter on Problem Management to gain a better understanding.
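To make the three terms concrete, here is a minimal sketch of how they might be modeled in a tracking tool. It is only an illustration: the class names, the fields (`system`, `symptom`) and the threshold of two similar incidents are assumptions made for this example, not part of ITIL.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Incident:
    incident_id: str
    system: str   # affected system (hypothetical field)
    symptom: str  # short symptom category (hypothetical field)


@dataclass
class Problem:
    system: str
    symptom: str
    incidents: List[Incident]
    # The known error is the root cause; it stays empty until identified.
    known_error: Optional[str] = None


def find_problems(incidents, threshold=2):
    """Group incidents by (system, symptom).

    Any group with at least `threshold` similar incidents is treated
    as a problem. The threshold is an assumption for illustration.
    """
    groups = defaultdict(list)
    for incident in incidents:
        groups[(incident.system, incident.symptom)].append(incident)
    return [
        Problem(system=system, symptom=symptom, incidents=group)
        for (system, symptom), group in groups.items()
        if len(group) >= threshold
    ]
```

In this sketch, incident management would work on each `Incident` individually, while problem management would work on the `Problem` records and fill in `known_error` once root cause analysis is complete.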
As complexity increases, the percentage of total system understanding held by any one IT person decreases. This is because the level of expertise needed to build complex systems demands the involvement of multiple parties; there is no realistic alternative. Whether a system is developed entirely in-house, outsourced or some combination thereof, multiple people, and even multiple organizations, are involved.
Correspondingly, when incidents or problems occur, root cause analysis demands review by the parties with the appropriate expertise. For example, to build a large mission-critical clustered server, there will be involvement from the vendor(s) of the hardware, the software vendors, internal software development, IT engineering/release management teams, security, operations and so on.
Problem Review Boards (PRBs)
In the same manner that there are change advisory boards (CABs) for updates to production systems, there must be one or more parallel groups reviewing incidents to identify trends, define problems and ultimately determine root causes and mitigations.
Depending on the complexity of the organization, there may be one PRB overall or a PRB per system. For that matter, some organizations may be so small or simple that they do not need PRBs. In those cases, it is recommended that they still understand the ITIL Problem Resolution processes and adopt best practices into their organizations. For organizations with complex systems, regardless of size, the implementation of PRBs needs to be seriously considered.
The goal of the PRB is to govern problem management both reactively and proactively. This is done by analyzing incidents as they happen, reviewing historic trend data and staying abreast of current industry news and vendor updates. For example, a switch may not have failed yet, but your PRB may know of an operating system bug that has been identified and eliminated by another organization using the same switch. Hence, it would be advisable to assess the risks and determine the best means to modify the switch and mitigate incident risks proactively.
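The proactive side of the switch example can be sketched as a simple cross-check of vendor advisories against an asset inventory. This is a rough illustration under stated assumptions: the `Asset` and `Advisory` structures, their fields and the exact-match rule on model and OS version are all hypothetical, and a real PRB would draw on its own inventory and advisory sources.

```python
from dataclasses import dataclass
from typing import FrozenSet


@dataclass(frozen=True)
class Asset:
    name: str
    model: str
    os_version: str


@dataclass(frozen=True)
class Advisory:
    model: str
    affected_os_versions: FrozenSet[str]
    summary: str


def assets_at_risk(assets, advisories):
    """Return (asset, advisory) pairs for the PRB to review.

    An asset matches when its model equals the advisory's model and
    its OS version is listed as affected (an assumed matching rule).
    """
    hits = []
    for asset in assets:
        for advisory in advisories:
            if (asset.model == advisory.model
                    and asset.os_version in advisory.affected_os_versions):
                hits.append((asset, advisory))
    return hits
```

Each pair returned is a candidate for risk assessment before any incident has occurred; the decision on whether and how to modify the device remains with the PRB.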