Study Unit IT3 - System Defects
Copyright Notice: This material was written and published in Wales by
Derek J. Smith (Chartered Engineer). It forms part of a multifile e-learning
resource, and subject only to acknowledging Derek J. Smith's rights under
international copyright law to be identified as author may be freely downloaded
and printed off in single complete copies solely for the purposes of private
study and/or review. Commercial exploitation rights are reserved. The remote
hyperlinks have been selected for the academic appropriacy of their contents;
they were free of offensive and litigious content when selected, and will be
periodically checked to have remained so. Copyright © 2001-2004, Derek J. Smith (Chartered
Engineer).
First published [v1.0] 10:35 GMT 6th
February 2001; this version [v1.2 - new link] dated 08:00 BST 19th April 2004
This is the third of nine post-foundation study units making up the INFORMATICS e-learning resource published and supported by Derek J. Smith (Chartered Engineer). For further information, please e-mail me.
Unit Aims and Outcomes: There is no point in changing systems unless and until they are known to be faulty. This study unit therefore looks at the how and why of capturing and analysing system defect data. When you have completed it, you will be able to deploy with enhanced confidence and accuracy the specific skills and vocabulary listed below:
Specific Skills | Vocabulary
1. Treat problem solving as a cyclical behaviour requiring active management throughout | iteration; iterative process; post-implementation report; problem solving wheel
2. Design and maintain a helpdesk information system to record incidents and known faults; apply the "five whys" method of root cause analysis; devise optimal workmix | bug; defect analysis; incident handling; known faults log; Lord Bridges' Law; prioritisation and priority codes; proximate cause; root cause analysis; Service Level Agreement (SLA); system owner; workmix
3. Assess the impact of a currently defective system on the efficient discharge of organisational functions, express this in terms of unmet system requirement, cost this unmet requirement, and summarise your conclusions in a formal problem statement | Requirements Specification
Unit Structure: This unit contains three short lessons, each contributing to the overall unit outcomes, each with its own hyperlinked support material, and each with its own additional reading and tutorial task(s). Here is the learning sequence:
Lesson IT3.1: The Cyclical Nature of Problem Solving
Lesson IT3.2: IT Incidents and Known Faults
Lesson IT3.3: The Formal Problem Statement
Lesson IT3.1: The Cyclical Nature of Problem Solving
IT problem
solving is merely a subset of problem solving in general, and the key to doing
it successfully is to approach the process systematically. Problems are best
solved in stages, with each stage not only supporting and informing the next,
but providing a controlled exit point should the process need to be aborted.
Theorists differ slightly over exactly how many stages and substages should be
recognised and how to name them, but here is a typical eight-stage description
of what is involved:
Problem Definition Stages: The first three stages of problem solving serve to
crystallise your thoughts on exactly what it is you are trying to resolve.
Basically, they force you to calm down and check your facts, thus avoiding
misdirected, "knee-jerk", or other inappropriate responses. Problem
definition will be assisted by the use of effective incident handling and
defect analysis systems, as described in Lesson IT3.2. Here is a brief
description of what each stage sets out to achieve:
Step 1 - Identify Your Problem: This is where you first formally record the existence of a problem.
Step 2 - Gather Facts: This is where anything of relevance to that problem is researched and collated.
Step 3 - Analyse Those Facts: This is where the information collated in Step 2 is analysed, with a view to producing a precise diagnosis of what is going wrong and how much it is costing.
Project Feasibility Stages: The next two stages of problem solving derive a
solution to the problem which is both technically and commercially viable, a
topic dealt with in further detail in Unit IT4 and Unit IT5. Here is a brief
description of what each stage sets out to achieve:
Step 4 - Suggest Possible Solutions: If the problem is within your power to solve, then this is where you start to consider possible solutions (which may or may not include IT solutions).
Step 5 - Identify the Best Solution: This is where the real decision making comes in. It is where the full range of potential solutions is critically evaluated, and it is where the costs and the benefits of each option are weighed against each other, so that a final judgement can be made as to which option is best on balance.
Project Execution Stages: The final three stages of problem solving are where
the recommended solution is put into effect as a properly planned and managed
system development project. This is dealt with in further detail in Unit IT6. Here is a brief
description of what each stage sets out to achieve:
Step 6 - Plan Implementation: This is the resourcing stage. It is where plans are drawn up to deploy the resources likely to be available to deliver your recommended solution.
Step 7 - Build, Implement, and Test: This is where the solution is actually put together. This might involve anything from a small improvement to an existing system to a major system development exercise.
Step 8 - Reassess Situation: This is the last step. It is where you check how well what you did actually worked. You carry out what is known as a Post-Implementation Review, and you write up your findings in a Post-Implementation Report.
So much for the theory. In practice, of
course, as soon as you solve one problem, another, often a worse one, will
appear to take its place. So you end up going through the stages again,
and again, and again, in a continuous cycle of improvement. Indeed,
problem solving theorists have frequently portrayed problem solving as a series
of activities arranged around a wheel. They call this the problem solving wheel, and this is how it looks and works:
[Figure: The Problem Solving Wheel]
Problems are best solved in stages, as
shown diagrammatically in the Figure. You begin at the top and move
round clockwise until you get back to the top ... by which time you will have another
problem! Steps 1 to 3 of the IT problem solving
process are described in detail in the remainder of this study unit, and
Steps 4 to 8 in later units.
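For readers who find a cycle easier to follow in code than on a diagram, here is a minimal Python sketch of the wheel. The step names are taken from the eight stages listed above; the function name walk_wheel is simply an illustrative choice, not part of any standard method.

```python
from itertools import cycle, islice

# The eight stages named in this lesson, in clockwise order round the wheel.
STEPS = [
    "Identify Your Problem",
    "Gather Facts",
    "Analyse Those Facts",
    "Suggest Possible Solutions",
    "Identify the Best Solution",
    "Plan Implementation",
    "Build, Implement, and Test",
    "Reassess Situation",
]


def walk_wheel(n_steps):
    """Return the first n_steps stages visited, wrapping to Step 1 after Step 8."""
    return list(islice(cycle(STEPS), n_steps))


# Ten moves: one full turn of the wheel, then the start of the next problem.
for position, name in enumerate(walk_wheel(10)):
    print(f"Step {position % len(STEPS) + 1} - {name}")
```

The ninth move lands back on "Identify Your Problem", which is the whole point of the wheel: there is no final step.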
LESSON RATIONALE: And why does all this matter? Because the final problem solving stage will always read "start again from the top". When dealing with business systems, therefore, you should expect them to be in a state of more or less active improvement - with all the attendant chaos - for ever.
EXERCISES (AND STANDARD STUDY TIMES): Depending on how thoroughly you have been exploring the hyperlinks provided, it has probably taken you less than 30 minutes to read the foregoing text, and now you have to do some real work. Complete the following exercises, taking careful note of the expected study times:
IT3.1.1 | Consider five problems currently facing you in the workplace. How many are currently being worked on? What stages on the problem solving wheel are they at? What is the limiting factor in deciding how many problems can be solved simultaneously (i.e. if you are currently managing four active problems, why can't you manage five)? [30 minutes.]
IT3.1.2 | Obtain dictionary definitions of the words "iterate", "iteration", and "iterative", and get your IT Department contact to explain the term "development lifecycle". [30 minutes.]
IT3.1.3 | Research the modern management concept of "continuous improvement". [30 minutes.]
Submitting Exercises for Assessment and Feedback (Fee-Paying Clients Only): Simply e-mail your answer(s) for full tutorial feedback. State each conclusion clearly, and briefly explain how you arrived at it. You may do this one exercise at a time, or all at once. Additional questions may then be asked, and additional tasks given as required. Please cooperate with this student-tutor exchange, because it will eventually form the basis of your individual student progress record. Do not proceed to Lesson IT3.2 until all the tutorial tasks are completed and signed off.
Lesson IT3.2: IT Incidents and Known Faults
We now move to
problem solving specifically in an IT context, and the central point of this lesson
is that IT problems can only be reliably identified if system performance has
been formally monitored. This means that incidents and known faults logging
systems (often referred to as "helpdesk systems") need
to have been in place for long enough to accumulate reliable data. Of course,
you only need to be concerned with logging service provision levels if you are
the "owner" of the system in question, and you will probably not be
the owner of any of your organisation's large corporate systems. However, you
may well be the owner of one or more small departmental systems, and the
comments in the remainder of this unit presume that you are responsible for at
least one such small local system.
The
Rule: You must formally monitor and record system performance.
There are a number of software packages on the market to help you do this vital
IT administration task, but because they themselves can be quite expensive they
are best left to larger departments where there are dedicated helpdesk staff.
For smaller workplaces, you can get away quite comfortably with a simple
pen-and-paper system, or perhaps a PC spreadsheet.
The main things
you have to record are incidents, that is to say, individual occurrences
of departures from specified functionality or level of performance. Each incident
report should record such things as date, time, who reported the incident,
who took the call, who was affected, what they did to recover from it, and how
long it all took. The accumulation of incident reports over time gives you an incident
log. All incidents should be recorded, especially if they are regular
failures, and even if they are cured immediately as part of a routine
corrective procedure (because only then can a true picture of defect costs be
built up).
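If you choose the spreadsheet route, the incident report described above maps naturally onto one row per incident. The following is a minimal Python sketch, assuming one column for each item listed in the paragraph; the field names and the sample entry are made up for illustration and should be adapted to your own system.

```python
import csv
import io
from dataclasses import asdict, dataclass, fields
from datetime import datetime


@dataclass
class IncidentReport:
    """One row of the incident log (field names are illustrative assumptions)."""
    occurred_at: datetime   # date and time of the incident
    reported_by: str        # who reported the incident
    call_taken_by: str      # who took the call
    affected: str           # who was affected
    recovery_action: str    # what they did to recover from it
    minutes_lost: int       # how long it all took


def write_incident_log(reports, handle):
    """Write the reports as CSV, a format any PC spreadsheet can open."""
    writer = csv.DictWriter(handle, fieldnames=[f.name for f in fields(IncidentReport)])
    writer.writeheader()
    for report in reports:
        writer.writerow(asdict(report))


# A made-up sample entry, written to an in-memory buffer rather than a real file.
log = [
    IncidentReport(datetime(2004, 4, 19, 8, 0), "A. User", "B. Helper",
                   "Front office", "Restarted the PC", 15),
]
buffer = io.StringIO()
write_incident_log(log, buffer)
print(buffer.getvalue())
```

Recording the time lost as a plain number of minutes is a deliberate choice here, because it makes the later costing of each defect a matter of simple addition.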
Key
Illustration: A system might only have one fault, but it
might strike once a day. Another system might have 365 faults, but they might
each only go wrong once a year. Both systems will have 365 incident reports
in their incidents log at the end of one year.
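The illustration can be checked with a few lines of Python. This is only a sketch: the fault labels are invented, and each list entry stands for one incident report naming the fault behind it.

```python
from collections import Counter

# System A: a single fault that strikes every day of a 365-day year.
system_a = ["FAULT-001"] * 365
# System B: 365 different faults, each striking once in the year.
system_b = [f"FAULT-{n:03d}" for n in range(1, 366)]

for name, incidents in (("A", system_a), ("B", system_b)):
    faults = Counter(incidents)  # incidents grouped by underlying fault
    print(f"System {name}: {len(incidents)} incidents, {len(faults)} distinct fault(s)")
```

The two incident logs are the same length, but only the grouping by fault reveals how different the two systems really are.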
Key
Document - Service Level Agreements (SLA): These are
contractually binding agreements between the providers and purchasers of a
service. They specify what is and what is not acceptable in such areas as
system availability and response time. Their use is recommended whenever
the support people are organisationally distant from those who have to rely on
them, and essential when support has been contracted out to external
agencies.
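Once an SLA sets a target for system availability, checking whether the target was met is simple arithmetic. Here is a minimal Python sketch; the 99% target, the working hours, and the downtime figure are all assumed values chosen for illustration, not taken from any real agreement.

```python
def availability_percent(scheduled_minutes, downtime_minutes):
    """Percentage of the scheduled service time for which the system was usable."""
    return 100.0 * (scheduled_minutes - downtime_minutes) / scheduled_minutes


def meets_sla(scheduled_minutes, downtime_minutes, target_percent):
    """True if measured availability reaches the target agreed in the SLA."""
    return availability_percent(scheduled_minutes, downtime_minutes) >= target_percent


# Assumed month: 20 working days of 8 hours, with 120 minutes of recorded downtime.
scheduled = 20 * 8 * 60
print(round(availability_percent(scheduled, 120), 2))  # 98.75
print(meets_sla(scheduled, 120, target_percent=99.0))  # False
```

The downtime figure would normally come straight from the incident log, which is one more reason for keeping that log complete.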
The incidents
log then needs to be periodically analysed, looking for the underlying system
defects - or "bugs" - and each defect should then be recorded in
a known faults log. This process is known as defect analysis, and
the known faults log serves two important purposes:
Incident Handling: The known faults log is
a prime source of instruction for new system users, because it shows how to get
around incidents when they occur. Again this does not have to be a big system -
perhaps a simple advisory notice kept close to an existing PC, or next to the
telephone.
Planned Upgrade: The known faults log is also - for obvious reasons - the primary document for helping you decide what needs to be repaired. Simply follow Lord Bridges' Law, which reads:
Defect analysis
can sometimes require considerable detective work. You begin by accumulating
totals and averages, and you then go progressively further afield, looking for
"knock on" costs in other departments. It is also worth finding out
how others have attempted to solve the problem in the past, why they failed,
whether others are currently "on the case" at other sites, and so on,
and so on. Remember that the most compelling statistic is how much your problem
is costing in staff time per month, and be particularly suspicious as to why
the problem has not been solved before, because it may well be that it is
actually too difficult or too expensive to solve.
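The first step of defect analysis, accumulating totals and averages, can be sketched in Python as follows. The incident data, the fault descriptions, and the hourly staff cost are all hypothetical, and the sketch assumes that each incident has already been attributed to an underlying fault.

```python
from collections import defaultdict

# Hypothetical one-month extract from an incident log: (fault, staff minutes lost).
incidents = [
    ("printer queue stalls", 20),
    ("printer queue stalls", 30),
    ("report totals wrong", 90),
    ("printer queue stalls", 10),
]
HOURLY_STAFF_COST = 15.00  # assumed figure; substitute your own


def analyse(incidents, hourly_cost):
    """Summarise incidents per fault: count, total and average minutes, monthly cost."""
    minutes_by_fault = defaultdict(list)
    for fault, minutes in incidents:
        minutes_by_fault[fault].append(minutes)
    summary = {}
    for fault, minutes in minutes_by_fault.items():
        total = sum(minutes)
        summary[fault] = {
            "incidents": len(minutes),
            "total_minutes": total,
            "average_minutes": total / len(minutes),
            "monthly_cost": round(total / 60 * hourly_cost, 2),
        }
    return summary


known_faults = analyse(incidents, HOURLY_STAFF_COST)
# List the most expensive fault first, since that is the most compelling statistic.
for fault, row in sorted(known_faults.items(),
                         key=lambda item: item[1]["monthly_cost"], reverse=True):
    print(fault, row)
```

Note that in this made-up data the fault with the fewest incidents is the most expensive one, which is why incident counts alone are a poor guide to priorities.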
It is also
informative to get to the bottom of your problem, because it did not just
happen: it was at best allowed to happen, and at worst caused.
The problem here is learning how to distinguish between proximate (or
"proximal" or "immediate") cause and root cause.
This is a distinction first developed by lawyers, and means looking not so much
at what happened in the seconds leading up to an accident, but at what had gone
before. The proximate cause of injury in a road accident, for example, is
physical impact, but that explains little. A slightly less proximate cause
might be drunkenness on the part of the driver, which might, in turn, have been
due to depression, which might, in turn, have been due to unemployment, which
might, in turn, have been due to poor education, and so on. So the problem with
root cause analysis is knowing how far to go, and for the purposes of
this course, we recommend the method of "the five whys" (see,
for example, Pojasek, 2000/2003 online).
Further advice on root cause analysis in healthcare is available from Medical
Risk Management Associates, and
further worked examples are available here.
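The road accident example above can be written out as a chain of causes, which makes the stopping rule easy to see. This is a minimal Python sketch, assuming that the method amounts to asking "why?" up to five times and treating the last answer as the working root cause; it is not a reproduction of any published procedure.

```python
# Each cause maps to the cause one step further back, as in the road accident example.
WHY = {
    "injury in a road accident": "physical impact",
    "physical impact": "drunkenness on the part of the driver",
    "drunkenness on the part of the driver": "depression",
    "depression": "unemployment",
    "unemployment": "poor education",
}


def five_whys(problem, why, limit=5):
    """Follow the chain of causes, asking "why?" at most `limit` times."""
    chain = [problem]
    while len(chain) <= limit and chain[-1] in why:
        chain.append(why[chain[-1]])
    return chain


chain = five_whys("injury in a road accident", WHY)
print("Proximate cause:", chain[1])
print("Root cause after", len(chain) - 1, "whys:", chain[-1])
```

The limit is what keeps the analysis from running on for ever: without it, the "and so on" in the example would never end.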
LESSON RATIONALE: And why does all this matter? Because an important part of producing a sound business case is to select the right problem in the first place.
EXERCISES (AND STANDARD STUDY TIMES): Depending on how thoroughly you have been exploring the hyperlinks provided, it has probably taken you less than 30 minutes to read the foregoing text, and now you have to do some real work. Complete the following exercises, taking careful note of the expected study times:
IT3.2.1 | Local departments frequently run small departmental systems to compensate for deficiencies in, or lack of access to, major corporate systems. Who "owns" these small systems? Who should run the incidents and known faults logging systems for them? What is the key strategic weakness in such an approach? [30 minutes.]
IT3.2.2 | Select one of your personal small systems, and, if you do not currently run an incidents log, produce one retrospectively to cover the last ten failures of that system. Produce (or, if you already have one, update) a comprehensive known faults log. [2 hours.]
IT3.2.3 | Get your IT Department contact to show you a Service Level Agreement and last month's systems metrics reports. [30 minutes.]
IT3.2.4 | Use the "five whys" method to explain (a) why the RMS Titanic sank [click for details], (b) why Custer got his men massacred at the Little Big Horn [click for story], and (c) why the functionality provided by your selected small system is not already provided by one of the major corporate systems. [30 minutes.]
Submitting Exercises for Assessment and Feedback (Fee-Paying Clients Only): Simply e-mail your answer(s) for full tutorial feedback. State each conclusion clearly, and briefly explain how you arrived at it. You may do this one exercise at a time, or all at once. Additional questions may then be asked, and additional tasks given as required. Please cooperate with this student-tutor exchange, because it will eventually form the basis of your individual student progress record. Do not proceed to Lesson IT3.3 until all the tutorial tasks are completed and signed off.
Lesson IT3.3: The Formal Problem Statement
Having
identified and costed your problem, you then need to inform your superiors,
because they may be blissfully unaware of it. This is often a problem in itself,
because senior managers have a different set of priorities. Many will not know
about your department at all, for example, and some will not have time to think
about your problems because they have enough of their own. The best way to
identify a problem to yourself is to mark it off in red highlighting pen on
your document flowchart, and the best way to identify it to others is to
describe what is going wrong in a few brief sentences of text. This calls for
some very precise wording, but can readily be taught by worked example, as
follows:
Skeleton
Problem Statements: Here is a suggested problem statement structure. When
specific details are inserted into the skeleton sentences at the points shown,
it gives a succinct statement of what is currently going wrong with your
system. Note the repeated reference to functionality, as taught in Unit IT2. There are three
basic versions of the statement, one to cope with computer systems which are making
mistakes, another to cope with computer systems which are doing what they were
originally designed to do, but now need updating because the outside world has
changed in some key respect, and a third to cope with manual systems.
Version A - IT
System Repair Required: "The
X system [you must name it,
and locate it precisely within your document flowchart] is a [state
system type] system, owned by [state department nominally responsible for
the system]. Its prime function is to [state prime function] and it does this by taking information from [cross-reference to the system input(s)
shown on the document flowchart] and
feeding it to [cross-reference
to the system output(s) shown on the document flowchart]. The detailed functionality is set out in Requirements
Specification dated [state
date], and the required level of service
is governed by a [state
whether formal or informal] Service Level
Agreement dated [state date] between [state departments involved].
Currently [sometimes only
"imminently"] a [summarise fault or faults] mean(s) that the system is unable to deliver the [state defective function or functions] function(s), resulting in [state problem frequency] costly [state nature of incident]."
Version B - IT
System Enhancement Required: THE FIRST THREE SENTENCES ARE THE SAME AS
VERSION A, THEN ..... "However,
these documents fail to take account of [state change in policy], and
the system [is already/will
become] unable to deliver the [state defective function or functions] function(s), resulting in [state predicted problem frequency] costly [state nature of incident]."
Version C -
Manual System: "The X system [you must name it, and locate it precisely
within your document flowchart] is a manually
operated business system, owned by [state department nominally responsible for the system]. Its prime function is to [state prime function] and it does this by taking information from [cross-reference to the system input(s)
shown on the document flowchart] and
feeding it to [cross-reference
to the system output(s) shown on the document flowchart]. There is no Requirements Specification and no
Service Level Agreement. Currently [sometimes only "imminently"] a [summarise
fault or faults] mean(s) that the system
is unable to deliver the [state
defective function or functions]
function(s), resulting in [state
problem frequency] costly [state nature of incident]."
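A skeleton of this kind behaves like a fill-in-the-blanks template, and it can be useful to have a quick check that no slot has been left empty. Here is a minimal Python sketch based on Version C. The slot names, the lightly abridged wording, and the sample details are illustrative assumptions, not an official form.

```python
from string import Template

# Version C skeleton with one named slot for each bracketed instruction.
VERSION_C = Template(
    "The $name system is a manually operated business system, owned by $owner. "
    "Its prime function is to $prime_function and it does this by taking "
    "information from $inputs and feeding it to $outputs. "
    "There is no Requirements Specification and no Service Level Agreement. "
    "Currently $faults mean(s) that the system is unable to deliver the "
    "$defective_functions function(s), resulting in $frequency costly $incident."
)


def problem_statement(details):
    """Fill the skeleton; substitute() raises KeyError if any slot is missing."""
    return VERSION_C.substitute(details)


# Hypothetical details, invented purely to show the mechanics.
details = {
    "name": "stationery ordering",
    "owner": "the Administration Department",
    "prime_function": "keep the office supplied with consumables",
    "inputs": "handwritten request slips",
    "outputs": "the weekly supplier order",
    "faults": "lost request slips",
    "defective_functions": "order consolidation",
    "frequency": "weekly",
    "incident": "emergency purchases",
}
statement = problem_statement(details)
print(statement)
```

The output still needs polishing by hand (for example, resolving each "mean(s)" and "function(s)"), but the check guarantees that every required detail has at least been supplied.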
The Final Forms: This is the sort of paragraph you will end up with
(the translation cannot always be totally word for word, so a little
resourcefulness is required):
Version A -
System Repair Required: "The
SEPSIS system [IT system names
are frequently acronyms, and frequently end -IS because the full name ends with
the words "information system"]
is a networked database system [the
basic strategies and platforms are outlined in Unit IT4] owned by the Finance Department. Its prime function
is to reduce the amount of money tied up in unofficial small stores, and it
achieves this by taking information from the bought ledger and stores systems,
and comparing it to spot-check audit data. The detailed functionality is set
out in a Requirements Specification dated 23rd July 1996, and the required
level of service is governed by an informal Service Level Agreement dated 17th
August 1996 between Mr So-and-So, the then Finance Director, and Mrs
Whats-her-Name, Head of IT. Currently, a serious programming error means that
the system is unable to deliver any of its outputs on time, resulting in impaired
strategic purchasing."
Version B -
System Enhancement Required: THE FIRST THREE SENTENCES ARE THE SAME AS
VERSION A, THEN ..... "However,
these documents fail to take account of the management reorganisation on 1st
January 1999, and the system is already unable to deliver any of its outputs on
time, resulting in impaired strategic purchasing."
Version C -
Manual System: "The patient
appointments system is a manually operated business system, owned by the XXX
Department. Its prime function is to inform patients when their next clinic is,
and it does this by taking information from clinicians' diaries and record
cards, and passing it to the departmental secretary for typing and posting.
There is no Requirements Specification and no Service Level Agreement.
Currently a number of factors prevent the system responding to
the resulting patient queries quickly enough, resulting in an average of 10
unnecessary DNAs (Did Not Attends, i.e. missed appointments) per week, at an
estimated annual cost to the department of £xxxxx."
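The annual cost figure in a statement like Version C is usually a simple multiplication of incident frequency by unit cost. Here is a minimal Python sketch of that arithmetic. The cost per missed appointment and the 52-week year are placeholder assumptions chosen only to make the calculation visible; they are not the figures behind the example above, whose cost is deliberately left blank.

```python
def annual_cost(incidents_per_week, cost_per_incident, weeks_per_year=52):
    """Estimated yearly cost of a recurring incident."""
    return incidents_per_week * weeks_per_year * cost_per_incident


# Assumed unit cost of one missed appointment; replace with your own costing.
ASSUMED_COST_PER_DNA = 50.00

print(annual_cost(10, ASSUMED_COST_PER_DNA))  # 26000.0
```

If your clinics do not run every week of the year, pass a smaller weeks_per_year so that the estimate is not overstated.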
LESSON RATIONALE: And why does all
this matter? Because another precondition of a sound business case is the
ability to describe the defect convincingly to those with the cash to cure it.
EXERCISES (AND STANDARD STUDY TIMES): Depending on how thoroughly you have been exploring the hyperlinks provided, it has probably taken you less than 30 minutes to read the foregoing text, and now you have to do some real work. Complete the following exercises, taking careful note of the expected study times:
IT3.3.1 | Get your IT Department contact to explain the derivation of some of their system acronyms. [30 minutes.]
IT3.3.2 | Get your IT Department contact to show you a Requirements Specification and a System Enhancement Request. Get them to comment also on the split between system maintenance expenditure (money spent running to stand still) versus system enhancement expenditure (money spent delivering new functionality). [30 minutes.]
Submitting Exercises for Assessment and Feedback (Fee-Paying Clients Only): Simply e-mail your answer(s) for full tutorial feedback. State each conclusion clearly, and briefly explain how you arrived at it. You may do this one exercise at a time, or all at once. Additional questions may then be asked, and additional tasks given as required. Please cooperate with this student-tutor exchange, because it will eventually form the basis of your individual student progress record. Do not proceed until all the tutorial tasks are completed and signed off.
Congratulations! You have reached
the end of Unit IT3 of the INFORMATICS programme. Click to proceed to Unit IT4, and good luck!
References
See the Master References List