Archive for the ‘Testing’ Category

Developers vs Testers

Wednesday, September 24th, 2008

This is a post that has been a while in the making because I can't seem to quite get it right. It is essentially the basis behind my recent post The Danger of Bad Tests, so I figured that now is the time. Developers make bad testers.

There, I said it. Everyone knows it. Some people even talk about it. But somehow we still manage to ignore it. Writing code does not qualify you to test code, in the same way that testing code does not qualify you to write it. This does not mean that developers should never have to test, nor that all testers should be exempt from writing code. Instead it means that it is time to stop pretending that we can save money by pressing developers into double duty.

A lot of times developers make bad testers simply because they don't want to test. This is the same reason that no one wants to work on maintenance projects consisting of mostly bug fixes. All developers want to be making the cool new feature, not reading someone else's code to figure out why it is broken or running through ten different variations of the same function to make sure it can handle all the edge cases. So, they do the bare minimum they need to do to get by and still be able to claim that a feature was tested, and then they return to the more interesting work of writing new code.

Of course, not all developers are seeking to simply scrape by. There are a number that genuinely want to release quality software, and they make sincere attempts to test new features to ensure they work. The problem is that developers know how the code should work, and test to make sure that it works the way they think it should. They continue refining their code until they achieve their desired result, then they try to hit a couple of outlier cases and claim completion. Only a few times have I ever encountered a situation where someone knowingly claimed a feature was complete when it wasn't (and this was always due to management plans that set an arbitrary code-complete date and then allowed for as much bug-fixing and testing as needed – a horrible plan that simply encouraged the developers to submit sloppy code and then fix it during the much longer testing and bug-fixing phase).

Finally, most developers become very attached to the feature as they wrote it. They simply don't conceive of a way in which someone could use their feature in a wrong or unexpected way, so it doesn't occur to them to test those cases. It doesn't occur to them to test what happens when you try to save a large file to an external drive and yank out the cord halfway through it. Not because they are dumb or otherwise unimaginative, but because they already know that if you pull the external drive out while a file is saving, the file will not get saved. Since you are trying to save the file, there is no reason to disconnect the hard drive until you are done. Developers will argue for hours about whether something is a bug or a user error because if it is a bug, it implies there is something wrong with their code. So when they test, they assume a perfect user, because a perfect user would never do something as dumb as disconnect a hard drive during a save. Unfortunately, we all know the perfect user does not exist, but somehow the myth persists.
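The yanked-cord case does not need real hardware to be tested. Here is a minimal sketch in Python, assuming a hypothetical save routine (`save_file`) that writes to a temporary file and renames it into place; the names and the temp-file strategy are my own illustration, not code from any project described here. The point is only that the "imperfect user" case can be simulated and asserted on.

```python
import os
import tempfile


def save_file(path, chunks):
    """Hypothetical save routine: write to a temp file, then rename into place."""
    folder = os.path.dirname(os.path.abspath(path))
    fd, temp_path = tempfile.mkstemp(dir=folder)
    try:
        with os.fdopen(fd, "wb") as handle:
            for chunk in chunks:
                handle.write(chunk)
        os.replace(temp_path, path)
    except Exception:
        os.remove(temp_path)  # never leave a half-written file behind
        raise


def chunks_until_unplugged():
    """Simulates the drive disappearing halfway through the save."""
    yield b"first half of the file"
    raise OSError("device disconnected")


def test_interrupted_save_leaves_no_partial_file():
    with tempfile.TemporaryDirectory() as folder:
        target = os.path.join(folder, "report.bin")
        try:
            save_file(target, chunks_until_unplugged())
        except OSError:
            pass  # the failure itself is expected
        else:
            raise AssertionError("save should have failed")
        # The interesting assertion: nothing partial is left on disk.
        assert os.listdir(folder) == []
```

A test like this encodes the question a tester would ask ("what is left behind when the save dies?") rather than the one a developer would ask ("does the save work?").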

Just like the mythical developer as tester.

The Danger of Bad Tests

Monday, September 22nd, 2008

Bad tests are worse than no tests at all. It is a simple fact that seems to go unstated or unnoticed on most projects. Someone decides that there must be a certain number of tests (or a certain percentage of code must be tested), and so the developers (who are already overworked and not suited to write good tests anyway) throw together a set of tests that ostensibly navigates the required functionality.

I am convinced that this seemingly unwritten rule is the reason that developers avoid writing tests. We have all been forced to spend hours wading through some test failure only to discover that whoever wrote it had obviously spent more time commenting the code than actually testing what was needed. Take, for example, my experience today. I had to fix a defect which allowed bad data to be written to our database, so I wrote some code to validate the data before it was written. It was a pretty simple fix as the code was already written; we had just missed an entry point. Satisfied that all was well, I checked in the code, and promptly broke the build.

Now, to be honest, this is a fairly common occurrence with me. We use an automated build system that detects check-ins and schedules builds immediately. Unfortunately, I have a habit of checking the differences in all my files before checking them in and including comments on the check-in to indicate what I had intended to do. This naturally takes some time, so I frequently end up spanning the automated build process. By the time I have completed my check-in, the build has already started, and will usually fail because of a dependency on an item I have not checked in yet. This has had the unfortunate side-effect of training me to ignore build errors. It's a bad habit, but the system is set up such that I expect a failure during my check-ins, so I wait for the second build before I worry about it.

In this case, the error was due to a test failing. Originally I had assumed the test was failing due to my slow check-in process. The test was in a related area, but was testing our handling of duplicates. After the second build failed, I realized that the test was failing because of my change. Looking at the test, I could see no reason for it to be failing due to the changes I had made. It claimed to be testing how the system saved duplicates – perfectly valid data in our case. I checked the validation code and sure enough, duplicates were allowed. Intrigued, I stepped into the debugger to see what was going on. Imagine my surprise when the validation failed!

Now I was in trouble. I had been relying on the existing validation code and if it was flagging valid data incorrectly, I would have to go back through all the validation rules and make sure they were written correctly. Not a task I was looking forward to. Fortunately, in addition to my bad habit of not trusting the build, I have also learned not to trust our automated tests. They were written with the goal of meeting a certain percent code-coverage, and have been shown to be fairly useless in terms of correctly evaluating the behavior of the system. So I started looking at the actual data that was being written, and as it turns out, the data was in fact invalid. The test, which was attempting to ensure that valid data could be written to the database, was writing invalid data. Not only that, but the test had been happily running and passing for over a year.
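One cheap defense against this kind of test is to have it check its own fixture before it does anything else. The following is a rough Python sketch under invented assumptions: the validation rule (`is_valid_record`), the in-memory `FakeDatabase`, and the record fields are all hypothetical stand-ins, not our actual system.

```python
def is_valid_record(record):
    """Hypothetical rule: a record needs a non-empty name and a quantity >= 0."""
    return bool(record.get("name")) and record.get("quantity", -1) >= 0


class FakeDatabase:
    """In-memory stand-in for the real database (an assumption for this sketch)."""

    def __init__(self):
        self.rows = []

    def save(self, record):
        self.rows.append(dict(record))


def check_duplicates_are_saved(record):
    """Save the same record twice and report how many rows were stored."""
    # Guard: refuse to run on data the validator itself would reject, so the
    # test cannot quietly pass (or fail) for the wrong reason.
    assert is_valid_record(record), "test fixture is not valid data"
    db = FakeDatabase()
    db.save(record)
    db.save(record)  # the duplicate
    return len(db.rows)
```

With the guard in place, a fixture such as `{"name": "", "quantity": 1}` fails immediately with a message pointing at the test data, instead of surfacing a year later as a mystery in someone else's check-in.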

False positives such as this cause two problems. The first problem was that the system was telling everyone that there was no defect in the way in which we handled duplicate data. In this case there was in fact an underlying problem that was being hidden by our use of invalid data. The problem had already been discovered in a released version and had been fixed, but this test never indicated that the problem existed, despite that being its reason for existence. The second problem is that it falsely indicated an error when there was none, causing a couple of hours of development time to be lost in a very tight schedule. Lapses such as this cause the situation that we are currently in on this project, where we have simply turned off a number of our automated tests because they are failing and no one can explain why.

Good tests are hard. They are hard to write, and they are hard to automate. Unfortunately, the dangers of writing bad tests far outweigh the cost of writing a good test, so most developers simply ignore writing tests unless they are forced to, and then development slows down as everyone spends their time chasing imaginary bugs.

Defect Severity

Wednesday, August 20th, 2008

I always find it annoying as a developer when I am assigned a “High” severity
defect only to discover that someone would like to have a label changed, or
thinks that the page layout is confusing, or any number of valid defects that
simply don't merit the attention that a high-severity item merits. I've found
testers (and developers) who use severity as a means to push a fix through the
process. Others seemingly assign severity relative to the other defects they
have found (“Hmm, this label is *really* confusing, probably about twice as
confusing as the last one so I'd better make this a high severity instead of a
medium severity…”). This really shouldn't be what severity is for; that is why
most defect-tracking systems also list a priority (there are plenty of low
severity items that for one reason or another should be done right away).

Priority and severity are often used interchangeably, but have different
meanings. The priority is usually meant as the relative importance of a
defect. This is usually a factor of the severity, number of complaints,
likelihood of occurring and ease of implementation. The priority is almost
always subjective, and usually the customer has the final say. If your end user
says you have to change the name of the “File” menu to say “Data”, inevitably in
your next release you will have a “Data” menu item.

Severity, on the other hand, tends to reflect the negative impact of the bug
on the system. A high severity defect has a highly negative impact on the
system or end user, while a low severity defect has a minor impact on the system
or end user. Severity, like priority, can be subjective, but I've found that a
fairly simple set of guidelines dramatically reduces the number of poorly
classified defects. Essentially the system breaks down like this:

  • Low Severity
    • Annoying or otherwise confusing to the end user. The behavior of the system
      is “correct”, simply not clear or not as expected.
  • Medium Severity
    • The operation of the system is not correct. There is a reasonable
      work-around that allows the end user to accomplish the desired
      functionality.
  • High Severity
    • The operation of the system is not correct, and there is no
      work-around.
  • Urgent
    • The defect causes a deployed system to crash.
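The guidelines above are simple enough to write down as a decision function. This is only an illustrative Python sketch: the `Severity` enum and the three yes/no questions are my own paraphrase of the list, not a feature of any particular defect-tracking system.

```python
from enum import Enum


class Severity(Enum):
    LOW = "Low"
    MEDIUM = "Medium"
    HIGH = "High"
    URGENT = "Urgent"


def classify_severity(crashes_deployed_system, behaves_incorrectly, has_workaround):
    """Map three yes/no answers about a defect onto the four severity levels."""
    if crashes_deployed_system:
        return Severity.URGENT
    if behaves_incorrectly:
        # Incorrect behavior: the work-around decides between Medium and High.
        return Severity.MEDIUM if has_workaround else Severity.HIGH
    # Correct but confusing or annoying behavior.
    return Severity.LOW
```

Under this sketch a confusing label (`classify_severity(False, False, False)`) can only ever come out as `Severity.LOW`, no matter how *really* confusing it is; how soon it gets fixed is a question for priority.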

These fairly simple guidelines allow almost everyone to operate from the same
page. Now when I see a high severity defect, I have a fairly good idea of what
to expect, and when the urgent defect comes across my desk, I know I'd better
stop what I'm doing and investigate right away.

Testing in Debug Mode

Thursday, August 14th, 2008

Here we go again. You might be able to tell given the tone of my posts here that I am in the middle of attempting to put a new release out the door. We are almost ready to send it out, but we keep running into roadblocks. The latest problem had to do with, you guessed it, testing in debug mode.

Essentially, the problem went something like this. The testers would fire up the latest build of our software and start testing it. Usually about 30 minutes into a testing session, they would get a seemingly random error and the software would crash. This behavior was not consistent. Sometimes the software would work fine, other times it wouldn't take longer than a few minutes before they would get the error. They wrote up all the information they had and sent it back to the developers. The developers fired up the code and waited. And waited. And kept waiting, but everything worked as designed.

Thinking that perhaps it was a machine problem, the code was moved to the test bed and the software fired up with a debugger attached. Still no problem. The developers declared the problem a transient issue and the defect was triaged from the release. Test cleaned the machine and installed the next build, and sure enough, they encountered the exact same behavior.

Eventually, we tracked our problem to a quirk in how .NET compiles debug and release modes. In debug mode, .NET will artificially extend the life of all variables until they fall out of scope. This ensures that any time a breakpoint or other event occurs that causes the debugger to break into the code, the developer can inspect the value of all variables that are currently in scope. In release mode (when run outside the debugger), .NET makes no such guarantees. In our particular case, we were not particularly careful with how we were disposing of some unmanaged objects in relation to some threading we were doing and could occasionally wind up in a situation where they could get disposed of before we were ready to dispose of them. Of course, this never occurred in debug mode, because the lifetimes of all the objects were artificially extended, but the second the testers installed a release version, it became all too painfully obvious.

Developers love to test in debug mode. They get to do all sorts of nice things like break the code, watch the debug statements fly by, inspect memory and variables on the fly, all great and useful stuff. Testers hate it when things are tested in debug mode. It's not what the end user runs, it doesn't behave the same way, it's not installed on a clean machine, all valid and accurate points. So when it comes time to test, the testers all carefully set up a clean machine, install the latest version and test as if they were users. The developers meanwhile update to the latest version of the code, hit compile and are up and running before the testers have even created their user accounts. The developers all race through their tests, declare the version “bug-free” and wait for the inevitable agreement from the testers.

The testers meanwhile are busy complaining about how the software doesn't work if the latest version of .NET isn't installed, or how it will crash if it is installed to a directory other than the default directory, and so on. This is why every version should be thoroughly tested in an environment as close to the production environment as possible. No compilers, no debuggers and certainly no debug versions. Let the developers run their tests in debug mode. It is far more productive, especially early on or when there are rapid development cycles, but always set aside some time for release mode testing. You never know what might fall out.

Controlling Installers

Wednesday, August 13th, 2008

This is a topic where I don't really know what common practice is, but I seem
to consistently encounter resistance when I suggest it. I am a firm believer
that any and all installers created should be controlled and versioned the same
as any other artifact of development. That means they go in source control. To
me, it just makes sense – that way at any point in time you can go back and grab
the exact bits that were built and (possibly) delivered. There is no substitute
for having the exact copy when an obscure bug report filters in.

Just recently we delivered a new version of our software to testing. Our
group has a very limited testing staff, and so it ended up being a day or two
before they were able to install and begin testing the software. Within 15
minutes of installing the software, we began to get all sorts of bug reports.
Nothing was working. Of course during this time, the development staff had
continued pressing on to meet their deadlines, and had not encountered any of
the issues being encountered by the test staff. Dutifully we reverted back to
the version of the software being run by test in an attempt to verify what was
occurring. Unfortunately, we don't control copies of our installers, so we had
to rebuild them. Once that was done, we installed and ran a few tests, but
everything seemed fine. We showed test what was happening and they agreed that
everything seemed OK. We decided to re-install the software on our test boxes.
We installed the software and sure enough, all the problems went away. A
miracle, right? Or maybe someone secretly fixed all the problems and delivered
them and that's what was built…

In general, there seem to be two lines of resistance to including the
installer in source control. The first is the belief that source control
doesn't handle binaries well. This may have been true many years ago, but any
mature source control system should handle binary files just as easily as text
files. When people claim theirs cannot, I immediately suggest they upgrade
their system. The second line of resistance seems to be that since the build is
deterministic and completely automated, it will always produce the exact same
result. In theory this is true, but in practice there are far too many
variables to rely on a bit-for-bit copy every time.
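The "it will always produce the exact same result" claim is easy to put to the test: hash the installer that was delivered and hash the one you just rebuilt. A small Python sketch, using only the standard `hashlib` module; the function names and the idea of comparing two file paths are illustrative assumptions, not part of our build system.

```python
import hashlib


def file_sha256(path, chunk_size=65536):
    """Return the SHA-256 hex digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def same_bits(delivered_path, rebuilt_path):
    """True only if the two installers are byte-for-byte identical."""
    return file_sha256(delivered_path) == file_sha256(rebuilt_path)
```

If `same_bits` ever returns `False` for a rebuild from the same source revision, the build is not as deterministic as everyone assumed, and only a stored copy of the delivered installer will show what test actually ran.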

For example, as it turns out, in our case the problem was due to a licensing
issue. We had received a new runtime license for some software we are using,
and had forgotten to put it on the build machine. The build incorporated the
old license (which had expired), and so after it was installed and test began to
run it, the software was unable to retrieve a valid license, and failed. Of
course, by this time, we had updated the license on the build machine, so when
we attempted to reproduce the issue by rebuilding our software, we built the
correct license into the software and voilà! Problem solved.

So, it is not enough to revert the source code back to the original state and
have an automated build; you really should be reverting the entire build machine
back to the state it was in when the software was built. That should guarantee
that you reliably build the exact same bits as when it was originally built.
But then again, if you are going to go to that much trouble, why not simply
control the installers themselves?

Testing for the Sake of Testing

Thursday, August 7th, 2008

So there you are, it’s the day before the next version of your application is scheduled to be released, and Joe discovers a bug.  After looking at it, you verify that sure enough, if you attempt to open a file that is already open, the software will use the data in memory instead of loading it back from disk.  The defect is not a show stopper, but it would be nice to fix and so Sue fires up her IDE and before you know it, she’s fixed the problem.  With the annoyance averted, the software is wrapped up and shipped out.  Everyone pats themselves on the back and starts planning the next version.

About a week later the customers start calling.  It seems that every time they open any new file, all their other open files are closed and all their data is lost.  They are understandably not happy, your boss is not happy, and everyone wants to know how, with all the testing that was done, such an obvious bug could have gone unnoticed.  Of course, the problem is that the bug didn’t exist when the software was tested; instead it was introduced accidentally and quite innocently because Sue didn’t have time to analyze all the entry paths to the method she was modifying.  Management demands that new processes be implemented to prevent this from ever happening again, and a whole new set of draconian procedures is put in place in the hopes that there will not be a next time.

Most of us have been in this situation at some point in time.  The details may change – sometimes the defect is huge and crashes the application and simply must be fixed, and other times it’s not even a defect, just something somebody didn’t like.  Unfortunately, as developers, when we encounter a problem, we feel a strong inclination to just “fix it” and sooner or later we give in to the temptation.  Just as inevitably though, when we do, Murphy rears up and bites us.

After spending so much time working on a piece of software, it is often difficult to accept that there are problems.  The first reaction is to always fix this “one last thing”, or tweak “one little setting” to make it better.  This is perfectly natural, but it is most often wrong.  Every line of code we write has an effect on the software we are modifying.  That is kind of the point of writing the code.  Sometimes the change is simple and the effects can be known completely (I changed the name of a local variable, but left its meaning the same).  Other times, the end result may be more complicated (I changed the return value from an integer error code to a boolean because I only saw a success or failure error code).  As developers, we tend to classify our changes as the former – if we think there are unintended consequences, we will continue to work until we have tracked them all down.  Testers, on the other hand, tend to view all changes as the latter and want to start all their testing over every time a new change is made.
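The error-code example is worth spelling out, because it looks harmless right up until an old caller meets the new return value. A contrived Python sketch (all names here are invented for illustration):

```python
SUCCESS = 0


def save_v1(data):
    """Original version: returns an integer error code, 0 meaning success."""
    return SUCCESS if data else 1


def save_v2(data):
    """Changed version: returns True on success, False on failure."""
    return bool(data)


def caller_reports_success(save, data):
    """An existing caller, written against the integer convention."""
    return save(data) == SUCCESS
```

With `save_v1`, `caller_reports_success(save_v1, "payload")` is `True`, as intended. With `save_v2` the same caller gets it exactly backwards: a successful save returns `True`, which is not equal to `0`, so it is reported as a failure, while a failed save returns `False`, which *is* equal to `0`, so it is reported as a success. Nothing in the changed function is wrong; the damage is entirely in code the developer never touched.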

Of course, a happy medium needs to be met.  Early on in the development cycle, the point of testing is to uncover problems for the developers to fix.  The job of testing is to make sure that things pretty much work the way they are supposed to.  As the release date approaches though, the purpose of testing changes.  It is no longer a general search for problems.  It becomes a very targeted investigation of the characteristics of the software.  How does it behave under load?  What happens when I press this button with this window open?  Does the system allow me to enter in a string where it should only allow a number?  The difference is subtle, but important.  Testers are no longer uncovering problems for the developers to fix, but are instead looking for problems the end-user may encounter.  Their focus has changed. 

A lot of people, when faced with this situation, want to know why the developers don’t simply fix the problems as they come up.  We shouldn’t be testing just for the sake of testing, or if testing has found them, then they should be fixed, the argument goes.   However, it is almost always better to know what problems you have up front and solve them in a controlled and consistent manner, as opposed to having to rush a fix out the door and not properly testing it.  It is very difficult to make a successful code change when faced with increasing schedule pressure.  Most defects are not worth the risk. 

In the end, don’t try to tempt Murphy for the sake of wanting to fix everything.  Objectively evaluate all your defects, and if you can’t justify the risk or you can’t justify delaying the release to properly test and fix it, let it go.  Fix it in the next version.  Put out a patch after you have enough time to properly address it.  Shorten your release cycle and ship a new version quicker.  Or, if it is important enough, delay the current version and do it right.  Test for the sake of testing.

It will work out much better in the long run.