Software Development Refactoring Guide
What is Refactoring?
To quote the expert:
"Refactoring is a controlled technique for improving the design of an existing code base. Its essence is applying a series of small behavior-preserving transformations, each of which "too small to be worth doing". However the cumulative effect of each of these transformations is quite significant. By doing them in small steps you reduce the risk of introducing errors. You also avoid having the system broken while you are carrying out the restructuring — which allows you to gradually refactor a system over an extended period of time."
In other words, if looking at the code in some class makes your stomach churn and you feel the urgent need to suck down some Pepto-Bismol and flee the building, there's a good chance that the code you are looking at is a good candidate for some Refactoring.
Accumulated cruft is also usually a good candidate for Refactoring. This happens in most code bases where change is ongoing. Things get added to a piece of code from a ticket or from a new requirement, but there is never enough time to do it "cleanly". This happens many times, and the next thing you know you are left with a pile of cruft that you dread looking at and working on.
This is intended to be a simple guide to some common Refactoring approaches to take while in "Preventive Maintenance Mode"; it is not intended to be a full guide to all of the Refactoring techniques and patterns. For more information on Refactoring, check out http://www.refactoring.com and read Refactoring: Improving the Design of Existing Code by Martin Fowler (available on Safari!).
A good summary of all of the Refactoring techniques: http://www.refactoring.com/catalog/index.html
There is also a new chapter dedicated to J2EE Refactorings in Core J2EE Patterns: Best Practices and Design Strategies, Second Edition (also available on Safari!).
Refactoring as a design technique
Refactoring is not only useful when maintaining code bases or dealing with introducing functionality to older code. Refactoring is a valid design technique that can lead to class designs with high cohesion and low coupling. Refactoring as a design aid is one of the principal methods in Test Driven Design and Extreme Programming.
How to go about it:
- Identifying what to refactor
- Tools to identify problem spots
- Before you Refactor
- Unit Testing
- Refactoring != new functionality
- Getting down to business
- List of Refactorings and how to apply them
Identifying what to refactor
Identifying what to refactor is somewhat subjective. However, the key to effective Refactoring is to do it in small steps.
Often one of the best opportunities to identify what to refactor, or at least to become aware of code that needs refactoring, is when adding new functionality to the code base.
When adding a new piece of functionality you may often think, "wouldn't it be easier to add this new call if this large if statement was refactored into its own method?" The key point here is that the functionality would remain the same, but the if statement could be isolated by using, say, Extract Method, so that the area of the code you are working on is easier to understand.
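As a rough illustration of that thought, here is a minimal Extract Method sketch. The `Order` class, the shipping rule, and every method name are made up for this example; the only point is that the "before" and "after" versions behave the same while the condition gets a name of its own.

```java
public class ExtractMethodSketch {

    // Hypothetical order type, invented for this illustration.
    public static class Order {
        final double total;
        final boolean rushDelivery;
        final String country;

        public Order(double total, boolean rushDelivery, String country) {
            this.total = total;
            this.rushDelivery = rushDelivery;
            this.country = country;
        }
    }

    // Before: the large condition sits inline and hides its intent.
    public static double shippingBefore(Order order) {
        if (order.total < 50.0 && !order.rushDelivery && "US".equals(order.country)) {
            return 4.99;
        }
        return 9.99;
    }

    // After: same behavior, but the condition now lives in a named method.
    public static double shippingAfter(Order order) {
        if (qualifiesForStandardShipping(order)) {
            return 4.99;
        }
        return 9.99;
    }

    // The extracted method: the name documents what the condition means.
    private static boolean qualifiesForStandardShipping(Order order) {
        return order.total < 50.0 && !order.rushDelivery && "US".equals(order.country);
    }

    public static void main(String[] args) {
        Order small = new Order(20.0, false, "US");
        Order rush = new Order(20.0, true, "US");
        // Both versions must agree for every input; that is the whole point.
        System.out.println(shippingBefore(small) == shippingAfter(small));
        System.out.println(shippingBefore(rush) == shippingAfter(rush));
    }
}
```

Once the condition has a name, a new call that depends on the same rule can reuse `qualifiesForStandardShipping` instead of copying the expression.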
In Object Oriented programming, a common goal is to have small, clear and concise methods, each with a well-defined responsibility. The same applies to the objects that these methods belong to. Responsibility and intent should be easy to understand. "Do one thing and do it well". Knowing this, and wanting to achieve it, look for methods or classes that are simply doing too many things, and break down the logic.
However, the BEST and easiest indicator of a much-needed Refactoring is DUPLICATION! Code duplication is _evil_. Say that one like Dr. Evil. It is usually the number one indication that code needs to be refactored.
There are two kinds of duplication. The first kind is the easy kind: identical pieces of code. This is easy to spot and usually easy to solve. Especially easy is code duplication in the same class or object. It gets trickier when the duplication is across objects or methods, but it is still easy to deal with. The second kind of duplication is harder to spot. This is the kind of duplication where the code is not identical, but the meaning and intent are. This is harder to deal with, but there are specific techniques that can be applied to deal with this kind of sneaky duplication.
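To make the second, sneakier kind concrete, here is a small sketch. The two "total" methods and their names are hypothetical; they look different line by line, yet both mean "add up the prices", so they can collapse into one shared method.

```java
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;

public class DuplicationSketch {

    // Written for invoices: index-based loop.
    public static int invoiceTotalCents(List<Integer> prices) {
        int total = 0;
        for (int i = 0; i < prices.size(); i++) {
            total = total + prices.get(i);
        }
        return total;
    }

    // Written later for shopping carts: different shape, same intent.
    public static int cartTotalCents(List<Integer> prices) {
        int sum = 0;
        Iterator<Integer> it = prices.iterator();
        while (it.hasNext()) {
            sum += it.next();
        }
        return sum;
    }

    // After the refactoring: one method that both callers can share.
    public static int totalCents(List<Integer> prices) {
        int total = 0;
        for (int price : prices) {
            total += price;
        }
        return total;
    }

    public static void main(String[] args) {
        List<Integer> prices = Arrays.asList(100, 200, 300);
        // All three agree, which is what lets the two originals be replaced.
        System.out.println(invoiceTotalCents(prices));
        System.out.println(cartTotalCents(prices));
        System.out.println(totalCents(prices));
    }
}
```

A text search will never flag these two methods as duplicates, which is why this kind usually only turns up when you read the code for intent.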
Tools to help you identify problem spots
There are tools that can help identify classes or methods that need to be refactored, beyond spotting them yourself. One of the most effective is a tool that reports the "Cyclomatic Complexity" of code. If you don't know what Cyclomatic Complexity is, here is a brief definition:
"This metric (Cyclomatic Complexity) is an indication of the number of 'linear' segments in a method (i.e. sections of code with no branches) and therefore can be used to determine the number of tests required to obtain complete coverage. It can also be used to indicate the psychological complexity of a method.
A method with no branches has a Cyclomatic Complexity of 1 since there is 1 arc. This number is incremented whenever a branch is encountered. In this implementation, statements that represent branching are defined as: 'for', 'while', 'do', 'if', 'case' (optional), 'catch' (optional) and the ternary operator (optional). The sum of Cyclomatic Complexities for methods in local classes is also included in the total for a method.
Cyclomatic Complexity is a procedural rather than an OO metric. However, it still has meaning for OO programs at the method level."
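The counting rule in that definition can be sketched in a few lines. This is a deliberately naive estimate and not how the Metrics plugin works: it scans plain source text for the branch keywords, so keywords inside strings or comments, or a `?` in a generic wildcard, would be miscounted. The class and method names are made up for this example.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class ComplexitySketch {

    // Branch keywords from the quoted definition. "do" is left out on purpose:
    // a do-while loop is already counted once through its closing "while".
    private static final Pattern BRANCH =
            Pattern.compile("\\b(for|while|if|case|catch)\\b|\\?");

    // Starts at 1 (a method with no branches) and adds 1 per branch found.
    public static int estimate(String methodSource) {
        int complexity = 1;
        Matcher m = BRANCH.matcher(methodSource);
        while (m.find()) {
            complexity++;
        }
        return complexity;
    }

    public static void main(String[] args) {
        String straightLine = "int area(int w, int h) { return w * h; }";
        String branchy =
                "int sumPositives(int[] xs) {"
              + "  int sum = 0;"
              + "  for (int x : xs) {"
              + "    if (x > 0) { sum += x; }"
              + "  }"
              + "  return sum;"
              + "}";
        System.out.println(estimate(straightLine)); // no branches: prints 1
        System.out.println(estimate(branchy));      // one for + one if: prints 3
    }
}
```

Read the second result as "at least three tests to cover every path": an empty array, a positive element, and a non-positive element.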
There is a handy plugin for Eclipse called Metrics, which you can download from here: http://www.teaminabox.co.uk/downloads/metrics/index.html (there is another plugin http://metrics.sourceforge.net/ for metrics)
Please read the docs or the website for how to install it (it's simple). Once installed and configured, this plugin will analyze your source code and report the results of the metrics it runs as tasks. This tool can generate a lot of output, so it's best to configure it to gather only a few metrics: Cyclomatic Complexity and, as a useful balance check, "Cohesion" and "Efferent Coupling".
Here are some baseline settings for the plugin (configured by going into Window -> Preferences and then selecting the Metrics node):
- In the Complexity Tab
  - Check Cyclomatic Complexity and set the Upper bound value to 3
  - Check the three checkboxes in the Options section
- In the Cohesion Tab
  - Enable both checkboxes and set the bounds of "Chidamber and Kemerer" to 40, and the "Henderson-Sellers (%)" to 50
- In the Miscellaneous Tab
  - Uncheck all checkboxes except "Efferent Coupling". Set this upper bound to 20.
To enable Metrics on a project, switch to the Package view, right-click the project you want to analyze, find the Metrics node, and check "Enable Metrics Gathering". After you hit OK here, the project will rebuild. Once it is finished you should see a number of new tasks added to your Task view. It is possible to filter out classes or packages by using regular expressions.
So, now that you have the plugin installed, how do you read it? Let's start with Cyclomatic Complexity. Look for descriptions where the Cyclomatic Complexity value is high. Cyclomatic Complexity is given on a method basis.
Here is another way to view these reports. You can generate HTML reports by right-clicking the project, choosing Export, then selecting Metrics. Choose the HTML option and save to a directory. Once the export finishes (it takes a minute or two), open the "index.html" file. The reports you want to see are under "Type Metrics" and "Method Metrics". In "Type Metrics", order by "Cyclomatic Complexity" and by "Efferent Couplings"; this shows which classes are the worst offenders in these two metric categories. In "Method Metrics", order by "Cyclomatic Complexity"; this shows which methods are the worst offenders in this metric.
Running these metrics should give you a pretty good idea of which areas look promising for Refactoring. The metrics alone will not be enough, however. To get even more focused, it's useful to do some analysis of the results of these metrics.
Before you Refactor
Now that you have metrics and have done an analysis of the tracker, you are almost ready to start Refactoring. Almost, because one of the most important steps is still left to do. This step is Unit Testing.
Unit Tests are your best friend
In an ideal world, you would already have a JUnit unit test that exercises the class or method you are about to refactor. Unfortunately, this isn't an ideal world. Before you can safely and confidently start a Refactoring you NEED to have a Unit test for the class you are going to change, in order to know whether something you have done affected the functionality of the class.
Without this step you are crossing the street with a blindfold. So, if you already have a Unit test, run it, make sure it is passing, identify if there are gaps in the unit test, and implement the tests if this is the case. Only then can you start to refactor.
If you don't have a Unit test, create one. It's always much harder to write a unit test after the fact, but it is possible. Watch out for side effects and boundary conditions. Your Unit test may end up looking more like a functional test, but you still need something to tell you that what you are doing is correct, and you are not adding functionality or introducing new bugs into the code base.
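Here is a minimal sketch of such an after-the-fact test. To keep it runnable on its own it uses a tiny hand-rolled `check` helper rather than JUnit; in a real project each check would be a JUnit test method. The `initials` method stands in for whatever legacy code you are about to refactor and is entirely hypothetical.

```java
public class CharacterizationTestSketch {

    // Hypothetical legacy method that is about to be refactored.
    public static String initials(String fullName) {
        if (fullName == null || fullName.trim().length() == 0) {
            return "";
        }
        String[] parts = fullName.trim().split("\\s+");
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < parts.length; i++) {
            sb.append(Character.toUpperCase(parts[i].charAt(0)));
        }
        return sb.toString();
    }

    // Stand-in for a JUnit assertEquals: fails loudly on any mismatch.
    static void check(String expected, String actual) {
        if (!expected.equals(actual)) {
            throw new AssertionError("expected <" + expected + "> but was <" + actual + ">");
        }
    }

    public static void main(String[] args) {
        // Pin down the current behavior first, boundary conditions included.
        check("MF", initials("Martin Fowler"));
        check("KB", initials("  kent   beck ")); // extra whitespace, lower case
        check("M", initials("Madonna"));         // single name
        check("", initials(""));                 // empty input
        check("", initials(null));               // null input
        System.out.println("all checks passed");
    }
}
```

The checks record what the code does today, not what it should do. If a refactoring step changes any of these answers, you have changed behavior and need to back up.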
The Rhythm
Once you have identified a Refactoring, either from having to add new functionality or by doing analysis using the metrics available to you, and you are armed with Unit tests, the process is then pretty rhythmic.
1. Run unit test
2. Perform Refactoring step
3. Compile
4. Fix compile errors
5. Run unit test
6. Test successful? Go to step 2 and repeat
7. Test failed? Undo Refactoring and try it again in smaller steps
That's it.
The Refactorings and how to apply them.
Again this is not a complete list of all the possible refactorings. It's just a list of the ones that are most common, and provide the best benefits.
A lot of the content in the refactorings section is borrowed from Refactoring. I've changed and simplified it where appropriate.
Please feel free to add comments and, in particular, handy examples that you've encountered while performing these or other refactorings in your code base.
- Extract Method
- Move Method
- Pull Up Method
- Replace Temp with Query
- Replace Method with Method Object
- Split Temporary Variable
- Extract Class
- Self Encapsulate Field
- Replace Array with Object
- Replace Magic Number with Symbolic Constant
- Encapsulate Collection
- Replace Type Code with Subclasses
- Decompose Conditional
- Consolidate Conditional Expression
- Replace Conditional with Polymorphism
- Introduce Null Object
- Rename Method
- Add Parameter
- Remove Parameter
- Separate Query from Modifier
- Parameterize Method
- Preserve Whole Object
- Introduce Parameter Object
- Remove Setting Method
- Replace Constructor with Factory Method
- Encapsulate Downcast
- Replace Error Code with Exception
- Extract Subclass
- Extract Superclass
- Extract Interface
- Form Template Method
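To give a feel for how small each step is, here is a hedged sketch of two entries from the list, Replace Magic Number with Symbolic Constant and Replace Temp with Query. The formulas, discount rates, and names are illustrative assumptions only; in both cases the "after" version computes exactly what the "before" version did.

```java
public class RefactoringCatalogSketch {

    // Replace Magic Number with Symbolic Constant: give the literal a name.
    static final double GRAVITATIONAL_ACCELERATION = 9.81;

    // Before: the reader has to guess what 9.81 means.
    public static double potentialEnergyBefore(double mass, double height) {
        return mass * 9.81 * height;
    }

    // After: the constant carries the meaning.
    public static double potentialEnergyAfter(double mass, double height) {
        return mass * GRAVITATIONAL_ACCELERATION * height;
    }

    // Replace Temp with Query, before: a temp holds an intermediate result.
    public static double priceBefore(int quantity, double itemPrice) {
        double basePrice = quantity * itemPrice;
        if (basePrice > 1000) {
            return basePrice * 0.95;
        }
        return basePrice * 0.98;
    }

    // After: the temp becomes a query method that other methods can reuse.
    public static double priceAfter(int quantity, double itemPrice) {
        if (basePrice(quantity, itemPrice) > 1000) {
            return basePrice(quantity, itemPrice) * 0.95;
        }
        return basePrice(quantity, itemPrice) * 0.98;
    }

    private static double basePrice(int quantity, double itemPrice) {
        return quantity * itemPrice;
    }

    public static void main(String[] args) {
        // Behavior must be unchanged by either refactoring.
        System.out.println(potentialEnergyBefore(2.0, 10.0) == potentialEnergyAfter(2.0, 10.0));
        System.out.println(priceBefore(10, 50.0) == priceAfter(10, 50.0));
        System.out.println(priceBefore(100, 50.0) == priceAfter(100, 50.0));
    }
}
```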
"Any fool can write code that a computer can understand. Good programmers write code that humans can understand." Martin Fowler
This page was last modified 06:47, 13 May 2006.