Often when performance testing in a black box environment, you are left with the onerous responsibility of reporting on response time performance.
A typical approach by performance testers is to rely on 95th percentiles, which is effectively a Service Level Agreement (SLA) saying that 95 percent of all my samples have a response time below 5 seconds. This is often specified as a Non Functional Requirement (NFR).
But so what? What does this really tell us? More importantly, if the SLA fails, you’re probably left hanging in the wind trying to explain how “bad” the failure was to a (now interested) Project Manager. On the flip side, if everything is green, how close were you to failing? Were you only an nth degree away? And what about comparing two different applications’ response time performance when they have different NFRs to begin with? Getting confused?
Enter the Apdex performance index. Apdex is a numerical measure of user satisfaction that can be built from metrics expressed via more traditional SLAs and/or NFRs. Its fundamental objective is to simplify the reporting of application response time measurements by making it possible to represent any such measurement using a common metric. You can report on it by extracting data from LoadRunner / JMeter and analysing it in Excel, as demonstrated below.

Read on to find out more about these scores and how they are calculated.
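The Apdex score is a unified measurement between 1.00 and 0.00, or in plain English: Excellent, Good, Fair, Poor and Unacceptable. This colour chart represents those values (the standard Apdex rating bands):
Excellent: 0.94 – 1.00
Good: 0.85 – 0.93
Fair: 0.70 – 0.84
Poor: 0.50 – 0.69
Unacceptable: 0.00 – 0.49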
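If you want the rating bands applied automatically rather than read off a chart, a minimal Python sketch is below. It assumes the standard Apdex band floors (0.94, 0.85, 0.70, 0.50); if your own colour chart uses different cut-offs, edit `RATING_BANDS`. The names here are illustrative, not part of any library.

```python
# Standard Apdex rating bands: (inclusive lower bound, label).
# Assumption: cut-offs follow the published Apdex bands; adjust to taste.
RATING_BANDS = [
    (0.94, "Excellent"),
    (0.85, "Good"),
    (0.70, "Fair"),
    (0.50, "Poor"),
    (0.00, "Unacceptable"),
]


def apdex_rating(score: float) -> str:
    """Map an Apdex score in [0.00, 1.00] to its qualitative rating."""
    if not 0.0 <= score <= 1.0:
        raise ValueError(f"Apdex score must be between 0 and 1, got {score}")
    # Scores are conventionally reported to two decimal places, so round
    # first to stop e.g. 0.9399999 slipping into the band below.
    rounded = round(score, 2)
    for lower_bound, label in RATING_BANDS:
        if rounded >= lower_bound:
            return label
    return "Unacceptable"  # not reachable thanks to the 0.00 band


print(apdex_rating(0.85))  # Good
print(apdex_rating(0.62))  # Poor
```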

You calculate the Apdex index by first counting the total number of samples, the number of samples that have ‘satisfied’ a criterion (e.g. your SLA of 5 seconds), the number of samples that are ‘tolerating’ (e.g. greater than 5 seconds but less than 10 seconds) and the number of samples that have failed (e.g. greater than 10 seconds). You then whack them into a formula as such:

$$\text{Apdex}_T = \frac{\text{Satisfied} + \text{Tolerating}/2}{\text{Total samples}}$$
And out pops an index between 0 and 1. Compare it to your colour chart and whammo! You now know how good or bad your application performance (in terms of response time) really is…
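To make the arithmetic concrete, here is a small Python sketch of that formula using the example thresholds above (satisfied up to 5 seconds, tolerating up to 10 seconds, failed beyond that). Treating a sample of exactly 10 seconds as tolerating is an assumption; the sample data is made up for illustration.

```python
from typing import Iterable


def apdex_score(response_times: Iterable[float],
                satisfied_threshold: float,
                tolerating_threshold: float) -> float:
    """Apdex = (satisfied + tolerating / 2) / total samples.

    satisfied:  response time <= satisfied_threshold
    tolerating: satisfied_threshold < response time <= tolerating_threshold
    failed:     response time > tolerating_threshold (counts as zero)
    """
    times = list(response_times)
    if not times:
        raise ValueError("need at least one sample")
    satisfied = sum(1 for t in times if t <= satisfied_threshold)
    tolerating = sum(1 for t in times
                     if satisfied_threshold < t <= tolerating_threshold)
    return (satisfied + tolerating / 2) / len(times)


# Ten made-up response times in seconds: 5 satisfied, 3 tolerating, 2 failed.
samples = [1.2, 2.8, 3.1, 4.9, 4.0, 6.5, 7.2, 9.9, 11.0, 15.3]
print(round(apdex_score(samples, 5.0, 10.0), 2))  # 0.65
```

Here (5 + 3/2) / 10 = 0.65, which lands in the Poor band.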
Apdex really is just a granular traffic light, but it also provides a single scale, which traffic lights or SLA red/green status do not. For example, if one app has an NFR of 30 seconds and another has one of 5 seconds, Apdex provides the same unit of measurement for both. So, if you sampled frequently enough, you could trend that index over time for improvement or degradation.
For example, app 1 scoring 0.85, 0.84, 0.83, 0.82 and app 2 scoring 0.85, 0.80, 0.79, 0.87 is a better basis for comparison than raw NFR/SLA seconds, where app 1 measured 30, 31, 32, 34 and app 2 measured 15, 13, 10, 13. The latter are hard to compare because they sit on different scales.
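Because both apps now share one scale, simple trend checks work across them. The sketch below reuses the example score series from the paragraph above; `net_change` and `steady_decline` are hypothetical helper names chosen for illustration (Python 3.9+ for the `list[float]` hints).

```python
# Example Apdex series per test run, taken from the paragraph above.
apdex_history = {
    "app 1": [0.85, 0.84, 0.83, 0.82],
    "app 2": [0.85, 0.80, 0.79, 0.87],
}


def net_change(scores: list[float]) -> float:
    """Change in Apdex between the first and the latest run."""
    return round(scores[-1] - scores[0], 2)


def steady_decline(scores: list[float]) -> bool:
    """True if every run scored lower than the run before it."""
    return all(later < earlier for earlier, later in zip(scores, scores[1:]))


for app, scores in apdex_history.items():
    print(app, net_change(scores), steady_decline(scores))
# app 1 -0.03 True
# app 2 0.02 False
```

App 1 is quietly degrading run on run, while app 2 is noisy but ends slightly ahead; that contrast is hard to see in raw seconds against different NFRs.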
You can also differentiate by T, the satisfied threshold, which is reported alongside the score. So if you see an Apdex_T score written as 0.85 [30] (i.e. T = 30 seconds), you know there is possibly a huge buffer factored into the tolerating band, perhaps reducing the effectiveness of the index. I prefer to set the satisfied threshold equal to the NFR, and the tolerating threshold to 1.5 x NFR.
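The difference between threshold policies is easy to see in numbers. This sketch compares the 1.5 x NFR tolerating bound preferred here with the factor of 4 used by the Apdex specification (frustrated above 4T), and formats a score in the "score [T]" notation. The function names are hypothetical.

```python
def thresholds(nfr_seconds: float, tolerating_factor: float = 1.5):
    """Return (satisfied, tolerating) upper bounds in seconds.

    Default follows the preference above: satisfied = NFR,
    tolerating up to 1.5 x NFR. The Apdex specification uses 4 x T.
    """
    return nfr_seconds, nfr_seconds * tolerating_factor


def format_apdex(score: float, t_seconds: float) -> str:
    """Render a score in 'score [T]' form, e.g. 0.85 [30]."""
    return f"{score:.2f} [{t_seconds:g}]"


print(thresholds(30))                       # (30, 45.0)
print(thresholds(30, tolerating_factor=4))  # (30, 120)
print(format_apdex(0.85, 30))               # 0.85 [30]
```

With T = 30 seconds, the specification's factor lets a 2 minute response still count as half-satisfied, which is the "huge buffer" concern; the 1.5 x factor caps tolerating at 45 seconds.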
Another technique I like to apply is to use the Apdex score in a risk management matrix as a weighting factor, i.e. traditional risk score / Apdex index = revised risk score.
That way, low Apdex scores are duly considered in any so-what decision about risk.
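As a quick illustration of that weighting, here is a minimal Python sketch. The clamp that stops an Apdex of 0 from dividing by zero is my own assumption, not part of the technique as described.

```python
def revised_risk(traditional_risk: float, apdex: float,
                 floor: float = 0.01) -> float:
    """Weight a risk score by Apdex: traditional risk / Apdex.

    A lower Apdex inflates the risk score. Apdex is clamped to a small
    floor (an assumption) so a score of 0 does not divide by zero.
    """
    return traditional_risk / max(apdex, floor)


print(revised_risk(10, 1.0))  # 10.0 (perfect Apdex leaves risk unchanged)
print(revised_risk(10, 0.5))  # 20.0 (Apdex of 0.5 doubles it)
```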
Plus, in generating an Apdex you get counts of the samples that were satisfied, tolerating or failing, which audiences such as Project Managers can readily understand. People really want to know how badly they failed an NFR/SLA, and Apdex is user friendly in this regard.
So how do you use Apdex with LoadRunner (or JMeter, or Grinder etc)?
Pretty simple actually. Export the response times from your measured transactions into a spreadsheet, then with a little array formula magic, spit out an automated report for inclusion in your test summaries.
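If you would rather script the per-transaction counting than build it with spreadsheet array formulas, the sketch below does the same job in Python over an exported CSV. It is not the attached spreadsheet. The defaults assume a JMeter CSV results file (a `label` column and an `elapsed` column in milliseconds); for a LoadRunner export you might pass the `trans_name` / `trans_resp_time` columns with `time_divisor=1.0`, assuming that export is already in seconds.

```python
import csv
import io
from collections import defaultdict


def apdex_by_transaction(csv_file, satisfied_s, tolerating_s,
                         name_col="label", time_col="elapsed",
                         time_divisor=1000.0):
    """Return {transaction: (apdex, satisfied, tolerating, failed)}.

    Defaults assume JMeter CSV output with 'elapsed' in milliseconds.
    """
    counts = defaultdict(lambda: [0, 0, 0])  # [satisfied, tolerating, failed]
    for row in csv.DictReader(csv_file):
        seconds = float(row[time_col]) / time_divisor
        if seconds <= satisfied_s:
            counts[row[name_col]][0] += 1
        elif seconds <= tolerating_s:
            counts[row[name_col]][1] += 1
        else:
            counts[row[name_col]][2] += 1
    report = {}
    for name, (sat, tol, fail) in counts.items():
        score = round((sat + tol / 2) / (sat + tol + fail), 2)
        report[name] = (score, sat, tol, fail)
    return report


# A tiny inline stand-in for an exported results file.
sample_csv = io.StringIO(
    "timeStamp,elapsed,label\n"
    "1,1200,Login\n"
    "2,6400,Login\n"
    "3,3100,Search\n"
    "4,12500,Search\n"
)
for name, (score, s, t, f) in apdex_by_transaction(sample_csv, 5, 10).items():
    print(f"{name}: {score:.2f} (satisfied={s}, tolerating={t}, failed={f})")
# Login: 0.75 (satisfied=1, tolerating=1, failed=0)
# Search: 0.50 (satisfied=1, tolerating=0, failed=1)
```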
I am attaching an example spreadsheet in Excel 2002 format, as this is probably the lowest common denominator in your work environment. For the ultra snazzy client, I prefer 2007, with its new COUNTIFS function (multiple criteria) and snappy looking traffic lights. You be the judge.

Download: apdex excel spreadsheet template


Thanks for that Tim. Highly useful and informative.
Hi Tim,
Excellent blog entry, it’s very helpful. Just to be sure, if I use the results of JMeter (instead of LoadRunner), is it correct to replace trans_name with label and trans_resp_time with average?
Thanks in advance!!
Jose
Correct Jose. As long as those columns line up you should be fine.
Thanks Tim!!!
excellent stuff, very useful info !
Very nice idea/concept that I can see being very useful for showing people in the real world. The next level of granularity, I suppose, would be not to break it down to transaction level but to script level, to show the script’s counter. Then take it one level further to show the application’s rating.
Any thoughts on this?
Yes, you could zoom out on granularity so to speak, and create it at the script level. Although this depends on how you write your scripts. I normally write multi-action scripts, so an apdex score across multiple transactions may not be as meaningful.
I think you’ve also touched on a question I’ve had with apdex, which is how do you aggregate (or average) scores, to go from transaction level up to application or system level. I’m not sure it’s wise to try and aggregate the apdex score…
I assume you’d have to take the application level from your requirements and then use the apdex score to rate the transactions within.
I assume it will be a good way to track transactions over time, and get an idea on how much better/worse the transaction is getting.
I suppose you could take the same approach as apdex, and measure how many transactions passed or nearly passed and then use that data to rate the application.
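The suggestion above, rating the application by how many of its transactions passed or nearly passed, can be sketched by reusing the Apdex formula one level up, over transactions instead of samples. The pass and near-pass levels below (the Good and Fair band floors) and the transaction names and scores are hypothetical choices for illustration, not a standard.

```python
def application_score(transaction_scores: dict[str, float],
                      pass_level: float = 0.85,
                      near_level: float = 0.70) -> float:
    """Apdex-style roll-up over transactions rather than samples.

    A transaction 'passed' if its own Apdex >= pass_level, 'nearly
    passed' if >= near_level, otherwise failed. Both levels are
    hypothetical defaults.
    """
    if not transaction_scores:
        raise ValueError("need at least one transaction")
    values = transaction_scores.values()
    passed = sum(1 for s in values if s >= pass_level)
    nearly = sum(1 for s in values if near_level <= s < pass_level)
    return (passed + nearly / 2) / len(transaction_scores)


# Hypothetical per-transaction scores for one application.
scores = {"Login": 0.91, "Search": 0.78, "Checkout": 0.55, "Logout": 0.97}
print(application_score(scores))  # 0.625
```

Two passes, one near-pass and one failure give (2 + 1/2) / 4 = 0.625. This sidesteps averaging scores directly, though it still hides how busy each transaction is.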
>>>> …of all my samples have a response time below 5 seconds. This is often specified as a Non Functional Requirement (NFR).
Incorrect!!!! NFR is an umbrella term used to refer to any non-functional requirement (not necessarily a performance requirement). NFRs would include stuff like security, usability, interoperability … basically anything that is non-functional. What is functional to an application is very specific to the application itself. Would performance be an NFR for a tool like LoadRunner (where performance is a functional requirement for it)?
While I agree in general with everything mentioned in this post, I suspect that equating performance with NFRs (saying it is often specified as one) is an oversimplification that is not valid.
Shrini
Not sure this was a discussion on NFRs Shrini.
I also mentioned a response time target is ‘often’ specified as a NFR. Subtext might read ‘amongst others’. Of course there are many things which might classify a non-functional requirement, but let’s not hang ourselves on syntax here. The discussion if any should be focused on the use of Apdex instead.