How many shoulder classification systems are enough?
Or rather, how many shoulder classification systems are too many?
Five sounds fair?
Ten seems a little bit too much?
and 18 would be… ridiculous right? RIGHT?!
Well folks, hold onto your shoulders.
In 2011 (15 years ago) a systematic review identified 18, yes EIGHTEEN, published classification systems in the existing literature, BEFORE they went on to develop their own. So 19 then. [1]
Over the years that number has continued to rise. Has this abundance of classification systems helped to improve the lives of people with shoulder instability? No.
Have the proposed rehabilitation regimes and treatment plans provided definitive answers on the best way to help all our patients? Not quite yet, more data needed.
Do I have all the answers? No.
Am I having a go? Definitely not.
Shoulder instability is a small world. My experience so far in this space has been positive. This is because those who I’ve engaged with have been welcoming, receptive to new ideas, giving of their time and willing to engage in the conversations. (Thanks to all if I’ve not said so already!) They are often very experienced and knowledgeable clinicians all of whom are trying to do what’s best for people with shoulder instability.
It’s important to remember that a lot of the concepts critiqued here came about as a result of trying to make sense of the chaos. They were the original ideas, innovations, fledgling hypotheses and the foundations from which other concepts could develop and compare themselves. Standing on the shoulders of giants and all that.
History tells us that our initial ideas of how the world works aren’t always correct the first time around, (think flat earth theory, milk blood transfusions, and cocaine for hay fever [2]). We need to continually revisit our assumptions and understanding. It’s difficult to do this when there’s no new or reliable information.
It seems like every time I get online, someone, somewhere has published a new shoulder instability classification system, rehabilitation plan or course.
I’m not against that. This is a complicated area. Anything that brings about more clarity and certainty can only be a good thing. But we have to be honest about the state of our clinical practice, the evidence that drives it, and whether these growing contributions are actually helping or even making things worse. Is the information really ‘new’? Is it fit for purpose?
To help answer this, in this article I want to put forward three main points:
* Existing classification systems are conceptually useful but practically useless.
* Research in this area is thin, not well joined up and does not follow the life of the patient.
* We seem to be repeating the same mistakes and it’s time to try something different.
Point 1 is the longest. If you make it through that, you’re on the home straight.
Why is this important?
There is a phenomenon in artificial intelligence (AI) known as iterative degradation or quality decay. You may have seen funny examples of it on the internet where ChatGPT is asked to create exact replicas of a picture over and over again. The final result ends up worse than the original (Figure 1). The problem arises because every time a ‘new updated version’ is fed back into the model, small errors and noise accumulate and compound.
Figure 1. ChatGPT is asked to make exact replicas of a picture of Dwayne ‘The Rock’ Johnson 101 times. Source: https://www.reddit.com/r/ChatGPT/comments/1kbj71z/i_tried_the_create_the_exact_replica_of_this/
Without new or accurate information to correct itself against, the system just gradually becomes a parody of its former self. Can you see where I am going with this?
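If you want to see the mechanism rather than take it on faith, a toy simulation makes the point. Below is a purely illustrative Python sketch (not how ChatGPT works internally): the ‘picture’ is just a list of numbers, and each ‘replica’ generation reproduces the previous copy with a little random error.

```python
import random

def copy_with_noise(signal, noise_sd=0.02, rng=random):
    """One 'replica' generation: reproduce each value with a small random error."""
    return [x + rng.gauss(0, noise_sd) for x in signal]

def rmse(a, b):
    """Root-mean-square difference between two signals."""
    return (sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)) ** 0.5

rng = random.Random(42)
original = [0.5] * 100             # a stand-in for pixel values
copy = original
drift = []
for generation in range(101):      # mirror the 101 replicas in Figure 1
    copy = copy_with_noise(copy, rng=rng)
    drift.append(rmse(copy, original))

# Because each generation copies the previous *copy* and never the original,
# the errors compound rather than averaging out.
print(f"after 1 generation:    RMSE = {drift[0]:.3f}")
print(f"after 101 generations: RMSE = {drift[-1]:.3f}")
```

With no external reference to correct against, the drift only grows, which is exactly the analogy being drawn here with classification systems built on top of earlier classification systems.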
At the patient level, this potentially results in them not getting the best care. Misdiagnosis and wrongfully attributing things as the cause of their problems may result in delays or withholding of needed investigations or treatments. It may also result in unnecessary or ineffective treatments being used, all of which make the overall healthcare burden on people larger and longer.
At the therapist level it makes entering the world of shoulder instability seem overly complex. As an exercise, I always ask myself, if I were a student trying to learn about this afresh, and all I had available to me was the published evidence, would I arrive at the same conclusions as someone else?
Is the data or evidence clear enough?
If I had to pick, which classification system or treatment plan would I use and why?
How much of current practice is a product of tacit knowledge and departmental or institutional norms? More recently, social media, courses, masterclasses and conference presentations are playing a larger role, and so we need to make sure we can help people discern the truth.
So here’s my attempt at doing that.
1. Existing classification systems are conceptually useful but practically useless.
The purpose of classification is to identify distinct groups to which people or things can be assigned on the basis of some predetermined measures or characteristics. For example in Figure 2 you could classify Lego blocks by their colour or their shape.
Figure 2. Lego blocks arranged by colour
Photo by Mourizal Zativa on Unsplash
These features (shape and colour) are practical, cheap and easy to observe (measure). You could argue that shape would be a more robust method for classifying Lego blocks, because people who are colour blind may misclassify blocks by colour, as in Figure 3. The main point here, however, is that any object could be classified in multiple ways depending on the measure or feature, or combination of measures and features, used.
Figure 3. Examples of changes in colour depending on type of colour blindness
Image taken from https://midtownvision.com/blog-posts/types-color-blindness
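The Lego point can be sketched in a few lines of code. This is a hypothetical example (Python, invented blocks, nothing clinical): the same objects partition into completely different groups depending on which feature you choose to classify by.

```python
from collections import defaultdict

# Hypothetical blocks, each described by two observable features.
blocks = [
    {"id": 1, "colour": "red",   "shape": "2x4"},
    {"id": 2, "colour": "red",   "shape": "2x2"},
    {"id": 3, "colour": "blue",  "shape": "2x4"},
    {"id": 4, "colour": "green", "shape": "2x2"},
]

def classify(items, feature):
    """Partition items into groups according to one chosen feature."""
    groups = defaultdict(list)
    for item in items:
        groups[item[feature]].append(item["id"])
    return dict(groups)

by_colour = classify(blocks, "colour")  # {'red': [1, 2], 'blue': [3], 'green': [4]}
by_shape  = classify(blocks, "shape")   # {'2x4': [1, 3], '2x2': [2, 4]}
```

Same blocks, two equally valid classifications. The usefulness of either depends entirely on whether the chosen feature can be measured reliably and serves the purpose at hand.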
Classification systems are therefore only as good as the measures on which they are based. If the measures or features aren’t reliable, are subject to interpretation or can’t be quantified, it may result in misclassification (like in the picture above).
We also need to consider, did we make the classification systems and then assign measures we think should best go with them (top down)? Or, did we let the data and its features objectively (and without bias) tell us if there are actually distinct or overlapping groups (bottom up)? My feeling is that in healthcare we too often do the former.
Another thing to consider is what purpose does the classification system we are using serve? Is it a conceptual and theoretical model to help us make sense of an incomplete understanding of physiological or biomechanical processes? Or, is it used as a definitive treatment algorithm based on objectively quantified measures or tests?
Are we guilty of confusing the two?
Do we erroneously give equal weight to both when we shouldn’t?
So what is actually needed for a classification system to be any good?
* The end treatment, investigation or management plan has to be different.
  * If all people, irrespective of category, get the same investigations or treatments, is the classification system serving a purpose?
* It has to put people in distinct groups, i.e. it has to be able to discriminate.
  * What if a person can simultaneously exist across all possible categories, each of which has a different investigation or treatment endpoint? Does the classification system have enough discriminatory ability to be considered suitable for practice?
* It needs reliable and accurate measurements.
* People need to agree on the way these measurements are used to combine or group people.
So do existing shoulder instability clinical classification systems work? Not really, especially when compared against imaging or surgery, which are usually considered the ‘gold standard’.
Here are some quick stats:
* Moroder et al, 2020 [3] reported that multidirectional instability was over-diagnosed: 10 to 20% of patients had bony changes in their shoulder despite their problem having been attributed to a ‘muscle co-ordination issue’.
* Jaggi et al, 2023 [4] found that 10% of participants were deemed unsuitable for the study when arthroscopy revealed capsulolabral damage or a bony injury.
* Clarke et al, 2024 [5] - summarised nicely by Adrian Davies [6] - showed an approximate error rate of 20% when it comes to confirming the direction of instability (in rugby players).
Note: I have also purposefully mixed ‘traumatic and atraumatic’ instability and at times move interchangeably between them. The misclassification problem is a reason for this. When I refer to shoulder instability I am talking about the less obvious cases, although the concept could be extended to any subgroup.
Now granted, there are some barn door obvious cases which arguably don’t need a classification system e.g. some traumatic dislocations accompanied by imaging. But as a clinician I’m usually more interested in the ones we don’t get right or are unclear, rather than the ones that are obvious. It appears we are getting it wrong for about 1 in every 10 (if you want to be optimistic) or 1 in 5 patients. Is that good enough?
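For what it’s worth, the arithmetic behind that ‘1 in 10 to 1 in 5’ framing is just disagreement with the gold standard divided by the total. A hypothetical sketch (Python; the counts are invented for illustration and are not taken from any of the cited studies):

```python
# Hypothetical cohort of 50: clinical classification vs arthroscopy 'gold standard'.
# All labels and counts below are invented for illustration only.
clinical = ["structural"] * 40 + ["non-structural"] * 10
gold     = ["structural"] * 40 + ["non-structural"] * 2 + ["structural"] * 8

disagreements = sum(c != g for c, g in zip(clinical, gold))
error_rate = disagreements / len(gold)
print(f"misclassified {disagreements}/{len(gold)} = {error_rate:.0%}")  # 8/50 = 16%
```

Simple to compute, but only if someone actually records both the clinical classification and the confirmed diagnosis, which is part of the argument that follows.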
Why is this happening? What is it that existing classification systems aren’t telling us? To help illustrate some of the points I’m going to use the Stanmore Triangle.
The Stanmore Triangle (reproduced below) frames shoulder instability as a dynamic or shifting condition, existing between 3 poles and 5 states (structural, non-structural, traumatic, atraumatic and muscle patterning - can you see why I said this may be overly complex?). The idea is that people can move along these poles and between states with varying levels of each state (basically along the lines of the triangle and across to other corners or poles).
Reproduced from Lewis, A., Kitamura, T., & Bayley, J. I. L. (2004). (ii) The classification of shoulder instability: new light through old windows!. Current Orthopaedics, 18(2), 97-108. https://doi.org/10.1016/j.cuor.2004.04.002
Seems like a handy framework right? But what happens when we get into the nuts and bolts of it? Can it work practically? Does it do all of those things that are needed for a classification system to be any good? Does it put people in mutually exclusive categories? If the management options are different, but you can’t accurately identify who needs what, then what purpose does it serve?
It took me a while to recognise where I had seen this model before. It’s the same one used in theology to depict the Holy Trinity, and it’s been keeping theologians busy for thousands of years as they try to depict (without committing heresy) an entity that is simultaneously all things at once and yet three distinctly different things at the same time. Somehow we’ve decided it’s a good model for shoulder instability. I’ve also seen people use triangle models to describe mixes of emotions, e.g. excitement, nervousness and cynicism about an upcoming event. A triangle model in that case also makes sense: having a combination of different emotions and transitioning between them regularly seems a conceptually helpful way to illustrate how someone feels.
So it seems triangle models are helpful for concepts but not necessarily clinical decision making. At the end of the day we are therapists not theologians, practitioners not philosophers.
There are a few more things we need to consider. Irrespective of the shape (although I have seen someone propose the cube of shoulder instability to try to resolve this), the measures on which the classification is based are arguably the most important part. For example, what does muscle patterning actually mean? Are we talking about a co-ordination issue? How would we detect this? What if the muscle patterning is a consequence of changes to the underlying structure rather than the cause? As things stand there’s not enough high quality evidence (based on state of the art 3D and surface electromyography measures) to differentiate between movement patterns that could be considered:
* normal variation within a population
* different but an adaptation for maintaining stability based on changes to the joint from instability and
* different and resulting in / causing instability.
If we can’t yet make this distinction with equipment that allows us to go back and look at the different joints and muscles at a pace we can comprehend and revisit, how then can we do this in real time, with our eyeballs during clinic?
There are of course other classification systems available. The Frequency, Etiology, Direction and Severity (FEDS) system is a helpful way of standardising the description of someone’s instability. Again, it seems pretty straightforward on the surface; however, when you look a bit more closely there are a lot of ways (30 different combinations) that people can be classified. It also doesn’t really tell us about the primary mechanism. Etiology is discussed in relation to being traumatic or atraumatic but with no reference to what role bony, muscular or soft tissue structures may play.
A lot of the models also do not explicitly mention what role psychosocial factors play. As a result, people borrow categories or attributes from other models, creating a sort of Frankenstein’s monster classification system that may have negative downstream effects. For example, as a precursor to the Stanmore Triangle, in 1979 Rockwood identified the presence of psychiatric problems as relevant for people who could voluntarily sublux their shoulder in type 3 instabilities. Whilst not explicitly stated in the Stanmore Triangle, this historical association has continued to permeate into practice. The end result? Some of our research, which used hypothetical patients to see how physiotherapists make decisions for people with shoulder instability, identified that 1) female patients were more likely to have negative psychosocial factors attributed to them (despite having none stated in the cases) and 2) male patients were more likely to be offered investigations sooner [7].
Stop here and consider the practicalities and real world implications of this.
How can we practically and objectively identify people with genuine psychosocial and psychiatric factors that are contributing to their shoulder instability? I mean what does contributing in this sense even mean? What is the mechanism by which these will impact the stability of someone’s joint beyond the pre-existing constraints of their underlying bony morphology, soft tissue structures and muscle architecture? The risk here is guilt by association. The mere presence of non-traumatic instability makes you look for psychosocial factors more intentionally than you would in other forms of shoulder instability.
If you were constantly worried about your shoulder coming out, or worse, it did keep coming out, your mood or mental health might be affected too. It’s also likely to be sore and to limit what you can do, which doesn’t help either. Next thing you know, some therapist has decided the thing most likely contributing to your shoulder instability is some form of life stress, based on your gender, although they don’t actually tell you this. We need to be careful about how and to whom we attach labels, especially in the absence of any data we can point to and say “look, here it is!” Classification systems are meant to bring objectivity to decision making, at least by providing a check on our personal biases. They definitely aren’t meant to reinforce them.
So what’s the summary of all this? It seems existing models are incomplete. They can’t seem to identify and describe in unambiguous terms all of the important information required to accurately describe and classify all forms of shoulder instability. This may not be such a big problem if we used shoulder instability classification systems conceptually rather than practically.
But why are we in this cycle of inventing more and more classification systems? Well I’m glad you asked, the reason for this I believe is…
2. Research in this area is thin, not well joined up and does not follow the life of the patient.
Photo by engin akyurt on Unsplash
So what does the research tell us from a rehab perspective? Based on some (of the better) studies we know that:
* In adults with atraumatic shoulder instability there is not much difference in outcomes between diagnostic arthroscopy (placebo) and actual arthroscopic capsular shift surgery [4].
* Existing uncertainties and a lack of robust evidence mean that clinical recommendations on rehabilitation following arthroscopic shoulder stabilisation surgery for traumatic shoulder instability are centred around expert opinion [8].
* In adults (general public, typically non-athletic sample) with a first time traumatic dislocation, additional/multiple sessions of physiotherapy were not superior to a single session of advice, supporting materials and the option to self refer to physio - The ARTISAN trial [9].
* High-load strengthening exercises are more effective than low-load strengthening in mainly adult females with hypermobile shoulders although results were variable [10].
* In adults with multidirectional instability the Watson multidirectional instability programme seems to perform better than the Rockwood Instability Programme [11].
* There is very little, if any, information available for young people with shoulder instability.
* Even in the basic science/mechanism space, I was surprised when writing up our study [12] that almost all of the previous research on muscle activity and movement differences had been conducted in adults, with the odd young person. Our dataset is probably one of the youngest recorded.
There are other studies that provide helpful information, such as the Derby Instability Programme [13] and other similar single-group, longitudinal studies. But remember, when we want to know whether something actually works, or is better than something we are already doing, it’s usually well-designed and appropriately powered RCTs that are needed. Although there are other ways of evaluating effectiveness that I’ll discuss later.
Consider, then, how the existing evidence informs our decision on what to do with a young person who has multidirectional/atraumatic shoulder instability, or an overlay of traumatic and atraumatic. Where are the RCTs for them to inform our decision making?
The truth is getting funding in this area is difficult. Compared to other health conditions, e.g. cancers, hypertension or heart attacks, shoulder instability affects a small percentage of the population. It therefore doesn’t always tick the ‘value for money’ box funders are after. It also means getting enough people through the doors to answer the study question is more challenging.
It’s also partly a hidden problem. Estimating the true incidences for some of the more complex shoulder instability subgroups is challenging due to a lack of robust evidence (a recurring theme in this field). No one really knows how big the problem is and the complications are often delayed. Funders also usually want to fund studies that answer a question in a reasonable amount of time.
What’s not quantifiable in the data but very important to consider is that people with more mixed pictures/complex/multidirectional instability:
* Usually present as children or adolescents. They may have several episodes of instability and do not present until it starts impacting their own and their parents’ or carers’ daily lives in a significant way.
* Likely under-report the true size of the problem, e.g. the number of instability episodes and the duration of the problem.
* May be reliant on their parents or carers for communicating this information, given their age.
* Have a voice in the research agenda that is therefore not prominent. They also likely have other life priorities or issues to sort out (remember how complex being a teenager was, AND then your arm keeps popping out).
* Can spend a long time in the healthcare system going between services, despite being a relatively small proportion of service users.
* Can be complex cases for clinicians.
* May see multiple clinicians and be referred between multiple services (a high healthcare burden for them and us), which may not result in joined-up care. Throughout their life, how many healthcare providers might someone with recurrent instability see (which involves telling their story to lots of different people, again and again)?
* Overall have a worse experience of healthcare (having to wait for a formal diagnosis, feeling like they are being passed between services, and the potential for being labelled as having ‘psychosocial’ problems).
Unlike surgical interventions, there isn’t a culture or governance requirement to record and report outcomes. The end result is that we can’t actually evaluate what we are doing or put a number to the problem.
There’s also a funding gap which I find a bit mind-boggling. We know that shoulder arthritis is a big problem. We know that with traumatic shoulder instability you are 10 to 19 times more likely to get shoulder arthritis later on.
What we don’t know is how likely arthritis is to develop in someone with atraumatic shoulder instability who is experiencing multiple episodes of instability again and again.
There are charities/funders set up to address arthritis.
There are charities/funders set up to address health issues in children.
Both sit at opposite ends of the spectrum, meaning those with the more complex and under-researched shoulder instability presentations fall into the gap between them.
If arthritis is such a big problem, why aren’t we focusing on those most at risk to try and prevent it (if possible)? If funders are serious about addressing underserved groups or ensuring fairness, then surely this is one of the groups that should be considered?
So in essence, the reason there’s limited new and rigorous evidence is that it’s pretty hard to get the big definitive studies off the ground for the reasons stated. Often, if there isn’t a clear pathway to a big clinical trial that answers a specific clinical question, it can be more challenging to get the initial funding required to develop these projects. Research studies therefore tend to focus on small development or pilot work which doesn’t always progress. The end result is lots of research, not all of which can be used to inform our decision making.
Doing good research in this area is not impossible, just really really difficult. There’s also a circular logic problem. A lack of evidence makes getting funding to do more research difficult… which is needed to produce more evidence. The problem here seems to be…
3. We seem to be repeating the same mistakes and it’s time to try something different.
Photo by Collab Media on Unsplash
Albert Einstein is famous for saying “insanity is continuing to make shoulder instability classification systems and treatment plans in the absence of new data or RCTs” , or something like that.
Given that we’ve gone beyond 19 instability classification systems and the rehab evidence base is likely fewer than 10 papers, it’s probably time we did something different.
The way I see it there are two possible options moving forward:
* This blog inspires a drastic change in how shoulder instability work is funded leading to more definitive trials (I wish), or
* We need to try something different, innovating and working within the systems we’ve got.
I think a fundamental step in this is improving how we measure people with shoulder instability, for both diagnosis and over their lifespan. Whilst RCTs are an excellent methodology for demonstrating effectiveness, they aren’t always as well suited (practically) to less common conditions or subgroups.
That doesn’t stop us from trying to answer some of our ongoing clinical questions using alternative methods, as long as we understand the implications of these. Remember in some cases we won’t be starting from a very high bar e.g. just expert opinion. There are some innovative studies going on in this space which provide hope:
* Moroder et al, 2020 [3] used fluoroscopy to classify and confirm different types of shoulder instability. I agree we can’t use that on everyone but it’s a helpful starting point from which other technologies or solutions may develop.
* In other rare diseases or complex movement disorders, things like 3D movement analysis and electromyography are used to help inform decision making. They also tend to have really good datasets that follow the patient throughout their life. Gillette Children’s Hospital is an excellent example of this.
You might say the technology is not there and it’s too complicated for practice. The truth is the technology has been there for quite a long time and is already used for making decisions about rehab and surgery. These arguments were around in the early days of human movement analysis itself and likely CT (just run that past me again… you want me to buy a £100,000+ machine that emits ionising radiation whilst spinning at 120 to 240 revolutions per minute and you have to lie in the middle of it?). However, as we were able to demonstrate how improved diagnosis improved outcomes for people, these new ways of diagnosing people were introduced and made standard in clinical pathways.
If we can improve our diagnosis alongside improved recording of outcomes that follow people throughout their life, we can then start to make positive steps to seeing not only how big the problem is, but what seems to change outcomes. This is always the starting point for further, well-designed definitive trials. In a world of digital health, secure data environments and AI, this should be easier than ever. However the risk is these just become really good systems that store and regurgitate really rubbish information. We need to try to avoid iterative degradation.
So what needs to happen next? My hope is that if nothing else, we stop and take stock of where we are at. The answer to too much unusable research is not more unusable research. The answer to too much complexity is not more complexity. We need to go back to basics. If we are to continue to keep rising while standing on the shoulders of giants, it’s important the foundations are right.
References
[1] Kuhn, J. E., Helmer, T. T., & Dunn, W. R. (2011). Development and reliability testing of the frequency, etiology, direction, and severity (FEDS) system for classifying glenohumeral instability. Journal of shoulder and elbow surgery, 20(4), 548-556.
[2] Ingals EF. Cocaine in hay fever. JAMA. 1886;VI(8):206. doi:10.1001/jama.1886.04250020066006
[3] Moroder, P., Danzinger, V., Maziak, N., Plachel, F., Pauly, S., Scheibel, M., & Minkus, M. (2020). Characteristics of functional shoulder instability. Journal of shoulder and elbow surgery, 29(1), 68-78.
[4] Jaggi, A., Herbert, R. D., Alexander, S., Majed, A., Butt, D., Higgs, D., ... & Ginn, K. A. (2023). Arthroscopic capsular shift surgery in patients with atraumatic shoulder joint instability: a randomised, placebo-controlled trial. British Journal of Sports Medicine, 57(23), 1484-1489.
[5] Clarke CJ, Torrance E, Gibson J, Brownson P, Funk L. Diagnosing the direction of shoulder instability in rugby players. Shoulder Elbow. 2024 Feb;16(1):33-37. doi: 10.1177/17585732221092025. Epub 2022 Mar 31. PMID: 38435041; PMCID: PMC10902408.
[6] Adrian Davies - Glenohumeral instability under the spotlight: Clinical findings vs. Intra-operative diagnosis, Adrian's Shoulder Blog https://open.substack.com/pub/atdshoulder/p/glenohumeral-instability-under-the
[7] Philp, F., Faux-Nightingale, A., Woolley, S., de Quincey, E., & Pandyan, A. (2022). Evaluating the clinical decision making of physiotherapists in the assessment and management of paediatric shoulder instability. Physiotherapy, 115, 46-57.
[8] Wong, C., Jaggi, A., Willmore, E., Maher, N., Bateman, M., O’Sullivan, J., ... & Chester, R. (2025). Critical evidence synthesis on rehabilitation following arthroscopic shoulder stabilisation surgery for traumatic anterior instability: consensus recommendations for clinical practice and research - commissioned by the British Elbow & Shoulder Society. British Journal of Sports Medicine.
[9] Kearney, R. S., Ellard, D. R., Parsons, H., Haque, A., Mason, J., Nwankwo, H., ... & Underwood, M. (2024). Acute rehabilitation following traumatic anterior shoulder dislocation (ARTISAN): pragmatic, multicentre, randomised controlled trial. BMJ, 384.
[10] Liaghat, B., Skou, S. T., Søndergaard, J., Boyle, E., Søgaard, K., & Juul-Kristensen, B. (2022). Short-term effectiveness of high-load compared with low-load strengthening exercise on self-reported function in patients with hypermobile shoulders: a randomised controlled trial. British Journal of Sports Medicine, 56(22), 1269-1276.
[11] Warby, S. A., Ford, J. J., Hahne, A. J., Watson, L., Balster, S., Lenssen, R., & Pizzari, T. (2018). Comparison of 2 exercise rehabilitation programs for multidirectional instability of the glenohumeral joint: a randomized controlled trial. The American Journal of Sports Medicine, 46(1), 87-97.
[12] Seyres, M., Postans, N., Freeman, R., Pandyan, A., Chadwick, E. K., & Philp, F. (2024). Children and adolescents with all forms of shoulder instability demonstrate differences in their movement and muscle activity patterns when compared to age- and sex-matched controls. Journal of Shoulder and Elbow Surgery, 33(9), e478-e491.
[13] Bateman, M., Osborne, S. E., & Smith, B. E. (2019). Physiotherapy treatment for atraumatic recurrent shoulder instability: updated results of the Derby Shoulder Instability Rehabilitation Programme. Journal of arthroscopy and joint surgery, 6(1), 35-41.