
Severity of Hip Osteoarthritis: The Tönnis Classification
Authors: Boris Kovalenko, Prashoban Bremjit, Navin Fernando.
Affiliation: University of Washington, Seattle, WA, USA.
Translator: Tao Ke (Department of Bone and Joint Surgery, Peking University People's Hospital)


Classifications in Brief: Tönnis Classification of Hip Osteoarthritis.
History
Hip osteoarthritis represents one of the most prevalent diseases affecting older adults, consistently ranking as one of the most common causes of functional disability in addition to carrying an immense socioeconomic burden [20]. Published reports from the World Health Organization indicate that approximately 10% of men and 18% of women older than 60 years of age have symptomatic osteoarthritis [29]. The level of functional disability can be highly variable, but it is estimated that approximately 80% of those with osteoarthritis have some limitation in movement and up to one-third of them can be considered “severely disabled.” It is an issue that will only increase in incidence over time, because it is estimated that the proportion of people older than 60 years of age will triple by 2050 [29]. Ackerman et al. [1] reviewed the lifetime risk of THA in males and females using registry data from five countries, finding the lifetime risk to be as high as one in seven women in Norway and one in 10 men in Finland by 2013; women consistently had higher lifetime risks of THA in their review. Epidemiologic studies evaluating hip osteoarthritis have demonstrated marked regional differences in the prevalence of the disease with Asians and Africans having a prevalence of 1.2% and 2.8% and North Americans and Europeans having a prevalence of 7.2% and 20.1%, respectively [12, 14].
Numerous classification systems for hip osteoarthritis have been proposed, including the Tönnis classification, the Croft classification [13], and the Kellgren-Lawrence classification previously reviewed in this section [17].
The Tönnis classification originally arose from a series of research articles published in 1972 by Professor Dietrich Tönnis and his colleagues in Dortmund, Germany [5, 6]. The aim of these studies was to develop a quantitative method to differentiate between normal and dysplastic juvenile hips. In their article entitled “A New Method for Roentgenologic Evaluation of the Hip Joint—the Hip Factor,” the authors assessed 817 adult hip radiographs (patient age 21-50 years, excluding patients who had undergone hip surgery) to develop a quantitative measurement for assessing hip dysplasia, ie, “the hip factor.” As part of the analysis, the authors grouped the patients into three separate grades of osteoarthritis based on the evaluation of a standard AP pelvis radiograph, which became the foundation of the classification scheme now bearing Tönnis’ name [6].
Purpose
Creating a new classification system for a particular pathology may be done for a number of reasons such as to improve effectiveness of communication among providers and/or researchers or even to dictate management of the pathology. Tönnis’ classification was initially created by him and his colleagues for the purposes of research to serve as a qualitative grade for severity of degenerative radiographic changes in adults [6] and was later used by Tönnis to correlate femoral/acetabular anteversion to severity of arthrosis [25]. In these articles, nothing is said regarding the choice to create a new system in light of other systems having already been published at that time [16].
As a qualitative analysis based on a plain AP radiograph of the pelvis, the Tönnis classification offers the advantage of easy application in a clinic setting when compared with schemes requiring advanced imaging or quantitative measurements. Such characteristics make the scheme appealing for use in daily clinical practice as a potential tool to help subdivide surgical management. This warrants exploring the reliability of the Tönnis classification and, moreover, its potential to guide management.
Description of the Tönnis Classification
The Tönnis classification, as originally described in 1972 by Busse et al. [6], consists of three progressive degrees of degenerative changes to the hip; it was later republished by Tönnis and Heinecke in 1999 [25] with the addition of a Grade 0, or a hip without arthrosis. Grade 1 indicates slight narrowing of the joint space, slight lipping at the joint margin, and slight sclerosis of the femoral head or acetabulum; Grade 2 indicates the presence of small bony cysts, further narrowing of the joint space, and moderate loss of femoral head sphericity; Grade 3 is the most severe and indicates large cysts, severe narrowing of the joint space, severe femoral head deformity, and avascular necrosis (Table 1).
Table 1. Tönnis grading scale of hip osteoarthritis
In addition to Tönnis’ classification, there have been a number of other well-described classification schemes for osteoarthritis; these include schemes published by Croft [13], Kellgren-Lawrence [16], the International Knee Documentation Committee [30], Fairbank [15], Altman et al. [3], and Ahlbäck [2]. Of the aforementioned classifications, only the Kellgren-Lawrence and Croft classifications are applicable to the hip, with the remainder describing the knee specifically. The Kellgren-Lawrence scale is a 5-point grading scale from 0 to 4; 0 indicates no joint space narrowing or reactive changes (no osteoarthritis [OA]); 1 indicates doubtful joint space narrowing with possible lipping osteophytes (doubtful OA); 2 indicates definite osteophytes with possible joint space narrowing (mild OA); 3 indicates moderate osteophytes with definite joint space narrowing, some sclerosis, and possible bony deformity (moderate OA); and 4 indicates progression to large osteophytes, severe sclerosis, and definite bone end deformity (severe OA). Croft et al. [13] devised a 6-point grading scale from 0 to 5; 0 indicates no radiographic abnormalities; 1 indicates only osteophytosis; 2 indicates joint space narrowing only; 3 indicates the presence of two out of the following: osteophytosis, joint space narrowing, presence of cysts, and subchondral sclerosis; 4 indicates three of the aforementioned criteria; and 5 indicates progression to femoral head deformity.
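Because the Croft scale as summarized above is essentially a counting rule over four radiographic findings, it can be written down compactly. The following is a minimal sketch based only on that summary, not on the original Croft publication. The function name and boolean parameters are hypothetical, and two cases the summary does not cover are handled by labeled assumptions: an isolated cyst or isolated sclerosis raises an error, and all four findings together are treated as Grade 4.

```python
def croft_grade(osteophytes: bool, narrowing: bool, cysts: bool,
                sclerosis: bool, head_deformity: bool) -> int:
    """Return a Croft grade (0-5) from binary radiographic findings.

    Follows the summary in the text; not a validated clinical tool.
    """
    # Grade 5: progression to femoral head deformity.
    if head_deformity:
        return 5

    count = sum([osteophytes, narrowing, cysts, sclerosis])
    if count == 0:
        return 0  # no radiographic abnormalities
    if count == 1:
        if osteophytes:
            return 1  # osteophytosis only
        if narrowing:
            return 2  # joint space narrowing only
        # ASSUMPTION: the summary does not grade an isolated cyst or
        # isolated sclerosis, so this sketch refuses to guess.
        raise ValueError("isolated cyst or sclerosis is not graded in the summary")
    if count == 2:
        return 3  # any two of the four findings
    # ASSUMPTION: three findings is Grade 4 per the summary; all four
    # findings together are also mapped to Grade 4 here.
    return 4


if __name__ == "__main__":
    print(croft_grade(True, True, False, False, False))  # two findings
```

The point of the sketch is the contrast with the Tönnis grades, which the text describes with qualitative terms such as "slight" and "small" rather than with countable criteria.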
Validation and Reliability
The Tönnis classification is widely utilized by arthroplasty surgeons, arthroscopic surgeons, rheumatologists, radiologists, and physical therapists [4, 9, 10, 24]. Despite its widespread use, the utility of the Tönnis classification has been a point of contention because of conflicting data regarding its reliability [7, 11, 18, 19, 23, 27, 28].
In a 2008 study that included 63 patients who underwent a Bernese periacetabular osteotomy approximately 20 years prior, Steppacher et al. [23] evaluated the validity of the Tönnis classification using two orthopaedic surgeons to grade 50 pelvic radiographs on two separate occasions. They found substantial interobserver (κ = 0.74) and intraobserver reliability (κ = 0.73-0.76) using the standardized Landis and Koch benchmarks for agreement [18, 23].
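The descriptive labels attached to the κ values in this section ("slight," "fair," "moderate," "substantial") follow the Landis and Koch benchmarks. As a reading aid, the mapping can be sketched as a small lookup; the function name is hypothetical, and the cut points are the commonly cited Landis and Koch bands rather than anything specific to the studies reviewed here.

```python
def landis_koch_label(kappa: float) -> str:
    """Map a kappa statistic to its Landis and Koch agreement label."""
    if kappa < 0.0:
        return "poor"
    if kappa <= 0.20:
        return "slight"
    if kappa <= 0.40:
        return "fair"
    if kappa <= 0.60:
        return "moderate"
    if kappa <= 0.80:
        return "substantial"
    return "almost perfect"


if __name__ == "__main__":
    # Interobserver kappa values quoted in this section.
    for value in (0.74, 0.59, 0.22, 0.17):
        print(value, landis_koch_label(value))
```

Applied to the values quoted in this section, the lookup reproduces the labels used in the text, for example "substantial" for κ = 0.74 and "slight" for κ = 0.17.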
Clohisy et al. [11] evaluated reliability using 77 patients with femoroacetabular impingement, developmental dysplasia of the hip, or no hip pain and found slightly lower reliability than that of Steppacher et al. with moderate interobserver reliability (κ = 0.59) and intraobserver reliability (κ = 0.60). Their study used five hip specialists as well as a fellow to interpret the images. The authors attributed the lower agreement in their report compared with that of Steppacher to the inclusion of a control group.
By contrast, a critical review of the Tönnis classification by Valera et al. [28] used three orthopaedic surgeons to classify the hip radiographs of 61 patients by Tönnis grade, divided into two cohorts (one included candidates for hip preservation surgery, whereas the control group consisted of patients without hip pain). This study found only slight to fair interobserver reliability (κ = 0.173-0.397) and fair intraobserver reliability (κ = 0.364-0.397). The most frequent cause of disagreement in this study involved differentiating Grade 0 from Grade 1 hips.
Nepple et al. [19] also focused on younger patients with hip pain in a study evaluating 25 radiologic parameters of dysplasia and OA in 70 patients undergoing hip preservation surgery. Four hip specialists interpreted the radiographs. The average patient age was 31 years, and 55 of the 70 patients had femoroacetabular impingement with the remainder diagnosed with acetabular dysplasia. They found that grading patients by the Tönnis classification had only fair interobserver reliability (κ = 0.22) and moderate intraobserver reliability (κ = 0.53). As in the study by Valera et al. [28], the radiographs in this study were weighted heavily toward early arthritis, with only 4.5% of patients graded as either Grade 2 or 3. In contrast, joint space width was found to have substantial inter- and intraobserver reliability (κ = 0.62 and 0.71, respectively) [19].
Another reliability analysis by Carlisle et al. used five physicians of various disciplines and levels of training to evaluate 45 patients with developmental dysplasia of the hip, femoroacetabular impingement, or normal anatomy. One orthopaedic attending, two physiatry attendings, one orthopaedic fellow, and two orthopaedic residents interpreted the images. They found only slight interobserver agreement for Tönnis grade (κ = 0.17) but moderate intraobserver reproducibility (κ = 0.57) [7].
In a study by Troelsen et al. [26], four observers (medical student, orthopaedic resident, orthopaedic attending, and radiology attending) reviewed 25 pelvic radiographs and subsequently their associated CT scans and graded them according to Tönnis’ classification. They found poor interobserver reliability, with κ values ranging from -0.02 to 0.33. Interobserver agreement increased when Tönnis grade was dichotomized to Grades 0 to 1 and 2 to 3 (κ values 0.20-0.39), and joint space width < 2 mm on plain radiographs was a more reliable marker of OA (κ values 0.40-0.46) [27].
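To make concrete why dichotomizing the grades can raise agreement, the sketch below computes Cohen's κ for two raters before and after collapsing Grades 0 to 1 and 2 to 3. The ratings are invented for illustration and are deliberately contrived so that every disagreement falls between adjacent grades within the same band; they are not data from any of the studies cited here. The function names are hypothetical, and unweighted Cohen's κ for two raters is an assumption, since the text does not state which κ variant each study used.

```python
from collections import Counter


def cohen_kappa(a, b):
    """Unweighted Cohen's kappa for two raters over the same cases."""
    if len(a) != len(b) or not a:
        raise ValueError("ratings must be non-empty and of equal length")
    n = len(a)
    # Observed agreement: fraction of cases with identical ratings.
    p_o = sum(x == y for x, y in zip(a, b)) / n
    # Chance agreement from each rater's marginal category frequencies.
    count_a, count_b = Counter(a), Counter(b)
    p_e = sum(count_a[k] * count_b[k] for k in count_a) / (n * n)
    if p_e == 1.0:
        return float("nan")  # kappa is undefined when chance agreement is 1
    return (p_o - p_e) / (1 - p_e)


def dichotomize(grades):
    """Collapse Tönnis grades into 0 (Grades 0-1) and 1 (Grades 2-3)."""
    return [0 if g <= 1 else 1 for g in grades]


# Hypothetical Tönnis grades from two raters for ten radiographs.
rater_a = [0, 1, 0, 1, 2, 3, 1, 0, 2, 3]
rater_b = [1, 0, 1, 1, 2, 2, 0, 0, 3, 3]

kappa_four_level = cohen_kappa(rater_a, rater_b)
kappa_two_level = cohen_kappa(dichotomize(rater_a), dichotomize(rater_b))

if __name__ == "__main__":
    print(round(kappa_four_level, 3), round(kappa_two_level, 3))
```

In this contrived example the four-level κ is low because the raters often differ by one grade, while the two-level κ is high because those differences never cross the Grade 1 to Grade 2 boundary. Real data would show a smaller gain, as the κ ranges reported by Troelsen et al. suggest.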
Limitations
The development of the Tönnis classification system was based on studies that specifically focused on the hip. By extension, it most directly applies to a spheroidal (ie, ball-and-socket) synovial joint that is reinforced by the presence of a fibrocartilaginous lip (the acetabular labrum). Because other diarthrodial joints have different anatomy, and different roles in weightbearing and motion, the Tönnis grading system cannot be applied broadly to all joints.
Studies on the classification seem to suggest less validity in its lower grades because studies with higher numbers of low-grade OA demonstrate lower validity for the Tönnis classification [11]. This is especially important in one common application of the Tönnis classification, which is to help guide the decision of whether a patient may be a good candidate for hip preservation surgery, which is known to be less effective in patients who have even mild arthritis [31]. In addition, because the Tönnis classification relies exclusively on radiographic findings, other variables that may play a role in the success of hip preservation surgery (such as articular cartilage health, three-dimensional femoroacetabular anatomy, labral integrity, among others) obviously are not considered under the Tönnis rubric, and adequate evaluation of those factors generally calls for advanced imaging.
Ultimately, a major criticism of the Tönnis classification system is that it is subjective. The Tönnis classification has been criticized for its unclear terminology as well as for its overlapping parameters [28]. Five major radiographic findings are used in the classification: presence of sclerosis, joint space width, head sphericity, cyst size, and osteophyte formation, the latter of which is described as “lipping at the joint margins.” None of these parameters includes quantitative definitions. For example, Tönnis Grade 2 arthritis includes “small” subchondral cysts, and Grade 3 is defined by “large” cysts, but there are no sizes designated in the original article.
In addition, Tönnis’ original article does not help the user decide which grade to use if findings from two different grades are present on one radiograph. For example, if a patient has moderate loss of sphericity of the femoral head (a Tönnis Grade 2 finding) alongside only slight narrowing of the joint (a Grade 1 finding)—such as might occur in a patient with early Legg-Calvé-Perthes disease—the Tönnis classification is unclear about whether this would be a patient with Grade 1 or Grade 2 arthritis. Instead, the classification assumes a stepwise progression in which all parameters radiographically worsen over time equally in accordance with the described classification, which is not always the case. As noted by Valera et al. [28], although sphericity of the femoral head is one of the more reliably reproducible parameters used by Tönnis, it could easily lead to confusion in grading in patients with nondegenerative cam deformity and no radiographic findings of OA.
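The mixed-findings problem described above can be illustrated by mapping each finding to the grade under which the classification lists it and checking whether the findings on one radiograph agree. This is a minimal sketch: the finding names, severity labels, and function names are hypothetical encodings of the grade descriptions given earlier in this article, and avascular necrosis is omitted for brevity.

```python
# Grade under which each (finding, severity) pair is listed in the
# Tönnis description quoted earlier. Keys are hypothetical labels.
FINDING_GRADE = {
    ("joint_space_narrowing", "slight"): 1,
    ("lipping", "slight"): 1,
    ("sclerosis", "slight"): 1,
    ("cysts", "small"): 2,
    ("joint_space_narrowing", "further"): 2,
    ("head_sphericity_loss", "moderate"): 2,
    ("cysts", "large"): 3,
    ("joint_space_narrowing", "severe"): 3,
    ("head_deformity", "severe"): 3,
}


def implied_grades(findings):
    """Return the sorted set of grades implied by the listed findings."""
    return sorted({FINDING_GRADE[f] for f in findings})


def is_ambiguous(findings):
    """True when the findings point to more than one Tönnis grade."""
    return len(implied_grades(findings)) > 1


# The example from the text: moderate loss of sphericity (Grade 2 finding)
# with only slight joint space narrowing (Grade 1 finding).
perthes_like = [
    ("head_sphericity_loss", "moderate"),
    ("joint_space_narrowing", "slight"),
]

if __name__ == "__main__":
    print(implied_grades(perthes_like), is_ambiguous(perthes_like))
```

The sketch reports the conflict rather than resolving it, because the original article gives no rule (for example, "take the highest grade") for choosing between the implied grades.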
Conclusions
Although the Tönnis classification has limitations, it remains a simple system that provides a qualitative description of commonly obtained radiographic imaging of the hip and continues to be frequently used in clinical practice as well as research. It relies on the visual assessment of an AP pelvis radiograph and does not require additional time or resources to make digital or manual measurements of the radiographs.
The subjective nature of the classification and the lack of consensus on its reliability (particularly at early stages of hip arthritis, where the user wishes to distinguish Grade 0 from Grade 1 [11, 19, 28]) make it difficult to recommend its widespread use, particularly given that alternative classification schemes including the Kellgren-Lawrence and Croft systems have been demonstrated in studies to be more reliable measures.
In a reliability study by Reijman et al. [22], the Kellgren-Lawrence system had higher interobserver reliability (κ = 0.68) than Croft’s (κ = 0.52), showed a stronger association with clinical symptoms of hip OA, and was predictive of the eventual need for hip replacement. Similarly, additional studies have demonstrated support for the validity of the Kellgren-Lawrence scheme in particular [21].
Several studies have found that patients with higher Tönnis grades who undergo hip preservation surgery have poorer patient-reported outcome scores and are more likely to undergo premature conversion to THA [8]. This suggests that despite the problems of reliability, the Tönnis classification may be used to good effect as a tool of communication, prognosis, and research in the right circumstances.
In our opinion, the Tönnis classification is a straightforward qualitative description of a stepwise pattern of hip OA but has limited utility in the research setting. Its reliability has not consistently demonstrated superiority over other classification systems. Ultimately, without stronger evidence supporting its reliability or validity, it cannot be recommended as a tool with which to routinely guide management and treatment options.
Source: Boris Kovalenko, Prashoban Bremjit, Navin Fernando. Classifications in Brief: Tönnis Classification of Hip Osteoarthritis. Clin Orthop Relat Res. 2018 Aug;476(8):1680-1684. doi: 10.1097/01.blo.0000534679.75870.5f.
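Copyright of this translation belongs to Tao Ke; do not reproduce without authorization. This article is intended for health education only and must not be used as a basis for diagnosis or treatment; please read with caution.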