Study selection
After carrying out the database searches, a total of 1714 publications were identified (Figure 1). After duplicate items, newspaper articles and commentaries were removed, 1285 items remained. Screening of the abstracts excluded 933 articles. Two reviewers screened the full texts of the remaining 351 articles using exclusion criteria and quality assessment and excluded 206. Of those remaining, 56 were used for background information only, leaving 89 studies. A further 9 were excluded as they were already covered by systematic reviews included in this review. In total, 80 publications were included: 9 on Shiatsu and 71 on acupressure.
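The selection flow described above can be checked as simple arithmetic. The following is a minimal sketch using only the counts quoted in the text; the variable names are illustrative and are not taken from the review. Subtracting the 933 abstract-stage exclusions from 1285 gives 352, whereas the text reports 351 full-text articles, so the sketch starts the later steps from the reported figure rather than trying to resolve that difference.

```python
# Counts quoted in the study selection paragraph (names are illustrative).
identified = 1714
after_removal = 1285         # after duplicates, newspaper articles and commentaries were removed
excluded_on_abstract = 933
full_text_reported = 351     # figure stated in the text
excluded_on_full_text = 206
background_only = 56
already_in_reviews = 9

# Value implied by the abstract-screening step.
full_text_derived = after_removal - excluded_on_abstract

# Later steps start from the reported full-text count.
after_full_text = full_text_reported - excluded_on_full_text
candidate_studies = after_full_text - background_only
included = candidate_studies - already_in_reviews

print("removed before screening:", identified - after_removal)
print("full texts (derived, reported):", full_text_derived, full_text_reported)
print("studies after background-only removed:", candidate_studies)
print("included studies:", included)
```

From the reported full-text count onward, the steps reproduce the 89 studies and the final 80 included publications (9 Shiatsu plus 71 acupressure).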
Details of included studies are presented in Additional file 1, grouped by health condition. Just under one third (27.5%) were graded A (highest quality), 42.5% graded B and 26.3% C (lowest quality) (3 studies were ungraded); this grading refers to the contribution the study made to the evidence, which took into account study design, rigour and reporting.
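As a rough check on the grading percentages, they can be converted back into study counts. This is a minimal sketch that assumes the percentages are taken over all 80 included publications; that denominator and the variable names are assumptions for illustration, not stated in this paragraph.

```python
# Assumed denominator: all included publications (9 Shiatsu + 71 acupressure).
total_included = 80
reported_percentages = {"A": 27.5, "B": 42.5, "C": 26.3}
ungraded = 3

# Convert each reported percentage back to a whole number of studies.
grade_counts = {
    grade: round(pct / 100 * total_included)
    for grade, pct in reported_percentages.items()
}
accounted_for = sum(grade_counts.values()) + ungraded

print(grade_counts)
print("studies accounted for:", accounted_for)
```

Under that assumption the graded and ungraded studies together account for all 80 publications.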
Shiatsu
Only 9 Shiatsu studies were of sufficient quality to be included in the review. These comprised 1 randomised controlled trial (RCT), 3 controlled non-randomised studies, 1 within-subjects trial, 1 observational study and 3 uncontrolled studies. These studies investigated quite separate health issues and did not use comparable methodology, so data could not be pooled due to their heterogeneity. The topics studied were chronic stress, schizophrenia, promoting well-being and critical health literacy, angina, low back and shoulder pain, fibromyalgia, chemotherapy side effects/anxiety and inducing labour. They are grouped by methodology and discussed below.
One RCT was identified (integrated care, which included Shiatsu), for back and neck pain [19]. No significant effects were identified compared to standard care. Although the sample was fairly large (n = 80), the study was underpowered to detect statistically significant effects.
Three studies compared two or more treatments with non-random group allocation, instead allocating by patient preference [20], by comparison with participants in another study [21] or by staff on duty [22]. Lucini et al [20] evaluated Shiatsu for chronic stress; 70 volunteer patients chose either active (relaxation and breathing training), passive (Shiatsu) or sham treatment (stress management information). The small sample limited the validity of the results. Although the design accounted for patient preference, results were confounded by more stressed patients choosing sham. Ingram [22] compared Shiatsu to no intervention for post-term pregnancy in 142 women. The Shiatsu group was significantly more likely to labour spontaneously than the control (p = 0.038) and had a longer labour (p = 0.03), but groups were allocated according to which midwife was on duty (although groups were homogeneous for maternal age, parity and delivery details). Ballegaard et al [21] conducted a study of cost-effectiveness and efficacy of Shiatsu for angina pectoris. Sixty-nine consecutive patients were treated and compared with those from a separate trial of two invasive treatments for angina [23]. Incidence of death/myocardial infarction (MI) was 7% in this sample, compared to 21% and 15% in the comparison groups, with no significant difference in pain relief. Additionally, a cost saving of $12,000 per patient was estimated. However, the groups were from different countries (USA and Denmark), and 56% of the participants would have been excluded from one of the comparison groups. The study also used an unpowered convenience sample and no blinding.
One study used a within-subjects repeated measures design, comparing Watsu (water Shiatsu) with Aix massage for fibromyalgia syndrome [24]. A significant improvement was seen after treatment with Watsu (p = 0.01) for SF-36 subscales of physical function, bodily pain, vitality and social function, but not for Aix. The repeated measures design with counterbalancing should reduce carryover effects although order effects may have occurred due to high dropout. In addition it used a volunteer sample.
Three studies had no separate control group, using a single group pretest-posttest design [25–27], limiting the validity of results. Lichtenberg et al's [27] pilot study of Shiatsu for schizophrenia showed significant improvements on scales relating to illness, psychopathy, anxiety, depression and others (p values ranged from 0.0015 to 0.0192). Brady et al [26] tested Shiatsu for lower back pain in 66 volunteers. Pain and anxiety significantly decreased after treatment (p < 0.001), which did not change when demographic variables were controlled for. Iida et al [25] investigated the relaxation effects of Shiatsu on anxiety and other side effects in 9 patients receiving cancer chemotherapy. The small and self-selected samples and lack of control group in these studies limit the quality and generalisability of the results. In addition, 13 of Brady et al's [26] participants had previously received Shiatsu.
Long (2008) conducted a prospective observational study of 948 patients of Shiatsu practitioners in 3 different countries [7]. Significant improvements in symptoms, especially for tension or stress and structural problems (effect size 0.66 to 0.77), were demonstrated. This study is of greater quality than other Shiatsu studies as the sample size was powered and it used a longitudinal and pragmatic study design. For a longitudinal observational design, this study had a good response rate (67% of patients on average returned all questionnaires). Recruitment of patients was through practitioners, who received rigorous training and kept a recruitment log. Confounding factors were reported and outcomes were accurately measured. However, data on non-respondents or those who refused to participate were not reported, so evaluation of response bias is problematic.
Sundberg et al [19] and Ballegaard [21] used a pragmatic design - Shiatsu as part of an integrated model of healthcare or with other interventions (acupuncture and lifestyle adjustment). This reflects normal practice but specific effects of Shiatsu cannot be isolated.
There was insufficient evidence both in quantity and quality on Shiatsu in order to provide consensus for any specific health condition or symptom.
Acupressure
Of a total of 71 included studies described as giving acupressure as an intervention, 2 were meta-analyses, 6 systematic reviews, 39 RCTs, 5 crossover trials, 5 within-subjects trials, 5 controlled non-randomised trials, 7 uncontrolled trials and 1 prospective study. These are summarised by health condition below.
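The breakdown by study design can be tallied directly. This minimal sketch uses only the counts listed in the sentence above; the dictionary keys are illustrative labels. The listed categories sum to 70, one fewer than the stated total of 71, and the text does not indicate which category accounts for the difference.

```python
# Study design counts as listed for the acupressure publications.
design_counts = {
    "meta-analysis": 2,
    "systematic review": 6,
    "RCT": 39,
    "crossover trial": 5,
    "within-subjects trial": 5,
    "controlled non-randomised": 5,
    "uncontrolled trial": 7,
    "prospective study": 1,
}
stated_total = 71

listed_total = sum(design_counts.values())
print("sum of listed designs:", listed_total)
print("difference from stated total:", stated_total - listed_total)
```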
Pain
Pain was the most common issue addressed by acupressure studies and covered a range of topics. This included a systematic review, six RCTs with control groups and random assignment; 2 with non-randomised control groups or within-subject controls, and the remainder either did not have a control or random assignment. Overall, the evidence for the efficacy of acupressure for pain is fairly strong and can be graded as category 1 evidence. Although some studies had methodological flaws, studies consistently show that acupressure is more effective than control for reducing pain, namely dysmenorrhoea (acupressure at SP6) [9, 28–30], lower back pain [31–33] and labour pain [34, 35]. The evidence for minor trauma [36, 37] and injection pain [38, 39] is less conclusive and the evidence for headache is insufficient [40]. Each pain condition is discussed below.
Dysmenorrhoea
Of 4 papers for dysmenorrhoea, 1 was a systematic review, 2 were RCTs and 1 used a non-equivalent control group design. All studied school or university students, with sample sizes ranging from 30 to 216. Two used acupressure on SP6; the other used a combination of points. Both of the RCTs [28, 30] compared acupressure to rest, which does not control for the placebo effect. Jun et al [29] compared acupressure to light touch, potentially controlling for non-specific effects, but used sequential allocation which may create bias, although groups were homogeneous in baseline demographics and dysmenorrhoea factors. All studies found a significant reduction in pain. Studies were generally good quality, with low attrition rates and validated measures (usually VAS). Only including students may limit generalisability and create Hawthorne bias. Acupressure procedure was generally well reported; all studies reported 12 or 13 STRICTA items.
Labour pain
Two of the three studies of acupressure for labour pain were RCTs [34, 35]. They both compared acupressure to touch, thus controlling for the effect of human touch; Chung et al [34] additionally had a conversation-only control group. The third was a one-group uncontrolled study [41]. Two studies used LI4 [34, 41]; Chung et al [34] additionally used BL67; Lee et al used SP6 [35]. All studies found acupressure significantly reduced pain.
Back and neck pain
Four studies on back or neck pain were identified, all RCTs and conducted by two groups of researchers, Hsieh et al [31, 32] and Yip and Tse [33, 42]. Hsieh et al unusually used a pragmatic design of four weeks of individualised acupressure compared to physical therapy. They also used powered samples, blinding where possible, valid outcome measures and intention-to-treat analysis to protect against attrition bias. A no-treatment group was not included, limiting assessment of specific effects. Yip and Tse also compared acupressure to usual care, although an acupressure protocol was used. They also had powered sample sizes but no blinding. The comparison groups of aromatherapy and electroacupuncture limit assessment of the specific effects of acupressure. All four studies showed a significant reduction in pain.
Minor trauma
Two double-blind RCTs evaluated acupressure for minor trauma pain during ambulance transport [36, 37]. Both used sham acupressure as a control, with Kober et al [36] additionally comparing to no treatment. Both studies showed significant reductions in pain, anxiety and heart rate. Limitations include fairly small sample and lack of no-treatment control.
Injection pain
Two studies evaluated acupressure for pain of injection [38, 39]. Both studies showed reduction in pain but both were subject to limitations - Arai et al [39] only included 22 subjects although it was powered and randomised, with a sham treatment; Alavi et al's [38] trial was larger and randomised, but used a within-subjects crossover design which can create practice bias.
Headache
Only one study investigated headache [40], comparing a course of 8 sessions of acupressure to medication, which reduced pain. Although this used an RCT design, power calculation, intention-to-treat analysis, blinding and long follow up, there is very little detail on intervention (only 7 STRICTA items), randomisation, recruitment or limitations.
Dental pain
One RCT for dental pain [43] compared acupressure at LI4 to medication or sham acupressure, showing reduction in pain 4 and 24 hours after the first orthodontic treatment but not after second treatment. Although an RCT and well reported, only 23 patients completed the study, despite a power calculation specifying a sample of 156.
Nausea & vomiting
Nausea and vomiting (N&V) was the second most commonly investigated health issue. The evidence was somewhat inconsistent and varied with the type of nausea investigated. Post-operative nausea had the strongest evidence, graded as Category 1 evidence mainly due to a Cochrane systematic review and update [8, 44] and a meta-analysis [45]. The two systematic reviews [46, 47] of chemotherapy-induced N&V give additional quality evidence, although little is true acupressure. Little reliable evidence is added by the RCT [48]. The three studies of acupressure for nausea in pregnancy are of variable quality. Although one has a small sample and uncontrolled study design [49], a well conducted RCT [50] and meta-analysis [51] provide Category 2 evidence for nausea in pregnancy.
Post-operative
A Cochrane review [44] (update of a previous review [8]) and meta-analysis [45] indicate the extensive evidence for acupressure in treating postoperative N&V. All the studies in the review and the majority in the meta-analysis used acupoint PC6. The review concluded that acupressure reduced the risk of both N&V compared to sham, and reduced the risk of nausea but not vomiting compared to antiemetic medication. The meta-analysis concluded that all modalities of acupoint stimulation reduced postoperative N&V compared to control, and were as effective as medication. Both reviews were very high quality with comprehensive search terms and pooling of data.
Chemotherapy
Acustimulation, including acupressure, for nausea as a side-effect of chemotherapy also has been reported in a Cochrane review [46], as well as an RCT published subsequently [48] and a non-randomised trial [52]. Chao et al [47] also covered N&V as part of their review of adverse effects of breast cancer treatment.
The Cochrane review identified 11 trials and pooled data demonstrated significantly reduced vomiting but not nausea [46]. It was very good quality, with intention-to-treat analysis of pooled data and controlling for duplicate and language bias.
The RCT (n = 160) [48] was based on a pilot [53] included in the Cochrane review. It found significant reductions in delayed N&V but not acute N&V, results facilitated by the unusually long follow-up period. The main limitations are the lack of sample size calculation (despite conducting a pilot study) and patients breaking the blind.
The non-randomised study [52] of self-acupressure on PC6 compared to anti-emesis medication found significant reductions in severity of N&V, duration of nausea and frequency of vomiting compared to control. However, these results are limited by a small convenience sample.
Pregnancy
Three studies investigated N&V in pregnancy: one RCT [50]; one uncontrolled study [49] and one meta-analysis [51]. All used acupressure on PC6 (neiguan).
As concluded by the meta-analysis [51], the RCT found improvements compared to sham or control. Shin et al's RCT [50] is excellent quality with double-blinding, powered sample size, objective and subjective outcomes and good reporting. Markose et al [49] also found improvements in nausea, vomiting and retching, but due to lack of control group, small sample, high attrition and poor reporting the evidence is limited.
The meta-analysis included studies on all forms of acustimulation and was generally well conducted, although it did not attempt to find unpublished material and only 3 databases were used.
Renal disease
Five papers (based on four RCTs) investigated the use of acupressure for symptoms of renal disease. Due to limitations, repeated in all studies due to the common research team, evidence is category 2. Three compared acupressure to sham points/electrical stimulation and to usual care [54–56], the fourth to usual care only [57]. The studies used different points for different symptoms, including fatigue [55, 57], depression [56, 57] and sleep [54, 56]. All studies showed improvements compared to control but also found improvements in the sham/electrical stimulation group compared to control, suggesting that the effects of acupressure on these symptoms are non-specific. Sample sizes were between 62 (powered) and 106 and had low attrition rates. One study used blinding [54], the others may have been subject to placebo or observer bias. Between 9 and 15 STRICTA items were reported and interventions and outcome measures were validated.
Sleep and alertness
Five studies investigated acupressure for sleep in elderly long term care facilities [58–62], and one investigated alertness in the classroom [63]. Evidence for improving sleep quality in institutionalised elderly is consistent from a number of high quality studies and is category 1. Four of the sleep studies were RCTs [59–62]; an additional single-group pilot study of only 13 people contributes little to the evidence base [29]. The four RCTs all used different acupoints. Two compared acupressure to sham points and control (conversation [62] or routine care [60]) but only one found significant improvements in sleep for acupressure compared to sham [62], giving limited evidence for specific effects. Three of the studies had powered and randomly selected samples (between 44 and 246) [60, 62], validated procedure [62], intention-to-treat analysis or triple blinding [60].
The one study on alertness in the classroom [63] was a crossover study, randomly assigning 39 students to either stimulation-relaxation-relaxation or relaxation-stimulation-stimulation. Compared to relaxation, stimulation acupressure improved alertness. Although students were blinded, the majority correctly discerned the treatment. This did not significantly affect the results, although it raised p to 0.0484. Potential Hawthorne effect, small sample size (39) and low generalisability reduce the quality. The crossover design should reduce effects of retesting, carryover or time-related effects, although a practice effect may be present (especially with self-report).
Mental health
Five studies investigated mental health, specifically dementia [64, 65] and stress or anxiety [66–68]. The quality was very variable, with two pilot studies with sample sizes of 12 and 31 [64, 68], a small one-group study of 25 women [67] and two larger RCTs [65, 69]. Category 2 evidence was present for anxiety related to surgery, although this was compared to sham only [69]. Fairly good evidence existed for agitation in dementia compared to control, although generalisability was limited by small sample size, lack of control and high attrition [65]. Evidence for reducing stress, anxiety and heart rate and thus enhancing spontaneous labour is promising, but limited by lack of control and a small, volunteer sample [67].
Chronic respiratory conditions
Six studies on respiratory conditions were identified, covering chronic obstructive pulmonary disease (COPD) [70–73], chronic obstructive asthma [74] and bronchiectasis [75]. Overall, the evidence is Category 2, as studies were well designed but had a number of methodological flaws. Study designs included two controlled trials using randomised blocking design, matching groups for demographic and clinical factors [71, 72]; one crossover design [70]; two pilot RCTs [74, 75] and an RCT [73]. Results showed improvements in dyspnoea and decathexis compared to sham, although limited by high attrition, poor blinding and a small sample [70]. The pilot studies (with the same authors) showed improved quality of life for asthma patients [74] and sputum and respiratory scores for bronchiectasis compared to control [75], but are limited by small sample sizes, high dropout and lack of blinding. The matched studies [71, 72] provided high quality evidence for improvements in dyspnoea and related outcomes, with valid and reliable interventions and outcome measures, and blocking design giving more powerful treatment effects for small samples.
Anaesthesia/consciousness
Three studies investigated the effects of acupressure on levels of anaesthesia or consciousness. These levels include the acoustic evoked potential (AEP), changes in which reflect the depth of anaesthesia and transition from awake to anaesthetised [76]; bispectral index (BIS) and spectral edge frequency (SEF) which are measures of the level of consciousness during anaesthesia/sedation [77, 78]. Overall, the evidence is Category 3 as only three studies were identified, all had repeated measures designs and small sample sizes (between 15 and 25), although one was powered [68, 76–78]. Patients acting as their own controls in these studies can cause practice and carryover effects, although reduced by counterbalancing/randomising of treatment order. However, lack of control group and lack of details on sample selection limit the evidence.
Stroke
Three studies investigated acupressure for stroke [79–81]. All three were RCTs; Shin and Lee [80] used a blocked randomised design comparing acupressure to acupressure plus aromatherapy, Kang et al [81] randomised to acupressure or control groups; McFadden and Hernandez [79] used a crossover design comparing acupressure to control. Although studies used good designs and results suggested significant improvements in pain [80], motor power [80], limb function [81], daily living [81], depression [81], and heart rate [79], all findings were limited by small unpowered samples and poor reporting, so evidence is rated at Category 2.
Body weight
Two randomised studies investigated the effect of acupressure on body weight, although for very different conditions - weight loss [82] and weight gain in premature babies [83]. Elder et al's [82] RCT compared 'Tapas Acupressure Technique'® (TAT), qi gong and control (self-directed support). TAT resulted in greater weight loss than both qi gong and control. Chen et al's [83] RCT compared acupressure and meridian massage to routine care, resulting in significantly more weight gain. The weight-loss study was high quality with a large sample and design-adaptive group allocation (equivalent to randomisation, but balanced for demographic and clinical factors). The weight-gain study was randomised and matched for weight and gestational age and used blinding (although details are not clear), but had a small sample size and lacked information on randomisation, allocation, dropouts, harms and ethics. The evidence for weight loss/gain is Category 2 as more studies are needed.
Visual impairment
Two non-randomised studies from China and Taiwan evaluated acupressure for schoolchildren with visual impairment [84, 85]. Both found improvements compared to control but were limited in reporting of study design and findings and did not randomise. With only 2 studies, both with significant limitations, the evidence for acupressure for improving eyesight is Category 3.
Other conditions
The remaining 11 articles on acupressure investigated distinct health conditions which could not be grouped.
A systematic review evaluated the effect of acupoint stimulation for side effects of breast cancer treatment [47]. Twenty-six studies were identified, and the review concluded that evidence is high quality for nausea and vomiting but weak for all other adverse effects. It was well conducted with appropriate inclusion criteria, Jadad scale for rating and two independent raters.
Ballegaard et al [86, 87] studied acupressure for angina. The 1999 study [86] was a cost-benefit analysis and used non-equivalent control groups, a volunteer and convenience sample and co-interventions of acupuncture and the self-care program. The 2004 study [87] had a good sample size and a long follow-up period, but subjects were not randomised and there was no equivalent control group or blinding. Again, it was difficult to isolate the effects of acupressure from co-interventions. At baseline the sample did not significantly differ from Scandinavian heart patients. This 'quality control review' is subject to selection, expectation and social biases.
Gastrointestinal motility was studied by Chen et al [88, 89], with significant improvements demonstrated. In [88], although the intervention was well reported, randomisation is not described (although groups were homogeneous for a range of variables). In [89] the sample was small and not powered and the study was single-blind, although groups were homogeneous. Significant effects were observed.
A poorly reported study observed that acupressure on PC6 significantly reduced gagging in 109 dental patients [90]. The study was described as double-blind although blinding procedures were not described. Details of the sampling were not available.
In a comparison of acupressure with oxybutynin for nocturnal enuresis in children [91], the main flaw was the very small sample size, with no details of sampling, comparison of groups or randomisation, potential selection bias and no placebo/sham group.
A controlled trial of acupressure for 30 patients with peripheral arterial occlusive diseases (PAOD) reported a significant reduction in transcutaneous oximetry [92]. This is a poor quality study with an apparent lack of randomisation and non-equivalent control group, poor reporting and no comparison of groups, although outcomes are objective and the intervention is well reported.
A high quality RCT of acupressure for symptoms of diabetes [93] gave acupressure regularly for 3 years, an unusually long follow-up period, and showed improvements in hyperlipidemia, ventricular hypertrophy, kidney function and neuropathy. The sample size was appropriate (although attrition was fairly high) and group allocation was random. A very good description of treatment was provided (14 STRICTA items reported), although discussion is limited.
Yao et al [94] conducted a single group study of massage combined with acupressure for 85 patients with chronic fatigue syndrome. Treatment was effective in 91.8% of cases. This study did not use any clear outcome measures, had no control, and only reported 7 STRICTA items, and given its poor reporting it is low quality.
An uncontrolled pilot study was conducted of vaginal acupressure for sexual problems [95]. This showed significant improvements in symptoms, physical health, mental health, sexual ability and quality of life. This study is severely limited by small sample, lack of control, no details of recruitment, unvalidated and subjective outcome measures and poor reporting of acupressure. In addition the intervention did not appear to be based on meridian theory.
Sugiura et al [96] conducted an uncontrolled study with 22 healthy volunteers of the effects of acupressure on yu-sen, souk-shin and shitsu-min on heart rate and brain activity. Heart rates decreased. This study investigated mechanisms rather than effectiveness.
Analysis/Summary of quality
Twenty-two of the 80 included studies were graded C (the lowest quality grading). All five of the studies in Chinese language were graded C (or ungraded), and most of the Shiatsu studies were graded C. Analysis of results over time suggests some improvement in the evidence base. Figure 2 shows an improvement in the average number of STRICTA items reported by studies, shown by the line of best fit. Figure 3 indicates a reduction in the percentage of C graded papers over time, and an increase in those graded B. Figure 4 shows the total number of studies and the number at each A/B/C grading for the different countries. This shows no obvious trend, although countries publishing more studies (Taiwan, USA and Korea) seem to have better quality studies, compared to countries with only one or two publications. Regarding quality appraisal, in a third of papers a third reviewer was needed to reach agreement on quality grading.