RAREA GRAPH
This post shows how to prepare a range area (twoway rarea) graph in Stata.
The code below simulates data for our rarea graph. The data show the percentage of patients with a given number of disorders (0 to 4), plotted against age category. All percentages are relative to the total number of patients across all age categories. Note that the data for the graph need to be cumulated in two ways.
First, the percentages are cumulated over the number of disorders, from the maximum down. For instance, the value for three disorders is the percentage of patients with 3 or 4 disorders, the value for two disorders is the percentage of patients with 2, 3 or 4 disorders, and so on.
Second, the data are cumulated across age categories, so that each age category shows the percentage of patients with a given number of disorders in that category and in all younger categories. For instance, the value for the 19-28 category also includes the patients in the <18 category.
Understanding this data structure is crucial for plotting your data successfully. For further elaboration on data preparation, please get in touch.
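To make the double cumulation concrete, here is a minimal, hand-checkable sketch on hypothetical toy data: two age groups, at most two disorders, and 10 patients in total. The variable names n0-n2, rowsum, grand, r1, r2, c1 and c2 are made up for this illustration, and the sketch skips the rounding used in the full example below.

```stata
* Hypothetical toy data: patient counts with 0, 1 and 2 disorders (n0-n2)
* in two age groups; 10 patients in total
clear
input age_cat n0 n1 n2
1 3 1 1
2 2 2 1
end

* total number of patients across all age groups
egen rowsum = rowtotal(n0 n1 n2)
egen grand = total(rowsum)

* step 1: cumulate over disorders from the highest count down,
* as percentages of all patients
* r1 = 1 or more disorders, r2 = 2 disorders
gen r1 = (n1 + n2) * 100 / grand
gen r2 = n2 * 100 / grand

* step 2: cumulate over age from younger to older
* (sum() is a running sum, so the data must be sorted by age)
sort age_cat
gen c1 = sum(r1)
gen c2 = sum(r2)

list age_cat r1 r2 c1 c2
```

In the second age group, c1 equals 50: the 20% of all patients who are in the first age group and have at least one disorder, plus the 30% who are in the second age group and have at least one disorder.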
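clear
* fix the random-number seed so the simulated data are reproducible (any value works)
set seed 12345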
set obs 6
* assume we have six age categories
gen age_cat = _n
label define age_cat_label 1 "<18" 2 "19-28" 3 "29-38" 4 "39-48" 5 "49-59" 6 "60+"
label values age_cat age_cat_label
* simulate patient counts in each age group: the counts of patients with 1 to 4 disorders grow with age, while every age group keeps a sizeable number of patients with 0 disorders
gen _0_disorders = int(runiform(5,10))
forvalues i = 1/4 {
gen _`i'_disorders = int(runiform(1, `i'*3 + age_cat))
}
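* compute the total number of patients across all age groups and disorder counts
egen age_cat_total = rowtotal(_*)
quietly summarize age_cat_total
scalar total = r(sum)
drop age_cat_total
* convert the counts of patients with 1 to 4 disorders to proportions of all patients (they are multiplied by 100 further below)
forvalues i = 1/4 {
replace _`i'_disorders = round(_`i'_disorders / scalar(total), 0.001)
}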
* cumulate across disorder counts, from the highest count down: row_cumul_3 holds the proportion of patients with 3 or 4 disorders, row_cumul_2 the proportion with 2, 3 or 4 disorders, and so on (the varlist range relies on _0_disorders to _4_disorders being stored in this order)
forvalues i = 1/4 {
egen row_cumul_`i' = rowtotal(_`i'_disorders-_4_disorders)
}
* cumulate across age, from younger to older: sum() returns a running sum over observations, so the data must be sorted by age_cat; e.g. the 19-28 group then shows the proportion of patients aged 19-28 or younger
sort age_cat
forvalues i = 1/4 {
gen col_cumul_`i' = sum(row_cumul_`i')
}
* multiply by 100 to get percentages
forvalues i = 1/4 {
replace col_cumul_`i' = col_cumul_`i' * 100
}
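* the band for 0 disorders is drawn up to 100 in every age group so that it fills the plot area above the other bands; strictly, the cumulative percentage of all patients reaches 100 only in the oldest age group
gen col_cumul_0 = 100
* common lower bound for all bands
gen _zero = 0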
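Next, we draw the rarea graph with minimal options. Each rarea layer shades the area between _zero and one cumulative variable, and later layers are drawn on top of earlier ones, so the visible band for each number of disorders is the part of its area not covered by the layer for the next higher number of disorders.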
twoway (rarea _zero col_cumul_0 age_cat) ///
(rarea _zero col_cumul_1 age_cat) ///
(rarea _zero col_cumul_2 age_cat) ///
(rarea _zero col_cumul_3 age_cat) ///
(rarea _zero col_cumul_4 age_cat)
Now let’s turn this into a publication-quality rarea graph.
twoway (rarea _zero col_cumul_0 age_cat, fcolor(dknavy) fintensity(5) lcolor(black) lwidth(vthin)) ///
(rarea _zero col_cumul_1 age_cat, fcolor(dknavy) fintensity(15) lcolor(black) lwidth(vthin)) ///
(rarea _zero col_cumul_2 age_cat, fcolor(dknavy) fintensity(25) lcolor(black) lwidth(vthin)) ///
(rarea _zero col_cumul_3 age_cat, fcolor(dknavy) fintensity(35) lcolor(black) lwidth(vthin)) ///
(rarea _zero col_cumul_4 age_cat, fcolor(dknavy) fintensity(45) lcolor(black) lwidth(vthin)) ///
, ///
xlabel(1(1)6, angle(forty_five) valuelabel notick) ///
ylabel(, nogrid notick) ///
graphregion(fcolor(white)) ///
plotregion(margin(zero)) ///
legend(order(1 "0 disorder(s)" 2 "1 disorder(s)" 3 "2 disorder(s)" 4 "3 disorder(s)" 5 "4 disorder(s)") position(11) ring(0) col(1) nobox size(small) ///
region(lcolor(none) fcolor(none)) bmargin(3 0 0 3)) ///
xtitle(Age groups (years)) ///
ytitle(Patients (%)) ///
title("Percentage of Patients Cumulated Over # of Disorders" "by Age Groups", size(msmall))