#If the packages are not installed already, install them using the following commands:
#(remove the # in that case)
#install.packages("tidyverse")
#install.packages("questionr")
#install.packages("Hmisc")
#install.packages("esquisse")
#install.packages("kableExtra")
#When opening a new session you need to load packages:
library(tidyverse)
library(questionr)
library(Hmisc)
library(esquisse)
library(kableExtra)
#Find the path of the folder on your computer where the data are stored
#and set it as the working directory:
setwd("/home/groups/3genquanti/SoMix/HIES for workshop")
#Load data
edu<-readRDS("edu.rds")
4 A short example with HIES 2019
In this section, we provide an example of statistical analysis using data from the HIES 2019. We focus on a short analysis of school attendance in Sri Lanka, and of the socio-demographic characteristics of individuals who attend school and of those who drop out.
In this section, you will find:
the R script that we used;
the statistical outputs that we produced (Tables and Figures);
how to read and interpret these statistical outputs;
what we conclude from these statistical analyses.
This is the kind of work you will need to do on the topic you have chosen. Of course, you will have to adapt the variables used and the kind of analysis (tables, figures, etc.) to your specific research question!
In this tutorial, we focus on the final statistical outputs that we decided to keep. Hence, we do not show all the steps of the statistical reasoning we followed to explore the database and our variables (for example, univariate statistics), nor the (many!) attempts we made before settling on the few outputs that we think best summarize the information needed for our research question. In particular, we do not present the unweighted-N frequency tables here, but it is always advisable to run them, as cells with very small N may require more cautious interpretation.
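As a minimal sketch of that check, here is a base-R version on made-up data (the toy data frame and the threshold `min_n` are illustrative assumptions, not part of the HIES analysis). It contrasts the weighted counts, from which weighted percentages are computed, with the unweighted counts of actual respondents per cell. With the real data, running `freqtable()` without the `weights` argument should give the unweighted counts.

```r
# Toy data (made up): 7 respondents with survey weights
toy <- data.frame(
  hhwealthcat = c("Poorest", "Poorest", "Poorest", "Richest", "Richest", "Richest", "Richest"),
  status      = c("Attending", "Attending", "Left", "Attending", "Attending", "Attending", "Left"),
  weight      = c(120, 80, 95, 60, 75, 50, 110)
)

# Weighted counts: what the weighted percentages are computed from
n_weighted <- xtabs(weight ~ hhwealthcat + status, data = toy)
n_weighted

# Unweighted counts: how many respondents actually sit in each cell
n_unweighted <- table(toy$hhwealthcat, toy$status)
n_unweighted

# Flag cells with fewer than min_n respondents (arbitrary threshold for the toy data)
min_n <- 2
n_unweighted < min_n
```

A cell can carry a large weighted count while resting on a single respondent (here, "Poorest / Left"), which is exactly the situation where percentages should be read with caution.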
Keep in mind that this is only one possible analysis: other outputs focusing on the same research question would have been possible. The important point is that, however you eventually decide to present your results, the key information needs to be clear and accessible (in this example, basically, who attends school and who drops out).
4.1 Introduction
As we explained in previous sections, we first install the packages we need (if they are not installed yet) and load them. We then load the edu.rds database that we are going to use for our research question.
Our main dependent variable is r2_school_education, which refers to the third column of the table on “School Education (For persons aged from 5 to 19 years)” in the questionnaire Section 2 (p.324 of the report available on the shared Drive).
It is coded as follows: value 1 stands for “Currently attending school”, 2 for “Never attended school” and 3 for “Attended school in the past”. Yet, in the edu.rds database, these categories are stored only as numbers (1, 2, or 3), without their meaning. To make our tables and graphs easier to read, we apply the appropriate label to each value of this variable.
This is what the following script does. Note that you could also use the function irec, introduced in the second chapter, to recode your variables. Besides, it is always good practice to check whether the recoding went well by cross-tabulating the initial and recoded variables, here: edu |> freqtable(r2_school_education,r2_school_education_rec).
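To see what such a check looks like, here is a toy example in base R: we recode a made-up vector with factor() rather than fct_recode(), then cross-tabulate original codes against new labels. Each original code should fall into exactly one label, and missing values should stay missing.

```r
# Toy vector coded like r2_school_education (1, 2, 3, plus a missing value)
raw <- c(1, 3, 2, 1, NA, 3, 1)

# Attach labels to the codes; codes not listed in `levels` would become NA
recoded <- factor(raw, levels = c(1, 2, 3),
                  labels = c("Currently attending",
                             "Never attended",
                             "Attended school in the past"))

# Cross-tabulate original codes against new labels; useNA = "ifany" shows missing values
check <- table(raw, recoded, useNA = "ifany")
check
```

In a correct recoding, each row of this cross-table has a single non-zero cell.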
#Recode the variable on school attendance
edu$r2_school_education_rec <- edu$r2_school_education |>
as.character() |>
fct_recode(
"Currently attending" = "1",
"Never attended" = "2",
"Attended school in the past" = "3"
)
Similarly to what we did in the previous section on bivariate and multivariate statistics, we need to create a function that combines three variables (rows, columns and a grouping variable) into a single grouped table:
# Function ----------------------------------------------------------------
kbl_grouped_3way <- function(tab, digits = 1) {
# Convert to a data frame without dropping anything: NA categories are kept
df <- as.data.frame(tab, stringsAsFactors = FALSE, drop = FALSE)
# Identify the three dimensions
dims <- names(df)[names(df) != "Freq"]
dim1 <- dims[1] # row
dim2 <- dims[2] # column
dim3 <- dims[3] # group
# To character
df[ dims ] <- lapply(df[ dims ], function(x) ifelse(is.na(x), NA, as.character(x)))
# Pivot wider
df_wide <- df |>
tidyr::pivot_wider(
names_from = all_of(dim2),
values_from = Freq,
values_fill = 0
)
# Keep order
grouping_var <- df_wide[[dim3]]
# Construct kable
tab_out <- df_wide |>
dplyr::select(-all_of(dim3)) |>
kbl(digits = digits)
# Group levels (including NA)
group_levels <- unique(grouping_var)
# Line index by group
groups <- split(seq_len(nrow(df_wide)),
f = ifelse(is.na(grouping_var), "NA", grouping_var))
# Pack_rows for each group in kable
for (g in names(groups)) {
rows <- groups[[g]]
tab_out <- tab_out |>
pack_rows(group = g,
start_row = min(rows),
end_row = max(rows))
}
tab_out
}
4.2 School non-attendance by sex and age
We first explore how school attendance varies by sex and age. There are reasons to think that dropout is more likely among older students. In Sri Lanka, school is compulsory until age 14, so we might expect attendance to decrease only from this age. Is this what we observe in the data? And are there differences between male and female students?
We use the following code to produce a frequency table (with weighted row percentages) showing how the distribution of our dependent variable (school attendance, in columns) varies according to the two independent variables we just mentioned: sex and age.
# Frequency table of school attendance by age and sex ----------------------------------------------------------------
m<-edu |> freqtable(age,r2_school_education_rec,sex,weights=finalweight_25per) |>
lprop()
kbl_grouped_3way(m,digits=0)|>
kable_classic_2(full_width = FALSE)
| age | Currently attending | Never attended | Attended school in the past | Total |
|---|---|---|---|---|
| Male | ||||
| 5 | 58 | 42 | 0 | 100 |
| 6 | 100 | 0 | 0 | 100 |
| 7 | 100 | 0 | 0 | 100 |
| 8 | 100 | 0 | 0 | 100 |
| 9 | 100 | 0 | 0 | 100 |
| 10 | 100 | 0 | 0 | 100 |
| 11 | 100 | 0 | 0 | 100 |
| 12 | 100 | 0 | 0 | 100 |
| 13 | 99 | 0 | 1 | 100 |
| 14 | 100 | 0 | 0 | 100 |
| 15 | 98 | 0 | 2 | 100 |
| 16 | 74 | 1 | 25 | 100 |
| 17 | 60 | 0 | 40 | 100 |
| 18 | 61 | 2 | 37 | 100 |
| 19 | 22 | 0 | 78 | 100 |
| All | 84 | 3 | 13 | 100 |
| Female | ||||
| 5 | 54 | 46 | 0 | 100 |
| 6 | 100 | 0 | 0 | 100 |
| 7 | 100 | 0 | 0 | 100 |
| 8 | 100 | 0 | 0 | 100 |
| 9 | 100 | 0 | 0 | 100 |
| 10 | 100 | 0 | 0 | 100 |
| 11 | 100 | 0 | 0 | 100 |
| 12 | 99 | 1 | 1 | 100 |
| 13 | 99 | 0 | 1 | 100 |
| 14 | 97 | 0 | 3 | 100 |
| 15 | 98 | 0 | 2 | 100 |
| 16 | 86 | 0 | 14 | 100 |
| 17 | 80 | 2 | 18 | 100 |
| 18 | 68 | 0 | 32 | 100 |
| 19 | 37 | 1 | 62 | 100 |
| All | 88 | 3 | 9 | 100 |
#Note the digits=0 argument, which displays percentages without decimals.
#We don’t need that level of precision (especially since the data come from a survey and
#have a margin of error), and the extra digits can obscure the message of the table.
This table shows specific patterns by age.
Between ages 6 and 15, nearly all children are currently attending school. This is true for both male and female students. After age 15, however, dropouts increase. Among male respondents aged 15 at the time of the survey, 97.6% were currently attending school (98% once rounded in the table). This proportion then shrinks: it is only 74.1% among male respondents aged 16, and 21.9% among those aged 19.
Conversely, the percentage of individuals who have dropped out of school (“Attended school in the past”) increases with age: it goes from only 2.4% of 15-year-old men to 77.6% of 19-year-old men.
Overall, we observe a similar pattern for women: school attendance rates are very high until age 15, and then consistently decrease with age. Interestingly, we also see some differences by sex: dropout rates are higher for men than for women. While the dropout rate among male respondents aged 19 is around 78%, it reaches only 62% among female respondents.
Finally, note that apart from age 5, when many children have not yet started school (42% of boys and 46% of girls have never attended), the proportion of children who never attended school is negligible.
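For readers curious about what sits behind lprop(), here is a minimal base-R sketch of weighted row percentages on made-up data, using xtabs() and prop.table(). It is our approximation of the logic, not questionr's actual code. Note that the "All" line pools every age, so it also depends on the age composition of the sample.

```r
# Toy data (made up): school status at two ages, with survey weights
toy <- data.frame(
  age    = c(15, 15, 15, 19, 19, 19),
  status = c("Attending", "Attending", "Left", "Attending", "Left", "Left"),
  weight = c(100, 100, 50, 40, 80, 80)
)

# Weighted counts for each age x status cell
wtab <- xtabs(weight ~ age + status, data = toy)

# Row percentages: each age row sums to 100
row_pct <- prop.table(wtab, margin = 1) * 100
round(row_pct, 0)

# "All" line: percentages computed on all ages pooled together
all_row <- colSums(wtab) / sum(wtab) * 100
round(all_row, 0)
```

Here 80% of 15-year-olds and 20% of 19-year-olds are attending, while the pooled "All" line (53% attending) sits in between, pulled by how much weight each age carries.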
4.3 School non-attendance and locality
Are there differences in school attendance by place of residence? We saw in previous sections that distance to school tends to be greater in rural areas. In that context, can we expect school attendance to be higher in urban areas than in rural sectors?
To investigate this question, we check whether the rate of school attendance is similar in urban areas, rural areas and estates. As the previous table showed, school attendance varies greatly by age, so we take this variable into account when exploring differences by locality.
For the sake of concision and readability, we need to select the variables we want to study. We have therefore decided not to include sex: while we observed some differences by sex, the main driver of school attendance in the first table was age. Further analyses could later explore the specificity of school attendance by type of locality and sex. In the same vein, we run the analyses only on individuals aged 14 and over, since the previous table showed that school attendance barely varies between ages 6 and 13: it is best to keep only the important information so as not to overwhelm the reader with too much detail!
# First, recode the sector (residence) variable
edu$sector_rec <- edu$sector |>
as.character() |>
fct_recode(
"Urban" = "1",
"Rural" = "2",
"Estate" = "3"
)
# Second, we keep only individuals in our database aged at least 14--------------------------
edu14<- edu |>
filter(age>=14)
# Third, frequency table of school attendance by age and locality--------------------------
m<-edu14 |> freqtable(age,r2_school_education_rec,sector_rec,weights=finalweight_25per) |>
lprop()
kbl_grouped_3way(m,digits=0)|>
kable_classic_2(full_width = FALSE)
| age | Currently attending | Never attended | Attended school in the past | Total |
|---|---|---|---|---|
| Urban | ||||
| 14 | 97 | 0 | 3 | 100 |
| 15 | 98 | 0 | 2 | 100 |
| 16 | 74 | 0 | 26 | 100 |
| 17 | 77 | 0 | 23 | 100 |
| 18 | 64 | 3 | 33 | 100 |
| 19 | 30 | 0 | 70 | 100 |
| All | 72 | 0 | 28 | 100 |
| Rural | ||||
| 14 | 99 | 0 | 1 | 100 |
| 15 | 98 | 0 | 2 | 100 |
| 16 | 82 | 0 | 18 | 100 |
| 17 | 70 | 2 | 29 | 100 |
| 18 | 65 | 1 | 34 | 100 |
| 19 | 29 | 1 | 70 | 100 |
| All | 74 | 1 | 26 | 100 |
| Estate | ||||
| 14 | 100 | 0 | 0 | 100 |
| 15 | 100 | 0 | 0 | 100 |
| 16 | 79 | 0 | 21 | 100 |
| 17 | 58 | 0 | 42 | 100 |
| 18 | 57 | 0 | 43 | 100 |
| 19 | 36 | 0 | 64 | 100 |
| All | 76 | 0 | 24 | 100 |
Interestingly, there do not appear to be strong differences in school attendance status across residential locality types. If anything, urban children seem slightly more likely to be out of school than rural or estate children. This pattern is somewhat surprising, as we might have expected rural children to be more prone to dropping out due to more limited access to schools at higher grade levels.
4.4 School non-attendance and family background
School attendance is likely to be associated with family’s social background for at least two reasons.
First, cultural capital may play a role: parents with higher levels of education are often better equipped to support and value continued schooling.
Second, economic factors matter: for families with fewer resources, the opportunity cost of keeping a child in school—rather than contributing to household income or domestic work—may be higher.
Do these hypotheses ring a bell with specific sociological theories? Say, maybe Bourdieu’s theory of cultural reproduction or Boudon’s theory of opportunity (secondary effects)?
We can check whether parental education and household wealth matter.
For parental education, notice that for about 16 percent of children parental education is unknown (in fact, these children are not the children of the household head so in the survey we could not match them to any parent).
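One way to compute such a share, weighted or not, is sketched below on made-up vectors (they are illustrative; with the real data one could presumably use the edu_parent and finalweight_25per columns of the analysis sample). The weighted share is the sum of the weights of missing cases divided by the sum of all weights.

```r
# Toy data (made up): parental education with missing values, plus survey weights
edu_parent <- c("Primary", NA, "Tertiary", "Primary", NA, "Collegiate")
weight     <- c(100, 50, 80, 120, 30, 70)

# Unweighted share of children with unknown parental education (%)
unweighted_na_share <- mean(is.na(edu_parent)) * 100

# Weighted share: weights of missing cases divided by the sum of all weights (%)
weighted_na_share <- weighted.mean(is.na(edu_parent), w = weight) * 100

# Gives 33.3 (unweighted) and 17.8 (weighted)
round(c(unweighted = unweighted_na_share, weighted = weighted_na_share), 1)
```

The two shares can differ a lot when missing cases carry small or large weights, so it is worth stating which one is reported.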
Let’s examine how parental education is associated with school dropout at each age:
m<-edu14 |> freqtable(edu_parent,r2_school_education_rec,age,weights=finalweight_25per) |>
lprop()
kbl_grouped_3way(m,digits=0)|>
kable_classic_2(full_width = FALSE)
| edu_parent | Currently attending | Never attended | Attended school in the past | Total |
|---|---|---|---|---|
| 14 | ||||
| Less than primary | 91 | 0 | 9 | 100 |
| Primary | 99 | 1 | 0 | 100 |
| Junior secondary | 99 | 0 | 1 | 100 |
| Senior secondary | 100 | 0 | 0 | 100 |
| Collegiate | 100 | 0 | 0 | 100 |
| Tertiary | 100 | 0 | 0 | 100 |
| NA | 95 | 0 | 5 | 100 |
| All | 98 | 0 | 2 | 100 |
| 15 | ||||
| Less than primary | 82 | 0 | 18 | 100 |
| Primary | 96 | 0 | 4 | 100 |
| Junior secondary | 99 | 0 | 1 | 100 |
| Senior secondary | 100 | 0 | 0 | 100 |
| Collegiate | 100 | 0 | 0 | 100 |
| Tertiary | 100 | 0 | 0 | 100 |
| NA | 98 | 0 | 2 | 100 |
| All | 98 | 0 | 2 | 100 |
| 16 | ||||
| Less than primary | 66 | 0 | 34 | 100 |
| Primary | 77 | 0 | 23 | 100 |
| Junior secondary | 83 | 0 | 17 | 100 |
| Senior secondary | 89 | 0 | 11 | 100 |
| Collegiate | 70 | 3 | 28 | 100 |
| Tertiary | 95 | 0 | 5 | 100 |
| NA | 79 | 0 | 21 | 100 |
| All | 80 | 0 | 19 | 100 |
| 17 | ||||
| Less than primary | 52 | 8 | 40 | 100 |
| Primary | 64 | 3 | 33 | 100 |
| Junior secondary | 66 | 1 | 33 | 100 |
| Senior secondary | 88 | 1 | 11 | 100 |
| Collegiate | 88 | 0 | 12 | 100 |
| Tertiary | 100 | 0 | 0 | 100 |
| NA | 45 | 0 | 55 | 100 |
| All | 71 | 1 | 28 | 100 |
| 18 | ||||
| Less than primary | 17 | 0 | 83 | 100 |
| Primary | 40 | 0 | 60 | 100 |
| Junior secondary | 59 | 1 | 41 | 100 |
| Senior secondary | 93 | 0 | 7 | 100 |
| Collegiate | 88 | 3 | 9 | 100 |
| Tertiary | 100 | 0 | 0 | 100 |
| NA | 58 | 3 | 39 | 100 |
| All | 64 | 1 | 34 | 100 |
| 19 | ||||
| Less than primary | 10 | 0 | 90 | 100 |
| Primary | 25 | 0 | 75 | 100 |
| Junior secondary | 24 | 1 | 75 | 100 |
| Senior secondary | 31 | 0 | 69 | 100 |
| Collegiate | 41 | 1 | 58 | 100 |
| Tertiary | 60 | 0 | 40 | 100 |
| NA | 27 | 0 | 73 | 100 |
| All | 29 | 1 | 70 | 100 |
Below age 16, the share of children not attending school is much higher among those whose parents have less than a primary education; for children of more educated parents, non-attendance is virtually nonexistent.
From age 16 onward, the overall pattern broadly follows a gradient: children with lower-educated parents are more likely to leave school, while those with highly educated parents are more likely to remain enrolled (with some irregularities, such as children of collegiate-educated parents at age 16, of whom only 70% are attending).
It is also noteworthy that children whose parents’ education is unknown show above-average levels of non-attendance, especially at age 17 (55% have left school, against 28% overall). This group may have a distinct background: as noted above, these children are not the household head’s own children, and some might be growing up in large joint families or without their parents.
m<-edu14 |> freqtable(hhwealthcat,r2_school_education_rec,age,weights=finalweight_25per) |>
lprop()
kbl_grouped_3way(m,digits=0)|>
kable_classic_2(full_width = FALSE)
| hhwealthcat | Currently attending | Never attended | Attended school in the past | Total |
|---|---|---|---|---|
| 14 | ||||
| Poorest | 95 | 1 | 4 | 100 |
| Poor | 100 | 0 | 0 | 100 |
| Middle | 95 | 0 | 5 | 100 |
| Rich | 100 | 0 | 0 | 100 |
| Richest | 100 | 0 | 0 | 100 |
| All | 98 | 0 | 2 | 100 |
| 15 | ||||
| Poorest | 96 | 0 | 4 | 100 |
| Poor | 100 | 0 | 0 | 100 |
| Middle | 93 | 0 | 7 | 100 |
| Rich | 100 | 0 | 0 | 100 |
| Richest | 100 | 0 | 0 | 100 |
| All | 98 | 0 | 2 | 100 |
| 16 | ||||
| Poorest | 80 | 0 | 20 | 100 |
| Poor | 84 | 0 | 16 | 100 |
| Middle | 75 | 1 | 23 | 100 |
| Rich | 86 | 0 | 14 | 100 |
| Richest | 77 | 0 | 23 | 100 |
| All | 80 | 0 | 19 | 100 |
| 17 | ||||
| Poorest | 50 | 6 | 44 | 100 |
| Poor | 64 | 1 | 35 | 100 |
| Middle | 71 | 0 | 29 | 100 |
| Rich | 81 | 1 | 19 | 100 |
| Richest | 83 | 0 | 17 | 100 |
| All | 71 | 1 | 28 | 100 |
| 18 | ||||
| Poorest | 47 | 0 | 53 | 100 |
| Poor | 63 | 2 | 35 | 100 |
| Middle | 61 | 1 | 38 | 100 |
| Rich | 72 | 2 | 27 | 100 |
| Richest | 80 | 0 | 20 | 100 |
| All | 64 | 1 | 34 | 100 |
| 19 | ||||
| Poorest | 30 | 0 | 70 | 100 |
| Poor | 25 | 1 | 74 | 100 |
| Middle | 26 | 0 | 74 | 100 |
| Rich | 27 | 1 | 71 | 100 |
| Richest | 39 | 0 | 61 | 100 |
| All | 29 | 1 | 70 | 100 |
Wealth differences become more pronounced in late adolescence:
At ages 14–15, past attendance remains very low across all categories, indicating that early dropout is rare regardless of household wealth.
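From age 16 onward, however, the proportion of children who have left school rises steadily. At 16, it is still similar across wealth groups (14% to 23%, with no clear gradient); from 17, it becomes clearly higher among poorer households.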
By ages 17–18, children in the poorest group show the highest levels of past attendance (44–53%), while those in the richest households are much less likely to have exited school.
At age 19, past attendance becomes the majority status for all groups, though it remains somewhat lower among the richest (61%, against 70% to 74% in the other groups). Overall, wealth seems to shape school persistence mainly in late adolescence.
4.5 Reasons for dropping out
We can dig a bit deeper into the situation of students who attended school in the past by looking at their reasons for leaving school. Again, we need to recode the categories of the relevant variable, reason_leave_school.
edu14notcurrently<-edu14 |> filter(r2_school_education_rec=="Attended school in the past")
edu14notcurrently$reason_leave_school_rec <- edu14notcurrently$reason_leave_school |>
as.character() |>
fct_recode(
"Further schooling not available or too far away" = "1",
"Financial problems" = "2",
"House keeping / Family business" = "3",
"Disability" = "4",
"Illness" = "5",
"Not willing / poor academic progress" = "6",
"Pending results (GCE)" = "7",
"Complete GCE / Grade 13" = "8",
"Engaged in an economic activity" = "9",
"Other" = "99"
)
edu14notcurrently |> freqtable(reason_leave_school_rec,weights=finalweight_25per) |>
freq() |> select(-n) |>
kbl(digits=0) |> kable_classic(full_width = FALSE)
| Reason for leaving school | % | val% |
|---|---|---|
| Further schooling not available or too far away | 1 | 1 |
| Financial problems | 7 | 7 |
| House keeping / Family business | 4 | 4 |
| Disability | 1 | 1 |
| Illness | 1 | 1 |
| Not willing / poor academic progress | 37 | 37 |
| Pending results (GCE) | 14 | 14 |
| Complete GCE / Grade 13 | 14 | 14 |
| Engaged in an economic activity | 13 | 13 |
| Other | 7 | 7 |
This variable is rather subjective, and its categories may overlap with one another. Ideally, we would also have preferred to rely on more objective indicators of school achievement. Despite the presence of ten distinct response options, the reasons for leaving school can be grouped into a few broader blocks.
Opportunity-cost reasons include financial problems (7%), household or family business responsibilities (4%), and engagement in economic activity (13%).
Academic-related reasons are dominated by ‘not willing / poor academic progress,’ which is by far the most frequently cited reason (37%).
Institutional or availability issues account for 15% of the reasons: further schooling being unavailable or too far away (only 1%, very low!) and waiting for GCE results before continuing (14%).
Personal health issues, including disability (1%) and illness (1%), account for a small share of the cited reasons.
The Other category represents 7% of the children who attended school in the past, and unfortunately we cannot break it down any further.
Having completed GCE / Grade 13 (14%) is important to keep in mind here: these children have already completed secondary school, and this section of the survey does not record whether they go on to tertiary education. They should therefore be left out of further analyses.
Let us recode the reasons for leaving school into these broader categories (noting that this recoding may be open to criticism or revision) and filter out the children having completed secondary school in order to streamline the analysis.
edu14nocurnocomp <-edu14notcurrently |> filter(reason_leave_school_rec!="Complete GCE / Grade 13")
edu14nocurnocomp$reason_leave_streamlined <- edu14nocurnocomp$reason_leave_school_rec |>
fct_recode(
"Institutional / Availability" = "Further schooling not available or too far away",
"Opportunity-cost" = "Financial problems",
"Opportunity-cost" = "House keeping / Family business",
"Personal health issues" = "Disability",
"Personal health issues" = "Illness",
"Academic achievement-related" = "Not willing / poor academic progress",
"Institutional / Availability" = "Pending results (GCE)",
"Opportunity-cost" = "Engaged in an economic activity"
)
#Remove unused levels (Complete GCE / Grade 13)
edu14nocurnocomp$reason_leave_streamlined <- droplevels(edu14nocurnocomp$reason_leave_streamlined)
edu14nocurnocomp |> freqtable(reason_leave_streamlined,weights=finalweight_25per) |>
freq() |> select(-n) |>
kbl(digits=0) |> kable_classic(full_width = FALSE)
| Reason for leaving school (grouped) | % | val% |
|---|---|---|
| Institutional / Availability | 18 | 18 |
| Opportunity-cost | 29 | 29 |
| Personal health issues | 2 | 2 |
| Academic achievement-related | 44 | 44 |
| Other | 8 | 8 |
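The same collapse can also be written in base R, which makes the mapping easy to read at a glance. This is a sketch on a made-up factor, not the code used above. One difference to keep in mind: fct_recode() keeps any level you do not mention (which is why “Other” passes through unchanged in the code above), whereas with the named-list form of levels<-, levels left out of the list end up as NA, so every level must be listed, including “Other”.

```r
# Toy factor of detailed reasons (made-up observations)
reasons <- factor(c("Financial problems", "Illness", "Pending results (GCE)",
                    "Not willing / poor academic progress", "Other",
                    "Engaged in an economic activity"))

# Named list: each new (broad) level maps to the old levels it absorbs;
# old levels absent from the data are simply ignored
collapsed <- reasons
levels(collapsed) <- list(
  "Institutional / Availability" = c("Further schooling not available or too far away",
                                     "Pending results (GCE)"),
  "Opportunity-cost"             = c("Financial problems",
                                     "House keeping / Family business",
                                     "Engaged in an economic activity"),
  "Personal health issues"       = c("Disability", "Illness"),
  "Academic achievement-related" = "Not willing / poor academic progress",
  "Other"                        = "Other"
)

# Check the mapping: each detailed reason should feed exactly one broad group
table(reasons, collapsed)
```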
With this streamlined variable, we then investigate how the reasons for leaving school vary with parental education and household wealth:
edu14nocurnocomp |> freqtable(edu_parent,reason_leave_streamlined,weights=finalweight_25per) |>
rprop() |>
kbl(digits=0) |> kable_classic(full_width = FALSE)
| edu_parent | Institutional / Availability | Opportunity-cost | Personal health issues | Academic achievement-related | Other | Total |
|---|---|---|---|---|---|---|
| Less than primary | 9 | 37 | 0 | 54 | 0 | 100 |
| Primary | 13 | 26 | 1 | 51 | 9 | 100 |
| Junior secondary | 13 | 33 | 2 | 47 | 5 | 100 |
| Senior secondary | 39 | 34 | 3 | 20 | 4 | 100 |
| Collegiate | 35 | 15 | 3 | 24 | 24 | 100 |
| Tertiary | 0 | 0 | 0 | 0 | 100 | 100 |
| NA | 16 | 24 | 3 | 46 | 11 | 100 |
| All | 18 | 29 | 2 | 44 | 8 | 100 |
edu14nocurnocomp |> freqtable(hhwealthcat,reason_leave_streamlined,weights=finalweight_25per) |>
rprop() |>
kbl(digits=0) |> kable_classic(full_width = FALSE)
| hhwealthcat | Institutional / Availability | Opportunity-cost | Personal health issues | Academic achievement-related | Other | Total |
|---|---|---|---|---|---|---|
| Poorest | 11 | 39 | 1 | 47 | 2 | 100 |
| Poor | 13 | 28 | 2 | 50 | 8 | 100 |
| Middle | 18 | 26 | 2 | 44 | 10 | 100 |
| Rich | 20 | 20 | 2 | 45 | 14 | 100 |
| Richest | 33 | 30 | 2 | 26 | 9 | 100 |
| All | 18 | 29 | 2 | 44 | 8 | 100 |
Taken together, the two tables show that both parental education and household wealth shape the reasons why children leave school, but in somewhat different ways.
Among children from less educated or poorer households, school exit is most often linked to academic-achievement–related issues and opportunity-cost pressures, suggesting that learning difficulties and the need to contribute economically remain key constraints. As parental education and wealth increase, however, academic and economic pressures diminish.
E.g., more than half (54%) of the children whose parents have less than primary education cite academic-achievement-related issues, against only 24% of children whose parents have collegiate education.
E.g., about 40% (39%) of the poorest children cite opportunity-cost pressures, against 30% of the richest children (and 20% in the “Rich” group).
As parental education and wealth increase, institutional or availability factors become more prominent. Note that this category mostly captures waiting for GCE results, since further schooling being unavailable or too far away accounts for only about 1% of all reasons. At the top of the socioeconomic spectrum, these factors partly replace academic or financial motivations as the main reasons for leaving school.
E.g., 33% of the richest children cite this reason, but only 11% of the poorest children do.
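Notice that, once children who completed GCE / Grade 13 (i.e. those who completed secondary school) are filtered out, the row for children with tertiary-educated parents consists entirely of “Other” reasons (100%). This most likely reflects a very small number of cases (the unweighted N should be checked before interpreting it), but it is consistent with the association between parents’ education and children’s dropout: few children of tertiary-educated parents appear to leave school without completing secondary education.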
4.6 Limits (and possible ways to solve them)
Our analysis has some limitations (or observations to bear in mind).
We adopted an age-cohort approach, comparing children of different ages (e.g., 14 vs. 15) observed at the same time as if they could be compared directly. This might suggest, for example, that children observed out of school at age 17 left school at that age, whereas some of them may have left several years earlier. This issue could be explored further using the variable when_stop_schooling, which indicates the year in which each child dropped out.
We also did not consider another issue beyond dropping out: grade for age. Even if children are attending school, they may be behind for their age, because they started school later than their peers or because they repeated a grade. The variables grade_this_year and grade_last_year may be of interest to examine this.
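As a starting point for both extensions, here are two hypothetical helper functions, sketched under assumptions we have not checked against the HIES documentation: that the survey year is 2019, that when_stop_schooling is stored as a calendar year, and that children start grade 1 at a given entry_age, to be set according to Sri Lankan school-entry rules. Both results are approximate (to about one year), since exact birth and interview dates are not used.

```r
# Hypothetical helpers: survey_year and entry_age are assumptions to check,
# not values taken from the HIES documentation

# Approximate age when the child stopped school, from age at the survey and
# the year of stopping (when_stop_schooling); accurate to about +/- 1 year
age_at_dropout <- function(age, when_stop_schooling, survey_year = 2019) {
  age - (survey_year - when_stop_schooling)
}

# Years behind the expected grade (positive = late for age), assuming that
# children enter grade 1 at entry_age
grade_delay <- function(age, grade, entry_age) {
  expected_grade <- age - entry_age + 1
  expected_grade - grade
}

# Toy examples (made-up children)
age_at_dropout(age = c(17, 19), when_stop_schooling = c(2018, 2015))  # 16 and 15
grade_delay(age = c(12, 14), grade = c(7, 7), entry_age = 6)          # 0 and 2
```

With the real data, these would be applied to the age, when_stop_schooling and grade_this_year columns, and the results tabulated like any other variable.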
4.7 Possible qualitative extensions
Qualitative field material could enrich our understanding of school attendance and dropout. For example, interviews with students, parents, and teachers could provide insights into:
The reasons why children leave school that are not captured by survey categories (and whether they sometimes overlap);
How parents’ expectations shape educational decisions (to drop out);
How parents and children perceive the value of education (and whether it is worth going to school if it is not compulsory);
Localized barriers to school attendance (even if rural residence does not seem associated with more dropouts, are there specific areas where schools offering higher grades are scarce?).
4.8 Preliminary conclusion
Overall, school attendance in Sri Lanka is high at younger ages but declines steadily from age 16, with dropouts more common among children from poorer households or with less-educated parents. Academic challenges and opportunity costs are the main reasons for leaving school among disadvantaged children, while institutional constraints become more important for children from wealthier or more educated families.
For a report:
This analysis should be extended with further academic literature on school dropout; for Sri Lanka, see for instance Lindberg, 2010 or Arunatilake, 2006 (these are now somewhat dated, so you should also search for more recent articles).
We have not included graphics here, but they are always a useful way to convey information to an audience. You can use esquisse, as introduced in previous chapters, or export the tables you want to visualize into your preferred spreadsheet software to create graphics when finalizing your analysis.
In a final report, all the tables and graphs must have a title, as well as some explanations of the data used (HIES 2019 here) and the restrictions applied to the analytical sample (for example for the latter tables, individuals aged between 14 and 19, etc.).
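As an illustration, here is a minimal ggplot2 sketch (esquisse produces ggplot2 code of this kind) of the share of young people currently attending school by age and sex. The percentages are typed in from the first table (ages 14 to 19, rounded) rather than recomputed from the data, and the title and caption show the kind of information a final figure should carry.

```r
library(ggplot2)

# "Currently attending" percentages typed in from the first table
# (weighted, rounded; ages 14 to 19)
attend <- data.frame(
  age     = rep(14:19, times = 2),
  sex     = rep(c("Male", "Female"), each = 6),
  percent = c(100, 98, 74, 60, 61, 22,
              97, 98, 86, 80, 68, 37)
)

p <- ggplot(attend, aes(x = age, y = percent, colour = sex)) +
  geom_line() +
  geom_point() +
  scale_y_continuous(limits = c(0, 100)) +
  labs(
    title   = "Share of young people currently attending school, by age and sex",
    x       = "Age (years)",
    y       = "Currently attending school (%)",
    colour  = "Sex",
    caption = "Source: HIES 2019 (weighted percentages). Sample: individuals aged 14 to 19."
  )
p
```

Typing in rounded values is fine for a quick draft; for a final figure, computing the percentages directly from the data avoids transcription errors.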