Deploying Surveybe’s Structural Functionality for Complex School Questionnaires

Cait pic  Cait Spence

I have recently converted some of the largest, most complex questionnaires I have ever encountered into the latest version of Surveybe. These questionnaires make up the evaluation for a large, multi-round randomized experiment to improve learning outcomes in educational establishments in Tanzania. This experiment is a cash-on-delivery programme, where teachers are paid a bonus based on the test results of their students. The focus is on Grade 1-3 students who are tested in the subjects of English, Mathematics, and Kiswahili. These are known as the focal-grade-subjects.

The experiment began with 350 schools across 10 Districts in Tanzania, and it is now in the 4th year with 220 schools remaining in the programme.  EDI has conducted all of the previous 8 rounds of data collection as part of this evaluation and has more recently taken over implementing the intervention as well. Each data collection round typically includes interviews with teachers, the head teacher, classroom observations, student tests, and a record of the student attendance roster.

Complexity of the Questionnaires

An ongoing challenge in this project is the configuration of the teacher questionnaire. This questionnaire begins with the head teacher answering certain sections, and then the teachers are interviewed individually. All the teacher interviews are within the same questionnaire because they are based on information that is dynamically updated from the head teacher’s responses.

Whilst CAPI methodology facilitates the ability to structure this questionnaire compared to trying to handle such a questionnaire on paper-based methods or lightweight, linear software there are two main difficulties with this teacher questionnaire configuration. Firstly, it is a very long questionnaire, particularly at large schools where there can be more than 30 teachers who are all interviewed within the same questionnaire. This means that progress through the questionnaire can be slower than optimal, as a lot of information needs to be fed in from reference data or be dynamically updated based on previous responses.

Secondly, the multiple levels of observation make the SQL within the questionnaire very complex. Each level of observation is stored in a separate table, so within the teacher questionnaire, there are separate tables for the school level, teacher level, grade level and subject level of observation. In many parts of the questionnaire, information is needed from multiple levels of observation, which requires combining the tables within Surveybe in order to extract the necessary data. As such, the SQL statements can become long and complicated.

Surveybe’s Change in Table Structure

Since beginning to covert the teacher questionnaire into the latest Surveybe version, I have come to greatly appreciate the improvements in functionality. The main difference is the table structure within Surveybe – a change that may not be properly appreciated by beginner users of Surveybe. However, for those who make full use of Surveybe, you will come to value this improvement hugely as it will allow your SQL statements to be simpler and more intuitive, and it should enable the speed of your questionnaires to increase.

It is worth explaining that the change in table structure can be divided into two components. First, the table structure is now of a ‘tree’ form where parent tables and sub-tables are explicitly defined. Secondly, there are cascading IDs where parent table identifiers are carried down into sub-tables within Surveybe. These two changes will now be explained more fully, by first setting out the limitations of the previous table structure, and then contrasting this with the improved table structure set-up.

In earlier Surveybe versions there was no explicit sub-table structure, which meant that the unique identifiers of a table were not carried down into the sub-tables. As such, if you were writing SQL within a subtable, you would have to use the invisible ‘Parent_ID’ variable to match the rows in the sub-table with those from the parent table. Additionally, if you were trying to link up a table that had a level of observation two levels above the current sub-table, it required combining all the tables together in order to match the sub-table rows with the parent rows (using an ‘inner join’ statement). Needless to say, this SQL could get complicated very quickly!

In the latest Surveybe version, you can set the ‘tree’ structure which means that you can specifically define the levels of observations and the sub-table structure. Then, all parent identifiers from tables at a higher level will be carried down into the sub-table, functionality that is termed ‘cascading IDs’. It must be noted that there has been no change in the format that data is exported into Stata, Excel or SPSS. Previously, all parent identifier variables were present in the exported sub-tables. The change is that the internal Surveybe table structure now matches the format that tables are exported to in the statistical analysis programs.

This new tree structure and cascading IDs functionality is visually depicted in Figure 1 below which presents a rough structure of the school establishment questionnaires. The highest level table would be rows of all schools in the study. The next level of observation is that of teachers at the school. The observation level below is that of the different grades that the teacher currently teachers. The lowest level of observation is the different subjects that the teacher takes in each grade. At this lowest subject table, the identifiers that will be present within this table in Version 5 are all the parent table identifiers, namely, the SchoolID, the TeacherID, the GradeID and then the SubjectID. This table structure differs from the previous set-up where only SubjectID would be present in the Subject table.

Figure 1: Table Structure Improvement

cascading ID's school

Benefits of the Change in Table Structure

There are several benefits to this change in table structure within Surveybe. Firstly, it allows the SQL to be much simpler and shorter. I no longer have to combine tables to draw information from multiple levels of observation or use the slightly confusing “Parent_ID” references. The resulting SQL is a lot more intuitive and easier to get right. A simple example from the teacher questionnaire is if I only wanted to enable a question for teachers who are currently teaching a focal-grade-subject.  I no longer have to combine all the tables with an ‘inner join’ statement and use the ‘Parent_ID’ variable to do this. Instead, I can do a select statement from the Subject table which specifies that SubjectID=Math, English or Swahili, and GradeID=1,2 or 3. This is only possible because the GradeID variable has been carried down into the lower level Subject table. The resulting shorter SQL in this enablement condition can be seen in Figure 2.

Figure 2: Shorter SQL Statements

SQL for cascading ID school

Secondly, the new table structure has helped speed up the operation and progress through the questionnaires. By no longer combining tables for SQL conditions that involve multiple levels of observation, it makes the questionnaires more efficient and able to do query statements more rapidly. And, when working with such a long and complex questionnaire like these, any speed improvement is a significant benefit to field work.

Overall, when conducting the questionnaire conversions for this project, I have had much satisfaction cutting down long SQL statements and seeing the improved speed of the questionnaires. The fact that Surveybe can be configured to deliver such an interwoven questionnaire stands testament to its flexibility and suitability for such surveys.

With this in mind I am sure Surveybe users will soon share my appreciation of this recent advance in functionality and the improved table structure set-up will prove to be very welcome!