Behind the scenes of a DataThon!

By Sabine Seyffarth – Researcher and aspiring Data Scientist


Last month, I decided to take part in the DataThon organized by Data for Good in Calgary. I love playing with data and have been interested in pursuing a career in data science for a while now.

A DataThon is an event where volunteers come together for one weekend and use their analytical skills to tackle a given problem. It brings together people from various professions and backgrounds: social science and psychology students, professors, data scientists, business analysts, computer scientists, etc. The dataset for each event is provided by a not-for-profit organization that would otherwise not have access to data-driven insights, and that could use the results of the DataThon as justification for implementing a data science program.

The Calgary DataThon started on a Friday night with an introduction to the work of Data for Good and the Distress Centre Calgary. The Distress Centre operates 24 hours a day, seven days a week, answering phone calls from Calgarians in need. The Centre runs several phone lines and chat services through which volunteers hand out information, direct people to services in their communities, help with emotional distress and personal troubles, and even do suicide prevention. Data on all these interactions have been collected from as early as 2003, amounting to about 1.25 million records of the Distress Centre's calls, 211 contacts, chats and texts. This is a vast amount of data with untapped potential. In preparation for the DataThon, the raw data was diligently anonymized and transformed into organized and annotated databases, ready to be unleashed on about 50 volunteers.

The Distress Centre had prepared background information and questions to guide the analytic efforts. These questions included: Do weather patterns have an influence on call lengths or types? What types of calls are more likely to be de-escalated? Are there seasonal patterns in the issues being discussed? Is there a relationship between the number of suicidal callers and officially reported suicides within the province?

The questions were grouped into six assignments with similar workloads, and participants got to select which assignment they wanted to do. While reading through all the information, I paid close attention to how applicable my skill set was, and where I would have an immediate intuition on how to approach the problem. I decided to work on the correlation between suicidal callers and actual suicides within Alberta. After the presentations, everyone grouped themselves into teams, and started to develop a game plan for the next day.

On Saturday morning, our team began putting that game plan into action, which proved a lot more difficult than anticipated. Different backgrounds meant we had different approaches to the data, so the first few hours were spent trying to get everyone's ideas heard, while simultaneously deciding how to reshape the data into an appropriate format for the analyses. After lunch, everyone settled into a groove, talking quietly in small groups and concentrating on their assignments.

To correlate the information on suicidal callers with actual suicide numbers, we were limited to publicly available resources. Justice Alberta only provides an average number of suicides by month from 2005 to 2009, leaving us with a mere 56 data points to analyze. This meant we were only able to use just over half of the information provided by the Distress Centre. Interestingly, we found no relationship between callers who simply discussed or mentioned suicide at some point during the call and actual suicides. However, there was a notable (but not statistically significant) correlation between male callers who were rated as acutely suicidal and reported suicides within the province.
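For readers curious what checking such a relationship can look like, here is a minimal sketch in Python. It is not our team's actual code, and the monthly counts in it are made up for illustration. It computes a Pearson correlation and then, as one possible way to judge significance, estimates a p-value with a permutation test. The function names `pearson_r` and `permutation_p_value` are hypothetical helpers, not part of any library.

```python
import math
import random


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant series")
    return cov / math.sqrt(var_x * var_y)


def permutation_p_value(xs, ys, n_permutations=2000, seed=0):
    """Two-sided p-value: share of shuffles with |r| at least as large as observed."""
    rng = random.Random(seed)
    observed = abs(pearson_r(xs, ys))
    shuffled = list(ys)
    hits = 0
    for _ in range(n_permutations):
        rng.shuffle(shuffled)
        if abs(pearson_r(xs, shuffled)) >= observed:
            hits += 1
    # Add-one correction keeps the estimate away from an impossible p = 0.
    return (hits + 1) / (n_permutations + 1)


# Made-up monthly counts, for illustration only (not DataThon data).
acute_calls = [12, 15, 9, 20, 17, 11, 14, 22, 18, 10, 13, 16]
reported_suicides = [40, 43, 38, 47, 41, 42, 39, 48, 44, 37, 45, 40]

r = pearson_r(acute_calls, reported_suicides)
p = permutation_p_value(acute_calls, reported_suicides)
print(f"r = {r:.2f}, permutation p = {p:.3f}")
```

A correlation can look notable and still come with a large p-value when there are only a few dozen monthly data points, which is the situation described above. Keep in mind that monthly series often carry trends and seasonality, so a simple shuffle like this is only a rough first check.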

As with many of the results found during the weekend, there were major limitations to these findings. The most obvious is the small amount of data behind the results. There were also many variables affecting these relationships that we didn't account for, and suicides are underreported in general. Still, our findings were interesting, and could possibly contribute to organizational strategies. For example, they suggest that volunteer training works, as suicidal callers are escalated appropriately to the authorities. On the other hand, the somewhat inconclusive results might mean that, rather than merely looking at the relationship between actual suicides and suicidal callers, a more appropriate approach would be to look at people admitted to the ER for suicide attempts, or even a combination of both.

This shows that further research and analyses are needed to fully understand the knowledge contained in the Distress Centre's records. I am convinced that the DataThon helped to show the potential of this type of data and to hint at the impact that future work with it can have. It is exciting to be part of a project that can help a not-for-profit adjust its strategies towards higher organizational efficiency. For example, further analysis could show the Distress Centre how to better allocate resources according to call volumes and types of problems, depending on the time of day. At some point, they could even predict an increased need for volunteers following major political, natural or economic events.

I would highly recommend that anyone who has an interest in working as an analyst or data scientist take part in events like this. Apart from the great feeling you get by making a difference, it is also great to see how your own skills measure up against professionals. And you can connect with people who have similar interests, learn new tools, and all in all, have a fun, albeit stressful, weekend!