One of the great features of Canvas is the ability to access all of the data from your institution's Canvas instance. The data is provided as a set of downloadable database tables in a star schema format. Each table comes as a directory full of compressed files. A small Node.js script, provided by Instructure, takes those files, unzips them, and reassembles them into a single tab-delimited file for each table.
In theory, the unpacked tables can be imported into a database. From there, visualization tools such as Tableau or statistical software such as R can be connected to the database and used to generate insights into ELMS usage.
This step in the process has been elusive for us.
One table in particular, the requests table, has proven to be a bear to deal with. Its compressed size is over 300 GB, and we don't know the compression ratio. We have yet to completely unpack the compressed requests files: our first attempt filled up our machine's 1 TB drive before the process was complete.
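One way to learn the compression ratio before committing disk space is to stream each compressed file through a decompressor and count the bytes without writing anything out. The sketch below assumes the files are gzip-compressed and end in `.gz`; the function names and the directory layout are hypothetical, not part of Instructure's tooling.

```python
import gzip
from pathlib import Path


def uncompressed_size(path, chunk_size=1024 * 1024):
    """Count the bytes a gzip file expands to, without writing to disk."""
    total = 0
    with gzip.open(path, "rb") as f:
        while True:
            chunk = f.read(chunk_size)  # decompress about 1 MB at a time
            if not chunk:
                break
            total += len(chunk)
    return total


def estimate_table_size(directory, pattern="*.gz"):
    """Return (compressed_bytes, uncompressed_bytes) for one table directory.

    Assumption: every file matching `pattern` in `directory` belongs to
    the same table and is gzip-compressed.
    """
    compressed = 0
    uncompressed = 0
    for path in sorted(Path(directory).glob(pattern)):
        compressed += path.stat().st_size
        uncompressed += uncompressed_size(path)
    return compressed, uncompressed
```

Running this over a handful of the requests files, rather than all of them, should give a rough ratio to multiply against the full 300 GB. It still has to decompress everything it measures, so it is not fast, but it uses almost no disk or memory.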
We recently brought a much more robust server online, which should have plenty of space for our needs (two drives with over 10 TB each). Unfortunately, the unpacking process for the requests table (and only the requests table) seems to cause our connection to the server to abort halfway through. I don't have a background in computer science or data engineering, so the process has been quite frustrating: I have limited ability to troubleshoot the issue, both for lack of knowledge and for lack of administrator credentials on the server.
Even without uploading the data tables to a database, we've been able to do most of our analyses by reading the flat files directly into R, Tableau, or Python, so things have not been at a standstill. Still, I look forward to the day when we don't have to worry so much about that darned requests table.
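For the larger flat files, reading one row at a time keeps memory use flat no matter how big the file is. Here is a minimal sketch in Python using only the standard library. It assumes the unpacked file is tab-delimited, UTF-8 encoded, and has no header row; the function name and the choice of column are hypothetical.

```python
import csv
from collections import Counter


def count_column_values(path, column_index, delimiter="\t"):
    """Tally the values in one column of a delimited flat file.

    Rows are streamed one at a time, so the whole file never has to
    fit in memory. Assumption: the file has no header row; if it does,
    the header text is counted like any other value.
    """
    counts = Counter()
    with open(path, newline="", encoding="utf-8") as f:
        # QUOTE_NONE treats quote characters as ordinary data.
        reader = csv.reader(f, delimiter=delimiter, quoting=csv.QUOTE_NONE)
        for row in reader:
            if column_index < len(row):  # skip short or blank rows
                counts[row[column_index]] += 1
    return counts
```

Calling `count_column_values("some_table.txt", 2).most_common(10)` would list the ten most frequent values in the third column of a hypothetical `some_table.txt`.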
