By: Pak Chung Wong & Alina A. von Davier
In today’s schools, hardly a day passes without educators, from the classroom to the superintendent’s office, hearing the phrase, “It’s all about the data.” But what does that truly mean for learning and teaching, and what impact can access to the vast amount of data available to educators have on linking student assessment to student achievement?
In recent years, work with educational testing and learning data has evolved thanks to the capabilities provided by technology, the availability of large data sets, and advances in data mining and machine learning. This has led to new types of educational measurement data analysis, evolving from traditional classical test theory to computational psychometrics. Simply defined, computational psychometrics grew out of traditional psychometrics (the science of measuring skills, knowledge, abilities, attitudes, personality traits, and educational achievement) and incorporates techniques from educational data mining, machine learning, and other computer and cognitive science fields.
Yet despite these advances in the methodology and the availability of large data sets collected at each test administration, there are still roadblocks for real-time computational analyses. The traditional way testing organizations collect, store, and analyze the data from multiple test forms at multiple times is not conducive to real-time, data-intensive computational psychometrics and analytics methods that can reveal new patterns and information about students over time.
Founded just two years ago, ACTNext, the change agent for ACT, is working every day to innovate ways for educators around the country to leverage the vast amount of data available to truly transform learning and teaching.
The “data cube” paradigm
Currently, psychometric data are mostly processed and stored as a two-dimensional matrix of items by test takers. Item content and its associated standards or taxonomies are often stored as narratives in various other data systems, ranging in sophistication from Excel spreadsheets to OpenSalt.
In the big-data era, the expectation is not only that we have access to large volumes of data, but also that the data are matched and can be aligned and analyzed on different dimensions in real time—including item features like content standards.
One promising approach to mitigating the unique challenges at the intersection of psychometrics and big-data analytics is the “data cube,” a multidimensional data structure optimized for online analytical processing (OLAP) and data warehousing applications. The data cube has dominated the business intelligence and analytics landscape for the last two decades but has never been applied to psychometric data analytics. Despite the 3D structure implied by the word “cube,” a data cube can represent any number of data dimensions.
We argue that psychometricians and data scientists can effectively navigate their learning and assessment data, and interactively visualize the results, through foundational “data projection” operations such as slicing, dicing, drilling down, rolling up, and pivoting. These projection operations can be implemented efficiently on a multidimensional data model such as a data cube. The traditional relational database model, by contrast, stores separate data dimensions in different tables, which requires computationally expensive join steps when querying the data.
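To make these projection operations concrete, here is a minimal sketch of a data cube as a dictionary keyed by dimension tuples, with slice, dice, and roll-up implemented directly. The dimensions (student, item, content standard), the names, and the scores are all invented for illustration; a production OLAP engine would of course use indexed storage rather than plain dictionaries.

```python
from collections import defaultdict

# Toy "data cube" over three dimensions: student, item, and content standard.
# Each cell holds an item score. All names and values here are hypothetical.
cells = {
    # (student, item, standard): score
    ("alice", "item1", "algebra"):  1,
    ("alice", "item2", "geometry"): 0,
    ("bob",   "item1", "algebra"):  0,
    ("bob",   "item2", "geometry"): 1,
    ("bob",   "item3", "algebra"):  1,
}

def slice_cube(cube, dim_index, value):
    """Slice: fix one dimension to a single value, yielding a sub-cube."""
    return {k: v for k, v in cube.items() if k[dim_index] == value}

def dice(cube, dim_index, values):
    """Dice: restrict one dimension to a set of values."""
    return {k: v for k, v in cube.items() if k[dim_index] in values}

def roll_up(cube, dim_index):
    """Roll up: aggregate scores along one dimension (here, a sum)."""
    totals = defaultdict(int)
    for key, score in cube.items():
        reduced = key[:dim_index] + key[dim_index + 1:]
        totals[reduced] += score
    return dict(totals)

# Slice out one student's responses, then roll up over the item dimension
# to get that student's score per content standard.
alice = slice_cube(cells, 0, "alice")
by_standard = roll_up(alice, 1)
```

Because every cell carries its full coordinate tuple, each projection is a single pass over the cube rather than a join across separate dimension tables.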
Example of “data projection operations”
Of these data cube operations, the drill-down operation is particularly powerful when dealing with psychometric data related to 21st-century skills. For example, ACT’s Holistic Framework describes a hierarchy of four capabilities: core academic skills, crosscutting capabilities, behavioral skills, and education and career navigation. Each of these capabilities can be drilled down into strands, sub-strands, and skills. Hierarchical data structures like these cannot be represented or queried as effectively in a conventional relational database model.
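A drill-down over such a hierarchy can be sketched as follows. The records are tagged with a path through the hierarchy; rolling up to a given depth gives capability-level totals, and drilling down expands one node into its children. The top-level labels echo the Holistic Framework, but the strands, skills, and scores below are invented examples, not ACT data.

```python
from collections import defaultdict

# Hypothetical scored records, each tagged with a hierarchical path:
# (capability, strand, skill). Strands, skills, and scores are made up.
records = [
    (("core academic skills", "math", "algebra"), 3),
    (("core academic skills", "math", "geometry"), 2),
    (("core academic skills", "reading", "inference"), 4),
    (("crosscutting capabilities", "technology", "search"), 1),
]

def roll_up_to(records, depth):
    """Aggregate scores at a given depth of the hierarchy.
    depth=1 gives capability totals, depth=2 strand totals, and so on."""
    totals = defaultdict(int)
    for path, score in records:
        totals[path[:depth]] += score
    return dict(totals)

def drill_down(records, prefix):
    """Drill down: expand one node's total into its children's totals."""
    children = [(p, s) for p, s in records if p[:len(prefix)] == prefix]
    return roll_up_to(children, len(prefix) + 1)

# Start from the capability-level view, then drill into one capability.
top = roll_up_to(records, 1)
strands = drill_down(records, ("core academic skills",))
```

Because the path prefix encodes the whole ancestry, moving between levels is a prefix filter and a re-aggregation, with no joins across strand and skill tables.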
As we pursue advances in applying the data cube concept to psychometric analytics, we are also rethinking the traditional role of the data warehouse associated with a data cube. We are exploring a newer storage repository technology known as a data lake, which is “a method of storing data within a system or repository….”
At ACT, we are building a new Amazon Web Services (AWS)-based data lake solution known as the Learning Analytics Platform (LEAP) to support our mission, with a focus on adaptive learning, informal learning, and lifelong learning. Information collected from the many learning environments and system infrastructures will mostly be stored in its native format. This may include structured data, such as spreadsheets, and unstructured data, such as audio and video clips.
What will this mean for educators and students in the classroom? If done right, moving the data cube into the data lake will help us “democratize the data,” providing relevant data to everyone in the learning environment who needs access, with no gatekeepers. Even more importantly, that access must be provided in a way that makes it easy for educators to understand the data and use them to transform learning. With this scientific evolution, “It’s all about the data” will become meaningful to all educators, not just the catchphrase of the moment.
For more, see:
- Three Ongoing Trends in Education Data
- Making Data Work Together Is the Key to Better Achievement
- Data Interoperability in K-12: A Teacher’s Perspective
Pak Chung Wong is the former principal advisor on data science at ACT. Alina A. von Davier is senior vice president at ACTNext. Connect with them on Twitter at @ACTNext.