Wednesday, March 12, 2014

Big Data - The New Coin of the Realm

My prediction for the new coin of the realm for both business and government over the next 25 years? Data, and lots of it. And now we have "digital data backpacks," so that your child's data can be "carried" around AND made use of anywhere your child goes.

From The Atlantic (with the what-should-be-troubling-to-you title, Your High School Transcript Could Haunt You Forever), the story of how something that looks like help, such as remedial classes in college, could follow you around:

What if the data collected by the software never disappeared and the fact that one had needed to take remedial classes became part of a student’s permanent record, accessible decades later? 

Some educational reformers advocate for “digital backpacks” that would have students carry their electronic transcripts with them throughout their schooling.  

Big data—the ability to collect, store and process more data than ever—is poised to overturn traditional education. It will add a quantified component to aspects of learning and teaching that never experienced this before, enabling society to improve not only student performance, but the instructor’s work as well. However, there are risks.

Two risks named:
  • Parents and education experts have long worried about protecting the privacy of minors.
  • Also, people have fretted over the consequences of academically “tracking” students, which potentially narrows their opportunities in life.
Big data doesn’t simply magnify both of these problems: it changes their very nature.

Don't like tracking? This is the ultimate in tracking.

One hope, and it’s just a hope, is that big data will make tracking disappear. As students learn at their own pace, and the sequence of material is algorithmically optimized so they learn best, we may see less need to formally track students.
But the reality could well be in the inverse. Customized education may actually lock in these streams more ruthlessly, making it harder for one to break out of a particular track if they wanted to or could. There are now a billion different tracks: one for every individual student. The upside is that education is custom tailored to each individual. The downside is that it may actually be harder to leap out of the canyon-like groove we’re locked into. We’re still trapped in a track, even if it is a bespoke one.

What about all that data?

Where traditional data protection has mostly been focused on addressing the power imbalance that results from others having access to one’s personal data, here the concern is more about the threat posed by an unshakable past. School records may not get stored in cardboard boxes and left to molder before being thrown out: They may be stored and saved forever—and continually called up at the speed of light.

Hence, the first significant danger with comprehensive educational data is not that the information may be released improperly, but that it shackles us to our past, denying us due credit for our ability to evolve, grow, and change.

Other issues:

But much of the appeal of big data is that its value lies in its reuse for purposes that were scarcely contemplated when the data was initially gathered. So, informed consent at the time of collection is often impossible.

In education, this could permit the use of personal data to improve learning materials and tools, while using the same data to predict students’ future abilities may be allowed only under much more stringent safeguards (such as transparency and regulatory oversight). It may require the explicit consent of the students themselves. It will also need tough enforcement, so that firms that use the data know that they cannot afford to break the rules.

Big Data, Big Brother?


mirmac1 said...

This will end up like the ubiquitous credit score: you'll pay higher car insurance if you have a math disability. Or you won't be able to rent a small apartment because your grades tanked during your parents' divorce. Etc., etc.

Melissa Westbrook said...

That's an interesting thought, Mirmac: an education "credit" score for each child.

notatallcorrect said...

Big Data is awesome, and we shouldn't let fearmongering (the "ultimate in tracking"? seriously?) stop it. Sure, there are potential pitfalls, but we should manage them just like any other new technology.

Data has already enabled us to identify things like links between income, family makeup, etc., and educational success. It would be awesome to have real analysis of questions like these: If a child is in a special needs program, how does that affect their ability to get a job 20 years later? Do charter schools or private schools affect whether you go to college, or which one? How much is teacher success linked to school location or demographics? And on and on.

This is good stuff. Of course we shouldn't just dive full-bore into it without thinking about ways to avoid the potential problems, but we also shouldn't focus only on the problems without regard to the benefits.

Melissa Westbrook said...

"Of course we shouldn't just dive full-bore into it without thinking about ways to avoid the potential problems, ..."

Sounds like Common Core.

I actually haven't seen any evidence that big data on schools has moved the needle at all. Could you let me know what you are basing that on?

The fact that states are working to pass student data privacy laws because of this rush for data should tell you something.

Anonymous said...


How would you feel if your medical records had the same lack of protection that students' educational records are about to have?

The problem with this invasion is that names are attached. It is not some numbers game or blind experiment.

--enough already

mirmac1 said...

"Big Data is awesome"? Yes, the NSA's ability to know where we've been, where we are and where we're going to be is truly awesome.

Often the "links" you praise are tenuous, contradictory and/or only valid in the aggregate.

I'll bet we will soon see Data that suggests "digital backpacks" hurt rather than help students.

Mary Griffin said...

People need to stop drinking the Kool-Aid about limitless data.

As Melissa stated, there is no proof that more data translates into better outcomes for students. I believe that people need to insist that there be proof of better outcomes before spending more money on data.

notatallcorrect states correctly that data has allowed us to identify some links between socio-economic status, family makeup and educational success. So? What does ed reform say about this data? It says it doesn't matter, that the only thing that matters is teacher quality, and that teacher quality needs to be measured through standardized testing. Either way, society is unwilling to put money down on education. We can have all the data in the world and we will still have a legislature that is unwilling to fund well-researched, data-backed legislation. Providing better social/emotional supports for students improves outcomes for students and, in the end, for society. But it costs money that neither the legislature nor the Bill Gates types are willing to cough up.

If you want to see the end result of womb-to-tomb tracking, check out ALEC's "Student Futures Program Act" model legislation. It creates a career planning program, developed and administered by the Department of Workforce Services, the State Board of Regents, and the State Board of Education, that allows businesses full access to student records contained in the state education database maintained in the cloud. When districts and businesses can't access, use, or protect the data that they already have, there is no reason to believe that such problems will improve with more data. This week alone, there have been two data breaches in the district that allowed the confidential data of hundreds of students to get into the wrong hands.

Anonymous said...

I don't believe access to data for legitimate educational research purposes vs. protections for students (and their families) against the misuse of those data should be an either/or proposition. Both can be done.

With that said, I think policymakers should ALWAYS err on the side of protecting data over granting access to data if there is ever a question that demands a choice between the two be made. This has not, unfortunately, been the case over the last decade at least.

This is the bottom line for me: Since children are compelled to attend school, they should not also be compelled to give up their privacy rights at the door.

notatallcorrect, the issue in play here (from my perspective) is whether the study examples you provide can be conducted with aggregate data and without personally identifiable information (PII) on students in the data sets. In all the examples you provide, aggregate data would suffice.

If PII is necessary, there should be strict controls over and limited access to those data, IMO.

--- swk

Anonymous said...

Don't forget the role of private data brokers in tracking you through your life and selling that info to the highest bidders.

I work for a security firm that actually aggregates data from various brokers to build profiles of prospective employees for other firms. We correlate information about various online personas (sockpuppets, avatars, etc.) and then bring together credit info, educational info, and, increasingly, information from genetic testing. Combined with actuarial data and other models showing genetic predisposition for education, companies can decide whom to invest in...

--The future is here

Anonymous said...


To conduct high-quality educational research, a researcher generally needs access to de-identified, individual-level data. This is because of selection bias and omitted variable bias, as well as the ecological fallacy. The types of studies that notatallcorrect cites would need to access individual records and statistically control for other variables in order to make those inferential links and rule out alternative explanations.

Further, to conduct high-quality longitudinal studies, researchers need to be able to link the same student over time, usually with a unique anonymous ID.

I agree that researchers really never need (or want!) personally identifiable information such as names or birth dates. However, I do not agree with the assertion that high-quality research can be done without individual-level data.


notatallcorrect said...

Big Data is an overused term everywhere so it's understandable that everyone thinks of it a little differently.

Basically, you collect a lot of data about a population, then analyze that data for patterns. The reason this is powerful is that if you're able to analyze a ton of data you can sometimes find patterns that you didn't even think to look for, or that are the result of lots of tiny causes that together add up to something big.

Healthcare is a good example, especially since it struggles with the privacy issues many here are concerned about. It's easy to know that smoking causes cancer because you don't need a big sample to figure that out. But what if (making something up here) eating fish every Wednesday and Friday was just as likely to cause cancer? There might be enough people who eat fish that often to figure it out, but you would need a LOT bigger sample size.

So fundamentally it's about analyzing a lot of data about a lot of people. That requires detailed data to be kept about each individual. But very useful conclusions can be drawn through aggregate (anonymized) access to that data. In the example above, the researcher who determines that eating fish is bad does not have to know the identity of each person eating fish. Of course, that identity could be used (mirmac1's credit score example) to provide personalized recommendations, but we don't have to allow that if we don't think it's valuable.

Point being, there can be great benefits to aggregate data analysis without ever disclosing someone's personal data.

This is not an argument that unfettered data collection is always good (equating the collection of education data with NSA spying is quite the leap of logic), but rather an argument against extremism on either side. It is possible to have substantial benefits without giving away all your privacy; we just have to work out what rules we want to put in place.

Melissa Westbrook said...

Here's the thing - you don't necessarily need PII to figure out who someone is.

It's MORE data going to MORE entities who are clamoring for it. Who draws that line? A superintendent? The Legislature? Congress?

I am not against the use of data (just as I'm not against standards or assessments), but given the way it is playing out, you should be worried.

It's not my kids I'm worried about. It's yours.

Anonymous said...

When can we see Arne's school records, et al.?

Big data collection/spying can really bite the hand that feeds it.

NSA whistleblower accuses Dianne Feinstein of double standards, pointing out her lack of concern about widespread surveillance of ordinary citizens.

Sen. Dianne Feinstein's bombshell accusation about the Central Intelligence Agency Tuesday set off a scramble on Capitol Hill — with Democrats and Republicans ignoring the usual party lines in response to her claim that the agency improperly interfered in a congressional investigation. Feinstein (D-Calif.) won immediate backing from top Democrats like Majority Leader Harry Reid and Judiciary Committee Chairman Patrick Leahy while some Republicans, including Lindsey Graham and John McCain, began to echo her concerns...

Have these same Senators supported the NSA spying on average Americans? (TYT Network)


Patrick said...

notatallcorrect said "... we should manage them just like any other new technology"

Are you joking? People's track record in managing new technology is pretty much a string of disasters. If the innovators see profit, they can ram just about anything through the legislature to allow their new technology to be used, no matter what its bad effects on anyone else.

We may get management of technologies right eventually, but it takes several decades at least... by which time they're no longer new, and several catastrophes have happened along the way.

Anonymous said...

So-called de-identified data can often be re-identified fairly easily.

I am sure there are lots of benefits to be obtained by collecting and storing data about my children without my explicit consent, but it does not make me breathe a sigh of relief to know that the data has been "de-identified," unless there are assurances the data will be stored properly and used only for non-profit research purposes.


Anonymous said...


I agree with you completely! Proper storage, destruction of data, and limiting entities who can access data to researchers at not-for-profit institutions who provide valid and important research question(s) and a clear research proposal should all be part of data sharing procedures.


Melissa Westbrook said...

Chris, sorry, but OSPI is giving away SPS student data to the Seattle Times, a for-profit company. The only reason? The Times got a Gates Foundation grant for education reporting, and under the new FERPA rules that allowed OSPI to name ANY entity or person an education provider.

Again, who will make these decisions about who sees what? I'm not waiting for the district - I'm going to work for a bill to be passed in the Legislature.

More on this soon.

Anonymous said...

Oh, yes, Melissa, I know - disturbing. I can imagine worse in the future.

I would likely support such a bill.


Anonymous said...

If data sets are this awesome, then pay for them. Pay my kids for their data.


mirmac1 said...


The article is about disaggregated data. I say it's not a stretch to equate the topic with the rampant loss of privacy in our society or, to use your example, with linking Friday fish-eaters to Catholics, to "papists," and to the "Whores of Babylon." There are real risks that are not "AWESOME."

The article is about packaging a person's educational records into a kind of "backpack" that will ride like a monkey on their back while (I would bet) someone, somewhere, is making a buck.

Let's look at a white man's leap of logic (oh! there I go lumping people together based on demographics): SPS has a list of orgs with an "institutional service designation" that get automated data dumps of PII-linked data. The rationale provided is so we can all see how these groups help these kids. Except there are no controlled studies. And, for the most part, some politician or wealth-backed foundation is pushing this in the name of "accountability" or performance management. Do we even know what exactly the Vietnamese Friendship Association, say, does to collect $100Ks of taxpayer dollars to help kids? Have they, or any other orgs that by definition support youth education, applied their service delivery equitably to all who need or wish it? What is the curriculum or method of delivery? Who delivers it, and with what training? Will there be statistical and verifiable results? Like the charter school question. No, no, and no.

In the end, it will be up to whoever can finance the fake research to provide the results they seek.

Don't use my kid in your propaganda.