
Hacking fields of dreams

Baseball, which evolved from British games such as cricket and rounders mixed with early US games such as townball, is played professionally in northeast Asian countries like Japan and Korea, but remains most popular in North and Central America.

Most baseball statistics are public record. But now one US baseball club, the venerable St Louis Cardinals (who began professional life as the Brown Stockings in 1881), stands accused of breaking into the internal systems of another: the distinctly less storied Houston Astros.

Big data "hit by pitch"
A less savory part of the game: a pitcher hitting the opposing batter with a pitched ball. It can happen by accident or on purpose, and deliberately hitting a batter isn't considered sporting behavior.

According to The Register, the US FBI is investigating "someone [who] allegedly gatecrashed computer systems belonging to the Cardinals' former rivals, the Houston Astros ... office staff at the Cardinals may have gained unauthorized access to the Astros' internal databases using nothing more than a guessed password."

"Investigators have uncovered evidence that Cardinals employees broke into a network of the Astros that housed special databases the team had built, law enforcement officials said," according to a New York Times article. "Internal discussions about trades, proprietary statistics and scouting reports were compromised, said the officials, who were not authorized to discuss a continuing investigation."

The NYT also reported that "subpoenas have been served on the Cardinals and Major League Baseball for electronic correspondence."

Why this matters
Baseball is an exemplar of big data. The sport revolves around numbers: pitches are judged and counted, and games are decided by "runs," which are scored by base-runners, who in turn reach the bases in a variety of ways governed by detailed rules. In the case of a "hit by pitch," for example, the batter is awarded first base and any runners forced by the batter advance one base. With the bases loaded, that forces in a run, so in a tied game in the bottom of the ninth inning or later, a hit by pitch can mean an instant loss for the pitcher's team. Defensive numbers are also part of baseball's relentless calculus.
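To make the hit-by-pitch rule concrete: a hit batter is awarded first base, and only runners who are forced off their base move up one; a runner on second with first base empty, for example, stays put. Below is a minimal Python sketch of that force logic, assuming a simplified, hypothetical model in which the base state is a tuple of three booleans (first, second, third) and nothing else about the game is tracked.

```python
from typing import Tuple

# Hypothetical base-state model: (first, second, third), True = occupied.
Bases = Tuple[bool, bool, bool]


def hit_by_pitch(bases: Bases) -> Tuple[Bases, int]:
    """Award first base to a hit batter and advance only forced runners.

    Returns the new base state and the number of runs scored.
    """
    first, second, third = bases
    runs = 0
    if first:  # the batter forces the runner on first to move
        if second:  # which in turn forces the runner on second
            if third:  # bases loaded: the runner on third is forced home
                runs = 1
            third = True  # runner from second moves to third
        second = True  # runner from first moves to second
    first = True  # the batter takes first base
    return (first, second, third), runs


# Runners on first and third: the runner on third is not forced and stays put.
print(hit_by_pitch((True, False, True)))  # ((True, True, True), 0)
# Bases loaded: every runner is forced, so one run scores.
print(hit_by_pitch((True, True, True)))   # ((True, True, True), 1)
```

The nested checks mirror how a force propagates: each runner is forced only if every base behind them is occupied. The sketch deliberately ignores outs, innings and score, so it only shows how a bases-loaded hit by pitch produces the run described above.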

These parameters produce complex data sets, and rival clubs constantly scrutinize the most sophisticated of them in search of an edge. That's fine. What's not fine is breaking into someone else's database to steal information.

The "Nerd Cave"
This unprecedented (alleged) intrusion appears to revolve around the work of Jeff Luhnow, general manager of the Astros and, perhaps not coincidentally, former VP of baseball development for...the Cardinals. According to a Bloomberg article, when Luhnow worked for the Cardinals, he "surrounded himself with a flock of adherents, including engineers, consultants, data scientists, and a physicist—people who, like Luhnow, wouldn’t have had a place in baseball until recently. 'These sorts of skills were not valued 10 or 15 years ago—or really valuable—because the data that you can use today to help you make decisions wasn’t available,' he says. Many of them occupy a room, dubbed the Nerd Cave, that’s lined with whiteboards covered with algebraic formulas."

According to The Register, "When Luhnow moved to the Astros, he built a similar system, called Ground Control." You can see where this is going, can't you? Well, the investigation is ongoing, and everyone is innocent until proven guilty.

But this case shows that no aspect of our lives is immune from big data and its potential misuse. Some of the data now available to retailers and social media firms was never known before.

What can an abundance of new data produce? Here's a sample, from the Bloomberg article: "New data sources have suddenly become available … major league stadiums are wired with systems such as Pitch f/x and TrackMan that use Doppler radar to track the ball in three dimensions. 'For every single pitch thrown in every game,' says [Sig Mejdal, the ex-NASA engineer brought over from St Louis who is the Astros’ director for decision sciences], 'we now know the location, acceleration, movement, velocity, and the axis of rotation of the ball. If you believe, as we do, that this data has predictive ability, then you’re in an arms race to learn it and take advantage of it.'"

Previously unknown data is now being leveraged for its "predictive ability," setting off an "arms race." And now the "Nerd Cave" crowd stands accused of snooping on "Ground Control"?

Yes, the operatives in this expensive game use high-end machines to compile obscure data, and that data is valuable. But I'd argue that your personal data (where you shop, what you buy, what your doctor tells you, what your friends, spouse or children send you by phone, and so on) is just as valuable.

We're not professional athletes with hefty pay-packets, but we have a right to privacy – and any firm, whether telco-related or not, must respect that.