Big Data: The Excreta and the Ecstasy

“I really think we’re living in the single most interesting time that I could have predicted as a computer scientist… [at school] I remember walking across rooms thinking I’m going to spend my entire life perfecting this notion of set math on datasets as big as possible. Whatever the biggest data set I can get my hands on, that’s the one I’m going to work on that day. And it literally played out that way so far. I’m very excited to be living in a world where sensors are just pooping out tons of data all the time.”

Max Levchin, GigaOm Interview

As a member of the elite group known as the PayPal Mafia, Max Levchin is a well-recognized tech futurist in Silicon Valley. If he declares sensor data to be ‘poop’, there must be some profound truth to the statement. Who am I to argue with the wisdom of someone who comes from the same small circle as the founders of Tesla, YouTube, LinkedIn, Yelp, 500 Startups, and others? While the focus of his latest company has nothing to do with the real-time sensor data streaming from smart buildings, his “from excreta to ecstasy” observation about working on very large data sets holds true.

Albert M. Putnam, VP Technology Operations, Cimetrics, says “There is a vast amount of data available in a large building—if you were to read one sensor every 15 minutes, you would have 35,000,000 data samples after one year! Some buildings have thousands of sensors and actuators.” (Taken literally, one sensor read every 15 minutes produces about 35,040 samples a year; the 35,000,000 figure corresponds to roughly 1,000 such points.) So there is sufficient volume to enthrall any data scientist, especially when sensor data is combined with the traditional building data that arrives via BACnet, 802.11 Wi-Fi and neighboring wireless radio bands, M2M cellular networks, and so on. Before any person or algorithm (aka intelligent agent) can start to analyze the data, it has to be cleaned up and normalized.
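For a feel of the scale, here is a rough back-of-the-envelope sketch in Python. The function name and the 365-day year are my own assumptions for illustration, not anything from Cimetrics.

```python
# Back-of-the-envelope sample counts for a building polled at a fixed interval.
# Assumes a 365-day year and that every point is read on every cycle.
MINUTES_PER_YEAR = 60 * 24 * 365


def samples_per_year(sensor_count: int, interval_minutes: int = 15) -> int:
    """Return the number of readings collected in one year."""
    readings_per_sensor = MINUTES_PER_YEAR // interval_minutes
    return sensor_count * readings_per_sensor


print(samples_per_year(1))     # 35040
print(samples_per_year(1000))  # 35040000
```

At a 15-minute interval, the 35,000,000 figure in the quote matches about a thousand points, which is in line with the "thousands of sensors and actuators" that some buildings have.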

In a recent blog post entitled “Data Janitor Work: What No One Tells You About,” one founder of RiptideIO comments, “the big barrier to analyzing data is getting clean, workable data.” Prepping data for a building systems integration project can be a big mess because of proprietary protocols, a mix of interconnection mechanisms, and varying use cases. Founded by veterans of the Cisco Systems Smart+ Connected Buildings Group, RiptideIO differentiates its products and services through a commitment to keeping its BAS integrations open, so that its building operator customers have flexibility and can avoid unnecessary complexity and costs.
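To make “clean, workable data” a little more concrete, here is a minimal sketch of what one janitorial step might look like for temperature readings. Everything in it is an assumption for illustration: the raw field names (`point`, `unit`, `value`, `epoch_seconds`), the `Reading` record, and the unit spellings are hypothetical and do not come from RiptideIO or any particular BAS protocol.

```python
import math
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Optional


@dataclass(frozen=True)
class Reading:
    """Hypothetical normalized record: one temperature sample in Celsius, UTC."""
    point_id: str
    timestamp: datetime
    value_celsius: float


def normalize(raw: dict) -> Optional[Reading]:
    """Convert one raw record to a Reading, or return None if it is unusable."""
    # Reject missing or non-numeric values such as "n/a".
    try:
        value = float(raw["value"])
    except (KeyError, TypeError, ValueError):
        return None
    if math.isnan(value):
        return None

    # Convert to a single unit; drop records whose unit is unknown.
    unit = str(raw.get("unit", "")).strip().lower()
    if unit in ("f", "degf", "fahrenheit"):
        value = (value - 32.0) * 5.0 / 9.0
    elif unit not in ("c", "degc", "celsius"):
        return None

    # Store timestamps as timezone-aware UTC.
    try:
        seconds = float(raw["epoch_seconds"])
        timestamp = datetime.fromtimestamp(seconds, tz=timezone.utc)
    except (KeyError, TypeError, ValueError, OverflowError, OSError):
        return None

    # Use one spelling for point names so the same sensor always matches.
    point_id = str(raw.get("point", "")).strip().lower()
    if not point_id:
        return None

    return Reading(point_id, timestamp, round(value, 2))
```

A real integration would need many more rules (range checks, duplicate handling, gaps), but even this small step shows why the prep work eats so much of a project.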

In this context, the words ‘clean’ and ‘open’ are almost synonymous. Data experts from the building automation industry have worked over the last decade on standardization efforts like BACnet to get away from the ‘poop’ and make it possible to operate on data at a higher level. Editor Ken Sinclair puts both terms in perspective in an editorial. Like Putnam and RiptideIO, he believes it takes a commitment to openness, a well-informed strategy, and prior experience to work through the data cleansing phase in a timely way.
