Data Janitors vs. Data Scientists: The Unsexy Work of Building a Solid Data Foundation
The smart building industry is obsessed with the promise of Artificial Intelligence. We’re sold a future where sophisticated data scientists and powerful machine learning algorithms will unlock unprecedented levels of efficiency in our buildings. But this vision skips the most important step.
Before you can have a data scientist, you need a data janitor.
The reality is that our buildings are drowning in a sea of “Big Data”—millions of disorganized, uncontextualized, and often incorrect data points from thousands of sensors. We are starving for the “Right Data”—a small stream of clean, reliable, and actionable insights. The unsexy, unglamorous work of turning the former into the latter is the single most critical and overlooked task in building analytics today.
The AI Analytics Paradox: Garbage In, Gospel Out
There is a dangerous paradox at the heart of AI for buildings: the more advanced the algorithm, the more exquisitely sensitive it is to the quality of the data it’s fed. A “black box” AI model trained on messy, inconsistent data from a building’s BMS will not produce magic; it will produce statistically confident nonsense.
This is the “Garbage In, Gospel Out” phenomenon. We feed an AI a stream of raw, unfiltered data filled with sensor errors, readings from faulty equipment, and operational noise. The AI, doing what it’s designed to do, finds patterns in that noise and presents them as insights. Because it comes from an “AI,” we are tempted to treat this output as gospel truth, leading to wasted effort and a deep-seated distrust in the technology when its predictions inevitably fail.
The Power of FDD: The Most Important Layer in the Analytics Stack
To escape this trap, we must recognize that true building intelligence is built in layers, and the foundational layer is not AI. It is Fault Detection and Diagnostics (FDD).
FDD is the data janitor. It is a set of automated rules and logic, built on deep engineering expertise, that constantly scrubs your building’s raw data stream. Its job is to:
- Detect and Flag Errors: Identify when a sensor is malfunctioning, offline, or providing nonsensical readings.
- Normalize Data: Ensure data from different systems and vendors is presented in a consistent, comparable format.
- Identify Real Faults: Use engineering logic to distinguish between a genuine equipment fault (like a stuck valve) and a simple data anomaly.
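The three jobs above don’t require deep learning; they’re mostly explicit rules. Here’s a minimal Python sketch of what each might look like. All point names, thresholds, and the stuck-valve criterion are illustrative assumptions, not any particular platform’s logic:

```python
from dataclasses import dataclass

@dataclass
class Reading:
    point: str    # hypothetical point name, e.g. "AHU-1/SAT"
    value: float  # engineering value
    unit: str     # unit as reported by the source system

# Job 1 — Detect and Flag Errors: reject physically implausible readings.
def is_plausible(r: Reading) -> bool:
    plausible_ranges = {"SAT": (-20.0, 60.0), "ZoneTemp": (5.0, 45.0)}  # degC, assumed
    kind = r.point.split("/")[-1]
    lo, hi = plausible_ranges.get(kind, (float("-inf"), float("inf")))
    return lo <= r.value <= hi

# Job 2 — Normalize Data: convert everything to one consistent unit.
def to_celsius(r: Reading) -> Reading:
    if r.unit == "degF":
        return Reading(r.point, (r.value - 32.0) * 5.0 / 9.0, "degC")
    return r

# Job 3 — Identify Real Faults: engineering logic, not statistics.
# A heating valve commanded shut should not still be warming the air stream.
def stuck_heating_valve(valve_cmd_pct: float, entering_c: float, leaving_c: float) -> bool:
    return valve_cmd_pct < 5.0 and (leaving_c - entering_c) > 2.0
```

The point of the sketch: each rule encodes a piece of engineering knowledge (plausible ranges, unit conventions, the physics of a coil), which is exactly the expertise an FDD layer captures.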
Without this robust FDD layer, any attempt at higher-level analytics is like building a skyscraper on a foundation of sand. The FDD platform does the hard, essential work of ensuring that the data fed to any machine learning model—or, more importantly, to your human operators—is clean, consistent, and trustworthy.
Practical Application: Three Signs Your Data Foundation is Cracked
How can you tell if your current analytics platform is built on a solid foundation or a cracked one? Here are three warning signs that you’re dealing with a “Garbage In, Gospel Out” system.
1. You Suffer from “Alarm Fatigue.” If your team is constantly bombarded with hundreds of low-priority alarms or “potential issues” that turn out to be nothing, your system lacks a robust FDD layer. A good system doesn’t just find potential problems; it filters, prioritizes, and only alerts you to the ones that truly matter, backed by clear evidence.
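What “filters, prioritizes, and only alerts” can mean in practice is surprisingly simple logic. Below is a hedged sketch: the alarm records, the four-hour persistence cutoff, and the scoring weights are all invented for illustration, not taken from any real product:

```python
# Hypothetical alarm records: (description, cost_impact_per_day, hours_active, comfort_risk)
alarms = [
    ("AHU-1 simultaneous heating and cooling", 42.0, 36, True),
    ("VAV-12 brief sensor spike",               0.0,  1, False),
    ("CH-1 low chilled-water delta-T",         15.0,  8, False),
    ("VAV-07 momentary airflow dip",            0.0,  2, False),
]

def priority(cost_per_day: float, hours_active: float, comfort_risk: bool) -> float:
    # Weight persistent, costly, comfort-affecting faults; weights are illustrative.
    score = cost_per_day * (hours_active / 24.0)
    return score + (50.0 if comfort_risk else 0.0)

# Suppress transient noise (active under 4 hours), rank the rest by impact.
actionable = sorted(
    (a for a in alarms if a[2] >= 4),
    key=lambda a: priority(a[1], a[2], a[3]),
    reverse=True,
)
```

Of the four raw alarms, only two survive the persistence filter, and the operator sees the costly, comfort-affecting fault first. That is the difference between an alarm feed and a work queue.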
2. Your Team Doesn’t Trust the Recommendations. If your operators frequently say, “The system says X, but I know from experience that it’s really Y,” you have a trust problem rooted in poor data quality. This happens when an AI model isn’t being fed clean, validated data, causing it to make recommendations that don’t align with the physical reality of the building.
3. You Can’t Get a Straight Answer on Data Normalization. Ask your vendor a simple question: “How do you handle data from different BMS vendors and normalize it into a single, open standard?” If you get a vague, jargon-filled answer, it’s a major red flag. A solid data foundation requires a clear, transparent process for data cleaning and normalization.
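A transparent answer to that question often looks like a translation table: every vendor-specific raw point name maps to one shared, standardized name and unit. The sketch below uses invented raw point names; the standardized names echo the spirit of open tagging schemes such as Project Haystack but are simplified for illustration:

```python
# Map each vendor's raw point name to (standardized name, source unit).
# Raw names are hypothetical; real BMS exports vary widely even within one vendor.
POINT_MAP = {
    "JCI:FC-1.SA-T":      ("ahu-1.discharge-air-temp", "degF"),
    "SIEMENS:AHU1_SAT":   ("ahu-1.discharge-air-temp", "degC"),
    "TRANE:SupplyTemp01": ("ahu-1.discharge-air-temp", "degC"),
}

def normalize(raw_name: str, value: float) -> tuple[str, float]:
    """Translate a vendor-specific reading into the shared convention (degC)."""
    std_name, unit = POINT_MAP[raw_name]
    if unit == "degF":
        value = (value - 32.0) * 5.0 / 9.0
    return std_name, round(value, 2)
```

A vendor with a solid foundation can show you this mapping, explain who maintains it, and tell you what happens when a point doesn’t match. A vendor without one can’t.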
The industry’s fascination with the futuristic promise of data scientists has caused us to neglect the immediate, practical value of the data janitor. Before we can predict the future with AI, we must first have an accurate understanding of the present. And that requires a relentless commitment to the unsexy but essential work of building a solid data foundation. You can’t have AI without IA (Information Architecture).