16 min read

SBA 239: What is data?

By Phil Zito on Feb 22, 2021 6:00:00 AM

Topics: Podcasts

What is data? How is data produced? What do you need to understand about data?

In this episode of the Smart Buildings Academy Podcast we explore what data is, how data works, and what data is used for.



Show notes

Phil Zito 0:00
This is the Smart Buildings Academy Podcast with Phil Zito, Episode 239. Hey folks, Phil Zito here, and welcome to Episode 239 of the Smart Buildings Academy Podcast. In this episode, we're going to be talking about what data is. Lately I've seen a ton of talk about digital twins, analytics, and data visualization. And at the end of the day, all these solutions rely on one thing: data. But, you know, what is data? It still seems that, as an industry, we're confused about what exactly data is. So in this episode, we're going to be looking at what data is, how data is produced, how it's formatted, how it can be consumed, and much more. Now, normally I would take you through a series of explanations and definitions, but in this case we're going to start in reverse, and we're going to look at a couple of examples of data as it flows through normal, day-to-day use. Before we do that, I do want to let you know that everything we discuss is going to be at podcast.smartbuildingsacademy.com/239. Once again, that is podcast.smartbuildingsacademy.com/239. I encourage you to go there, check out all of our previous episodes, check out our free mini courses, and also check out our training programs.
So let's look at the life of a couple of data examples. To kick this off, I am going to talk about a data example that almost all of you listening to this have experienced, but that most people don't think of as data. That's a set of as-builts, or a set of owner O&M manuals, sitting in a basement, in a closet, on a shelf. They probably haven't been used in a while, and the paper is pretty brittle. Maybe it's got a little yellow tinge to it because it's gotten some water on it in the past. But this unformatted physical data is what a lot of us are working with out there, especially considering that the good majority of built assets are legacy in nature. They're, you know, 20 to 30 years old.
And they've got these older O&M manuals with little to no electronic documentation. So if you wanted to utilize this data, how would you begin to do that? Well, first, you would have to find a way to consume this data. Manually consuming it, you know, looking at it and then replicating it electronically, is not efficient, so most likely you'd have to use some form of scanning service. Then, once you've interpreted that data, you'd have to validate it and make sure that it's actually still accurate, almost performing retro-commissioning: does this snapshot of data actually represent the truth of the as-is system? You'll hear me use those phrases a lot through this podcast, "as-is," "truth of the system," etc., because we can consume data that is not an actual representation of the true installed condition of the systems that we're dealing with.
Now we have another example, which would be visualized data in digital twins, kind of going from one extreme to the other. On one hand, we have all of this really specific, old paper data. You can touch it, you can feel it, but you can't really do anything with it electronically. On the other hand, we have something that is very non-physical. You can't touch it or feel it; it's very ethereal. And that is the data in a digital twin, a digital twin being a digital model of physical assets in a data environment. It takes data samples, and that may be data at rest, in the form of product-specific data like cut sheets and information that doesn't change, you know, static information. Or it may be data in motion, things like real-time collection of data from sensors and from operations of systems. In that case, you're visualizing the data electronically and you're able to work with it.
Then we have things like consumed data: APIs, things like protocol integrations, right? Going and taking data from select endpoints and select systems, and then processing it and utilizing it for our different uses. So, you know, three different scenarios of data and how we're using it on a day-to-day basis. But, you know, what is data at

Phil Zito 4:59
the end of the day? Data is notoriously difficult, yet simple, to define, all at the same time. At its core, data is simply information that can be processed or stored, at least from an electronics perspective. But you could say that even physical information, in the form of those physical O&Ms, can be stored: we're storing it in paper format, and we're storing it, you know, in some broom closet or something. Now, when I make the statement that, at its core, data is information that can be processed or stored, what comes to mind? You may be thinking of the past examples, right? The trend data, the data loggers, just real-time data. You may be thinking of the physical documents. You may be thinking about observations that people are making: conversational data. This is a lot of data that oftentimes does not get captured in organizations, because they don't have the structures in place to capture it. You know, being able to appropriately capture triage data, to capture the data from people as they're reporting system failures, and then being able to analyze that data. It's stuff that's often not captured, or not properly entered into a system. And then we have no way to really contextually analyze this data, and more on that in just a little bit.
But as I mentioned, you've heard me use the term context. Data is useless without context. In electronic format, data is just ones and zeros. How computers work is they have CPUs, and, at least until quantum computing really becomes mainstream, CPUs use ones and zeros to process data. What happens is this data, the ones and zeros, flows into the CPU. I mean, there's more to it than that, memory and all this stuff, but they basically flow into the CPU and the CPU processes them. Now, there is context to these ones and zeros. Otherwise, it would be kind of useless; it would just be a stream of ones and zeros.
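To make the point about ones and zeros needing context concrete, here is a minimal Python sketch. The four bytes and the two interpretations are illustrative assumptions, not values from any particular building system; the only thing that changes between the two readings is the format we tell the program to apply.

```python
import struct

# Four raw bytes with no context attached (an arbitrary illustrative value).
raw = bytes([0x42, 0x90, 0x00, 0x00])

# Context 1: treat the bytes as a big-endian 32-bit float.
as_float = struct.unpack(">f", raw)[0]

# Context 2: treat the very same bytes as a big-endian unsigned 32-bit integer.
as_int = struct.unpack(">I", raw)[0]

print(as_float)  # 72.0, which could plausibly be a temperature
print(as_int)    # 1116733440, which means nothing as a temperature
```

The bytes never change; only the context does, and that is why a stream of ones and zeros on its own tells you very little.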
So what happens is, this data is actually formatted, and there's context laid on top of the data. But before we get to that, let's talk about how data is produced. Data is produced through data collection. Now, we're going to be shifting to mainly talking about electronic data here. What happens is you have a variety of systems, and those systems are collecting data. In the case of some of the newer systems, like digital twins and whatnot, we're entering data where you're using something like OCR scanning of documentation to pull data off of specific documents. We're grabbing equipment profiles from manufacturers and pulling in their specific static data, like performance standards for pieces of equipment. And then there's live-streamed data that is coming into our systems via a variety of different means. But the primary data collection format that we're going to be utilizing is either real-time data streams or trending, where we're grabbing data as either change of value (COV) or historic time-interval trends.
At the end of the day, we're collecting either operational data or design data. Operational data is what most of us are used to, right? Most of us are used to putting into the spec that we're going to gather trend data, and we're going to grab it in COV or interval format, etc. However, there is a growing trend, for lack of a better word, toward gathering design data and documenting that in our systems: utilizing design data, understanding system capacities, things like that, and then tying it into analytics and other software solutions to really get a good sight picture of how the building is performing and how the systems of that building are performing.
All right, so at the end of the day, data coming in needs to be formatted. Data is formatted in a variety of ways, but for most of us, when we're looking at data, we have almost zero control of the formatting.
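The difference between interval trending and change-of-value trending can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names, the sample readings, and the 0.5 degree COV increment are all hypothetical and do not come from any specific controller or protocol stack.

```python
from typing import List, Tuple

Sample = Tuple[int, float]  # (seconds since start, value)


def interval_trend(samples: List[Sample], every_s: int) -> List[Sample]:
    """Keep one sample per fixed time interval, whatever the value did."""
    return [(t, v) for t, v in samples if t % every_s == 0]


def cov_trend(samples: List[Sample], increment: float) -> List[Sample]:
    """Keep a sample only when the value has moved by at least
    `increment` since the last sample that was kept."""
    kept: List[Sample] = []
    for t, v in samples:
        if not kept or abs(v - kept[-1][1]) >= increment:
            kept.append((t, v))
    return kept


# Hypothetical zone temperature readings, one per minute.
readings = [(0, 72.0), (60, 72.1), (120, 72.2), (180, 73.0), (240, 73.1), (300, 74.5)]

print(interval_trend(readings, 120))  # [(0, 72.0), (120, 72.2), (240, 73.1)]
print(cov_trend(readings, 0.5))       # [(0, 72.0), (180, 73.0), (300, 74.5)]
```

In this sketch the interval trend misses the jump at 300 seconds because it falls between samples, while the COV trend skips the small drifts and records only the meaningful changes.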

Phil Zito 9:59
We may be able to configure facets or unit types or SNVTs, whatever you want to call it, depending on the protocol and the system you're using. But that's about it. We can't change strings to ints and floats and things like that unless we want to go and edit things programmatically. So we're stuck when it comes to data formats. We get what we get from the manufacturers, and that's it. Now, the problem comes when we actually want to utilize data, which we'll talk about a little bit later in this podcast. For the most part, our data formatting schemas have served us well thus far in the industry, but things are changing. And why are things changing? You've probably heard of Brick Schema, you've probably heard of Haystack. We're going to be talking to folks on both of those in the near future on the podcast. The reason we're seeing a more structured approach to data is that we need to have data about data to provide context for application developers, for people who maybe don't understand our data or who simply can't look at it on an individual scale, because it's just not efficient to go in and analyze exactly what each data point means.
Let me give you a picture, because I just said something that maybe you all can't visualize. Say I gave you a campus, and we had a bunch of ZN-T zone temps or SP-T space temps, right? The problem is that you get an app developer who maybe is space utilization focused, or occupant comfort focused, or something like that, and they have no idea what these things mean, the ZN-T, the SP-T, etc. So we need to provide data about data, which is known as metadata. Now, metadata is kind of all the talk, and has been all the talk for several years now.
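As a rough illustration of "data about data," the Python sketch below attaches a description, a unit, and a parser to each point name, so that a developer who has never seen the building can still interpret a raw string value. The point names (ZN-T, SP-T, FAN-S), the units, and the `PointMetadata` structure are all hypothetical; this is not the Brick or Haystack model, just the general idea.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Tuple, Union

Value = Union[int, float]


@dataclass(frozen=True)
class PointMetadata:
    description: str               # human-readable meaning of the point
    unit: str                      # engineering unit, empty if none
    parse: Callable[[str], Value]  # how to turn the raw string into a typed value


# Hypothetical metadata table keyed by vendor-style point names.
POINT_METADATA: Dict[str, PointMetadata] = {
    "ZN-T": PointMetadata("zone temperature", "degF", float),
    "SP-T": PointMetadata("space temperature", "degF", float),
    "FAN-S": PointMetadata("fan status (0 = off, 1 = on)", "", int),
}


def interpret(name: str, raw: str) -> Tuple[Value, PointMetadata]:
    """Convert a raw string reading into a typed value plus its metadata."""
    meta = POINT_METADATA[name]
    return meta.parse(raw), meta


value, meta = interpret("ZN-T", "72.5")
print(f"{meta.description}: {value} {meta.unit}")  # zone temperature: 72.5 degF
```

Without the table, "ZN-T" and "72.5" are just two strings; with it, an application can treat the reading as a number with a known meaning and unit.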
First, we have to get the data out of our systems, which is a challenge in and of itself because of how immature our industry is when it comes to APIs and things like APIs. But once we've actually got that data out of the system, it needs to have context. Without context, we can't do anything with the data. So we need metadata, and that's where these data schemas, these data models, come into play. Because as good as things like BACnet, LON, and Modbus are as standards of data format, you know, BACnet with its data objects, LON with its data objects, kind of giving a structure to it with the unit types, the device properties, the device descriptions, the object descriptions, the object types, etc., they leave a lot to interpretation. So what you're seeing with Brick Schema, with Haystack, and with these different models coming out is that they're providing metadata models, which are data about data, and they're providing structure and an organization, a schema. How do I describe a schema to you without showing you what a schema is? So imagine this. If I told you, you know, 531003012, Richmond Road,

Phil Zito 13:47
Dallas, Texas (I'm just making up numbers and things right here), most of you may be able to say, "Oh, okay, well, the first number was probably a zip code, then a street address and a city," because you have a schema in your mind of how addresses should be organized. You can quickly contextualize that and figure out what that data means, even if it was all kind of disorganized. A data model or data schema provides that organizational structure. It says that a system of this type should have these points and these objects, and these objects are of this type. So now, when you have people doing analytics, when you have people doing applications, they can look at data sets coming out that are pre-organized and pre-structured, and they know how to consume, use, and manipulate those data sets in order to build applications, to create use cases, and to create tests against data. They can say, you know, "My assumption is that space utilization is X, and it's based on these variables. We're going to test that." And we can test that because we understand these variables coming out of the system. We couldn't do that before.
I know that seems like a super simple thing. But, having been involved in the early days of analytics, I can't tell you how difficult it was not only to take data out of a campus or out of a building, but then to contextualize that data, to provide formatting and structure. And this is even for someone like myself, who I feel is pretty well versed in building automation. I still would struggle to make sense of it: what exactly does this mean? Is this a space temp? Is this a zone temp? Whatever. You stretch that across entire campuses, and you can get highly complex and highly time-consuming data normalization projects. So data modeling and data schemas help us to overcome that.
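The idea that a schema says "a system of this type should have these points, and these points are of this type" can be sketched as a small normalization and validation step in Python. Everything named here is an assumption for illustration: the `fan_coil_unit` type, the alias table, and the point names are hypothetical and are not taken from Brick or Haystack.

```python
from typing import Dict, List

# Hypothetical schema: each equipment type lists its required points and their types.
SCHEMA: Dict[str, Dict[str, type]] = {
    "fan_coil_unit": {"zone_temp": float, "fan_status": bool},
}

# Hypothetical vendor point names mapped onto the schema's names.
ALIASES: Dict[str, str] = {
    "ZN-T": "zone_temp",
    "ZNT": "zone_temp",
    "FAN-S": "fan_status",
}


def normalize(raw_points: Dict[str, object]) -> Dict[str, object]:
    """Rename vendor-specific point names to the schema's names."""
    return {ALIASES.get(name, name): value for name, value in raw_points.items()}


def validate(equipment_type: str, points: Dict[str, object]) -> List[str]:
    """Return a list of problems; an empty list means the points fit the schema."""
    problems: List[str] = []
    for name, expected in SCHEMA[equipment_type].items():
        if name not in points:
            problems.append(f"missing point: {name}")
        elif not isinstance(points[name], expected):
            problems.append(f"{name} should be {expected.__name__}")
    return problems


raw = {"ZN-T": 72.5, "FAN-S": "on"}  # fan status arrives as a string, not a bool
points = normalize(raw)
print(validate("fan_coil_unit", points))  # ['fan_status should be bool']
```

Doing this by hand for every point on a campus is the time-consuming normalization project described above; a shared schema lets the check be written once and reused.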
So then we find ourselves going, all right, we've got the data produced, we've got it formatted, we've got it structured. Now, what do we do with that? Well, assuming that you actually want to keep the data historically, which sometimes you do and sometimes you don't, we need to store data. For historical data analysis, if you want to be able to refer back to the data, you need to store it. And that's where databases, for the most part, come into play. There are relational and non-relational databases. Relational databases, quite simply, are databases with tables that have a primary key. So we say, like, room number is the primary key, and then we have all of these attributes associated with that table and that entry in that table. So room 101 may have a zone temp, may have a fan coil fan status, etc., and all of these points are associated and related. Then you have non-relational data, which can be unstructured or semi-structured data. This data has, you know, no relationship to tables and things like that. When you're pulling in data from documents, or you're pulling in unstructured data, that would be a form of non-relational data. Either way, you're storing this data in databases so that it can be analyzed later.
That being said, data does not necessarily have to be stored. It can be real time. You can do data streams, you can do data pulls. Data streams are exactly what they sound like: a stream of data coming from a system that another system can utilize. Streaming data is very common, especially in the business world, and we analyze streaming data and do all sorts of things with it. Streaming data can also be stored at the same time, by the way. Data pulls are things where you go to, like, API endpoints or WebSockets, and you grab data and pull it out, although with WebSockets it's not really a pull, because those are two-way communication.
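The relational example from this passage, a table keyed on room number with a zone temp and a fan status as attributes, might look like the following sketch using Python's built-in `sqlite3` module. The table name, column names, and values are assumptions made up for illustration.

```python
import sqlite3

# In-memory database so the sketch leaves nothing on disk.
conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE rooms (
        room_number TEXT PRIMARY KEY,  -- the key every row is identified by
        zone_temp   REAL,
        fan_status  INTEGER            -- 0 = off, 1 = on
    )
    """
)
conn.execute("INSERT INTO rooms VALUES (?, ?, ?)", ("101", 72.5, 1))
conn.execute("INSERT INTO rooms VALUES (?, ?, ?)", ("102", 74.0, 0))

# Look up the attributes related to one room through its primary key.
row = conn.execute(
    "SELECT zone_temp, fan_status FROM rooms WHERE room_number = ?", ("101",)
).fetchone()
print(row)  # (72.5, 1)

# The primary key also guarantees uniqueness: a second room 101 is rejected.
try:
    conn.execute("INSERT INTO rooms VALUES (?, ?, ?)", ("101", 70.0, 0))
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True
print(duplicate_rejected)  # True
```

A real historian would normally store timestamped readings in a separate table that refers back to the room, but the single table is enough to show what the primary key does.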
And then you have historic, which is data analysis. These are the three main ways that we consume data, right? Data streams, data pulls, and data analysis. There is kind of a fourth way, where we have static manufacturer information. You're seeing this in digital twins and in design tools, where there's an online library of equipment or building materials, and that data is being pulled in. Maybe it's the insulation factor on a specific material, so you can do a good model of how well insulated your building is. There's just a variety of ways you can use data. But all of this data is coming in, and then it's up to you

Phil Zito 18:45
to make the use cases for the data. And this is where things kind of fall apart. There are two pivot points that I find with people when it comes to working with data. The first pivot point is: can you actually classify and collect the data? That seems to be the big challenge: identifying what data to collect, classifying it, and then collecting it. If you can get over that hurdle, and you can do it in a cost-effective way, then you've won a good portion of the battle. At that point, it all becomes about creating effective use cases and asking good questions about how you're going to utilize this data. And that's the key. What questions are you going to ask about the data? What do you want to know? Do you want to know which spaces are being utilized? Do you want to know how they're being utilized? Do you want to understand when they're being utilized? Do you want to understand how someone is interacting, for example, with the scheduling system, and then the parking management system, and then the elevator system, and then the room booking system, all of these relationships? Are you looking to drive efficiency in how people go about daily tasks in buildings? Are you looking at tracking occupants and understanding when they badge in and where they go, you know, for COVID and occupant tracking? What is the question you're trying to answer? The question may be: where has the occupant been in the building, so that we know for occupant tracking purposes? Or: what is the occupant's experience when they go to utilize the space, and how are they utilizing it? These are all questions that you need to be able to ask. And when you ask these questions, you follow a very, very standard format. That format is actor, system,
start state, end state, failure state. That is the standard format for a use case, right? The actor is the person or the system who is using the use case. The system or systems are what are involved in the use case. The start is what initiates the use case. The success state is what the use case looks like when it finishes successfully. And the failure state is what it looks like when it fails. So, for example, if you're trying to track occupants as they come to the building and use a scheduled space, you may have a start, which is that the occupant schedules a meeting. From there, the occupant has wayfinding, or uses some sort of map software, to find the building. Then they badge into the parking, which guides them to a spot that is pre-reserved, which then, through digital signage, directs them to an elevator. Then the conveyance takes them up to their floor, and more digital signage directs them to the conference room. You know, a pretty basic, generic use case. But that's all dependent on multiple data streams. We have data at rest and we have data in motion, and some of the data in motion can also become data at rest. What will happen is the occupant is using, essentially, an API to interface with the scheduling system, most likely through Outlook, to schedule. Then data is getting sent to the building automation system, as well as to the conference booking system, which is creating a historical record that is going to be utilized later to interface with a variety of systems, signage, etc. Maybe we're using some sort of Bluetooth beaconing to know where the occupant is as they move through the space, to provide just-in-time updates on direction on the digital signage. Or maybe we're using a wayfinding app on their laptop, or rather on their mobile phone (that would be interesting, walking around with your laptop), so that they can find where they're at.
And then that's spatial data, right? It's structured, but not structured, at the same time. I mean, spatial data technically is structured. So anyway, I digress. The important point here, though, is that once you get data formatted, once you get it out and you're able to collect it and interface with it, then you need to come up with the use case.
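The use case format described above (actor, system, start, success, failure) can be written down as a simple record. The Python sketch below fills it in with the meeting example from this episode; the class name, field names, and especially the wording of the failure state are assumptions added for illustration.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class UseCase:
    actor: str          # person or system using the use case
    systems: List[str]  # systems involved in the use case
    start: str          # what initiates the use case
    success: str        # what it looks like when it finishes successfully
    failure: str        # what it looks like when it fails


meeting = UseCase(
    actor="occupant",
    systems=["scheduling", "parking", "digital signage", "elevator", "room booking"],
    start="occupant schedules a meeting",
    success="occupant is guided from parking to the booked conference room",
    # Hypothetical failure state, not stated in the episode.
    failure="occupant cannot find the reserved spot or the conference room",
)

print(f"{meeting.actor}: {meeting.start} -> {meeting.success}")
```

Writing the use case down this way makes it easier to see which data streams each system has to supply before any collection work starts.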

Phil Zito 23:33
That's where people tend to fall flat. If they don't come up with a solid use case that delivers an ROI for the business, they don't ask the right questions, and then they don't get the right answers, and all of this data collection stuff becomes useless. So here's what I want to leave you with. We went through a lot. My hope is that, at the end of this podcast, you feel a little bit more comfortable with understanding what data is. You understand that data can be physical as well as electronic. It can be collected by systems, or it can be manually collected. It can be both at rest and in motion: at rest meaning it's stored, in motion meaning it's coming from a system live. It can be design or operational data. It's formatted, and that formatting is usually dictated by the system it comes from. And then it's structured. These are things that aren't always the case, especially with legacy buildings, but we are seeing schemas, in the case of Haystack and Brick, coming out to provide structure and format. The data then is either communicated to systems or stored. If it's stored, it's typically stored in databases. If it is communicated to systems, it's done in either a real-time data stream, a data pull, or a historic data feed. And then, from all of this, once we've gathered our data, we need to be able to make a use case for our data. That use case needs to be tied back to business value, and we need to analyze it using that actor, system, start, end, and fail condition use case model.
All right. If you have any questions, do not hesitate to reach out in the discussions or in the comments; it just depends where you're listening to this podcast episode. Thanks a ton for listening. I look forward to talking to you in next week's episode, where we're going to have a really good interview, and I encourage you to check that out.
It's all about commercial real estate valuation and the market, and it provides a very interesting perspective on how people at the high end of business think about commercial real estate. That being said, thanks a ton. I look forward to talking to you all next week. Take care.

Transcribed by https://otter.ai



Written by Phil Zito
