Show notes
Phil Zito 00:00
This is the Smart Buildings Academy podcast with Phil Zito, Episode 225. Hey folks, Phil Zito here, and welcome to Episode 225 of the Smart Buildings Academy podcast. In this episode we will be talking through troubleshooting common issues. Over the past couple of weeks, I've been paying attention to Facebook, LinkedIn, and some of the HVAC forums, and I've been listening for common issues that people were asking questions about. I also looked at my own past experience and at the questions our students ask us through our learning management system's forums. What I came out of this little exercise with was a series of common issues that people seem to experience with building automation. Some of these are directly related to building automation, and for some of them building automation is just on the periphery. So we're going to be looking at input and output issues, communication issues, protocol issues, IT issues, and HVAC issues. This will probably be a longer episode due to the number of issues we're going to look at. As we go through each of these issues, we're going to talk through the strategy and methodology of approaching them and how you can best troubleshoot them. This will be important information for technicians and service techs out in the field, as well as for managers who are looking to help their technicians become more effective at troubleshooting systems. It will also be good for engineers and people on the sales side who want to understand what's going on with the systems they designed or sold, and it gives them a baseline of where to engage. Everything that we discuss, including the show notes and links to any pertinent courses, can be found at podcast.smartbuildingsacademy.com/225.
Once again, that's podcast.smartbuildingsacademy.com/225. Today's episode is sponsored by our technician path. If you are looking for the shortest way to get yourself up to speed as a building automation technician, whether that's a technician, a service technician, or an installer, then I encourage you to check out our building automation technician path. This four-course training path is completely online and enables you to learn in 90 days what normally takes people three to four years. Many companies use our training paths to rapidly upskill their new and existing hires so that they can effectively perform their job tasks on a day-to-day basis. You can learn more by going to podcast.smartbuildingsacademy.com/225 and checking out the technician path hyperlink below the podcast recording.

Alright, whenever I approach troubleshooting, I like to take a very structured methodology. This is something I've honestly had difficulty explaining to folks for quite some time, because it seemed, at least at the time, naturally intuitive to me; I felt it was a skill I had. As I started to evaluate how I go about troubleshooting, I realized it was less of a skill and more of a way of thinking, with a process and structure to that way of thinking. For the longest time I thought I was a horribly unorganized person. If you saw the office I'm in right now, it is an absolute disaster; there's stuff everywhere. Yet people kept telling me, "Phil, you're so structured, you're so organized," and I thought, are you talking about the right person? Because that's not me. Then I realized they were referring to my thought process.
And so, as I started to really analyze how I approach troubleshooting, I asked myself: what enabled me to go from coming out of the Navy, barely knowing anything about HVAC, to leading a service team at a Fortune 100 company within two to three years?
Phil Zito 04:35
You know, what enabled that to happen? It was my methodology, my thought process, and my procedural approach to troubleshooting. Whenever I come upon an issue, if you called me out to your job site and said, "Phil, something's not working," I would ask you: what is the desired outcome? What are we trying to achieve? What is or isn't working that you are trying to get working? Then I would say, okay, the desired outcome may be to have the zone temperature at 72 degrees. But what is the actual outcome? This is where I feel a lot of folks' troubleshooting immediately goes off the rails. They have a desired outcome, which is 72 degrees in the space, and an actual outcome of 76 degrees in the space. Instead of figuring out the delta between the two, what could be causing the desired outcome to be 72 but the actual outcome to be 76, they immediately start changing things. You've seen it, right? You've been called out to job sites where people have overridden the flow setpoints, changed damper settings, permanently disconnected actuators, and manually cranked actuators to a position. Did they ever check that the temp sensor was accurate? Nope. Did they check whether there's some upstream blockage between the air handler and the VAV box? Nope. Did they validate that they're actually getting 55 degrees from the air handling unit? Nope. They just started flipping switches until they could make the pain go away. Usually this comes in the form of "Hey, we're hot," a hot call, and we're basically being firefighters. We're just reacting, and we're not being strategic in our reaction. Sure, you can do a short-term fix: turn the flow up to 100% and just douse the space in conditioned cold air.
That may solve the problem and lower the temp, but then you might be overcooling, and you're actually starving boxes that serve other spaces and causing hot spots there. Then it becomes a giant game of whack-a-mole. So you've got to have a process. Like I said, I like to look at desired outcome versus actual outcome and identify the delta: what is the difference? In this case, the difference is four degrees between the desired temperature (the 72-degree setpoint) and the actual temperature of 76. Then I do a root cause analysis: what could be the root cause? What causes a space to cool or heat? If you've been through our control sequence fundamentals course, you know that the primary source of heating or cooling within a space, in most systems, is going to be airflow. Granted, if you have reheat boxes, the reheat is your primary source of heat. But how does that heat get from the hot water coil or the electric coil into the air stream and then to the space? Through airflow. So when troubleshooting a VAV or space-related issue, our primary suspect is almost always going to be airflow. If we validate that our design airflow is actually being achieved, then we can move up the ladder in our root cause analysis and say, okay, airflow is achieved; next is temperature. In this case we know it's a cooling issue, so we go and take a look at the discharge air temp of the air handler, and so on and so forth. We're going to talk through a couple of HVAC issues later, so I'm not going to follow this logic path all the way through here. But that's the kind of thinking I want you to have when you approach these problems. This is why I hammer on it in all of our courses.
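The desired-versus-actual habit, followed by a root-cause ladder, can be sketched in a few lines of Python. This is a minimal illustration under assumptions, not a tool from the episode: the reading names, the 1 F sensor tolerance, the 90% airflow threshold, and the 57 F discharge limit are all hypothetical values chosen for the example. What matters is the structure: compute the delta, then walk the likely causes in order and stop at the first one that fails, instead of overriding setpoints.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class Check:
    """One rung on the root-cause ladder."""
    name: str                        # what is wrong if this check fails
    passes: Callable[[dict], bool]   # True if this part of the system looks healthy


def delta(desired: float, actual: float) -> float:
    """Gap between what we want and what we observe (positive = actual is higher)."""
    return actual - desired


def first_failing_check(readings: dict, ladder: list) -> Optional[str]:
    """Walk the ladder in order and stop at the first failing check,
    rather than changing setpoints and overrides at random."""
    for check in ladder:
        if not check.passes(readings):
            return check.name
    return None


# Hypothetical readings for the 72 F desired / 76 F actual example.
readings = {
    "zone_setpoint_f": 72.0,
    "zone_temp_f": 76.0,
    "handheld_temp_f": 75.8,     # reference reading taken with a meter
    "airflow_cfm": 310.0,
    "design_airflow_cfm": 400.0,
    "ahu_dat_f": 55.0,
}

# Order loosely follows the episode: sensor accuracy, airflow, then supply temperature.
# The 1 F, 90 % and 57 F thresholds are illustrative assumptions, not standards.
ladder = [
    Check("zone sensor inaccurate",
          lambda r: abs(r["zone_temp_f"] - r["handheld_temp_f"]) <= 1.0),
    Check("airflow below design",
          lambda r: r["airflow_cfm"] >= 0.9 * r["design_airflow_cfm"]),
    Check("AHU discharge air too warm",
          lambda r: r["ahu_dat_f"] <= 57.0),
]

print(delta(readings["zone_setpoint_f"], readings["zone_temp_f"]))  # 4.0
print(first_failing_check(readings, ladder))  # airflow below design
```

Keeping each hypothesis as data makes it easy to reorder the ladder for a different system without rewriting the walk.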
And in a lot of our podcasts and in our guides, I hammer on systems thinking: understanding systems, understanding relationships, understanding current state versus desired state. These are all key concepts that, as you learn them and actually apply them, should make your troubleshooting a lot easier. So I saw this one post on Facebook about inputs. The guy was saying that he was getting a negative value from his resistive temperature sensor, and he was
Phil Zito 09:34
like, "What do I change in the controller?" That's the wrong question to ask, because remember, we have to think through root cause. In the case of input and output issues, and in this case specifically, you have a zone temp sensor that's reading a negative temperature. You have to ask yourself: what is my desired outcome? To read an accurate temperature. What is the actual outcome? It's not reading an accurate temperature; it's reading a negative temperature. Then we can ask ourselves, what is the delta? Well, it's the difference in temperature. But is it really the difference in temperature? What produces a temperature reading within a building automation controller? In most cases, it's the interpretation of resistance, of ohms. In some cases we're using 0 to 10 volts, and in some cases 4 to 20 milliamps for temperature, but in most cases it's resistive. The controller interprets ohms against a temperature table, like a 10K Type II or a 1K nickel table, and correlates: if I've got this ohms reading, then I've reached this temperature. This is where knowledge of building automation becomes important, because a thermistor is different from an RTD in its table and in how you read it: does temperature increase with an increase in resistance, or does a decrease in resistance actually mean an increase in temperature? The resistive tables don't all work the same way, so you need to understand how your table works. That's the first thing: we understand what kind of device we're working with. Once we know that, if we're working with a device where the lower the resistance, the lower the temperature, then we have to ask ourselves: if we have no resistance, what is that indicative of? It could be a couple of things, but most likely it's an issue with the wiring.
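To see why a shorted sensor produces a negative reading, here is a minimal sketch assuming a PT1000 platinum RTD (1000 ohms at 0 C, linear coefficient 0.00385 per C). The episode mentions 10K Type II thermistors and 1K nickel sensors; the PT1000 is swapped in only because its linear approximation is easy to write down, so do not use these numbers for a nickel or thermistor table. The short and open thresholds are assumptions. On a positive-coefficient sensor like this one, 0 ohms converts to roughly -436 F, exactly the kind of negative value a controller will happily display when the wiring is shorted.

```python
from typing import Optional

# Assumed stand-in sensor: a PT1000 platinum RTD (positive temperature coefficient).
R0_OHMS = 1000.0   # PT1000 resistance at 0 C
ALPHA = 0.00385    # common platinum coefficient per degree C (linear approximation)


def pt1000_temp_c(ohms: float) -> float:
    """Invert the linear model R = R0 * (1 + ALPHA * T) to get T in degrees C."""
    return (ohms / R0_OHMS - 1.0) / ALPHA


def c_to_f(temp_c: float) -> float:
    return temp_c * 9.0 / 5.0 + 32.0


def wiring_fault(ohms: float, short_below: float = 10.0,
                 open_above: float = 1_000_000.0) -> Optional[str]:
    """Flag readings no healthy sensor would produce (thresholds are assumptions).
    On a PTC sensor (RTD, nickel) a short reads very cold and an open reads very hot;
    on an NTC thermistor it is the other way around."""
    if ohms < short_below:
        return "short"
    if ohms > open_above:
        return "open"
    return None


for ohms in (1000.0, 1100.0, 0.0):
    raw_f = c_to_f(pt1000_temp_c(ohms))   # what a naive conversion would display
    fault = wiring_fault(ohms)
    status = f"{fault}: check wiring first" if fault else "plausible"
    print(f"{ohms:8.1f} ohm -> {raw_f:8.1f} F ({status})")
```

The fault check runs on the meter reading, before any table lookup, which mirrors the "validate the device first" order.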
So we would take a look at our wiring, figure out what is causing that lack of resistance, and troubleshoot it. We wouldn't go into our controller and start checking temp tables yet, and we wouldn't open the controller up and start looking for the jumper from digital to resistive yet. We're going to validate the device first, because there is a direct causal relationship here: on some temperature sensors, no resistance means a very low temperature reading. Once we've identified that, validated it with our meter, and determined that we have no resistance, we can pretty much say we've probably got a wiring issue, and we troubleshoot that. If, however, we measure, say, 1K ohms, and we check the 1K nickel table (isn't it something like 76 degrees at 1K ohms? I don't have it memorized, but it's somewhere around there), so we're right at that resistance and still not showing the temperature reading we should have, then we want to move on to hardware. Not software yet, but hardware. Do we have any external jumpers that need to be set? Are they maybe set for volts DC or amperage instead of resistance? So we would make sure any jumpers and DIP switches that need to be set on our physical hardware are set. Then, and only then, do we move to software and start to look at how the input is mapped in. Is there a temp table that we need to code in manually or select? These are the kinds of processes we want to think through logically. Part of it is understanding how these things work in the first place.
And the other part is understanding the most likely point of failure: based on the delta you're seeing between desired outcome and actual outcome, what is the most likely root cause? The same thing applies when you're dealing with an actuator. Maybe you're dealing with a 0 to 10 volt DC actuator,
Phil Zito 14:26
and you're sending out a command from your controller. You know you've got the output set up, because you just set it up in the software, but for whatever reason the damper is not stroking. So you have to ask yourself about root cause: what does a damper actuator need in order to stroke? We know it needs power, and in some cases it needs a control signal. If we're using a proportional, 0 to 10 volt DC driven actuator, we know it most likely needs 24 to 120 volts of power, depending on the torque you need, and it needs a 0 to 10 volt DC signal. From there, we have to troubleshoot a couple of things. If we're getting no action whatsoever from the actuator, then immediately we're going to want to test power at the actuator: do we have power? Once we have power, we test our control signal: do we have the control signal? If we're not getting the control signal, then it's most likely an issue with wiring or with the physical output setup, such as jumpers. That's how we work through and troubleshoot I/O issues. You can see the logical approach: start with the most likely suspect, the most likely root cause, and work through from there. Next, we move on to communication issues. With communication issues, there are three common categories: issues related directly to the wiring of the communication bus, issues related to the physical setup of the protocol on the bus, and issues related to the logical setup of the protocol on the bus. Each of these is going to have its own indications.
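Returning to the actuator example above, the power-first, then-signal order can be written as a small decision function. This is a sketch under assumptions: the 24 V nominal supply with a 20% tolerance, the 0.5 V signal tolerance, and a direct-acting, linear 0-10 VDC mapping are all illustrative choices, and the final branch (good power and signal but no movement) goes slightly beyond what the episode covers.

```python
def expected_signal_v(commanded_pct: float, span_v: float = 10.0) -> float:
    """Map a 0-100 % command linearly onto a 0-10 VDC signal (direct acting assumed)."""
    clamped = max(0.0, min(100.0, commanded_pct))
    return clamped / 100.0 * span_v


def diagnose_actuator(supply_v: float, signal_v: float, commanded_pct: float,
                      nominal_supply_v: float = 24.0, supply_tolerance: float = 0.2,
                      signal_tolerance_v: float = 0.5) -> str:
    """Check power first, then the control signal, both measured at the actuator."""
    if supply_v < nominal_supply_v * (1.0 - supply_tolerance):
        return "no power: check power wiring and source"
    if abs(signal_v - expected_signal_v(commanded_pct)) > signal_tolerance_v:
        return "bad signal: check signal wiring and output jumpers/setup"
    # Not covered in the episode: with good power and signal, the actuator
    # or its linkage becomes the remaining suspect.
    return "power and signal OK: check actuator and linkage"


print(diagnose_actuator(supply_v=0.0, signal_v=0.0, commanded_pct=50))
print(diagnose_actuator(supply_v=24.3, signal_v=0.0, commanded_pct=50))
print(diagnose_actuator(supply_v=24.3, signal_v=5.1, commanded_pct=50))
```

For a 120 V actuator, pass `nominal_supply_v=120.0`; the order of the checks stays the same.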
Say I have a communication bus with a bunch of controllers working, and then all of a sudden, after a specific controller, the bus just dies and no more controllers come online. That calls for a completely different troubleshooting path than if I have two controllers where sometimes one shows up and sometimes the other does. The latter is indicative of a duplicate MAC address or duplicate device instance, and whichever one gets the token first is the one that shows up during polling. So you can start to see how comm issues and protocol issues come up. First, you have to understand RS-485 and RS-232; you have to understand Modbus, LON, and BACnet. You need to understand how these protocols work. You cannot effectively troubleshoot something if you do not understand how it should work when it's working properly. Whereas with I/O the problem is more people skipping steps, with comm and protocol the biggest issue I see is that people don't understand how RS-485, Modbus, or BACnet MS/TP works at a deep level, so they do not know what the desired outcome should be. Their desired outcome would be, "Hey, my controller should be communicating." Yes, but you should also have specific desired outcomes related to the implementation of the wiring. Is it 32 devices? Is it 3200 feet per segment? Do you have end-of-line resistors? Is polarity correct? These are all desired outcomes you would want from the installer. So your desired outcome is not just that it communicates; it's that it communicates and has been properly installed and set up. From there, we start with our wiring standard: are we violating it in any way? Are we doing T-taps?
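The wiring-standard questions in this part of the episode (device count, segment length, end-of-line resistors, T-taps, shield grounding), plus the duplicate MAC or device instance symptom, can be turned into a checklist. The following is a hypothetical audit over a hand-entered description of an MS/TP segment, not a feature of any vendor tool. The 32-device default assumes one unit load per device, and the segment-length limit is left as a parameter because it should come from your controller vendor's documentation; the 1000 ft in the example is only a placeholder chosen to trigger a finding.

```python
from collections import Counter
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Device:
    name: str
    mac: int                # MS/TP MAC address
    instance: int           # BACnet device instance
    eol_resistor: bool      # end-of-line termination enabled at this device
    t_tap: bool = False     # wired as a stub/T-tap instead of daisy-chained


@dataclass
class Segment:
    devices: List[Device]      # listed in daisy-chain order
    length_ft: float
    shield_grounded_ends: int  # number of ends where the shield is landed on ground


def audit_segment(seg: Segment, max_devices: int = 32,
                  max_length_ft: Optional[float] = None) -> List[str]:
    """Return wiring-standard and addressing problems, wiring first."""
    issues: List[str] = []
    devs = seg.devices
    if len(devs) > max_devices:
        issues.append(f"{len(devs)} devices exceeds limit of {max_devices}")
    if max_length_ft is not None and seg.length_ft > max_length_ft:
        issues.append(f"segment length {seg.length_ft} ft exceeds {max_length_ft} ft")
    for i, d in enumerate(devs):
        at_end = i == 0 or i == len(devs) - 1
        if d.eol_resistor and not at_end:
            issues.append(f"{d.name}: end-of-line resistor on a mid-segment device")
        if at_end and not d.eol_resistor:
            issues.append(f"{d.name}: end device missing end-of-line resistor")
        if d.t_tap:
            issues.append(f"{d.name}: T-tap instead of daisy chain")
    if seg.shield_grounded_ends > 1:
        issues.append("shield grounded at both ends")
    # Duplicates explain two controllers that take turns showing up. Device
    # instances must be unique across the whole BACnet internetwork; this
    # sketch only checks within one segment.
    for label, values in (("MAC", [d.mac for d in devs]),
                          ("device instance", [d.instance for d in devs])):
        for value, count in Counter(values).items():
            if count > 1:
                issues.append(f"duplicate {label} {value} ({count} devices)")
    return issues


segment = Segment(
    devices=[Device("AHU-1", 1, 1001, eol_resistor=True),
             Device("VAV-1", 2, 1002, eol_resistor=True),
             Device("VAV-2", 2, 1003, eol_resistor=False)],
    length_ft=1500.0,
    shield_grounded_ends=2,
)
for issue in audit_segment(segment, max_length_ft=1000.0):  # placeholder limit
    print(issue)
```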
Are we putting end-of-line resistors on every device? Are we crossing polarity? Are we shifting from two-conductor to three-conductor cable? Are we terminating the shield on both ends and creating an antenna? There's a variety of things we can do wrong when we don't adhere to the wiring standard. Once we've validated the wiring, we can move on to the physical settings, things like MAC addresses and end-of-line resistors. Then we can move on to network settings, things like device instance, baud rates,
Phil Zito 19:24
your parity settings, things like that. We logically work through these as we do our root cause analysis of communication issues. Then we have what I like to call logical protocol issues. These are cases where the communication bus is working just fine, but the problem is really a misunderstanding of the protocol's capabilities rather than an actual technical fault. Let's say you're mapping in an air-cooled chiller, a common task, and the sequence of operations tasks you with controlling the chilled water setpoint to do a reset on it. You go and pull in an AI titled "chilled water setpoint" from a BACnet device. But there's no priority array, and you can't write to it. If you understood the BACnet protocol, you would know that AIs do not have a priority array and thus cannot be overridden; they do not support that functionality. But if you're like most folks, who don't truly understand BACnet, you're going to be sitting there trying to figure out why this AI isn't working. You've mapped in an AI titled "chilled water setpoint," which in actuality is just a reflection of the chilled water setpoint at the chiller, and you're unable to command that AI value. If you're adhering to the BACnet protocol, you would need an AV or an AO in order to command it. So you work through it: my desired outcome is to command the actual setpoint. My actual outcome is that I can't command the setpoint. What is the difference? Here's the point, and this is where you need BACnet knowledge: AIs do not have a priority array and thus are not commandable or writable. Let's move along now to IT issues. I feel like we've hammered BAS issues pretty solidly, so let's talk through some common IT issues.
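The AI versus AV/AO point comes down to which BACnet objects carry a priority array and how that array resolves to a present value. The sketch below models that arbitration: 16 slots, priority 1 highest, the first non-NULL slot wins, otherwise Relinquish_Default applies. The object-type strings and function names are chosen for illustration; this is not a BACnet stack, just the rule written out. One nuance: an AI's Present_Value can be written while Out_Of_Service is TRUE, but that only overrides the reported value; it does not command the chiller.

```python
from typing import Optional, Sequence

# Output objects always carry a 16-slot priority array; Value objects carry one
# only if the device implements them as commandable. Input objects never do.
ALWAYS_COMMANDABLE = {"analog-output", "binary-output", "multi-state-output"}
OPTIONALLY_COMMANDABLE = {"analog-value", "binary-value", "multi-state-value"}


def has_priority_array(object_type: str, value_is_commandable: bool = False) -> bool:
    if object_type in ALWAYS_COMMANDABLE:
        return True
    if object_type in OPTIONALLY_COMMANDABLE:
        return value_is_commandable
    # e.g. "analog-input": Present_Value is writable only while Out_Of_Service
    # is TRUE, which overrides the reading rather than commanding equipment.
    return False


def resolve_present_value(priority_array: Sequence[Optional[float]],
                          relinquish_default: float) -> float:
    """Slot 0 is priority 1 (highest). The first non-NULL (None) slot wins;
    if every slot is relinquished, Relinquish_Default applies."""
    if len(priority_array) != 16:
        raise ValueError("BACnet priority arrays have 16 slots")
    for slot in priority_array:
        if slot is not None:
            return slot
    return relinquish_default


slots: list = [None] * 16
slots[15] = 44.0   # priority 16: hypothetical reset value written by the BAS
print(resolve_present_value(slots, relinquish_default=42.0))  # 44.0
slots[7] = 46.0    # priority 8 (manual operator): an override wins over the reset
print(resolve_present_value(slots, relinquish_default=42.0))  # 46.0
```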
Some of the most common IT issues are network issues: being unable to communicate across networks. Here we need a solid understanding of how IT networking works so we can analyze the issue. Let's say I'm on subnet A and I want to communicate with subnet B. That's my desired outcome. My actual outcome is that I can communicate within subnet A, but I can't communicate with subnet B. So our delta is: we can communicate on subnet A, but we can't communicate with subnet B. My root cause analysis would say, all right, let's look at the OSI model and understand how things are communicating. We know network communication works at least within subnet A, so the physical and data link layers seem to be working just fine. It seems to be at layer 3, the network layer, that we're having issues. So we have to ask ourselves what needs to happen for communication outside the subnet. We need routes; that's how IP networking works. So we start to analyze things with pinging. If we can ping another network but cannot communicate across it, that's a whole different set of issues. Usually what happens is that you're unable to ping, and that indicates a routing issue: you don't have a route set up. If you're able to ping but your traffic can't cross the network boundary, that indicates a separate issue, an ACL (access control list) issue, in which case we have to make sure our ports and protocols are allowed across the network boundary. So you can see the logical progression of troubleshooting, but you can also see the knowledge that needs to exist for that approach to work.
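The subnet A to subnet B reasoning can be captured with Python's standard `ipaddress` module plus a two-question decision. The addresses are hypothetical, and the logic assumes ICMP (ping) is allowed on the site, as the episode does; where ping is blocked, a failed ping does not prove a routing problem. UDP 47808 is the default BACnet/IP port and is used here only as an example of what an ACL might need to permit.

```python
import ipaddress

BACNET_IP_PORT = 47808  # default BACnet/IP UDP port (0xBAC0)


def same_subnet(local_interface: str, remote_ip: str) -> bool:
    """True if remote_ip is inside the local interface's network (no router needed)."""
    network = ipaddress.ip_interface(local_interface).network
    return ipaddress.ip_address(remote_ip) in network


def diagnose_cross_subnet(ping_ok: bool, protocol_ok: bool) -> str:
    """Layer 3 reasoning from the episode. Assumes ICMP is permitted on the site."""
    if not ping_ok:
        return "routing: no route between subnets; check gateways and routes"
    if not protocol_ok:
        return ("ACL: ping passes but traffic is blocked; ask IT to allow the "
                f"protocol (e.g. UDP {BACNET_IP_PORT} for BACnet/IP)")
    return "layer 3 and ACLs look fine; look higher up the stack"


# Hypothetical addressing: a supervisory device on subnet A, a target on subnet B.
print(same_subnet("10.1.10.25/24", "10.1.20.5"))
print(diagnose_cross_subnet(ping_ok=True, protocol_ok=False))
```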
The same goes for database issues and server issues. A very common issue that masquerades as a server issue, but is usually a network or cybersecurity issue, is the inability to communicate between the server and the supervisory device. It's a running joke that one of the first things you do when you set up your servers is disable the firewall, and a lot of cybersecurity people cringe. But for the longest time, due to the lack of IT knowledge in our industry, the solution to
Phil Zito 24:19
getting your port traffic to pass back and forth between the supervisory device and the server was to disable the firewall. Now we're seeing more network-level security, with access control lists on the actual network devices, and you can't really disable those. I mean, sure you can if you have access to the network devices, but most people don't. So we're finding ourselves showing up on a site where we can ping between our devices, because ping is allowed, but our protocols can't communicate between devices. Thus we think there's an issue with the server when in actuality there's an issue with the network. You can see how that issue masquerades. To think through it logically, we would say: okay, we have a server and we have supervisory devices, and they can't communicate with one another. Is it the server? The quickest way to rule that out is to add your laptop to the network and see if your laptop can communicate with the supervisory devices. If it can, it's most likely a firewall on the server. If it can't, it's most likely access control lists on the network. You can narrow this down further by using a crossover cable to bypass the network and connect your laptop directly to the supervisory device. If you immediately have connectivity and communication when you do that, then it is almost certainly an access control list or network setting issue on the switch or router itself. That's how you can logically work through that path. Next, we come to HVAC issues, and HVAC issues are interesting. As I've said many times before, both on the podcast and in my videos and training, I am not a mechanic. I've never been a mechanic; I came straight out of the Navy as a weapons specialist and moved into building automation. So to all intents and purposes, you would think my HVAC experience would be limited.
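The laptop test described here is a small decision tree, sketched below. The parameter names are hypothetical, and the last branch (a direct connection that also fails) is an assumption beyond what the episode covers.

```python
from typing import Optional


def isolate_server_vs_network(laptop_via_network_ok: bool,
                              laptop_direct_ok: Optional[bool] = None) -> str:
    """laptop_via_network_ok: a laptop joined to the BAS network can talk to the
    supervisory device. laptop_direct_ok: a laptop connected straight to the
    supervisory device (crossover cable), or None if not tried yet."""
    if laptop_via_network_ok:
        return "server: most likely the server's firewall"
    if laptop_direct_ok is None:
        return "network: most likely ACLs; confirm with a direct connection"
    if laptop_direct_ok:
        return "network: almost certainly an ACL or setting on the switch/router"
    # Not covered in the episode: if even a direct link fails, suspect the
    # supervisory device itself or the laptop's own addressing.
    return "device: check the supervisory device and laptop configuration"


print(isolate_server_vs_network(laptop_via_network_ok=False, laptop_direct_ok=True))
```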
However, I was fortunate early in my career to be under a mentor who taught me that HVAC is less about "Can I change fan belts? Can I change out compressors?" and more about understanding the interrelationships between the variables that HVAC systems control. It's the systems thinking and philosophy that enable you to be really effective at programming and troubleshooting, even if you have never changed a compressor in your entire life. For example, take that scenario of zone temperature on a unit being too hot or too cold. We first need to understand how exactly a space gets too hot or too cold: what enables a space to be controlled? We learn that it happens through air changes. BTUs are either absorbed from the space or transferred to the space, and the air stream that enters from a diffuser is then either recirculated through a return duct or exhausted from the building. Once we understand that principle, we realize that airflow is the primary factor in conditioning a space. Once we have isolated airflow and ensured that it works, we can move on to the secondary factor, which is temperature. This is where we look at whether we're getting 55-degree air to the space in the case of cooling, and whether our reheat is working so that we're getting 80- or 90-degree discharge in the case of a reheat box. Are we properly controlling that temperature variable? In all honesty, short of a bad temperature sensor, those are the two primary issues. Sure, you'll find some fluke issues, like bad space pressurization, a diffuser whose balancing damper is shot, or a diffuser that for whatever reason was installed next to the exhaust or return diffuser, so you're completely bypassing the air-change methodology, the volume mixing of air, within the space.
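The airflow-first, temperature-second order for a terminal unit can be sketched as follows. The sensible heat helper uses the common standard-air approximation Q (BTU/h) = 1.08 x CFM x delta-T, which shows why airflow is the primary factor: delivered capacity scales directly with CFM. The 10% airflow tolerance and the 57 F cooling limit are assumptions; the 80 F reheat floor comes from the 80 to 90 degree discharge mentioned in the episode.

```python
from dataclasses import dataclass

SENSIBLE_FACTOR = 1.08  # BTU/h per (CFM x F) for standard air


def sensible_btuh(cfm: float, room_f: float, supply_f: float) -> float:
    """Sensible heat removed from the space (positive when cooling)."""
    return SENSIBLE_FACTOR * cfm * (room_f - supply_f)


@dataclass
class BoxReadings:
    airflow_cfm: float
    airflow_setpoint_cfm: float
    discharge_temp_f: float
    mode: str  # "cooling" or "heating"


def diagnose_box(r: BoxReadings, airflow_tolerance: float = 0.10,
                 cooling_max_dat_f: float = 57.0,
                 heating_min_dat_f: float = 80.0) -> str:
    # Primary factor: is the box delivering its airflow setpoint?
    if r.airflow_cfm < r.airflow_setpoint_cfm * (1.0 - airflow_tolerance):
        return "airflow"
    # Secondary factor: is the delivered air at the right temperature?
    if r.mode == "cooling" and r.discharge_temp_f > cooling_max_dat_f:
        return "supply air temperature: move up to the air handler"
    if r.mode == "heating" and r.discharge_temp_f < heating_min_dat_f:
        return "reheat"
    return "airflow and temperature OK: check sensor, diffusers, pressurization"


print(diagnose_box(BoxReadings(300.0, 400.0, 55.0, "cooling")))
# Same 55 F air, less of it: capacity drops in proportion to CFM.
print(round(sensible_btuh(400, 76, 55)), round(sensible_btuh(300, 76, 55)))
```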
Phil Zito 28:47
Those are rare. I will say that most often it is improper airflow control, due to a bad airflow sensor or just improper settings or overrides on the unit, or it is improper control of the temperature, either at the unit in the case of reheat or at the air handler in the case of DAT. Naturally, if it's a DAT (discharge air temperature) issue at the air handler, we move our logical flow on to the next system, the air handler, and ask ourselves: what affects discharge air temperature on a unit? The first and most obvious thing we want to look at is the entering and leaving temperature on the cooling coil. We're looking for a 10 to 14 degree delta on a fully loaded coil: say we have an entering return of, you know, [unclear] degrees, and we drop that down to 54 degrees leaving the coil. If we're able to see that delta, then we can tell that our discharge air temp is right, and somewhere between the air handler and the VAV box we're losing a bunch of temperature, which is pretty rare, short of a hole in the ductwork or something. Usually what it is, is that the coil has been plugged, or we're not getting the proper pressure at the coil to make the flow work the way it needs to for a proper exchange, or the coil itself is dirty. There's a variety of things: the preheat may be turned on and heating that return air so much that the cooling coil just can't overcome it to reach the DAT. There are a variety of causes, but it's actually a fairly simple troubleshooting process in itself. Now, if you were to say flow was the issue, then we would naturally look at the unit itself and its pressurization. How is it controlling system pressure? Do we have system pressure? Do we have the one to two inches of water column at the unit? Are we properly controlling that? Are we losing pressure somewhere down the ductwork?
Do we maybe have a partially closed fire damper? There are so many possibilities, but you can see the logical path as we progress from the terminal unit up to the air handler and so on. That's the logical progression you should follow, whereas a lot of folks will just go and start flipping switches, changing setpoints, and putting things in hand. Your intuitive reaction should be to avoid that temptation and instead work on the issue at the lowest common denominator, in this case the space, and work your way up. My hope is that this approach to troubleshooting, talking through several different issues and giving you my perspective on how to approach them, gives you some tools, approaches, and strategies you can use as you service your customers' sites or your own site. As always, everything is available at podcast.smartbuildingsacademy.com/225. I encourage you to check out the recording and the links, and leave any comments with questions. I hope this episode has been very valuable to you, and I look forward to talkin