Hot news! We are announcing our new heat demand model, which uses machine learning to estimate the space heating and hot water demand of Swiss buildings. Knowing the heat demand of a particular building is a key piece of information for many use cases in the energy transition, yet actual heat demand measurements are often not easily accessible. An accurate building-level heat demand model will therefore be valuable to many of our customers, the stakeholders of the energy transition.

Results: We developed a model for estimating the energy reference area, one of the most important input features for estimating the heat demand of a building. It achieves a median absolute percentage error of 9%, compared to 15% for the common approach of multiplying the building area by the number of floors. For heat demand estimation, our machine learning model achieves a median absolute percentage error of 21%, compared to 32% for a comparable "standard" approach. Our insight analysis shows that, for some important input features (such as construction year and specific hot water demand), the trained model recovers relationships with the estimated heat demand that agree well with values used in the literature. For other input features (such as heating degree days), the recovered relationships differ from what one would expect from the literature.

Figure 1: We have all used heat to create a friendly atmosphere for as long as we can remember. Image by Moein Moradi from Pexels.

## Why Use a Machine Learning Model?

Whether we want to make an offer for installing a new heating system, plan the construction of a district heating network, identify buildings with a high retrofitting potential, or analyze the current and future energy consumption of a region, we always need to know the heat demand of individual buildings.

Often, the first choice would be to work with actual heat consumption measurements. However, depending on what type of company or organization we belong to, we might not have easy access to actual measurements. If we do have access to measurements, we might still not be allowed to use them for all of our use cases due to data protection reasons.

And even if we have access to measurements and are allowed to use them, we are likely to be confronted with a couple of technical issues, such as:

- the heating system that produces the measurements for the building we are interested in also heats other buildings,
- or heats only part of the building,
- or has not been run or measured for a whole year,
- or is used either for space heating only or for space heating and hot water production combined, while we are interested in separate demands for space heating and hot water,
- or, even worse, we do not know the details of the measurement acquisition process concerning any of the above points.

In any of these cases, using a model can be a viable alternative to direct measurements. When using a model, it is always clear for which building, and with which attributes, we are making a prediction. (At least in theory; in practice there may also be uncertainties in the attributes of a building, i.e., the input features. For many building attributes it is usually possible to validate their plausibility manually, whereas for raw heat consumption measurements this is very hard.)

Further, modelled heat demands may serve as a reference to which we can compare measured consumptions, in order to detect unexpectedly low or high values in measurements.

### Machine Learning vs. Other Approaches

Commonly used methods calculate the heat demand by multiplying the energy reference area by the specific heat demand (measured in kilowatt hours per square meter) [1, 2]. The energy reference area is usually approximated by the building area multiplied by the number of floors. The specific heat demands for space heating and hot water are usually modelled as average values over different building categories that are defined based on several attributes, such as building class, construction period, last renovation of the building, and climatic conditions at the location of the building.
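As a rough illustration, the conventional calculation can be sketched in a few lines of Python. The category keys and the specific heat demand values below are placeholders chosen for this example; they are not taken from [1, 2].

```python
# Placeholder specific heat demands in kWh per square meter and year.
# The categories and numbers are illustrative assumptions only.
SPECIFIC_HEAT_DEMAND_KWH_PER_M2 = {
    ("single_family", "before_1980"): 150.0,
    ("single_family", "1980_or_later"): 90.0,
    ("multi_family", "before_1980"): 130.0,
    ("multi_family", "1980_or_later"): 80.0,
}


def approx_energy_reference_area(building_area_m2, n_floors):
    """Common approximation: building area multiplied by the number of floors."""
    return building_area_m2 * n_floors


def conventional_heat_demand(building_area_m2, n_floors, building_class, construction_year):
    """Energy reference area multiplied by a category-average specific heat demand."""
    period = "before_1980" if construction_year < 1980 else "1980_or_later"
    specific = SPECIFIC_HEAT_DEMAND_KWH_PER_M2[(building_class, period)]
    return approx_energy_reference_area(building_area_m2, n_floors) * specific


# A two-storey single family home with 120 square meters of building area.
demand = conventional_heat_demand(120.0, 2, "single_family", 1972)
print(demand)  # 36000.0 kWh per year
```

Every building in the same category gets the same specific heat demand here, which is the main simplification that a fitted model tries to relax.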

With a machine learning approach, in contrast, we fit a parametric model to a dataset consisting of a (basically arbitrary) set of building attributes as input features and one target (or output) variable, i.e., measured heat demands in our case. Once the model is fitted, we can make predictions of the output variable given the set of input features of any particular building. Of course, in practice the set of input features shouldn't be arbitrary but carefully adapted to the problem at hand.

Some advantages that come with our machine learning approach are:

- it gives more accurate predictions than commonly used models,
- the model is based on real measurements, and in this sense only contains correlations that actually exist in reality,
- additional input features, i.e., more detailed information about the buildings, can be integrated into the model straight away (given that they are available for the majority of buildings in Switzerland),
- new heat demand measurements, either from more recent years or from other regions in Switzerland, can easily be integrated by retraining the model with the additional measurements,
- from the trained model we can gain insights into how the heat demand depends on different attributes,
- the model can be used to simulate future scenarios based on today's and past measurements.

For the technically interested readers, we describe our model in greater depth in the following.

## The Model

Inspired by the methods in [1], as a first step we fit two additional models to estimate two key input features, namely the energy reference area that is also used by conventional methods as described above, as well as the heating degree days that account for the climatic conditions.

The complete model pipeline is shown in the following figure:

Figure 2: Schematic model pipeline for estimating heating degree days, energy reference area, and space heating and hotwater demand for single buildings.

### Heating Degree Days Model

Heating degree days are a measurement used to account for climate-related influences on heating energy consumption. They are calculated from daily temperature values at a specific geographic location. They are considered to be directly proportional to the heating demands of a building at this location.
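To make the calculation concrete, here is a minimal sketch of how heating degree days can be computed from daily mean temperatures. It assumes the 20/12 convention that is widely used in Switzerland (room temperature 20 °C, heating limit 12 °C); the text does not state which convention our model uses, so both thresholds are assumptions exposed as parameters.

```python
def heating_degree_days(daily_mean_temps_c, room_temp_c=20.0, heating_limit_c=12.0):
    """Sum of (room temperature - daily mean) over all heating days.

    A day counts as a heating day when its mean outdoor temperature is at or
    below the heating limit. Both thresholds are assumed defaults.
    """
    return sum(
        room_temp_c - temp
        for temp in daily_mean_temps_c
        if temp <= heating_limit_c
    )


# Five example days; only the first three are at or below the heating limit.
temps = [-2.0, 5.0, 11.0, 13.0, 18.0]
hdd = heating_degree_days(temps)
print(hdd)  # 22 + 15 + 9 = 46.0
```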

Our Heating Degree Days Model takes three input features: altitude, latitude, and longitude. It is a simple linear regression, trained on long-term average heating degree days calculated from temperature measurements of 50 MeteoSwiss weather stations.

The trained model has only four fitted parameters. Evaluated on the same 50 weather stations, it obtains a mean absolute percentage error of 5%; with so few parameters, this suggests that the model generalizes well.
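The following sketch shows what such a four-parameter fit can look like with NumPy. The station coordinates and heating degree days are synthetic, generated only for this example, so the fitted values say nothing about the real MeteoSwiss data; the variable and function names are also our own choice for illustration.

```python
import numpy as np

# Synthetic stand-in for 50 weather stations (not real MeteoSwiss data).
rng = np.random.default_rng(0)
n_stations = 50
altitude_m = rng.uniform(300.0, 2000.0, n_stations)
latitude_deg = rng.uniform(45.8, 47.8, n_stations)
longitude_deg = rng.uniform(6.0, 10.5, n_stations)

# Synthetic target: mostly driven by altitude, plus some noise.
hdd = (
    3000.0
    + 1.5 * altitude_m
    + 50.0 * (latitude_deg - 46.8)
    - 20.0 * (longitude_deg - 8.2)
    + rng.normal(0.0, 100.0, n_stations)
)

# Design matrix with an intercept column: four parameters in total.
X = np.column_stack([np.ones(n_stations), altitude_m, latitude_deg, longitude_deg])
params, *_ = np.linalg.lstsq(X, hdd, rcond=None)


def predict_hdd(altitude, latitude, longitude):
    """Predict heating degree days for any location from the fitted parameters."""
    return params[0] + params[1] * altitude + params[2] * latitude + params[3] * longitude


# Mean absolute percentage error on the training stations.
mape = np.mean(np.abs(X @ params - hdd) / hdd)
print(f"MAPE on training stations: {mape:.1%}")
print(f"Predicted HDD at 550 m, 47.4 N, 8.5 E: {predict_hdd(550.0, 47.4, 8.5):.0f}")
```

Because the fit has only an intercept and three slopes, there is little room to overfit 50 stations, which is why evaluating on the training stations is less of a concern here than for larger models.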

With the trained model, we then predict the average heating degree days of every single building in Switzerland based on its altitude, latitude, and longitude, and further use the prediction as an input feature to the Heat Demand Model, as will be described below. Our dataset used for training the Heat Demand Model contains measurements of only a few Swiss cities and regions. Due to the rather low geospatial representativeness of our heat demand dataset, we use heating degree days as the only location-dependent input feature.

### Energy Reference Area Model

The energy reference area describes the actively heated floor area of a building. Although this area, as already mentioned, can be approximated by multiplying the building area by the number of floors, this value can deviate substantially from a building's actual energy reference area.

That is why we try to make a more accurate estimate by training a machine learning model on a dataset of manually captured energy reference areas. In addition to the building area and the number of floors, we also include other meaningful input features, such as building class and volume.

With our Energy Reference Area Model we are able to improve the accuracy in estimating the energy reference area of a building to a median absolute percentage error of 9%, compared to 15% obtained by simply multiplying the building area by the number of floors.

The improvement is especially pronounced for single family homes, where our model achieves a median absolute percentage error of 11%, compared to 21% achieved by the simpler approach. In line with this large improvement, we observe that for single family homes the building volume (used by our machine learning model but not by the simpler approach) has a significantly stronger correlation with the energy reference area than the building area and the number of floors.

The building category with the smallest error is multi family homes, for which our model achieves a median absolute percentage error of 7%, compared to 12% obtained by simply multiplying the building area by the number of floors.

### Evaluation and Application of the Energy Reference Area Model

Compared to the Heating Degree Days Model described above, the Energy Reference Area Model has significantly more parameters. For an unbiased estimate of the model's generalization capabilities, it is therefore important to evaluate the model on a test set, i.e., a separate subset of the dataset that is not used for training the model.

Outliers make the evaluation considerably more difficult: our energy reference area dataset contains a considerable number of buildings for which the predicted area deviates strongly from the true value. A major reason for this is that, for some buildings, only parts of the floor area are actively heated. The model evaluation is especially sensitive to such outliers in the test set: adding or removing just a few outlier buildings may have a significant effect on the calculated error metrics. Hence, it is important to compare different models on exactly the same test set. In addition, we evaluate the model primarily using the median instead of the mean absolute percentage error, as the median is more robust to outliers than the mean.
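A small numerical example illustrates why the median is the more robust choice. The areas below are made up for this illustration; the fourth building plays the role of an outlier, for instance a building where only part of the floor area is heated.

```python
from statistics import mean, median


def absolute_percentage_errors(y_true, y_pred):
    """Absolute percentage error of each prediction, as a fraction of the true value."""
    return [abs(pred - true) / abs(true) for true, pred in zip(y_true, y_pred)]


# Made-up energy reference areas in square meters.
true_area = [200.0, 150.0, 400.0, 100.0, 250.0]
pred_area = [210.0, 135.0, 420.0, 300.0, 240.0]  # the fourth building is an outlier

errors = absolute_percentage_errors(true_area, pred_area)
mean_ape = mean(errors)      # about 0.448, dominated by the outlier
median_ape = median(errors)  # 0.05

# Dropping the single outlier changes the mean drastically, the median not at all.
errors_without_outlier = errors[:3] + errors[4:]
mean_ape_clean = mean(errors_without_outlier)      # about 0.06
median_ape_clean = median(errors_without_outlier)  # 0.05

print(f"with outlier:    mean {mean_ape:.1%}, median {median_ape:.1%}")
print(f"without outlier: mean {mean_ape_clean:.1%}, median {median_ape_clean:.1%}")
```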

When applying the Energy Reference Area Model in practice, it is rather difficult to identify such outliers from the raw input data alone. In many cases, properly visualizing some key input features may help a user interested in a particular building to detect erroneous input values.

### Heat Demand Model

The Heat Demand Model is trained on yearly averaged final heat consumptions collected for a set of buildings. The final heat consumptions are obtained by multiplying measured heat consumptions (measured in different units, e.g., litres for oil heating systems, cubic meters for gas heating systems, kilowatt hours for district heating, ...) by a carrier-dependent heating value, a system-dependent efficiency factor, and a factor correcting for the variation in heating degree days from year to year.
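The conversion can be sketched as follows. The heating values and efficiency factors are rough placeholders rather than the values used in our pipeline, and the form of the climate correction (long-term average heating degree days divided by the heating degree days of the measured year) is an assumption made for this example.

```python
# Placeholder conversion factors; real values depend on the carrier and system.
HEATING_VALUE_KWH_PER_UNIT = {"oil_litre": 10.0, "gas_m3": 10.0, "district_kwh": 1.0}
EFFICIENCY = {"oil_litre": 0.85, "gas_m3": 0.9, "district_kwh": 1.0}


def final_heat_consumption_kwh(measured, carrier, hdd_year, hdd_long_term):
    """Convert a raw yearly measurement into a climate-corrected final heat consumption.

    The correction scales a cold year down and a mild year up, so that years
    become comparable (assumed form: long-term HDD / HDD of that year).
    """
    climate_correction = hdd_long_term / hdd_year
    return (
        measured
        * HEATING_VALUE_KWH_PER_UNIT[carrier]
        * EFFICIENCY[carrier]
        * climate_correction
    )


# Two years of oil consumption for the same building, in litres.
yearly = [
    final_heat_consumption_kwh(2000.0, "oil_litre", hdd_year=3200.0, hdd_long_term=3400.0),
    final_heat_consumption_kwh(2200.0, "oil_litre", hdd_year=3600.0, hdd_long_term=3400.0),
]

# The training target is the average over the available years.
average_kwh = sum(yearly) / len(yearly)
print(f"{average_kwh:.0f} kWh per year")
```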

The model takes the previously predicted energy reference area and heating degree days, along with other attributes of each building (e.g., construction year, last renovation, building class, ...), as input features.

With our Heat Demand Model we are able to improve the accuracy in estimating yearly averaged final heat consumptions of a building to a median absolute percentage error of 21%, compared to 32% obtained by multiplying the improved energy reference area estimate from above by a common heuristic value for the specific heat demand based only on the construction year and the last renovation.

Broken down by building category, our model achieves a median absolute percentage error of 23% for single family homes and 18% for multi family homes, compared to 32% and 30%, respectively, obtained by multiplying the improved energy reference area estimate by a simple estimate of the specific heat demand.

Similar to the Energy Reference Area Model, the evaluation of the Heat Demand Model is complicated by outliers in the dataset: First of all, deviations in the predicted energy reference area (the most important predictor variable for the Heat Demand Model) usually result in deviations in the predicted heat demand. In addition, there also exist outliers in the target variable, i.e., the measured heat consumptions, due to various reasons already described at the beginning of this text.

### Space Heating and Hot Water Demands from Heat Demand

So far, we are able to predict one heat demand per building, as covered by a single energy system such as an oil or gas heating system. Sometimes such a system provides heat for space heating and hot water combined; sometimes one or more additional systems, such as an electric boiler or a solar thermal system, are used for hot water production.

Ultimately, we are interested in having two separate predictions, one for space heating demand only and one for hot water demand only. To this end, we use a little trick: we encode whether the measurements belong to combined space heating and hot water production or to space heating only in a 0/1-valued input feature called "has hotwater". During prediction, we then predict two values for every building: the space heating demand (by setting "has hotwater" to 0) and the demand of combined space heating and hot water production (by setting "has hotwater" to 1). We obtain the hot water demand by subtracting the former from the latter.
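In code, the trick amounts to calling the model twice with the 0/1 flag switched. The sketch below uses a toy function in place of the trained Heat Demand Model; its coefficients and the feature names `energy_reference_area_m2` and `has_hotwater` are assumptions made for illustration.

```python
def toy_heat_demand_model(features):
    """Toy stand-in for the trained model (coefficients are made up)."""
    area = features["energy_reference_area_m2"]
    space_heating = 100.0 * area
    hot_water = 20.0 * area * features["has_hotwater"]
    return space_heating + hot_water


def split_demands(model, features):
    """Return (space heating demand, hot water demand) for one building."""
    space_heating = model({**features, "has_hotwater": 0})
    combined = model({**features, "has_hotwater": 1})
    return space_heating, combined - space_heating


building = {"energy_reference_area_m2": 200.0, "has_hotwater": 1}
space_heating_kwh, hot_water_kwh = split_demands(toy_heat_demand_model, building)
print(space_heating_kwh, hot_water_kwh)  # 20000.0 4000.0
```

The flag stored with the building is overwritten in both calls, so the split does not depend on how the building's own system is configured.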

Evaluating the accuracy of the resulting hot water demand predictions is rather difficult, as we currently have very few measurements from systems used for hot water production only.

## Insights

Partial dependence plots are a very useful way to gain more insight into the trained model. Such plots are obtained by sweeping a single input variable over its value range, while keeping all other inputs fixed, for many different sample buildings.
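The procedure can be sketched in plain Python. The toy model and its coefficients are invented for this example and do not reflect our trained model; only the sweeping logic is the point.

```python
def toy_specific_heat_demand(building):
    """Toy stand-in for the model output per square meter (made-up coefficients)."""
    age_effect = max(0.0, 2020 - building["construction_year"]) * 1.0
    return 40.0 + age_effect + 15.0 * building["has_hotwater"]


def partial_dependence(model, buildings, feature, grid):
    """Sweep one feature over a grid while keeping all other inputs fixed.

    Returns one curve per building and the average curve over all buildings.
    """
    individual = [
        [model({**building, feature: value}) for value in grid]
        for building in buildings
    ]
    average = [sum(column) / len(column) for column in zip(*individual)]
    return individual, average


buildings = [
    {"construction_year": 1965, "has_hotwater": 1},
    {"construction_year": 1990, "has_hotwater": 0},
]
grid = [1960, 1990, 2020]

individual, average = partial_dependence(
    toy_specific_heat_demand, buildings, "construction_year", grid
)
print(individual)  # [[115.0, 85.0, 55.0], [100.0, 70.0, 40.0]]
print(average)     # [107.5, 77.5, 47.5]
```

The per-building curves correspond to faint individual lines in a partial dependence plot, and the averaged curve to the thick line.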

The following figure shows three partial dependence plots for the specific heat demand:

Figure 3: Partial dependence plots showing the dependence of the specific heat demand on the three different input features construction year, heating degree days, and has hotwater. The plots are obtained by calculating the outputs of the Heat Demand Model divided by the predicted energy reference area for a set of sample buildings. The faint orange lines show partial dependences of single buildings. The thicker orange lines show the average partial dependences. The thick vertical lines at the bottom of the plots indicate the value distribution of each feature over the dataset. For the construction year and the heating degree days the value range is extended to values that do not occur in our dataset (but do or will actually occur in reality), in order to see how the model generalizes to future construction years or other climatic regions respectively.

The first subplot in Fig. 3 shows the dependence of the specific heat demand on the construction year. The graph agrees well with the common observation that buildings built from about 1960 to 1980 have the highest heat demand per square meter, whereas newer buildings have markedly lower demands. Note that the majority of the samples behind the individual lines in the plot come from heating systems that are used for combined space heating and hot water production. The average line therefore more or less represents the specific heat demand for the two energy demands combined.

The second subplot in Fig. 3 shows the dependence of the specific heat demand on the binary variable "has hotwater", which encodes whether the predicted heat demand corresponds to space heating only (0) or to combined space heating and hot water (1). The plot shows that, as expected, heating systems that include hot water production have a higher specific heat demand on average. For some buildings the difference seems to be more pronounced than for others.

The third subplot in Fig. 3 shows the dependence of the specific heat demand on heating degree days. Our dataset so far contains heat consumption measurements from four different regions in Switzerland, covering altitudes between about 300 and 800 meters above sea level. Quite surprisingly, there seems to be almost no dependence on this feature at all. We would actually expect a positive relationship: the more heating degree days, the higher the specific heat demand of a building. As mentioned earlier, the heating degree days are the only input feature of our model that encodes the geographic location of a building. One explanation for the mismatch between the expected and the observed dependence is that there are other structural differences between the regions in our dataset that more or less happen to cancel out the climatic influences.

### Simulating Future Scenarios

Apart from predicting current heat demands of buildings, another potential application of our Heat Demand Model is the simulation of future scenarios. The attributes that may change in the future, and are hence of most interest for this use case, are:

- the construction year (if a building is demolished and rebuilt),
- the condition or performance of the building envelope (represented by the last renovation year, or by more detailed information about what has been renovated),
- and the climatic situation (represented by heating degree days in our model).

Concerning the first point, the construction year, the first subplot in Fig. 3 above clearly shows that a complete reconstruction of a building is likely to result in a significant reduction in heat demand.

The second point, the performance of the building envelope, is trickier. We currently use the last renovation year as an input feature to the model. However, the reduction in specific heat demand after a renovation (when visualized with a corresponding partial dependence plot) is clearly smaller than one would expect. We are currently working intensively on improving the quality and level of detail of our data about building renovations.

Concerning the third point, the climatic situation, we still need more heat consumption data to train our model in order to get a better picture of the influence of the climatic factor. With a more uniform coverage of measurements over the different Swiss regions, we could introduce an additional input feature accounting for regional differences, and thereby decouple local structural effects from climatic influences. In addition to extending the dataset spatially, we can also extend it in time. As described above, we train our model with long-term averages of yearly heat consumption measurements. Training the model with yearly measurements together with the corresponding heating degree day values of each year could also increase the amount of information about climatic influences that we are able to extract from our training set.

## Conclusion

We hope this gave you a feeling for how complex, extensive, and, last but not least, interesting the prediction of heat demand is.

If you are interested in using our Heat Demand Model, feel free to contact us.

If you want to tune our model to match your own special use case, also contact us. As described, our machine learning approach is quite flexible and can be further developed in many directions.

You could also help us to further improve our model by supporting us with heat consumption measurement data of your region.

## Literature

[1] Schneider S., Hollmuller P., Le Strat P., Khoury J., Patel M. and Lachal B. (2017). Spatial–Temporal Analysis of the Heat and Electricity Demand of the Swiss Building Stock. Front. Built Environ.