You're seeing high peaks in the real observation data, or in the current simulations of the RL model?
My main worry would be that there is imprecision in the controls (set the flow rate to X gpm, actually get more or less by an amount that isn't predictable) and delays in impact (the time from starting to heat to seeing the air temperature change), and that your simulation models these too precisely or differently from the real world.
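If you want a cheap way to probe that sim-to-real gap, you could wrap the simulator with injected actuation noise and a fixed action delay and check whether the trained policy still behaves. A rough sketch, assuming a gym-style environment; the wrapper name, noise model and parameter values are mine, not taken from your actual setup:

```python
import collections
import numpy as np
import gym


class NoisyDelayedActionWrapper(gym.Wrapper):
    """Illustrative only: adds unpredictable actuation error and a fixed
    delay between the commanded heating rate and the rate the plant sees."""

    def __init__(self, env, noise_std=0.05, delay_steps=2):
        super().__init__(env)
        self.noise_std = noise_std        # relative actuation error (assumed)
        self.delay_steps = delay_steps    # timesteps before the action takes effect (assumed)
        self._queue = collections.deque()

    def reset(self, **kwargs):
        self._queue.clear()
        return self.env.reset(**kwargs)

    def step(self, action):
        # Commanded actions enter a FIFO; the plant receives an older one.
        action = np.asarray(action, dtype=np.float64)
        self._queue.append(action)
        if len(self._queue) > self.delay_steps:
            applied = self._queue.popleft()
        else:
            applied = np.zeros_like(action)
        # Multiplicative noise: "set flow rate to X, actually get a bit more or less".
        applied = applied * (1.0 + np.random.normal(0.0, self.noise_std, applied.shape))
        return self.env.step(applied)
```

If the policy's performance collapses under even small noise or delay, that is a hint the agent is exploiting precision the real building will not have.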
There is a training phase (1-6 years of weather observations) where the RL agent trains against the building simulation program. Then I evaluate the previously trained agent on 2022 weather data using the same building simulation program. The blue line in the graph shows the real measured values of the building from 2022.
Yes. The building has a certain inertia, and this is something I hope the agent will learn as well. A 36-hour outdoor temperature forecast is supplied in the observation state so that the agent knows it should preheat the building when the forecast temperature is going down, to lower the heating-peak penalty.
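To make that concrete, this is roughly how the observation could be assembled, flattening the current temperatures and the 36-hour forecast into one vector (the function name, ordering and dtype are illustrative, not the exact state we use):

```python
import numpy as np


def build_observation(zone_temp, outdoor_temp, forecast_36h):
    """Illustrative sketch: current zone and outdoor temperatures plus a
    36-hour outdoor temperature forecast, concatenated into one vector."""
    forecast = np.asarray(forecast_36h, dtype=np.float32)  # e.g. shape (36,)
    return np.concatenate(([zone_temp, outdoor_temp], forecast)).astype(np.float32)
```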
I assume you've seen these, but if not, there are some relevant papers here: https://scholar.google.com/scholar?q=deepmind+reinforcement+learning+cooling+data+center&hl=en&as_sdt=0&as_vis=1&oi=scholart
I have read some articles, and I can have a look again, thanks. There was too much information and too many possibilities, so it felt better to talk to a human.
In short, we are trying to use reinforcement learning to control the heating of a building (district heating), with the building's zone temperature and the outdoor temperature as inputs. To avoid using the real building during training of the RL algorithm, we use a building simulation program as the environment.
The building simulation program has inputs:
Outputs from the building simulation program are:
The aim of the RL algorithm is to control the building's district heating use more efficiently than the current district heating control function does. The primary goal is for the RL algorithm to peak-shave the district heating use.
We are using ClippedPPO as the agent in an RL framework. As a comparison, we have one year of district heating data from the building we want to control. The building is modelled in the building simulation program's format.
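As a rough illustration of the training and evaluation loop (the actual framework is not named here, so this sketch uses stable-baselines3, whose PPO implements the clipped surrogate objective; `BuildingSimEnv` is a hypothetical gym-style wrapper around the building simulation program, not something from our code):

```python
from stable_baselines3 import PPO

# Hypothetical gym-style wrapper around the building simulation program.
env = BuildingSimEnv(weather_years=range(2016, 2022))

# Clipped-objective PPO agent; hyperparameters are placeholders.
model = PPO("MlpPolicy", env, gamma=0.99, clip_range=0.2, verbose=1)
model.learn(total_timesteps=1_000_000)   # training phase on historical weather
model.save("district_heating_ppo")

# Evaluation: same simulator, 2022 weather, deterministic policy.
eval_env = BuildingSimEnv(weather_years=[2022])
obs = eval_env.reset()
done = False
while not done:
    action, _ = model.predict(obs, deterministic=True)
    obs, reward, done, info = eval_env.step(action)
```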
The action space of the RL algorithm is:
The observation space of the RL algorithm is:
In each timestep the RL environment takes the values from the building simulation program and calculates a penalty from the observation state, which is returned to the agent. The penalty is calculated as a sum of 4 different parts. Each part has a coefficient that I have been trying to tune more by art than by science. Some of the parts are, for example, -coeff1*heating_rate^2, -coeff2*heating_derivative and -coeff3*uncomfortable_temp (a large penalty when the indoor temperature is below 19 °C).
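To make the structure concrete, a minimal sketch of how such a penalty could be assembled (coefficient values are placeholders, the exact shape of the comfort term is an assumption, and the fourth part is omitted since it is not spelled out above):

```python
def penalty(heating_rate, heating_derivative, zone_temp,
            coeff1=1.0, coeff2=0.1, coeff3=50.0, comfort_limit=19.0):
    """Sketch of the penalty as a weighted sum of parts."""
    p_rate = -coeff1 * heating_rate ** 2       # discourages high heating rates
    p_ramp = -coeff2 * heating_derivative      # penalises the heating derivative, as written above
    p_comfort = -coeff3 if zone_temp < comfort_limit else 0.0  # large penalty below 19 °C
    # The fourth part of the sum is not described here, so it is left out.
    return p_rate + p_ramp + p_comfort
```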
The problem is that we are still seeing heating with high peaks that we want the RL algorithm to shave. So if anyone has any idea of how to get this working, or can give some insight into how to progress, it would be much appreciated.
The orange line is the hot water heating rate produced by the RL agent and the blue line is the real-world measured values for 2022: