There is a training phase (1-6 years of weather observations) in which the RL agent trains using the building simulation program. I then evaluate the trained agent on 2022 weather data using the same building simulation program. The blue line in the graph shows real measured values from the building in 2022.
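A minimal sketch of that train/evaluation split, assuming the weather data is held as a dict keyed by year and that the training years are the ones just before 2022 (both are assumptions for illustration; `split_weather_by_year` is a hypothetical helper, not part of the simulation program):

```python
from typing import Dict, List, Tuple

EVAL_YEAR = 2022  # evaluation year from the post


def split_weather_by_year(
    records: Dict[int, List[float]],
    eval_year: int = EVAL_YEAR,
    n_train_years: int = 6,
) -> Tuple[Dict[int, List[float]], List[float]]:
    """Split weather records into training years and one held-out evaluation year.

    Assumption: `records` maps a year to its list of hourly observations, and
    training uses the `n_train_years` most recent years before `eval_year`.
    """
    if eval_year not in records:
        raise KeyError(f"no weather data for evaluation year {eval_year}")
    # Keep only years strictly before the evaluation year, most recent last.
    earlier_years = sorted(y for y in records if y < eval_year)
    train_years = earlier_years[-n_train_years:]
    train = {y: records[y] for y in train_years}
    return train, records[eval_year]
```

The point of the split is that the agent never sees the 2022 weather during training, so the comparison against the measured 2022 values is a held-out test.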
Yes. The building has a certain inertia, which is something I hope the agent will learn as well. The 36-hour outdoor temperature forecast is supplied in the observation state so that the agent knows it should preheat the building when the forecast temperature is dropping, which lowers the heating peak penalty.
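A minimal sketch of how such an observation and a peak penalty could be put together. The function names, the weights, and the exact reward shape are assumptions for illustration, not the actual setup:

```python
from typing import List, Sequence, Tuple

FORECAST_HOURS = 36  # forecast horizon mentioned above


def build_observation(
    indoor_temp: float, outdoor_temp: float, forecast: Sequence[float]
) -> List[float]:
    """Concatenate the current temperatures with the 36-hour outdoor forecast."""
    if len(forecast) != FORECAST_HOURS:
        raise ValueError(
            f"expected {FORECAST_HOURS} forecast values, got {len(forecast)}"
        )
    return [indoor_temp, outdoor_temp, *forecast]


def step_reward(
    heating_power_kw: float,
    running_peak_kw: float,
    energy_weight: float = 1.0,  # hypothetical weight
    peak_weight: float = 10.0,   # hypothetical weight
) -> Tuple[float, float]:
    """Return (reward, new running peak) for one time step.

    The reward is a negative cost: the energy used this step plus a penalty
    that applies only when this step raises the heating peak seen so far.
    """
    peak_increase = max(0.0, heating_power_kw - running_peak_kw)
    reward = -(energy_weight * heating_power_kw + peak_weight * peak_increase)
    new_peak = max(running_peak_kw, heating_power_kw)
    return reward, new_peak
```

With a reward of this shape, heating a little earlier at moderate power costs less than a single large spike when the cold arrives, which is the preheating behaviour the forecast is meant to make learnable.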
I have read some articles and can take another look, thanks. There was too much information and too many possibilities, so it felt better to talk to a human.