The electrification of heating and cooling systems alters energy consumption patterns, presenting both challenges and opportunities for power system operation. In this context, resistance-capacitance (RC) grey-box models have emerged as promising structures for forecasting the thermal needs of buildings. This paper presents a tuning algorithm for grey-box models and evaluates their performance using realistic measurement data. It provides insights into the sensitivity of the learning process to sub-optimal solutions and into its computational burden. Four grey-box structures of varying complexity are evaluated. The proposed methodology is critical for integrating heating and cooling systems into future power systems. The results reveal that, while the 4R3C model is the most detailed of the evaluated structures, the 3R2C model proves to be the most stable and offers the best trade-off between computational burden and model complexity. Moreover, obtaining a physical representation of the building through this training structure can be challenging, because the optimization process does not consistently converge to a unique set of parameter values, which indicates the presence of multiple local optima. The tuning framework is provided as an open-source modeling tool to support further research on grey-box models.
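As a rough illustration of what tuning an RC grey-box model involves, the sketch below simulates the simplest such structure, a single-resistance, single-capacitance (1R1C) model, and recovers its two parameters by a coarse grid search. Everything in it is an assumption made for illustration: the 1R1C structure is not necessarily one of the four structures evaluated here, the data are synthetic and noise-free, the parameter values and function names are hypothetical, and the grid search merely stands in for the paper's tuning algorithm, which is not reproduced.

```python
import math

def simulate_1r1c(R, C, T_out, Q_heat, T0, dt=3600.0):
    """Forward-Euler simulation of C * dT/dt = (T_out - T) / R + Q_heat.

    R: thermal resistance [K/W], C: thermal capacitance [J/K],
    T_out: outdoor temperatures [degC], Q_heat: heating power [W],
    T0: initial indoor temperature [degC], dt: time step [s].
    """
    T = [T0]
    for k in range(len(T_out) - 1):
        dT_dt = ((T_out[k] - T[-1]) / R + Q_heat[k]) / C
        T.append(T[-1] + dt * dT_dt)
    return T

def rmse(a, b):
    """Root-mean-square error between two equally long sequences."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)) / len(a))

# Synthetic inputs (hypothetical): two days at hourly resolution.
n = 48
T_out = [5.0 + 5.0 * math.sin(2 * math.pi * k / 24) for k in range(n)]
Q_heat = [2000.0 if (k % 24) < 8 else 0.0 for k in range(n)]

# "Measurements" generated from assumed true parameters, without noise.
R_true, C_true = 0.005, 1.0e7
T_meas = simulate_1r1c(R_true, C_true, T_out, Q_heat, T0=20.0)

# Coarse grid search as a stand-in for a real tuning algorithm.
R_grid = [0.002, 0.005, 0.01]
C_grid = [5.0e6, 1.0e7, 2.0e7]
best_rmse, R_fit, C_fit = min(
    (rmse(simulate_1r1c(R, C, T_out, Q_heat, 20.0), T_meas), R, C)
    for R in R_grid
    for C in C_grid
)
print(f"best RMSE = {best_rmse:.3f} K at R = {R_fit} K/W, C = {C_fit:.1e} J/K")
```

With noisy measurements and higher-order structures such as 3R2C or 4R3C, several distinct parameter sets can yield nearly the same error, which is the kind of non-uniqueness reported above; a gradient-based optimizer started from different initial guesses may then settle in different local optima.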