Much effort has gone into increasing the integration of solar photovoltaic (PV) systems to reduce the environmental impact of fossil fuels. Forecasting solar irradiance is an essential process in PV systems, because the intermittent nature of irradiance can otherwise cause safety and stability problems. Most research has focused on improving prediction accuracy under the assumption that enough on-site training data are available. In many situations, however, PV systems must be deployed at locations where too few solar irradiance measurements have been collected. Our hypothesis is that measurements from other sites can be used to train accurate forecasting models, given an appropriate definition of site similarity. We propose a methodology that takes exogenous variables correlated with on-site solar irradiance and constructs a multidimensional space equipped with a learned metric. Each site is a point in this space, and the learned metric is used to select the sites whose measurements are used to train a forecasting model for an unobserved site. We show through experiments with real data that selecting sites with the learned metric yields better predictions than training on the measurements collected from the whole set of available sites.
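
The site-selection idea can be illustrated with a minimal sketch. It is not the method proposed here: the learned metric is replaced by a weighted Euclidean distance with fixed weights, and the site names, feature values, weights, and the helper functions `weighted_distance` and `select_similar_sites` are all hypothetical, chosen only for illustration.

```python
import numpy as np


def weighted_distance(a, b, weights):
    """Weighted Euclidean distance between two site feature vectors.

    Assumption: the fixed weights stand in for a learned metric.
    """
    diff = np.asarray(a, dtype=float) - np.asarray(b, dtype=float)
    return float(np.sqrt(np.sum(weights * diff ** 2)))


def select_similar_sites(target, candidates, weights, k):
    """Return the names of the k candidate sites closest to the target site."""
    distances = {
        name: weighted_distance(target, features, weights)
        for name, features in candidates.items()
    }
    return sorted(distances, key=distances.get)[:k]


# Made-up exogenous features per site (illustration only, not real data).
candidates = {
    "site_a": [0.10, 0.20, 0.30],
    "site_b": [0.90, 0.80, 0.70],
    "site_c": [0.15, 0.25, 0.35],
    "site_d": [0.50, 0.50, 0.50],
}
target = [0.12, 0.22, 0.32]          # unobserved site, exogenous features only
weights = np.array([1.0, 2.0, 0.5])  # hypothetical stand-in for the learned metric

selected = select_similar_sites(target, candidates, weights, k=2)
print(selected)
```

In this sketch, only the measurements from the selected sites would be pooled to train the forecasting model for the target site, instead of the measurements from every available site.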