It looks to me as though your computation of log scores in the Google Sheet is wrong, and it’s not just a sign error:
The correct log score, Y log p + (1-Y) log (1-p), where Y is the outcome (0 or 1) and p is the predicted probability, should be 0 for a perfect prediction (e.g. p=0 and the event didn’t happen) and should approach -infinity as the prediction becomes more and more confidently wrong. However, with the formula you used, -Y log (1-p) - (1-Y) log p, the score grows without bound as we approach a perfect prediction, whereas at the other extreme (a maximally confident wrong prediction) the score is just 0. This can’t be a proper scoring rule, because the guesser would be incentivized to always predict p=0 or p=1, regardless of what they actually believe.
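To make the incentive problem concrete, here is a rough Python sketch comparing the two formulas. The function names, the true probability q = 0.7, and the grid of candidate predictions are my own assumptions for illustration, not anything taken from the sheet. It assumes a higher score is better and avoids p = 0 and p = 1 exactly, where the logarithm is undefined.

```python
import math

def log_score(y, p):
    """Standard log score: Y log p + (1 - Y) log(1 - p). Never positive."""
    return math.log(p) if y == 1 else math.log(1 - p)

def sheet_score(y, p):
    """Formula as written in the sheet: -Y log(1 - p) - (1 - Y) log p."""
    return -math.log(1 - p) if y == 1 else -math.log(p)

def expected(score, q, p):
    """Expected score of predicting p when the event occurs with probability q."""
    return q * score(1, p) + (1 - q) * score(0, p)

q = 0.7                                  # assumed true probability (illustrative)
ps = [i / 100 for i in range(1, 100)]    # candidate predictions 0.01 .. 0.99

# Prediction that maximizes the expected score under each formula.
best_log = max(ps, key=lambda p: expected(log_score, q, p))
best_sheet = max(ps, key=lambda p: expected(sheet_score, q, p))

print("standard log score is maximized at p =", best_log)
print("sheet formula is maximized at p =", best_sheet)
```

Under the standard log score the best prediction on this grid is the true probability q, which is what a proper scoring rule should give. Under the sheet's formula the maximum sits at the edge of the grid and keeps moving outward as the grid is extended toward 0 or 1.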