Colorado Baseball Stats



mathare
13th April 2005, 22:00
In the bath/shower and on the train - two of my favourite thinking places. I have most of my best analysis ideas there and both contributed to the analysis I have done here. :)

Larsson7 sent me some stats from baseball matches played in Colorado over the past 12 years. He sent me the frequency of each run total from 0 to 36, 937 games' worth of data in all. More than enough for me to get my teeth into. The original angle was to be on spread betting.

At first I was struggling to get to grips with the data. I wasn't sure where to start but it soon fell into place. I worked out the probability of each score occurring by dividing its frequency by the total number of games. This data would form a large part of later analysis.

Next I worked out the average score. Now, I know there are several different definitions of average so I'd better clarify that statement. The mode/modal (most frequent) score was 13 and the mean was 13.255. I guess that means if the spread firms are setting the spread at 13-13.5 they are doing their job about right. That assumes of course that the opposition is an average team. The line may increase/decrease if the opposition are known to score or concede many/few runs.
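
A minimal sketch of the probability, mode and mean steps, assuming a small made-up frequency table (the `freq` values below are hypothetical and are not the Colorado data):

```python
# Hypothetical frequency table: total runs in a game -> number of games.
# These figures are invented for illustration only.
freq = {9: 2, 11: 3, 13: 4, 16: 1}

total_games = sum(freq.values())

# Probability of each total = its frequency / total number of games.
prob = {score: n / total_games for score, n in freq.items()}

# Mode: the most frequent total.
mode = max(freq, key=freq.get)

# Mean: each total weighted by its probability.
mean = sum(score * p for score, p in prob.items())

print(mode, round(mean, 3))
```

The same `prob` table is what the later over/under and spread calculations would be built on.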

I then realised that the analysis I was planning actually brought out fixed odds analysis as a natural by-product, which was an unexpected bonus. By summing the probabilities of scores above lines from 0-36 in 0.5 increments I got the probability of any line in that range being exceeded. And once you have a probability you can work out value odds. I did this for over and under each line in the aforementioned range. I figured this would be useful somewhere along the line, especially as Bet365 offer several lines at varying odds. The advertised main line may not be of value but there is a chance one of the other lines is I guess.
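
A rough sketch of the over/under step, assuming decimal odds and a made-up probability table (the `prob` values and the function names are hypothetical, not taken from the spreadsheet):

```python
# Hypothetical probabilities for total runs (not the Colorado figures).
prob = {9: 0.2, 11: 0.3, 13: 0.4, 16: 0.1}

def over_probability(prob, line):
    """Chance that total runs finish above the line."""
    return sum(p for score, p in prob.items() if score > line)

def under_probability(prob, line):
    """Chance that total runs finish below the line."""
    return sum(p for score, p in prob.items() if score < line)

def fair_decimal_odds(p):
    """Break-even decimal odds for an event with probability p (p > 0)."""
    return 1.0 / p

line = 12.5
p_over = over_probability(prob, line)
p_under = under_probability(prob, line)
print(line, round(p_over, 3), round(fair_decimal_odds(p_over), 2))
```

Any price bigger than the break-even figure would be value on these numbers. On a whole-number line a total exactly on the line counts as neither over nor under, so the two probabilities will not add up to 1 there.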

The spread betting angle was next. What I did was first work out the product of the probability of occurring and the number of runs for each score. Then in a similar way to the fixed odds probability calculations I summed these values for above and below lines from 0 to 36 in 0.5 increments. Subtract the belows from the aboves and for each line you have an idea of how to bet whatever the line is.
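
The arithmetic described above could be sketched like this. It is one reading of the description, using a made-up probability table; `spread_figure` is a hypothetical name and the numbers are not the Colorado ones:

```python
# Hypothetical probabilities for total runs (not the Colorado figures).
prob = {9: 0.2, 11: 0.3, 13: 0.4, 16: 0.1}

def spread_figure(prob, line):
    """Sum of (runs x probability) above the line minus the same sum below it."""
    above = sum(score * p for score, p in prob.items() if score > line)
    below = sum(score * p for score, p in prob.items() if score < line)
    return above - below

# Print the figure for a few lines in 0.5 increments.
for line in (12.0, 12.5, 13.0, 13.5):
    print(line, round(spread_figure(prob, line), 2))
```

A positive figure points towards buying at that line and a negative one towards selling, in the sense used in this post.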

With the spread bet data there is a transition from positive to negative between 13 and 13.5. This is in agreement with a mean score of 13.255. The bigger the magnitude of this value the better the bet, and the magnitude grows as the line gets further away from the mean in either direction. If the line is much above the mean you should (in my opinion) sell, i.e. bet low, whereas if the line is set significantly below the mean I reckon you should buy, i.e. bet high. Significantly here may only be 1 or so away from the mean. It's up to you to set your own comfort limits on these values.

For example, if the line is set at 13 the spread "expectation" is 0.48 whereas at 12 it is 2.79. If the line is set at 13 you may wish to avoid this game but if the line comes down to 12 you may decide that this becomes a good buy. It's all up to you.

I'm not making any hard and fast suggestions about how to bet the lines on Colorado games here, just manipulating data. I will attach the spreadsheet I produced from Larsson7's results and you can have a look yourself and draw your own conclusions about what to do.

Cheers Al for giving me something to exercise the brain :) It's not quite what I told you via PM I would do but it should be of some use at least; if nothing else it should get a few of us thinking/talking.

MattR
13th April 2005, 22:43
Great stuff Mathare. 13.255 average, don't you just love high altitude! Unless you're a pitcher of course. Certainly gives food for thought.

Can I throw something out here for you to see if perhaps you can 'work it in' to the stats somehow? Something I used last year to reasonable effect was the team's and the starting pitcher's ERA. I was using this in a calculation to work out a possible run line for a game and it was showing pretty good results. Unfortunately it was a lot of work to set up at first and I just didn't have the time to continue it.

Anyway it seems fair to say that the pitcher's ability is to some degree or other going to affect how 'high' the Colorado run line is going to go, which may give a further guide to the value of the set line. If we can work out how far Colorado's home ERA is above the league average then perhaps we could increase the opposing starting pitcher's ERA by that amount to factor in his 'potential' ERA at Colorado. There are lots of ifs and buts of course, so many factors such as who a pitcher has played against, where he has played etc, but you can only work with what you have.

In my calculations last season I was using 6 * the ERA of the starting pitcher and 3 * the team ERA to try and factor in the relief pitchers, i.e. 6 innings from the starter and 3 from the bullpen. This was, as I say, working fairly well in judging potential runs in a game. In games where my figure was 2 or more out from the bookmaker's line it had a good strike rate. I was also using the teams' average runs scored; for Colorado, however, perhaps you'd again have to factor in the percentage the Colorado average is above the league average.
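
A small sketch of that 6/3 weighting. It assumes the weighted sum is divided by 9 to bring it back to a per-game figure, which is an assumption and not something stated above. The ERA values and the function name are hypothetical:

```python
def expected_runs_allowed(starter_era, team_era, starter_innings=6, game_innings=9):
    """Blend starter and team ERA by innings pitched.

    Assumption: the weighted sum is divided by the innings in a game
    so the result is on the same runs-per-game scale as an ERA.
    """
    bullpen_innings = game_innings - starter_innings
    weighted = starter_innings * starter_era + bullpen_innings * team_era
    return weighted / game_innings

# Hypothetical figures: starter ERA 4.50, team ERA 3.60.
estimate = expected_runs_allowed(4.50, 3.60)
print(round(estimate, 2))
```

Doing this for both sides and adding the two results would give a rough total to compare with the bookmaker's line.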

What do you think - if that made any sense :D

mathare
13th April 2005, 22:49
Not really, I'm afraid. Sorry. I don't really understand baseball that well. I know there are loads of stats so I should love the game but I don't really get the chance to watch it so I don't get into it at all.

That said if you have more ideas for stats to use in analysis and how they could be used I am happy to try and work through the numbers. :)

Larsson7
13th April 2005, 22:49
Matt... good work mate... and again cheers for taking the time to go over the results I sent you.

I see the angle you have re fixed odds betting, especially as Bet365 do offer different lines to the standard one set by e.g. Wm Hills, and how that could work in our favour, albeit possibly at reduced odds, which calls for a higher strike rate of course.

Before I collated the past 12 years' results, I knew runs were more plentiful in Denver due to the altitude, hence the line is higher than average. However, I was always wary of betting on games with double-digit lines. I now see how I can use the sliding scale of lines offered by Bet365, as you pointed out.

I still like the idea of using Colorado home games for spread betting, as I see the downside being really minimal here, and the potential upside can be quite exciting.

Anyway, I will have a look at your spreadsheet and, if it's OK, get back to you with any more questions/suggestions that come to mind.

Thanks again.

Al.

mathare
13th April 2005, 22:52
I never claim to come up with all the answers first time so I am happy for people to come back with questions. After all you are the expert on the game here, I just manipulate numbers often without context :)

MattR
13th April 2005, 23:00
ERA stands for Earned Run Average. To keep it simple, it is the number of runs that a pitcher would give up if he pitched an entire 9 innings. So the number of runs he gives up is divided by the number of innings he has pitched and then scaled up to an average for an entire game. So, if say a pitcher pitched 5 innings and gave up 5 runs then his ERA for the game would be 9.00 (5 runs / 5 innings * 9). There's more to it but that's just to let you know what it is.

I was using the ERA of both teams and both starting pitchers and also the average runs scored per game by the teams concerned. What I'm wondering is whether this could be used and increased by the percentage that Colorado's mean average of 13-odd runs per game is above the league average. This would give an estimate of the runs expected in the game and could maybe then be factored into the 12 year stats?
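
The ERA arithmetic from that example, as a tiny sketch (the function name is just for illustration):

```python
def era(earned_runs, innings_pitched):
    """Earned runs per nine innings pitched."""
    return 9 * earned_runs / innings_pitched

# The example above: 5 runs given up in 5 innings.
print(f"{era(5, 5):.2f}")
```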

mathare
13th April 2005, 23:06
I get it now Matt.

Definitely possible to use all that data to some effect but my major concern surrounds maintaining the ERA data. The data I have used here is a one-shot thing and there is no real need to maintain it but I imagine ERA data will need to be up-to-date and accurate to be of any use, no?

MattR
13th April 2005, 23:11
Yes, you are right. Fortunately the baseball sites all have it: each pitcher's ERA and each team's ERA is readily available and updated each day.

Just checking the schedule, Colorado are at home on Friday. How about, just to give you something to play around with, I post here in a minute Colorado's opponents' data for that game and also their own pitcher's ERA etc? Of course the main problem this early in the season is that the lack of games played can skew averages. I could also include the pitcher's career ERA to perhaps try and even out the season's average?

mathare
13th April 2005, 23:15
Post anything you think may be useful mate along with some idea of what sort of effect the data could have on a game (high values likely to increase the run total etc) and I will put my thinking cap on.

Can't promise anything for Friday as I have other commitments tomorrow night but it may be an angle worth looking into for the rest of the season.

Is Colorado the only team worth looking at these stats for? The spreadsheet for the run totals data is set-up now and it's not that much effort to do something similar for other teams. Or is it just not worth it for other teams?

MattR
13th April 2005, 23:28
I'll put the stuff up for you to work on then. Don't worry if it's not until after the game is played, whenever you have time mate. The data you have will still be relevant even if it's after the fact.

All parks have their nuances (they actually all have ratings for home runs/triples/doubles etc) but Colorado is the standout one. There are parks that are considered pitchers' parks though, as home runs are more difficult to come by etc. Detroit for instance has fewer home runs than most if not all, I believe, although that increases the number of doubles and triples (bases made in an at bat) as the outfield is large.

It might be possible though to factor these stats in with any team's park average runs; after all it's two teams making those runs each day.


ESPN.com - MLB - Park Factor (http://sports.espn.go.com/mlb/stats/parkfactor?season=2004)


There's five years of park factor data there, don't know if you can work with that at all?

EDIT: In case you're not familiar: HR (home run), 2B (double, meaning a time a batter got to second base with a hit), 3B (triple, as a double but to third base) and BB (base on balls, the abbreviation used for a walk; not sure why that should be used in park factor stats!)

mathare
14th April 2005, 09:30
I'll have a look later Matt and see what I can do

mathare
14th April 2005, 09:42
Just had a quick look and I have a few questions.

1) There are 29 teams/grounds listed for the 2004 stats. Is this all the teams? Do all these teams still exist at these grounds this season?
2) Could the data for the past 3 or 4 years be combined or have new teams appeared/teams moved ground etc?
3) Any idea how those numbers are derived? I understand the abbreviations now (thank you!) but what do the numbers actually mean? I read the note about a Park Factor > 1 favouring the hitters and < 1 favouring pitchers but don't quite get how to use this data. Is the PF a potential modifier on the league average total runs/per game?

For example if a ground had a PF of 1.500 can we expect 50% more runs per game on average? And if one had a PF of 0.800 we'd get totals averaging only 80% of the league total? Does that sound right?

If it does, where can we get data such as the average total runs per game? Preferably on a per season basis...
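
That reading of the Park Factor as a multiplier could be sketched as follows. The league average of 10.0 runs per game is a made-up figure and `expected_total_runs` is a hypothetical name:

```python
def expected_total_runs(park_factor, league_avg_total):
    """Scale the league average total runs per game by a park factor."""
    return park_factor * league_avg_total

# Hypothetical league average of 10.0 total runs per game.
league_avg_total = 10.0
hitters_park = expected_total_runs(1.500, league_avg_total)
pitchers_park = expected_total_runs(0.800, league_avg_total)
print(hitters_park, pitchers_park)
```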

MattR
14th April 2005, 12:58
Matt,


Ok......

1. Good spot, didn't notice there were only 29 there. There are 30 teams. Could be they've wiped out Montreal's as there is no longer a team there; they have moved to Washington. I'll check that in a minute, see if it is them that are missing.

2. Some teams have moved to new parks in the last couple of years. Most teams are at the same places (I can probably find something on that for you later on). As a rough guess I would say maybe 5-7 have moved in the last 5 years. The others will all be pretty constant. So I would say yes, those stats could be used.

3. PF = ((homeRS + homeRA)/(homeG)) / ((roadRS + roadRA)/(roadG))
As I read it, that is (the home team's runs + the visitors' runs in home games) / number of home games, divided by (the team's runs scored away + their opponents' runs in those games) / number of away games.
So in effect it's measuring the difference between the runs in their home games and the runs in their games on the road (away). So using your example I would say it is more or less that. A park with a run factor of 1.5 should see about 50% more runs on average than that team's road games, which is a rough stand-in for the league average.
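
The formula in point 3 could be written out like this. The season totals below are invented purely to show the arithmetic:

```python
def park_factor(home_rs, home_ra, home_g, road_rs, road_ra, road_g):
    """PF = ((homeRS + homeRA) / homeG) / ((roadRS + roadRA) / roadG)."""
    home_runs_per_game = (home_rs + home_ra) / home_g
    road_runs_per_game = (road_rs + road_ra) / road_g
    return home_runs_per_game / road_runs_per_game

# Hypothetical season: 81 home games and 81 road games.
pf = park_factor(home_rs=500, home_ra=550, home_g=81,
                 road_rs=350, road_ra=400, road_g=81)
print(round(pf, 3))
```

Because both halves involve the same team, a PF above 1 says the team's games were higher scoring at home than on the road, which is why using it as a multiplier on the league average is only an approximation.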

I'll have a search around for season run averages etc. I'll put some figures on a spreadsheet if you like so you can play around with them whenever.



What's interesting here is that, looking at 2004, Coors Field (Colorado's home stadium) is not at the top. It is on home runs but not on runs scored above average. Hmmm.

Larsson7
14th April 2005, 13:56
Matt/Mathare... note LA Angels are the "old" California/Anaheim Angels...

MattR
14th April 2005, 15:51
and what a name eh? Los Angeles Angels of Anaheim :rolleyes:

MattR
14th April 2005, 16:46
Matt,


Got some stats together for you, attached here.

mathare
14th April 2005, 16:54
Sweet, thanks for that.

I'll look into it in more detail soon, probably this weekend

MattR
14th April 2005, 17:06
OK, I didn't mention above that the run averages are all for 2004.

MattR
18th April 2005, 12:56
Mathare,

OK, here's some stuff for today's Colorado game at home if you want to see if it can be tied into the park data etc. Don't worry about if/when you get to it; it doesn't matter if the game is played before you play around with the stats, as they'll still be valid for that game.


2005 Park Factor Colorado: 1.867 to date.
Average runs/game to date (whole league): 9.34

Colorado:
Starting pitcher ERA (= runs against him if he were to pitch a whole game, so an ERA of 9.00 means he concedes 1 run per inning on average):
Season ERA 1.50 (1 game)
Career home ERA 5.82 (23 games). I think career stats are more valid at this stage as the pitcher has not played at home this season.
Team ERA 7.84
Runs average per game 7.00 (at home), 3.00 (away, quite a difference)

Opponents (Arizona):
Starting pitcher ERA:
Season ERA 2.70 (1 game)
Career ERA at Coors Field (Colorado's home) 4.08 (4 games)
Team ERA 5.38 (average of all pitchers on team)
Runs average per game 5.08 (not played at Coors yet)



As a rough guide starting pitchers normally last on average around 5-6 innings. Very rough guide as there are variations dependent on situation etc. If a pitcher is getting hammered he could even last just one inning etc etc.


Not sure how much you'll be able to use all that to come to any conclusions but I'm sure you'll have fun with it!