This manual is for the CLOP Pools Package (version 1.4, 29 September 2016), which calculates optimal picks for sports betting pools.
Copyright © 2006 Bryan Clair
Copying and distribution of this file, with or without modification, are permitted in any medium without royalty provided the copyright notice and this notice are preserved.
• Introduction
• Football pools
• Tournament pools
The CLOP Pools Package is a suite of command line utilities for working with sports betting pools. The package generally implements algorithms from the article Optimal Strategies for Sports Betting Pools (Clair, Letscher 2005), but also includes the main algorithm from March Madness and the Office Pool (Kaplan, Garstka 2001). The package and supporting information are maintained at http://math.slu.edu/~clair/pools.
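The model used for a betting pool requires three inputs: the size of the pool, the actual probability of each outcome, and the perceived probability of each outcome (how likely an opponent is to pick it).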
The size of the pool is given (when needed) via the command line argument -n. The actual and perceived probabilities are stored in ASCII text files and are passed as command line arguments.
Pick sets are coded as single lines of ASCII text. The pools utilities read them on standard input and write them on standard output.
Generally, the package is intended to play well with Unix utilities such as cut, paste, and sort, and the programs are designed to fit nicely into pipelines. The programs fall, fseek, fsmooth, fcanon, tgreedy, and tcanon all generate pick sets on stdout. The programs fstats and tstats calculate interesting statistics for pick sets provided on stdin. For example, the command:
fcanon -qd nfl_data | fstats -n50 nfl_data
calculates the expected return for a bet on all the underdogs in a 50 player football pool described in file nfl_data.
This package uses the GNU Scientific Library (http://www.gnu.org/software/gsl/) for numeric computations.
The programs fall, fseek, and fsmooth are all threaded for speed on multiprocessor machines.
Football pools consist of g games which are assumed to be independent. The number of games is limited by the number of bits in an unsigned int. Running a 16 game pool on a machine with 16 bit ints has not been tested, and could potentially cause problems.
• Football pool data file format
• Football picks format
• Football programs
A football pool data file contains all actual and perceived probabilities needed to model the pool. A line beginning with ’#’ is a comment and is ignored, as are blank lines. The file begins with a header record on a single line, which is followed by one game record line for each game in the pool.
The header record gives text names for each column of fields.
Each game record has from 3 to 16 whitespace separated fields:
Here is a sample data file:
# Data from week 3 of the 2005 NFL season
HOME AWAY SAGARIN ESPN YAHOO
BUF  ATL  0.60    .459 .448
CHI  CIN  0.48    .214 .228
DEN  KC.  0.42    .278 .274
GB.  TB.  0.41    .341 .334
IND  CLE  0.87    .975 .970
MIA  CAR  0.63    .206 .152
MIN  NO.  0.51    .509 .533
NYJ  JAX  0.64    .527 .480
PHI  OAK  0.82    .940 .935
PIT  NE.  0.61    .709 .619
SD.  NYG  0.47    .635 .720
SEA  ARZ  0.80    .877 .860
SF.  DAL  0.67    .140 .125
STL  TEN  0.54    .756 .763
In this file, the columns are the probability of the Home team winning as predicted by Sagarin ratings, the probability that a given participant in ESPN’s football pool chose the Home team, and the probability that a given participant in Yahoo’s football pool chose the Home team.
By default, the 3rd and 4th columns are used as the actual and perceived probabilities of Team 1 (the first team listed in each game record) winning. To choose different columns, use the -A and -P options to the football programs. For example,
fseek -n150 nfl05_week3_data -PYAHOO
would search for the best picks in a 150 participant pool, using SAGARIN data as actual probabilities and YAHOO data as perceived probabilities.
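The column selection described above can be sketched in Python. This is only an illustration of the file format, not the parser CLOP uses; the function name parse_pool and its keyword arguments are hypothetical stand-ins for the -A and -P options.

```python
def parse_pool(text, actual=None, perceived=None):
    """Parse football pool data; return (teams, actuals, perceiveds).

    Hypothetical helper: 'actual' and 'perceived' name header columns,
    mimicking the -A and -P options.
    """
    # Skip blank lines and comment lines beginning with '#'.
    rows = [line.split() for line in text.splitlines()
            if line.strip() and not line.lstrip().startswith("#")]
    header, games = rows[0], rows[1:]
    # Defaults follow the manual: 3rd column actual, 4th column perceived.
    a_col = header.index(actual) if actual else 2
    p_col = header.index(perceived) if perceived else 3
    teams = [(g[0], g[1]) for g in games]
    actuals = [float(g[a_col]) for g in games]
    perceiveds = [float(g[p_col]) for g in games]
    return teams, actuals, perceiveds


SAMPLE = """\
# Data from week 3 of the 2005 NFL season
HOME AWAY SAGARIN ESPN YAHOO
BUF ATL 0.60 .459 .448
CHI CIN 0.48 .214 .228
"""

# SAGARIN (default 3rd column) as actuals, YAHOO as perceiveds.
teams, a, p = parse_pool(SAMPLE, perceived="YAHOO")
print(teams, a, p)
```

Selecting columns by header name rather than by position keeps the call readable when a data file carries several perceived columns.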
Picks are read and written as a whitespace separated list of winners. Names must match exactly and be in the same order as the team names in the associated pool data file. For example:
BUF CHI DEN TB. IND MIA MIN NYJ PHI NE. NYG SEA SF. TEN
Programs to generate picks:
• fall: Calculate expected return for all possible picks.
• fsmooth: Calculate an approximation to expected return for all possible picks.
• fseek: Hill-climbing search for optimal picks.
• fcanon: Calculate canonical picks.
Programs for calculating statistics:
• fstats: Compute statistics for picks.
Calculate expected return for all possible picks. Writes 2^g lines of output. Each line shows the expected return for a set of picks, a tab character, and then the picks in the above format.
It is useful to pipe the output of fall to sort -nr to get a list sorted in descending order of quality.
Usage:
fall [-tqnAP] datafile
-q
Quiet. Suppress display of header.
-t<threads>
Threads. Specify number of computation threads (default 2).
-n<competitors>
Number of competitors.
-A<actuals>
-P<perceiveds>
Actual or perceived probabilities. Specify column header from datafile.
Calculate an approximation to expected return for all possible picks. Writes 2^g lines of output. Each line shows the approximate expected return for a set of picks, a tab character, and then the picks in the above format.
fsmooth operates exactly the same as fall, except that the normal approximation is used to calculate the expected return for each set of picks. fsmooth is considerably faster than fall.
It is useful to pipe the output of fsmooth to sort -nr to get a list sorted in descending order of quality.
Usage:
fsmooth [-tqnAP] datafile
-q
Quiet. Suppress display of header.
-t<threads>
Threads. Specify number of computation threads (default 2).
-n<competitors>
Number of competitors.
-A<actuals>
-P<perceiveds>
Actual or perceived probabilities. Specify column header from datafile.
Hill climbing search for a pickset which is a local maximum for expected return. The search begins with a good guess based on typical results. The search ends at a pick set which has larger expected return than any other set which differs by at most two games.
fseek is very effective at finding the best pickset quickly. However, it might in theory become stuck at a local maximum which is not the global maximum.
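The stopping rule above (no better pick set within two game changes) can be sketched as follows. This is a generic hill climb written for illustration, not the fseek source: the function hill_climb, the 0/1 encoding of picks, and the toy objective are all assumptions, and the real objective would be the expected return.

```python
from itertools import combinations


def hill_climb(start, score):
    """Climb until no pick set differing in one or two games scores higher.

    start: tuple of 0/1 picks, one per game (encoding assumed for this sketch).
    score: caller-supplied objective, a stand-in for expected return.
    """
    current, best = tuple(start), score(tuple(start))
    games = range(len(current))
    # Neighbors flip any single game or any pair of games.
    moves = list(combinations(games, 1)) + list(combinations(games, 2))
    while True:
        improved = False
        for flips in moves:
            cand = list(current)
            for i in flips:
                cand[i] ^= 1
            cand = tuple(cand)
            s = score(cand)
            if s > best:
                # Move immediately; later candidates start from the new point.
                current, best, improved = cand, s, True
        if not improved:
            return current, best


# Toy objective: negative distance to a hidden target pick set.
target = (1, 0, 1, 1)
result = hill_climb((0, 0, 0, 0),
                    lambda picks: -sum(x != y for x, y in zip(picks, target)))
print(result)
```

Because every accepted move strictly increases the score and the number of pick sets is finite, the loop always terminates, but only at a local maximum of the supplied objective.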
Usage:
fseek [-tqnAP] datafile
-q
Quiet. Suppress display of header.
-t<threads>
Threads. Specify number of computation threads (default 2).
-n<competitors>
Number of competitors.
-A<actuals>
-P<perceiveds>
Actual or perceived probabilities. Specify column header from datafile.
Calculate canonical picks for a football pool. Canonical picks include the favorites, the underdogs, and the edge picks (displayed in that order if more than one is requested). Favorites and underdogs use the actual values, so to see perceived favorites or underdogs, use the -A option to select the perceived column. The edge picks are optimal for a sufficiently large number of competitors, and maximize the ratio A/P.
Usage:
fcanon [-qfdeAP] datafile
-q
Quiet. Display only the picks.
-A<actuals>
-P<perceiveds>
Actual or perceived probabilities. Specify column header from datafile. Note that -P is only useful in conjunction with -e.
-f
Calculate actual favorites.
-d
Calculate actual underdogs.
-e
Calculate edge picks.
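As a rough illustration of the three canonical pick sets, the sketch below works game by game. It assumes that maximizing the ratio A/P over a whole pick set reduces to choosing, in each game, the side with the larger actual-to-perceived ratio; comparing a/p with (1-a)/(1-p) is the same as comparing a with p. The function name and the tie-breaking toward the first team are hypothetical.

```python
def canonical_picks(teams, actuals, perceiveds):
    """Return (favorites, underdogs, edge) for independent games.

    teams: list of (team1, team2); actuals/perceiveds: P(team1 wins) and
    P(an opponent picks team1). Ties go to team1 (an assumption).
    """
    favorites, underdogs, edge = [], [], []
    for (first, second), a, p in zip(teams, actuals, perceiveds):
        fav, dog = (first, second) if a >= 0.5 else (second, first)
        favorites.append(fav)
        underdogs.append(dog)
        # a/p >= (1-a)/(1-p) is equivalent to a >= p for 0 < p < 1.
        edge.append(first if a >= p else second)
    return favorites, underdogs, edge


teams = [("BUF", "ATL"), ("CHI", "CIN"), ("MIN", "NO.")]
fav, dog, edge = canonical_picks(teams, [0.60, 0.48, 0.51], [0.459, 0.214, 0.533])
print(" ".join(fav))
print(" ".join(dog))
print(" ".join(edge))
```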
Calculate statistics for picksets read on standard input. The default behavior is to print the expected return followed by the picks.
Usage:
fstats [-qnAPsdgvw] datafile
-q
Quiet. Suppress display of header.
-n<competitors>
Number of competitors.
-A<actuals>
-P<perceiveds>
Actual or perceived probabilities. Specify column header from datafile.
-s
Smooth. Use the normal approximation to calculate expected return.
-d
Detailed. Show detailed statistics. Shows the expected return (exp), and expected return with smooth approximation (sexp). For both the actual and perceived data, it shows the probability that these picks occur exactly (prob), the mean and variance of the number of games these picks will agree with (mean, var), and the number of underdogs picked (upsets).
-g
Game-by-game. Displays five columns of data for each game. The first (Pick) is 1 or 0 depending on whether the actual favorite or actual underdog was chosen by the pickset. The next columns give the actual and perceived probabilities for the chosen team to win. The final two (which take some time to compute) give numeric calculations of the partial derivative of expected return with respect to a change in the input variables a_i or p_i for that game. These can be thought of as a measure of the sensitivity of expected return to the data for that particular game. Keep in mind that probabilities range from 0 to 1, and that a change of .01 makes a much bigger difference to a probability of .98 than it does to a probability of .5.
-v<actual spread>
Vary actuals. This option is intended to test robustness of the expected return value. This option calculates the expected return for the given set of picks 200 times while varying the actual probabilities used to model the pool. For each calculation, each value a_i is chosen uniformly randomly from an interval centered at the original a_i with width 2*<actual spread>. Statistics are calculated for the 200 values of expected return, and displayed.
-w<perceived spread>
Vary perceiveds. Same as -v, but varies p_i. Using both -v and -w will vary both at the same time.
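Several of the detailed (-d) statistics follow directly from the assumption that games are independent. The sketch below is an illustration, not the fstats code: given the probability that each picked team wins (from either the actual or the perceived column), the number of agreeing games is a sum of independent Bernoulli variables. The function name pick_stats and its dictionary keys are chosen here to mirror the labels in the manual.

```python
import math


def pick_stats(q):
    """q[i] is the probability that the team picked in game i wins."""
    return {
        "prob": math.prod(q),                      # all picks occur exactly
        "mean": sum(q),                            # expected number of agreeing games
        "var": sum(x * (1 - x) for x in q),        # variance of that number
        "upsets": sum(1 for x in q if x < 0.5),    # underdogs picked
    }


# Picking BUF, CHI, DEN with SAGARIN actuals 0.60, 0.48, 0.42.
stats = pick_stats([0.60, 0.48, 0.42])
print(stats)
```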
A tournament pool involves picking all games of an R round single elimination tournament with 2^R teams. Currently, the maximum number of allowable rounds is 14 (which is ridiculously large).
With tournament pools, the scoring method is variable. In this version of CLOP, only two scoring methods are implemented: power-of-two scoring and ESPN scoring. In power-of-two scoring, correct picks are worth 1,2,4,8,… in increasing rounds. In ESPN scoring, correct picks are worth 10,20,40,80,120, and 160 points in increasing rounds. Any tournament program that uses scoring will use power-of-two scoring by default and accept the -E option to switch to ESPN scoring.
• Tournament pool data file formats
• Tournament picks format
• Tournament programs
A tournament pool is described by three collections of data: team names, actual probabilities, and perceived probabilities. A collection of probabilities can be given in one of two ways, as head-to-head data or as winround data.
Within data files, team order is important, because it determines which teams play in which round (using the usual single elimination bracket) and it must remain consistent for all files used in a given pool.
In the sections below, T is the number of teams in the tournament and R is the number of rounds.
• Names file format
• Head-to-head probability file format
• Winround probability file format
A team names file begins with a header line containing the keyword names followed by the number of teams (T) in the tournament, followed by an optional comment to the end of the line. Each subsequent line contains a team name, which may optionally be in double quotes. Quotes are useful to include whitespace in the team name, which makes ASCII picks output much nicer.
Here is an example tournament with four teams. The first round matchups are Aardvarks-Bison and Chihuahuas-Ducks.
names 4 Bryan's Imaginary Playoffs
"Aardvarks "
"Bison     "
"Chihuahuas"
"Ducks     "
A head-to-head data file begins with a header line containing the keyword h2h followed by the number of teams T in the tournament, followed by an optional comment to the end of the line.
Data follows as T*T floating point numbers, in order:
P(0 beats 0) P(0 beats 1) .. P(0 beats T-1)
...
P(T-1 beats 0) .. P(T-1 beats T-1)
The data is redundant since P(i beats j) = 1 - P(j beats i). Values for P(x beats x) are required but ignored.
Here is an example that goes with Bryan’s Imaginary Playoffs:
h2h 4 Close. Team 2 (Bison) have an edge.
.5 .4 .4 .7
.6 .5 .7 .6
.6 .3 .5 .6
.3 .4 .4 .5
A winround data file begins with a header line containing the keyword winround followed by the number of teams T in the tournament, followed by an optional comment to the end of the line.
Data in the file comes in two series, the solo and pair series. The solo series begins with the keyword solo followed by the probabilities of team i winning round r for all i,r. The pair series begins with the keyword pair followed by the probabilities of team i winning round r and team j winning round s for all i,j,r,s.
The pair series is optional. If it is omitted, the data no longer contains enough information for the theoretical model of the pool. In that case, CLOP will estimate the pair data and print a warning message to stderr. See the Optimal Strategies paper for details.
The solo series is size (T * (R+1)), in order:
P(0->0) P(0->1) ... P(0->R) P(1->0) ... P((T-1)->R)
The pair array is size (T * T * (R+1) * (R+1)), in order:
P(0->0 & 0->0) P(0->0 & 0->1) .. P(0->0 & 0->R)
P(0->1 & 0->0) .. P(0->1 & 0->R)
...
P(0->R & 0->0) .. P(0->R & 0->R)

P(0->0 & 1->0) P(0->0 & 1->1) .. P(0->0 & 1->R)
...
P(0->R & 1->0) .. P(0->R & 1->R)
...
P(0->R & (T-1)->0) .. P(0->R & (T-1)->R)

P(1->0 & 0->0) ...
P((T-1)->R & (T-1)->R)
Here is an example that goes with Bryan’s Imaginary Playoffs:
winround 4 Team 2 (Bison) very strong.
solo
1.000 0.300 0.180
1.000 0.700 0.525
1.000 0.500 0.125
1.000 0.500 0.170
pair
1.000 0.300 0.180
0.300 0.300 0.180
0.180 0.180 0.180

1.000 0.700 0.525
0.300 0.000 0.000
0.180 0.000 0.000
(14x9 more floats)...
A set of picks for a tournament is stored in “depth format” as a list of integers in the range [1…R+1], one for each team. The number for each team indicates which round that team will reach.
In Bryan’s Imaginary Playoffs, here is a bracket in which the Bison beat the Chihuahuas in the finals:
1 3 2 1
The tstats program can display brackets in a human readable ASCII format. The pix2tex utility can create a TeX file that displays the bracket graphically.
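To make the depth format concrete, here is a small sketch that scores a set of picks against an outcome under the two scoring methods described earlier. It reads a depth of d as "this team wins rounds 1 through d-1", which is an interpretation of the format rather than something taken from the CLOP source; the function name score_picks is hypothetical.

```python
# ESPN point values for rounds 1..6, as listed in this manual.
ESPN_POINTS = [10, 20, 40, 80, 120, 160]


def score_picks(picks, outcome, espn=False):
    """Score depth-format picks against a depth-format outcome."""
    total = 0
    for picked, actual in zip(picks, outcome):
        # A round-r pick is correct when the team is both picked to win
        # round r and actually wins it, i.e. r < min(picked, actual).
        for rnd in range(1, min(picked, actual)):
            total += ESPN_POINTS[rnd - 1] if espn else 2 ** (rnd - 1)
    return total


picks = [1, 3, 2, 1]      # Bison beat the Chihuahuas in the finals
outcome = [1, 3, 1, 2]    # Bison actually beat the Ducks in the finals
print(score_picks(picks, outcome), score_picks(picks, outcome, espn=True))
```

With more than six rounds the ESPN table above runs out of values, so this sketch only supports ESPN scoring for tournaments of up to six rounds.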
Programs to generate picks:
• tseek: Hill-climbing search for optimal picks.
• tcanon: Calculate canonical picks.
• trandom: Generate random picks.
Programs for calculating statistics:
• tstats: Compute statistics for picks.
• tscore: Calculate score of picks given an outcome.
Utility programs:
• tsim: Simulate tournaments.
• dumph2h: Dump probability data in h2h format.
• dumpwinround: Dump probability data in winround format.
• pcalc: Calculate probability data from a collection of opponent picks.
• pix2tex: Generate a LaTeX picture of a filled in bracket.
Performs a hill-climbing search for picks that maximize expected return. Each trial chooses a random starting pick (uniformly distributed over the set of all possible brackets) and hill climbs to a local maximum. The process is repeated for the specified number of trials. Picks that improve on previous results are displayed when found.
Usage:
tseek [-nEqvts] namesfile actualfile perceivedfile
-n<competitors>
Number of competitors.
-E
Use ESPN scoring.
-q
Quiet. Display only one set of picks (the best found) when all trials are finished.
-v
Verbose. Display all intermediate picks for each trial.
Using tseek -v -t1 … is a good way to get a feel for the hill climbing process.
-t<trials>
Trials. Specify number of trials (default is to run trials forever).
-s<seed>
Seed. Specify (long integer) seed for random number generator (default seeds with the current time).
Display canonical statistics and picks for a tournament pool. The statistics (shown unless -q is used) describe opponent scoring. The six sets of picks are:
Usage:
tcanon [-Eq] namesfile actualfile perceivedfile
-E
Use ESPN scoring.
-q
Quiet. Suppress headers.
Generate random picks. Each game is 50-50 unless the optional datafile is given to specify the probabilities.
Usage:
trandom [-nR] [datafile]
-n<count>
Generate <count> sets of picks. Default is 1.
-R<rounds>
Specify number of rounds in the tournament. If datafile is given, uses the rounds for that datafile. If unspecified, defaults to 6.
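The idea behind random picks can be sketched as a walk through the bracket, deciding each game by a coin flip. This is an illustration only: the function name random_bracket is hypothetical, adjacent teams are assumed to meet in the first round (as in the names file example), and the optional probabilities are assumed here to be a head-to-head matrix, which may not match how trandom reads its datafile.

```python
import random


def random_bracket(rounds, rng, h2h=None):
    """Return random picks in depth format for a 2**rounds team bracket.

    h2h[a][b], if given, is the assumed probability that team a beats team b;
    otherwise every game is 50-50.
    """
    alive = list(range(2 ** rounds))
    depth = [1] * len(alive)
    for _ in range(rounds):
        winners = []
        # Adjacent survivors meet, following the usual bracket layout.
        for i in range(0, len(alive), 2):
            a, b = alive[i], alive[i + 1]
            p = 0.5 if h2h is None else h2h[a][b]
            w = a if rng.random() < p else b
            depth[w] += 1
            winners.append(w)
        alive = winners
    return depth


rng = random.Random(2016)          # fixed seed for a repeatable sketch
picks = random_bracket(3, rng)
print(" ".join(map(str, picks)))
```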
Calculate statistics for picksets read on standard input. After reading input, tstats produces a header with the comments from the input files and statistics describing opponent scores. Then, for each set of picks on stdin, tstats displays the picks in a human readable ASCII form and displays statistics for the picks. The statistics are:
expected return
The expected return on a bet of 1 on these picks.
actual probability
The probability these picks actually occur.
actual mean score, actual score standard deviation
The mean score and SD for these picks.
perceived probability
The probability one opponent will make these picks exactly.
perceived mean score, perceived score standard deviation
The mean score and SD for these picks if the tournament games were played using the perceived probabilities.
correlation with opponents
The correlation (in the range [-1,1]) between the score of these picks and the score of one opponent.
Usage:
tstats [-nEqsetP] namesfile actualfile perceivedfile
-n<competitors>
Number of competitors.
-E
Use ESPN scoring.
-q
Quiet. Suppress headers.
-s
Don’t show stats.
-e
Don’t show expected return.
-t
Don’t show teams.
-P
Show perceived probability only. (This was useful, once.)
Quick and dirty program to calculate the scores of picks on stdin, given a set of picks as the input file outcome.
Usage:
tscore [-E] [-r rounds] outcome
-E
Use ESPN scoring.
-r rounds
Specify number of rounds. Default is 6.
Simulate tournaments. Computes results for each set of picks Y read on standard input. Each trial chooses n competitor picks, either randomly using perceived probabilities or by selecting from the opponentpicks file if provided. Each trial chooses winners using actual probabilities and calculates the score and winnings for picks Y. After all trials are finished, the summary results for picks Y are displayed.
Usage:
tsim [-nEqst] namesfile actualfile perceivedfile [opponentpicks]
-n<competitors>
Number of competitors.
-E
Use ESPN scoring.
-q
Quiet. Suppress headers.
-s<seed>
Seed. Seed random number generator with seed. Default is to use current time.
-t<trials>
Number of tournaments to simulate. Default is 10000.
Utility program to read in a probability file and dump a correctly formatted probability file in h2h format. Use for converting winround to h2h.
Usage:
dumph2h probfile
Utility program to read in a probability file and dump a correctly formatted probability file in winround format. Useful for converting h2h to winround (because the solo information is interesting for computer ranking generated h2h files).
Usage:
dumpwinround [-p] probfile
-p
Only dump solo data.
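For readers curious how solo data can be derived from head-to-head data, the sketch below uses the standard bracket recursion: a team wins round r if it won round r-1 and then beats whichever opponent emerges from the other half of its bracket section. This is not necessarily how dumpwinround is implemented; the function name is hypothetical, and the convention that every team "wins round 0" with probability 1 is inferred from the winround example in this manual.

```python
def solo_from_h2h(h2h):
    """Return solo[i][r] = P(team i wins round r), with solo[i][0] = 1."""
    teams = len(h2h)
    rounds = teams.bit_length() - 1           # assumes teams == 2**rounds
    solo = [[1.0] + [0.0] * rounds for _ in range(teams)]
    for r in range(1, rounds + 1):
        block = 2 ** r                        # bracket section decided in round r
        half = block // 2
        for i in range(teams):
            start = (i // block) * block
            # Possible round-r opponents sit in the other half of the section.
            if i - start < half:
                opponents = range(start + half, start + block)
            else:
                opponents = range(start, start + half)
            solo[i][r] = solo[i][r - 1] * sum(
                solo[j][r - 1] * h2h[i][j] for j in opponents)
    return solo


# Head-to-head data from the h2h example for Bryan's Imaginary Playoffs.
H2H = [[.5, .4, .4, .7],
       [.6, .5, .7, .6],
       [.6, .3, .5, .6],
       [.3, .4, .4, .5]]
solo = solo_from_h2h(H2H)
for row in solo:
    print(" ".join(f"{x:.3f}" for x in row))
```

The recursion is valid because the two halves of a bracket section share no games, so the events "i wins round r-1" and "j wins round r-1" are independent when games are independent.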
Calculate a table of winround data from a list of picks. Given a series of picks on either stdin or in picksfile, computes solo and pair data by counting occurrences of teams reaching rounds. Dumps results to stdout as a winround format file. This is how you get perceived probabilities if you have a large collection of opponent picksets.
Usage:
pcalc [-r<rounds>] [picksfile]
-r<rounds>
Specify number of rounds in tournament. Default is 6.
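The counting idea can be sketched for the solo series alone. This is an illustration, not the pcalc source: the function name is hypothetical, picks are taken in depth format, and a depth of d is read as "wins rounds 0 through d-1", so that round 0 always has probability 1 as in the winround example.

```python
def solo_from_picks(picksets, rounds):
    """Estimate solo[i][r] as the fraction of picksets with team i winning round r."""
    teams = 2 ** rounds
    counts = [[0] * (rounds + 1) for _ in range(teams)]
    for picks in picksets:
        for i, depth in enumerate(picks):
            # Depth d credits the team with rounds 0 .. d-1.
            for r in range(depth):
                counts[i][r] += 1
    n = len(picksets)
    return [[c / n for c in row] for row in counts]


# Four hypothetical opponent picksets for a four team, two round tournament.
opponents = [[1, 3, 2, 1], [3, 1, 1, 2], [1, 3, 1, 2], [1, 3, 2, 1]]
perceived_solo = solo_from_picks(opponents, rounds=2)
for row in perceived_solo:
    print(" ".join(f"{x:.3f}" for x in row))
```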
From a set of picks and a tournament names file, pix2tex generates LaTeX output to draw a filled in bracket. Width and height are specified as floating point numbers and are used to position the elements of the bracket. LaTeX will interpret these as points, by default, although you could change \unitlength in your document to adjust this.
Usage:
pix2tex [-h<height>] [-w<width>] namesfile