LS Tour Links 2003 'Hidden Statistics'

Return to Steve Pitts' home page

PLEASE NOTE

These pages are no longer of anything other than historical interest, but have been left here as a reminder of how things were.

Introduction

Those of you that play Links 2003 at the LS Tour will know that there are reams and reams of statistics kept about every player and event, but that it isn't always easy to see the wood for the trees. You will also be aware that there are certain practices, frowned on in some quarters, that can only be detected by a study of individual statistics. The pages that you will find linked from this one arose, originally, out of a desire to find a better way of ranking the players at the LS Tour, irrespective of the swing type they use or the ability level they choose to play at. It then extended to take in trying to identify who was (ab)using two particular techniques – chipping on the greens and requalification.

The data pages here are divided into four categories (although I may add others if anyone shows any interest – how about an events finished ratio that shows who finishes what they started and who seems to lose interest a lot??), each of which has a section below explaining what it is about. There are also a handful of paragraphs about the data collection mechanism, and on known issues with the program that collects the data and generates these pages or the methodology itself.

WARNING!! Some of the pages in this section are monstrously large. Having no database capabilities on my NTLWorld web space, I am forced to present each section in full (or chop things up into arbitrary page sizes, and worry about page naming and navigation etc.) so the full summary pages and the Click summary, Pro and Amateur sections are all between 200K and 500K in size. This is much too big (I normally work to a limit of 40 or 50K per page) but if you'd like to view the information you'll just have to take the pain. Sorry.


Data Collection

The data for these pages is collected from the LS Tour web pages using an HTTP spidering process (similar to that used by indexes like Google, but a lot less complex). The spider starts by reading the rankings page for each combination of swing type and ability level, from which it extracts each player's rank, member id, tour name, flight, scoring average, points, and number of events started and completed.

For each player found on that ranking list the spider reads their member details page (for the same combination of swing type and ability level) and extracts their full tour name (since the name on the rankings pages can be abbreviated), the number of holes and rounds played and the number of penalty strokes accrued. It then reads the club statistics table, accumulating the number of hole outs and also recording how many hole outs exceeded the number of uses for a given club.

As the process goes along most of these values (with some obvious exceptions, like names and flights) are also being aggregated by swing type, by ability level and overall, so that we can produce the full combined picture for each player as well as the detail.
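The three levels of aggregation described above can be sketched roughly as follows. This is only an illustration, not the program that generates these pages: the record layout, the field names and the `aggregate` function are all assumptions made for the example.

```python
from collections import defaultdict

# Hypothetical field names; the real spider's record layout is not published.
NUMERIC_FIELDS = ("events", "finishes", "points", "rounds", "holes", "penalties")


def aggregate(records):
    """Sum the numeric fields for each player by swing type, by ability level and overall."""
    totals = defaultdict(lambda: dict.fromkeys(NUMERIC_FIELDS, 0))
    for rec in records:
        keys = (
            (rec["member_id"], "swing", rec["swing"]),
            (rec["member_id"], "level", rec["level"]),
            (rec["member_id"], "overall", None),
        )
        for key in keys:
            for field in NUMERIC_FIELDS:
                totals[key][field] += rec[field]
    return dict(totals)
```

Each detail record (one per player per combination of swing type and ability level) feeds three running totals, so names and flights, which cannot be summed, are simply left out of the aggregated rows.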

Currently the whole process reads just under 1800 pages, containing about 37MB of data (the vast majority, about 86%, of which is for Click players) and takes about two minutes to strut its stuff. At the end of the process all of the collected data is written to a file so that the calculations and web page generation can be repeated without the need to revisit the tour web pages.
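Writing the collected data to a file, so that the calculations can be rerun without another spidering pass, might look something like the sketch below. The JSON format and the function names are assumptions for illustration; the file format actually used is not described here.

```python
import json
from pathlib import Path


def save_snapshot(records, path):
    """Write the collected records to disk (JSON is an assumed format)."""
    Path(path).write_text(json.dumps(records, indent=2), encoding="utf-8")


def load_snapshot(path):
    """Read the records back so the pages can be regenerated offline."""
    return json.loads(Path(path).read_text(encoding="utf-8"))
```

With a snapshot like this, a change to a formula only needs `load_snapshot` and the page generation step, not a fresh read of the tour web pages.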


Known Issues

Because the analysis starts by reading the base data from the tour web pages just after the end of each week, it is generally the case that problems with that process will not be fixed until the following week. This section details any known problems with the data collection. Some of them may be endemic, whilst others may be resolved by next week's data collection run. If you find any errors or anomalies in the data on these pages then please either start a thread in the LS Tour lobby forum, which I frequent on a regular basis, or leave me a private message there.



Index of Rankings pages

Rankings

This is my (possibly vain) attempt at producing a meaningful rating of players between swing types and ability levels, taking into account the very different levels of participation across the board. At the moment the algorithm is a very early iteration and I've done little to validate the results. However, the focus as the weeks go by will be on refining it to produce more 'accurate' results, although the efficacy of the method will be an entirely subjective judgement. For the time being, I will keep the formula to myself, but once I've done some serious validation and cross-checking I will publish the final method.

Definitions of header columns on Rankings pages:

Events
the number of events a player has started
Finishes
the number of events a player has completed (with being cut from a second round cut event counting as a completion)
Points
the number of points a player has won from completed events
Adj. Points
an adjusted points total for the player, which currently takes into account the number of players ranked at each combination of swing type and ability level
Eff. Rating
the player's effectiveness rating, ie. the adjusted points total divided by the number of events entered
NPPE
'normalised points per event', ie. the normalised points total divided by the number of events entered (see next section)


Index of Normalised Points pages

Normalised Points

The normalised points pages provide a means of comparing players based on the points they have accumulated by finishing tournaments. All points totals are normalised (by multiplication) such that a win at Amateur on a particular tour is worth the same as a win at Elite – on the basis that the ability level has no effect on how difficult it is to win an event (in fact it is arguably harder to win at Amateur than at any of the other ability levels). This adjusted points total is then divided by the number of events started, thereby automatically penalising incompletions, and that figure is used to rank the players.

At the detail level these rankings will be identical to the standard points totals, albeit adjusted to cater for the number of events required to accumulate them, but at the aggregated levels they give us a simplistic way of comparing performance between the ability levels and even between the swing types. Unfortunately, no account is taken of the size of fields in the events, and therefore this method of ranking is fundamentally flawed, if no less interesting for that.

Definitions of header columns on Normalised Points pages:

Events
the number of events a player has started
Finishes
the number of events a player has completed (with being cut from a second round cut event counting as a completion)
Points
the number of points a player has won from completed events
Norm. Points
the normalised points total for the player (see above)
NPPE
'normalised points per event', ie. the normalised points total divided by the number of events entered
PPE
'points per event', ie. the number of points divided by the number of events entered
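As a rough worked example of the normalisation, the sketch below scales a points total so that a win at any ability level is worth the same as a win at Elite, then divides by events started. The points awarded for a win at each level are invented for the example (the real tour values are not given on this page), as are the function names.

```python
# Hypothetical points for a win at each ability level; NOT the real tour values.
WIN_POINTS = {"Elite": 100, "Champ": 80, "Pro": 60, "Amateur": 40}


def normalised_points(points, level, win_points=WIN_POINTS):
    """Multiply the raw points so that a win at `level` equals a win at Elite."""
    factor = win_points["Elite"] / win_points[level]
    return points * factor


def per_event(total, events_started):
    """Divide a total by events started; unfinished events still count against the player."""
    return total / events_started if events_started else 0.0


# A player with 120 Amateur points from 5 events started:
norm = normalised_points(120, "Amateur")  # 120 * (100 / 40) = 300.0
nppe = per_event(norm, 5)                 # 300.0 / 5 = 60.0
ppe = per_event(120, 5)                   # 120 / 5 = 24.0
```

Because the divisor is events started rather than events finished, a player who abandons events sees both PPE and NPPE fall without any separate penalty being applied.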


Index of Penalty Strokes pages

Penalty Strokes

If Links 2003 crashes during a round it should allow you to resume where you left off, but there are situations where this isn't possible, or the player can choose not to even try. At that point you get disqualified from the tournament that you are playing. To prevent hardware or software issues from spoiling people's enjoyment the LS Tour allows a player to requalify themselves for an event from which they've been DQed, which puts them back to the beginning of the round during which they had the problems. The first time they do so is 'free', and undetectable, but for second and subsequent requalifications they get charged penalty strokes, which are recorded in their individual statistics. Whilst we have no way of knowing how often someone uses the requalification system, we can see how many penalty strokes they've been charged for doing so.

Unfortunately some folk have taken to abusing this system by kicking themselves out of the game, taking the DQ, and then requalifying in order to start over. The reasons cited by those that admit to this practice are many and varied, but you can dress it up all you want folks, it is cheating. Whilst I cannot identify who is making use of the free requalifier, nor can I tell how many times a player has requalified (since you get charged more penalty strokes for the second and third times), these pages make it apparent who is racking up the greatest proportion of penalty strokes in relation to the number of rounds played.

Since coding the logic for this I have been wondering whether or not a more interesting measure might be the number of penalty strokes per completed event, and I'd be interested to hear any opinions on that (and may add it as a separate column on the page, but not sort by it).

Definitions of header columns on Penalty Strokes pages:

Events
the number of events a player has started
Finishes
the number of events a player has completed (with being cut from a second round cut event counting as a completion)
Points
the number of points a player has won from completed events
Rounds
the number of rounds a player has completed
Penalties
the number of penalty strokes that a player has accrued
PPR
'penalties per round', ie. the number of penalty strokes divided by the number of rounds played
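The PPR figure, and the ordering of players by it, can be sketched as below. The dictionary keys and function names are assumptions for the example rather than anything taken from the real program.

```python
def penalties_per_round(penalties, rounds):
    """Penalty strokes divided by rounds played (0.0 if no rounds were played)."""
    return penalties / rounds if rounds else 0.0


def rank_by_ppr(players):
    """Order players from the highest PPR to the lowest."""
    return sorted(
        players,
        key=lambda p: penalties_per_round(p["penalties"], p["rounds"]),
        reverse=True,
    )
```

A player with 6 penalty strokes over 12 rounds has a PPR of 0.5 and would sort above one with 2 penalty strokes over 20 rounds (0.1).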


Index of Hole Outs pages

Hole Outs

The primary focus of these pages is to show on what percentage of holes the player holes out without using the putter. We also collect the number of definite chips on the greens (where the number of hole outs with a particular club exceeds the number of uses in the club statistics).

Unlike the abuse of the requalification system, I don't feel particularly strongly one way or the other about chipping on the greens, but I leave this data for you to interpret as you see fit.

Definitions of header columns on Hole Outs pages:

Rounds
the number of rounds a player has completed
Holes
the number of holes a player has completed
Hole Outs
the number of times a player has holed out with any club other than the putter
Green Chips
the number of times a player has definitely holed out by chipping on the green
HPH
'hole outs per hole', ie. the number of hole outs divided by the number of holes played, expressed as a percentage
GCPH
'green chips per hole', ie. the number of definite hole outs by chipping on the green divided by the number of holes played, expressed as a percentage
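The four hole out figures could be derived from a player's club statistics along the lines of the sketch below. The table layout (club name mapped to uses and hole outs), the "Putter" label and the function name are assumptions, and counting the excess of hole outs over uses as the number of definite green chips is my reading of the description rather than a confirmed detail.

```python
def hole_out_stats(club_stats, holes_played):
    """Return (hole_outs, green_chips, hph, gcph) from a club statistics table.

    club_stats maps a club name to a (uses, hole_outs) pair (assumed layout).
    """
    hole_outs = 0
    green_chips = 0
    for club, (uses, holed) in club_stats.items():
        if club == "Putter":
            continue  # only hole outs with clubs other than the putter count
        hole_outs += holed
        # More hole outs than recorded uses can only be chips on the green.
        green_chips += max(0, holed - uses)
    if not holes_played:
        return hole_outs, green_chips, 0.0, 0.0
    hph = 100.0 * hole_outs / holes_played
    gcph = 100.0 * green_chips / holes_played
    return hole_outs, green_chips, hph, gcph
```

For example, a table of `{"Putter": (500, 300), "SW": (40, 6), "LW": (10, 14)}` over 200 holes gives 20 hole outs and 4 definite green chips, so an HPH of 10.0 and a GCPH of 2.0.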

Index of generated pages

These links are effectively a site map for this part of my site, but you can also navigate your way around the 'Hidden Stats' pages by using the menu block at the top of each page.

  1. Rankings
    1. Classic
      1. Elite
      2. Champ
      3. Pro
      4. Amateur
    2. PowerStroke
      1. Elite
      2. Champ
      3. Pro
      4. Amateur
    3. Real-Time
      1. Elite
      2. Champ
      3. Pro
      4. Amateur
  2. Normalised Points
    1. Classic
      1. Elite
      2. Champ
      3. Pro
      4. Amateur
    2. PowerStroke
      1. Elite
      2. Champ
      3. Pro
      4. Amateur
    3. Real-Time
      1. Elite
      2. Champ
      3. Pro
      4. Amateur
  3. Penalty Strokes
    1. Classic
      1. Elite
      2. Champ
      3. Pro
      4. Amateur
    2. PowerStroke
      1. Elite
      2. Champ
      3. Pro
      4. Amateur
    3. Real-Time
      1. Elite
      2. Champ
      3. Pro
      4. Amateur
  4. Hole Outs
    1. Classic
      1. Elite
      2. Champ
      3. Pro
      4. Amateur
    2. PowerStroke
      1. Elite
      2. Champ
      3. Pro
      4. Amateur
    3. Real-Time
      1. Elite
      2. Champ
      3. Pro
      4. Amateur

All text copyright © 2000-2006 Steve Pitts – All rights reserved
Last updated 15th December 2006