...


Individual usage data providers should not filter double clicks themselves. This form of normalisation should be carried out centrally by the aggregator.
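
As an illustration only, the sketch below shows how an aggregator might normalise double clicks centrally. The event structure and the 10-second window are assumptions made for the example and are not prescribed by this document.

    from datetime import timedelta

    # Assumed window: repeated clicks by the same user on the same item
    # within this interval count as a single event (illustrative value).
    DOUBLE_CLICK_WINDOW = timedelta(seconds=10)

    def normalise_double_clicks(events):
        """Collapse double clicks in a list of usage events.

        Each event is assumed to be a dict with 'user', 'item' and
        'timestamp' (a datetime) keys.
        """
        last_accepted = {}  # (user, item) -> timestamp of last kept event
        kept = []
        for event in sorted(events, key=lambda e: e["timestamp"]):
            key = (event["user"], event["item"])
            previous = last_accepted.get(key)
            if previous is not None and event["timestamp"] - previous <= DOUBLE_CLICK_WINDOW:
                continue  # double click: drop the repeated event
            last_accepted[key] = event["timestamp"]
            kept.append(event)
        return kept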

6.2. Robot filtering

6.2.1. Definition of a robot

The "user" as defined in section 2 of this report is assumed to be a human user. Consequently, this document focuses on requests that have consciously been initiated by human beings. Automated visits by internet robots must be filtered out of the data as far as possible.

6.2.2. Strategy

It has been decided to distinguish between two 'layers' of robot filtering:

...

Internet robots will be identified by comparing the value of the User Agent HTTP header against the regular expressions in a list of known robots, which is managed by a central authority. All entries are expected to conform to the definition of a robot as provided in section 6.2.1. All institutions that send usage data must first check each request against this list of internet robots; if the User Agent matches an entry, the event must not be sent. It has been decided not to filter robots by IP address, because IP addresses change very regularly and would make the list very difficult to maintain.
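
By way of illustration, a minimal sketch of the provider-side check, assuming the central robot list has already been retrieved and parsed into regular expressions. The patterns shown are hypothetical examples, not entries from the actual list.

    import re

    # Hypothetical excerpt of the centrally managed robot list; each
    # entry is a regular expression to be matched against the value of
    # the User Agent HTTP header.
    ROBOT_PATTERNS = [
        r"[Gg]ooglebot",
        r"[Bb]ingbot",
        r"crawler|spider|robot",
    ]

    # Compile the expressions once at start-up, rather than re-parsing
    # the list for every incoming request.
    _ROBOT_REGEXES = [re.compile(pattern) for pattern in ROBOT_PATTERNS]

    def is_robot(user_agent):
        """Return True if the User Agent matches an entry in the robot list."""
        return any(regex.search(user_agent) for regex in _ROBOT_REGEXES)

    # A provider checks every request before sending the usage event:
    for ua in ("Mozilla/5.0 (compatible; Googlebot/2.1)",
               "Mozilla/5.0 (Windows NT 10.0; Win64; x64)"):
        print(ua, "->", "suppress event" if is_robot(ua) else "send event")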

6.2.3. Robot list schema

The robot list must meet the following requirements:

...