Help for publishers

By Euan
4 articles

Allowing Overton access to your website

How to configure Cloudflare and other middleware to allow Overton’s scrapers access to your website. We also have advice on Making your publications more visible.

Overton automatically checks your publication pages for updates, so that we can see when new documents are published and start tracking them. If you’re a publisher and want to use Overton, it’s important that we can see new documents when they are published. Otherwise they won’t appear in the database or accrue citations.

Typically we check for changes once a week using a piece of software called a web scraper.

If you use Cloudflare

Cloudflare is a widely used content delivery network (CDN). It sits between your servers and the internet and reduces page loading times. It can also be used to prevent bots and scrapers from collecting data from your website.

Overton’s scraper is well behaved, and Cloudflare usually allows it to run without any issue. However, every so often we get blocked from a Cloudflare-protected website and need the site owners to restore access.

To give Overton access you need to:

a) Log in to your Cloudflare account and select your website.

b) Click the Security option in the left-hand sidebar, and then the **Tools** option in the menu that appears.

c) In the IP Access Rules box, add Overton’s IP address and mark it as “Allow”:
- In the “IP, IP range, country name, or ASN” box, type 176.58.101.93 (this is the IP address of our scraper). It may appear in the dropdown box after you’ve typed it; if so, click on it there to confirm it.
- In the “Action” dropdown, select “Allow”.
- You can leave the “Zone” and “Notes” fields as they are.

d) Click “Add”. The new rule should appear below. Make sure that its “Action” is set to “Allow”.

If you use a different CDN or firewall

Please whitelist HTTP traffic from 176.58.101.93. This is the IP address of Overton’s scraper.

Alternatively, if you’re unable to whitelist specific IPs but can whitelist us via a user agent string or HTTP request headers, please contact us. Both options are possible; just tell us which you’d prefer.

If you require any support or further details, please don’t hesitate to contact [email protected]
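
If your firewall is configured from a list of IP addresses or CIDR ranges, a quick offline check can confirm that the list covers the scraper’s address. The following is a minimal Python sketch under that assumption: `is_allowed` and `allow_rules` are hypothetical names, 203.0.113.0/24 is a documentation-only range, and the code does not talk to Cloudflare or to any real firewall.

```python
import ipaddress

# The scraper address given in this article.
OVERTON_SCRAPER_IP = "176.58.101.93"


def is_allowed(ip, allow_rules):
    """Return True if ip matches any single address or CIDR range in allow_rules."""
    address = ipaddress.ip_address(ip)
    # strict=False accepts rules whose host bits are set, e.g. "176.58.101.93/24".
    return any(
        address in ipaddress.ip_network(rule, strict=False)
        for rule in allow_rules
    )


# Hypothetical allowlist copied out of a firewall configuration.
allow_rules = ["203.0.113.0/24", "176.58.101.93"]
print(is_allowed(OVERTON_SCRAPER_IP, allow_rules))
```

A rule list that only contains unrelated ranges makes the function return False, which is a hint that the scraper would still be blocked.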

Last updated on Jun 25, 2026

Making your publications more visible

Following best practice will help researchers, tools and the public find your publications. We also have advice on Allowing Overton access to your website.

To make sure your publications are properly indexed by Overton and other search indexes, like Google Scholar, it is important to follow web publishing best practices wherever possible.

Ensure publication pages show dates, titles and download links

For a publication to be indexed correctly, Overton needs at least a title and a publication date. If you have publication landing pages, please make sure that both are shown on each one, preferably in a machine-readable way (see below).

Ideally, download links should be formatted differently from the rest of the page’s contents. It is very hard for us to pull out links to report PDFs when they are mixed in with explanatory text: consider styling the PDF download link as a button or pulling it out to display in a sidebar.

Please do contact us if you’d like us to look at your site and give some suggestions.

Add machine-readable metadata

The most important thing you can do to support search and indexing engines is to make sure there’s a “machine-readable” version of each publication’s metadata (e.g. the title, description and publication date) available.

The standard way to do this is with meta tags on each publication landing page. These tags aren’t visible to humans visiting the website, but they’re the first thing that apps and search engines look for.

Meta tags look like:

<meta property="og:title" content="Sierra Leone Country Brief">
<meta property="article:published_time" content="2017-06-16T16:58:15-04:00">

Each has two parts: the “property” (what does the meta tag describe?) and the “content” (the actual description). Here the og:title property is the title of the publication (“Sierra Leone Country Brief”) and article:published_time is the date of publication (16th June 2017).

You can find a list of important property names and what they should contain below.
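
To check a landing page before publishing, you could scan its HTML for these meta tags yourself. The sketch below uses only Python’s standard library and is an illustration, not Overton’s scraper: `MetaTagCollector` and `check_metadata` are hypothetical names, the property names are the ones listed in this article, and it assumes a tag may carry its key in either a `property` or a `name` attribute.

```python
from html.parser import HTMLParser

# Property names taken from the sets described in this article.
TITLE_KEYS = {"citation_title", "og:title", "dcterms.title"}
DATE_KEYS = {
    "citation_publication_date",
    "og:published",
    "dcterms.created",
    "article:published_time",
}


class MetaTagCollector(HTMLParser):
    """Collect the content of every <meta> tag, keyed by property or name."""

    def __init__(self):
        super().__init__()
        self.tags = {}

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attributes = dict(attrs)
        # Assumption: the key may be given as property="..." or name="...".
        key = attributes.get("property") or attributes.get("name")
        content = attributes.get("content")
        if key and content:
            self.tags.setdefault(key.lower(), []).append(content)


def check_metadata(html):
    """Report whether the page has a title tag, a date tag and a PDF link tag."""
    collector = MetaTagCollector()
    collector.feed(html)
    found = set(collector.tags)
    return {
        "has_title": bool(found & TITLE_KEYS),
        "has_date": bool(found & DATE_KEYS),
        "has_pdf_url": "citation_pdf_url" in found,
    }


sample = """<head>
<meta property="og:title" content="Sierra Leone Country Brief">
<meta property="article:published_time" content="2017-06-16T16:58:15-04:00">
</head>"""
print(check_metadata(sample))
# {'has_title': True, 'has_date': True, 'has_pdf_url': False}
```

The sample page has a title and a date, which is the minimum described above, but no citation_pdf_url tag.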
Properties supported by Overton

There’s no one standard for which properties to include and what they should contain, but the vast majority of websites use one or more of the sets below. It’s fine (even encouraged) to have multiple sets on the same page.

Overton requires at least a title and publication date. We prefer the citation set (which is also used by Google Scholar) but any will do.

The most useful tag you could add beyond title and date is citation_pdf_url: it makes it much easier for us to find PDF downloads for your publications.

Citation set (used by Google Scholar)

You can find more detailed information on adding these meta tags on the Google Scholar website.

| Property | Contains |
| --- | --- |
| citation_title | Title of publication |
| citation_doi | DOI of the publication (if one exists) |
| citation_isbn | ISBN of the publication (if one exists) |
| citation_pdf_url | Direct link to the publication PDF |
| citation_abstract | Description of publication |
| citation_author | Name of author (one author per tag) |
| citation_publication_date | Date of publication |

Open Graph set (used by Facebook sharing)

| Property | Contains |
| --- | --- |
| og:title | Title of the publication |
| og:description | Description of the publication |
| og:published | Publication date |

Dublin Core set (used by web crawlers)

| Property | Contains |
| --- | --- |
| dcterms.title | Title of publication |
| dcterms.description | Description of publication |
| dcterms.creator | Name of author (one author per tag) |
| dcterms.created | Date of publication |

Article set (used by web crawlers)

| Property | Contains |
| --- | --- |
| article:published_time | Date of publication |

Ensure publications are open

Overton only reads publicly available pages. If you require an email address or a login to access publications then we won’t be able to read them.
Please contact us if this is an issue: it may be possible for us to collect publications via a different route.

Support visiting specific pages of publication lists

If your organization has more than one page of publications, make sure that it’s possible to visit each one directly. Usually this is only an issue if your website has “show more” type buttons to load more publications as you scroll.

If you can copy and paste the link from your browser’s address bar to get back to a specific page of results then you’re all set. If the browser address bar *doesn’t* change when you load more items then this may cause problems for Overton (and may be affecting the accessibility of your site more broadly).

Use semantic mark-up

Semantic mark-up is a way of writing and structuring your HTML so that it reinforces the semantics, or meaning, of the content rather than its appearance. Some content management systems are better at this than others.

If you develop your own web pages, then the mark-up that Overton finds the most useful is:

| Mark-up | How Overton uses it |
| --- | --- |
| “rel” attributes | We look for these in pagination controls (rel="next") and for author names on publication pages (rel="author"). |
| <time> | When displaying publication dates please use the <time> tag rather than free text. |
| <article> and <header> | On publication lists please consider wrapping each item in a separate <article> tag. On publication landing pages consider also wrapping the item title, date etc. in a <header> element to keep it distinct from the actual publication content. |
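
To see what a crawler could pick up from the “rel” attributes described above, you can scan a page for them. This is a minimal sketch with Python’s standard library and not a description of Overton’s scraper: `RelAttributeScanner`, the sample HTML and its URLs are hypothetical, and it assumes author names are the text of an `<a rel="author">` link.

```python
from html.parser import HTMLParser


class RelAttributeScanner(HTMLParser):
    """Collect rel="next" link targets and the text of rel="author" links."""

    def __init__(self):
        super().__init__()
        self.next_links = []
        self.authors = []
        self._in_author = False

    def handle_starttag(self, tag, attrs):
        attributes = dict(attrs)
        # rel can hold several space-separated values, e.g. rel="next nofollow".
        rel_values = (attributes.get("rel") or "").lower().split()
        if tag in ("a", "link") and "next" in rel_values and attributes.get("href"):
            self.next_links.append(attributes["href"])
        if tag == "a" and "author" in rel_values:
            self._in_author = True

    def handle_data(self, data):
        # Text directly inside an author link is treated as the author name.
        if self._in_author and data.strip():
            self.authors.append(data.strip())

    def handle_endtag(self, tag):
        self._in_author = False


sample = """
<p>By <a rel="author" href="/people/jane-doe">Jane Doe</a></p>
<nav><a rel="next" href="/publications?page=3">Next page</a></nav>
"""
scanner = RelAttributeScanner()
scanner.feed(sample)
print(scanner.next_links)  # ['/publications?page=3']
print(scanner.authors)     # ['Jane Doe']
```

If `next_links` comes back empty on a publication list that has more than one page, the later pages may not be reachable by direct link.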

Last updated on Jun 25, 2026

Which publications does Overton collect?

Discover Overton’s criteria for collecting documents from an indexed source

Overton has a broad definition of a policy document: “documents written primarily for or by policymakers” … and collects from a wide range of different sources.

But we don’t collect everything from these sources: we manually curate each one and set up filters to include only those publications that we think users of Overton will be interested in.

What we can’t / don’t collect

- Anything behind a login or paywall, or where you have to enter an email address before downloading the document
- Journal articles, conference abstracts or other works clearly in the scholarly record
- Books, unless publicly available and hosted on the organization’s website
- News articles or blog posts, if they contain mostly news or announcements (but see below for exceptions)
- Reprints of commentary in other magazines or newspapers
- Statistical tables
- Primary legislation – *we don’t explicitly exclude this but don’t go out of our way to track it*
- Court cases or legal briefs – *we don’t explicitly exclude these but don’t go out of our way to track them*
- Interactive mini-websites, where there is no downloadable version of the document available
- Archived publications, if these are on a different website (e.g. the Internet Archive)

What we will sometimes collect

- Blog posts, if they are a venue for commentary and opinion on policy-related issues. This is a relatively new feature and Overton is still finding and adding blog URLs for its sources.
- Publications where there is no downloadable version. In some cases Overton can work with the contents of a webpage, where the main text is clearly identifiable from semantic mark-up tags.

What we will always try to collect

- Working papers, reports, case studies, policy briefs, testimony, clinical guidelines and government documents
- Publications of interest to a policy audience that have a clear, publicly available link to a downloadable version

We’ve got some additional guidance for publishers on making your publications as visible as possible to Overton and other indexes including Google, Google Scholar and Bing.

Document types in Overton

In general we have two types of sources:

1. Sources that categorize their own publications by type (e.g. a think tank that has separate “Policy Brief” and “Annual Reports” sections) or whose publications are all instances of a particular type (e.g. parliamentary transcripts). These account for about 70% of the database.
2. Sources whose documents are not organized by type (e.g. some city, state or federal government departments, where publications are grouped by theme rather than type). These account for the remaining 30% of the database.

When we add a type 1 source we manually map their categories to our own as far as possible. We can’t currently do this for type 2 sources.

There are thousands of source-specific categories, but Overton groups the most common ones so you can filter on them inside the application. You can see these in the Document Type filter when browsing or searching policy documents:

… the original source category is preserved in the API output if required.

Please let us know if we’re missing a document type you’d find useful to filter on.

Last updated on Jun 25, 2026