The New York Times

Article Search API



James Boehmer
james.boehmer@nytimes.com
@jamesboehmer
github.com/jamesboehmer

Getting an API Key


First things first, you need a key to use NYT APIs!

  1. Go to developer.nytimes.com/apps/register  (log in)
  2. Choose the API you want to use
  3. Agree to the evil terms of service!!!


Article Search API v2


Let's say you want to find an article in the archives.  You'll want to use the new Article Search API.


Query (q) parameter


The q parameter searches the body, headline, and byline for relevant results.
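A minimal sketch of building a q request in Python. The endpoint URL, the api-key parameter name, and the helper name build_search_url are assumptions made for illustration and do not come from these slides; check developer.nytimes.com for the current values.

```python
from urllib.parse import urlencode

# Assumed endpoint; verify it against the developer documentation.
BASE_URL = "https://api.nytimes.com/svc/search/v2/articlesearch.json"

def build_search_url(api_key, **params):
    """Return a search URL for the given parameters (hypothetical helper)."""
    query = dict(params)
    query["api-key"] = api_key  # assumed name of the key parameter
    return BASE_URL + "?" + urlencode(query)

# urlencode takes care of spaces and other reserved characters in the query.
url = build_search_url("YOUR_KEY", q="pulitzer prize")
print(url)
```

The same helper accepts any of the other parameters in this deck as keyword arguments.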




Highlight (hl) parameter


All results get returned with a headline and snippet.  Use the hl parameter to highlight the query term.



{
  web_url: "http://select.nytimes.com/gst/abstract.html?res=9E06E3...",
  snippet: "condemning the <strong>Pulitzer Prize</strong> award to...",
  headline: {
    main: "The <strong>Pulitzer Prize</strong>."
  }
}
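In the sample above the highlighted term is wrapped in <strong> tags. A small sketch of pulling those terms out, or stripping the markup for plain-text display, might look like this. The helper names are hypothetical, and the regular expression assumes the tags are never nested.

```python
import re

# Snippet value copied from the sample response above.
snippet = "condemning the <strong>Pulitzer Prize</strong> award to..."

# Non-greedy match so several highlights in one string stay separate.
STRONG = re.compile(r"<strong>(.*?)</strong>")

def highlighted_terms(text):
    """Return the text of every highlighted span."""
    return STRONG.findall(text)

def strip_highlight(text):
    """Remove the <strong> markup and keep the enclosed text."""
    return STRONG.sub(r"\1", text)

print(highlighted_terms(snippet))
print(strip_highlight(snippet))
```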


Begin/End Date parameters


Filter your search results by publication date


The begin_date and end_date parameters are inclusive filters for limiting the search corpus by publication date.  


Begin/End Date parameters (cont)


The begin_date and end_date parameters can be used together or alone; using only one implies an open-ended range
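A sketch of producing the two date parameters from Python date objects. It assumes the YYYYMMDD format used in the begin_date examples later in this deck; date_params is a hypothetical helper name.

```python
from datetime import date

def date_params(begin=None, end=None):
    """Build begin_date/end_date values in YYYYMMDD form (hypothetical helper).

    Leaving out either argument leaves that side of the range open.
    """
    params = {}
    if begin is not None:
        params["begin_date"] = begin.strftime("%Y%m%d")
    if end is not None:
        params["end_date"] = end.strftime("%Y%m%d")
    return params

# Everything published on or after 1 January 2013.
print(date_params(begin=date(2013, 1, 1)))
# Everything published during 2012; both bounds are inclusive.
print(date_params(begin=date(2012, 1, 1), end=date(2012, 12, 31)))
```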


Sort parameter


The sort parameter sorts the results by publication date, forcibly overriding relevance scores.


Relevance is still calculated for the query term, but only for inclusion in the result set

Documents with no publication date (e.g. references and lists) are returned last
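The ordering described above can be modelled locally. This is only an illustration of the behavior, not the service's implementation; the sample documents and the sort_by_date helper are made up for the example.

```python
# Illustrative documents; pub_date is an ISO string or None when missing.
docs = [
    {"headline": "A", "pub_date": "2012-05-01"},
    {"headline": "B", "pub_date": None},
    {"headline": "C", "pub_date": "2013-01-15"},
]

def sort_by_date(docs, newest_first=True):
    """Order by publication date and put undated documents last."""
    dated = [d for d in docs if d["pub_date"]]
    undated = [d for d in docs if not d["pub_date"]]
    # ISO dates sort correctly as plain strings.
    dated.sort(key=lambda d: d["pub_date"], reverse=newest_first)
    return dated + undated

print([d["headline"] for d in sort_by_date(docs)])
print([d["headline"] for d in sort_by_date(docs, newest_first=False)])
```

Undated documents land at the end in both directions, which matches the note about references and lists.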


Filter Query (fq) parameter


Use standard Lucene syntax to create a custom filter


Similar to the date parameters, the filter query also limits the corpus before searching for the query term.

The fields available for filtering behave in various ways based on how they are analyzed at index time.


Filter Query (fq) fields


                                        
Field                       Behavior

body                        multiple tokens
body.search                 left-edge n-grams
creative_works              single token
creative_works.contains     multiple tokens
day_of_week                 single token
document_type               case sensitive exact match
glocations                  single token
glocations.contains         multiple tokens
headline                    multiple tokens
headline.search             left-edge n-grams
kicker                      single token
kicker.contains             multiple tokens
news_desk                   single token
news_desk.contains          multiple tokens
organizations               single token
organizations.contains      multiple tokens
persons                     single token
persons.contains            multiple tokens
pub_date                    timestamp (YYYY-MM-DD)
pub_year                    integer
secpg                       multiple tokens
source                      single token
source.contains             multiple tokens
subject                     single token
subject.contains            multiple tokens
section_name                single token
section_name.contains       multiple tokens
type_of_material            single token
type_of_material.contains   multiple tokens
web_url                     case sensitive single token
word_count                  integer
  • Various fields can be combined in a complex way to narrow down exactly what you want
  • The default boolean between values in parentheses is OR
  • Explicit booleans (AND, OR) must always be UPPER CASE
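A sketch of composing a filter query string that follows the rules above. The helper names and the field values are hypothetical, and the code assumes the values contain no double quotes that would need escaping.

```python
def field_filter(field, *values):
    """Render field:("a" "b"); values inside the parentheses are ORed by default."""
    quoted = " ".join('"{}"'.format(v) for v in values)
    return "{}:({})".format(field, quoted)

def all_of(*clauses):
    """Join clauses with an explicit, upper-case AND."""
    return " AND ".join(clauses)

fq = all_of(
    field_filter("news_desk", "Sports", "Foreign"),
    field_filter("glocations", "NEW YORK CITY"),
)
print(fq)
# news_desk:("Sports" "Foreign") AND glocations:("NEW YORK CITY")
```

URL-encode the resulting string before sending it as the fq parameter.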


Type parameter


Filter by document_type using the type parameter


Multiple document types can be comma-separated


More about filter-like parameters


The type, begin_date and end_date parameters are API conveniences. They are functionally equivalent to filter queries, joined by a logical AND.

type=blogpost,multimedia
 ...is the same as...
fq=document_type:("blogpost" "multimedia")  
...which is the same as...
fq=document_type:"blogpost" OR document_type:"multimedia"  
         


More about filter-like parameters (cont)


begin_date=20130101
...is the same as... 
fq=pub_date:[2013-01-01 TO *]

type=article,blogpost & begin_date=20120101 & end_date=20121231
...is the same as...
fq=document_type:("article" "blogpost") AND pub_date:[2012-01-01 TO 2012-12-31]
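The equivalence can be sketched as a small translation function. to_filter_query and iso are hypothetical names; the function only mirrors the examples on these slides and is not how the service itself does the translation.

```python
def iso(yyyymmdd):
    """Turn 20130101 into 2013-01-01, the form used inside pub_date ranges."""
    return "{}-{}-{}".format(yyyymmdd[:4], yyyymmdd[4:6], yyyymmdd[6:8])

def to_filter_query(types=None, begin_date=None, end_date=None):
    """Express the convenience parameters as one filter query string."""
    clauses = []
    if types:
        quoted = " ".join('"{}"'.format(t) for t in types.split(","))
        clauses.append("document_type:({})".format(quoted))
    if begin_date or end_date:
        # A missing bound becomes *, which leaves that side open.
        lo = iso(begin_date) if begin_date else "*"
        hi = iso(end_date) if end_date else "*"
        clauses.append("pub_date:[{} TO {}]".format(lo, hi))
    return " AND ".join(clauses)

print(to_filter_query(begin_date="20130101"))
# pub_date:[2013-01-01 TO *]
print(to_filter_query("article,blogpost", "20120101", "20121231"))
# document_type:("article" "blogpost") AND pub_date:[2012-01-01 TO 2012-12-31]
```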


Page parameter


Paginate through 10 results at a time using the page parameter


Page numbers start at zero (e.g. page 12 is offset 120)

Round response.meta.hits/10 up to get the total number of pages
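The paging arithmetic can be sketched as follows. Here hits stands in for the value of response.meta.hits, and rounding up accounts for a final, partially filled page. The function names are hypothetical.

```python
import math

PAGE_SIZE = 10  # results per page, as stated above

def page_count(hits):
    """Total number of pages needed for the given hit count."""
    return math.ceil(hits / PAGE_SIZE)

def page_offset(page):
    """Offset of the first result on a zero-based page."""
    return page * PAGE_SIZE

print(page_offset(12))  # 120
print(page_count(121))  # 13
```

With 121 hits the valid page numbers run from 0 through 12.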


Facet Field parameter


A facet is an aggregate count for a field, relative to a query term.


For example, when you facet on section_name and day_of_week, the response.facets object will give you the top five section names and days of the week, with (and ranked by) counts.
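To make the idea concrete, here is a local model of what a facet count is: tally a field across the documents that match a query and keep the top five. The sample documents and the top_facets helper are invented for illustration; the service computes these counts for you.

```python
from collections import Counter

def top_facets(docs, field, limit=5):
    """Count the values of one field and return the most common ones."""
    counts = Counter(d[field] for d in docs if d.get(field))
    return counts.most_common(limit)

# Pretend these are the documents that matched the query term.
matches = [
    {"section_name": "Sports", "day_of_week": "Monday"},
    {"section_name": "Arts", "day_of_week": "Monday"},
    {"section_name": "Sports", "day_of_week": "Friday"},
]

print(top_facets(matches, "section_name"))  # [('Sports', 2), ('Arts', 1)]
print(top_facets(matches, "day_of_week"))   # [('Monday', 2), ('Friday', 1)]
```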


More on facets


What are facets useful for?

  • When constructing a front end search application, we can present the user with a list of available filters.  Intelligently aiding navigation for the user is always a plus!
  • We can make search better by coupling the most popular search terms with their top facets, and ranking results higher by keyword
  • We can visualize the importance of subjects over time by reporting on facets over a moving window
  • Presently only low-cardinality fields are available for faceting because of performance concerns.  These include source, section_name, document_type, type_of_material, and day_of_week


Facet Filter parameter


By default, facets are aggregated only for the query term.  You can also include the filter query in the facet calculation

This concept is called adaptive facets, and is useful for sub-navigation of filtered queries
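The difference between default and adaptive facets can be modelled locally. In this sketch a substring test stands in for the query and a Python predicate stands in for the filter query; the documents and the facet_counts helper are invented for illustration.

```python
from collections import Counter

docs = [
    {"text": "pulitzer prize winners", "section_name": "Arts", "document_type": "article"},
    {"text": "pulitzer prize photo", "section_name": "Arts", "document_type": "multimedia"},
    {"text": "pulitzer coverage", "section_name": "Sports", "document_type": "article"},
    {"text": "election results", "section_name": "U.S.", "document_type": "article"},
]

def facet_counts(docs, query, field, doc_filter=None):
    """Count field values over query matches, optionally narrowed by a filter."""
    matched = [d for d in docs if query in d["text"]]
    if doc_filter is not None:
        # Adaptive facets: the filter takes part in the facet calculation.
        matched = [d for d in matched if doc_filter(d)]
    return Counter(d[field] for d in matched).most_common()

def is_article(doc):
    return doc["document_type"] == "article"

default = facet_counts(docs, "pulitzer", "section_name")
adaptive = facet_counts(docs, "pulitzer", "section_name", is_article)
print(default)   # [('Arts', 2), ('Sports', 1)]
print(adaptive)  # [('Arts', 1), ('Sports', 1)]
```

The adaptive counts describe only what the user would see after applying the filter, which is why they suit sub-navigation.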


Don't forget to check out the Times Developer Network:
developer.nytimes.com

And our very own Open Blog:
open.blogs.nytimes.com

Thank you!