The New York Times
Article Search API
James Boehmer
james.boehmer@nytimes.com
Getting an API Key
First things first, you need a key to use NYT APIs!
Article Search API v2
Let's say you want to find an article in the archives. You'll want to use the new Article Search API.
Article Search API v2
Query (q) parameter
The q parameter searches the body, headline, and byline for relevant results.
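As a minimal sketch of a q search, the snippet below only builds the request URL and does not send it. The endpoint path and the `api-key` parameter name are assumptions based on the public v2 API, not something stated on these slides, and `build_search_url` is a hypothetical helper.

```python
from urllib.parse import urlencode

# Assumed endpoint; confirm against the current API documentation.
BASE_URL = "https://api.nytimes.com/svc/search/v2/articlesearch.json"


def build_search_url(query, api_key, **extra):
    """Return a request URL for a query term plus any extra parameters."""
    # "api-key" is the assumed name of the key parameter.
    params = {"q": query, **extra, "api-key": api_key}
    return f"{BASE_URL}?{urlencode(params)}"


# "YOUR_API_KEY" is a placeholder for the key you registered.
url = build_search_url("pulitzer prize", "YOUR_API_KEY")
print(url)
```

Other parameters from the following slides can be passed as keyword arguments, for example `build_search_url("pulitzer prize", key, page=2)`.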
Article Search API v2
Highlight (hl) parameter
All results get returned with a headline and snippet. Use the hl parameter to highlight the query term.
{
  web_url: "http://select.nytimes.com/gst/abstract.html?res=9E06E3...",
  snippet: "condemning the <strong>Pulitzer Prize</strong> award to...",
  headline: {
    main: "The <strong>Pulitzer Prize</strong>."
  }
}
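A client may want either the plain text or just the highlighted terms. The sketch below works on a document shaped like the one above and assumes highlighting always uses `<strong>` tags, as in that sample. Both helper names are hypothetical.

```python
import re

# Matches opening and closing <strong> tags.
STRONG_TAG = re.compile(r"</?strong>")
# Captures the text between a <strong> ... </strong> pair.
HIGHLIGHT = re.compile(r"<strong>(.*?)</strong>")


def strip_highlight(text):
    """Remove highlight markup, leaving the plain text."""
    return STRONG_TAG.sub("", text)


def highlighted_terms(text):
    """Return the highlighted substrings in order of appearance."""
    return HIGHLIGHT.findall(text)


doc = {
    "snippet": "condemning the <strong>Pulitzer Prize</strong> award to...",
    "headline": {"main": "The <strong>Pulitzer Prize</strong>."},
}

plain_headline = strip_highlight(doc["headline"]["main"])
terms = highlighted_terms(doc["snippet"])
```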
Article Search API v2
Begin/End Date parameters
Filter your search results by publication date.
The begin_date and end_date parameters are inclusive filters for limiting the search corpus by publication date.
Article Search API v2
Begin/End Date parameters (cont)
The begin_date and end_date parameters can be used together or alone. Using only one implies an open-ended filter.
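One way to handle the optional bounds is sketched below, assuming the YYYYMMDD format used for begin_date elsewhere in this deck. `date_params` is a hypothetical helper that omits whichever bound is not supplied.

```python
from datetime import date


def date_params(begin=None, end=None):
    """Build begin_date/end_date parameters, skipping any bound left as None."""
    params = {}
    if begin is not None:
        params["begin_date"] = begin.strftime("%Y%m%d")
    if end is not None:
        params["end_date"] = end.strftime("%Y%m%d")
    return params


# Open-ended: everything published on or after 1 January 2013.
open_ended = date_params(begin=date(2013, 1, 1))

# Closed range: all of 2012, both endpoints included.
year_2012 = date_params(begin=date(2012, 1, 1), end=date(2012, 12, 31))
```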
Article Search API v2
Sort parameter
The sort parameter sorts the results by publication date, forcibly overriding relevance scores.
Relevance is still calculated for the query term, but only for inclusion in the result set
Documents with no publication date (e.g. references and lists) are returned last
Article Search API v2
Filter Query (fq) parameter
Use standard Lucene syntax to create a custom filter.
Similar to the date parameters, the filter query also limits the corpus before searching for the query term.
The fields available for filtering behave in various ways, depending on how they are analyzed at index time.
Article Search API v2
Filter Query (fq) fields
Field                      Behavior
body                       multiple tokens
body.search                left-edge n-grams
creative_works             single token
creative_works.contains    multiple tokens
day_of_week                single token
document_type              case sensitive exact match
glocations                 single token
glocations.contains        multiple tokens
headline                   multiple tokens
headline.search            left-edge n-grams
kicker                     single token
kicker.contains            multiple tokens
news_desk                  single token
news_desk.contains         multiple tokens
organizations              single token
organizations.contains     multiple tokens
persons                    single token
persons.contains           multiple tokens
pub_date                   timestamp (YYYY-MM-DD)
pub_year                   integer
secpg                      multiple tokens
Article Search API v2
Filter Query (fq) fields (cont)
Field                      Behavior
source                     single token
source.contains            multiple tokens
subject                    single token
subject.contains           multiple tokens
section_name               single token
section_name.contains      multiple tokens
type_of_material           single token
type_of_material.contains  multiple tokens
web_url                    case sensitive single token
word_count                 integer
- Fields can be combined in a single filter query to narrow down exactly what you want
- The default boolean operator between values in parentheses is OR
- Explicit boolean operators (AND, OR) must always be UPPER CASE
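The rules above can be sketched as a small string builder. `field_filter` and `all_of` are hypothetical helpers, and the field values are illustrative only. Values containing double quotes would need escaping, which this sketch does not attempt.

```python
def field_filter(field, *values):
    """Build field:("a" "b"); values inside the parentheses are ORed by default."""
    quoted = " ".join(f'"{value}"' for value in values)
    return f"{field}:({quoted})"


def all_of(*clauses):
    """Join clauses with an explicit, upper-case AND."""
    return " AND ".join(clauses)


# Illustrative values; real values depend on what is in the index.
fq = all_of(
    field_filter("news_desk", "Sports", "Foreign"),
    field_filter("glocations", "NEW YORK CITY"),
)
print(fq)
```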
Article Search API v2
Type parameter
Filter by document_type using the type parameter
Multiple document types can be comma-separated
Article Search API v2
More about filter-like parameters
The type, begin_date and end_date parameters are API conveniences. They are functionally equivalent to filter queries, and when combined they are joined by a logical AND.
type=blogpost,multimedia
...is the same as...
fq=document_type:("blogpost" "multimedia")
...which is the same as...
fq=document_type:"blogpost" OR document_type:"multimedia"
Article Search API v2
More about filter-like parameters (cont)
begin_date=20130101
...is the same as...
fq=pub_date:[2013-01-01 TO *]
type=article,blogpost & begin_date=20120101 & end_date=20121231
...is the same as...
fq=document_type:("article" "blogpost") AND pub_date:[2012-01-01 TO 2012-12-31]
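The equivalence can be made concrete with a sketch that rewrites the convenience parameters as a single filter query string. `to_filter_query` and `iso` are hypothetical helpers that follow the pattern of the examples above. Note that end_date=20121231 maps to 2012-12-31.

```python
def iso(yyyymmdd):
    """Convert a YYYYMMDD string to YYYY-MM-DD, the format pub_date uses."""
    return f"{yyyymmdd[:4]}-{yyyymmdd[4:6]}-{yyyymmdd[6:8]}"


def to_filter_query(doc_types=None, begin_date=None, end_date=None):
    """Rewrite type/begin_date/end_date values as one fq string."""
    clauses = []
    if doc_types:
        quoted = " ".join(f'"{t}"' for t in doc_types.split(","))
        clauses.append(f"document_type:({quoted})")
    if begin_date or end_date:
        # A missing bound becomes *, which leaves that side of the range open.
        lower = iso(begin_date) if begin_date else "*"
        upper = iso(end_date) if end_date else "*"
        clauses.append(f"pub_date:[{lower} TO {upper}]")
    return " AND ".join(clauses)


fq = to_filter_query("article,blogpost", "20120101", "20121231")
print(fq)
```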
Article Search API v2
Page parameter
Paginate through 10 results at a time using the page parameter
Page numbers start at zero (e.g. page 12 is offset 120)
Divide response.meta.hits by 10 and round up to get the total number of pages
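The page arithmetic can be sketched as follows, assuming a fixed page size of 10. The page count is rounded up, since a final partial page still counts as a page. The helper names are hypothetical.

```python
import math

PAGE_SIZE = 10  # results per page, as stated above


def page_count(hits):
    """Total number of pages needed for `hits` results."""
    return math.ceil(hits / PAGE_SIZE)


def page_offset(page):
    """Index of the first result on a zero-based page."""
    return page * PAGE_SIZE


def page_numbers(hits):
    """Zero-based page numbers to request in order to see every result."""
    return range(page_count(hits))


# With 125 hits there are 13 pages, numbered 0 through 12.
print(page_count(125), list(page_numbers(125))[-1], page_offset(12))
```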
Article Search API v2
Facet Field parameter
A facet is an aggregate count for a field, relative to a query term.
The response.facets object will give you the top five section names and days of the week, with (and ranked by) counts.
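A minimal sketch of reading facet counts from a parsed response follows. The shape of each facet (a `terms` list of `term`/`count` pairs) is an assumption about the JSON layout, and the sample payload with its counts is made up for illustration. `top_facets` is a hypothetical helper.

```python
def top_facets(payload, field, limit=5):
    """Return up to `limit` (term, count) pairs for a facet field, highest count first."""
    # Assumed layout: payload["response"]["facets"][field]["terms"]
    terms = payload["response"]["facets"][field]["terms"]
    ranked = sorted(terms, key=lambda t: t["count"], reverse=True)
    return [(t["term"], t["count"]) for t in ranked[:limit]]


# Made-up sample data, not real API output.
sample = {
    "response": {
        "facets": {
            "day_of_week": {
                "terms": [
                    {"term": "Monday", "count": 25},
                    {"term": "Sunday", "count": 40},
                    {"term": "Friday", "count": 31},
                ]
            }
        }
    }
}

top_days = top_facets(sample, "day_of_week", limit=2)
```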
Article Search API v2
More on facets
What are facets useful for?
- When constructing a front end search application, we can present the user with a list of available filters. Intelligently aiding navigation for the user is always a plus!
- We can make search better by coupling the most popular search terms with their top facets, and ranking results higher by keyword
- We can visualize the importance of subjects over time by reporting on facets over a moving window
- Presently only low-cardinality fields are available for faceting because of performance concerns. These include source, section_name, document_type, type_of_material and day_of_week
Article Search API v2
Facet Filter parameter
By default, facets are aggregated only for the query term. You can also include the filter query in the facet calculation
This concept is called adaptive facets, and is useful for sub-navigation of filtered queries
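The difference between default and adaptive facets can be sketched as two parameter sets. The parameter names `facet_field` and `facet_filter` are assumed from the slide titles and the public v2 API, and `facet_params` is a hypothetical helper.

```python
def facet_params(query, fields, fq=None, adaptive=False):
    """Build request parameters for a faceted search."""
    params = {"q": query, "facet_field": ",".join(fields)}
    if fq:
        params["fq"] = fq
    if adaptive:
        # Assumed flag: ask for facet counts that respect the filter query too.
        params["facet_filter"] = "true"
    return params


fields = ["section_name", "day_of_week"]

# Facet counts computed for the query term only.
default_facets = facet_params("obama", fields, fq='source:("The New York Times")')

# Facet counts computed for the query term within the filtered corpus.
adaptive_facets = facet_params(
    "obama", fields, fq='source:("The New York Times")', adaptive=True
)
```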
Don't forget to check out the Times Developer Network:
And our very own Open Blog: