API Reference

This section of the documentation provides detailed information on functions, classes, and methods.

Server Communication

Communicating with the NLP server (processors-server) is handled by the following classes.

ProcessorsBaseAPI

class processors.api.ProcessorsBaseAPI(**kwargs)[source]

Bases: object

Manages a connection with processors-server and provides an interface to the API.

Parameters:
  • port (int) – The port the server is running on or should be started on. Default is 8886.
  • hostname (str) – The host name to use for the server. Default is “localhost”.
  • log_file (str) – The path for the log file. Default is py-processors.log in the user’s home directory.
annotate(text)

Produces a Document from the provided text using the default processor.

clu.annotate(text)

Produces a Document from the provided text using CluProcessor.

fastnlp.annotate(text)

Produces a Document from the provided text using FastNLPProcessor.

bionlp.annotate(text)

Produces a Document from the provided text using BioNLPProcessor.

annotate_from_sentences(sentences)

Produces a Document from sentences (a list of text split into sentences). Uses the default processor.

fastnlp.annotate_from_sentences(sentences)

Produces a Document from sentences (a list of text split into sentences). Uses FastNLPProcessor.

bionlp.annotate_from_sentences(sentences)

Produces a Document from sentences (a list of text split into sentences). Uses BioNLPProcessor.

corenlp.sentiment.score_sentence(sentence)

Produces a sentiment score for the provided sentence (an instance of Sentence).

corenlp.sentiment.score_document(doc)

Produces sentiment scores for the provided doc (an instance of Document). One score is produced for each sentence.

corenlp.sentiment.score_segmented_text(sentences)

Produces sentiment scores for the provided sentences (a list of text segmented into sentences). One score is produced for each item in sentences.

odin.extract_from_text(text, rules)

Produces a list of Mentions for matches of the provided rules on the text. rules can be a string of Odin rules, or a url ending in .yml or .yaml.

odin.extract_from_document(doc, rules)

Produces a list of Mentions for matches of the provided rules on the doc (an instance of Document). rules can be a string of Odin rules, or a url ending in .yml or .yaml.
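For illustration, a rules string might look like the following YAML (the rule name, label, and pattern here are hypothetical; see the Odin manual for the full rule syntax):

```yaml
# A hypothetical Odin rule: match one or more contiguous tokens
# tagged with the PERSON named-entity label.
- name: example-person-rule
  label: Person
  type: token
  pattern: |
    [entity="PERSON"]+
```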

ProcessorsAPI

class processors.api.ProcessorsAPI(**kwargs)[source]

Bases: processors.api.ProcessorsBaseAPI

Manages a connection with the processors-server jar and provides an interface to the API.

Parameters:
  • timeout (int) – The number of seconds to wait for the server to initialize. Default is 120.
  • jvm_mem (str) – The maximum amount of memory to allocate to the JVM for the server. Default is “-Xmx3G”.
  • jar_path (str) – The path to the processors-server jar. Default is the jar installed with the package.
  • keep_alive (bool) – Whether or not to keep the server running when the ProcessorsAPI instance goes out of scope. Default is False (the server is shut down).
  • log_file (str) – The path for the log file. Default is py-processors.log in the user’s home directory.
start_server(jar_path, **kwargs)

Starts the server using the provided jar_path. Optionally takes hostname, port, jvm_mem, and timeout.

stop_server()

Attempts to stop the server running at self.address.

OdinAPI

class processors.api.OdinAPI(address)[source]

Bases: object

API for performing rule-based information extraction with Odin.

Parameters: address (str) – The base address for the API (i.e., everything preceding /api/..)

SentimentAnalysisAPI

class processors.sentiment.SentimentAnalysisAPI(address)[source]

Bases: object

API for performing sentiment analysis.

Parameters: address (str) – The base address for the API (i.e., everything preceding /api/..)
corenlp

processors.sentiment.CoreNLPSentimentAnalyzer – Service using [CoreNLP‘s tree-based system](https://nlp.stanford.edu/~socherr/EMNLP2013_RNTN.pdf) for performing sentiment analysis.

Data Structures

Document

class processors.ds.Document(sentences)[source]

Bases: object

Storage class for annotated text. Based on [org.clulab.processors.Document](https://github.com/clulab/processors/blob/master/main/src/main/scala/org/clulab/processors/Document.scala)

Parameters: sentences ([processors.ds.Sentence]) – The sentences comprising the Document.
id

str or None – A unique ID for the Document.

size

int – The number of sentences.

sentences

[processors.ds.Sentence] – The sentences comprising the Document.

words

[str] – A list of the Document‘s tokens.

tags

[str] – A list of the Document‘s tokens represented using part of speech (PoS) tags.

lemmas

[str] – A list of the Document‘s tokens represented using lemmas.

_entities

[str] – A list of the Document‘s tokens represented using IOB-style named entity (NE) labels.

nes

dict – A dictionary of NE labels represented in the Document -> a list of corresponding text spans.

bag_of_labeled_deps

[str] – The labeled dependencies from all sentences in the Document.

bag_of_unlabeled_deps

[str] – The unlabeled dependencies from all sentences in the Document.

text

str or None – The original text of the Document.

bag_of_labeled_dependencies_using(form)

Produces a list of syntactic dependencies where each edge is labeled with its grammatical relation.

bag_of_unlabeled_dependencies_using(form)

Produces a list of syntactic dependencies where each edge is left unlabeled (its grammatical relation is omitted).

Sentence

class processors.ds.Sentence(**kwargs)[source]

Bases: object

Storage class for an annotated sentence. Based on [org.clulab.processors.Sentence](https://github.com/clulab/processors/blob/master/main/src/main/scala/org/clulab/processors/Sentence.scala)

Parameters:
  • text (str or None) – The text of the Sentence.
  • words ([str]) – A list of the Sentence‘s tokens.
  • startOffsets ([int]) – The character offsets starting each token (inclusive).
  • endOffsets ([int]) – The character offsets marking the end of each token (exclusive).
  • tags ([str]) – A list of the Sentence‘s tokens represented using part of speech (PoS) tags.
  • lemmas ([str]) – A list of the Sentence‘s tokens represented using lemmas.
  • chunks ([str]) – A list of the Sentence‘s tokens represented using IOB-style phrase labels (ex. B-NP, I-NP, B-VP, etc.).
  • entities ([str]) – A list of the Sentence‘s tokens represented using IOB-style named entity (NE) labels.
  • graphs (dict) – A dictionary of {graph-name -> {edges: [{source, destination, relation}], roots: [int]}}
text

str – The text of the Sentence.

startOffsets

[int] – The character offsets starting each token (inclusive).

endOffsets

[int] – The character offsets marking the end of each token (exclusive).

length

int – The number of tokens in the Sentence.

graphs

dict – A dictionary (str -> processors.ds.DirectedGraph) mapping the graph type/name to a processors.ds.DirectedGraph.

basic_dependencies

processors.ds.DirectedGraph – A processors.ds.DirectedGraph using basic Stanford dependencies.

collapsed_dependencies

processors.ds.DirectedGraph – A processors.ds.DirectedGraph using collapsed Stanford dependencies.

dependencies

processors.ds.DirectedGraph – A pointer to the preferred syntactic dependency graph type for this Sentence.

_entities

[str] – The IOB-style Named Entity (NE) labels corresponding to each token.

_chunks

[str] – The IOB-style chunk labels corresponding to each token.

nes

dict – A dictionary of NE labels represented in the Sentence -> a list of corresponding text spans (ex. {“PERSON”: [phrase 1, ..., phrase n]}). Built from Sentence._entities.

phrases

dict – A dictionary of chunk labels represented in the Sentence -> a list of corresponding text spans (ex. {“NP”: [phrase 1, ..., phrase n]}). Built from Sentence._chunks.
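The grouping of IOB-style labels into the label -> spans dictionaries described above can be sketched as follows (an illustrative reimplementation, not the library's own code; the function name is hypothetical):

```python
def spans_from_iob(words, labels):
    """Group IOB-style labels (e.g. B-PERSON, I-PERSON, O) into a dict of
    label -> list of text spans, mirroring the documented behavior of
    Sentence.nes and Sentence.phrases."""
    spans = {}
    current_label, current_words = None, []
    for word, label in zip(words, labels):
        if label.startswith("B-"):
            # a new span begins; flush any span in progress
            if current_label:
                spans.setdefault(current_label, []).append(" ".join(current_words))
            current_label, current_words = label[2:], [word]
        elif label.startswith("I-") and current_label == label[2:]:
            # the current span continues
            current_words.append(word)
        else:
            # outside any span (O tag or inconsistent I- tag)
            if current_label:
                spans.setdefault(current_label, []).append(" ".join(current_words))
            current_label, current_words = None, []
    if current_label:
        spans.setdefault(current_label, []).append(" ".join(current_words))
    return spans
```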

bag_of_labeled_dependencies_using(form)

Produces a list of syntactic dependencies where each edge is labeled with its grammatical relation.

bag_of_unlabeled_dependencies_using(form)

Produces a list of syntactic dependencies where each edge is left unlabeled (its grammatical relation is omitted).

DirectedGraph

class processors.ds.DirectedGraph(kind, deps, words)[source]

Bases: object

Storage class for directed graphs.

Parameters:
  • kind (str) – The name of the directed graph.
  • deps (dict) – A dictionary of {edges: [{source, destination, relation}], roots: [int]}
  • words ([str]) – A list of the word form of the tokens from the originating Sentence.
_words

[str] – A list of the word form of the tokens from the originating Sentence.

roots

[int] – A list of indices for the syntactic dependency graph’s roots. Generally this is a single token index.

edges

list[processors.ds.Edge] – A list of processors.ds.Edge

incoming

dict – A dictionary of {int -> [int]} encoding the incoming edges for each node in the graph.

outgoing

dict – A dictionary of {int -> [int]} encoding the outgoing edges for each node in the graph.

labeled

[str] – A list of strings where each element in the list represents an edge encoded as source index, relation, and destination index (“source_relation_destination”).

unlabeled

[str] – A list of strings where each element in the list represents an edge encoded as source index and destination index (“source_destination”).

graph

networkx.Graph – A networkx.Graph representation of the DirectedGraph. Used by shortest_path.

bag_of_labeled_dependencies_from_tokens(form)

Produces a list of syntactic dependencies where each edge is labeled with its grammatical relation.

bag_of_unlabeled_dependencies_from_tokens(form)

Produces a list of syntactic dependencies where each edge is left unlabeled (its grammatical relation is omitted).
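The labeled/unlabeled edge encodings and the incoming/outgoing dictionaries described above can be sketched from the deps parameter as follows (the function name is hypothetical; this is an illustration of the documented formats, not the library's implementation):

```python
def encode_edges(deps):
    """Build the documented DirectedGraph encodings from a deps dict of the
    form {edges: [{source, destination, relation}], roots: [int]}."""
    # "source_relation_destination" strings for the labeled encoding
    labeled = ["{}_{}_{}".format(e["source"], e["relation"], e["destination"])
               for e in deps["edges"]]
    # "source_destination" strings for the unlabeled encoding
    unlabeled = ["{}_{}".format(e["source"], e["destination"])
                 for e in deps["edges"]]
    # node index -> list of neighbor indices
    incoming, outgoing = {}, {}
    for e in deps["edges"]:
        outgoing.setdefault(e["source"], []).append(e["destination"])
        incoming.setdefault(e["destination"], []).append(e["source"])
    return labeled, unlabeled, incoming, outgoing
```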

Mention

class processors.odin.Mention(token_interval, sentence, document, foundBy, label, labels=None, trigger=None, arguments=None, paths=None, keep=True, doc_id=None)[source]

Bases: object

A labeled span of text. Used to model textual mentions of events, relations, and entities.

Parameters:
  • token_interval (Interval) – The span of the Mention represented as an Interval.
  • sentence (int) – The sentence index that contains the Mention.
  • document (Document) – The Document in which the Mention was found.
  • foundBy (str) – The Odin IE rule that produced this Mention.
  • label (str) – The label most closely associated with this span. Usually the lowest hyponym of “labels”.
  • labels (list) – The list of labels associated with this span.
  • trigger (dict or None) – dict of JSON for Mention’s trigger (event predicate or word(s) signaling the Mention).
  • arguments (dict or None) – dict of JSON for Mention’s arguments.
  • paths (dict or None) – dict of JSON encoding the syntactic paths linking a Mention’s arguments to its trigger (applies to Mentions produced from type:”dependency” rules).
  • doc_id (str or None) – The ID of the document.
tokenInterval

processors.ds.Interval – An Interval encoding the start and end of the Mention.

start

int – The token index that starts the Mention.

end

int – The token index that marks the end of the Mention (exclusive).

sentenceObj

processors.ds.Sentence – Pointer to the Sentence instance containing the Mention.

characterStartOffset

int – The index of the character that starts the Mention.

characterEndOffset

int – The index of the character that ends the Mention.

type

Mention.TBM or Mention.EM or Mention.RM – The type of the Mention.

See also

[Odin manual](https://arxiv.org/abs/1509.07513)

matches(label_pattern)

Tests whether the provided pattern, label_pattern, matches any element in Mention.labels.
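The idea behind matches can be sketched as follows (a standalone illustration assuming label_pattern is a string or compiled regex; the exact matching semantics belong to the library):

```python
import re

def matches(labels, label_pattern):
    """Return True if label_pattern matches any element of labels,
    mirroring the documented behavior of Mention.matches."""
    pattern = (re.compile(label_pattern)
               if isinstance(label_pattern, str) else label_pattern)
    return any(pattern.match(label) is not None for label in labels)
```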

JSON serialization/deserialization is handled via processors.serialization.JSONSerializer.

Interval

class processors.ds.Interval(start, end)[source]

Bases: object

Defines a token or character span.

Parameters:
  • start (int) – The token or character index where the interval begins.
  • end (int) – 1 + the index of the last token/character in the span (i.e., the end is exclusive).
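The half-open semantics (inclusive start, exclusive end) can be sketched with a minimal stand-in class (the size helper is hypothetical, not part of the library API):

```python
class Interval:
    """Minimal sketch of a half-open span: start is inclusive and
    end is exclusive (end = 1 + index of the last token/character)."""

    def __init__(self, start, end):
        if start >= end:
            raise ValueError("start must precede end")
        self.start = start
        self.end = end

    def size(self):
        # number of tokens/characters covered by the span
        return self.end - self.start
```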

DependencyUtils

class processors.paths.DependencyUtils[source]

Bases: object

A set of utilities for analyzing syntactic dependency graphs.

build_networkx_graph(roots, edges, name)

Constructs a networkx.Graph.

shortest_path(g, start, end)

Finds the shortest path in a networkx.Graph between any element in a list of start nodes and any element in a list of end nodes.
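The idea behind shortest_path can be sketched with a plain breadth-first search (the library itself delegates to networkx; the function name and edge representation here are illustrative):

```python
from collections import deque

def bfs_shortest_path(edges, start, end):
    """Find a shortest path between start and end over a graph given as
    (source, destination) pairs, treated as undirected."""
    adjacency = {}
    for source, destination in edges:
        adjacency.setdefault(source, set()).add(destination)
        adjacency.setdefault(destination, set()).add(source)
    queue = deque([[start]])
    seen = {start}
    while queue:
        path = queue.popleft()
        if path[-1] == end:
            return path
        for neighbor in adjacency.get(path[-1], ()):
            if neighbor not in seen:
                seen.add(neighbor)
                queue.append(path + [neighbor])
    return None  # no path between start and end
```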

retrieve_edges(dep_graph, path)

Converts output of shortest_path into a list of triples that include the grammatical relation (and direction) for each node-node “hop” in the syntactic dependency graph.

simplify_tag(tag)

Maps a part of speech (PoS) tag to a subset of PoS tags to better consolidate categorical labels.

lexicalize_path(sentence, path, words=False, lemmas=False, tags=False, simple_tags=False, entities=False)

Lexicalizes path in syntactic dependency graph using Odin-style token constraints.

pagerank(networkx_graph, alpha=0.85, personalization=None, max_iter=1000, tol=1e-06, nstart=None, weight='weight', dangling=None)

Measures node activity in a networkx.Graph using a thin wrapper around the networkx implementation of the PageRank algorithm (see networkx.algorithms.link_analysis.pagerank). Use with processors.ds.DirectedGraph.graph.

Annotators (Processors)

Text annotation is performed by communicating with one of the following annotators (“processors”).

CluProcessor

class processors.annotators.CluProcessor(address)[source]

Bases: processors.annotators.Processor

Processor for text annotation based on [org.clulab.processors.clu.CluProcessor](https://github.com/clulab/processors/blob/master/main/src/main/scala/org/clulab/processors/clu/CluProcessor.scala)

Uses the Malt parser.

FastNLPProcessor

class processors.annotators.FastNLPProcessor(address)[source]

Bases: processors.annotators.Processor

Processor for text annotation based on [org.clulab.processors.fastnlp.FastNLPProcessor](https://github.com/clulab/processors/blob/master/corenlp/src/main/scala/org/clulab/processors/fastnlp/FastNLPProcessor.scala)

Uses the Stanford CoreNLP neural network parser.

BioNLPProcessor

class processors.annotators.BioNLPProcessor(address)[source]

Bases: processors.annotators.Processor

Processor for biomedical text annotation based on [org.clulab.processors.bionlp.BioNLPProcessor](https://github.com/clulab/processors/blob/master/corenlp/src/main/scala/org/clulab/processors/bionlp/BioNLPProcessor.scala)

CoreNLP-derived annotator.

Sentiment Analysis

Sentiment analysis is performed via SentimentAnalysisAPI (see above), using the CoreNLP tree-based sentiment model.

Serialization

class processors.serialization.JSONSerializer[source]

Bases: object

Utilities for serialization/deserialization of data structures.