API Reference¶

This section of the documentation provides detailed information on functions, classes, and methods.

Server Communication¶

Communicating with the NLP server (processors-server) is handled by the following classes:

`ProcessorsBaseAPI`¶

class processors.api.ProcessorsBaseAPI(**kwargs)[source]¶

Bases: object

Manages a connection with processors-server and provides an interface to the API.

Parameters:

port (int) – The port the server is running on or should be started on. Default is 8886.
hostname (str) – The host name to use for the server. Default is “localhost”.
log_file (str) – The path for the log file. Default is py-processors.log in the user’s home directory.
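As a rough illustration of how these defaults fit together, the sketch below assembles a base address and a log path from the documented default values. The helper name `default_settings` and the plain `http://` scheme are assumptions made for this example; they are not taken from py-processors itself.

```python
import os


def default_settings(port=8886, hostname="localhost", log_file=None):
    """Hypothetical helper mirroring the documented defaults (not library code)."""
    if log_file is None:
        # Documented default: py-processors.log in the user's home directory.
        log_file = os.path.join(os.path.expanduser("~"), "py-processors.log")
    return {
        "port": port,
        "hostname": hostname,
        "log_file": log_file,
        # Assumption: the server is reached over plain HTTP.
        "address": "http://{}:{}".format(hostname, port),
    }


settings = default_settings()
print(settings["address"])
```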

annotate(text)¶: Produces a Document from the provided text using the default processor.

clu.annotate(text)¶: Produces a Document from the provided text using CluProcessor.

fastnlp.annotate(text)¶: Produces a Document from the provided text using FastNLPProcessor.

bionlp.annotate(text)¶: Produces a Document from the provided text using BioNLPProcessor.

annotate_from_sentences(sentences)¶: Produces a Document from sentences (a list of text split into sentences). Uses the default processor.

fastnlp.annotate_from_sentences(sentences)¶: Produces a Document from sentences (a list of text split into sentences). Uses FastNLPProcessor.

bionlp.annotate_from_sentences(sentences)¶: Produces a Document from sentences (a list of text split into sentences). Uses BioNLPProcessor.

corenlp.sentiment.score_sentence(sentence)¶: Produces a sentiment score for the provided sentence (an instance of Sentence).

corenlp.sentiment.score_document(doc)¶: Produces sentiment scores for the provided doc (an instance of Document). One score is produced for each sentence.

corenlp.sentiment.score_segmented_text(sentences)¶: Produces sentiment scores for the provided sentences (a list of text segmented into sentences). One score is produced for each item in sentences.

odin.extract_from_text(text, rules)¶: Produces a list of Mentions for matches of the provided rules on the text. rules can be a string of Odin rules, or a URL ending in .yml or .yaml.

odin.extract_from_document(doc, rules)¶: Produces a list of Mentions for matches of the provided rules on the doc (an instance of Document). rules can be a string of Odin rules, or a URL ending in .yml or .yaml.
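Because rules may be either the rule text itself or a URL ending in .yml or .yaml, a caller may want to tell the two apart before sending a request. The following is a minimal sketch of one way to do that. The function `is_rule_url`, the accepted URL schemes, the example URL, and the sample rule are all illustrative assumptions, not the library's own logic.

```python
def is_rule_url(rules):
    """Hypothetical check: treat `rules` as a URL only if it starts with a
    known scheme and ends in .yml or .yaml; otherwise assume it is rule text."""
    looks_like_url = rules.startswith(("http://", "https://", "file://"))
    return looks_like_url and rules.endswith((".yml", ".yaml"))


# An illustrative inline rule (rule text passed directly as a string).
inline_rules = """
- name: "ner-person"
  label: Person
  type: token
  pattern: |
    [entity="PERSON"]+
"""

# A made-up URL used only for this example.
remote_rules = "https://example.org/grammars/people.yml"

print(is_rule_url(inline_rules), is_rule_url(remote_rules))
```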

`ProcessorsAPI`¶

class processors.api.ProcessorsAPI(**kwargs)[source]¶

Bases: processors.api.ProcessorsBaseAPI

Manages a connection with the processors-server jar and provides an interface to the API.

Parameters:

timeout (int) – The number of seconds to wait for the server to initialize. Default is 120.
jvm_mem (str) – The maximum amount of memory to allocate to the JVM for the server. Default is “-Xmx3G”.
jar_path (str) – The path to the processors-server jar. Default is the jar installed with the package.
keep_alive (bool) – Whether or not to keep the server running when the ProcessorsAPI instance goes out of scope. Default is False (the server is shut down).
log_file (str) – The path for the log file. Default is py-processors.log in the user’s home directory.

start_server(jar_path, **kwargs)¶: Starts the server using the provided jar_path. Optionally takes hostname, port, jvm_mem, and timeout.

stop_server()¶: Attempts to stop the server running at self.address.
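The timeout parameter bounds how long the client waits for the server to initialize. The generic polling loop below sketches that idea under stated assumptions: `wait_until_ready`, `poll_interval`, and the `is_ready` callable are hypothetical names, and the actual readiness check used by py-processors is not shown here.

```python
import time


def wait_until_ready(is_ready, timeout=120, poll_interval=0.5):
    """Poll `is_ready` (a hypothetical zero-argument callable) until it returns
    True or `timeout` seconds have elapsed. Returns True on success."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        if is_ready():
            return True
        time.sleep(poll_interval)
    return False


# Stand-in for a real health check: reports ready on the third call.
calls = {"n": 0}


def fake_check():
    calls["n"] += 1
    return calls["n"] >= 3


started = wait_until_ready(fake_check, timeout=5, poll_interval=0.01)
print(started)
```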

`OdinAPI`¶

class processors.api.OdinAPI(address)[source]¶

Bases: object

API for performing rule-based information extraction with Odin.

Parameters:	address (str) – The base address for the API (i.e., everything preceding /api/..)

`OpenIEAPI`¶

class processors.api.OpenIEAPI(address)[source]¶: Bases: object

`SentimentAnalysisAPI`¶

class processors.sentiment.SentimentAnalysisAPI(address)[source]¶

Bases: object

API for performing sentiment analysis.

Parameters:	address (str) – The base address for the API (i.e., everything preceding /api/..)

corenlp¶: processors.sentiment.CoreNLPSentimentAnalyzer – Service using [CoreNLP’s tree-based system](https://nlp.stanford.edu/~socherr/EMNLP2013_RNTN.pdf) for performing sentiment analysis.

Data Structures¶

`NLPDatum`¶

class processors.ds.NLPDatum[source]¶: Bases: object

`Document`¶

class processors.ds.Document(sentences)[source]¶

Bases: processors.ds.NLPDatum

Storage class for annotated text. Based on [org.clulab.processors.Document](https://github.com/clulab/processors/blob/master/main/src/main/scala/org/clulab/processors/Document.scala)

Parameters:	sentences ([processors.ds.Sentence]) – The sentences comprising the Document.

id¶: str or None – A unique ID for the Document.

size¶: int – The number of sentences.

sentences¶: [processors.ds.Sentence] – The sentences comprising the Document.

words¶: [str] – A list of the Document’s tokens.

tags¶: [str] – A list of the Document’s tokens represented using part of speech (PoS) tags.

lemmas¶: [str] – A list of the Document’s tokens represented using lemmas.

_entities¶: [str] – A list of the Document’s tokens represented using IOB-style named entity (NE) labels.

nes¶: dict – A dictionary of NE labels represented in the Document -> a list of corresponding text spans.

bag_of_labeled_deps¶: [str] – The labeled dependencies from all sentences in the Document.

bag_of_unlabeled_deps¶: [str] – The unlabeled dependencies from all sentences in the Document.

text¶: str or None – The original text of the Document.

bag_of_labeled_dependencies_using(form)¶: Produces a list of syntactic dependencies where each edge is labeled with its grammatical relation.

bag_of_unlabeled_dependencies_using(form)¶: Produces a list of syntactic dependencies where each edge is left unlabeled (its grammatical relation is omitted).

`Sentence`¶

class processors.ds.Sentence(**kwargs)[source]¶

Bases: processors.ds.NLPDatum

Storage class for an annotated sentence. Based on [org.clulab.processors.Sentence](https://github.com/clulab/processors/blob/master/main/src/main/scala/org/clulab/processors/Sentence.scala)

Parameters:

text (str or None) – The text of the Sentence.
words ([str]) – A list of the Sentence’s tokens.
startOffsets ([int]) – The character offsets starting each token (inclusive).
endOffsets ([int]) – The character offsets marking the end of each token (exclusive).
tags ([str]) – A list of the Sentence’s tokens represented using part of speech (PoS) tags.
lemmas ([str]) – A list of the Sentence’s tokens represented using lemmas.
chunks ([str]) – A list of the Sentence’s tokens represented using IOB-style phrase labels (ex. B-NP, I-NP, B-VP, etc.).
entities ([str]) – A list of the Sentence’s tokens represented using IOB-style named entity (NE) labels.
graphs (dict) – A dictionary of {graph-name -> {edges: [{source, destination, relation}], roots: [int]}}

text¶: str – The text of the Sentence.

startOffsets¶: [int] – The character offsets starting each token (inclusive).

endOffsets¶: [int] – The character offsets marking the end of each token (exclusive).
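Since start offsets are inclusive and end offsets are exclusive, each token can be recovered from the sentence text with an ordinary Python slice. The values below are made up for illustration and use plain lists rather than a real Sentence instance.

```python
# Illustrative values for a single sentence (not produced by the server).
text = "Odin extracts events."
words = ["Odin", "extracts", "events", "."]
startOffsets = [0, 5, 14, 20]   # inclusive
endOffsets = [4, 13, 20, 21]    # exclusive

# A half-open [start, end) pair maps directly onto slice notation.
recovered = [text[start:end] for start, end in zip(startOffsets, endOffsets)]
print(recovered)
```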

length¶: int – The number of tokens in the Sentence.

graphs¶: dict – A dictionary (str -> processors.ds.DirectedGraph) mapping the graph type/name to a processors.ds.DirectedGraph.

basic_dependencies¶: processors.ds.DirectedGraph – A processors.ds.DirectedGraph using basic Stanford dependencies.

collapsed_dependencies¶: processors.ds.DirectedGraph – A processors.ds.DirectedGraph using collapsed Stanford dependencies.

dependencies¶: processors.ds.DirectedGraph – A pointer to the preferred syntactic dependency graph type for this Sentence.

_entities¶: [str] – The IOB-style Named Entity (NE) labels corresponding to each token.

_chunks¶: [str] – The IOB-style chunk labels corresponding to each token.

nes¶: dict – A dictionary of NE labels represented in the Sentence -> a list of corresponding text spans (ex. {“PERSON”: [phrase 1, ..., phrase n]}). Built from Sentence._entities.

phrases¶: dict – A dictionary of chunk labels represented in the Sentence -> a list of corresponding text spans (ex. {“NP”: [phrase 1, ..., phrase n]}). Built from Sentence._chunks.
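To make the relationship between IOB-style labels and the resulting dictionary concrete, here is a minimal sketch that groups tokens into labeled spans. It assumes labels of the form B-NP / I-NP / O, as in the chunk examples above; `spans_from_iob` is a hypothetical name and this is not the library's own implementation.

```python
def spans_from_iob(words, labels):
    """Group tokens into {label: [phrase, ...]} using IOB-style labels."""
    spans = {}
    current_label, current_tokens = None, []
    # A trailing "O" sentinel flushes the final open span.
    for word, label in zip(list(words) + [""], list(labels) + ["O"]):
        prefix, _, kind = label.partition("-")
        if prefix == "I" and kind == current_label:
            current_tokens.append(word)  # continue the open span
            continue
        if current_label is not None:
            # Close the span that was open before this token.
            spans.setdefault(current_label, []).append(" ".join(current_tokens))
        if prefix in ("B", "I") and kind:
            current_label, current_tokens = kind, [word]
        else:
            current_label, current_tokens = None, []
    return spans


words = ["The", "quick", "fox", "jumped", "over", "the", "dog"]
chunks = ["B-NP", "I-NP", "I-NP", "B-VP", "B-PP", "B-NP", "I-NP"]
phrases = spans_from_iob(words, chunks)
print(phrases)
```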

bag_of_labeled_dependencies_using(form)¶: Produces a list of syntactic dependencies where each edge is labeled with its grammatical relation.

bag_of_unlabeled_dependencies_using(form)¶: Produces a list of syntactic dependencies where each edge is left unlabeled (its grammatical relation is omitted).

`Edge`¶

class processors.ds.Edge(source, destination, relation)[source]¶: Bases: processors.ds.NLPDatum

`DirectedGraph`¶

class processors.ds.DirectedGraph(kind, deps, words)[source]¶

Bases: processors.ds.NLPDatum

Storage class for directed graphs.

Parameters:

kind (str) – The name of the directed graph.
deps (dict) – A dictionary of {edges: [{source, destination, relation}], roots: [int]}
words ([str]) – A list of the word form of the tokens from the originating Sentence.

_words¶: [str] – A list of the word form of the tokens from the originating Sentence.

roots¶: [int] – A list of indices for the syntactic dependency graph’s roots. Generally this is a single token index.

edges¶: list[processors.ds.Edge] – A list of processors.ds.Edge

incoming¶: A dictionary of {int -> [int]} encoding the incoming edges for each node in the graph.

outgoing¶: A dictionary of {int -> [int]} encoding the outgoing edges for each node in the graph.

labeled¶: [str] – A list of strings where each element in the list represents an edge encoded as source index, relation, and destination index (“source_relation_destination”).

unlabeled¶: [str] – A list of strings where each element in the list represents an edge encoded as source index and destination index (“source_destination”).
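The incoming, outgoing, labeled, and unlabeled attributes are all different views of the same edge list. The sketch below derives each of them from a small, made-up set of edges, following the shapes described above ({int -> [int]} and index-based strings). It uses plain tuples instead of processors.ds.Edge instances, so treat it as an illustration of the encoding rather than the library's code.

```python
from collections import defaultdict

# (source index, relation, destination index) for "Odin extracts events".
edges = [(1, "nsubj", 0), (1, "dobj", 2)]

incoming = defaultdict(list)  # destination -> [sources]
outgoing = defaultdict(list)  # source -> [destinations]
for source, relation, destination in edges:
    outgoing[source].append(destination)
    incoming[destination].append(source)

# "source_relation_destination" and "source_destination" encodings.
labeled = ["{}_{}_{}".format(s, r, d) for s, r, d in edges]
unlabeled = ["{}_{}".format(s, d) for s, r, d in edges]

print(dict(incoming), dict(outgoing))
print(labeled, unlabeled)
```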

graph¶: networkx.Graph – A networkx.Graph representation of the DirectedGraph. Used by shortest_path.

bag_of_labeled_dependencies_from_tokens(form)¶: Produces a list of syntactic dependencies where each edge is labeled with its grammatical relation.

bag_of_unlabeled_dependencies_from_tokens(form)¶: Produces a list of syntactic dependencies where each edge is left unlabeled (its grammatical relation is omitted).

`Mention`¶

class processors.odin.Mention(token_interval, sentence, document, foundBy, label, labels=None, trigger=None, arguments=None, paths=None, keep=True, doc_id=None)[source]¶

Bases: processors.ds.NLPDatum

A labeled span of text. Used to model textual mentions of events, relations, and entities.

Parameters:

token_interval (Interval) – The span of the Mention represented as an Interval.
sentence (int) – The sentence index that contains the Mention.
document (Document) – The Document in which the Mention was found.
foundBy (str) – The Odin IE rule that produced this Mention.
label (str) – The label most closely associated with this span. Usually the lowest hyponym of “labels”.
labels (list) – The list of labels associated with this span.
trigger (dict or None) – dict of JSON for Mention’s trigger (event predicate or word(s) signaling the Mention).
arguments (dict or None) – dict of JSON for Mention’s arguments.
paths (dict or None) – dict of JSON encoding the syntactic paths linking a Mention’s arguments to its trigger (applies to Mentions produced from type: "dependency" rules).
doc_id (str or None) – The ID of the document.

tokenInterval¶: processors.ds.Interval – An Interval encoding the start and end of the Mention.

start¶: int – The token index that starts the Mention.

end¶: int – The token index that marks the end of the Mention (exclusive).

sentenceObj¶: processors.ds.Sentence – Pointer to the Sentence instance containing the Mention.

characterStartOffset¶: int – The index of the character that starts the Mention.

characterEndOffset¶: int – The index of the character that ends the Mention.

type¶: Mention.TBM or Mention.EM or Mention.RM – The type of the Mention.
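The token-level start and end of a Mention can be related to its character offsets through the containing sentence's per-token offsets. The sketch below shows one plausible mapping, assuming the character end offset is exclusive like the token-level endOffsets. All values are made up and plain lists stand in for the Sentence and Mention objects.

```python
# Illustrative per-token character offsets for one sentence.
text = "Gonzo married Camilla yesterday."
startOffsets = [0, 6, 14, 22, 31]
endOffsets = [5, 13, 21, 31, 32]

# A mention covering tokens 0..2 (end is exclusive), i.e. "Gonzo married Camilla".
start, end = 0, 3

# Assumption: the first token's start and the last token's end bound the span.
character_start = startOffsets[start]
character_end = endOffsets[end - 1]

print(text[character_start:character_end])
```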

`Interval`¶

class processors.ds.Interval(start, end)[source]¶

Bases: processors.ds.NLPDatum

Defines a token or character span.

Parameters:

start (int) – The token or character index where the interval begins.
end (int) – One more than the index of the last token/character in the span.

contains(that)¶: Tests whether this Interval contains another.

overlaps(that)¶: Tests whether that (int or Interval) overlaps with the span of this Interval. Equivalent Intervals will overlap.
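For readers who want to see the span arithmetic spelled out, the following stand-in class sketches one plausible reading of containment and overlap for half-open spans, where end is one past the last index. `SketchInterval` is a hypothetical class written for this example, and its handling of int arguments is an assumption; it is not processors.ds.Interval.

```python
class SketchInterval:
    """Hypothetical stand-in for a half-open span [start, end)."""

    def __init__(self, start, end):
        self.start, self.end = start, end

    def contains(self, that):
        # An int is contained if it falls inside [start, end).
        if isinstance(that, int):
            return self.start <= that < self.end
        # Another span is contained if it lies entirely within this one.
        return self.start <= that.start and that.end <= self.end

    def overlaps(self, that):
        if isinstance(that, int):
            return self.contains(that)
        # Two half-open spans overlap if each starts before the other ends.
        return self.start < that.end and that.start < self.end


a = SketchInterval(2, 5)
b = SketchInterval(4, 8)
c = SketchInterval(5, 7)  # adjacent to a, shares no index with it
print(a.overlaps(b), a.overlaps(c), a.contains(SketchInterval(3, 5)))
```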

Annotators (Processors)¶

Text annotation is performed by communicating with one of the following annotators (“processors”).

`CluProcessor`¶

class processors.annotators.CluProcessor(address)[source]¶

Bases: processors.annotators.Processor

Processor for text annotation based on [org.clulab.processors.clu.CluProcessor](https://github.com/clulab/processors/blob/master/main/src/main/scala/org/clulab/processors/clu/CluProcessor.scala)

Uses the Malt parser.

`FastNLPProcessor`¶

class processors.annotators.FastNLPProcessor(address)[source]¶

Bases: processors.annotators.Processor

Processor for text annotation based on [org.clulab.processors.fastnlp.FastNLPProcessor](https://github.com/clulab/processors/blob/master/corenlp/src/main/scala/org/clulab/processors/fastnlp/FastNLPProcessor.scala)

Uses the Stanford CoreNLP neural network parser.

`BioNLPProcessor`¶

class processors.annotators.BioNLPProcessor(address)[source]¶

Bases: processors.annotators.Processor

Processor for biomedical text annotation based on [org.clulab.processors.fastnlp.FastNLPProcessor](https://github.com/clulab/processors/blob/master/corenlp/src/main/scala/org/clulab/processors/fastnlp/FastNLPProcessor.scala)

CoreNLP-derived annotator.

Sentiment Analysis¶

`SentimentAnalyzer`¶

class processors.sentiment.SentimentAnalyzer(address)[source]¶: Bases: object

`CoreNLPSentimentAnalyzer`¶

class processors.sentiment.CoreNLPSentimentAnalyzer(address)[source]¶

Bases: processors.sentiment.SentimentAnalyzer

Bridge to [CoreNLP’s tree-based sentiment analysis system](https://nlp.stanford.edu/~socherr/EMNLP2013_RNTN.pdf).

`paths`¶

`DependencyUtils`¶

class processors.paths.DependencyUtils[source]¶

Bases: object

A set of utilities for analyzing syntactic dependency graphs.

build_networkx_graph(roots, edges, name)¶: Constructs a networkx.Graph

shortest_path(g, start, end)¶: Finds the shortest path in a networkx.Graph between any element in a list of start nodes and any element in a list of end nodes.
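The idea of searching from any of several start nodes to any of several end nodes can be sketched with a multi-source breadth-first search. The version below uses a plain adjacency dictionary instead of a networkx.Graph, and `shortest_path_between_sets` is a hypothetical name. It illustrates the technique and is not the code behind shortest_path.

```python
from collections import deque


def shortest_path_between_sets(adjacency, starts, ends):
    """Breadth-first search from every start node at once. Returns the first
    (and therefore shortest) path reaching any end node, or None."""
    targets = set(ends)
    parent = {}
    queue = deque()
    for s in starts:
        if s not in parent:
            parent[s] = None  # start nodes have no predecessor
            queue.append(s)
    while queue:
        node = queue.popleft()
        if node in targets:
            # Walk the parent links back to a start node.
            path = []
            while node is not None:
                path.append(node)
                node = parent[node]
            return path[::-1]
        for neighbor in adjacency.get(node, ()):
            if neighbor not in parent:
                parent[neighbor] = node
                queue.append(neighbor)
    return None


# Undirected view of a small chain: 0 - 1 - 2 - 3 - 4
adjacency = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2, 4], 4: [3]}
path = shortest_path_between_sets(adjacency, [0, 2], [4])
print(path)
```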

retrieve_edges(dep_graph, path)¶: Converts output of shortest_path into a list of triples that include the grammatical relation (and direction) for each node-node “hop” in the syntactic dependency graph.
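Turning a node path into relation-bearing triples amounts to looking up the edge behind each hop and noting whether it was traversed with or against its direction. The sketch below shows that lookup on tuple edges. The ">" and "<" direction prefixes and the name `edges_along_path` are illustrative assumptions, and the exact triple format returned by retrieve_edges may differ.

```python
def edges_along_path(edges, path):
    """For each consecutive pair of nodes in `path`, find the connecting edge
    and record (node, direction + relation, node)."""
    lookup = {(s, d): r for s, r, d in edges}
    triples = []
    for a, b in zip(path, path[1:]):
        if (a, b) in lookup:
            # Hop follows the edge from source to destination.
            triples.append((a, ">" + lookup[(a, b)], b))
        elif (b, a) in lookup:
            # Hop runs against the edge direction.
            triples.append((a, "<" + lookup[(b, a)], b))
        else:
            raise ValueError("no edge between {} and {}".format(a, b))
    return triples


# (source index, relation, destination index)
edges = [(1, "nsubj", 0), (1, "dobj", 2)]
triples = edges_along_path(edges, [0, 1, 2])
print(triples)
```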

simplify_tag(tag)¶: Maps part of speech (PoS) tag to a subset of PoS tags to better consolidate categorical labels.

lexicalize_path(sentence, path, words=False, lemmas=False, tags=False, simple_tags=False, entities=False, limit_to=None)¶: Lexicalizes path in syntactic dependency graph using Odin-style token constraints.

pagerank(networkx_graph, alpha=0.85, personalization=None, max_iter=1000, tol=1e-06, nstart=None, weight='weight', dangling=None)¶: Measures node activity in a networkx.Graph using a thin wrapper around the networkx implementation of the PageRank algorithm (see networkx.algorithms.link_analysis.pagerank). Use with processors.ds.DirectedGraph.graph.
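To give a sense of what the alpha, max_iter, and tol parameters control, here is a small power-iteration sketch of PageRank over a plain dictionary of outgoing links. It is a simplified, unweighted stand-in written for this page (`pagerank_sketch` is a hypothetical name) and omits the personalization, nstart, weight, and dangling options of the networkx function.

```python
def pagerank_sketch(out_links, alpha=0.85, max_iter=1000, tol=1e-06):
    """Power-iteration PageRank over {node: [successor, ...]}.
    Nodes without successors spread their rank uniformly over all nodes."""
    nodes = set(out_links)
    for targets in out_links.values():
        nodes.update(targets)
    nodes = sorted(nodes)
    n = len(nodes)
    rank = {v: 1.0 / n for v in nodes}
    for _ in range(max_iter):
        # Rank currently held by nodes with no outgoing links.
        dangling = sum(rank[v] for v in nodes if not out_links.get(v))
        # Teleportation share plus the redistributed dangling rank.
        new_rank = {v: (1.0 - alpha) / n + alpha * dangling / n for v in nodes}
        for v in nodes:
            targets = out_links.get(v, [])
            for t in targets:
                new_rank[t] += alpha * rank[v] / len(targets)
        err = sum(abs(new_rank[v] - rank[v]) for v in nodes)
        rank = new_rank
        if err < n * tol:
            return rank
    raise RuntimeError("power iteration did not converge")


# Nodes 0 and 2 both point at node 1, which has no outgoing links.
ranks = pagerank_sketch({0: [1], 2: [1], 1: []})
print(max(ranks, key=ranks.get))
```

In this toy graph the node that receives links from both others ends up with the highest score, which is the sense in which the score reflects node activity.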

`HeadFinder`¶

class processors.paths.HeadFinder[source]¶: Bases: object

Serialization¶

`JSONSerializer`¶

class processors.serialization.JSONSerializer[source]¶

Bases: object

Utilities for serialization/deserialization of data structures.

Visualization¶

`JupyterVisualizer`¶

class processors.Visualization.JupyterVisualizer¶
