public interface AmazonTextract
Amazon Textract detects and analyzes text in documents and converts it into machine-readable text. This is the API reference documentation for Amazon Textract.
Modifier and Type | Method and Description |
---|---|
AnalyzeDocumentResult | analyzeDocument(AnalyzeDocumentRequest analyzeDocumentRequest): Analyzes an input document for relationships between detected items. |
DetectDocumentTextResult | detectDocumentText(DetectDocumentTextRequest detectDocumentTextRequest): Detects text in the input document. |
ResponseMetadata | getCachedResponseMetadata(AmazonWebServiceRequest request): Returns additional metadata for a previously executed successful request, typically used for debugging issues where a service isn't acting as expected. |
GetDocumentAnalysisResult | getDocumentAnalysis(GetDocumentAnalysisRequest getDocumentAnalysisRequest): Gets the results for an Amazon Textract asynchronous operation that analyzes text in a document. |
GetDocumentTextDetectionResult | getDocumentTextDetection(GetDocumentTextDetectionRequest getDocumentTextDetectionRequest): Gets the results for an Amazon Textract asynchronous operation that detects text in a document. |
void | setEndpoint(java.lang.String endpoint): Overrides the default endpoint for this client ("https://textract.us-east-1.amazonaws.com"). |
void | setRegion(Region region): An alternative to setEndpoint(String), sets the regional endpoint for this client's service calls. |
void | shutdown(): Shuts down this client object, releasing any resources that might be held open. |
StartDocumentAnalysisResult | startDocumentAnalysis(StartDocumentAnalysisRequest startDocumentAnalysisRequest): Starts the asynchronous analysis of an input document for relationships between detected items such as key-value pairs, tables, and selection elements. |
StartDocumentTextDetectionResult | startDocumentTextDetection(StartDocumentTextDetectionRequest startDocumentTextDetectionRequest): Starts the asynchronous detection of text in a document. |
void setEndpoint(java.lang.String endpoint) throws java.lang.IllegalArgumentException
Callers can pass in just the endpoint (ex:
"textract.us-east-1.amazonaws.com") or a full URL, including the protocol
(ex: "https://textract.us-east-1.amazonaws.com"). If the protocol is not
specified here, the default protocol from this client's
ClientConfiguration
will be used, which by default is HTTPS.
For more information on using AWS regions with the AWS SDK for Java, and a complete list of all available endpoints for all AWS services, see: http://developer.amazonwebservices.com/connect/entry.jspa?externalID=3912
This method is not threadsafe. An endpoint should be configured when the client is created and before any service requests are made. Changing it afterwards creates inevitable race conditions for any service requests in transit or retrying.
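The protocol-defaulting rule described above can be illustrated with a small, standalone sketch. It is not the SDK's implementation: the class `EndpointSketch` and the helper `toUri` are hypothetical names, and the only behavior modeled is "use the given protocol if present, otherwise fall back to a default, and reject values that cannot be parsed".

```java
import java.net.URI;

class EndpointSketch {
    /**
     * Hypothetical helper (not part of the AWS SDK): accepts either a bare
     * host name or a full URL and returns a URI that always has a protocol.
     */
    static URI toUri(String endpoint, String defaultProtocol) {
        if (endpoint == null || endpoint.trim().isEmpty()) {
            throw new IllegalArgumentException("endpoint must not be empty");
        }
        // Only prepend the default protocol when the caller did not supply one.
        String candidate = endpoint.contains("://")
                ? endpoint
                : defaultProtocol + "://" + endpoint;
        // URI.create throws IllegalArgumentException for malformed input.
        URI uri = URI.create(candidate);
        if (uri.getHost() == null) {
            throw new IllegalArgumentException("no host in endpoint: " + endpoint);
        }
        return uri;
    }

    public static void main(String[] args) {
        System.out.println(toUri("textract.us-east-1.amazonaws.com", "https"));
        System.out.println(toUri("http://textract.us-east-1.amazonaws.com", "https"));
    }
}
```

In this sketch the default protocol is passed in explicitly; in the client described here it comes from the ClientConfiguration.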
endpoint
- The endpoint (ex: "textract.us-east-1.amazonaws.com") or a full URL, including the protocol (ex: "https://textract.us-east-1.amazonaws.com") of the region-specific AWS endpoint this client will communicate with.
java.lang.IllegalArgumentException
- If any problems are detected with the specified endpoint.
void setRegion(Region region) throws java.lang.IllegalArgumentException
An alternative to setEndpoint(String), sets the regional endpoint for this client's service calls. Callers can use this method to control which AWS region they want to work with.
By default, all service endpoints in all regions use the https protocol. To use http instead, specify it in the ClientConfiguration supplied at construction.
This method is not threadsafe. A region should be configured when the client is created and before any service requests are made. Changing it afterwards creates inevitable race conditions for any service requests in transit or retrying.
region
- The region this client will communicate with. See Region.getRegion(com.amazonaws.regions.Regions) for accessing a given region.
java.lang.IllegalArgumentException
- If the given region is null, or if this service isn't available in the given region. See Region.isServiceSupported(String).
See also: Region.getRegion(com.amazonaws.regions.Regions), Region.createClient(Class, com.amazonaws.auth.AWSCredentialsProvider, ClientConfiguration)
AnalyzeDocumentResult analyzeDocument(AnalyzeDocumentRequest analyzeDocumentRequest) throws AmazonClientException, AmazonServiceException
Analyzes an input document for relationships between detected items.
The types of information returned are as follows:
- Form data (key-value pairs). The related information is returned in two Block objects, each of type KEY_VALUE_SET: a KEY Block object and a VALUE Block object. For example, Name: Ana Silva Carolina contains a key and value. Name: is the key. Ana Silva Carolina is the value.
- Table and table cell data. A TABLE Block object contains information about a detected table. A CELL Block object is returned for each cell in a table.
- Lines and words of text. A LINE Block object contains one or more WORD Block objects. All lines and words that are detected in the document are returned (including text that doesn't have a relationship with the value of FeatureTypes).
Selection elements such as check boxes and option buttons (radio buttons) can be detected in form data and in tables. A SELECTION_ELEMENT Block object contains information about a selection element, including the selection status.
You can choose which type of analysis to perform by specifying the FeatureTypes list.
The output is returned in a list of Block objects.
AnalyzeDocument is a synchronous operation. To analyze documents asynchronously, use StartDocumentAnalysis.
For more information, see Document Text Analysis.
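The key-value description above can be made concrete with a minimal sketch of how a caller might pair KEY blocks with their VALUE blocks. The `Block` class below is a simplified, hypothetical stand-in and not the SDK type. Two things are assumptions made for brevity: each KEY block carries the ids of its related VALUE blocks, and the text is stored directly on the block.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class KeyValueSketch {
    /** Simplified, hypothetical stand-in for a Textract Block (not the SDK class). */
    static final class Block {
        final String id;
        final String blockType;      // e.g. "KEY_VALUE_SET"
        final String entityType;     // "KEY" or "VALUE"
        final String text;           // assumption: text stored directly on the block
        final List<String> valueIds; // assumption: a KEY block lists its VALUE block ids

        Block(String id, String blockType, String entityType, String text, List<String> valueIds) {
            this.id = id;
            this.blockType = blockType;
            this.entityType = entityType;
            this.text = text;
            this.valueIds = valueIds;
        }
    }

    /** Returns key text mapped to value text, in the order keys appear. */
    static Map<String, String> pairKeysWithValues(List<Block> blocks) {
        // Index every block by id so related blocks can be looked up.
        Map<String, Block> byId = new HashMap<>();
        for (Block b : blocks) {
            byId.put(b.id, b);
        }
        Map<String, String> pairs = new LinkedHashMap<>();
        for (Block b : blocks) {
            if (!"KEY_VALUE_SET".equals(b.blockType) || !"KEY".equals(b.entityType)) {
                continue; // only KEY blocks start a pair
            }
            for (String valueId : b.valueIds) {
                Block value = byId.get(valueId);
                if (value != null && "VALUE".equals(value.entityType)) {
                    pairs.put(b.text, value.text);
                }
            }
        }
        return pairs;
    }

    /** Builds the "Name: Ana Silva Carolina" example from the text above. */
    static Map<String, String> demo() {
        List<Block> blocks = Arrays.asList(
                new Block("k1", "KEY_VALUE_SET", "KEY", "Name:", Arrays.asList("v1")),
                new Block("v1", "KEY_VALUE_SET", "VALUE", "Ana Silva Carolina",
                        Collections.<String>emptyList()));
        return pairKeysWithValues(blocks);
    }

    public static void main(String[] args) {
        System.out.println(demo());
    }
}
```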
analyzeDocumentRequest
InvalidParameterException
InvalidS3ObjectException
UnsupportedDocumentException
DocumentTooLargeException
BadDocumentException
AccessDeniedException
ProvisionedThroughputExceededException
InternalServerErrorException
ThrottlingException
HumanLoopQuotaExceededException
AmazonClientException
- If any internal errors are encountered inside the client while attempting to make the request or handle the response. For example, if a network connection is not available.
AmazonServiceException
- If an error response is returned by Amazon Textract indicating either a problem with the data in the request, or a server-side issue.
DetectDocumentTextResult detectDocumentText(DetectDocumentTextRequest detectDocumentTextRequest) throws AmazonClientException, AmazonServiceException
Detects text in the input document. Amazon Textract can detect lines of text and the words that make up a line of text. The input document must be an image in JPEG or PNG format. DetectDocumentText returns the detected text in an array of Block objects.
Each document page has an associated Block of type PAGE. Each PAGE Block object is the parent of LINE Block objects that represent the lines of detected text on a page. A LINE Block object is a parent for each word that makes up the line. Words are represented by Block objects of type WORD.
DetectDocumentText is a synchronous operation. To analyze documents asynchronously, use StartDocumentTextDetection.
For more information, see Document Text Detection.
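The PAGE, LINE, and WORD parent-child structure described above can be walked as in the following minimal sketch. The `Block` class here is a hypothetical, simplified stand-in for the SDK type, and it assumes that each parent block lists the ids of its children in reading order.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.StringJoiner;

class BlockTreeSketch {
    /** Simplified, hypothetical stand-in for a Textract Block (not the SDK class). */
    static final class Block {
        final String id;
        final String type;           // "PAGE", "LINE" or "WORD"
        final String text;           // only used for WORD blocks in this sketch
        final List<String> childIds; // assumption: children listed in reading order

        Block(String id, String type, String text, List<String> childIds) {
            this.id = id;
            this.type = type;
            this.text = text;
            this.childIds = childIds;
        }
    }

    /** Rebuilds the text of every line by following PAGE -> LINE -> WORD links. */
    static List<String> lines(List<Block> blocks) {
        Map<String, Block> byId = new HashMap<>();
        for (Block b : blocks) {
            byId.put(b.id, b);
        }
        List<String> out = new ArrayList<>();
        for (Block page : blocks) {
            if (!"PAGE".equals(page.type)) {
                continue;
            }
            for (String lineId : page.childIds) {
                Block line = byId.get(lineId);
                if (line == null || !"LINE".equals(line.type)) {
                    continue;
                }
                StringJoiner words = new StringJoiner(" ");
                for (String wordId : line.childIds) {
                    Block word = byId.get(wordId);
                    if (word != null && "WORD".equals(word.type)) {
                        words.add(word.text);
                    }
                }
                out.add(words.toString());
            }
        }
        return out;
    }

    /** One page with a single two-word line, using made-up ids. */
    static List<String> demo() {
        List<String> none = Collections.emptyList();
        return lines(Arrays.asList(
                new Block("p1", "PAGE", null, Arrays.asList("l1")),
                new Block("l1", "LINE", null, Arrays.asList("w1", "w2")),
                new Block("w1", "WORD", "Hello", none),
                new Block("w2", "WORD", "world", none)));
    }

    public static void main(String[] args) {
        System.out.println(demo());
    }
}
```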
detectDocumentTextRequest
InvalidParameterException
InvalidS3ObjectException
UnsupportedDocumentException
DocumentTooLargeException
BadDocumentException
AccessDeniedException
ProvisionedThroughputExceededException
InternalServerErrorException
ThrottlingException
AmazonClientException
- If any internal errors are encountered inside the client while attempting to make the request or handle the response. For example, if a network connection is not available.
AmazonServiceException
- If an error response is returned by Amazon Textract indicating either a problem with the data in the request, or a server-side issue.
GetDocumentAnalysisResult getDocumentAnalysis(GetDocumentAnalysisRequest getDocumentAnalysisRequest) throws AmazonClientException, AmazonServiceException
Gets the results for an Amazon Textract asynchronous operation that analyzes text in a document.
You start asynchronous text analysis by calling StartDocumentAnalysis, which returns a job identifier (JobId). When the text analysis operation finishes, Amazon Textract publishes a completion status to the Amazon Simple Notification Service (Amazon SNS) topic that's registered in the initial call to StartDocumentAnalysis. To get the results of the text analysis operation, first check that the status value published to the Amazon SNS topic is SUCCEEDED. If so, call GetDocumentAnalysis, and pass the job identifier (JobId) from the initial call to StartDocumentAnalysis.
GetDocumentAnalysis returns an array of Block objects. The following types of information are returned:
- Form data (key-value pairs). The related information is returned in two Block objects, each of type KEY_VALUE_SET: a KEY Block object and a VALUE Block object. For example, Name: Ana Silva Carolina contains a key and value. Name: is the key. Ana Silva Carolina is the value.
- Table and table cell data. A TABLE Block object contains information about a detected table. A CELL Block object is returned for each cell in a table.
- Lines and words of text. A LINE Block object contains one or more WORD Block objects. All lines and words that are detected in the document are returned (including text that doesn't have a relationship with the value of the StartDocumentAnalysis FeatureTypes input parameter).
Selection elements such as check boxes and option buttons (radio buttons) can be detected in form data and in tables. A SELECTION_ELEMENT Block object contains information about a selection element, including the selection status.
Use the MaxResults parameter to limit the number of blocks that are returned. If there are more results than specified in MaxResults, the value of NextToken in the operation response contains a pagination token for getting the next set of results. To get the next page of results, call GetDocumentAnalysis, and populate the NextToken request parameter with the token value that's returned from the previous call to GetDocumentAnalysis.
For more information, see Document Text Analysis.
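The NextToken pagination described above follows a common loop shape, sketched below without the SDK. `Page` and `fetchAll` are hypothetical names. The `Function<String, Page>` stands in for a call that takes the previous token (null on the first request) and returns one page of results. The sketch assumes that a null token in a response means there are no more pages.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Function;

class PaginationSketch {
    /** Hypothetical stand-in for one page of a paginated response. */
    static final class Page {
        final List<String> blocks;
        final String nextToken; // assumption: null when there are no more pages

        Page(List<String> blocks, String nextToken) {
            this.blocks = blocks;
            this.nextToken = nextToken;
        }
    }

    /** Calls fetchPage repeatedly, feeding each returned token into the next call. */
    static List<String> fetchAll(Function<String, Page> fetchPage) {
        List<String> all = new ArrayList<>();
        String token = null; // the first request carries no token
        do {
            Page page = fetchPage.apply(token);
            all.addAll(page.blocks);
            token = page.nextToken;
        } while (token != null);
        return all;
    }

    /** Simulates a service that returns five blocks across three pages. */
    static List<String> demo() {
        Map<String, Page> pagesByToken = new HashMap<>();
        pagesByToken.put(null, new Page(Arrays.asList("b1", "b2"), "t1"));
        pagesByToken.put("t1", new Page(Arrays.asList("b3", "b4"), "t2"));
        pagesByToken.put("t2", new Page(Arrays.asList("b5"), null));
        return fetchAll(pagesByToken::get);
    }

    public static void main(String[] args) {
        System.out.println(demo());
    }
}
```

The same loop shape applies to GetDocumentTextDetection, which uses the same MaxResults and NextToken parameters.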
getDocumentAnalysisRequest
InvalidParameterException
AccessDeniedException
ProvisionedThroughputExceededException
InvalidJobIdException
InternalServerErrorException
ThrottlingException
InvalidS3ObjectException
AmazonClientException
- If any internal errors are encountered inside the client while attempting to make the request or handle the response. For example, if a network connection is not available.
AmazonServiceException
- If an error response is returned by Amazon Textract indicating either a problem with the data in the request, or a server-side issue.
GetDocumentTextDetectionResult getDocumentTextDetection(GetDocumentTextDetectionRequest getDocumentTextDetectionRequest) throws AmazonClientException, AmazonServiceException
Gets the results for an Amazon Textract asynchronous operation that detects text in a document. Amazon Textract can detect lines of text and the words that make up a line of text.
You start asynchronous text detection by calling StartDocumentTextDetection, which returns a job identifier (JobId). When the text detection operation finishes, Amazon Textract publishes a completion status to the Amazon Simple Notification Service (Amazon SNS) topic that's registered in the initial call to StartDocumentTextDetection. To get the results of the text-detection operation, first check that the status value published to the Amazon SNS topic is SUCCEEDED. If so, call GetDocumentTextDetection, and pass the job identifier (JobId) from the initial call to StartDocumentTextDetection.
GetDocumentTextDetection returns an array of Block objects.
Each document page has an associated Block of type PAGE. Each PAGE Block object is the parent of LINE Block objects that represent the lines of detected text on a page. A LINE Block object is a parent for each word that makes up the line. Words are represented by Block objects of type WORD.
Use the MaxResults parameter to limit the number of blocks that are returned. If there are more results than specified in MaxResults, the value of NextToken in the operation response contains a pagination token for getting the next set of results. To get the next page of results, call GetDocumentTextDetection, and populate the NextToken request parameter with the token value that's returned from the previous call to GetDocumentTextDetection.
For more information, see Document Text Detection.
getDocumentTextDetectionRequest
InvalidParameterException
AccessDeniedException
ProvisionedThroughputExceededException
InvalidJobIdException
InternalServerErrorException
ThrottlingException
InvalidS3ObjectException
AmazonClientException
- If any internal errors are encountered inside the client while attempting to make the request or handle the response. For example, if a network connection is not available.
AmazonServiceException
- If an error response is returned by Amazon Textract indicating either a problem with the data in the request, or a server-side issue.
StartDocumentAnalysisResult startDocumentAnalysis(StartDocumentAnalysisRequest startDocumentAnalysisRequest) throws AmazonClientException, AmazonServiceException
Starts the asynchronous analysis of an input document for relationships between detected items such as key-value pairs, tables, and selection elements.
StartDocumentAnalysis can analyze text in documents that are in JPEG, PNG, and PDF format. The documents are stored in an Amazon S3 bucket. Use DocumentLocation to specify the bucket name and file name of the document.
StartDocumentAnalysis returns a job identifier (JobId) that you use to get the results of the operation. When text analysis is finished, Amazon Textract publishes a completion status to the Amazon Simple Notification Service (Amazon SNS) topic that you specify in NotificationChannel. To get the results of the text analysis operation, first check that the status value published to the Amazon SNS topic is SUCCEEDED. If so, call GetDocumentAnalysis, and pass the job identifier (JobId) from the initial call to StartDocumentAnalysis.
For more information, see Document Text Analysis.
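The "check the status, then fetch" step described above can be sketched as a small guard. This is an illustration only: `AsyncJobSketch` and `resultsIfSucceeded` are hypothetical names, the job id and status are assumed to have already been extracted from the Amazon SNS notification, and the `Function` argument stands in for the call that retrieves results by JobId.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Optional;
import java.util.function.Function;

class AsyncJobSketch {
    /**
     * Fetches results only when the notification belongs to the job that was
     * started and reports SUCCEEDED; otherwise returns an empty Optional.
     */
    static Optional<List<String>> resultsIfSucceeded(
            String startedJobId,
            String notifiedJobId,
            String notifiedStatus,
            Function<String, List<String>> getResults) {
        if (!startedJobId.equals(notifiedJobId)) {
            return Optional.empty(); // notification is for a different job
        }
        if (!"SUCCEEDED".equals(notifiedStatus)) {
            return Optional.empty(); // job did not succeed, so do not fetch results
        }
        return Optional.of(getResults.apply(startedJobId));
    }

    public static void main(String[] args) {
        // Hypothetical stand-in for the result-fetching call.
        Function<String, List<String>> fakeGet = jobId -> Arrays.asList("block-for-" + jobId);

        System.out.println(resultsIfSucceeded("job-1", "job-1", "SUCCEEDED", fakeGet));
        System.out.println(resultsIfSucceeded("job-1", "job-1", "FAILED", fakeGet));
    }
}
```

The same guard applies to the text detection flow, with StartDocumentTextDetection and GetDocumentTextDetection in place of the analysis operations.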
startDocumentAnalysisRequest
InvalidParameterException
InvalidS3ObjectException
InvalidKMSKeyException
UnsupportedDocumentException
DocumentTooLargeException
BadDocumentException
AccessDeniedException
ProvisionedThroughputExceededException
InternalServerErrorException
IdempotentParameterMismatchException
ThrottlingException
LimitExceededException
AmazonClientException
- If any internal errors are encountered inside the client while attempting to make the request or handle the response. For example, if a network connection is not available.
AmazonServiceException
- If an error response is returned by Amazon Textract indicating either a problem with the data in the request, or a server-side issue.
StartDocumentTextDetectionResult startDocumentTextDetection(StartDocumentTextDetectionRequest startDocumentTextDetectionRequest) throws AmazonClientException, AmazonServiceException
Starts the asynchronous detection of text in a document. Amazon Textract can detect lines of text and the words that make up a line of text.
StartDocumentTextDetection can analyze text in documents that are in JPEG, PNG, and PDF format. The documents are stored in an Amazon S3 bucket. Use DocumentLocation to specify the bucket name and file name of the document.
StartDocumentTextDetection returns a job identifier (JobId) that you use to get the results of the operation. When text detection is finished, Amazon Textract publishes a completion status to the Amazon Simple Notification Service (Amazon SNS) topic that you specify in NotificationChannel. To get the results of the text detection operation, first check that the status value published to the Amazon SNS topic is SUCCEEDED. If so, call GetDocumentTextDetection, and pass the job identifier (JobId) from the initial call to StartDocumentTextDetection.
For more information, see Document Text Detection.
startDocumentTextDetectionRequest
InvalidParameterException
InvalidS3ObjectException
InvalidKMSKeyException
UnsupportedDocumentException
DocumentTooLargeException
BadDocumentException
AccessDeniedException
ProvisionedThroughputExceededException
InternalServerErrorException
IdempotentParameterMismatchException
ThrottlingException
LimitExceededException
AmazonClientException
- If any internal errors are encountered inside the client while attempting to make the request or handle the response. For example, if a network connection is not available.
AmazonServiceException
- If an error response is returned by Amazon Textract indicating either a problem with the data in the request, or a server-side issue.
void shutdown()
ResponseMetadata getCachedResponseMetadata(AmazonWebServiceRequest request)
Response metadata is only cached for a limited period of time, so if you need to access this extra diagnostic information for an executed request, you should use this method to retrieve it as soon as possible after executing a request.
request
- The originally executed request.
Copyright © 2018 Amazon Web Services, Inc. All Rights Reserved.