Form Recognizer 2021-09-30-preview
Form Recognizer extracts information from forms and images into structured data. It includes the following options:
- Layout - Extracts text and table structure from documents using optical character recognition (OCR).
- Document - Extracts text, selection marks, tables, entities, and general key-value pairs from documents.
- Business Card - Detects and extracts data from business cards using optical character recognition (OCR) and our business card model, enabling you to easily extract structured data from business cards such as contact names, company names, phone numbers, emails, and more.
- ID Document - Detects and extracts data from identification documents using optical character recognition (OCR) and our ID document model, enabling you to easily extract structured data from ID documents such as first name, last name, date of birth, document number, and more.
- Invoices - Detects and extracts data from invoices using optical character recognition (OCR) and our invoice understanding deep learning models, enabling you to easily extract structured data from invoices such as customer, vendor, invoice ID, invoice due date, total, invoice amount due, tax amount, ship to, bill to, line items and more.
- Receipt - Detects and extracts data from receipts using optical character recognition (OCR) and our receipt model, enabling you to easily extract structured data from receipts such as merchant name, merchant phone number, transaction date, transaction total, and more.
- Custom - Extracts information from forms (PDFs and images) into structured data based on a model created from a set of representative training forms. Form Recognizer learns the structure of your forms to intelligently extract text and data. It ingests text from forms, applies machine learning technology to identify keys, tables, and fields, and then outputs structured data that includes the relationships within the original file.
Analyze - Get analyze result
Gets the result of document analysis.
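Analysis is a long-running operation, so a client typically calls this endpoint repeatedly until the status is terminal. The sketch below assumes that "succeeded" and "failed" (both shown in the examples on this page) are the only terminal values, and that non-terminal values such as "notStarted" and "running" are also possible. `fetch` is a hypothetical callable that sends the GET request and returns the decoded JSON body. The interval and timeout defaults are illustrative choices, not service guidance.

```python
import time
from typing import Any, Callable, Dict

# Assumed terminal values; "succeeded" and "failed" appear in this page's examples.
TERMINAL_STATUSES = {"succeeded", "failed"}


def poll_analyze_result(
    fetch: Callable[[], Dict[str, Any]],
    interval_s: float = 1.0,
    timeout_s: float = 120.0,
    sleep: Callable[[float], None] = time.sleep,
) -> Dict[str, Any]:
    """Call `fetch` until the operation reaches a terminal status.

    `fetch` is a hypothetical callable that sends GET .../analyzeResults/{resultId}
    and returns the decoded JSON body. Raises TimeoutError if the operation is
    still non-terminal once `timeout_s` seconds of waiting have elapsed.
    """
    waited = 0.0
    while True:
        body = fetch()
        if body.get("status") in TERMINAL_STATUSES:
            return body
        if waited >= timeout_s:
            raise TimeoutError(f"status still {body.get('status')!r} after {waited:.0f}s")
        sleep(interval_s)  # injectable so callers and tests can control waiting
        waited += interval_s
```

A returned body can still have status "failed", so callers should check the status before they read analyzeResult.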
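Request URL
GET https://*.cognitiveservices.azure.com/formrecognizer/documentModels/{modelId}/analyzeResults/{resultId}?api-version=2021-09-30-preview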
Request parameters
Name | Type | Description |
---|---|---|
modelId | string | Unique model name. Format: [a-zA-Z0-9][a-zA-Z0-9._~-]{1,63}. |
resultId | string | Analyze operation result ID. |
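The sketch below checks modelId against the format given above and then assembles the request URL. The endpoint `https://example.cognitiveservices.azure.com` used in the usage line is a hypothetical stand-in for the `*` placeholder in the samples on this page. The function name is illustrative.

```python
import re
from urllib.parse import quote, urlencode

# Format documented for modelId on this page.
MODEL_ID_PATTERN = re.compile(r"[a-zA-Z0-9][a-zA-Z0-9._~-]{1,63}")
API_VERSION = "2021-09-30-preview"


def analyze_result_url(endpoint: str, model_id: str, result_id: str) -> str:
    """Build the Get Analyze Result URL after validating the path parameters."""
    if not MODEL_ID_PATTERN.fullmatch(model_id):
        raise ValueError(f"invalid modelId: {model_id!r}")
    if not result_id:
        raise ValueError("resultId must not be empty")
    # Percent-encode each path segment defensively (safe="" also encodes '/').
    path = "/formrecognizer/documentModels/{}/analyzeResults/{}".format(
        quote(model_id, safe=""), quote(result_id, safe="")
    )
    return endpoint.rstrip("/") + path + "?" + urlencode({"api-version": API_VERSION})


print(analyze_result_url("https://example.cognitiveservices.azure.com/", "prebuilt-invoice", "3b8f6d2a"))
```

Because the pattern requires a first character plus 1 to 63 more, a one-character ID such as "a" is rejected. So is an ID that starts with "-" or ".".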
Request headers
Name | Type | Description |
---|---|---|
Ocp-Apim-Subscription-Key | string | Subscription key that provides access to this API. Found in your Cognitive Services account. |
Request body
This operation does not take a request body.
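The sketch below shows the request shape described above using only Python's standard library: a GET with no body, with the key sent in the Ocp-Apim-Subscription-Key header. Function names and the 30-second timeout are illustrative assumptions. On an HTTP error, the body is decoded as the JSON error envelope described under "Error".

```python
import json
import urllib.error
import urllib.request
from typing import Any, Dict, Tuple


def build_request(url: str, subscription_key: str) -> urllib.request.Request:
    # No data argument, so the GET carries no body; the key travels in a header.
    return urllib.request.Request(
        url, headers={"Ocp-Apim-Subscription-Key": subscription_key}, method="GET"
    )


def get_analyze_result(url: str, subscription_key: str, timeout_s: float = 30.0) -> Tuple[int, Dict[str, Any]]:
    """Send the request and return (HTTP status, decoded JSON body)."""
    request = build_request(url, subscription_key)
    try:
        with urllib.request.urlopen(request, timeout=timeout_s) as response:
            return response.status, json.loads(response.read().decode("utf-8"))
    except urllib.error.HTTPError as err:
        # 4xx/5xx responses carry the error JSON envelope.
        payload = err.read().decode("utf-8")
        return err.code, json.loads(payload) if payload else {}
```

urllib stores header names via str.capitalize(), so the key reads back as "Ocp-apim-subscription-key". HTTP header names are case-insensitive, so this does not matter on the wire.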
Response 200
Supported Document Fields
prebuilt:businesscard
Field | Type | Description | Example |
---|---|---|---|
ContactNames | array | ||
ContactNames.* | object | Contact name | Chris Smith |
ContactNames.*.FirstName | string | First (given) name of contact | Chris |
ContactNames.*.LastName | string | Last (family) name of contact | Smith |
CompanyNames | array | ||
CompanyNames.* | string | Company name | CONTOSO |
JobTitles | array | ||
JobTitles.* | string | Job title | Senior Researcher |
Departments | array | ||
Departments.* | string | Department or organization | Cloud & AI Department |
Addresses | array | ||
Addresses.* | string | Address | 4001 1st Ave NE Redmond, WA 98052 |
WorkPhones | array | ||
WorkPhones.* | phoneNumber | Work phone number | +1 (987) 213-5674 |
MobilePhones | array | ||
MobilePhones.* | phoneNumber | Mobile phone number | +1 (987) 123-4567 |
Faxes | array | ||
Faxes.* | phoneNumber | Fax number | +1 (987) 312-6745 |
OtherPhones | array | ||
OtherPhones.* | phoneNumber | Other phone number | +1 (987) 213-5673 |
Emails | array | ||
Emails.* | string | Contact email | chris.smith@contoso.com |
Websites | array | ||
Websites.* | string | Website | https://www.contoso.com |
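The Field column uses a dotted path notation: `*` stands for every element of an array, and a name selects a member of an object (for example ContactNames.*.FirstName). The sketch below resolves such a path against the "fields" object of an analyzed document. It assumes that array fields expose their elements under "valueArray" and object fields expose their members under "valueObject". Both key names are assumptions that follow the "valueString" pattern in the response example; they are not confirmed on this page.

```python
from typing import Any, Dict, List


def resolve_fields(fields: Dict[str, Any], path: str) -> List[Dict[str, Any]]:
    """Return every field matching a dotted path such as 'ContactNames.*.FirstName'.

    Assumes arrays hold elements in 'valueArray' and objects hold members in
    'valueObject' (hypothetical key names).
    """
    head, *rest = path.split(".")
    current = [fields[head]] if head in fields else []
    for part in rest:
        matched: List[Dict[str, Any]] = []
        for field in current:
            if part == "*":
                # Fan out over all array elements.
                matched.extend(field.get("valueArray", []))
            else:
                member = field.get("valueObject", {}).get(part)
                if member is not None:
                    matched.append(member)
        current = matched
    return current
```

A missing field yields an empty list rather than an error, because the service does not return fields that it did not find on the card.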
prebuilt:idDocument:driverLicense
Field | Type | Description | Example |
---|---|---|---|
CountryRegion | countryRegion | Country or region code | USA |
Region | string | State or province | Washington |
DocumentNumber | string | Driver license number | WDLABCD456DG |
FirstName | string | Given name and middle initial if applicable | LIAM R. |
LastName | string | Surname | TALBOT |
Address | string | Address | 123 STREET ADDRESS YOUR CITY WA 99999-1234 |
DateOfBirth | date | Date of birth (DOB) | 01/06/1958 |
DateOfExpiration | date | Date of expiration (EXP) | 08/12/2020 |
Sex | string | Sex | M |
Endorsements | string | Endorsements | L |
Restrictions | string | Restrictions | B |
VehicleClassifications | string | Vehicle classification | D |
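The Example column shows dates as printed on the license (for example 01/06/1958). The sketch below assumes that a date field also carries a normalized ISO 8601 "valueDate" ("YYYY-MM-DD"), following the "valueString" pattern in the response example. That key is an assumption. The sample data further assumes US month/day order when it maps 01/06/1958 to 1958-01-06.

```python
from datetime import date
from typing import Any, Dict, Optional


def field_date(fields: Dict[str, Any], name: str) -> Optional[date]:
    """Normalized date of a field, assuming an ISO 'valueDate' key (hypothetical)."""
    field = fields.get(name)
    if not field or "valueDate" not in field:
        return None
    return date.fromisoformat(field["valueDate"])


def age_on(birth: date, today: date) -> int:
    # Subtract one year if this year's birthday has not happened yet.
    return today.year - birth.year - ((today.month, today.day) < (birth.month, birth.day))


def is_expired(fields: Dict[str, Any], today: date) -> Optional[bool]:
    """True if DateOfExpiration is before `today`; None if the field is absent."""
    expiration = field_date(fields, "DateOfExpiration")
    return None if expiration is None else expiration < today
```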
prebuilt:idDocument:passport
Field | Type | Description | Example |
---|---|---|---|
MachineReadableZone | object | Machine readable zone (MRZ) | |
MachineReadableZone.FirstName | string | Given name and middle initial if applicable | JENNIFER |
MachineReadableZone.LastName | string | Surname | BROOKS |
MachineReadableZone.DocumentNumber | string | Passport number | 340020013 |
MachineReadableZone.CountryRegion | countryRegion | Issuing country or organization | USA |
MachineReadableZone.Nationality | countryRegion | Nationality | USA |
MachineReadableZone.DateOfBirth | date | Date of birth | 1980-01-01 |
MachineReadableZone.DateOfExpiration | date | Date of expiration | 201-05-05 |
MachineReadableZone.Sex | string | Sex | F |
prebuilt:invoice
Field | Type | Description | Example |
---|---|---|---|
CustomerName | string | Customer being invoiced | Microsoft Corp |
CustomerId | string | Reference ID for the customer | CID-12345 |
PurchaseOrder | string | A purchase order reference number | PO-3333 |
InvoiceId | string | ID for this specific invoice (often 'Invoice Number') | INV-100 |
InvoiceDate | date | Date the invoice was issued | 11/15/2019 |
DueDate | date | Date payment for this invoice is due | 12/15/2019 |
VendorName | string | Vendor who has created this invoice | CONTOSO LTD. |
VendorAddress | string | Mailing address for the Vendor | 123 456th St New York, NY, 10001 |
VendorAddressRecipient | string | Name associated with the VendorAddress | Contoso Headquarters |
CustomerAddress | string | Mailing address for the Customer | 123 Other St, Redmond WA, 98052 |
CustomerAddressRecipient | string | Name associated with the CustomerAddress | Microsoft Corp |
BillingAddress | string | Explicit billing address for the customer | 123 Bill St, Redmond WA, 98052 |
BillingAddressRecipient | string | Name associated with the BillingAddress | Microsoft Services |
ShippingAddress | string | Explicit shipping address for the customer | 123 Ship St, Redmond WA, 98052 |
ShippingAddressRecipient | string | Name associated with the ShippingAddress | Microsoft Delivery |
SubTotal | number | Subtotal field identified on this invoice | $100.00 |
TotalTax | number | Total tax field identified on this invoice | $10.00 |
InvoiceTotal | number | Total new charges associated with this invoice | $110.00 |
AmountDue | number | Total amount due to the vendor | $610.00 |
PreviousUnpaidBalance | number | Explicit previously unpaid balance | $500.00 |
RemittanceAddress | string | Explicit remittance or payment address for the customer | 123 Remit St New York, NY, 10001 |
RemittanceAddressRecipient | string | Name associated with the RemittanceAddress | Contoso Billing |
ServiceAddress | string | Explicit service address or property address for the customer | 123 Service St, Redmond WA, 98052 |
ServiceAddressRecipient | string | Name associated with the ServiceAddress | Microsoft Services |
ServiceStartDate | date | First date for the service period (for example, a utility bill service period) | 10/14/2019 |
ServiceEndDate | date | End date for the service period (for example, a utility bill service period) | 11/14/2019 |
Items | array | List of line items | |
Items.* | object | A single line item | 3/4/2021 A123 Consulting Services 2 hours $30.00 10% $60.00 |
Items.*.Amount | number | The amount of the line item | $60.00 |
Items.*.Date | date | Date corresponding to each line item, often the date the line item was shipped | 3/4/2021 |
Items.*.Description | string | The text description for the invoice line item | Consulting Services |
Items.*.Quantity | number | The quantity for this invoice line item | 2 |
Items.*.ProductCode | string | Product code, product number, or SKU associated with the specific line item | A123 |
Items.*.Tax | number | Tax associated with each line item. Possible values include tax amount, tax %, and tax Y/N | 10% |
Items.*.Unit | string | The unit of the line item, for example kg or lb | hours |
Items.*.UnitPrice | number | The net or gross price (depending on the gross invoice setting of the invoice) of one unit of this item | $30.00 |
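The example values in the invoice table agree arithmetically: SubTotal + TotalTax = InvoiceTotal (100 + 10 = 110), InvoiceTotal + PreviousUnpaidBalance = AmountDue (110 + 500 = 610), and Quantity × UnitPrice = Amount for the line item (2 × 30 = 60). The sketch below is a hypothetical post-extraction sanity check built on those relationships. The service does not guarantee them, because discounts, shipping, or tax-inclusive pricing can break them, so treat a mismatch as a flag for review rather than as an extraction error. It uses Decimal to avoid binary floating-point rounding on currency.

```python
from decimal import Decimal
from typing import Dict, List

CENT = Decimal("0.01")


def invoice_mismatches(
    totals: Dict[str, Decimal], items: List[Dict[str, Decimal]], tolerance: Decimal = CENT
) -> List[str]:
    """Hypothetical consistency checks between extracted invoice amounts."""
    problems: List[str] = []

    def check(label: str, expected: Decimal, found: Decimal) -> None:
        if abs(expected - found) > tolerance:
            problems.append(f"{label}: expected {expected}, found {found}")

    # Only check relationships whose fields were all extracted.
    if all(k in totals for k in ("SubTotal", "TotalTax", "InvoiceTotal")):
        check("InvoiceTotal", totals["SubTotal"] + totals["TotalTax"], totals["InvoiceTotal"])
    if all(k in totals for k in ("InvoiceTotal", "PreviousUnpaidBalance", "AmountDue")):
        check("AmountDue", totals["InvoiceTotal"] + totals["PreviousUnpaidBalance"], totals["AmountDue"])
    for i, item in enumerate(items):
        if all(k in item for k in ("Quantity", "UnitPrice", "Amount")):
            check(f"Items[{i}].Amount", item["Quantity"] * item["UnitPrice"], item["Amount"])
    return problems
```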
prebuilt:receipt
Field | Type | Description | Example |
---|---|---|---|
ReceiptType | string | Type of receipt | Itemized |
Locale | string | Locale | en-US |
MerchantName | string | Name of the merchant issuing the receipt | Contoso |
MerchantPhoneNumber | phoneNumber | Listed phone number of merchant | 987-654-3210 |
MerchantAddress | string | Listed address of merchant | 123 Main St Redmond WA 98052 |
Total | number | Full transaction total of receipt | $14.34 |
TransactionDate | date | Date the receipt was issued | June 06, 2019 |
TransactionTime | time | Time the receipt was issued | 4:49 PM |
Subtotal | number | Subtotal of receipt, often before taxes are applied | $12.34 |
Tax | number | Tax on receipt, often sales tax or equivalent | $2.00 |
Tip | number | Tip included by buyer | $1.00 |
ArrivalDate | date | Date of arrival | 27Mar21 |
DepartureDate | date | Date of departure | 28Mar21 |
Currency | currency | Currency unit of receipt amounts, or 'MIXED' if multiple values are found | USD |
MerchantAliases | array | ||
MerchantAliases.* | string | Alternative name of merchant | Contoso (R) |
Items | array | ||
Items.* | object | Extracted line item | 1 Surface Pro 6 $999.00 $999.00 |
Items.*.TotalPrice | number | Total price of line item | $999.00 |
Items.*.Name | string | Item name | Surface Pro 6 |
Items.*.Quantity | number | Quantity of each item | 1 |
Items.*.Price | number | Individual price of each item unit | $999.00 |
Items.*.Description | string | Item description | Room Charge |
Items.*.Date | date | Item date | 27Mar21 |
Items.*.Category | string | Item category | Room |
Error
Form Recognizer uses a unified design to represent all errors encountered in the REST APIs. Whenever an API operation returns a 4xx or 5xx status code, additional information about the error is returned in the response JSON body as follows:
{
"error": {
"code": "InvalidRequest",
"message": "Invalid request.",
"innererror": {
"code": "InvalidContent",
"message": "The file format is unsupported or corrupted. Refer to documentation for the list of supported formats."
}
}
}
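The sketch below reads the error envelope shown above and returns the top-level code, the chain of inner error codes, and the most specific message available. Only one innererror level appears on this page. The loop that follows deeper nesting is a defensive assumption and is harmless when there is only one level.

```python
import json
from typing import List, Tuple


def describe_error(body: str) -> Tuple[str, List[str], str]:
    """Return (top-level code, chain of inner codes, most specific message)."""
    error = json.loads(body)["error"]
    inner_codes: List[str] = []
    message = error.get("message", "")
    inner = error.get("innererror")
    # Walk nested innererror objects; more than one level is an assumption.
    while inner:
        inner_codes.append(inner["code"])
        message = inner.get("message", message)
        inner = inner.get("innererror")
    return error["code"], inner_codes, message
```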
For long-running operations where multiple errors may be encountered, the top-level error code is set to the most severe error, and the individual errors are listed under the error.details property. In such scenarios, the target property of each individual error identifies what triggered that error.
{
"status": "failed",
"createdDateTime": "2021-07-14T10:17:51Z",
"lastUpdatedDateTime": "2021-07-14T10:17:51Z",
"error": {
"code": "InternalServerError",
"message": "An unexpected error occurred.",
"details": [
{
"code": "InternalServerError",
"message": "An unexpected error occurred."
},
{
"code": "InvalidContentDimensions",
"message": "The input image dimensions are out of range. Refer to documentation for supported image dimensions.",
"target": "2"
}
]
}
}
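For a failed long-running operation like the one above, the sketch below lists each detail as a (code, target) pair. The example includes a detail without a target, so target is treated as optional. The fallback to the top-level error when details is absent is an illustrative design choice, not documented behavior.

```python
from typing import Any, Dict, List, Optional, Tuple


def failure_details(operation: Dict[str, Any]) -> List[Tuple[str, Optional[str]]]:
    """List (code, target) for each detail of a failed long-running operation."""
    if operation.get("status") != "failed":
        return []
    error = operation.get("error", {})
    details = error.get("details") or []
    if not details:
        # No per-item details: report the top-level error alone.
        return [(error["code"], error.get("target"))] if "code" in error else []
    return [(d["code"], d.get("target")) for d in details]
```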
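Example 200 response body (annotated; the // comments and ... placeholders make it illustrative rather than strict JSON):
{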
"status": "succeeded",
"createdDateTime": "2021-09-30T12:42:07Z",
"lastUpdatedDateTime": "2021-09-30T12:42:13Z",
"analyzeResult": {
// Basic analyze result metadata
"apiVersion": "2021-09-30-preview", // REST API version used
"modelId": "prebuilt-invoice", // ModelId used
"stringIndexType": "textElements", // Character unit used for string offsets and lengths: textElements, unicodeCodePoint, utf16CodeUnit
// Concatenated content in global reading order across pages.
// Words are generally delimited by space, except CJK (Chinese, Japanese, Korean) characters.
// Lines and selection marks are generally delimited by newline character.
// Selection marks are represented in Markdown emoji syntax (:selected:, :unselected:).
"content": "CONTOSO LTD.\nINVOICE\nContoso Headquarters...",
"pages": [ // List of pages analyzed
{
// Basic page metadata
"pageNumber": 1, // 1-indexed page number
"angle": 0, // Orientation of content in clockwise direction (degree)
"width": 0, // Page width
"height": 0, // Page height
"unit": "pixel", // Unit for width, height, and bounding box coordinates
"spans": [ // Parts of top-level content covered by page
{
"offset": 0, // Offset in content
"length": 7 // Length in content
}
],
// List of words in page
"words": [
{
"content": "CONTOSO", // Equivalent to $.content.Substring(span.offset, span.length)
"boundingBox": [ ... ], // Position in page
"confidence": 0.99, // Extraction confidence
"span": { ... } // Part of top-level content covered by word
}, ...
],
// List of selectionMarks in page
"selectionMarks": [
{
"state": "selected", // Selection state: selected, unselected
"boundingBox": [ ... ], // Position in page
"confidence": 0.95, // Extraction confidence
"span": { ... } // Part of top-level content covered by selection mark
}, ...
],
// List of lines in page
"lines": [
{
"content": "CONTOSO LTD.", // Concatenated content of line (may contain both words and selectionMarks)
"boundingBox": [ ... ], // Position in page
"spans": [ ... ], // Parts of top-level content covered by line
}, ...
]
}, ...
],
// List of extracted tables
"tables": [
{
"rowCount": 1, // Number of rows in table
"columnCount": 1, // Number of columns in table
"boundingRegions": [ // Bounding boxes potentially across pages covered by table
{
"pageNumber": 1, // 1-indexed page number
"boundingBox": [ ... ], // Bounding box
}
],
"spans": [ ... ], // Parts of top-level content covered by table
// List of cells in table
"cells": [
{
"kind": "stubHead", // Cell kind: content (default), rowHeader, columnHeader, stubHead, description
"rowIndex": 0, // 0-indexed row position of cell
"columnIndex": 0, // 0-indexed column position of cell
"rowSpan": 1, // Number of rows spanned by cell (default=1)
"columnSpan": 1, // Number of columns spanned by cell (default=1)
"content": "SALESPERSON", // Concatenated content of cell
"boundingRegions": [ ... ], // Bounding regions covered by cell
"spans": [ ... ] // Parts of top-level content covered by cell
}, ...
]
}, ...
],
// List of extracted key-value pairs
"keyValuePairs": [
{
"key": { // Extracted key
"content": "INVOICE:", // Key content
"boundingRegions": [ ... ], // Key bounding regions
"spans": [ ... ] // Key spans
},
"value": { // Extracted value corresponding to key, if any
"content": "INV-100", // Value content
"boundingRegions": [ ... ], // Value bounding regions
"spans": [ ... ] // Value spans
},
"confidence": 0.95 // Extraction confidence
}, ...
],
// List of extracted entities
"entities": [
{
"category": "DateTime", // Primary entity category
"subCategory": "Date", // Secondary entity category
"content": "11/15/2019", // Entity content
"boundingRegions": [ ... ], // Entity bounding regions
"spans": [ ... ], // Entity spans
"confidence": 0.99 // Extraction confidence
}, ...
],
// List of extracted styles
"styles": [
{
"isHandwritten": true, // Is content in this style handwritten?
"spans": [ ... ], // Spans covered by this style
"confidence": 0.95 // Detection confidence
}, ...
],
// List of extracted documents
"documents": [
{
"docType": "prebuilt:invoice", // Classified document type (model dependent)
"boundingRegions": [ ... ], // Document bounding regions
"spans": [ ... ], // Document spans
"confidence": 0.99, // Document splitting/classification confidence
// List of extracted fields
"fields": {
"VendorName": { // Field name (docType dependent)
"type": "string", // Field value type: string, number, array, object, ...
"valueString": "CONTOSO LTD.",// Normalized field value
"content": "CONTOSO LTD.", // Raw extracted field content
"boundingRegions": [ ... ], // Field bounding regions
"spans": [ ... ], // Field spans
"confidence": 0.99 // Extraction confidence
}, ...
}
}, ...
]
}
}
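Every span is an (offset, length) pair into the top-level content string, measured in the unit named by stringIndexType. Python str indices count Unicode code points, so only "unicodeCodePoint" can be sliced directly. "utf16CodeUnit" needs a UTF-16 round trip. The sketch below assumes that "textElements" means user-perceived characters (grapheme clusters). Under that assumption it handles only the safe case (ASCII content without CR LF, where every code point is its own cluster) and refuses anything else rather than guessing.

```python
from typing import Dict


def span_text(content: str, span: Dict[str, int], string_index_type: str) -> str:
    """Return the slice of `content` that `span` covers."""
    offset, length = span["offset"], span["length"]
    if string_index_type == "unicodeCodePoint":
        # Python str indices are code points, so slice directly.
        return content[offset:offset + length]
    if string_index_type == "utf16CodeUnit":
        # UTF-16-LE without a BOM uses exactly 2 bytes per code unit.
        raw = content.encode("utf-16-le")
        return raw[2 * offset:2 * (offset + length)].decode("utf-16-le")
    if string_index_type == "textElements" and content.isascii() and "\r\n" not in content:
        # Assumes text elements are grapheme clusters: for ASCII without CRLF,
        # every code point is its own cluster, so code-point slicing agrees.
        return content[offset:offset + length]
    raise NotImplementedError(
        f"{string_index_type!r} on this content needs grapheme-cluster segmentation"
    )
```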
Response 404
The top-level error.code property can be one of the following:
Error Code | Message |
---|---|
NotFound | Resource not found. |
Top Error Code | Inner Error Code | Message |
---|---|---|
NotFound | OperationNotFound | The requested operation was not found. The identifier may be invalid or the operation may have expired. |
{
"error": {
"code": "NotFound",
"message": "Resource not found.",
"innererror": {
"code": "OperationNotFound",
"message": "The requested operation was not found. The identifier may be invalid or the operation may have expired."
}
}
}
Response 500
The top-level error.code property can be one of the following:
Error Code | Message |
---|---|
InternalServerError | An unexpected error occurred. |
Top Error Code | Inner Error Code | Message |
---|---|---|
InternalServerError | Unknown | Unknown error. |
{
"error": {
"code": "InternalServerError",
"message": "An unexpected error occurred.",
"innererror": {
"code": "Unknown",
"message": "Unknown error."
}
}
}
Response 503
The top-level error.code property can be one of the following:
Error Code | Message |
---|---|
ServiceUnavailable | A transient error has occurred. Please try again. |
Top Error Code | Inner Error Code | Message |
---|---|---|
ServiceUnavailable | ServiceUnavailable | A transient error has occurred. Please try again. |
{
"error": {
"code": "ServiceUnavailable",
"message": "A transient error has occurred. Please try again.",
"innererror": {
"code": "ServiceUnavailable",
"message": "A transient error has occurred. Please try again."
}
}
}
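The responses above call for different client reactions. A 503 ServiceUnavailable explicitly says "Please try again". A 404 OperationNotFound means the identifier is invalid or expired, so retrying the same resultId will not help. The sketch below retries only the transient 503 case with capped exponential backoff. Treating 500 as non-retryable is our own judgment call, and the attempt limit, base delay, and cap are hypothetical defaults.

```python
from typing import Any, Dict, Optional


def retry_delay(
    status: int,
    body: Dict[str, Any],
    attempt: int,
    max_attempts: int = 5,
    base_s: float = 1.0,
    cap_s: float = 30.0,
) -> Optional[float]:
    """Seconds to wait before retrying, or None to give up.

    `attempt` counts retries already made (0 for the first failure).
    """
    code = body.get("error", {}).get("code")
    transient = status == 503 or code == "ServiceUnavailable"
    if not transient or attempt >= max_attempts:
        return None
    # Exponential backoff: base, 2*base, 4*base, ... capped at cap_s.
    return min(cap_s, base_s * 2 ** attempt)
```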
Code samples
@ECHO OFF
REM GET requests carry no body; ^ continues the command on the next line
curl -v -X GET "https://*.cognitiveservices.azure.com/formrecognizer/documentModels/{modelId}/analyzeResults/{resultId}?api-version=2021-09-30-preview" ^
-H "Ocp-Apim-Subscription-Key: {subscription key}"
using System;
using System.Net.Http;
using System.Threading.Tasks;

namespace CSHttpClientSample
{
    static class Program
    {
        // async Main (C# 7.1+) so the request completes before the prompt appears
        static async Task Main()
        {
            await MakeRequest();
            Console.WriteLine("Hit ENTER to exit...");
            Console.ReadLine();
        }

        static async Task MakeRequest()
        {
            using (var client = new HttpClient())
            {
                // Request headers
                client.DefaultRequestHeaders.Add("Ocp-Apim-Subscription-Key", "{subscription key}");
                var uri = "https://*.cognitiveservices.azure.com/formrecognizer/documentModels/{modelId}/analyzeResults/{resultId}?api-version=2021-09-30-preview";
                var response = await client.GetAsync(uri);
                // Print the status code and the JSON result (or error) body
                Console.WriteLine(response.StatusCode);
                Console.WriteLine(await response.Content.ReadAsStringAsync());
            }
        }
    }
}
// This sample uses the Apache HTTP client from HTTP Components (http://hc.apache.org/httpcomponents-client-ga/)
import java.net.URI;
import org.apache.http.HttpEntity;
import org.apache.http.HttpResponse;
import org.apache.http.client.HttpClient;
import org.apache.http.client.methods.HttpGet;
import org.apache.http.client.utils.URIBuilder;
import org.apache.http.impl.client.HttpClients;
import org.apache.http.util.EntityUtils;

public class JavaSample
{
    public static void main(String[] args)
    {
        HttpClient httpclient = HttpClients.createDefault();
        try
        {
            URIBuilder builder = new URIBuilder("https://*.cognitiveservices.azure.com/formrecognizer/documentModels/{modelId}/analyzeResults/{resultId}?api-version=2021-09-30-preview");
            URI uri = builder.build();
            HttpGet request = new HttpGet(uri);
            // Request headers
            request.setHeader("Ocp-Apim-Subscription-Key", "{subscription key}");
            // HttpGet cannot carry an entity, and this operation takes no request body
            HttpResponse response = httpclient.execute(request);
            HttpEntity entity = response.getEntity();
            if (entity != null)
            {
                System.out.println(EntityUtils.toString(entity));
            }
        }
        catch (Exception e)
        {
            System.out.println(e.getMessage());
        }
    }
}
<!DOCTYPE html>
<html>
<head>
<title>JSSample</title>
<script src="https://ajax.googleapis.com/ajax/libs/jquery/1.9.0/jquery.min.js"></script>
</head>
<body>
<script type="text/javascript">
$(function() {
var params = {
// Request parameters
};
$.ajax({
url: "https://*.cognitiveservices.azure.com/formrecognizer/documentModels/{modelId}/analyzeResults/{resultId}?api-version=2021-09-30-preview&" + $.param(params),
beforeSend: function(xhrObj){
// Request headers
xhrObj.setRequestHeader("Ocp-Apim-Subscription-Key","{subscription key}");
},
type: "GET",
})
.done(function(data) {
alert("success");
})
.fail(function() {
alert("error");
});
});
</script>
</body>
</html>
#import <Foundation/Foundation.h>

int main(int argc, const char * argv[])
{
    // @autoreleasepool works with and without ARC, unlike NSAutoreleasePool
    @autoreleasepool
    {
        // api-version is already in the query string; this operation takes no other query parameters
        NSString* path = @"https://*.cognitiveservices.azure.com/formrecognizer/documentModels/{modelId}/analyzeResults/{resultId}?api-version=2021-09-30-preview";
        NSLog(@"%@", path);

        NSMutableURLRequest* _request = [NSMutableURLRequest requestWithURL:[NSURL URLWithString:path]];
        [_request setHTTPMethod:@"GET"];
        // Request headers
        [_request setValue:@"{subscription key}" forHTTPHeaderField:@"Ocp-Apim-Subscription-Key"];
        // No HTTP body: this GET operation takes no request body

        NSURLResponse *response = nil;
        NSError *error = nil;
        NSData* _connectionData = [NSURLConnection sendSynchronousRequest:_request returningResponse:&response error:&error];

        if (nil != error)
        {
            NSLog(@"Error: %@", error);
        }
        else
        {
            NSString* dataString = [[NSString alloc] initWithData:_connectionData encoding:NSUTF8StringEncoding];
            NSLog(@"%@", dataString);

            NSError* jsonError = nil;
            NSMutableDictionary* json = nil;
            if (nil != _connectionData)
            {
                json = [NSJSONSerialization JSONObjectWithData:_connectionData options:NSJSONReadingMutableContainers error:&jsonError];
            }
            if (jsonError || !json)
            {
                NSLog(@"Could not parse loaded json with error:%@", jsonError);
            }
            NSLog(@"%@", json);
        }
    }
    return 0;
}
<?php
// This sample uses the PEAR HTTP_Request2 package (https://pear.php.net/package/HTTP_Request2)
require_once 'HTTP/Request2.php';

$request = new HTTP_Request2('https://*.cognitiveservices.azure.com/formrecognizer/documentModels/{modelId}/analyzeResults/{resultId}');
$url = $request->getUrl();

$headers = array(
    // Request headers
    'Ocp-Apim-Subscription-Key' => '{subscription key}',
);
$request->setHeader($headers);

// Request parameters; setQueryVariables() sets the whole query string, so api-version goes here
$parameters = array(
    'api-version' => '2021-09-30-preview',
);
$url->setQueryVariables($parameters);

$request->setMethod(HTTP_Request2::METHOD_GET);
// No request body: this GET operation takes none

try
{
    $response = $request->send();
    echo $response->getBody();
}
catch (HTTP_Request2_Exception $ex)
{
    echo $ex;
}
?>
########### Python 2.7 #############
import httplib, urllib

headers = {
    # Request headers
    'Ocp-Apim-Subscription-Key': '{subscription key}',
}

params = urllib.urlencode({
})

try:
    conn = httplib.HTTPSConnection('*.cognitiveservices.azure.com')
    # GET request with no body
    conn.request("GET", "/formrecognizer/documentModels/{modelId}/analyzeResults/{resultId}?api-version=2021-09-30-preview&%s" % params, headers=headers)
    response = conn.getresponse()
    data = response.read()
    print(data)
    conn.close()
except Exception as e:
    # Generic exceptions have no errno/strerror attributes, so print the exception itself
    print(e)

####################################
########### Python 3.2 #############
import http.client, urllib.parse

headers = {
    # Request headers
    'Ocp-Apim-Subscription-Key': '{subscription key}',
}

params = urllib.parse.urlencode({
})

try:
    conn = http.client.HTTPSConnection('*.cognitiveservices.azure.com')
    # GET request with no body
    conn.request("GET", "/formrecognizer/documentModels/{modelId}/analyzeResults/{resultId}?api-version=2021-09-30-preview&%s" % params, headers=headers)
    response = conn.getresponse()
    data = response.read()
    print(data.decode("utf-8"))
    conn.close()
except Exception as e:
    # Generic exceptions have no errno/strerror attributes, so print the exception itself
    print(e)

####################################
require 'net/http'
uri = URI('https://*.cognitiveservices.azure.com/formrecognizer/documentModels/{modelId}/analyzeResults/{resultId}?api-version=2021-09-30-preview')
request = Net::HTTP::Get.new(uri.request_uri)
# Request headers
request['Ocp-Apim-Subscription-Key'] = '{subscription key}'
response = Net::HTTP.start(uri.host, uri.port, :use_ssl => uri.scheme == 'https') do |http|
http.request(request)
end
puts response.body