Document annotator for intelligent ERP systems¶
Smartscan is a cutting-edge data extraction tool (using OCR technology) for unstructured documents, such as invoices and receipts. The Smartscan API extracts, enriches and categorizes the information in a document within seconds and returns the result as a JSON document. Explore the use cases on our website.
Quicklinks¶
- Smartscan's feature list. You can also find all available output fields in our GitHub repository. Comments in the protobuf describe fields that have unique traits.
- All supported countries.
- All supported file types.
- How to start providing feedback.
- Data Deletion Policy.
Quick Start¶
Free Demo¶
Try our demo for free! Experience the power of the Smartscan API firsthand.
Upload your document and get the scan results within seconds using our demo.
Demo Token¶
For testing purposes, you can use the token "demo" for access to staging, and use the staging endpoints indicated in the endpoints section.
Limitations
The "demo" token is heavily rate limited, but it is sufficient for development purposes.
For setting up a project on our Staging Environment, follow the steps in the Quick Start Guide.
POST v1/document:annotate
https://api.stag.ssn.visma.ai/v1/document:annotate
Authorization - Bearer Token¶
Token: demo
Body - raw (json)¶
Body
{
  "document": {
    "source": {
      "httpUri": "http://classy.dk/ftest0716/IMG_20160720_111137.jpg"
    }
  },
  "features": [
    { "type": "DEFAULT" }
  ],
  "tier": "PREMIUM"
}
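The request above can be assembled in Python as a minimal sketch. It uses only the standard library; the helper names `build_annotate_request` and `send_request` are hypothetical, and the `Content-Type: application/json` header is an assumption based on the raw JSON body shown above.

```python
import json
import urllib.request

# Staging endpoint and "demo" token as given in the quick start.
STAGING_URL = "https://api.stag.ssn.visma.ai/v1/document:annotate"


def build_annotate_request(http_uri, token="demo", tier="PREMIUM"):
    """Build (but do not send) a POST request for a document reachable at http_uri."""
    body = {
        "document": {"source": {"httpUri": http_uri}},
        "features": [{"type": "DEFAULT"}],
        "tier": tier,
    }
    return urllib.request.Request(
        STAGING_URL,
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {token}",
            # Assumption: the API expects a JSON content type for raw JSON bodies.
            "Content-Type": "application/json",
        },
        method="POST",
    )


def send_request(request, timeout=30):
    """Send the request and return the decoded JSON response (needs network access)."""
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.load(response)
```

Calling `send_request(build_annotate_request("http://classy.dk/ftest0716/IMG_20160720_111137.jpg"))` would perform the actual scan; keep in mind that the "demo" token is rate limited.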
Model Versions¶
Smartscan is available in two flavors:
- STANDARD is the legacy version of Smartscan, available since 2019.
- PREMIUM is a more modern model with enhanced capabilities that was first launched in 2021.
Note
We strongly recommend using the PREMIUM version, as it is vastly superior to STANDARD. Explore the benefits of Premium by checking out the feature list and supported countries.
Document Data Sources¶
The caller can either send the document data as part of the request or, as shown in the quick start, reference it through a URI. When the data is sent as part of the request, the content field holds the document data, base64 encoded.
Example request¶
POST v1/document:annotate¶
https://api.stag.ssn.visma.ai/v1/document:annotate
Authorization - Bearer Token¶
Token: demo
Body - raw (json)¶
Body
{
  "document": {
    "content": "Vl00oANHjF3gxaYT4fQ0PSDJwwZIuMLl0GdNlgyKhF4KYOtcH3r... -- this is unfinished base64 encoding --"
  },
  "features": [
    { "type": "DEFAULT" }
  ],
  "tier": "PREMIUM"
}
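As a rough sketch of how the content field might be filled, the following Python helper base64-encodes raw document bytes and places them in a request body of the same shape as the example above. The function name `build_inline_body` is hypothetical.

```python
import base64


def build_inline_body(document_bytes, tier="PREMIUM"):
    """Return a request body that carries the document inline as base64 text."""
    # b64encode returns bytes; JSON needs a str, so decode to ASCII.
    content = base64.b64encode(document_bytes).decode("ascii")
    return {
        "document": {"content": content},
        "features": [{"type": "DEFAULT"}],
        "tier": tier,
    }
```

For a file on disk, the bytes would typically be read in binary mode first, for example `build_inline_body(open("receipt.jpg", "rb").read())`, where `receipt.jpg` is a placeholder file name.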
Confidence Levels¶
Smartscan provides predictions accompanied by a confidence level, indicating the model's output quality. The allowed confidence level values are as follows:
VERY_HIGH, HIGH, MID, LOW, VERY_LOW
Higher confidence levels suggest more accurate predictions. By default, we display suggestions with the highest confidence. We filter results based on confidence thresholds (HIGH or VERY_HIGH) to enhance accuracy.
You can customize confidence levels in your Smartscan requests, filtering results according to your desired confidence threshold, such as VERY_HIGH. However, raising the confidence threshold may yield fewer suggestions.
Results are sorted by confidence level, from most to least confident.
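To illustrate how a threshold over the five levels behaves, here is a small client-side sketch in Python. The candidate shape (a dict with `value` and `level` keys) and the function name `filter_by_confidence` are assumptions made for this example; they are not taken from the Smartscan response schema.

```python
# Confidence levels from least to most confident, as listed above.
CONFIDENCE_ORDER = ["VERY_LOW", "LOW", "MID", "HIGH", "VERY_HIGH"]
RANK = {level: index for index, level in enumerate(CONFIDENCE_ORDER)}


def filter_by_confidence(candidates, minimum="HIGH"):
    """Keep candidates at or above `minimum`, sorted from most to least confident."""
    threshold = RANK[minimum]
    kept = [c for c in candidates if RANK[c["level"]] >= threshold]
    return sorted(kept, key=lambda c: RANK[c["level"]], reverse=True)


# Hypothetical candidates for a single field.
candidates = [
    {"value": "2016-07-20", "level": "MID"},
    {"value": "2016-07-21", "level": "VERY_HIGH"},
    {"value": "2016-07-02", "level": "HIGH"},
]
best_first = filter_by_confidence(candidates, minimum="HIGH")
```

Raising `minimum` to `"VERY_HIGH"` leaves only one candidate here, which mirrors the trade-off described above: a stricter threshold yields fewer suggestions.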
Bounding Boxes¶
In the Smartscan API, bounding boxes play a crucial role in extracting meaningful information from documents. A bounding box represents the coordinates of the rectangular border that encloses a suggested field on the image of the document, as shown in the image below. Utilising bounding boxes empowers you to precisely pinpoint and extract specific data, enhancing the accuracy and efficiency of your document processing.
How to Implement Bounding Boxes¶
When making API requests, include bounding box coordinates to define the region of interest within the document. Here's a basic example:
{
  "bounding_box": {
    "top_left": {"x": 100, "y": 150},
    "bottom_right": {"x": 300, "y": 250}
  },
  // Other request parameters...
}
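Given a box in the `top_left` / `bottom_right` shape shown above, its size and whether a point falls inside it can be computed as in the sketch below. It assumes both corners use the same unit and that y grows downward from the top of the page; the helper names are hypothetical.

```python
def box_dimensions(box):
    """Return (width, height) of a box given by its top_left and bottom_right corners."""
    left, top = box["top_left"]["x"], box["top_left"]["y"]
    right, bottom = box["bottom_right"]["x"], box["bottom_right"]["y"]
    if right < left or bottom < top:
        raise ValueError("bottom_right must not lie above or to the left of top_left")
    return right - left, bottom - top


def contains_point(box, x, y):
    """Return True if the point (x, y) lies inside the box, edges included."""
    return (
        box["top_left"]["x"] <= x <= box["bottom_right"]["x"]
        and box["top_left"]["y"] <= y <= box["bottom_right"]["y"]
    )


# The coordinates from the example above.
example_box = {
    "top_left": {"x": 100, "y": 150},
    "bottom_right": {"x": 300, "y": 250},
}
```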
Page references¶
Pageref refers to the number of the page on which the suggested field was found.
- When you send a document that is only one page long, Smartscan returns a pageref of 1.
- When you send a document that is two or more pages long, Smartscan only reads the first and last page. As an example, if the document is four pages long, the returned pagerefs will be 1 and 4.
Bounding boxes and pagerefs are present on the majority of the fields.
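The page-reading rule described above can be expressed as a short Python sketch. The function name `pages_read` is hypothetical; it only models which pageref values a caller can expect for a document of a given length.

```python
def pages_read(page_count):
    """Return the page numbers Smartscan reads for a document with `page_count` pages."""
    if page_count < 1:
        raise ValueError("a document must have at least one page")
    if page_count == 1:
        return [1]
    # For two or more pages, only the first and the last page are read.
    return [1, page_count]
```

For a four-page document, `pages_read(4)` gives `[1, 4]`, matching the example above.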
Supported countries¶
If you require improved support for a specific country, please contact us.
The list of supported countries for the Premium model is much longer than for the Standard model.
For Standard, the list of countries is as follows:
DK, NO, SE, FI, NL, GB, DE, US, RO, EE, IE
For Premium, the list of countries is as follows:
AT, AU, BE, BG, CA, CH, CN, CY, CZ, DE, DK, EE, ES, FI, FO, FR, GB, GL, HR, HU, IE, IS, IT, LT, LU, LV, MT, NL, NO, NZ, PL, PT, RO, RS, RU, SE, SI, SK, TR, UA, US
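A client could use these lists to decide which tier to request for a given country. The sketch below assumes the shorter list above is the Standard list and the longer one is the Premium list; the function name `supported_tiers` is hypothetical, and the lists may change over time.

```python
# Country codes copied from the lists above.
STANDARD_COUNTRIES = set("DK NO SE FI NL GB DE US RO EE IE".split())
PREMIUM_COUNTRIES = set(
    "AT AU BE BG CA CH CN CY CZ DE DK EE ES FI FO FR GB GL HR HU IE IS IT "
    "LT LU LV MT NL NO NZ PL PT RO RS RU SE SI SK TR UA US".split()
)


def supported_tiers(country_code):
    """Return the tiers that list the given two-letter country code."""
    code = country_code.strip().upper()
    tiers = []
    if code in STANDARD_COUNTRIES:
        tiers.append("STANDARD")
    if code in PREMIUM_COUNTRIES:
        tiers.append("PREMIUM")
    return tiers
```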
Supported file types¶
Currently supported file types:
- PDF containing text (Recommended)
- PDF containing images (most often from scanners)
- JPG / JPEG (Recommended)
- PNG
- BMP
In addition, Smartscan processes all text the OCR is able to scan. This can include handwritten receipts on standard pads.