
Document annotator for intelligent ERP systems


Smartscan is a cutting-edge data extraction tool that uses OCR technology to process unstructured documents, such as invoices and receipts. The Smartscan API extracts, enriches, and categorizes information within seconds and returns the output as a JSON document. Explore the use cases on our website.

Quick Start

Free Demo

Try our demo for free! Experience the power of the Smartscan API firsthand.

Upload your document and get the scan results within seconds using our demo.

Demo Token

For testing purposes, you can use the token "demo" for access to staging, and use the staging endpoints indicated in the endpoints section.

Limitations

The "demo" token is heavily rate-limited, but it is fine for development purposes.
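Because the demo token is rate-limited, a client may want to retry throttled calls instead of failing immediately. The Python sketch below retries with exponential backoff. It assumes, without confirmation from this documentation, that rate limiting is signalled with HTTP status 429; the with_backoff helper is hypothetical.

```python
import time
import urllib.error
from typing import Callable, TypeVar

T = TypeVar("T")


def with_backoff(call: Callable[[], T], max_attempts: int = 5,
                 base_delay: float = 1.0,
                 sleep: Callable[[float], None] = time.sleep) -> T:
    """Run `call`, retrying with exponential backoff on HTTP 429.

    Assumption: the API reports rate limiting with status 429.
    """
    for attempt in range(max_attempts):
        try:
            return call()
        except urllib.error.HTTPError as err:
            # Only rate-limit errors are retried; give up after the last attempt.
            if err.code != 429 or attempt == max_attempts - 1:
                raise
            sleep(base_delay * (2 ** attempt))  # 1s, 2s, 4s, ... by default
    raise ValueError("max_attempts must be at least 1")
```

The sleep function is injectable so the retry logic can be exercised without actually waiting.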

For setting up a project on our Staging Environment, follow the steps in the Quick Start Guide.


POST v1/document:annotate

https://api.stag.ssn.visma.ai/v1/document:annotate

Authorization - Bearer Token

Token: demo


Body - raw (json)

{
    "document": {
        "source": {
            "httpUri": "http://classy.dk/ftest0716/IMG_20160720_111137.jpg"
        }
    },
    "features": [
        {
            "type": "DEFAULT"
        }
    ],
    "tier": "PREMIUM"
}
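For reference, here is a minimal Python sketch of the same request using only the standard library. The endpoint, token, and body come from the example above; the Content-Type header, the 30-second timeout, and the helper names build_annotate_request and annotate are our own assumptions, and the shape of the JSON response is not described here.

```python
import json
import urllib.request

# Staging endpoint and demo token from the Quick Start example.
STAGING_URL = "https://api.stag.ssn.visma.ai/v1/document:annotate"
DEMO_TOKEN = "demo"


def build_annotate_request(image_uri: str, tier: str = "PREMIUM",
                           token: str = DEMO_TOKEN) -> urllib.request.Request:
    """Build a POST request asking Smartscan to fetch and annotate a document by URI."""
    body = {
        "document": {"source": {"httpUri": image_uri}},
        "features": [{"type": "DEFAULT"}],
        "tier": tier,
    }
    return urllib.request.Request(
        STAGING_URL,
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",  # assumed; not stated in this page
        },
        method="POST",
    )


def annotate(image_uri: str, tier: str = "PREMIUM") -> dict:
    """Send the request and decode the JSON response (performs a network call)."""
    req = build_annotate_request(image_uri, tier)
    with urllib.request.urlopen(req, timeout=30) as resp:
        return json.load(resp)
```

Calling annotate("http://classy.dk/ftest0716/IMG_20160720_111137.jpg") sends the request and returns the decoded JSON. Building the request in a separate function makes it easy to inspect before anything is sent.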


Model Versions

Smartscan is available in two flavors:

  • STANDARD is the legacy version of Smartscan, available since 2019.
  • PREMIUM is a more modern model with enhanced capabilities that was first launched in 2021.

Note

We strongly recommend using the PREMIUM version, as it is vastly superior to STANDARD. Explore the benefits of Premium by checking out the feature list and supported countries.

Document Data Sources

The caller can either send the document data as part of the request or reference it through a URI, as shown in the Quick Start. To send the data inline, set the "content" field to the base64-encoded document data.

Example request


POST v1/document:annotate

https://api.stag.ssn.visma.ai/v1/document:annotate

Authorization - Bearer Token

Token: demo


Body - raw (json)

{
    "document": {
        "content": "Vl00oANHjF3gxaYT4fQ0PSDJwwZIuMLl0GdNlgyKhF4KYOtcH3r... (truncated base64-encoded data)"
    },
    "features": [
        {
            "type": "DEFAULT"
        }
    ],
    "tier": "PREMIUM"
}
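To send a local file inline, encode its bytes with base64 and place the result in "content". The Python sketch below builds the same body as the example above; the helper names build_inline_body and body_from_file are hypothetical.

```python
import base64
import json
from pathlib import Path


def build_inline_body(document_bytes: bytes, tier: str = "PREMIUM") -> dict:
    """Build an annotate request body that embeds the document as base64 in "content"."""
    return {
        "document": {"content": base64.b64encode(document_bytes).decode("ascii")},
        "features": [{"type": "DEFAULT"}],
        "tier": tier,
    }


def body_from_file(path: str, tier: str = "PREMIUM") -> str:
    """Read a local file (e.g. a PDF or JPG) and return the JSON request body."""
    return json.dumps(build_inline_body(Path(path).read_bytes(), tier))
```

The resulting string can be sent as the raw JSON body of the same POST request used for the URI-based variant.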


Confidence Levels

Smartscan provides predictions accompanied by a confidence level, indicating the model's output quality. The allowed confidence level values are as follows:

VERY_HIGH, HIGH, MID, LOW, VERY_LOW

Higher confidence levels indicate more accurate predictions. By default, we display the suggestions with the highest confidence, filtering results by a confidence threshold (HIGH or VERY_HIGH) to improve accuracy.

You can customize the confidence threshold in your Smartscan requests to filter results at the level you want, such as VERY_HIGH. However, a higher threshold may yield fewer suggestions.

Results are sorted by confidence level, from most to least confident.
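If you also want to apply a threshold on the client side, the ordering of the levels above is all you need. This Python sketch assumes each suggestion in the response is a dict with a "confidence" key holding one of the listed values; that field name is hypothetical, since the response schema is not shown here.

```python
from typing import Dict, List

# Confidence levels from least to most confident, as listed above.
CONFIDENCE_LEVELS = ["VERY_LOW", "LOW", "MID", "HIGH", "VERY_HIGH"]
RANK = {level: i for i, level in enumerate(CONFIDENCE_LEVELS)}


def filter_by_confidence(suggestions: List[Dict], threshold: str = "HIGH") -> List[Dict]:
    """Keep suggestions at or above `threshold`, most confident first.

    Assumes a hypothetical "confidence" key on each suggestion.
    """
    minimum = RANK[threshold]
    kept = [s for s in suggestions if RANK[s["confidence"]] >= minimum]
    # sorted() is stable even with reverse=True, so equal levels keep their order.
    return sorted(kept, key=lambda s: RANK[s["confidence"]], reverse=True)
```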

Bounding Boxes

In the Smartscan API, bounding boxes help you locate extracted information on the document. A bounding box gives the coordinates of the rectangle that encloses a suggested field on the document image, as shown in the image below. With bounding boxes, you can pinpoint exactly where each extracted value appears, which improves the accuracy and efficiency of your document processing.

[Image: bounding boxes]

How to Implement Bounding Boxes

When making API requests, include bounding box coordinates to define the region of interest within the document. Here's a basic example (other request parameters omitted):

{
  "bounding_box": {
    "top_left": {"x": 100, "y": 150},
    "bottom_right": {"x": 300, "y": 250}
  }
}
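A small Python helper for working with a box in the shape shown above. It assumes the coordinates are pixels on the document image with y increasing downwards (consistent with top_left having the smaller y in the example); the BoundingBox class is hypothetical and not part of the API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BoundingBox:
    """Axis-aligned rectangle in assumed image pixel coordinates (y grows downwards)."""
    left: int
    top: int
    right: int
    bottom: int

    @classmethod
    def from_json(cls, data: dict) -> "BoundingBox":
        """Parse the {"top_left": {...}, "bottom_right": {...}} shape."""
        tl, br = data["top_left"], data["bottom_right"]
        return cls(tl["x"], tl["y"], br["x"], br["y"])

    @property
    def width(self) -> int:
        return self.right - self.left

    @property
    def height(self) -> int:
        return self.bottom - self.top

    def contains(self, x: int, y: int) -> bool:
        """True if the point (x, y) lies inside or on the border of the box."""
        return self.left <= x <= self.right and self.top <= y <= self.bottom
```

The (left, top, right, bottom) field order matches the 4-tuple box argument of Pillow's Image.crop, so a field can be cut out with image.crop((box.left, box.top, box.right, box.bottom)).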

Page references

Pageref refers to the number of the page on which the suggested field appears.

  • When you send a one-page document, Smartscan returns a pageref of 1.
  • When you send a document of two or more pages, Smartscan reads only the first and last page. For example, if the document is four pages long, Smartscan returns pagerefs of 1 and 4.

Bounding boxes and pagerefs are present on most fields.
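The page rule described above can be captured in a few lines of Python, for example to sanity-check the pagerefs you receive for a document of known length. The pages_read function is a hypothetical helper that simply encodes the documented behavior.

```python
from typing import List


def pages_read(page_count: int) -> List[int]:
    """Return the page numbers Smartscan reads for a document of `page_count` pages:
    [1] for a one-page document, otherwise the first and last page.
    """
    if page_count < 1:
        raise ValueError("a document has at least one page")
    return [1] if page_count == 1 else [1, page_count]
```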

Supported countries

If you require improved support for a specific country, please contact us.

The list of supported countries is much longer for Premium than for Standard. Standard supports the following countries:

DK, NO, SE, FI, NL, GB, DE, US, RO, EE, IE

Premium currently supports the following 41 countries, and we are continuously improving country support:

AT, AU, BE, BG, CA, CH, CN, CY, CZ, DE, DK, EE, ES, FI, FO, FR, GB, GL, HR, HU, IE, IS, IT, LT, LU, LV, MT, NL, NO, NZ, PL, PT, RO, RS, RU, SE, SI, SK, TR, UA, US
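To choose a tier for a given country, the lists above can be turned into lookup sets. A Python sketch: the codes appear to be ISO 3166-1 alpha-2 codes, the tiers_for_country helper is hypothetical, and the sets are a snapshot of this page that will go stale as country support expands.

```python
from typing import List

# Snapshot of the supported-country lists on this page.
STANDARD_COUNTRIES = frozenset("DK NO SE FI NL GB DE US RO EE IE".split())
PREMIUM_COUNTRIES = frozenset(
    "AT AU BE BG CA CH CN CY CZ DE DK EE ES FI FO FR GB GL HR HU IE IS IT LT LU "
    "LV MT NL NO NZ PL PT RO RS RU SE SI SK TR UA US".split()
)


def tiers_for_country(country_code: str) -> List[str]:
    """Return the tiers that list `country_code` as supported, PREMIUM first."""
    code = country_code.strip().upper()
    tiers = []
    if code in PREMIUM_COUNTRIES:
        tiers.append("PREMIUM")
    if code in STANDARD_COUNTRIES:
        tiers.append("STANDARD")
    return tiers
```

Every Standard country also appears in the Premium list, so in practice PREMIUM is available wherever STANDARD is.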

Supported file types

Smartscan currently supports the following file types:

  • PDF containing text (Recommended)
  • PDF containing images (most often from scanners)
  • JPG / JPEG (Recommended)
  • PNG
  • BMP

In addition, Smartscan processes any text the OCR engine is able to read, which can include handwritten receipts on standard pads.
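Before uploading, a client can reject unsupported files early by checking their leading "magic bytes". This Python sketch is a hypothetical client-side check, not part of the Smartscan API: it cannot tell a text PDF from a scanned one, and the two-byte BMP signature is short enough to produce false positives.

```python
from typing import Optional

# Leading byte signatures of the supported formats.
_SIGNATURES = [
    (b"%PDF-", "PDF"),
    (b"\xff\xd8\xff", "JPEG"),
    (b"\x89PNG\r\n\x1a\n", "PNG"),
    (b"BM", "BMP"),
]


def detect_file_type(data: bytes) -> Optional[str]:
    """Return the format name if `data` starts with a known signature, else None."""
    for signature, name in _SIGNATURES:
        if data.startswith(signature):
            return name
    return None
```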