OCR on iOS with Workflow and Cognitive Services

Workflow is a powerful iOS task automation application that can daisy-chain actions across different apps on your iPhone or iPad and execute the combined steps with a single tap (think along the lines of Automator for the Mac or Logic Apps on Azure). Recently, I discovered that Workflow can also consume and parse JSON from a web response, which opens the door to a number of possibilities... :) One of these is OCR via Cognitive Services.

Build Your Own OCR (Image to Text) App on iOS

High-Level Overview

Prerequisites:

  1. You will need an Azure account. If you don't have one, sign up for a free trial.
  2. Download and install Workflow on an iOS device via the App Store.

Note: The required Azure resource (Computer Vision) has a free tier (20 calls per minute, 5K calls per month) which is sufficient for this demo.

Quick Start

1. Create a Computer Vision resource via the Azure portal.

computer_vision_create.png

2. Navigate to the resource and copy the Computer Vision API key. Ideally, paste the key into a text editor on your iOS device (e.g. Notes), as we will need it later to update our workflow.

computer_vision_key.png

3. Download the pre-created workflow by tapping on this link via an iOS device (iPhone or iPad) that has Workflow installed. Once loaded, tap Open in "Workflow".

open_in_workflow.png

4. Replace the placeholder text with the Computer Vision API key (from step 2).

replace_workflow_key.JPG

Note:

  • If you created your Computer Vision resource in the West US region, you are done and can hit the play button to test the workflow. The Workflow app will present a one-time warning that "...this workflow was imported from Safari. Are you sure you want to run it?" Tap Run Workflow to dismiss this message.
  • If you created your Computer Vision resource in a different region, you will need to perform an additional step.

5. (Only required if your Computer Vision resource was created outside the West US region.) Scroll down to the URL step and replace westus with your region. Note: You can check your resource endpoint in the Azure portal under the Overview section of your Computer Vision resource.

computer_vision_endpoint.png
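
To make the region swap concrete, here is a minimal Python sketch of how the request URL changes with the region. The `ocr_endpoint` helper is hypothetical, and the `/vision/v1.0/ocr` path is an assumption for illustration; the URL step in your copy of the workflow and the endpoint shown in the Azure portal are the authoritative values.

```python
def ocr_endpoint(region: str = "westus") -> str:
    """Build an OCR endpoint URL for an Azure region (hypothetical helper)."""
    # Normalise display names such as "West Europe" to the "westeurope" form.
    region = region.strip().lower().replace(" ", "")
    if not region:
        raise ValueError("region must not be empty")
    # Assumed URL layout; confirm it against your resource's Overview page.
    return f"https://{region}.api.cognitive.microsoft.com/vision/v1.0/ocr"


print(ocr_endpoint())               # default West US endpoint
print(ocr_endpoint("West Europe"))  # same URL with the region swapped
```

Only the region part of the host name changes; the rest of the URL stays the same, which is why editing the single westus token in the workflow is enough.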

Workflow Recipe

The illustration below provides a helicopter view of all steps encompassed in the workflow.

  • The initial menu provides three choices (Take Photo, Latest Photo or Select Photo).
  • Once the image has been selected, the editor is presented to crop the relevant section.
  • The image is then converted to JPEG and sent to Cognitive Services for processing via an HTTP POST request.
  • The response is then converted into a dictionary and ultimately parsed to retrieve the text.
  • The combined text is then sent to the Notes app.

workflow_recipe.png
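
For readers who think better in code, the POST-and-parse steps above can be sketched in Python roughly as follows. This is not the workflow itself, only an approximation under stated assumptions: the `/vision/v1.0/ocr` path, the `Ocp-Apim-Subscription-Key` header, and the regions → lines → words shape of the JSON response are assumed here, and the function names are made up for illustration.

```python
import json
import urllib.request


def build_ocr_request(image_bytes: bytes, api_key: str,
                      region: str = "westus") -> urllib.request.Request:
    """Prepare (but do not send) the HTTP POST that carries the JPEG image."""
    # Assumed endpoint path; check the URL step in your workflow.
    url = f"https://{region}.api.cognitive.microsoft.com/vision/v1.0/ocr"
    headers = {
        "Ocp-Apim-Subscription-Key": api_key,        # the key copied in step 2
        "Content-Type": "application/octet-stream",  # raw image bytes in the body
    }
    return urllib.request.Request(url, data=image_bytes,
                                  headers=headers, method="POST")


def extract_text(response: dict) -> str:
    """Walk the assumed regions -> lines -> words structure and join the text."""
    lines = []
    for region in response.get("regions", []):
        for line in region.get("lines", []):
            words = [word["text"] for word in line.get("words", [])]
            lines.append(" ".join(words))
    return "\n".join(lines)


def recognise_text(image_bytes: bytes, api_key: str, region: str = "westus") -> str:
    """Send the image and return the combined text (requires a valid key)."""
    request = build_ocr_request(image_bytes, api_key, region)
    with urllib.request.urlopen(request) as reply:
        return extract_text(json.load(reply))
```

Calling `recognise_text(open("photo.jpg", "rb").read(), "<your key>")` would play the role of the last three bullets: send the image, turn the response into a dictionary, and combine the words into text ready to be saved as a note.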

For those interested in understanding how to build this workflow manually, step-by-step instructions are in the video below.