Creating OCR Android app using Tesseract in Android Studio Tutorial

Text recognition using electronic devices such as an Android phone is known as optical character recognition (OCR). Since its inception, OCR has come a long way in speed and ease of use, but handwritten text is still hard to recognize reliably, and accuracy depends on factors such as image quality, lighting, and font.

Text recognition on Android has become relatively easy. Various libraries let you perform OCR in an Android app. Some, like ABBYY, are commercial text recognition solutions, while others, like Tesseract, are free and open source, which makes Tesseract one of the most commonly used text recognition libraries for Android. If you only want to detect text regions rather than read them, you can refer to my post here: Text detection in Android using OpenCV.


In this post we will learn how to create an Android app in Android Studio that performs OCR using the Tesseract library.

There are various approaches, but this is a simple and quick one:
1. Add tess-two as a dependency.
2. Create a class to manage Tesseract calls.
3. Initialize an object of that class and call its methods.

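We will be using the tess-two library to run Tesseract on Android. To use tess-two in Android Studio, add the following to the dependencies block of the app module's build.gradle (older versions of the Android Gradle plugin use compile instead of implementation):

 implementation 'com.rmtheis:tess-two:6.0.3'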

Now sync the project, and you will be able to use Tesseract in Android Studio. That completes the first step. Next, we will create a class that handles the initialization of TessBaseAPI and provides methods for recognizing text in images. Here is the code for the class:

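import android.content.Context;
import android.content.res.AssetManager;
import android.graphics.Bitmap;
import android.util.Log;

import com.googlecode.tesseract.android.TessBaseAPI;

import java.io.File;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;

public class MyTessOCR {
    private static final String TAG = "MyTessOCR";
    private static final String LANGUAGE = "eng";

    private final String datapath;
    private final TessBaseAPI mTess;

    public MyTessOCR(Context context) {
        // App-private storage needs no permission, unlike
        // Environment.getExternalStorageDirectory(), which is deprecated
        // and restricted on Android 10+.
        datapath = context.getFilesDir() + "/ocrctz/";
        File dir = new File(datapath + "tessdata/");
        File file = new File(dir, LANGUAGE + ".traineddata");
        if (!file.exists()) {
            Log.d(TAG, "traineddata not found, copying it from assets");
            dir.mkdirs();
            copyFile(context, file);
        }

        mTess = new TessBaseAPI();
        // datapath must be the parent directory of "tessdata/"
        if (!mTess.init(datapath, LANGUAGE)) {
            Log.e(TAG, "Could not initialize Tesseract for language " + LANGUAGE);
        }
        // Page segmentation mode: automatic (PSM_AUTO_ONLY)
        mTess.setPageSegMode(TessBaseAPI.PageSegMode.PSM_AUTO_ONLY);
    }

    public void stopRecognition() {
        mTess.stop();
    }

    public String getOCRResult(Bitmap bitmap) {
        // Blocking call: avoid running it on the UI thread for large images
        mTess.setImage(bitmap);
        return mTess.getUTF8Text();
    }

    public void onDestroy() {
        // Frees Tesseract's native resources; don't use this object afterwards
        mTess.end();
    }

    private void copyFile(Context context, File target) {
        AssetManager assetManager = context.getAssets();
        // try-with-resources closes both streams, even if copying fails
        try (InputStream in = assetManager.open(LANGUAGE + ".traineddata");
             OutputStream out = new FileOutputStream(target)) {
            byte[] buffer = new byte[1024];
            int read;
            while ((read = in.read(buffer)) != -1) {
                out.write(buffer, 0, read);
            }
        } catch (IOException e) {
            Log.e(TAG, "couldn't copy with the following error : " + e);
        }
    }
}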
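The copyFile() step is ordinary stream copying, so it can be tried outside Android. The sketch below is a minimal, plain-Java version under that assumption: the class name AssetCopySketch is hypothetical, and on a device the InputStream would come from AssetManager.open() instead of the placeholder bytes used here (which are not a real model file). It creates the parent directory, copies in chunks, and closes both streams.

```java
import java.io.ByteArrayInputStream;
import java.io.File;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.nio.file.Files;

public class AssetCopySketch {

    // Copies all bytes from `in` into `target`, creating the parent directory
    // first and closing both streams even if an exception is thrown.
    static long copyToFile(InputStream in, File target) throws IOException {
        File parent = target.getParentFile();
        if (parent != null && !parent.isDirectory() && !parent.mkdirs()) {
            throw new IOException("Could not create " + parent);
        }
        try (InputStream source = in;
             OutputStream out = new FileOutputStream(target)) {
            byte[] buffer = new byte[8192];
            long total = 0;
            int read;
            while ((read = source.read(buffer)) != -1) {
                out.write(buffer, 0, read);
                total += read;
            }
            return total;
        }
    }

    public static void main(String[] args) throws IOException {
        // A temporary directory stands in for the app's data directory.
        File dataDir = Files.createTempDirectory("ocr-demo").toFile();
        File target = new File(dataDir, "tessdata/eng.traineddata");
        byte[] fakeAsset = new byte[20_000]; // placeholder bytes, not a real model
        long copied = copyToFile(new ByteArrayInputStream(fakeAsset), target);
        System.out.println("Copied " + copied + " bytes to " + target);
    }
}
```

Returning the byte count makes it easy to log or compare against the expected asset size.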

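Remember to put the traineddata file for each language you want Tesseract to recognize in the app's assets folder (app/src/main/assets). For this example we use eng.traineddata. Note that tess-two bundles an older Tesseract release, so a traineddata file made for a different Tesseract version may fail to load.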
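When init() fails with a "Could not initialize Tesseract API" error, a first thing worth ruling out is a missing or misplaced file. Here is a minimal sketch, assuming the layout used above (<datapath>/tessdata/<language>.traineddata, with datapath being what is passed to init()); the class and method names are hypothetical, and it runs as plain Java. It only checks the file layout; it cannot tell whether the file's format matches the Tesseract version inside tess-two.

```java
import java.io.File;
import java.io.IOException;
import java.nio.file.Files;

public class TrainedDataCheck {

    // Tesseract looks for <dataPath>/tessdata/<language>.traineddata, where
    // dataPath is the directory passed to TessBaseAPI.init(dataPath, language).
    static File trainedDataFile(File dataPath, String language) {
        return new File(new File(dataPath, "tessdata"), language + ".traineddata");
    }

    // Returns null if the file is present and non-empty, otherwise a short
    // description of what is missing.
    static String problemWith(File dataPath, String language) {
        File file = trainedDataFile(dataPath, language);
        if (!file.getParentFile().isDirectory()) {
            return "missing directory " + file.getParentFile();
        }
        if (!file.isFile()) {
            return "missing file " + file;
        }
        if (file.length() == 0) {
            return "empty file " + file;
        }
        return null;
    }

    public static void main(String[] args) throws IOException {
        File dataPath = Files.createTempDirectory("ocr-check").toFile();
        System.out.println(problemWith(dataPath, "eng")); // missing directory .../tessdata
        new File(dataPath, "tessdata").mkdirs();
        Files.write(trainedDataFile(dataPath, "eng").toPath(), new byte[] {1, 2, 3});
        System.out.println(problemWith(dataPath, "eng")); // null
        System.out.println(problemWith(dataPath, "bul")); // missing file .../bul.traineddata
    }
}
```

On a device, the same check could be run with the datapath string from MyTessOCR before calling init(), logging the message if it is not null.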

Only the final step of building our OCR app remains: using this class and calling the methods needed for recognition.
In the Activity where you need recognition, initialize an object of this class:

private MyTessOCR mTessOCR;
mTessOCR = new MyTessOCR(MainActivity.this); // e.g. in onCreate()

Once it is initialized, call the following method with a bitmap as its argument; it reads the text in the bitmap and returns it as a string:

String temp = mTessOCR.getOCRResult(bitmap);

temp now contains the text read from the bitmap you passed in. When you no longer need OCR (for example in the Activity's onDestroy()), call mTessOCR.onDestroy() to release Tesseract's native resources. Congratulations, you are now ready to build your first OCR app for Android! If you want to learn how to detect text regions, or want a complete example of this code, check out the following:
Text Region Detection using OpenCV.
Text Recognition app in Android Example.
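One practical note on the getOCRResult() call: it blocks until Tesseract finishes, which can take noticeable time on a large bitmap, so on Android it is better kept off the UI thread. Below is a sketch of one way to do that with a single-thread ExecutorService. OcrEngine is a hypothetical interface standing in for MyTessOCR so the example compiles without the Android SDK; in an app you would pass a Bitmap and post the result back with runOnUiThread() before touching any views.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.function.Consumer;

public class BackgroundOcrSketch {

    // Hypothetical stand-in for MyTessOCR.getOCRResult(Bitmap); the image
    // type is generic so this compiles without Android classes.
    interface OcrEngine<I> {
        String recognize(I image);
    }

    // A single worker thread serializes calls, since one Tesseract API
    // instance is generally not safe to use from several threads at once.
    private final ExecutorService worker = Executors.newSingleThreadExecutor();

    // Runs recognition on the worker thread; onResult is also called there.
    <I> Future<?> recognizeAsync(OcrEngine<I> engine, I image, Consumer<String> onResult) {
        return worker.submit(() -> onResult.accept(engine.recognize(image)));
    }

    void shutdown() throws InterruptedException {
        worker.shutdown();
        worker.awaitTermination(5, TimeUnit.SECONDS);
    }

    public static void main(String[] args) throws Exception {
        BackgroundOcrSketch ocr = new BackgroundOcrSketch();
        OcrEngine<String> fake = image -> "text from " + image; // fake engine
        ocr.recognizeAsync(fake, "page-1", System.out::println).get(); // prints "text from page-1"
        ocr.shutdown();
    }
}
```

Because the callback runs on the worker thread, it is the caller's job to hop back to the UI thread; keeping that out of the sketch keeps it testable in plain Java.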
Keep coding, and leave a comment if you run into any problems.


7 thoughts on “Creating OCR Android app using Tesseract in Android Studio Tutorial”

  1. Hello, I'm trying to use it for the Bulgarian language.
    I do the following:
    1. Replace the language code 'eng' with 'bul'
    2. Use the bul.traineddata file instead of the eng.traineddata file. (I downloaded bul.traineddata from https://github.com/tesseract-ocr/tessdata)

    But mTess.init(datapath, language); fails with:
    E/Tesseract(native): Could not initialize Tesseract API with language=bul!

    Am I missing a step, or is this the wrong language code?
