{"id":1696,"date":"2015-11-26T13:13:10","date_gmt":"2015-11-26T05:13:10","guid":{"rendered":"http:\/\/www.magicandlove.com\/blog\/?p=1696"},"modified":"2015-11-26T13:31:55","modified_gmt":"2015-11-26T05:31:55","slug":"processing-with-ocr","status":"publish","type":"post","link":"http:\/\/www.magicandlove.com\/blog\/2015\/11\/26\/processing-with-ocr\/","title":{"rendered":"Processing with OCR"},"content":{"rendered":"<p>This is a short Processing sketch to demonstrate the use of optical character recognition (OCR) with the <a href=\"https:\/\/github.com\/tesseract-ocr\/tesseract\">Tesseract OCR<\/a> engine. I used the Mac OS X platform for testing. Here are the steps.<\/p>\n<p><!--more--><\/p>\n<p><em>Install Tesseract on OS X with all supported languages using Homebrew<\/em><\/p>\n<pre>\r\nbrew install imagemagick\r\nbrew install tesseract --all-languages\r\n<\/pre>\n<p><em>Install the Java JNA binding from <a href=\"http:\/\/tess4j.sourceforge.net\/\">Tess4J<\/a><\/em><br \/>\nI just download and unzip the package. From the <strong>dist<\/strong> folder, I copy the <strong>tess4j.jar<\/strong> to the <strong>code<\/strong> folder of the Processing sketch.<\/p>\n<p><em>Install the jai_imageio.jar<\/em><br \/>\nThe sketch also needs the <a href=\"http:\/\/www.oracle.com\/technetwork\/java\/javasebusiness\/downloads\/java-archive-downloads-java-client-419417.html#jaiio-1.0_01-oth-JPR\">Java Advanced Imaging Image I\/O Tools<\/a> from the Java archive. 
Copy the <strong>jai_imageio.jar<\/strong> from the <strong>lib<\/strong> folder to the <strong>code<\/strong> folder of the Processing sketch.<\/p>\n<p><em>Copy the dynamic libraries<\/em><br \/>\nI also copy <strong>libtesseract.dylib<\/strong> and <strong>liblept.dylib<\/strong> from the Homebrew directories to the <strong>code<\/strong> folder of the Processing sketch, with their load paths patched to use @loader_path.<\/p>\n<p><em>Copy the trained language data<\/em><br \/>\nFinally, I copy the English trained language file, <strong>eng.traineddata<\/strong>, from the Homebrew directory into a sub-folder named <strong>tessdata<\/strong> inside the <strong>data<\/strong> folder of the sketch.<\/p>\n<p>Here is the test result. The top section is the original test image in PNG format (400 x 300). The bottom section is the recognised string, which the program reveals one character at a time.<\/p>\n<p><a href=\"http:\/\/www.magicandlove.com\/blog\/wp-content\/uploads\/2015\/11\/ocr0289.png\"><img loading=\"lazy\" decoding=\"async\" src=\"http:\/\/www.magicandlove.com\/blog\/wp-content\/uploads\/2015\/11\/ocr0289-200x300.png\" alt=\"\" title=\"ocr0289\" width=\"200\" height=\"300\" class=\"alignnone size-medium wp-image-1700\" srcset=\"http:\/\/www.magicandlove.com\/blog\/wp-content\/uploads\/2015\/11\/ocr0289-200x300.png 200w, http:\/\/www.magicandlove.com\/blog\/wp-content\/uploads\/2015\/11\/ocr0289.png 400w\" sizes=\"auto, (max-width: 200px) 100vw, 200px\" \/><\/a><\/p>\n<p><em>The complete source code<\/em><\/p>\n<pre lang=\"java\">\r\nimport net.sourceforge.tess4j.*;\r\nimport java.awt.image.BufferedImage;\r\n\r\nTesseract ocr;\r\nBufferedImage img;\r\nPImage pimg;\r\nString res, show;\r\nint idx;\r\n\r\nvoid setup() {\r\n  size(400, 600);\r\n  background(0);\r\n  ocr = new Tesseract();\r\n  \/\/ Point Tesseract at the sketch's data folder, which holds tessdata\r\n  ocr.setDatapath(dataPath(\"\"));\r\n  pimg = loadImage(\"testing.png\");\r\n  \/\/ Tess4J works on a BufferedImage, so take the PImage's native image\r\n  img = (BufferedImage) pimg.getNative();\r\n  show = \"\";\r\n  idx = 0;\r\n  
try {\r\n    res = ocr.doOCR(img);\r\n    \/\/   println(res);\r\n  } \r\n  catch (TesseractException e) {\r\n    println(e.getMessage());\r\n  }\r\n  frameRate(25);\r\n}\r\n\r\nvoid draw() {\r\n  background(0);\r\n  image(pimg, 0, 0);\r\n  \/\/ res stays null if the OCR call failed, so check it before reading\r\n  if (res != null && idx < res.length()) {\r\n    \/\/ Reveal one more recognised character per frame\r\n    show += res.charAt(idx);\r\n    idx++;\r\n  } else {\r\n    noLoop();\r\n  }\r\n  text(show, 20, pimg.height+30);\r\n}<\/pre>\n<p>The only mistake I can spot is that 'ocr' becomes 'cor'.<\/p>\n","protected":false},"excerpt":{"rendered":"<p>This is a short Processing sketch to demonstrate the use of optical character recognition (OCR) with the Tesseract OCR engine. I used the Mac OS X platform for testing. Here are the steps.<\/p>\n","protected":false},"author":1,"featured_media":0,"comment_status":"open","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[79,66],"tags":[141,62],"class_list":["post-1696","post","type-post","status-publish","format-standard","hentry","category-software-2","category-testing","tag-ocr","tag-processing-org"],"_links":{"self":[{"href":"http:\/\/www.magicandlove.com\/blog\/wp-json\/wp\/v2\/posts\/1696","targetHints":{"allow":["GET"]}}],"collection":[{"href":"http:\/\/www.magicandlove.com\/blog\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"http:\/\/www.magicandlove.com\/blog\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"http:\/\/www.magicandlove.com\/blog\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"http:\/\/www.magicandlove.com\/blog\/wp-json\/wp\/v2\/comments?post=1696"}],"version-history":[{"count":6,"href":"http:\/\/www.magicandlove.com\/blog\/wp-json\/wp\/v2\/posts\/1696\/revisions"}],"predecessor-version":[{"id":1703,"href":"http:\/\/www.magicandlove.com\/blog\/wp-json\/wp\/v2\/posts\/1696\/revisions\/1703"}],"wp:attachment":[{"href":"http:\/\/www.magicandlove.com\/blog\/wp-json\/wp\/v2\/media?parent=1696"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"http:
\/\/www.magicandlove.com\/blog\/wp-json\/wp\/v2\/categories?post=1696"},{"taxonomy":"post_tag","embeddable":true,"href":"http:\/\/www.magicandlove.com\/blog\/wp-json\/wp\/v2\/tags?post=1696"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}