4 Oct 2000 Raphael   » (Master)

Question: maximum information density in the print-scan process?

Does anybody know how much information can be stored on and reliably retrieved from a piece of paper, using a standard printer (inkjet or laser, 300 dpi) and a scanner (1200 dpi)? Since a piece of paper can be affected by bit rot (literally) and can be damaged in various ways, some error correction (e.g. Reed-Solomon) and error detection (e.g. CRC) are necessary. Also, I do not want to rely on high-quality paper, so I have to accept some ink diffusion and "background noise" introduced by defects in the paper.
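To make the question concrete, here is a rough back-of-the-envelope sketch of an upper bound on page capacity. Everything in it is an assumption for illustration, not a measured result: one bit per square cell of `dots_per_cell` x `dots_per_cell` printer dots, a uniform margin, and a Reed-Solomon-style code that keeps `code_k` data symbols out of every `code_n` (223 of 255 is just a commonly used rate). The function name and its parameters are hypothetical.

```python
def page_capacity_bytes(width_in, height_in, printer_dpi=300,
                        dots_per_cell=3, code_k=223, code_n=255,
                        margin_in=0.5):
    """Estimate payload bytes on one page (idealized, assumed model)."""
    # Usable printable area in printer dots, after removing the margins.
    dots_x = round((width_in - 2 * margin_in) * printer_dpi)
    dots_y = round((height_in - 2 * margin_in) * printer_dpi)

    # Each data cell is a square block of printer dots carrying one bit.
    cells_x = dots_x // dots_per_cell
    cells_y = dots_y // dots_per_cell
    raw_bits = cells_x * cells_y

    # Error-correction overhead: only code_k out of code_n symbols are data.
    payload_bits = raw_bits * code_k // code_n
    return payload_bits // 8


if __name__ == "__main__":
    # US Letter with the assumed defaults (3x3 dots per bit, 0.5 in margins).
    print(page_capacity_bytes(8.5, 11))
    # Idealized ceiling: one bit per printer dot, no margins, no coding.
    print(page_capacity_bytes(8.5, 11, dots_per_cell=1,
                              code_k=1, code_n=1, margin_in=0.0))
```

The model ignores calibration marks, ink diffusion and paper noise, so real figures would be lower; it only shows how cell size and code rate trade off against capacity.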

I found some references to 2D barcodes (such as DataMatrix, PDF-417 and others) but these codes are designed to be scanned efficiently by relatively cheap and fast CCD scanners. I am not worried about the scanning time (I am using a flatbed scanner) or the processing time (I can accept some heavy image processing). Also, I would like to encode raw bits and pack as much information as possible on a sheet of paper, regardless of its size. These 2D barcodes have a fixed or maximum symbol size and it is necessary to use several of them if I want to fill a sheet of paper, wasting space in the duplicated calibration areas and guard areas.

PDF-417 has a maximum density of 106 bytes per square centimeter (686 bytes per square inch, for you retrogrades), which is quite low. It is certainly possible to do better, but I would like to know if there are any standards for doing that. I am especially interested in methods that are in the public domain, because most 2D barcodes are patented (e.g. PDF-417 is covered by US patent 5,243,655 and DataMatrix is covered by 4,939,354, 5,053,609 and 5,124,536).
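As a sanity check on those figures, the small sketch below converts between the two units and estimates how many 300 dpi printer dots PDF-417 spends per stored bit at its maximum density. The helper names are hypothetical, and the dots-per-bit figure assumes an ideal printer where every dot could in principle carry one bit.

```python
CM_PER_INCH = 2.54


def per_cm2_to_per_in2(density_per_cm2):
    """Convert an areal density from per-cm^2 to per-in^2."""
    return density_per_cm2 * CM_PER_INCH ** 2


def dots_per_bit(bytes_per_in2, printer_dpi=300):
    """Printer dots available per stored bit at the given byte density."""
    dots_per_in2 = printer_dpi ** 2
    return dots_per_in2 / (bytes_per_in2 * 8)


if __name__ == "__main__":
    # 106 bytes/cm^2 converts to about 684 bytes/in^2, which agrees with
    # the quoted 686 bytes/in^2 up to rounding of the metric figure.
    print(per_cm2_to_per_in2(106))
    # At 686 bytes/in^2, a 300 dpi printer lays down roughly 16 dots per bit.
    print(dots_per_bit(686))
```

Roughly 16 printer dots per bit is the headroom behind the claim that it is possible to do better, although some of that margin is needed for error correction and for tolerance to ink spread.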

If you know any good references, please post them in a diary entry (I try to check the recent diaries once a day, but I may miss some of them) or send them to me by e-mail: quinet (at) gamers (dot) org. Thanks!

Hmmm... This is a bit long for a diary entry. But I don't think that such a question deserves an article on the front page. If you think that I should have posted this as an article, then send me an e-mail and I will re-post this question and edit it out of my diary.
