'PowerArchiver'
bert.hubert@netherlabs.nl

Idea
----
Allow the user to scan paper documents and file them into folders. One folder
might hold two pages of an itemised bill, or several pages of a letter.

Scanning is a one-button job. The goal is absolute ease. For example, a folder
could be configured while a page is scanning.

Furthermore, papers should be as readable on screen as possible [1]. To this
end, the user can mark the 'interesting' rectangle of a page, which is shown by
default such that the entire rectangle fits on the screen. [2]

Why?
----
Because I spend a lot of time searching for papers people have sent me.

Folders
-------
Folders will have 'print' and 'export to PDF' buttons.

Multiple pages can be scanned in a row, reordered, processed (enhanced for
readability, flipped, rotated, annotated) and then assigned to folders, either
all in one go or separately, perhaps to different folders.

Each folder has a date, keywords and tags. These tags form a hierarchy,
allowing the selection of 'all bills from December 2005', or 'all bills from
Netherlabs computer consulting'.

Processing
----------
There are several buttons to flip, rotate and enhance a scanned page. The goal
is to reduce the process to nothing more than pressing 'scan', 'paperize' [3]
and 'store'.

Viewing
-------
Each folder will consist of one or more pages. By default, pages are zoomed
such that the interesting rectangle fills the screen. This means that the
entire page is larger than the screen, for which we supply scrollbars. In many
cases, two interesting rectangles will fit side by side.

Searching
---------
Over time, there will be a lot of folders. It might prove useful to OCR pages
to allow for real full-text searching. Current open source OCR programs appear
lacking, but several interesting ones are for sale (for Linux as well). Each
page would then also have a textual representation, which is generally not
displayed.
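The tag hierarchy described under Folders could be queried roughly as follows. This is a minimal sketch, assuming tags are stored as slash-separated paths such as `bills/Netherlabs computer consulting`; the `Folder` class, the path convention and the `has_tag`/`select` functions are hypothetical illustrations, not a settled design.

```python
from dataclasses import dataclass, field
from datetime import date


@dataclass
class Folder:
    """Hypothetical in-memory view of one archived folder."""
    number: int                 # numerical directory name, never changes
    name: str
    when: date
    keywords: list[str] = field(default_factory=list)
    tags: list[str] = field(default_factory=list)  # assumed "parent/child" paths


def has_tag(folder: Folder, tag: str) -> bool:
    """True if the folder carries `tag` itself or any tag below it."""
    tag = tag.rstrip("/")
    # Matching on "tag/" avoids treating "billsx" as a child of "bills".
    return any(t == tag or t.startswith(tag + "/") for t in folder.tags)


def select(folders, tag, year=None, month=None):
    """Return folders under `tag`, optionally restricted to a year and month."""
    result = []
    for f in folders:
        if not has_tag(f, tag):
            continue
        if year is not None and f.when.year != year:
            continue
        if month is not None and f.when.month != month:
            continue
        result.append(f)
    return result


# Example data, made up for illustration.
archive = [
    Folder(1, "Phone bill", date(2005, 12, 3), tags=["bills/phone"]),
    Folder(2, "Consulting invoice", date(2005, 11, 20),
           tags=["bills/Netherlabs computer consulting"]),
    Folder(3, "Letter from bank", date(2005, 12, 9), tags=["letters/bank"]),
]

# 'all bills from December 2005'
print([f.number for f in select(archive, "bills", year=2005, month=12)])
# 'all bills from Netherlabs computer consulting'
print([f.number for f in select(archive, "bills/Netherlabs computer consulting")])
```

Treating a tag as a path means a query on a parent tag automatically includes every child, which is one simple way to get the hierarchical selection the Folders section asks for.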
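The default zoom from the Viewing section reduces to a small calculation: for an interesting rectangle of width $w$ and height $h$ on a screen of $W \times H$ pixels, the largest scale at which the rectangle still fits is $s = \min(W/w,\ H/h)$. The sketch below applies that, and also tries $n$ rectangles side by side via $s = \min(W/(n w),\ H/h)$. The `Rect` class, the `view` function and the pixel figures are assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass
class Rect:
    """Interesting rectangle in page pixel coordinates (hypothetical)."""
    x: float
    y: float
    width: float
    height: float


def fit_scale(rect_w, rect_h, screen_w, screen_h, columns=1):
    """Largest scale at which `columns` rectangles side by side fit the screen."""
    return min(screen_w / (columns * rect_w), screen_h / rect_h)


def view(page_w, page_h, interesting: Rect, screen_w, screen_h, columns=1):
    """Return (scale, scaled page size, scrollbars needed, initial scroll offset)."""
    s = fit_scale(interesting.width, interesting.height, screen_w, screen_h, columns)
    scaled_page = (page_w * s, page_h * s)
    # Scrollbars are needed when the zoomed page exceeds its share of the screen.
    needs_scroll = scaled_page[0] > screen_w / columns or scaled_page[1] > screen_h
    # Scroll offset that brings the rectangle's top-left corner into view.
    offset = (interesting.x * s, interesting.y * s)
    return s, scaled_page, needs_scroll, offset


# A4 scanned at 300 dpi is about 2480 x 3508 pixels; the rectangle is made up.
rect = Rect(200, 300, 2080, 1400)
s, page, scroll, offset = view(2480, 3508, rect, 1920, 1080)
# Here the rectangle's height is the limiting side, so s == 1080 / 1400.
print(s, page, scroll, offset)
```

Whether a side-by-side layout is worth it then comes down to whether the smaller scale it forces (`columns=2`) is still readable; that threshold is left open here.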
'non-paper pages'
-----------------
It should also be possible to import PDF documents, possibly converted from
Word or OpenOffice.

Storage
-------
Each page is stored as a PNG [4]. A folder is a directory with a numerical name
which never changes. In this directory reside the PNGs, plus a file which
stores the name of the folder, the date, any keywords attached and possibly
the 'interesting' rectangle, unless this can be stored as PNG metadata.

The idea is to make sure pages remain accessible using software available 10
to 20 years from now. Another goal is to allow archives to be merged easily.

[1] Modern CPUs have more than enough horsepower to do proper interpolation on
    demand. There is no excuse for grainy pictures.
[2] It turns out it is feasible to automate the selection of this
    'interesting' rectangle, but the user can override it.
[3] Assuming mostly bitonal ('black and white') scans, this is the process of
    making sure the scan appears on the screen as it does in real life.
[4] DjVu also seems promising.
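The directory layout described under Storage could be written along these lines. This is a minimal sketch, assuming the metadata file is plain JSON named `folder.json` and pages are named `page-001.png` and so on; the file names, the ISO 8601 date string and the `next_folder_number`/`store_folder` functions are all hypothetical choices, not part of the design above.

```python
import json
import shutil
from pathlib import Path


def next_folder_number(archive: Path) -> int:
    """Pick the next unused numerical folder name (hypothetical scheme)."""
    numbers = [int(p.name) for p in archive.iterdir()
               if p.is_dir() and p.name.isdigit()]
    return max(numbers, default=0) + 1


def store_folder(archive: Path, pages: list[Path], name: str, when: str,
                 keywords: list[str], interesting=None) -> Path:
    """Create a numbered folder holding copies of the page PNGs plus metadata."""
    archive.mkdir(parents=True, exist_ok=True)
    folder = archive / str(next_folder_number(archive))
    folder.mkdir()  # raises FileExistsError rather than overwrite a folder
    for i, page in enumerate(pages, start=1):
        # Zero-padded names keep pages in order in a plain directory listing.
        shutil.copyfile(page, folder / f"page-{i:03d}.png")
    meta = {"name": name, "date": when, "keywords": keywords}
    if interesting is not None:
        meta["interesting"] = interesting  # assumed [x, y, width, height] in pixels
    (folder / "folder.json").write_text(json.dumps(meta, indent=2),
                                        encoding="utf-8")
    return folder
```

A plain-text metadata file next to standard PNGs fits the goal of staying readable with future software. Note that this sketch does not address merging: two archives built this way would both contain a folder named `1`, so a merge still needs a rule for colliding numbers.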