Why is a post on incrementally reading PDFs taking so long?
At first it sounded like an easy idea: you just play around with some strategy, write down the steps involved and try it out. I’ve done that, but the problem of dealing with PDFs was, as I was about to find out, of a much higher order of magnitude.
Randy Pausch said it clearly: brick walls are there not to stop you, they are there to stop the others. If you haven’t had the opportunity to see what over 11 million visitors already have, I recommend you take some time and watch his Last Lecture.
Incrementally reading PDFs is clearly not a childhood dream, it is just a small project. But unless someone comes up with some brilliant idea, I feel like I’ll keep banging myself against this wall until someone helps me get over it instead of being stopped by it.
I’d like to hear how you’re dealing with the same issue of incrementally reading PDFs. Email me or just post a comment.
What I have already tried
Read and Copy.
Perhaps the easiest approach is to just read the text and, as soon as you find something important, select it, copy it to the clipboard and paste it into SuperMemo. This leads to a lot of badly formatted spacing, as PDFs don’t have a single clue about text structure. You might of course paste the extract into some text editor, delete (trim) all the extra white space, and then import it into SuperMemo.
But by the time you’re making the first extract you’ve already used a lot more cognitive resources than you’re supposed to, resources you needed to understand what you were reading in the first place. And you don’t even have the source on your extract, or any highlight in the PDF to show what you’ve extracted.
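The manual trimming step described above can at least be automated. What follows is only a minimal sketch, not what I actually use: the function name `clean_pdf_text` is made up for illustration, and it assumes that blank lines in the copied text mark paragraph breaks, which is not true of every PDF.

```python
import re


def clean_pdf_text(raw: str) -> str:
    """Normalize the whitespace of text copied out of a PDF reader."""
    # Unify Windows/Mac line endings so the rules below see only "\n".
    text = raw.replace("\r\n", "\n").replace("\r", "\n")
    # Rejoin words that were hyphenated across a line break.
    # Assumption: a hyphen right before a line break is a soft hyphen.
    text = re.sub(r"(\w)-\n(\w)", r"\1\2", text)
    # Assumption: one or more blank lines separate paragraphs.
    paragraphs = re.split(r"\n\s*\n", text)
    # Inside a paragraph, collapse every run of whitespace to one space.
    cleaned = [" ".join(p.split()) for p in paragraphs]
    # Drop paragraphs that ended up empty.
    return "\n\n".join(p for p in cleaned if p)


if __name__ == "__main__":
    sample = "PDFs  have no\nclue about text struc-\nture.\n\n\n  Second   paragraph. "
    print(clean_pdf_text(sample))
```

Be aware that the hyphen rule also glues together genuinely hyphenated words that happen to fall on a line break, so the result still needs a quick look before it goes into SuperMemo.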
I’ve downloaded every single trial OCR engine out there. I personally own ABBYY FineReader, but I was looking for something that would solve the SuperMemo issue. After loading these programs with PDF articles, running OCR on them, and saving them as TXT, RTF, DOC, DOCX, HTML, XHTML, etc. (as an exact copy), my conclusion is that it simply takes too much time to select the pictures and text that, although so obvious to the eye, somehow, until now, no software is able to recognize with an accuracy over 70%. I’m not talking about OCR per se: text recognition accuracy is over 95% in almost any good OCR software. But the text structure itself is really badly recognized. To make things short, OCR doesn’t produce a file that SuperMemo can import.
The best you can hope for is that the PDF article has a simple structure and is short; then you can load it into SuperMemo. But this is definitely not a complete solution.
Read and Copy via AutoHotkey.
This is the closest I have come to processing text from PDF articles, but it is not a simple AHK scripting solution. It will not work with just any PDF reader or on just any platform, and it can’t, because otherwise you won’t have a complete, orderly processing of each article: you need to know what you’ve extracted and from what source, and to have a unique PDF-ID so you can find the source fast enough in case you need it a couple of months (or years) after making the extract. Even more, you need to know whether some particular piece of text has already been introduced into SuperMemo. You need all of this to be able to read PDFs incrementally without importing the complete article or book into SuperMemo. At the same time you want to make it fast and almost automatic, and of course you’ll want a method that is not so unorthodox that, whenever a new version of SuperMemo or of your PDF reader comes out, you have to change your method again and again.
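To make those requirements concrete (a unique PDF-ID, the source of every extract, and a way to tell whether a passage was already extracted), here is a rough sketch. It is not my AutoHotkey script; every name in it (`pdf_id`, `extract_key`, `register_extract`, the log file layout) is hypothetical, and it assumes the ID can simply be derived from the file contents.

```python
import hashlib
import json
from pathlib import Path


def pdf_id(pdf_path: str) -> str:
    """Short ID derived from the file contents, so renaming or moving
    the PDF does not change it (assumption: the file is never edited)."""
    digest = hashlib.sha1(Path(pdf_path).read_bytes()).hexdigest()
    return digest[:12]


def extract_key(text: str) -> str:
    """Fingerprint of an extract that ignores spacing and letter case."""
    normalized = " ".join(text.split()).lower()
    return hashlib.sha1(normalized.encode("utf-8")).hexdigest()


def register_extract(log_path: str, source_id: str, page: int, text: str) -> bool:
    """Append the extract to a plain-text log, one JSON record per line.

    Returns False, and writes nothing, if the same text is already logged.
    """
    log = Path(log_path)
    key = extract_key(text)
    if log.exists():
        for line in log.read_text(encoding="utf-8").splitlines():
            if line.strip() and json.loads(line)["key"] == key:
                return False
    record = {"key": key, "pdf_id": source_id, "page": page, "text": text}
    with log.open("a", encoding="utf-8") as handle:
        handle.write(json.dumps(record, ensure_ascii=False) + "\n")
    return True
```

Before pasting an extract into SuperMemo you would call `register_extract("extracts.jsonl", pdf_id("article.pdf"), 3, text)` and skip the paste when it returns False. The log is plain text, so it stays readable without any particular application, but it only catches exact repeats of a passage, not overlapping selections.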
Another situation that makes this process difficult to sort out is that, more and more, I’m relying on a simple file-system-based research database to free myself from proprietary software formats; txt and jpg are now my preferred ways to save files. I would probably go as far as saving everything in XHTML, but I’d like to have access to my database at any time without having to worry that some particular random-ware application has become forgotten-ware, or that XHTML has of course given way to XZalphaHTML or something like that, making my old files unreadable without hours of converting them.
Lifelong learning demands standardized tools, and that is something I was not aware of until my collection went over 25,000 Q&A items and more than 6,000 full articles. Then I understood that PDFs can’t go into my SuperMemo as full books as well, because it makes no sense (currently) to include any information unless I’m certain it will be read (must-read lectures) or it has already been read (extracts from books, and from now on future PDF extracts). It took me this many imports to acknowledge that simple is better, one of PW’s rules.
That being said, I don’t think we need to refrain from keeping any article we’d like to read in the future in some systematic way that lets us, if time is available, import it, or read it and import its extracts into SuperMemo later on. But we must deal with it outside SuperMemo to keep the collection from running into a knowledge overweight issue, which is so difficult to fix without a lot of time wasted pruning articles that are no longer needed, or indefinitely postponing low priority articles while doing our repetitions, taking time away from those repetitions (aka learning and reviewing).
ScrapBook, a Firefox add-on, is great for reading outside SuperMemo and later importing into any SRS (my preference is of course still SuperMemo). I’m using a simple IFFRS (Incremental Folder Filtering Repetitions System) on ScrapBook for all my would-like-to-read articles. I had previously overlooked this add-on, but the current version is much better. Thanks to Marcin Rybacki for reminding me about this great add-on.
I’m not far from achieving a systematic way of reading PDFs into SuperMemo, but I’m also not close enough to a simple, nice solution that would deserve posting in full.
On a last note, some visitors have been searching for the cognywiki on the blog; this must certainly be a result of my mentioning it before. Well, a couple of days ago I got in touch with Oliver Geordon, cognywiki’s creator, and he was kind enough to let me post his method here, so that should make a new post later on. I think he has a lot of nice principles on how to deal with note-taking, at least worth the time to try them out, especially if your objective is not only lifelong learning but also lifelong note-taking à la Thomas Edison.
Hope you guys are doing great. Sorry for such a wordy post; sometimes I feel like I’m on Adderall, my brain keeps going and going. If only all this thinking were full of great ideas. Until next time.