I know Calibre can remove DRM, but it seems that Calibre does not remove things like watermarks, references to the buyer by name, etc. Now maybe I can try to find those manually, but that is an error-prone process. Plus, what if they embed a unique digital signature that ties back to me? I understand that this is a very uncommon practice, but I do not want to find myself in a bad place.
I suppose the only way to remove a digital signature of any sort is to have two different people buy the same e-book, diff the two copies, and remove anything that differentiates them.
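For what it's worth, here is a minimal sketch of the diff step, assuming both copies are DRM-free EPUB files (an EPUB is a ZIP archive). The function names `member_hashes` and `differing_members` are made up for illustration. It only reports which files inside the archive differ between the two copies; it cannot say anything about marks that are identical in both.

```python
import hashlib
import io
import zipfile


def member_hashes(epub_bytes: bytes) -> dict[str, str]:
    """Map each file inside the EPUB (a ZIP archive) to the SHA-256 of its contents."""
    with zipfile.ZipFile(io.BytesIO(epub_bytes)) as archive:
        return {
            name: hashlib.sha256(archive.read(name)).hexdigest()
            for name in archive.namelist()
            if not name.endswith("/")  # skip directory entries
        }


def differing_members(copy_a: bytes, copy_b: bytes) -> list[str]:
    """Return the names of files that differ or exist in only one copy."""
    a, b = member_hashes(copy_a), member_hashes(copy_b)
    return sorted(name for name in a.keys() | b.keys() if a.get(name) != b.get(name))
```

Usage would be something like `differing_members(open("copy_a.epub", "rb").read(), open("copy_b.epub", "rb").read())`, where the two file names are placeholders. Any file it lists would then need a closer look with an ordinary text diff.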
Is there any tool that does this or automates the process? Or am I being too paranoid, and this is not a real threat?
Have a look at “snowdrop” (search for it together with “steganography”). It's basically the opposite of what you want, but worth mentioning here. Watermarks could be placed into whitespace, and not only into actual spaces or line breaks: intentionally changed use of paragraphs, tabs, or even page boundaries could possibly be detected after scanning and even after OCR. IMHO snowdrop uses, depending on the chosen operation mode, small errors like misspelled words, commas, etc., but it also has a mode that produces fine grammar without misspelled words…
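To make the whitespace case a bit more concrete, here is a rough sketch of a check that counts whitespace oddities in a plain-text dump of a book. The function name `whitespace_report` and the choice of what counts as "suspicious" are my own assumptions, not anything a real watermarking tool documents. It will not catch linguistic marks of the kind described above (misspellings, comma placement, word choice).

```python
import unicodedata
from collections import Counter


def whitespace_report(text: str) -> Counter:
    """Count whitespace patterns that could carry hidden information."""
    report = Counter()
    for line in text.splitlines():
        if line != line.rstrip(" \t"):
            report["lines with trailing spaces or tabs"] += 1
        if "  " in line.strip():
            report["lines with double spaces"] += 1
        for ch in line:
            category = unicodedata.category(ch)
            # Cf = invisible format characters (zero-width space, soft hyphen, ...)
            # Zs = space separators; anything other than a plain space is unusual
            if category == "Cf" or (category == "Zs" and ch != " "):
                report[unicodedata.name(ch, f"U+{ord(ch):04X}")] += 1
            elif ch == "\t":
                report["TAB"] += 1
    return report
```

An empty report only means none of these particular patterns were found, not that the text is clean.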
How do you make sure that by diffing two versions you cover everything that has been deliberately placed into both documents but carries literally the same information in each?
Let's say you bought two books at two different stores with two different watermarks. If the watermark contains the date and time of the purchase, and the only difference is the minutes because you bought them within the same hour, the remaining watermark would point to all buyers who bought exactly this book in that hour, worldwide. It could still be “very” precise, depending on all the other(!) buyers, if there are any at all within that timeframe. What if the watermark includes the Unix epoch? Then the part that is the same in both watermarks would not be bounded by hours, but by seconds, 10 seconds, 100 seconds, etc.
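The epoch case can be worked through in a few lines. This is only an illustration, assuming the timestamp is stored as decimal digits and the diff removes exactly the digits that differ; `shared_prefix_window` is a hypothetical name.

```python
def shared_prefix_window(stamp_a: int, stamp_b: int) -> tuple[str, int]:
    """Return the leading digits two epoch timestamps share, plus the number
    of seconds spanned by all timestamps that start with those digits."""
    a, b = str(stamp_a), str(stamp_b)
    if len(a) != len(b):
        raise ValueError("timestamps must have the same number of digits")
    shared = 0
    for digit_a, digit_b in zip(a, b):
        if digit_a != digit_b:
            break
        shared += 1
    # Each digit that is not shared widens the window by a factor of ten.
    return a[:shared], 10 ** (len(a) - shared)


print(shared_prefix_window(1700000123, 1700000987))  # ('1700000', 1000)
```

Two purchases about a quarter of an hour apart leave the prefix `1700000` behind, which still narrows the purchase down to a window of 1000 seconds.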
And you could not know whether there were other hidden watermarks that just happened to be the same for your two (three?) purchases (same country, continent, payment method, credit card holder name, name of the internet provider used during purchase, browser used, etc.). It fully depends on the creator of the watermark what is included and what is not. If you happen to know all that (without any possible exceptions) you might be on the safe side, but if not…
My general suggestion here is:
- If you want to be sure of not getting into trouble, then just don’t do it.
- If that book is too expensive compared to its content, just not buying it possibly also helps the market to fix the problem.
- Save that time and instead help those who already fight for a better world.
- Search for books that are already licence-free (or “cc” licensed) and promote those instead; help improve free resources like openstreetmap and wiki*, but do not publish licence-poisoned content there. Write it yourself, always.
- Write your own book and publish it for free.
Just to mention… the “safe” side sometimes seems limited, but maybe it actually is not, if you really look at it.
The bad news is that uploading e-books will involve programming on your part (for your sanity at least).
The good news is that it should be far easier than other mediums.
If you are approaching this from a complete-safety perspective (’cause you live in a fiefdom that owes tribute to the publishers’ guild), then you’re going to want to OCR the pages of the book and use the text to make a brand-new book free from metadata. I’m pretty sure a Python crash course could get you up and running in a month or 6.
If you want what’s closest to the original product, then you’ll need a Python script that strips everything from the book into just a text document, and then convert that back into your own book. You’ll have to review the text document to see if any random code was included in the book, like invisible text.
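A bare-bones sketch of the "strip it down to text" step, assuming a DRM-free EPUB whose chapters are (X)HTML files inside the ZIP container. `TextExtractor` and `epub_to_text` are names I made up, and the chapters are simply taken in file-name order rather than the reading order declared in the book's manifest, which is a simplification. Note that text hidden with CSS comes out like any other text, which is exactly why the manual review is still needed.

```python
import io
import zipfile
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collect text nodes, skipping <script>, <style> and <head> content."""

    SKIPPED = {"script", "style", "head"}

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.chunks = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIPPED:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIPPED and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        if not self._skip_depth and data.strip():
            self.chunks.append(data.strip())


def epub_to_text(epub_bytes: bytes) -> str:
    """Concatenate the text of every (X)HTML file in the archive."""
    parts = []
    with zipfile.ZipFile(io.BytesIO(epub_bytes)) as archive:
        # Assumption: file-name order is close enough to reading order.
        for name in sorted(archive.namelist()):
            if name.lower().endswith((".xhtml", ".html", ".htm")):
                parser = TextExtractor()
                parser.feed(archive.read(name).decode("utf-8", errors="replace"))
                parser.close()
                parts.append("\n".join(parser.chunks))
    return "\n\n".join(parts)
```

Images, fonts, and anything else that is not an (X)HTML file are ignored, so a book that relies on graphics loses them at this step.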
Both options are so simple from a programming perspective that I’ve never seen scripts to strip e-book protections. A real “the solution is left unworked as a challenge for the reader” situation. And from what I know, the publishers have switched to focusing on selling hard copies as their bread and butter, and striking deals with libraries for other revenue. Big money is still in mandatory university textbooks.
Source: Never actually done what you’re asking for
Buy the same book with another account and compare the two copies. Then you should be able to find out where the fingerprint is.
Just use Calibre to change the format to RTF or even TXT, and then back to MOBI or EPUB, and poof.
Neither of those formats stores any significant metadata; with RTF the original formatting is preserved, though.
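If you would rather script the round trip than click through the GUI, Calibre ships a command-line tool, `ebook-convert`, which picks the output format from the file extension. The wrapper below is only a sketch: the function names and the `-rebuilt` suffix are my own, and it assumes `ebook-convert` is on your PATH.

```python
import shutil
import subprocess
from pathlib import Path


def round_trip_commands(source: Path, intermediate_ext: str = "txt") -> list[list[str]]:
    """Build the two ebook-convert calls: book -> intermediate -> rebuilt book."""
    intermediate = source.with_suffix(f".{intermediate_ext}")
    rebuilt = source.with_name(f"{source.stem}-rebuilt{source.suffix}")
    return [
        ["ebook-convert", str(source), str(intermediate)],
        ["ebook-convert", str(intermediate), str(rebuilt)],
    ]


def run_round_trip(source: Path, intermediate_ext: str = "txt") -> None:
    """Run both conversions, stopping at the first one that fails."""
    if shutil.which("ebook-convert") is None:
        raise RuntimeError("ebook-convert not found; is Calibre installed and on PATH?")
    for command in round_trip_commands(source, intermediate_ext):
        subprocess.run(command, check=True)
```

Calling `run_round_trip(Path("book.epub"), "rtf")` would write `book.rtf` and then `book-rebuilt.epub` next to the original.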
This is such a bad idea! The formatting will be lost and the resulting document will look like shit! Especially for books where they use graphics.