Searchable Memories

I do so much reading and writing about the IT industry that I forget to emphasize some of the joys of using Debian GNU/Linux for my IT.

One of my favourite things about IT is searching. With Google, for instance, we have a powerful tool for finding what’s out there. Of course, with a big hard drive, I can end up with a lot of stuff to search locally:

Package: recoll (1.17.3-2)
Personal full text search package with a Qt GUI
This package is a personal full text search package is based on a very strong backend (Xapian), for which it provides an easy to use and feature-rich interface.

Cool, eh? Further, I can configure Recoll to index just a few selected directories so my search results are not cluttered with Linux source code and other stuff I am unlikely to use in my work. Typically, I search PDFs, word-processing documents, spreadsheets and some text files. Often I like to search the OCR results from the PDFs of US DOJ v M$. That’s a gold-mine.

Quoting Bill Gates: People ask, “Will the Internet be the thing that kills you?” I say, with tongue in cheek, “No. It’s all the other things that will kill us, because we’re so distracted by the Internet.”

Of course it is the Internet that is killing the Wintel monopoly because FLOSS and products running it are so easy to find on the web. What’s really killing M$ is the inability to change away from a business plan that has brought in hundreds of $billions by the efforts of all M$’s slaves. GNU/Linux is a product of the Internet as much as anything because Linus and others actually do share Free/Libre Open Source software as the licence permits: to run, examine, modify and distribute for $0.

Recoll is a GUI front end to Xapian, a search engine and indexer. While Xapian was designed for programmers, with Recoll it can be used for searching all manner of files and it works very well. Xapian’s features:

  • Transactions: if database update fails in the middle of a transaction, the database is guaranteed to remain in a consistent state.
  • Simultaneous search and update, with new documents being immediately visible.
  • Support for large databases: Xapian has been proven to be scalable to hundreds of millions of documents.
  • Accurate probabilistic ranking: more relevant documents are listed first.
  • Phrase and proximity searching.
  • Relevance feedback, which improves ranking and can expand a query, find related documents, categorise documents etc.
  • Structured Boolean queries, e.g. “race AND condition NOT horse”
  • Wildcard search, e.g. “wiki*”
  • Spelling correction
  • Synonyms
  • Omega, a packaged solution for adding a search engine to a web site or intranet. Omega can easily be extended and adapted to fit changing requirements.
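For the curious, here is a rough idea of what the Boolean and wildcard items in that list mean in practice. This is only a toy sketch in plain Python and not Xapian's or Recoll's actual API: the sample documents and the names `build_index`, `lookup` and `boolean_query` are all made up for illustration, and there is no ranking, stemming or persistence.

```python
import fnmatch
import re
from collections import defaultdict

# Toy documents; the ids and texts are invented for illustration only.
DOCS = {
    1: "A race condition in the kernel scheduler",
    2: "The horse won the race in record condition",
    3: "Wiki pages and wikis for documentation",
    4: "Avoiding a race condition with locks",
}


def build_index(docs):
    """Map each lowercased term to the set of document ids containing it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for term in re.findall(r"[a-z0-9]+", text.lower()):
            index[term].add(doc_id)
    return index


def lookup(index, pattern):
    """Return the ids of documents matching one term; '*' is a wildcard."""
    pattern = pattern.lower()
    if "*" in pattern:
        hits = set()
        for term in index:
            if fnmatch.fnmatchcase(term, pattern):
                hits |= index[term]
        return hits
    # Copy the set so callers can modify the result safely.
    return set(index.get(pattern, set()))


def boolean_query(index, must=(), must_not=()):
    """AND together the `must` terms, then drop anything in `must_not`."""
    result = None
    for term in must:
        hits = lookup(index, term)
        result = hits if result is None else result & hits
    result = result or set()
    for term in must_not:
        result -= lookup(index, term)
    return result


INDEX = build_index(DOCS)
# "race AND condition NOT horse"
RACE_HITS = boolean_query(INDEX, must=["race", "condition"], must_not=["horse"])
# "wiki*"
WIKI_HITS = lookup(INDEX, "wiki*")
print(sorted(RACE_HITS), sorted(WIKI_HITS))
```

With these four sample documents, the Boolean query keeps documents 1 and 4 and drops the one about the horse, while the wildcard matches both "wiki" and "wikis" in document 3. A real engine such as Xapian adds probabilistic ranking, on-disk storage and much more on top of this basic idea.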

I have tens of thousands of documents indexed with no worry about hitting links further down the page, broken links and other nuisances of the web. Why should programmers have all the fun? I certainly don’t need M$ or its software to get a lot done very swiftly and accurately.

About Robert Pogson

I am a retired teacher in Canada. I taught in the subject areas where I have worked for almost forty years: maths, physics, chemistry and computers. I love hunting, fishing, picking berries and mushrooms, too.

2 Responses to Searchable Memories

  1. Look what I dredged up from the spam-bin:
    “the fact that your spurious advertisement for an idiot layer (Recoll) on top of a product (Xapian) that nobody has ever heard of, can it?

    Because, if people had actually heard of it, then the programmers might make money.

    Which is the last thing you want.”

    Just thought I would share it with you to make your day fun-filled… HAHAHA

    Xapian was created by programmers for programmers to assist them in their work of creating and maintaining software. That is sufficient payment for them and their employers, and because it is FLOSS the rest of us get to put it to good use.

    The idea that I don’t want programmers to be paid is silly. I worked for years doing programming, largely in science/technology, from the days of the mainframe/punchcards to the PC-era. In 1970 my first task on one job was to read the assembler programming manual from D.E.C. for a PDP-15 so I could convert a standalone programme from the PDP-9 instruction-set to the PDP-15, and I had to do it in a control-room with a noise level near 80 dB… Ah, the good old days. I was paid many times for my programming and worked hard. My employers thought it worthwhile too.

    BTW, I did mention Xapian exactly because it was a key piece of the application, hoping to broaden knowledge of this wonderful software. So, TAKE A HIKE, Quibbly!

  2. Quibbly says:

    What a load of crap, Pogson. You’ve outdone yourself.

    Besides which, what’s with this “Why should programmers have all the fun?” drivel?

    Last time I listened to your bleating, the big problem with Microsoft was that they don’t depend upon programmers. AFAIK, the reason that M$ makes their money is supposedly because of salesmen and other shady characters.

    It would hardly be anything to do with the fact that your spurious advertisement for an idiot layer (Recoll) on top of a product (Xapian) that nobody has ever heard of, can it?

    Because, if people had actually heard of it, then the programmers might make money.

    Which is the last thing you want.
