Hister - Your Own Search Engine

monica_b1998@lemmy.world · 6 days ago

Taleya@aussie.zone · 4 days ago

Ooof. I’ll be honest. I hear hister, I think about that Nostradamus theory.

Sibbo@sopuli.xyz · 6 days ago

So did anybody try this and want to share their experiences?

Does it use lots of CPU, RAM or disk?
Are the search results actually good?
Does it use a browser extension or so to get new visited sites, or do I have to import my history every day?
Does it also have a crawler?

Also, why would I use this over e.g. YaCy?

eutampieri@feddit.it · 6 days ago

What’s the difference between this and SearXNG?

uuj8za@piefed.social · 6 days ago

Huh, just from browsing the homepage, this seems to be more for searching local files.

xnx@piefed.social · 4 days ago

It’s not. It makes local versions of websites you visit so you can search them

eutampieri@feddit.it · 6 days ago

Hmm, thanks. I did browse the page but I didn’t quite understand that aspect.

activistPnk@slrpnk.net · 6 days ago

I currently use the find-grep function in emacs, which is basically: find . -type f -exec grep 'my.*search.*pattern' {} +

To do PDFs, I use something like find . -type f -iname \*pdf -exec pdfgrep 'my.*search.*pattern' {} +

My problem is generally when TOKEN1<space>TOKEN2 has a line break between tokens. It’s fucking annoying that grep is line-by-line. I wonder if Hister solves that problem. But from the website, I see no advanced syntax. I would love to search a pattern like word1 w/s word2, which would find cases where word1 and word2 appear in the same sentence. And word1 w/p word2 to match cases where two words are in the same paragraph.
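
For the w/p case, a rough approximation is possible with awk's paragraph mode, assuming paragraphs are separated by blank lines. This is only a sketch: `wp` is a made-up helper name, not anything Hister or grep provides, and it treats the two words as plain regexes.

```shell
# wp WORD1 WORD2 [FILE...]: hypothetical helper, prints the number of every
# paragraph that contains both words, in either order.
wp() {
  w1=$1; w2=$2; shift 2
  # An empty RS puts awk in paragraph mode: each blank-line-separated block
  # is one record, so a line break between the words no longer matters.
  awk -v RS= -v a="$w1" -v b="$w2" '$0 ~ a && $0 ~ b { print NR }' "$@"
}

# Demo input: paragraph 1 splits the words across a line break,
# paragraph 2 has only one of them, paragraph 3 has them in reverse order.
printf 'alpha word1\nword2 beta\n\nonly word1 here\n\nword2 then word1\n' | wp word1 word2
# prints 1 and 3
```

A sentence-level w/s would need some rule for where sentences end, which this sketch does not attempt.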

cravl@slrpnk.net · 5 days ago

Replacing line breaks with nulls first is an option. That’s a lot of extra processing for very large blocks of text though.

Using regular grep is possible with the right flags, or you could also use pcre2grep with the -M flag, which should be available on every distro nowadays. See this Stack Overflow article for details.
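
As a minimal sketch of what "the right flags" could look like, assuming GNU grep built with PCRE support: `-z` makes grep read each file as one NUL-terminated record instead of line by line, so `\s` is free to match a newline. The helper name `multiline_files` and the demo files are made up for illustration.

```shell
# multiline_files PATTERN DIR: hypothetical helper that lists files under DIR
# in which PATTERN (a PCRE) matches, even across a line break.
multiline_files() {
  # -r recurse, -l print file names only, -P Perl-compatible regex (GNU grep),
  # -z treat the whole file as a single record so \s can span newlines
  grep -rlPz -- "$1" "$2"
}

# Demo files (made-up content): one splits the tokens across lines, one does not match.
demo=$(mktemp -d)
printf 'foo TOKEN1\nTOKEN2 bar\n' > "$demo/split.txt"
printf 'TOKEN1 only\n' > "$demo/miss.txt"

multiline_files 'TOKEN1\s+TOKEN2' "$demo"   # lists only the split.txt file

# The same search with pcre2grep and -M, if it happens to be installed
if command -v pcre2grep >/dev/null 2>&1; then
  pcre2grep -rlM 'TOKEN1\s+TOKEN2' "$demo"
fi

rm -rf "$demo"
```

Note that with `-z` grep has to hold each file in memory as one record, so very large files may be slow.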

activistPnk@slrpnk.net · 4 days ago

pcregrep is not installed by default on Debian, but it’s in the official repos. It seems common to get:

pcregrep: Too many errors - abandoned.
pcregrep: Error -8, -21 or -27 means that a resource limit was exceeded.
pcregrep: Check your regex for nested unlimited loops.

But it will help in many cases. I can see that it works on sufficiently small files. I noticed the built-in grep function in emacs can be modified to call pcregrep with -M instead of grep, which I find quite important because emacs makes it very easy to jump between results. In the end it’s still a hack.