My own Search Engine..?
Is it just me, or have search results gotten really shitty? When I search for something, the only fucking results that I get are AI generated SEO bullshit. And I'm done with it.
I have been thinking about what I want a search engine to do. It should do basically this:
-
Search Wikipedia. Apparently you can download the entirety of (English) Wikipedia in just 19GB (compressed). This is excluding any page history/talk pages, and the dump doesn't contain media. But still, 19GB is not that much. It expands to around 90GB, but I think that's manageable if I run it on a local server.
-
Crawl and index Stack Exchange sites. Think Stack Overflow, Server Fault, Super User, Unix & Linux, Mathematics, and Ask Ubuntu.
-
Optionally record my browsing history via a browser extension. This should be optional and contain a blacklist containing domains that shouldn't be recorded — like search pages. Results from my own browser history should be less prominent.
-
Crawl websites on the small web. If you're wondering what the heck I'm talking about, this explanation from Marginalia Search describes it pretty well:
Remember when you used to explore the Internet, when you used to discover cool little websites made by people and it wasnāt just a bunch of low effort content mill listicles and blog spam?
In recent years, something has been simmering: Some call it the āSmall Internetā. I hesitate to call it a movement, that would imply a level of organization and intent that it does not possess. Itās a disjointed group of like-minded people that recognize that the Internet has lost a certain je ne sais quoi, it has turned from a wild and creative space, into more of shopping mall. Where ever you go, youāre prodded to subscribe to newsletters, to like and comment, to buy stuff.There's a bunch of small indexes and search engines like this. I was planning to curate my own index, based on existing indexes. There should also be an option for people to submit sites.
-
Index some prominent Linux-oriented wikis, like the Arch Wiki and Gentoo Wiki. They seem to run the same wiki software as Wikipedia, so maybe they have a download option as well?
-
Searching man pages and RFCs seems cool?
-
Support for bangs, like DuckDuckGo. I'd want
!g
for Whoogle results and!y
for YouTube.
This hypothetical search engine would be mainly for personal use, but I wouldn't mind putting it on a public domain if I were to build it.