My own Search Engine..?

Is it just me, or have search results gotten really shitty? When I search for something, the only fucking results that I get are AI generated SEO bullshit. And I'm done with it.

I have been thinking about what I want a search engine to do. It should do basically this:

  • Search Wikipedia. Apparently you can download the entirety of (English) Wikipedia in just 19GB (compressed). This is excluding any page history/talk pages, and the dump doesn't contain media. But still, 19GB is not that much. It expands to around 90GB, but I think that's manageable if I run it on a local server.

  • Crawl and index Stack Exchange sites. Think Stack Overflow, Server Fault, Super User, Unix & Linux, Mathematics, and Ask Ubuntu.

  • Optionally record my browsing history via a browser extension. This should be optional and contain a blacklist containing domains that shouldn't be recorded — like search pages. Results from my own browser history should be less prominent.

  • Crawl websites on the small web. If you're wondering what the heck I'm talking about, this explanation from Marginalia Search describes it pretty well:

    Remember when you used to explore the Internet, when you used to discover cool little websites made by people and it wasnā€™t just a bunch of low effort content mill listicles and blog spam?
    In recent years, something has been simmering: Some call it the ā€œSmall Internetā€. I hesitate to call it a movement, that would imply a level of organization and intent that it does not possess. Itā€™s a disjointed group of like-minded people that recognize that the Internet has lost a certain je ne sais quoi, it has turned from a wild and creative space, into more of shopping mall. Where ever you go, youā€™re prodded to subscribe to newsletters, to like and comment, to buy stuff.

    There's a bunch of small indexes and search engines like this. I was planning to curate my own index, based on existing indexes. There should also be an option for people to submit sites.

  • Index some prominent Linux-oriented wikis, like the Arch Wiki and Gentoo Wiki. They seem to run the same wiki software as Wikipedia, so maybe they have a download option as well?

  • Searching man pages and RFCs seems cool?

  • Support for bangs, like DuckDuckGo. I'd want !g for Whoogle results and !y for YouTube.

This hypothetical search engine would be mainly for personal use, but I wouldn't mind putting it on a public domain if I were to build it.