provides a competitor to Big Tech, so that Big Tech's governance decisions on the internet become less extractive. Most tech companies are ultimately doing some sort of search, be it for partners, jobs, restaurants, or whatnot.
makes it possible to build applications that Big Tech is unlikely to build. For example, new forms of government can be built if you have open source search and uncensorable data as primitives.
IMO it is possible to run an open source search engine on a consumer PC with higher search accuracy than Google.[1]
Benefits of an open source search engine:[2]
(There are more benefits. I haven't yet figured out the best sales pitch for this, but I could come up with something given time.)
I will need around $1M in funding to build this, and since I would prefer to open source it, it will most likely have to be non-profit funding.
I would be open to putting in some effort to increase awareness and network more if I believed I would get funding. But I'm currently not that optimistic. Hence I wanted a second opinion.
The trick is to convert text into embeddings, and embeddings into locality-sensitive hashes. Every 1000 bytes of plaintext maps to a 3-byte hash, so Common Crawl (2 PB) shrinks to a 6 TB index. Search with <100 ms latency is possible on consumer HDDs. I can provide more technical details if anyone wants.
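To make the pipeline and the size arithmetic concrete, here is a minimal sketch, not the actual design. It assumes random-hyperplane (SimHash-style) hashing as the locality-sensitive hash family, which the post does not specify, and it uses random toy vectors in place of a real embedding model. The names `make_hyperplanes`, `lsh_hash`, `hamming`, `index_size_bytes`, and the dimension `DIM = 64` are all hypothetical.

```python
import random

HASH_BITS = 24  # 3 bytes, matching the "3 byte hash" figure above
DIM = 64        # hypothetical embedding dimension (assumption)

def make_hyperplanes(bits=HASH_BITS, dim=DIM, seed=0):
    """Draw one random Gaussian hyperplane per output bit."""
    rng = random.Random(seed)
    return [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(bits)]

def lsh_hash(embedding, hyperplanes):
    """Sign-of-projection hash: one bit per hyperplane, packed into an int."""
    h = 0
    for plane in hyperplanes:
        dot = sum(p * x for p, x in zip(plane, embedding))
        h = (h << 1) | (1 if dot >= 0 else 0)
    return h

def hamming(a, b):
    """Number of differing bits; a small distance suggests similar embeddings."""
    return bin(a ^ b).count("1")

def index_size_bytes(corpus_bytes, chunk_bytes=1000, hash_bytes=3):
    """Index size if every chunk_bytes of plaintext becomes one hash."""
    return corpus_bytes // chunk_bytes * hash_bytes

planes = make_hyperplanes()
rng = random.Random(1)
doc = [rng.gauss(0.0, 1.0) for _ in range(DIM)]        # toy "embedding"
near = [x + 0.05 * rng.gauss(0.0, 1.0) for x in doc]   # slightly perturbed copy
opposite = [-x for x in doc]                           # points the other way

h_doc = lsh_hash(doc, planes)
print(hamming(h_doc, lsh_hash(near, planes)))      # typically small
print(hamming(h_doc, lsh_hash(opposite, planes)))  # every bit flips
print(index_size_bytes(2 * 10**15) / 10**12)       # 6.0, i.e. 2 PB -> 6 TB
```

With this hash family, nearby embeddings tend to differ in few bits, so a query can be matched by comparing 3-byte codes instead of full text. Whether 24 bits per 1000 bytes is enough for the accuracy claimed is a separate question that this sketch does not test.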
Obligatory comment for LW: this project has zero benefit if timelines are short (ASI by 2030). I am not betting on short timelines, though.