AI Safety Chatbot
Hello World! The AISafety.info team is launching a prototype of the AI Safety Chatbot. The chatbot uses a dataset of alignment literature to answer any questions related to AI safety that you might have, while citing established sources. Please keep in mind that this is a very early prototype; despite citing references, it may still provide inaccurate or inappropriate information.

The overall objective is to help people better understand AI safety issues based on alignment research, using an LLM to tailor content to the user's needs and technical level. We hope the chatbot can be used both by newcomers to AI safety and by researchers and engineers who want to get up to speed on specific topics.

How it works

This chatbot builds upon AlignmentSearch. Our work also expands upon the alignment research dataset (ARD) developed during AI Safety Camp 6. This involved updating and curating the dataset to focus on quality over quantity. Additionally, we created a process to regularly fetch new articles from selected sources. The ARD contains information about alignment from various books, research papers, and blog posts. For a full list of all the sources being used, see the readme of the repository on GitHub or HuggingFace.

We use a process called retrieval-augmented generation (RAG) to generate the answers. Since an LLM's training data is static, RAG increases the capabilities of an LLM by referencing an external authoritative knowledge base before generating a response. The process can be roughly broken into two steps: 1) getting and storing the data in a vector database, and then 2) generating an answer based on that data (a minimal sketch of both steps appears at the end of this section).

The information storage process is outlined below:

Source: DeepLearning.AI (2023), "LangChain: Chat with Your Data"

* Document Loading: The articles are scraped from various sources such as the ones mentioned above. They are then parsed and stored in an SQL database, while making sure that metadata fields are valid.
* S
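Purely to illustrate the two-step RAG flow described above, here is a minimal sketch in Python. It is not the project's actual implementation: the in-memory cosine-similarity "vector store", the placeholder document chunks, and the OpenAI model names are all assumptions made for this example.

```python
# Minimal RAG sketch: 1) embed document chunks and store the vectors,
# 2) retrieve the most relevant chunks for a question and generate a
# cited answer. The in-memory store and model names are illustrative
# stand-ins, not the chatbot's actual stack.
import numpy as np
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

def embed(texts: list[str]) -> np.ndarray:
    """Embed a batch of texts into vectors."""
    resp = client.embeddings.create(model="text-embedding-ada-002", input=texts)
    return np.array([d.embedding for d in resp.data])

# Step 1 (storage): embed each chunk of alignment literature and keep
# the vectors alongside their source metadata.
chunks = [
    {"text": "...", "source": "Example alignment blog post"},  # placeholder data
]
doc_vectors = embed([c["text"] for c in chunks])

# Step 2 (retrieval + generation): embed the question, find the most
# similar chunks by cosine similarity, and ask the LLM to answer using
# only those chunks as context.
def answer(question: str, top_k: int = 3) -> str:
    q = embed([question])[0]
    sims = doc_vectors @ q / (
        np.linalg.norm(doc_vectors, axis=1) * np.linalg.norm(q)
    )
    top = np.argsort(sims)[::-1][:top_k]
    context = "\n\n".join(
        f"[{chunks[i]['source']}]\n{chunks[i]['text']}" for i in top
    )
    completion = client.chat.completions.create(
        model="gpt-3.5-turbo",
        messages=[
            {
                "role": "system",
                "content": "Answer using only the sources below, and cite "
                "them by name.\n\n" + context,
            },
            {"role": "user", "content": question},
        ],
    )
    return completion.choices[0].message.content
```

In the real pipeline the retrieval step runs against a proper vector database rather than an in-memory array, but the shape of the process is the same: embed, retrieve, then generate with the retrieved sources in the prompt.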