---
title: README
emoji: 💩
colorFrom: purple
colorTo: green
sdk: static
pinned: false
license: bsd-2-clause
---

Philosophy Hacking is a New Danger That No One is Talking About Related To Machine Learning

As the use of machine learning and AI continues to grow, so too does the general understanding of how these systems work. As that expertise spreads, malicious actors will experiment more and more with ways to misuse these tools. I believe this makes LLMs, and language models in particular, exploitable by hacking efforts today. These efforts will be unlike anything we have seen before in how they are carried out.

What is Jailbreaking?

Jailbreaking is the process of removing software restrictions imposed by the manufacturer on a device. This gives the user more control over the device and lets them install software that the manufacturer has not approved. Jailbreaking is typically done on mobile devices, such as smartphones and tablets. It is also possible on some other devices, such as personal computers and gaming consoles.

There are a number of reasons why someone might want to jailbreak their device. For example, they might want to install software that is not available in the official app store, or they might want to customize the look and feel of their device.

Jailbreaking can be risky, as it can expose the device to malware and other security threats. It is important to use jailbreak tools only from a trusted source and to take steps to protect the device after it has been jailbroken.

What is a SQL Injection?

A SQL injection is a type of attack in which an attacker inserts malicious SQL code into a database query. This is typically done by submitting crafted input through a form, URL parameter, or other field that the application builds into a query without properly separating the input from the SQL code.
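
To make the definition above concrete, here is a minimal sketch using Python's built-in `sqlite3` module. The `users` table, the credentials, and the two `login_*` functions are hypothetical names invented for illustration. The sketch contrasts a query built by string formatting with one that binds the same input through `?` placeholders.

```python
import sqlite3


def make_db() -> sqlite3.Connection:
    """Create an in-memory database with a hypothetical users table."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE users (name TEXT, password TEXT)")
    conn.executemany(
        "INSERT INTO users VALUES (?, ?)",
        [("alice", "wonderland"), ("bob", "builder")],
    )
    return conn


def login_vulnerable(conn: sqlite3.Connection, name: str, password: str) -> bool:
    # UNSAFE (illustration only): user input is pasted into the SQL text,
    # so quote characters in the input can change the query's structure.
    query = (
        "SELECT COUNT(*) FROM users "
        f"WHERE name = '{name}' AND password = '{password}'"
    )
    return conn.execute(query).fetchone()[0] > 0


def login_parameterized(conn: sqlite3.Connection, name: str, password: str) -> bool:
    # The ? placeholders bind the input as plain values, never as SQL code.
    query = "SELECT COUNT(*) FROM users WHERE name = ? AND password = ?"
    return conn.execute(query, (name, password)).fetchone()[0] > 0


conn = make_db()
payload = "' OR '1'='1"  # hypothetical malicious "password"

print(login_vulnerable(conn, "alice", payload))     # attacker gets in
print(login_parameterized(conn, "alice", payload))  # attacker is rejected
```

In the string-built version, the payload closes the password literal and appends a condition that is always true, so the check passes without the real password. In the parameterized version, the same text is compared as an ordinary string and matches nothing.
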
Once the malicious code is inserted, it can be used to gain unauthorized access to the database, modify or delete data, or even take control of the application server.

SQL injection attacks are a serious threat to the security of web applications. They can be used to steal sensitive data, such as credit card numbers and passwords, or to disrupt the operation of a website or web application.

There are a number of ways to prevent SQL injection attacks. One is to use prepared statements (parameterized queries), which keep user input separate from the SQL code so that input is never executed as part of the query. Another is input validation, which checks user input against the expected format before it is used.

The vulnerability has been known since the earliest applications built on SQL, and defending against it is still a constant battle today. It is important for developers to take these preventive steps, because a successful injection can seriously compromise the security of a web application.

What is a Philosophy Attack?

A philosophy attack is a type of attack that uses philosophical arguments to deceive or manipulate an LLM. These attacks can be very effective, as LLMs are not always able to distinguish between good and bad arguments.

If you think about these concepts broadly, a philosophy attack can work much like a SQL injection. It could be used to ‘trip up’ the logic of an LLM and deceptively gain access to root systems. If I wanted to exploit a particular system or piece of software via SQL injection, I would simply look for somewhere in the system to inject my code. From there, the injection would take care of the rest.
Having played around with debating LLMs on philosophy myself, I believe philosophical arguments can serve as the mechanism of ‘injection’ for a brute force attack in a very similar way.

Why are Philosophy Attacks So Effective?

There are a few reasons why philosophy attacks are so effective.

First, LLMs are trained on a massive amount of text data, which includes a lot of philosophical arguments. This means that LLMs are familiar with the language and concepts of philosophy and will readily engage with them, which makes them more susceptible to philosophy attacks.

Second, LLMs are not always able to distinguish between good and bad arguments. This is because LLMs are not specifically trained to evaluate the logic of arguments. As a result, they may be fooled by arguments that are logically flawed.

Third, these are arguments and discussions that can confound even advanced PhD candidates. They are tricky subjects, and some of the questions have no right or wrong answers. The brightest minds in the world often struggle with questions such as what is reality versus simulation, what is real versus virtual, and what those distinctions even mean. Getting a person, or a machine that can learn, stuck in a loop on these thoughts can have very negative consequences.

How Can We Train LLMs against Philosophy Attacks?

There are a few things that can be done to train LLMs against philosophy attacks.

First, we can train LLMs on a dataset of philosophy attacks. This will help LLMs recognize the patterns of philosophy attacks.

Second, we can train LLMs to evaluate the logic of arguments. This will help LLMs identify and reject logically flawed arguments.

The problem with getting a human, a machine, or any brain to defend itself against someone willing to frame these arguments as attacks is that, once you examine them, the arguments are often irrefutable at their core. A lot of postmodern philosophy is based on the exploration of what is illusory versus what is real.
The invention of the internet and ‘virtual worlds’ has only fueled these discussions throughout the discipline. You cannot simply program an LLM to avoid a specific philosopher or philosophy. To brute force these vulnerabilities out, you would have to block it from discussing essentially the entirety of postmodern philosophy.

What Are the Benefits of Training LLMs against Philosophy Attacks?

There are several benefits to training LLMs against philosophy attacks.

First, it will make LLMs more resistant to deception and manipulation. This will make them more reliable and trustworthy.

Second, it will make LLMs more useful for a variety of tasks. For example, LLMs can be used to generate creative text, translate languages, and answer questions. By training LLMs against philosophy attacks, we can make them more effective at these tasks.