Tentative GPT-4 summary. This is part of an experiment.
Up/Downvote "Overall" if the summary is useful/harmful.
Up/Downvote "Agreement" if the summary is correct/wrong.
If you downvote, please let me know why you think the summary is harmful.
(OpenAI doesn't use customers' data anymore for training, and this API account previously opted out of data retention)
TLDR:
This article analyzes the competency of GPT-4 in understanding complex legal language, specifically Canadian Bill C-11, which aims to regulate online media. The focus is on summarization, clarity improvement, and the identification of issues from an AI safety perspective.
Arguments:
- GPT-4 struggles to accurately summarize Bill C-11, initially confusing it with Bill C-27.
- After being given the correct summary and the full text of C-11, GPT-4 examines them for logical inconsistencies, loopholes, and ambiguities.
- The article uses a multi-layered analysis to test GPT-4's ability to grasp the legal text.
Takeaways:
- GPT-4 demonstrates some competency in summarizing legal texts but makes mistakes.
- It highlights ambiguous terms, such as "social media service," which is not explicitly defined.
- GPT-4's judgement correlates with human judgement in identifying potential areas for improvement in the bill.
Strengths:
- Provides a detailed analysis of GPT-4's summarization and understanding of complex legal language.
- Thoroughly examines potential issues and ambiguities in Bill C-11.
- Demonstrates GPT-4's value for deriving insights for AI safety researchers.
Weaknesses:
- Limitations of GPT-4 in understanding complex legal texts and self-correcting its mistakes.
- Uncertainty about the validity of GPT-4's insights derived from a single test case.
Interactions:
- The assessment of GPT-4's understanding of legal text can inform AI safety research, AI alignment efforts, and future improvements in AI summarization capabilities.
- The recognition of GPT-4's limitations can be beneficial in fine-tuning its training and deployment for more accurate summaries in the future.
Factual mistakes:
- GPT-4 initially confuses Bill C-11 with Bill C-27 and therefore summarizes the wrong bill, which is a significant mistake.
Missing arguments:
- The article does not provide a direct comparison between GPT-4 and previous iterations (e.g., GPT-3.5) in understanding complex legal texts, though it briefly mentions GPT-4's limitations.
- There is no mention of evaluating GPT-4's performance across various legal domains beyond Canadian legal texts or multiple test cases.
If you're at all familiar with Canadian politics over the past few years, you may have heard of this mysterious 'C-11' (a.k.a. 'the Online Streaming Act') being mentioned in various places.
It refers to a complex bill attempting to regulate an area that is notoriously difficult to regulate in a fair and balanced way: online media, which includes the streaming giants, news, and radio, but also smaller-scale productions such as podcasts, streamers, etc. It attempts to mirror the regulations that broadcast and radio content conform to in Canada while accommodating the unique characteristics of the internet.
It notably excludes purely personal communication such as LW posts, though not without some asterisks if there is any for-profit component, hence the polarizing nature of the legislation, which will be touched on in part 2.
I chose this example to represent complex legal language because it's:
Keep in mind that although the bill is primarily about amending an existing law, there are various other laws amended as well by the bill, which have been excluded from this analysis so that the potential reader-base will be larger than a handful of Canadian law experts.
(The countries I know of have law(s) broadly similar to the Broadcasting Act regulating TV, radio, etc., but in case you've never heard of this before I would recommend at least skimming through the Wikipedia page.)
So let's begin.
At first I asked for a quick summary without providing any of the bill's text, relying entirely on GPT-4's knowledge.
However, to make things interesting, I provided only the URL without an associated title. The response was:
So right off the bat, we see a big problem. Although it got 'C-11' right, the 'Digital Charter Implementation Act' is not C-11 at all! That's Bill C-27, which hasn't even been passed yet.
As a summary of C-27 it's fine enough for someone who needs an overview, and remarkably competent considering it must have based this only on prior training while being prompted by a parsing error.
But this is clearly not what we want. So I continued, without pointing out the mistake, to test whether it could self-correct:
Although this is still a continuation of analyzing C-27, we would indeed be getting substantive analysis if that were the topic! And quite easily too, considering I wrote the briefest of prompts.
It still mentions 'C-11', so it seems to be quite confused. It's not the first time I've seen this happen, so it's indicative of how fragile it can be.
In this respect GPT-4 does not improve upon GPT-3.5. Nonetheless, the level of summarization performance is encouraging for our intended analysis.
Moving on to the main show, I directly copy-pasted the entire text of the summary section of C-11 and asked it to identify any logical inconsistencies, to make sure it understood correctly. (I find that GPT-4 can't convincingly BS beyond 1 layer of abstraction, so a multitude of possible tricks could be used here.)
Reading along with the actual text is highly recommended if you're curious about the nuances that it gets right and those that it does not, which are too many to elaborate on in detail. The Bill Summary is very accessible and written in simpler language, without any French.
The points are broadly correct as far as I understand, and it's a decent reduction in word length from the original, mostly from cutting out the sub-points, but not nearly as much as I expected. This points to the original summary being well written enough that not much more can be cut.
Interestingly, it does not give the points with subpoints any more length for explanation, such as the fifth point, which in the actual bill has 7 subpoints:
If you had just read GPT-4's summary, you might not have guessed how much importance the bill drafters attached to this part specifically.
On the other hand, how many folks would actually read it instead of skimming it?
So although it might be different from how a trained lawyer would summarize the text, it doesn't seem altogether inferior for the purposes of the casual reader.
As it did not identify any logical inconsistencies I tried to prompt in an alternative way with:
It begins sounding like boilerplate, but some interesting points are raised that correspond to what's later expanded upon in the main body of the bill, even though GPT-4 had not yet been given that text. So GPT-4's 'judgement' does seem to have some correlation with human 'judgement'.
Though how much so would depend heavily on personal views or perhaps legal philosophy of how detailed laws need to be.
Moving on to the main text, my method was to copy-paste each part as the input prompt and have GPT-4 report what it thinks the possible inconsistencies are. The first part begins immediately after the Bill Summary and runs up to Subparagraph 3(1)(d)(iii.7). This will be repeated for each subsequent part, with the overall conclusions being made at the end.
The reason for this approach is that simply summarizing each part of the bill seemed superfluous when there's already a summary of the overall bill that is accessible enough for most readers. Plus it makes the wording choices more interesting to analyze.
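For anyone who wants to repeat the part-by-part approach without copy-pasting by hand, here is a minimal sketch. Everything in it is an assumption for illustration: the names `split_into_parts`, `review_parts`, and `PROMPT_TEMPLATE` are hypothetical, the prompt wording is only my paraphrase of the request described above, and `ask` is a stand-in for whatever function actually sends a prompt to the model (no real API is called here).

```python
from typing import Callable, List

# Hypothetical prompt wording; the exact prompts used in this post are not reproduced here.
PROMPT_TEMPLATE = (
    "Identify any logical inconsistencies, loopholes, or ambiguities "
    "in the following part of the bill:\n\n{part}"
)


def split_into_parts(bill_text: str, boundaries: List[str]) -> List[str]:
    """Split bill_text so that each boundary marker starts a new part.

    Markers must appear in the text in the order given.
    """
    parts = []
    start = 0
    for marker in boundaries:
        idx = bill_text.find(marker, start)
        if idx == -1:
            raise ValueError(f"marker not found: {marker!r}")
        if idx > start:
            parts.append(bill_text[start:idx].strip())
        start = idx
    parts.append(bill_text[start:].strip())
    # Drop any empty chunks left over from adjacent markers.
    return [p for p in parts if p]


def review_parts(parts: List[str], ask: Callable[[str], str]) -> List[str]:
    """Send one prompt per part and collect the replies in order."""
    return [ask(PROMPT_TEMPLATE.format(part=part)) for part in parts]
```

Passing `ask` in as a parameter keeps the sketch independent of any particular model or client library, and makes it easy to run the same parts through two different models for comparison.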
The bit about 'social media service' not being explicitly defined is remarkably spot on. The bill does indeed continue to use this specific term, a dozen times total throughout, without actually defining it!
'Addressing potential overlap or conflicts' is also a solid insight as the bill continues to be quite obtuse in this regard, though perhaps GPT-4 just guessed this.
To get an idea of how wordy the main text is, an amendment to a single subparagraph in the actual bill, albeit the lengthiest in the first part, is as follows:
And there are quite a few of these later on too. Most normal readers would likely have stopped here if they bothered reading past the Bill Summary, so there's huge value in being able to quickly condense all the legalese into a more readable format.
For comparison here is how GPT-3.5 interprets the same section:
Immediately noticeable is the different approach of GPT-3.5. It really gets down into the weeds without bothering to mention, or consider, that these issues could be addressed in the remaining text of the bill. Although each point seems well reasoned, in context it seems somewhat 'dumber' than GPT-4.
Though they do both immediately, and correctly, identify 'social media services' as an ambiguous, undefined term.
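A claim like "this term is used repeatedly but never defined" can be roughly checked by machine rather than by eye. The sketch below is a heuristic under stated assumptions: the function names `count_term` and `looks_defined` are hypothetical, the definition check only looks for the term followed by "means" or "includes" (a common drafting pattern, but not the only one), and the sample text is made up for the test rather than quoted from the bill.

```python
import re


def _term_pattern(term: str) -> str:
    """Regex for the term that tolerates line breaks or extra spaces between words."""
    return r"\s+".join(re.escape(word) for word in term.split())


def count_term(text: str, term: str) -> int:
    """Count case-insensitive occurrences of the term (plurals are counted too)."""
    return len(re.findall(_term_pattern(term), text, flags=re.IGNORECASE))


def looks_defined(text: str, term: str) -> bool:
    """Heuristic: True if the text contains '<term> means' or '<term> includes'."""
    pattern = _term_pattern(term) + r"\W{0,3}\s*(?:means|includes)\b"
    return re.search(pattern, text, flags=re.IGNORECASE) is not None


# Made-up sample text for illustration only, not quoted from the bill.
sample = (
    "widget undertaking means an undertaking that transmits widgets. "
    "A person who uses a social media service to upload programs is exempt, "
    "unless the social media\nservice provider is affiliated with them."
)
print(count_term(sample, "social media service"))     # occurrences of the term
print(looks_defined(sample, "social media service"))  # definition pattern found?
print(looks_defined(sample, "widget undertaking"))
```

A `False` from `looks_defined` only means the simple pattern was not found; a term could still be defined by reference to another act or in differently worded language, so a result like this is a prompt for a closer read rather than a conclusion.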
There's a lot more that could be said but due to length, the post will be continued in Part 2, depending on demand, so comment below if you want to read more.