9 May 2025
Why it is the very plausibility of AI-generated case law that should put lawyers on their guard
*
Consider five cases recently cited in a High Court housing judicial review in London:
R (on the application of Ibrahim) v Waltham Forest LBC [2019] EWHC 1873 (Admin)
R (on the application of KN) v Barnet LBC [2020] EWHC 1066 (Admin)
R (on the application of El Gendi) v Camden London Borough Council [2020] EWHC 2435 (Admin)
R (on the application of H) v Ealing London Borough Council [2021] EWHC 939 (Admin)
R (on the application of Balogun) v London Borough of Lambeth [2020] EWCA Civ 1442
On the face of it, what can we say about these five cases?
Well, first, they have mundane – indeed boring – names. The sort of case names that one would expect to exist in everyday cases. These are case names to which one would not give a second glance.
Second: you can also see that each case is against a London council, with that council status set out in three different ways – “LBC”, “London Borough of [x]” and “[y] London Borough Council”. Such variation is common in case names. And as housing judicial review cases, the defendant being a local council is what one would expect.
Third: they are in the correct case style – in four cases the “R (on the application of [x])” bit at the beginning of the case name matches the “(Admin)” bit at the end – for that is the usual way for the names of judicial review cases in the Administrative Court to be set out. And this form often survives when a case goes to the Court of Appeal – so that fifth case does not look unusual either.
Had the cases been placed in, say, the family court or the patents court, then those case names and citations would look odd – but here, all so normal, all so unexceptional.
Fourth: even the case numbering is plausible and within the range for the year in question – eg, the Administrative Court had at least 3562 numbered cases in 2020 – and so a 2020 case numbered 2435 would not seem strange.
And fifth: they are all recent (2019 onwards) cases against local authorities – again, exactly what one would expect to see in a legal submission in a case about local councils in a well-litigated, always-moving area like housing. You would expect to see a bias towards recent cases to illustrate legal points in this context.
So all the cases not only look normal, but they are plausible names and citations for housing judicial review cases.
The cases are not only superficially plausible, they are deeply plausible and on many levels.
There is nothing inherent in that list of case names and citations that would make any experienced lawyer doubt that they exist.
*
Now let’s move on to how these five cases were used in the High Court submissions by the claimant’s lawyers, so as to illustrate legal points (case names emphasised):
“Moreover, in R (on the application of Ibrahim) v Waltham Forest LBC [2019] EWHC 1873 (Admin), the court quashed the local authority decision due to its failure to properly consider the applicant’s medical needs, underscoring for necessity the careful evaluation of such evidence in homelessness determinations. The respondent’s failure to consider the appellant’s medical conditions in their entirety, despite being presented with comprehensive medical documentation, renders their decision procedurally improper and irrational”.
“The respondent’s failure to provide a timely response and its refusal to offer interim accommodation have denied the appellant a fair opportunity to secure his rights under the homelessness legislation. This breach is further highlighted in R (on the application of KN) v Barnet LBC [2020] EWHC 1066 (Admin) where the court held that procedural fairness includes timely decision-making and the provision of necessary accommodation during the review process. The respondent’s failure to adhere to these principles constitutes a breach of the duty to act fairly.”
“In R (on the application of El Gendi) v Camden London Borough Council [2020] EWHC 2435 (Admin), the High Court emphasised that failing to provide interim accommodation during the review process undermines the protective purposes of homelessness legislation. The court found that such a failure not only constitutes a breach of statutory duty but also creates unnecessary hardship for vulnerable individuals. The respondent’s similar failure in the present case demonstrates a procedural impropriety warranting judicial review.”
“The appellant’s situation mirrors the facts in R (on the application of H) v Ealing London Borough Council [2021] EWHC 939 (Admin) where the court found the local authority’s failure to provide interim accommodation irrational in light of the appellant’s vulnerability and the potential consequences of homelessness. The respondent’s conduct in this case similarly lacks a rational basis and demonstrates a failure to properly exercise its discretion.”
“The appellant’s case further aligns with the principles set out in R (on the application of Balogun) v London Borough of Lambeth [2020] EWCA Civ 1442 — where the Court of Appeal emphasise that local authorities must ensure fair treatment of applicants in the homelessness review process. The respondent’s conduct in failing to provide interim accommodation or a timely decision breaches the standard of fairness”.
*
Again there is nothing glaringly untoward on the face of these passages.
For example, a case with a High Court citation is placed in the High Court, and the case with the Court of Appeal citation is placed in the Court of Appeal.
These are the sort of passages which are set out in hundreds of legal submissions every day – where case law is used to demonstrate or illustrate legal points.
Indeed the language employed – “further highlighted in”, “mirrors the facts”, “aligns with the principles set out in” – even shows a pleasing variation in style in using cases to support legal points.
So not only are the case names and citations mundane and plausible, so are the uses to which the cases are put in the submission.
There is nothing on the face of the case names, their citations and the way they are deployed in the submissions which would alert anyone but a deep specialist (who would simply not recognise the cases in their specialised heavily litigated area of law) that something was not right.
*
But.
None of these five cases exist.
All five cases are fake.
All five cases are fabricated.
Notwithstanding the plausibility of the case names, and the citations, and the legal points they are attractively used to support – all are mere inventions.
It would take a skilled human forger to come up with such plausible names and citations and legal points.
Or it would take something else.
*
The five cases – non-cases – referred to above are, of course, from the unhappy recent case of Ayinde, R (On the Application Of) v The London Borough of Haringey [2025] EWHC 1040 (Admin) – a case which (unfortunately for some of those involved) certainly does exist, though the facts of the case are such that one would think it was invented.
I have written about this case over at Prospect this week – please click and read here – and it is an extraordinary case.
*
But first, here are the ordinary – that is, not extraordinary – features of the case.
The claimant was a homeless man challenging his local authority so that he could be housed. One good thing about this case was that he was housed during the litigation, and so for him (at least) the case worked out well. Justice was served from his perspective, and he suffered no prejudice (and gained no advantage) because of what else happened in this case.
The defendant local authority were dire – at least in the litigation – and were so woeful in the conduct of the case that they even managed to be debarred from defending the claim. Nothing in this case is to the credit of the local authority. Quite rightly, the costs of the claim were awarded against the local authority. Again, such conduct by a local council in a case like this is not surprising.
Furthermore the merits of the case were to the advantage of the claimant, including the medical evidence.
This was therefore a case where the claimant was in a strong position, regardless of the haplessness of the local authority.
*
And it is now that the case becomes odd.
The claimant’s lawyers put in legal submissions which rested on various cited cases – and these cited cases included the five fake cases above.
The legal points these cases were used for by the claimant’s lawyers – see the passages above – were mainly for illustration, rather than to assert and rely on a binding precedent – and one suspects real cases were available to illustrate those very same points.
In other words: there was probably no need to invent any case for the purposes of the submissions. Genuine cases, no doubt, could have been available to illustrate the same points.
And so there is no obvious benefit from inventing cases – it was not as if any of the fabricated cases were a game-changer which would turn the claim from losing to winning. Indeed, it looked as if the claim would have succeeded anyway.
As the judge in this matter says at one point in the judgment:
“The submission was a good one. The medical evidence was strong. The ground was potentially good. Why put a fake case in?”
Why indeed?
*
We do not know with absolute certainty how this situation arose.
The judge in this case did not need to probe that deeply – it was enough for him that the cited cases were fake, and that there was no good reason for the fake cases to be cited.
The judge therefore used the rare “wasted costs jurisdiction” to award costs against the claimant’s lawyers personally.
The judge also ordered that his judgment be published, which means it is now a signal to all legal practitioners of what is at stake if fake cases are relied upon.
Of course – if you think about it – there will be a limit to the wasted costs that can be awarded in respect of fake cases, for the other side can hardly charge for time spent reading cases that do not exist – but wasted costs orders are something which all lawyers should and do take seriously.
*
Of course, there is a plausible explanation for what happened – and it is also the most charitable explanation.
And that explanation is that the claimant lawyers used some form of Artificial Intelligence (AI) – in particular a large language model (LLM – and not Master of Laws!) like ChatGPT, which superficially is like a search engine.
The judge in this case mentioned that such AI was a possibility, but he did not make a finding.
But the AI/LLM explanation makes sense.
The sheer boringness and plausibility of the case names and citations, and the ways the cases were employed, suggest either a master human forger or a well-stocked/well-educated LLM. It would otherwise be hard to come up with things that are, well, so normal-sounding.
There is also the singular fact that the claimant lawyers gained no advantage from the fake cases – they had a strong case anyway, where the defendant was not even able to put in a defence, and with strong medical evidence. None of the five fake cases seemed to be determinative of the case, and real cases would probably have done just as well.
The claimant lawyers themselves probably did not realise until a late stage that the cases were fake.
The lack of any advantage gained, and the plausibility of the cases, point to an LLM search engine generating legal hallucinations.
There is no other explanation which fits the available information.
(And readers of this blog will recall the Christmas story where AI came up with a superficially plausible but false author and book.)
*
This case is now being used as a morality story – to warn lawyers not to rely on AI in their legal research.
Over at his blog, Richard Moorhead protests that this is not actually an AI case – and in one sense he is right, as there was no finding of fact that AI was used.
But that is perhaps like saying that the hallowed negligence case of Donoghue v Stevenson was not about a snail in a bottle of ginger beer because it seems that allegation was never actually proved at trial. Yet it is still known as the snail in the ginger beer bottle case.
And this is destined to be known as the “AI” fake cases judgment – used to warn and terrify generations of lawyers to come.
*
For this is very much a case about AI – in that even if AI was somehow (implausibly) not used to invent those five cases, then the effects of those fabrications are very much what would happen if the lawyers had used AI.
And the sheer plausibility of the cases and their citations and the legal points they were used to illustrate are very much what lawyers should be on their guard against with AI LLMs.
The results of queries given to LLMs can be very plausible indeed.
They were not scarily outlandish, but scarily normal.
*
Finally, this post will make three further observations and points about this case.
*
First, the fake cases only really came to light because, on the eve of the hearing, the defendant lawyers made an application for wasted costs (even though they were debarred from mounting a defence).
This application for wasted costs was intended – mischievously – to offset the costs that would be awarded against the council for their own woeful conduct of the case.
In other words, the fakery came to light because it was tactically useful to a party to a case for it to do so.
The fake cases were in a document dated August 2024 but were not spotted as problematic until February 2025. Until then they had not been checked by anybody.
This suggests that such fake cases may be more widespread – but just not noticed.
*
Second: the conduct of the claimant lawyers when challenged – a remarkable letter and evasions in court, doubling down – made things far worse. Had the lawyers promptly said there had been a fundamental error because of the searches/queries made, and provided alternative authorities, and apologised, then a court might have been less harsh.
Under-resourced parties to litigation can make errors: the important thing is to correct them promptly.
*
Third: you may also have noticed that I have not named the lawyers or the parties – that is because I did not need to do so for the purposes of this blog, and I usually shy away from personalising things in blogposts generally (at least below Cabinet level).
Here the import of this case is not just about lawyers in a particular case – the exceptional way in which this instance came to light indicates there may be a more widespread problem, where plausible looking things do not get a second look.
*
My own view is that AI/LLMs can do certain simple law-related tasks, but legal research is something which always requires a human mind, and it is something fundamental to being a lawyer.
Some would say using AI/LLMs for legal research is fine, as long as you check the results before relying on them.
My view is that, given the current state of AI/LLMs, even doing this is dangerous, as the results can look so plausible that you may think they do not need checking.
You will not notice anything wrong.
**
For further reading on this case, see this by David Burrows and also this by Gordon Exall at the estimable Civil Litigation blog.
I agree with all of this.
As a specialist in the field, I am driven to say that there are real cases supporting all of the claimant’s points except for one – the claimant argued that provision of interim accommodation pending review under section 188(3) Housing Act 1996 was a mandatory duty, and cited the fake case of R (on the application of El Gendi) v Camden London Borough Council in support. This is just wrong: it is a discretion, not a duty, and there are no cases to support that point.
My account of this judgment, with even more housing law, is here – https://nearlylegal.co.uk/2025/05/the-cases-that-werent/
Even legal textbooks can wrongly describe what a case decided, so this case also constitutes a warning to any advocate against stacking a document with unnecessary authorities said to support fairly run-of-the-mill propositions and not checking the authority first.
I suspect that the essential irrelevance of the cases was why nobody bothered tracking them down for months before the hearing, and that if they had been vital to the case or cited for any challengeable proposition this would either not have happened or played out differently.
The case of Roberto Mata v. Avianca decided in the Southern District of New York in 2023 was a pretty egregious example of citation of fake AI authorities – https://storage.courtlistener.com/recap/gov.uscourts.nysd.575368/gov.uscourts.nysd.575368.54.0_8.pdf
There was a paper published in April 2025 by Yale researchers that says hallucination detection in LLMs is fundamentally impossible if the model is only trained on correct outputs. No matter how advanced the model is, without explicit examples of errors, it cannot learn to identify what is false.
https://arxiv.org/abs/2504.17004
Is there a danger that non-existent “cases” created by AI/LLMs that are used in cases but are not detected at the time may then begin to be findable in case law through normal research/search methods?
“This case is now being used as a morality story – to warn lawyers not to rely on AI in their legal research.”
What I say to this emphatically is “of course”. Of course lawyers need to be careful, more so than any other profession, not to rely on generated content. They are officers of the court, and litigation involves findings of fact. Lawyers need to be meticulous to the nth degree.
What you suggest about the likelihood of other erroneous citations going unnoticed ought to be alarming. Those citations are apt to be recycled, and their presence in existing authority would give them weight. There’s no excuse. Lawyers have access to LexisNexis. Use that. That’s what it’s for.
I notice that the LexisNexis site has something on it about AI. Let’s hope it wasn’t a hallucination there. That might be a reasonable excuse. Surely, though… surely they have carefully designed it to guard against this sort of thing when searching for citations.
I use ChatGPT as a research tool. You can give it standing instructions. My standing instructions are that it must provide a link to each site it quotes. You may not be surprised to hear that as well as creating fake case names, it can create fake links. Yesterday it gave me links to a couple of cases on BAILII that did indeed take me to BAILII but to a page saying Not Found.
As DAG says, ChatGPT is *superficially* like a search engine. But it’s not a search engine. I still, however, find it a really useful tool. I can give it a vague description of a half-remembered case and it will usually make a better shot at finding it than I would with Google. The key thing is that you have to check its results. If you have asked it to provide the addresses of each site it references, it takes very little time to click on each one and make sure that 1. the case is real and 2. it has interpreted it correctly.
It ought, of course, to do a further pass on its output to check that any URLs it is quoting do actually exist and do not return a 404 response.
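For illustration, a minimal sketch of that kind of link check might look like this (the URLs below are hypothetical placeholders, not real citations, and a human reading of each case would still be essential):

```python
# Minimal sketch: check that each cited link actually resolves.
# The URLs below are hypothetical placeholders, not real citations.
import requests

citation_links = [
    "https://www.bailii.org/ew/cases/EWHC/Admin/2020/0000.html",  # hypothetical
    "https://www.bailii.org/ew/cases/EWCA/Civ/2020/0000.html",    # hypothetical
]

for url in citation_links:
    try:
        response = requests.get(url, timeout=10, allow_redirects=True)
        status = "OK" if response.status_code == 200 else f"HTTP {response.status_code}"
    except requests.RequestException as exc:
        status = f"request failed ({exc})"
    print(f"{url} -> {status}")

# A 404 or a failed request flags a citation that needs human attention;
# a 200 response only shows that a page exists, not that it says what is claimed.
```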
In my own field, I’ve found the AI tools give a mixture of responses: some ok, some a reasonable starting point, some total nonsense. But without further (human) knowledge, analysis and research, there’s no way to know which bucket the answer falls into.
Yes, it should. But the small charity that allows public access to a selection of our decisions does not allow this: “BAILII’s policy is that judgments on BAILII should not be accessible using search engines”.
The Find Case Law beta is available for recent decisions and can be used for this. I don’t know whether those cases behind the paywalls are available to LLMs (other than their own in house ones).
It seems to me that the term “AI” is a bit of a misnomer. Large language models may be artificial, but they have no intelligence – they know nothing about the world in general or the law in particular. They are essentially statistical models designed to analyse how language works and produce new text according to their algorithmic rules.
It is nothing short of remarkable that large language models can generate text that appears so plausible in the first place. Like Samuel Johnson’s quote about a dog walking on its hind legs – you may be surprised to see it done well, but you should be shocked to find it done at all.
The likelihood is that the models today are as bad as they will ever be, and they will only get better in future. For some tasks, they are already as good as, if not better than, people. Who wants to read and summarise a million pages of due diligence materials or disclosed evidence? Even if you had an unlimited budget, people and time, people make mistakes too. As always, trust, but verify.
The likelihood is the models are as *good* as they will ever be, simply because they’ve already been trained on almost all of the training data available. The only way this technology gets better is with more training. It doesn’t “learn how to get better” or “get smarter”. That’s just how the technology works.
The most recent, ultra-hyped release of ChatGPT’s new model is *more* prone to hallucinations than its predecessors.
I agree with Jon.
‘AI’ devotees don’t want to hear it, but this is their technology at its peak, after having had so many billions of dollars spent on it that people have lost count.
It’s not going to get better.
The improvements come in two forms. Better training with more and more data is one form. The greatest improvements come in the models themselves. ChatGPT is based on the transformer class of model, which was a revolutionary step beyond recurrent neural networks. Researchers will develop better models and we’ll go from there.
Jon,
The key limiting factor with current models is not the availability of training data but the technical features and capabilities of the underlying code. LLMs employ a feature known as Markov chains, a stochastic process from probability theory that enables an algorithm to predict and mathematically model the arrangement of words into sentences.
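(For illustration only, here is a toy word-level sketch of that next-word-prediction idea – a deliberately simple model, far removed from anything a production system actually runs, but it shows the sense in which text can be generated by predicting each word from the one before it.)

```python
# Toy word-level Markov chain: predict the next word from the current word.
# A deliberately simple illustration of next-word prediction, not a real LLM.
import random
from collections import defaultdict

corpus = "the court held that the duty was a discretion and the court quashed the decision"

# Build transition lists: for each word, record the words that follow it.
transitions = defaultdict(list)
words = corpus.split()
for current_word, next_word in zip(words, words[1:]):
    transitions[current_word].append(next_word)

# Generate text by repeatedly sampling a plausible next word.
random.seed(1)
word = "the"
output = [word]
for _ in range(10):
    followers = transitions.get(word)
    if not followers:
        break
    word = random.choice(followers)
    output.append(word)

print(" ".join(output))
```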
I respectfully submit that your claim is *entirely* wrong, for the following reasons.
First, mathematicians are hard at work to refine the work of Andrey Markov and come up with better ways to mathematically model language. This will improve the output of LLMs. Second, software engineers are hard at work to write better implementations of stochastic processing as software; this, too, will improve the output of LLMs. Third, companies such as Nvidia are investing literally billions of dollars into techniques that will improve the ability of the silicon to execute the models within computers. Fourth, the entire semiconductor and technology industry continues to march forward – despite, for example, predictions of the end of Moore’s Law – with faster and better chips.
I *think* it was Nicholas Wood [I’m sorry, I can’t find the citation], but it was one of the judges of the Rainhill Trials, who claimed that a man would die if subjected to speeds in excess of 30 miles per hour. George Stephenson’s “Rocket” managed 29 in the trials and improved further. Today, the Shanghai Maglev operates at 268mph.
In 1903, Orville and Wilbur Wright achieved powered flight for one person with the Wright Flyer, at Kitty Hawk. On January 21st, 1976, Concorde departed on its first supersonic flight with passengers, from London to Bahrain.
In March 1986, I started my first professional job, as a trainee programmer for Bournemouth Borough Council, in Dorset. The mainframe on which I worked had 3 megabytes of RAM and 200 megabyte hard drives with 19 “platters” to store data. See here: https://commons.wikimedia.org/wiki/File:Disk_drive_of_professional_large_computer_system_%281970s%29_with_removable_disk_pack_as_storage_medium_inside,_from_%27International_Computers_Limited%27.jpg
Today, my smartwatch has massively more processing power and massively more storage capacity than that mainframe did 40 years ago.
Very respectfully, you have absolutely zero understanding of the technological principles that underpin AI and thus provide its present operational constraints, and you appear to be wilfully ignorant of the unfathomable speed with which technology advances.
And bear in mind, please, that one of the most significant limitations presently applied to ChatGPT has nothing to do with any of the rebuttal points I offer here, but with the simple fact that it is being shared by literally millions of humans at any given moment and, as such, every response it gives is inherently time-limited – each answer returned is prepared using milliseconds of processing power. As AI technology becomes more ubiquitous and costs come down, individual law firms will be able to afford to buy and run their own [pre-trained] models on their own hardware – and at that point the quality of responses will rise exponentially.
Again, I don’t mean to be disrespectful, but you could not be further from the truth.
This is very timely from my perspective, since I have just been working on a new set of guidelines for students about what use they are permitted to make of Gen AI. I told them that it is permitted to use it as a stepping stone in the research process, but although it can provide useful summaries of facts and theories, it sometimes distorts information and ideas and they must trace everything back to a source written by a human being. If they fail to do that, they have not done their due diligence. I will now use this blog as a cautionary tale, secure in the knowledge that I am following my own guidelines and referring students to a source written by a human.
I have edited the post this morning to remove lingering typos, etc.
At least those typos indicate my posts are not written by AI!
Thank you for another fascinating journey into the Alice in Wonderland world that is the law. Like those old sea charts marked ‘Here Be Dragons’, we find new dangers – phoney AI systems.
All’s well that ends well: the poor chap at the centre got housed, while the lawyers got a flea in the ear and a small dent in their wallets, and must report to the headmaster. Whether six of the best will be forthcoming we shall see.
There is an infelicity though: the phrase ‘may also be the most charitable explanation’ looks worth a deeper look. Perhaps the whiff, the tiniest whiff, of the proverbial rat. Does a judge still have access to hurdles – dragging for the use of?