A close reading of the “AI” fake cases judgment

16 thoughts on “A close reading of the “AI” fake cases judgment”

  1. I agree with all of this.

    As a specialist in the field, I am driven to say that there are real cases supporting all of the claimant’s points except for one – the claimant argued that provision of interim accommodation pending review under section 188(3) Housing Act 1996 was a mandatory duty, and cited the fake case of R (on the application of El Gendi) v Camden London Borough Council in support. This is just wrong: it is a discretion, not a duty, and there are no cases to support that point.

    My account of this judgment, with even more housing law, is here – https://nearlylegal.co.uk/2025/05/the-cases-that-werent/

  2. Even legal textbooks can wrongly describe what a case decided, so this case also constitutes a warning to any advocate against stacking a document with unnecessary authorities said to support fairly run-of-the-mill propositions, without checking those authorities first.

    I suspect that the essential irrelevance of the cases was why nobody bothered tracking them down for months before the hearing, and that if they had been vital to the case, or cited for any challengeable proposition, this would either not have happened or would have played out differently.

    The case of Roberto Mata v. Avianca decided in the Southern District of New York in 2023 was a pretty egregious example of citation of fake AI authorities – https://storage.courtlistener.com/recap/gov.uscourts.nysd.575368/gov.uscourts.nysd.575368.54.0_8.pdf

  3. Is there a danger that non-existent “cases” created by AI/LLMs that are used in cases but are not detected at the time may then begin to be findable in case law through normal research/search methods?

  4. “This case is now being used as morality story – to warn lawyers not to rely on AI in their legal research.”

    What I say to this emphatically is “of course”. Of course lawyers need to be careful, more so than any other profession, not to rely on generated content. They are officers of the court, and litigation involves findings of fact. Lawyers need to be meticulous to the nth degree.

    What you suggest about the likelihood of other erroneous citations going unnoticed ought to be alarming. Those citations are apt to be recycled, and their presence in existing authority would give them weight. There’s no excuse. Lawyers have access to LexisNexis. Use that. That’s what it’s for.

    I notice that the LexisNexis site has something on it about AI. Let’s hope it wasn’t a hallucination there. That might be a reasonable excuse. Surely, though… surely they have carefully designed it to guard against this sort of thing when searching for citations.

  5. I use ChatGPT as a research tool. You can give it standing instructions. My standing instructions are that it must provide a link to each site it quotes. You may not be surprised to hear that as well as creating fake case names, it can create fake links. Yesterday it gave me links to a couple of cases on BAILII that did indeed take me to BAILII but to a page saying Not Found.

    As DAG says, ChatGPT is *superficially* like a search engine. But it’s not a search engine. I still, however, find it a really useful tool. I can give it a vague description of a half-remembered case and it will usually make a better shot at finding it than I would with Google. The key thing is that you have to check its results. If you have asked it to provide the addresses of each site it references, it takes very little time to click on each one and make sure that (1) the case is real and (2) it has interpreted it correctly.

    1. It ought, of course, to do a further pass on its output to check that any URLs it quotes do actually exist and do not return a 404 response.

      In my own field, I’ve found the AI tools give a mixture of responses: some ok, some a reasonable starting point, some total nonsense. But without further (human) knowledge, analysis and research, there’s no way to know which bucket the answer falls into.
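
      For what it’s worth, that checking step is easy to script. A minimal sketch in Python (the URLs below are placeholders, not real citations): it simply requests each address and reports the status.

      ```python
      # Check that a list of URLs actually resolve rather than 404-ing.
      # The example URLs are placeholders, not real citations.
      import urllib.request
      import urllib.error

      urls = [
          "https://www.bailii.org/",                   # should exist
          "https://www.bailii.org/no-such-case.html",  # likely a 404
      ]

      for url in urls:
          try:
              with urllib.request.urlopen(url, timeout=10) as response:
                  print(f"OK  {response.status}  {url}")
          except urllib.error.HTTPError as e:
              print(f"BAD {e.code}  {url}")
          except urllib.error.URLError as e:
              print(f"ERR {e.reason}  {url}")
      ```

      One caveat: some sites serve a “Not Found” page with a 200 status, so a green status code alone does not prove the case is real – the human check remains essential.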

      1. Yes, it should. But the small charity that allows public access to a selection of our decisions does not allow this: “BAILII’s policy is that judgments on BAILII should not be accessible using search engines”.

        The Find Case Law beta is available for recent decisions and can be used for this. I don’t know whether those cases behind the paywalls are available to LLMs (other than their own in-house ones).

  6. It seems to me that the term “AI” is a bit of a misnomer. Large language models may be artificial, but they have no intelligence – they know nothing about the world in general or the law in particular. They are essentially statistical models designed to analyse how language works and produce new text according to their algorithmic rules.
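
    To illustrate what “statistical model” means here, a toy sketch in Python of a first-order word model: it learns only which word tends to follow which in a sample text (the sentence is made up), then generates new text by sampling from those observed continuations. Real LLMs are vastly more sophisticated, but the principle of predicting the next word from observed patterns is the same in spirit – and note that even this toy will happily produce fluent-looking sequences that appear nowhere in its source.

    ```python
    import random
    from collections import defaultdict

    # A toy next-word model: record which word follows which in a sample
    # text, then generate by sampling from those observed continuations.
    # The training sentence is invented for illustration.
    text = ("the court held that the duty was a discretion and "
            "the court held that the appeal failed")

    follows = defaultdict(list)            # word -> words seen to follow it
    words = text.split()
    for current, nxt in zip(words, words[1:]):
        follows[current].append(nxt)

    random.seed(1)
    word = "the"
    output = [word]
    for _ in range(10):
        if word not in follows:            # no observed continuation: stop
            break
        word = random.choice(follows[word])
        output.append(word)
    print(" ".join(output))
    ```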

    It is nothing short of remarkable that large language models can generate text that appears so plausible in the first place. Like Samuel Johnson’s quip about a dog walking on his hind legs – it is not done well, but you are surprised to find it done at all.

    The likelihood is that the models today are as bad as they will ever be, and they will only get better in future. For some tasks, they are already as good as, if not better than, people. Who wants to read and summarise a million pages of due diligence materials or disclosed evidence? Even if you had unlimited budget, people and time, people make mistakes too. As always: trust, but verify.

    1. The likelihood is the models are as *good* as they will ever be, simply because they’ve already been trained on almost all of the training data available. The only way this technology gets better is with more training. It doesn’t “learn how to get better” or “get smarter”. That’s just how the technology works.

      The most recent, ultra-hyped release of ChatGPT’s new model is *more* prone to hallucinations than its predecessors.

      1. I agree with Jon.

        ‘AI’ devotees don’t want to hear it, but this is their technology at its peak, after having had so many billions of dollars spent on it that people have lost count.

        It’s not going to get better.

      2. The improvements come in two forms. Better training with more and more data is one form. The greatest improvements come in the models themselves. ChatGPT is based on the transformer class of model, which was a revolutionary step beyond recurrent neural networks. Researchers will develop better models and we’ll go from there.
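
        For the curious, the transformer’s core operation – scaled dot-product attention – fits in a few lines. A minimal NumPy sketch (shapes and inputs invented for illustration):

        ```python
        # Scaled dot-product attention, the core operation of the
        # transformer architecture. Shapes and inputs are toy examples.
        import numpy as np

        def attention(Q, K, V):
            """Each output row is a weighted mix of the rows of V, with
            weights set by how well each query matches each key."""
            d_k = Q.shape[-1]
            scores = Q @ K.T / np.sqrt(d_k)   # query/key similarity
            scores -= scores.max(axis=-1, keepdims=True)
            weights = np.exp(scores)
            weights /= weights.sum(axis=-1, keepdims=True)  # softmax rows
            return weights @ V

        rng = np.random.default_rng(0)
        Q = rng.normal(size=(3, 4))   # 3 positions, dimension 4
        K = rng.normal(size=(3, 4))
        V = rng.normal(size=(3, 4))
        print(attention(Q, K, V))     # (3, 4): one mixed vector per position
        ```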

      3. Jon,
        The key limiting factor with current models is not the availability of training data but the technical features and capabilities of the underlying code. LLMs employ a feature known as Markov chains, a stochastic process from probability theory that enables an algorithm to predict and mathematically model the arrangement of words into sentences.

        I respectfully submit that your claim is *entirely* wrong, for the following reasons.

        First, mathematicians are hard at work refining the work of Andrey Markov and coming up with better ways to mathematically model language. This will improve the output of LLMs. Second, software engineers are hard at work writing better implementations of stochastic processing as software; this, too, will improve the output of LLMs. Third, companies such as Nvidia are investing literally billions of dollars in techniques that will improve the ability of the silicon to execute the models within computers. Fourth, the entire semiconductor and technology industry continues to march forward – despite predictions of the end of Moore’s Law – with faster and better chips.

        I *think* it was Nicholas Wood [I’m sorry, I can’t find the citation] – at any rate, one of the judges of the Rainhill Trials – who claimed that a man would die if subjected to speeds in excess of 30 miles per hour. George Stephenson’s “Rocket” managed 29 in the trials and improved further. Today, the Shanghai Maglev operates at 268mph.

        In 1903, Orville and Wilbur Wright achieved powered flight for one person with the Wright Flyer, at Kitty Hawk. On January 21st, 1976, Concorde departed on its first supersonic flight with passengers, from London to Bahrain.

        In March 1986, I started my first professional job, as a trainee programmer for Bournemouth Borough Council, in Dorset. The mainframe on which I worked had 3 megabytes of RAM and 200 megabyte hard drives with 19 “platters” to store data. See here: https://commons.wikimedia.org/wiki/File:Disk_drive_of_professional_large_computer_system_%281970s%29_with_removable_disk_pack_as_storage_medium_inside,_from_%27International_Computers_Limited%27.jpg

        Today, my smartwatch has massively more processing power and massively more storage capacity than that mainframe did 40 years ago.

        Very respectfully, you have absolutely zero understanding of the technological principles that underpin AI and thus define its present operational constraints, and you appear to be willfully ignorant of the unfathomable speed with which technology advances.

        And bear in mind, please, that one of the most significant limitations on ChatGPT at present has nothing to do with any of the rebuttal points I offer here, but with the simple fact that it is being shared by literally millions of humans at any given moment. As such, every response it gives is inherently time-limited – each answer returned is prepared using milliseconds of processing power. As AI technology becomes more ubiquitous and costs come down, individual law firms will be able to afford to buy and run their own [pre-trained] models on their own hardware – and at that point the quality of responses will rise exponentially.
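
        By way of illustration, running a small pre-trained model on one’s own hardware is already only a few lines of Python with the Hugging Face transformers library – a minimal sketch, with “gpt2” chosen purely as a small public example, not a recommendation for legal work:

        ```python
        # A minimal sketch of running a small pre-trained language model
        # locally with the Hugging Face "transformers" library. The model
        # "gpt2" is a small public example, not a recommendation.
        from transformers import pipeline

        generator = pipeline("text-generation", model="gpt2")
        result = generator("The duty under section 188(3) is", max_new_tokens=20)
        print(result[0]["generated_text"])  # plausible text, not legal authority
        ```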

        Again, I don’t mean to be disrespectful, but you could not be further from the truth.

  7. This is very timely from my perspective, since I have just been working on a new set of guidelines for students about what use they are permitted to make of Gen AI. I told them that they may use it as a stepping stone in the research process, but that although it can provide useful summaries of facts and theories, it sometimes distorts information and ideas, so they must trace everything back to a source written by a human being. If they fail to do that, they have not done their due diligence. I will now use this blog as a cautionary tale, secure in the knowledge that I am following my own guidelines and referring students to a source written by a human.

  8. I have edited the post this morning to remove lingering typos, etc.

    At least those typos indicate my posts are not written by AI!

  9. Thank you for another fascinating journey into the Alice in Wonderland world that is the law. Like those old sea charts marked ‘Here Be Dragons’, we find new dangers – phoney AI systems.

    All’s well that ends well: the poor chap at the centre got housed, and the lawyers got a flea in their ear, a small dent in their wallets, and must report to the headmaster. Whether six of the best will be forthcoming we shall see.

    There is an infelicity, though: the phrase ‘may also be the most charitable explanation’ looks worth a deeper look. Perhaps the whiff, the tiniest whiff, of the proverbial rat. Does a judge still have access to hurdles – dragging for the use of?
