A Time Capsule
From the release of ChatGPT in 2022 until the spring of this year, I sent occasional updates about the development of AI to my colleagues at the law school. In August of 2023, I shared a more substantial background paper, which you can read here. I thought it might be worth sharing all of it here; it's interesting to see how things have developed over time.
AI Update 12/22/2022
Hi friends - I’m writing to let you know about a series of informal brownbags that the tech committee will host in the new year. If you are not aware of a new technology called ChatGPT (chat.openai.com), you’re in for a surprise. I think it’s safe to assume that most of our students are aware of it. In a nutshell, it’s an AI-driven chat interface that has a remarkable ability to understand natural language, to remember context, to synthesize propositions, to carry on extended conversations about everything from constitutional interpretation to eigenvectors to mountaineering, and to write and revise poems, lyrics, stories, letters, computer code, memos, essays, and exam answers. Well, you get the idea. It is the most disruptive technology in our field that I have ever encountered—both for what it already is and for what it portends. While this capability has long been on the horizon, I had assumed we were a good number of years away from what seems to be here today.
In any event, we thought it would be a good idea to have an informal series of demonstrations and conversations where we could discuss how to approach this new tech in the classroom and how it might figure into practice. And, going forward, we can also talk more generally about new developments in AI and the emerging legal issues cropping up around its adoption and use. These technologies are controversial, subject to litigation, about to get much better along some dimensions, and bound to hit walls in others.
I’m not announcing any particular times or dates today. We’ll wait until the rush of the first week of the semester is behind us. But I wanted to reach out today to suggest that you give it a try over the break if you haven’t already. Based on my use over the past weeks, I’ve already planned to use it in the classroom if possible. And I know I could benefit from other people’s thoughts, experiences, and practical ideas. It’s a little like figuring out how to teach math when graphing calculators (and MATLAB and Maple) appeared on the scene — in that it has caused me to reflect on what the essential skills and ideas we are hoping to instill actually are. A new, disruptive tool tends to do that.
I hope everyone has a peaceful, fun, and relaxing break. And I look forward to seeing you all again soon! Christian
AI Update 1/25/2023
Hi all - Following up on the meeting today, I am passing along a list of links that may be helpful in thinking through ChatGPT and related tech. We’re always interested in more, so keep sending, as some of you have, links to items that could be helpful. And let me know if you’d like to help out. C
Overview and use of ChatGPT and other AI
- https://arxiv.org/pdf/2301.04655.pdf (Review of large generative AI models)
- https://github.com/f/awesome-chatgpt-prompts (a collection of ChatGPT prompts)
- https://writings.stephenwolfram.com/2023/01/wolframalpha-as-the-way-to-bring-computational-knowledge-superpowers-to-chatgpt/ (comparing ChatGPT and Wolfram|Alpha)
Higher ed pedagogy:
- https://papers.ssrn.com/sol3/papers.cfm?abstract_id=4335905 (law school exam proficiency)
- https://papers.ssrn.com/sol3/papers.cfm?abstract_id=4314839 (bar exam proficiency)
- https://mackinstitute.wharton.upenn.edu/2023/would-chat-gpt3-get-a-wharton-mba-new-white-paper-by-christian-terwiesch/ (Wharton business test-taking)
- https://www.nytimes.com/2023/01/12/technology/chatgpt-schools-teachers.html (opinion article encouraging adaptation)
- https://stanforddaily.com/2023/01/22/scores-of-stanford-students-used-chatgpt-on-final-exams-survey-suggests/ (great but unscientific poll of students on usage and opinion about honor code)
- https://educationalist.substack.com/p/lets-get-off-the-fear-carousel (ChatGPT and problems with the academy’s culture)
- https://finiteeyes.net/technology/academic-integrity/ (on situating ChatGPT within issues of plagiarism and scholarly integrity more broadly)
- https://criticalai.org/2023/01/17/critical-ai-adapting-college-writing-for-the-age-of-large-language-models-such-as-chatgpt-some-next-steps-for-educators/ (adapting college writing for the age of large language models; next steps for educators)
- https://www.insidehighered.com/news/2023/01/12/academic-experts-offer-advice-chatgpt (academic experts offer advice on ChatGPT)
- https://alperovitch.sais.jhu.edu/five-days-in-class-with-chatgpt/ (report of intensive use of ChatGPT in a classroom)
- https://www.nytimes.com/2023/01/16/technology/chatgpt-artificial-intelligence-universities.html (more on higher ed reactions)
- https://openai-openai-detector.hf.space (GPT-2 output detector)
- https://gptzero.me (GPT detection service)
- https://www.articlerewriter.net (a non-AI service that will rewrite text and can be used to evade GPT detectors)
AI tools:
- https://techcrunch.com/2022/11/23/harvey-which-uses-ai-to-answer-legal-questions-lands-cash-from-openai/ (Harvey - Funded by OpenAI, law-specific (beta))
- https://docs.fermat.ws/getting-started/readme (Fermat - workspaces that permit interaction with AI, including counterargument)
Society and limitations:
- https://futurism.com/cnet-ai-plagiarism (revelation that CNET articles were AI written and often plagiarized by the AI)
- https://www.theverge.com/2015/1/29/7939067/ap-journalism-automation-robots-financial-reporting (an old problem: AI writing earnings reports as far back as 2015)
- https://arxiv.org/abs/2206.09511 (The Fallacy of AI Functionality; a taxonomy of harms from failure)
- https://www.wired.com/story/large-language-models-critique (harms of modern AI generally)
- https://www.nytimes.com/2023/01/23/business/microsoft-chatgpt-artificial-intelligence.html (on emerging competitive landscape among big players)
AI Update 3/23/2023
Hi all - Another update from the Is-AI-the-End-of-Everything team. We’ve split into groups to prepare a report — which we intend to be an updatable platform for information that will support law school staff and faculty as AI-based tools change over time. You may recall that my message in December included our plan to conduct informal ChatGPT brownbags this semester. A lot has happened since then. ChatGPT has been a cultural sensation, and the volume of popular and academic writing and conversation about it has made it seem somewhat less important to hold “get to know the tech” brownbags. At the same time, the introduction of new competitors and new apps based on the underlying tech has made the whole area a fast-moving target during these early days. We’d like to gather information and produce something more informed, concise, and helpful.
So we’re now focused fully on producing a report and information resource for you. The areas we’ve identified include:
- Technical Background and Projections,
- AI in Practice (e.g., current and projected use by firms, impact on legal ethics, and potential for disruption of existing legal markets),
- AI in Pedagogy,
- and Ongoing Institutional Engagement (e.g., role of library in maintaining and creating resources, training or classes to develop, the potential of acting as a clearinghouse for practitioners).
I’ve attached the names of the people researching each of these areas (their emails are in the Cc list). It would be helpful for them to hear from you if you have information you’d like to share with them, questions for them, or areas of uncertainty or emphasis you think should be addressed. (One colleague sent along a very interesting use of GPT in an app that might, eventually, have a dramatic impact on document review and the use of research corpora. More like this!) We have also discussed how each of these groups might welcome focus groups, interviews, or brownbags as ways of gathering and sharing information.
As always, I’m available to chat if you have questions or thoughts about these technologies or want to address them with your students more immediately. For example, I attended one of Lisa’s clinic’s sessions to talk about it, and it was interesting to see what she was doing with ChatGPT in that context. And I’m always happy to talk through how to approach rules for assignments or the design of assignments. Our report will address these and other areas, but I and the rest of the tech team are willing to chat anytime.
I hope you all have a great spring break and look forward to seeing you soon, Christian
AI Update 12/2023
Hi friends - I’m writing with a brief AI update. I thought this might be helpful as your attention turns to next semester’s classes. First, I’m attaching a set of memos that our excellent LLM student, (name omitted), has prepared for me over the course of the semester. They collect summaries of and links to relevant articles through the end of October. If you’re interested in sifting through the firehose of law-related AI writing, SSRN maintains a special topics hub: https://www.ssrn.com/index.cfm/en/AI-GPT-3/. The core of the memo I distributed earlier in the semester is still a good guide to how these systems work and how they impact practice and pedagogy. An excellent, recent introduction to LLMs by Andrej Karpathy is available on YouTube, https://www.youtube.com/watch?v=zjkBMFhNj_g. I recommend it. (I went through his videos building GPT models earlier this year as I was learning how this tech works. They are also great and referenced in my memo from earlier this semester.)
Second, the state of the products:
OpenAI’s GPT-4:
- The context size of ChatGPT on the web, for the paid version, has increased substantially to ~25,000 words. You can paste whole articles or cases into the prompt. For example, copy and paste a case; ask it to “write a dissent,” “summarize the holding,” “compare and contrast [another case],” etc.
- You can upload PDFs, images, and other documents and chat with ChatGPT about them.
- It can generate images.
- It can search the web.
- It can interact via voice when using the mobile apps.
- You can build custom “GPTs” by uploading files and giving special instructions. You can then chat with the new custom bot. For example, I created one by uploading our student handbook, resulting in a bot that could answer questions about our policies.
Google: Just released an update to bard.google.com, using a new underlying foundation model they call Gemini. Gemini will come in three varieties: a tiny model for on-device use in Android, a mid-sized model available now via Bard, and a larger, premium model they’re suggesting is better than GPT-4 in many ways. There’s lots of reporting about this. I found this video to be accurate and helpful: https://www.youtube.com/watch?v=toShbNUGAyo&t=1s.
Facebook (Meta) has released a standalone image generator and is continuing to push open source alliances: https://arstechnica.com/information-technology/2023/12/ibm-meta-form-ai-alliance-with-50-organizations-to-promote-open-source-ai/
Apple is working in customary secrecy, and I’d expect to hear about its work in early June at its annual developer conference.
Lexis+ AI is now out. Westlaw has integrated the AI assistant tools from Casetext, which Thomson Reuters bought earlier this year.
Third, my takeaways at the moment:
- Next semester is the first semester during which all the common tools our students use—from operating systems, to word processors, to web browsers, to research tools—will have transformer-based large language models built in.
- If you’ve used these tools and found them lacking or a mere novelty (as I have!), do not let that make you complacent. It is hard to overstate the amount of investment and engineering that is being poured into improving them.
- While our exam-taking software can lock down the exam-taking environment, it will not alter the future of lawyering, which is about to change dramatically.
- Anecdotal observations are that firms are moving quickly to experiment and may already be thinking about headcount. Nothing is certain, but the chance of fundamental disruption of the industry (and thus of our industry) is nonzero.
- I will be integrating this tech into my classes going forward and will expect students to use these tools. For me, it will be part of my job to help them learn to think, write, and argue using AI assistance. I’m experiencing this as a dramatic discontinuity from my approach thus far in my career.
AI Update 4/2024
Hi all - I'm writing with another AI update. I cover the following three topics: The state of the art among large language models; The near-future of LLMs; Non-LLM AI tech. My main message is that you should consider spending some time with the paid versions of the major models if you haven’t already. See below for more on that.
State of the art among large language models:
The big players among commercially available and useful large language models continue to be: OpenAI (GPT-4), Google (Gemini 1.5), Meta (Llama 2), and Anthropic (Claude 3).
OpenAI, Google, and Anthropic all offer free versions of their latest models, but they also offer much more capable models with additional features for $20/month. With the paid models, you can attach large PDFs, discuss whole articles or even books, work with images, and work with models that are much more sophisticated and able to "remember" very long conversations. For example, you can paste in a bunch of cases or essays and discuss them all without having the model forget the beginning of the conversation. These improvements represent an order-of-magnitude increase, and sometimes two or more, in the models' context windows.
I have confirmed with the Dean that we are authorized to use faculty research funds to purchase these subscriptions. I urge you to consider doing so, even if it is only to play around with them. A recent episode of Ezra Klein's podcast has a good discussion of what it's like trying to incorporate these new tools into work like ours. See https://www.nytimes.com/2024/04/02/opinion/ezra-klein-podcast-ethan-mollick.html. If you've only tried the free models, you are likely to be surprised by how capable these larger models are.
Be mindful that these services have different policies on the use of the data you provide. All permit you to opt out of having your data used in further training, but it is sometimes inconvenient to do so. There are also options to purchase group or enterprise accounts, but these are more expensive per seat and do not seem to offer advantages, as yet, for our use cases. It is possible, however, that a clinic or working group might benefit from OpenAI's Team plan. See https://openai.com/chatgpt/team.
Here at the university we also have free access to the Microsoft Copilot (https://copilot.microsoft.com - sign in with your MyID). Copilot uses some incarnation of GPT-4 under the hood, but the context lengths appear smaller (4k or 8k in chat, 18k in a somewhat clumsy notebook interface) than the paid ChatGPT version of GPT-4, and I wasn't able to get quite as good results from Copilot. But it is free for us to use, and you might get excellent results from it.
Google just had an event announcing some improvements to Gemini and further integration into their existing services. I haven’t digested this yet and have not used Gemini very much myself. But Ben Thompson’s thoughts are always worth reading: https://stratechery.com/2024/gemini-1-5-and-googles-nature/.
The best results I have obtained discussing law and legal theory have come from Anthropic's Claude 3 Opus model. (Go to Claude.ai.) I'm going to oversimplify here, but: From a single prompt I get B+ to A- work, and with a little extra prompting, I get A+ answers to exam questions, fairly deeply considered summaries and critiques of scholarship, and citations that are much more likely to be real than hallucinated. I was able to upload a PDF of a county's zoning ordinance and get accurate answers to whether I could build an accessory dwelling unit, how to appeal an adverse decision, etc. We're still at a point of "sometimes great, sometimes not so great" with these frontier foundation models, but when they're good, they are very, very good. And conversing with them, learning how to prod and how to correct, often leads to amazing places. This video does a nice job explaining how the current generation of models stack up against one another: https://www.youtube.com/watch?v=ReO2CWBpUYk.
I don't have anything to report about Lexis or Westlaw. In my brief use of the Lexis offering back in January, I found it slow and disappointing. I expect it will improve (and may already have improved), but my money is on the major LLM companies and on startups that leverage those companies' offerings. For example, Harvey.ai is a startup that has partnered with OpenAI to fine-tune GPT-4 for case law. (Fine-tuning involves starting with a large, fully-trained model and then conducting some additional, specialized training to nudge the model's billions of parameters in ways that better model a specific domain, like law.) OpenAI reports that Harvey was fine-tuned with "the equivalent of 10 billion tokens worth of data." And they claim that "[t]he resulting model achieved an 83% increase in factual responses and attorneys preferred the customized model’s outputs 97% of the time over GPT-4." (See https://openai.com/blog/introducing-improvements-to-the-fine-tuning-api-and-expanding-our-custom-models-program -- and scroll down about halfway to see a box containing comparisons of GPT-4's and Harvey's answers to the question "What is a claim of disloyalty?" See also, e.g.: https://www.robinai.com, https://www.darrow.ai, https://www.evenuplaw.com.)
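If you're curious about the mechanics, here is a minimal sketch of what submitting a fine-tuning job looks like through OpenAI's public API. (Harvey's custom model was built through a separate partnership program, so the details there surely differ; the file name and example data below are hypothetical.)

```python
# Minimal sketch of a fine-tuning job via the OpenAI Python SDK (assumes an
# OPENAI_API_KEY in the environment). "legal_qa.jsonl" is a hypothetical file
# of chat-formatted training examples, one JSON object per line, e.g.:
# {"messages": [{"role": "user", "content": "What is a claim of disloyalty?"},
#               {"role": "assistant", "content": "..."}]}
from openai import OpenAI

client = OpenAI()

# Upload the training data.
training_file = client.files.create(
    file=open("legal_qa.jsonl", "rb"),
    purpose="fine-tune",
)

# Kick off the job on a base model that supports public fine-tuning.
job = client.fine_tuning.jobs.create(
    training_file=training_file.id,
    model="gpt-3.5-turbo",
)
print(job.id, job.status)  # poll later with client.fine_tuning.jobs.retrieve(job.id)
```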
The near future of LLMs:
Meta's Llama 3 likely will drop in the summer, with smaller versions of the new model launching as soon as next week. OpenAI has begun and has perhaps finished training GPT-5, with safety evaluation and testing likely to take several months. They will probably release some sort of GPT update over the summer, maybe a GPT-4.5, but many are speculating that GPT-5 might not be released until after the election. GPT-5 may include techniques (some understood already and some the subject of speculation (just go down the "what is Q*" rabbit hole...)) that will dramatically increase its abilities to reason and to plan.
I cannot emphasize enough that there is a realistic prospect that we are not far away (five years or less?) from a model that outperforms every living attorney in research, writing, and advising tasks under just about any conceivable metric. This is not at all a certainty, but it is astounding to me that it is a real and near-term possibility. What we should do, what we should insist on from our law and lawmaking processes, what we teach students in a world where machines handily outperform us — all these questions have been constantly on my mind, and I hope we can think productively about this together in the coming months and (hopefully) years.
Non-LLM AI-related technologies:
OpenAI's Sora: I'm not going to say anything more about this than that OpenAI has demonstrated a working model that can take text prompts as simple as "Historical footage of California during the gold rush." and produce mind-blowingly realistic video as output. Just go to https://openai.com/sora and check it out if you haven't seen this already.
AI and search: You might have heard of https://perplexity.ai, which aims to replace the Google search box with an AI prompt box. Just go and test it out. I have found more useful, though, a paid search alternative called Kagi (at https://kagi.com). Kagi is a drop-in replacement for Google with much better privacy terms but also a number of great features and less of the clutter that has been plaguing Google in recent years. One feature I find particularly useful is the ability to read a bullet-point, AI-generated summary of any link without visiting the link. And this includes academic articles. In fact, you can restrict your search to academic articles via a pulldown. Whatever your search engine, you might try exploring the settings to see what AI-enhanced features it has.
Other projects: I was particularly impressed with the explosion of AI projects featured in this thread rounding up some YCombinator-funded companies. https://twitter.com/snowmaker/status/1773402574332530953?s=20. They're things like weather forecasts generated by AI rather than more compute-intensive traditional weather models, text to video, AI-generated voices, and the like. And here's another example: https://app.suno.ai -- just try it out (and word is that there's apparently a competitor about to launch an even better tool for this purpose). Also, we're at the cusp of some very impressive advancements in robotics. All this is enabled by the parallelization of training and inference that transformers unlocked.
Research uses: You might want to think about whether any of your scholarly interests have aspects that might be modeled. Any kind of question that can be approached as a next-token prediction task can possibly, given sufficient data, be modeled using a transformer. It might not be obvious that a given task is susceptible to this approach, but creative re-description can sometimes convert a problem into one that is. (See, e.g., voice, video, images, music, movements of robotic limbs, financial data, etc., all of which have been modeled using transformers by defining the problem as a next-token prediction task.) Think of transformers as computer programs that can speak extremely exotic "languages" if trained to do so. UGA maintains high-performance computing clusters, and I've started to investigate how we might be able to use them, perhaps in cooperation with other departments. Please get in touch if you have research questions that might be able to take advantage of an AI approach.
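To make the next-token framing concrete, here is a toy sketch (my own illustration in PyTorch, not drawn from any of the links above): any sequence of symbols you can tokenize, whether characters, notes, or discretized measurements, can be trained on by predicting each next token from the ones before it.

```python
# Toy sketch: casting a symbol sequence as next-token prediction.
# The "corpus" is a stand-in for whatever sequence data you have
# (text, tokenized music, discretized sensor readings, etc.).
import torch
import torch.nn as nn

corpus = "the court held that the statute was unconstitutional " * 50
vocab = sorted(set(corpus))
stoi = {ch: i for i, ch in enumerate(vocab)}
data = torch.tensor([stoi[ch] for ch in corpus])

block_size = 32  # the toy model's context window

class TinyTransformer(nn.Module):
    def __init__(self, vocab_size, d_model=64, n_heads=4, n_layers=2):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, d_model)
        self.pos = nn.Embedding(block_size, d_model)
        layer = nn.TransformerEncoderLayer(d_model, n_heads, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, n_layers)
        self.head = nn.Linear(d_model, vocab_size)

    def forward(self, idx):
        T = idx.shape[1]
        x = self.embed(idx) + self.pos(torch.arange(T))
        mask = nn.Transformer.generate_square_subsequent_mask(T)  # causal mask
        return self.head(self.encoder(x, mask=mask))

model = TinyTransformer(len(vocab))
opt = torch.optim.AdamW(model.parameters(), lr=3e-4)

for step in range(200):
    # Sample random windows; the target is the same window shifted by one token.
    ix = torch.randint(len(data) - block_size - 1, (16,)).tolist()
    xb = torch.stack([data[i:i + block_size] for i in ix])
    yb = torch.stack([data[i + 1:i + block_size + 1] for i in ix])
    logits = model(xb)
    loss = nn.functional.cross_entropy(logits.reshape(-1, len(vocab)), yb.reshape(-1))
    opt.zero_grad(); loss.backward(); opt.step()
```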
AI Update 2/2025
Hi all - It's been a long time since my last AI-related missive. Part of the reason for the delay is that AI has become so ubiquitous in the news that there's little I can really add. And, indeed, the big players (OpenAI, Google, Meta, and Anthropic) are still the big players (with one recent arrival). And their latest products, mostly accessible via $20/month subscriptions, are covered as major news, not tech stories.
Here's my big message: Whether the "intelligence" of models continues to increase at the rate it has, overcomes significant gaps in certain reasoning and world-modeling tasks, or doesn't for a long, long time, we haven't yet come close to capitalizing on all that the tech can already do. That means that even if we shut down all new LLM research today, creating new technologies and services based on existing models would still be incredibly valuable and disruptive. As models move out of chat boxes and increasingly into services suited to them, we'll see more of this. (One tangential point that seems reasonable to insert here: The key to having a decent appreciation of what these things are is to remember that, relative to deterministic computing as we've all come to know it, they're good at the kinds of things we're good at, not the kinds of things you'd normally think a computer would be well-suited to do. They struggle with things we struggle with. But unlike us, they don't get tired. And there are both important respects in which they are super-human and many respects in which their reasoning and understanding of the world is laughable. All at the same time! Like I wrote in my first report, best to think of them as alien intelligences.)
Anything that could be improved by being hooked up to a relentless reasoner stands to get the AI treatment. Early efforts include Anthropic's computer-using agent and OpenAI's Operator. These work by using specially fine-tuned versions of their best models to operate a computer. They work... ok but not yet well. In terms of what they do, think: "Computer, find out all about what ultralight camping gear is on the market, including pros and cons, and go ahead and buy gear that improves on what I have at prices that seem especially good." Again, these models are in many ways better suited to this task than to counting the number of r's in the word "strawberry" -- the successful and reliable doing of which was a major milestone in LLM tech last year. But they still, for safety and reliability reasons, interrupt you a lot to ask for guidance or just plain fail.
Here's a summary of where we are on the foundation models that you're probably familiar with and some you may not be:
Anthropic. Latest model: Claude 3.5 Sonnet. Still some of the best output for what we do. Still somewhat expensive. Poised for a new release. Early efforts at computer-using agents not ready for prime time.
OpenAI. Latest model: o3-mini. Most people are still using 4o, the default in the ChatGPT interface. Coming soon is their best announced model: o3. The o-models (but not 4o) have been specially trained as reasoning models. These are models in which the basic LLM architecture that predicts next tokens (like the human conceptual system) is deployed to think as well as speak. Thinking is just a series of words, just like speaking, but this series is not (entirely) shown to the user. Just like your own thoughts, these thinking tokens are spent trying to figure out what to say. The speaking then happens after the thinking. These thinking models have been trained on how well they reason (either based on raw results, on the quality and correctness of the reasoning chain, or on some combination). The capabilities of this architecture, at its limits, are staggering. But it's not, yet, cheap.
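For the curious, here is what calling one of these reasoning models looks like in code. This is a minimal sketch that assumes OpenAI's Python SDK and the reasoning-effort setting the o-series models expose; the prompt is just an example. Note that the hidden chain of thought is not returned, only the final answer and a count of the reasoning tokens you were billed for.

```python
# Minimal sketch: asking an o-series reasoning model a question via the OpenAI SDK.
from openai import OpenAI

client = OpenAI()
resp = client.chat.completions.create(
    model="o3-mini",
    reasoning_effort="medium",  # "low" | "medium" | "high": roughly, how long to think
    messages=[{"role": "user",
               "content": "Does the mailbox rule apply to option contracts? Answer briefly."}],
)
print(resp.choices[0].message.content)  # the visible, "spoken" answer
# The hidden "thinking" tokens are billed but not shown:
print(resp.usage.completion_tokens_details.reasoning_tokens)
```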
Google. Latest model: Gemini 1.5, with Gemini 2.0 Flash available in experimental mode. (2.0 Flash seems great, but my own use of it -- see below -- has shown it not to be reliable enough to use routinely. Very much looking forward to an improved release.) Google's models still lag the competition. But Google has done some very interesting things with them. Many or most of you have probably heard about NotebookLM (https://notebooklm.google.com), through which you can generate, among other things, on-demand, life-like podcasts based on documents you upload. It was one of the mind-blowing developments last year. In December, Google announced updates (https://blog.google/technology/google-labs/notebooklm-new-features-december-2024/) that would allow you to join this synthesized conversation in some way. If you haven't used it, try it out. It's free and one of those oh my god moments.
One more thing on Google: They announced Deep Research late last year. I haven't used it yet. Read about it here: https://www.tomsguide.com/ai/google-gemini/i-just-saw-the-future-of-the-web-googles-new-gemini-deep-research-ai-agent-is-incredible. You can compare it with OpenAI Deep Research that just dropped today (but only to the $200/month (that's two hundred dollars) Pro subscribers, with regular subscribers "next"): https://openai.com/index/introducing-deep-research/.
These are the sorts of tools that will combine exhaustive research with whatever reasoning and writing capabilities are available in the best models. We should start using them asap. Because of the price of the best agents, it's easy to foresee wealth even further compounding and better-resourced organizations moving yet further ahead.
Meta: Llama 3 is still their best model. Still open source. They're promising multiple Llama 4 releases this year. See generally: https://ai.meta.com/blog/future-of-ai-built-with-llama/.
DeepSeek: Chinese startup DeepSeek has been all over the news. They released an excellent chat model (like 4o or Claude) and a very competitive reasoning model (like o1 or o3-mini) at very, very low prices. It's unknown to what degree the products owe their capabilities and low cost to ingenious training architecture, skirting export bans, and distilling the outputs of other models. And so it's hard to say whether these models are harbingers of an incipient commodification of intelligence (which would lead to the proliferation of apps). DeepSeek models have been either down or unreliable since last week from some combination of unanticipated demand and cyberattacks. But they worked very well when I tested them a couple of weeks ago.
Finally, one way I've tried to understand better what current technology can do is by building an app myself. The tech helped immensely with coding, and the resulting app leverages AI services to do what it does. It was a deep dive. Indeed, the process of doing it from start to finish and seeing what was possible was the goal. But the app itself has proven to be, at least for my use, pretty damn cool. Upshot: it's an iPhone app (not out yet) that will take a PDF or text and allow you to generate audio narrations (think of clean audiobook-like narrations of ssrn articles on demand), seminars, drafts, oral arguments, debates, critical responses, and whatever else you can dream up based on the input. I just published a blog post about it here: https://www.hydratext.com/blog/2025/2/1/entalkenator-a-tool-for-thinking.
Relevant to us as a faculty is this experience: I have a template in the app that creates first drafts. You give it anything, and it creates a draft of about 3,000 to 5,000 or so words. I googled "circuit split," found the very first one that came up, copied the text from the decision in the case I found first, and prepended this text to the file: "This case out of the tenth circuit creates a circuit split with the fifth circuit. The Supreme Court should take the case and [affirm]." The app generated the attached (which I copied and made into a PDF without alteration). Yes, it's not great, but I didn't cherry-pick. This was just based on the very first output of a simple template calling for an outline with three parts and three subparts in each part and then, step by step, to draft each part. The template is not law-specific. It's not using the best model available - just Sonnet. I could have made it nine parts, called for certain kinds of reasoning or attempts at citation. I could have made it arbitrarily longer or shorter, could have split up subsections into more or fewer parts, could have had it output in Icelandic. And this writing and analysis is as bad as AI will ever be. I made an oral argument too. (My critical response template generated some scathing critiques of my articles, and so I'm all good on that front for a while.) And over the past couple of months, I've generated workshops based on our workshop papers in advance of our own, actual workshops. And for all this, my guess is that the app is obsolete within months.
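For those wondering how a template like this works under the hood, here is a rough reconstruction (mine, not the app's actual code, and the file name is hypothetical): ask the model for an outline, then draft each part in turn, feeding the accumulated draft back in as context.

```python
# Rough sketch of an outline-then-draft loop against Anthropic's API.
# Assumes an ANTHROPIC_API_KEY in the environment; the input file is hypothetical.
import anthropic

client = anthropic.Anthropic()
MODEL = "claude-3-5-sonnet-20241022"  # "just Sonnet," as in the text

def ask(prompt: str) -> str:
    resp = client.messages.create(
        model=MODEL,
        max_tokens=4000,
        messages=[{"role": "user", "content": prompt}],
    )
    return resp.content[0].text

source = open("tenth_circuit_opinion.txt", encoding="utf-8").read()
outline = ask(
    "Outline a paper with three parts, each with three subparts, arguing that "
    "the Supreme Court should grant cert and affirm:\n\n" + source
)

draft = ""
for part in range(1, 4):
    draft += ask(
        f"Outline:\n{outline}\n\nDraft so far:\n{draft}\n\n"
        f"Write Part {part} in full, following the outline."
    ) + "\n\n"
```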
Ok - to make a long email one paragraph longer, I admit to being overwhelmed by all these developments, even setting aside events in the broader world. Here's where I come down: It is imperative that we continue to think about what lawyering and the rule of law ought to look like in a world of ubiquitous artificial intelligence. What exactly will our students be doing in five years? What can and should we do to have some influence over a question like that? Will the more specific professional skills we teach be commercially valuable in five years? (I think there is a reasonable — meaning significantly nonzero — chance they will not be.) As I've told many of you before, whatever the answers to these questions, I think it is essential that law as a communal concept be part of the humanities and that we continue to insist education in the humanities is at the core of preserving law's value. Education that leads us to understand ourselves and others more deeply is essential to the rule of law, and that basic need won't disappear when the models out-think us.
AI Update 3/2025:
Hi all - A few updates: some practical news worth sharing and an opportunity. I had intended and briefly scheduled a lunchtime session to demonstrate some of these items. Some unexpected events caused me to cancel it, and I have not rescheduled. But if any of you would be interested in Q&A, demos, or conversations in April, please let me know. I’d be happy to present, to facilitate, to demo, or just to listen.
To start off, here's a blog post by a Google DeepMind researcher and former LLM-skeptic that reflects, in longer form and much better detail, what I’ve relayed to you all in prior updates: https://nicholas.carlini.com/writing/2025/thoughts-on-future-ai.html. The opening lines are a good summary: "I have very wide error bars on the potential future of large language models, and I think you should too. Specifically, I wouldn't be surprised if, in three to five years, language models are capable of performing most (all?) cognitive economically-useful tasks beyond the level of human experts. And I also wouldn't be surprised if, in five years, the best models we have are better than the ones we have today, but only in ‘normal' ways where costs continue to decrease considerably and capabilities continue to get better but there's no fundamental paradigm shift that upends the world order.”
As a faculty and as individual scholars and teachers, I think we need to reflect on the possibility that even in the less performant scenario, our industry will be heavily disrupted. I cannot overstate this. While there is still a possibility that the barriers to progress are so substantial that LLMs remain mere “tools,” I’ll bet otherwise. (For more on this, at least one of you mentioned that a recent episode of the Ezra Klein podcast with a former Biden AI czar alarmed you about potential near-term developments. Relatedly, it really does seem to be the case that there has not been a Manhattan-project-style effort underway to ensure superior AI capability at the national governmental level. That’s, to put it mildly, surprising and alarming to me. But for a recent effort to think about national AI strategies, see here: https://www.nationalsecurity.ai/. And, to extend the free association of this parenthetical to an even more annoying length, a recent conversation I had with Claude about the scenarios by which AI might escape its confines in code and datacenters opened my eyes to a surprisingly rich world of possibilities.)
Anthropic: Updated Claude Sonnet to 3.7. This latest update includes an optional reasoning mode. Remember that when you hear about a “reasoning model,” we’re still just talking about an LLM using the transformer architecture I described in my first report to you all. It’s fundamentally still outputting a sequence of tokens (words or parts of words) using the unfathomably large number of parameters to do so. The difference is that a reasoning model only shows some of these tokens to the user, much like we think in chains of words or other sensations but only speak or write some of them. So a reasoning model can “think” (output lots and lots of tokens in the background as it tries out solutions to the problem) before it speaks. The engineering to create these sorts of models focused on how to reward the model (nudging its many parameters) for “good” chains of thought and punish it for “bad” chains of thought.
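If you want to see the mechanics, here is a minimal sketch, assuming Anthropic's Python SDK and its extended-thinking option as I understand them (the token budget below is arbitrary). The response comes back as separate "thinking" and "text" blocks, which is exactly the hidden-then-spoken split described above.

```python
# Minimal sketch: Claude 3.7 Sonnet with extended thinking turned on.
import anthropic

client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment
resp = client.messages.create(
    model="claude-3-7-sonnet-20250219",
    max_tokens=4000,  # must exceed the thinking budget
    thinking={"type": "enabled", "budget_tokens": 2000},  # cap on hidden reasoning tokens
    messages=[{"role": "user",
               "content": "Sketch the strongest argument against strict textualism."}],
)
for block in resp.content:
    if block.type == "thinking":
        pass               # the model's internal reasoning; normally not shown to users
    elif block.type == "text":
        print(block.text)  # the visible answer
```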
In my testing, the new Sonnet produces longer outputs. For example, in my app when I use the "first draft" template (which asks the model to prepare an outline and then, iteratively, to produce a paper with three major parts, each with three subparts), the old Sonnet would generate about 10,000 or so words, and the new Sonnet tends to produce nearly 20,000 words. I think the quality is far better for some applications but only marginally better for others.
OpenAI: The big news here is that Deep Research is now available in the $20/month subscription. I think you get 10 Deep Research queries per month at that tier. Deep Research is amazing, and it uses a specially fine-tuned version of an otherwise unreleased OpenAI model o3. You should definitely give it a spin. I have had occasion to engage heavily with all these models to evaluate a complex medical condition for a family member. I used deep research, put the output of that into Claude, asked o1 whether it concurred with Claude (or, more accurately, whether it agreed with “my medical students who generated the following report”), asked Claude to criticize o1’s theories, etc. The results were stunningly detailed, accurate, and more reliable than the actual medical providers involved.
OpenAI also released GPT-4.5 — its older-style, non-reasoning model. It is expensive, slow, and not much better than 4o (the default model in ChatGPT). This is the model behind all of last year's reporting on leaks about disappointing results from further scaling up existing models. It does seem that just increasing model size without other architectural changes might have been hitting a wall.
My app, enTalkenator: I’ve just pushed an update that should appear on the iOS App Store later today or tomorrow. The current version (once it is approved and appears in the store) is free to use for almost all features: https://www.entalkenator.com/ (you pay the model providers as you use them). I’ve also started a podcast that updates twice weekly. It comprises (a) faculty workshops based on Larry Solum’s download of the week and (b) brief, introductory classes on new papers in other fields. For example, the episode I just pushed is a brief class on the groundbreaking results reported yesterday about dark energy. It’s kind of amazing to put in a paper I don’t understand at all and get out a class that makes it understandable. And kind of hilarious to see the output that results from a template that creates a highly critical response paper. When there’s a new paper that interests me, I sometimes just dump it into enTalkenator and generate a workshop. As I indicated in my last email, seeing it generate a paper from a few notes is eye-opening. It doesn’t always do great. But it can generate relentlessly and gets better as the underlying models get better. Normal science legal writing and analysis is now automated.
New opportunity: As some of you know, Matt Hall and I are on a UGA seed-grant team focused on AI ethics. We have members in philosophy, CS, Terry, and SPIA. We’re applying for a grant that uses methods I developed making enTalkenator to create an application that allows people in a conflict to engage in a mediated conversation — where the system acts as mediator and go-between. The innovation here is that, unlike talking to ChatGPT about a conflict you’re having, this system would interact with you based not only on its training and what you type into the chat window, but also on the context from the other participant. The system would thus be a kind of “empathy machine” helping to translate perspectives, potentially across cultures. We think this will unlock a lot of research avenues. Even restricting attention to law, we envision, for example, human trials comparing the app as mediator to human mediators. But there’s a lot more that could be done. If this sounds interesting to you or if you can envision applications of such a system to work we may not have thought of, I invite you to get in touch.
That’s enough for the moment! C
AI Update 4/2025:
A lot is happening, and what I intended as a quick addendum to my recent March update has grown long. (And I didn’t even discuss recent gee-whiz demos like https://www.sesame.com/research/crossing_the_uncanny_valley_of_voice##demo).
(1) I’ll be hosting an AI brownbag in Cheeley on Monday, April 14. On the agenda, in addition to open discussion, are demos of OpenAI’s Deep Research (not to be confused with the lesser Google product having exactly the same name), o1-pro, whatever they drop between now and then, and my enTalkenator app. I hesitate to do any self-promotion here, but the app is free for everything you’d likely want to do with it - and just get in touch if you want to do more. Orin Kerr succinctly blogged about it here: https://blog.dividedargument.com/p/when-artificial-intelligence-workshops — I think showing what the app can do in terms of paper-writing, class and workshop generation, etc. is a good way to begin to see what’s coming to our field.
(2) Speaking of Deep Research, OpenAI has made 10 uses of their Deep Research tool available per month to the $20/month tier. If you have an OpenAI subscription, you should definitely try it. It was formerly only available to the (very) expensive subscription tier.
(3) And… speaking of the $20/month tier, OpenAI just announced yesterday that this subscription tier will be free to all US university students for April and May (just in time for finals and final papers). This will give students, for free, the aforementioned 10 deep research queries as well as more access to o3-mini and their recently launched new image generation model (built into ChatGPT). In addition to being much better than prior image generation tools, it permits the generation of images depicting identifiable people, including public figures.
(4) Relatedly, Anthropic (whose Claude 3.7 Sonnet is still my go-to) announced a Claude for Education plan that features site-wide licenses and new model products, including a Socratic, guided learning mode. https://www.anthropic.com/education. I don’t know whether UGA’s participation in the OpenAI NextGenAI consortium (https://news.uga.edu/uga-joins-nextgenai-consortium-to-accelerate-ai-innovation-education/) would have any bearing on exploring these kinds of opportunities.
(5) Google finally released a model that I think is competitive with Anthropic’s and OpenAI’s offerings. Gemini 2.5 Pro Experimental is very, very good. I’d put it neck and neck with Claude 3.7 Sonnet and above OpenAI’s models in my usage. And it beats those models in many benchmarks. It’s available to all Gemini users.
(6) One of the things I’d like to hear about from you — and to discuss at the April 14 brownbag — is what our norms (and perhaps ultimately rules) should be concerning the use of AI in our communications with each other and in our scholarly work. There are some seemingly easy answers here (always disclose usage) that strike me as likely to become problematic in the medium to long term as AI tools become embedded components in the tools we all use. See also the literature on the limits of disclosure as a regulatory tool. My own practice thus far has been not to use AI at all in my memos to colleagues or in these updates. And I haven’t used it in anything I’ve written on the scholarly front. (Though I did use my app to generate papers from notes for papers that I’d decided not to pursue.) But I cannot say that either of these practices is wise, helpful, or reasonable. Anyway, this is an increasingly pressing issue, and I think we should begin talking about it this semester.
(7) One more thing — on the research front, I found two recent papers from Anthropic researchers (linked and explained in this post: https://www.anthropic.com/news/tracing-thoughts-language-model) to be somewhat mind-blowing. They should be mandatory reading if your inclination is to think of LLMs as “stochastic parrots” or as “just regurgitating training data.” If you subscribe to the enTalkenator podcast feed (linked in the Kerr post), you’ll find intro classes on both papers and an academic workshop on one of them.
That’s it for now.
Christian
AI Update 4/2025, part 2
Hi all - Here’s a second AI update for the month of April. Feels like enough has happened to warrant an update.
Some updates:
(1) Our AI brownbag was very helpful for me and I hope also for those who attended. Those attendees who have been using these tools a lot had what I consider to be fairly existential questions about lawyering that were prompted by what they’ve experienced. We also learned a little about the extent to which at least some clinics have been affected by AI use by both students and by clients. (This is as good a place as any to suggest to those of you who haven’t played around with LLMs much that using one of these models is as easy as going to Claude.ai, gemini.google.com, or ChatGPT.com in a browser and just typing into a chat box. You don’t have to type in any particular format or sound smart or think too much. You can be unafraid of sounding like a dum-dum. Just start typing about something you want to figure out. And the models and you will likely jointly steer your clickety-clack toward ideas and answers.)
In any event, AI now seems to be a ubiquitous part of at least some practice areas on both the lawyer and client sides. Some seemingly easy responses in both practice and law school contexts, such as use restrictions or disclosure mandates, do not seem feasible in the medium to longer term. There was consensus that we need to have some very serious conversations as a faculty about fundamental issues in the coming months.
Further to this conversation, some recent writing has helped me think through my own uncertainty (falling to the hype side of the stochastic-parrot skeptics and to the skeptical side of the singularity-is-upon-us crowd). First is Ethan Mollick’s post, which I highly recommend, on the “jagged” intelligence of the latest crop of models — the way they far exceed human intelligence in some tasks and fall far below it in others: https://www.oneusefulthing.org/p/on-jagged-agi-o3-gemini-25-and-everything. Second is the set of Anthropic papers I linked in the last update: https://www.anthropic.com/news/tracing-thoughts-language-model (uses models to interpret the circuitry of concepts their models use to arrive at next tokens). And third is a skeptical take, AI as Normal Technology, https://knightcolumbia.org/content/ai-as-normal-technology — arguing that we should conceive of AI as a normal technological development rather than as an alien super-intelligence, that its adoption and the resulting changes will take time to percolate through the economy, and that this sociology and economics of change has implications for regulatory policy. Be aware, though, that when the authors say “normal technology,” they’re conceiving of AI as an innovation comparable to electricity or the internet — so a huge deal, but not qualitatively distinct from other tool developments.
I used my app to create a roundtable conversation about these three articles (and a summary up front for the listener) and put it into the enTalkenator podcast feed. You can listen here: https://www.entalkenator.com/podcast/a-roundtable-on-three-ai-articles-normal-tech-biological-or-jagged-super-intelligence — or, better yet, search for enTalkenator in your podcast app of choice and listen there, where you can listen at 1.5x or 2x.
A fourth article along these lines is a popular piece that appeared in the Guardian on April 19: https://www.theguardian.com/technology/2025/apr/19/dont-ask-what-ai-can-do-for-us-ask-what-it-is-doing-to-us-are-chatgpt-and-co-harming-human-intelligence — on whether using AI will harm our intellectual faculties. I found it resonant with discussions I’ve had here and around the university about creating the conditions for humane intellectual struggle that leads to learning and about the value of what we do generally in universities. (The value in teaching law certainly includes the intellectual growth that comes with any learning about ideas but is also normative, in the sense that it perpetuates collective human authorship of the conditions of our collective life. But that’s a longer story…) This article is more suggestive than strictly illuminating, but I bring it to your attention as gathering a set of concerns that we should have as educators.
(2) OpenAI recently released their next generation models: o3 (their most advanced reasoning model) and o4-mini (the smaller version of what will be their next-generation model). 4o, the default model you usually use when just typing into ChatGPT, has been updated fairly recently, including a dramatic improvement to and liberalization of the usage of its image generation capabilities. It’s not just you: OpenAI’s product naming has become the butt of jokes, even in their own product launch videos.
With a standard $20/month OpenAI ChatGPT Plus account, you get enough access to 4o that you should not have to think about it, 100 messages a week with o3, 300 messages a day with o4-mini, and 100 messages a day with o4-mini-high. As mentioned in my last update, you also get 10 uses per month of Deep Research (you click or tap the little telescope in the chat field to invoke Deep Research).
Note also that you can click the Search icon in that same field to prompt the model to search the web in providing an answer. The use of tools, though, like web search and coding, is increasingly built into these models, and you’ll notice more citations, more tables, and other capabilities.
(3) No substantial updates tech-wise on Anthropic. But this report: https://www.anthropic.com/news/anthropic-education-report-how-university-students-use-claude — on how college students in different fields have used Claude is interesting. Oh, and by the way, if you use Claude 3.7 Sonnet, which seems still to be my go-to for at least the moment, clicking on the + in the chat box brings up options, including the ability to turn on “extended thinking.” This activates Claude’s reasoning mode, akin to o3, o4-mini, and Gemini 2.5 Pro. Keeping it off makes the model a more traditional LLM like GPT 4o.
(4) Google’s Gemini 2.5 Pro continues to impress me, and it might just be the best available model. Google has now released an experimental version of Gemini 2.5 Flash, which, as its name suggests, is a faster, less-capable-but-often-good-enough version of 2.5 Pro. Notably, Google’s own Deep Research product now uses 2.5 Pro and is much more capable.
You’ll recall from earlier updates — and probably from your own reading — that models have a limited “context window” of tokens of which they keep track in a conversation. It’s a little like short-term memory: if your chat (including files you upload) exceeds the size of the context window, the earlier parts of the conversation roll off, forgotten. This was a big problem with the first generation of models. The original ChatGPT could only keep about 3,000 words in mind at a time. So your conversations with it would become frustrating when not-so-recent parts of your conversation were forgotten. GPT-4 increased its context window to 32,000 tokens, and Google’s Gemini introduced very long context windows of a million or more tokens.
Maybe more important, though, than the size of the context window is how good the model is at working with all that information. One measure of that is the “needle in a haystack” performance — the model’s ability to pull out a single bit of data from its context when asked. But even more important than that is the model’s ability to reason over a very large context, not just pluck out a stray word. Gemini 2.5 has a million-token context window (about 750,000 words), is potentially capable of at least 2 million tokens, and reasons very well over more than 100,000 words (perhaps over the whole 750,000). OpenAI’s o3 and o4-mini (200,000-token context windows, or about 150,000 words) show similar strengths.
My point here is that the effectiveness of LLMs’ abilities to reason over long texts (many cases, several articles) has exploded since my first message to you about LLMs back in 2022. This unlocks qualitative changes in use cases. Whereas before you could maybe talk about a portion of an article or a single case or maybe two, now it might be possible to have intelligent conversation and analysis of more than a dozen long articles at a time.
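If you want a rough sense of what fits, you can count tokens yourself. Here is a minimal sketch using OpenAI's tiktoken tokenizer; other vendors tokenize somewhat differently, so treat the numbers as estimates, and the file names are hypothetical.

```python
# Rough sketch: estimating whether a stack of documents fits in a context window.
import tiktoken

enc = tiktoken.get_encoding("cl100k_base")  # a common OpenAI tokenizer

# Hypothetical files standing in for whatever you want to discuss in one chat.
articles = ["solum_draft.txt", "tenth_circuit_opinion.txt", "student_note.txt"]

total = 0
for path in articles:
    with open(path, encoding="utf-8") as f:
        n = len(enc.encode(f.read()))
    print(f"{path}: ~{n:,} tokens")
    total += n

# Context windows cited above: ~200k tokens for o3/o4-mini, ~1M for Gemini 2.5 Pro.
for name, window in [("o3 / o4-mini", 200_000), ("Gemini 2.5 Pro", 1_000_000)]:
    fits = "fits" if total < window else "does not fit"
    print(f"{name}: {fits} ({total:,} of {window:,} tokens)")
```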
(5) Over time, I expect the model offerings from the major vendors to become much simpler from the perspective of the user — with a single chat-like interface, whether text, audio, or video, that will invoke the model, tools, and capabilities appropriate for the task, without the user’s having to think about or select anything. Just a chat box, and the model takes care of the rest.
I also expect the debates about AI to continue toward emotional polarization. This is unfortunate, I think, and yet another ill effect of broader cultural splintering. There are serious issues concerning environmental impact, workforce displacement, effects on expertise and general intelligence, the leverage given to bad actors (see, e.g., the recent reports on o3’s facility with virology: https://www.ai-frontiers.org/articles/ais-are-disseminating-expert-level-virology-skills), the contingent role of distinctively human and culturally-specific moral and legal reasoning, and, more speculatively, the potential runaway scenarios that international non-cooperation might make possible. There is a better way than uncritical buy-in or using “AI slop” as an all-purpose epithet. More on that at another time.
Ok, enough for now. Always happy to chat with any of you individually or collectively! Christian
p.s. Count me among those who do not believe disclosure can be the stable cornerstone of AI ethics in the long term. But for now, because some of you have raised the issue, I will say that I have not used and do not plan to use AI in any of my actual academic writing or in any of these updates or other messages to any of you. (Thinking and feedback, yes; writing or even paraphrasing, no.) I do not suggest that that should be our norm or should otherwise be an ethical principle to which others should adhere. I’m uncertain, to put it mildly, about that. But for now at least, if you get something from me, it’s from me.