
Analogies made by submitters of briefs for Google v. Oracle

From Wikipedia:

Google LLC v. Oracle America, Inc., 593 U.S. 1 (2021), was a landmark decision of the Supreme Court of the United States related to the nature of computer code and copyright law. The dispute centered on the use of parts of the Java programming language’s application programming interfaces (APIs) and about 11,000 lines of source code, which are owned by Oracle (through subsidiary, Oracle America, Inc., originating from Sun Microsystems), within early versions of the Android operating system by Google.

Litigators on both sides described the case as a “battle of analogies,” with each side trying to frame the facts for non-tech-savvy justices. Let’s see how we can analyze these analogies using LLMs and embeddings.

Data

First, we’re going to need to get those analogies. For convenience, let’s just limit this analysis to the briefs submitted to the Supreme Court. To acquire the data, I performed the following steps:

  1. Downloaded the briefs from the Supreme Court’s docket page.
  2. Extracted the analogies from the briefs using SharePoint’s AI-powered Auto Fill feature, and recorded each brief’s submitter and the party it supported.

Count of analogies by which party the brief was in support of

Count of analogies by submitter
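
The two counts above are simple group-by tallies. Here is a minimal JavaScript sketch of how such counts could be computed, assuming each analogy is a row with `submitter` and `in_support_of` fields (the column names used in the queries later on). The helper name `countBy` and the sample rows are hypothetical placeholders, not data from the briefs.

```javascript
// Tally how many rows share each value of the given key.
function countBy(rows, key) {
  const counts = new Map();
  for (const row of rows) {
    counts.set(row[key], (counts.get(row[key]) ?? 0) + 1);
  }
  return counts;
}

// Hypothetical placeholder rows; the real rows come from the extracted briefs.
const sampleRows = [
  { submitter: "Submitter A", in_support_of: "Party X" },
  { submitter: "Submitter A", in_support_of: "Party X" },
  { submitter: "Submitter B", in_support_of: "Party Y" },
];

console.log(countBy(sampleRows, "submitter"));
console.log(countBy(sampleRows, "in_support_of"));
```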

Embedding and clustering the analogies

Next, I used Latent Scope to embed the analogies and cluster them. Here’s the output:

SELECT *
FROM latent_scope_input;

Now let’s facet the data by submitter:

Visualizing embeddings

SELECT name, in_support_of, submitter, analogy, label, embeddings
FROM embeddings
ORDER BY submitter, name;

Let’s take a look at the first embedding:

const first_analogy = [...embeddings_with_analogies][0];
first_analogy.analogy
const first_embedding = first_analogy.embeddings;
first_embedding.data[0].values

We can plot those 1,536 numbers that make up the vector in a very dense bar chart to visualize the embedding (h/t Ian Johnson):

Analogy:

We can compare this embedding to others to find semantically similar analogies. A common way to do this is to use cosine distance: the smaller the cosine distance between two vectors, the closer their texts should be in meaning. Let’s find the 10 analogies most similar to "We cannot recognize copyright as a game of chess in which the public can be checkmated. Cf. Baker v. Selden [citation omitted].":
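
For intuition, cosine distance is conventionally defined as 1 minus the cosine similarity of two vectors. The following is a minimal JavaScript sketch under that definition. The function name `cosineDistance` and the toy three-dimensional vectors are illustrative assumptions; the analysis itself relies on the database’s `array_cosine_distance` function in the queries below.

```javascript
// Cosine distance between two equal-length numeric vectors.
// 0 means same direction, 1 means orthogonal, 2 means opposite.
function cosineDistance(a, b) {
  if (a.length !== b.length) {
    throw new Error("Vectors must have the same length");
  }
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  return 1 - dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

// Toy vectors for illustration only; the real embeddings have 1,536 dimensions.
const reference = [1, 0, 0];
const sameDirection = [2, 0, 0];
const orthogonal = [0, 1, 0];

console.log(cosineDistance(reference, sameDirection)); // 0
console.log(cosineDistance(reference, orthogonal)); // 1
```

Note that the measure ignores vector length: `[1, 0, 0]` and `[2, 0, 0]` point the same way, so their distance is 0.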

WITH embedding_of_interest AS (
  SELECT *
  FROM embeddings
  WHERE analogy = 'We cannot recognize copyright as a game of chess in which the public can be checkmated. Cf. Baker v. Selden [citation omitted].'
),
embeddings_other_than_the_one_of_interest AS (
  SELECT *
  FROM embeddings
  WHERE analogy != 'We cannot recognize copyright as a game of chess in which the public can be checkmated. Cf. Baker v. Selden [citation omitted].'
)

SELECT
  embeddings_other_than_the_one_of_interest.analogy,
  embeddings_other_than_the_one_of_interest.submitter,
  embeddings_other_than_the_one_of_interest.in_support_of,
  embeddings_other_than_the_one_of_interest.name,
  array_cosine_distance(
    embedding_of_interest.embeddings::FLOAT[1536],
    embeddings_other_than_the_one_of_interest.embeddings::FLOAT[1536]
  ) AS cosine_distance,
  embeddings_other_than_the_one_of_interest.embeddings,
  embedding_of_interest.embeddings AS embedding_of_interest
FROM embeddings_other_than_the_one_of_interest
CROSS JOIN embedding_of_interest
ORDER BY cosine_distance ASC
LIMIT 10;

Let’s plot these embeddings as bar charts too, along with the diff between the embedding of interest and each of the others:
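
One plausible reading of “diff” here is an element-wise subtraction of the two vectors, dimension by dimension. The sketch below assumes that reading; the function name `embeddingDiff` and the short toy vectors are hypothetical, and the charts may compute the diff differently.

```javascript
// Element-wise difference between a reference embedding and another embedding.
// Assumes both vectors have the same number of dimensions.
function embeddingDiff(reference, other) {
  if (reference.length !== other.length) {
    throw new Error("Vectors must have the same length");
  }
  return reference.map((value, i) => value - other[i]);
}

// Toy 4-dimensional vectors for illustration; real embeddings have 1,536 values.
const referenceEmbedding = [0.5, -0.25, 0.0, 1.0];
const otherEmbedding = [0.25, -0.25, 0.5, 0.0];

console.log(embeddingDiff(referenceEmbedding, otherEmbedding)); // [ 0.25, 0, -0.5, 1 ]
```

Dimensions where the two embeddings agree come out as 0, so large bars in a diff chart mark where the vectors disagree most.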

We can also look at the 10 analogies least similar to that same analogy by sorting in descending order of cosine distance:

WITH embedding_of_interest AS (
  SELECT *
  FROM embeddings
  WHERE analogy = 'We cannot recognize copyright as a game of chess in which the public can be checkmated. Cf. Baker v. Selden [citation omitted].'
),
embeddings_other_than_the_one_of_interest AS (
  SELECT *
  FROM embeddings
  WHERE analogy != 'We cannot recognize copyright as a game of chess in which the public can be checkmated. Cf. Baker v. Selden [citation omitted].'
)

SELECT
  embeddings_other_than_the_one_of_interest.analogy,
  embeddings_other_than_the_one_of_interest.submitter,
  embeddings_other_than_the_one_of_interest.in_support_of,
  embeddings_other_than_the_one_of_interest.name,
  array_cosine_distance(
    embedding_of_interest.embeddings::FLOAT[1536],
    embeddings_other_than_the_one_of_interest.embeddings::FLOAT[1536]
  ) AS cosine_distance,
  embeddings_other_than_the_one_of_interest.embeddings,
  embedding_of_interest.embeddings AS embedding_of_interest
FROM embeddings_other_than_the_one_of_interest
CROSS JOIN embedding_of_interest
ORDER BY cosine_distance DESC
LIMIT 10;

All analogies

Let’s display all the analogies in a list, grouped by submitter:

Compare your own analogies

API key:

Your analogy

Top 10 similar analogies to your analogy
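
Ranking a new analogy against the stored ones follows the same pattern as the SQL queries above: compute the cosine distance to every stored embedding, sort ascending, and keep the first 10. Here is a rough JavaScript sketch, assuming the embedding for your analogy has already been obtained from the same embedding model. The names `cosineDist` and `mostSimilar`, the row shape, and the two-dimensional toy vectors are all hypothetical.

```javascript
// Cosine distance: 1 minus the cosine similarity of two equal-length vectors.
function cosineDist(a, b) {
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  return 1 - dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

// Return the k rows whose embeddings are closest to the query embedding.
function mostSimilar(queryEmbedding, rows, k = 10) {
  return rows
    .map((row) => ({
      ...row,
      cosine_distance: cosineDist(queryEmbedding, row.embeddings),
    }))
    .sort((x, y) => x.cosine_distance - y.cosine_distance)
    .slice(0, k);
}

// Placeholder rows with toy vectors; real rows carry 1,536-dimensional embeddings.
const storedRows = [
  { analogy: "placeholder analogy A", embeddings: [1, 0] },
  { analogy: "placeholder analogy B", embeddings: [0, 1] },
  { analogy: "placeholder analogy C", embeddings: [1, 1] },
];

const top = mostSimilar([1, 0.1], storedRows, 2);
console.log(top.map((row) => row.analogy));
// [ 'placeholder analogy A', 'placeholder analogy C' ]
```

Reversing the sort comparator would give the least similar analogies instead, mirroring the `ORDER BY cosine_distance DESC` query.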