Methodology & API

How LexArena works and how to access the data programmatically.

Overview

LexArena is an open benchmark for predicting SEC enforcement case outcomes. Models receive only initial complaint information and predict resolution type, monetary penalties, and remedial measures before outcomes are known.

Evaluation Process

Each case passes through three stages: Case Ingestion (the complaint is collected as filed), Blind Prediction (models predict from the initial information only), and Outcome Reveal (predictions are scored against the actual results).

Metrics

Models are evaluated on six dimensions:

Metric                  Scoring
----------------------  ---------------
Resolution Type         Exact match
Disgorgement            ±10% tolerance
Civil Penalty           ±10% tolerance
Prejudgment Interest    ±10% tolerance
Injunction              Exact match
Officer/Director Bar    Exact match
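
The two scoring rules in the table can be sketched as follows. This is a minimal illustration, not the benchmark's actual scoring code: the function names are hypothetical, the field names are borrowed from the JSON format in the model prompt, and the handling of null and zero amounts (null treated as 0, and a zero actual amount matched only by a zero prediction) is an assumption. The tolerance is taken relative to the actual amount, which is also an assumption.

```python
from typing import Optional


def exact_match(predicted, actual) -> bool:
    """Exact-match scoring for categorical or boolean fields."""
    return predicted == actual


def within_tolerance(predicted: Optional[float],
                     actual: Optional[float],
                     tolerance: float = 0.10) -> bool:
    """True if predicted lies within ±tolerance of actual, relative to actual.

    Assumptions (not stated in the source): None counts as 0, and when the
    actual amount is 0 only a prediction of 0 is scored as correct.
    """
    p = 0.0 if predicted is None else float(predicted)
    a = 0.0 if actual is None else float(actual)
    if a == 0.0:
        return p == 0.0
    return abs(p - a) <= tolerance * abs(a)


def score_case(prediction: dict, outcome: dict) -> dict:
    """Score one case on the six metrics; returns a field -> bool mapping."""
    exact_fields = ("resolution_type", "has_injunction",
                    "has_officer_director_bar")
    amount_fields = ("disgorgement_amount", "penalty_amount",
                     "prejudgment_interest")
    scores = {f: exact_match(prediction.get(f), outcome.get(f))
              for f in exact_fields}
    scores.update({f: within_tolerance(prediction.get(f), outcome.get(f))
                   for f in amount_fields})
    return scores
```

Under these assumptions, a predicted disgorgement of 105,000 against an actual 100,000 scores as correct, while a predicted penalty of 50,000 against an actual 80,000 does not.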

Data Integrity

No data leakage: Outcomes occur after inputs are collected.
Consistent inputs: All models receive the same prompts.
Time-aware: Results are segmented by filing period.
Distribution preserved: Changes in enforcement behavior over time are not normalized away.

Model Prompt

All models receive the exact same system prompt and instructions. This ensures fairness and allows for direct comparison across providers.

You are a legal analyst evaluating SEC enforcement cases. Read the following SEC complaint and predict the likely outcome:

---
COMPLAINT:
{complaint_text}
---

Predict the following outcomes for this case:

1. Resolution Type: Choose one of:
   - settled (defendant will agree to terms - includes consent judgments and settled actions)
   - litigated (case will go to trial/judgment - court makes final decision)

2. Disgorgement Amount: The amount in dollars the defendant must return (ill-gotten gains). Enter a number or null if none expected.

3. Civil Penalty Amount: The civil penalty in dollars. Enter a number or null if none expected.

4. Prejudgment Interest: Interest on disgorgement in dollars. Enter a number or null if none expected.

5. Has Injunction: Will there be injunctive relief? (yes/no)

6. Has Officer/Director Bar: Will the defendant be barred from serving as an officer or director? (yes/no)

7. Has Conduct Restriction: Will there be conduct-based restrictions (e.g., trading restrictions, industry bar)? (yes/no)

Respond in the following JSON format:
```json
{
  "resolution_type": "settled" or "litigated",
  "disgorgement_amount": ...,
  "penalty_amount": ...,
  "prejudgment_interest": ...,
  "has_injunction": true/false,
  "has_officer_director_bar": true/false,
  "has_conduct_restriction": true/false,
  "reasoning": {
    "resolution_type": "Brief explanation...",
    "monetary": "Brief explanation...",
    "remedial_measures": "Brief explanation..."
  }
}
```

Provide your prediction based solely on the complaint text provided.

The prompt template is defined in src/evaluation/llm_prompt_formatter.py. All models receive identical instructions with the complaint text inserted at {complaint_text}.
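
As a rough illustration of how a template like this might be filled and how a model's reply might be parsed, here is a minimal sketch. It is not the code in src/evaluation/llm_prompt_formatter.py: the helper names, the shortened stand-in template, and the choice to require the seven prediction fields while leaving "reasoning" optional are all assumptions.

```python
import json
import re

FENCE = "`" * 3  # literal triple backtick, built indirectly for readability

# Shortened stand-in for the full prompt; only the placeholder matters here.
TEMPLATE = (
    "--- COMPLAINT: {complaint_text} ---\n"
    'Respond in JSON, e.g. {"resolution_type": "settled"}'
)

# The seven prediction fields named in the prompt's JSON format.
REQUIRED_KEYS = {
    "resolution_type", "disgorgement_amount", "penalty_amount",
    "prejudgment_interest", "has_injunction",
    "has_officer_director_bar", "has_conduct_restriction",
}


def fill_prompt(template: str, complaint_text: str) -> str:
    """Insert the complaint at the {complaint_text} placeholder.

    str.replace is used instead of str.format because the prompt contains
    literal JSON braces that str.format would treat as replacement fields.
    """
    return template.replace("{complaint_text}", complaint_text)


def parse_prediction(response_text: str) -> dict:
    """Extract the JSON object from a reply, fenced or bare, and check keys."""
    pattern = FENCE + r"(?:json)?\s*(\{.*\})\s*" + FENCE
    match = re.search(pattern, response_text, re.DOTALL)
    payload = match.group(1) if match else response_text
    data = json.loads(payload)
    if not isinstance(data, dict):
        raise ValueError("expected a JSON object")
    missing = REQUIRED_KEYS - set(data)
    if missing:
        raise ValueError(f"missing keys: {sorted(missing)}")
    return data
```

A reply that omits any of the seven fields raises ValueError, so malformed predictions can be caught before scoring rather than silently counted as wrong.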

API Access

Access all 11,772 SEC litigation cases programmatically via our REST API.

Quick Start

pip install -r requirements.txt
python api_server.py
# Server runs on http://localhost:5000

Endpoints

GET /api/metadata

Get dataset metadata (total cases, scrape date).

GET /api/cases

Get all cases with pagination. Parameters: page (int), per_page (int, max 1000), release_date_from (YYYY-MM-DD), release_date_to (YYYY-MM-DD).
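
The documented parameter constraints can be checked on the client side before a request is sent. The sketch below is one way to do that; the helper name build_cases_query is hypothetical, and treating pages as 1-based is an assumption. Whether the server itself rejects or clamps out-of-range values is not stated here.

```python
from datetime import date
from typing import Optional
from urllib.parse import urlencode

MAX_PER_PAGE = 1000  # documented upper bound for per_page


def build_cases_query(page: int = 1,
                      per_page: int = 100,
                      release_date_from: Optional[str] = None,
                      release_date_to: Optional[str] = None) -> dict:
    """Build query parameters for GET /api/cases, validating them locally."""
    if page < 1:  # assumption: pages are 1-based
        raise ValueError("page must be >= 1")
    if not 1 <= per_page <= MAX_PER_PAGE:
        raise ValueError(f"per_page must be between 1 and {MAX_PER_PAGE}")
    params = {"page": page, "per_page": per_page}
    for name, value in (("release_date_from", release_date_from),
                        ("release_date_to", release_date_to)):
        if value is None:
            continue
        # Round-trip through date to insist on strict YYYY-MM-DD.
        if date.fromisoformat(value).isoformat() != value:
            raise ValueError(f"{name} must be YYYY-MM-DD")
        params[name] = value
    return params


query = build_cases_query(page=2, per_page=500, release_date_from="2023-01-01")
print(urlencode(query))  # page=2&per_page=500&release_date_from=2023-01-01
```

The returned dict can be passed directly as the params argument of requests.get when calling the /api/cases endpoint.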

GET /api/cases/<release_number>

Get a specific case by release number (e.g., "LR-26445").

GET /api/cases/search

Search cases. Parameters: q (text search), title, court, charges, has_complaint (boolean), page, per_page.

GET /api/health

Health check endpoint.

Example

import requests

BASE_URL = "http://localhost:5000/api"

# Get metadata
metadata = requests.get(f"{BASE_URL}/metadata").json()

# Get paginated cases
cases = requests.get(f"{BASE_URL}/cases", params={"page": 1, "per_page": 100}).json()

# Search
results = requests.get(f"{BASE_URL}/cases/search", params={"q": "fraud"}).json()

See api_example.py for complete examples. Visit http://localhost:5000/ when running for interactive docs.

Transparency

All schemas, prompts, and scoring logic are open source. Leaderboards are versioned by time period. Results can be independently reproduced.