export const metadata = {
  title: "Ingestion Methods",
  description:
    "Four ways to submit content to FaithTranscripts: paste, YouTube URL, remote file URL, and presigned upload.",
  alternates: { canonical: "/docs/api/ingestion" },
};

# Ingestion Methods

Every transcript starts with `POST /transcripts`. The request body is a
discriminated union on `source_type`, with four valid branches.
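
As a rough client-side illustration of the discriminated union, the sketch below builds a request body for each branch and rejects unknown `source_type` values. The required-field sets are inferred from the examples and tables on this page, and the helper name `build_body` is hypothetical; it is not part of any official SDK, and the server's own validation remains authoritative.

```python
from typing import Any

# Required fields per branch, inferred from the examples on this page
# (an assumption; the server's schema is authoritative).
REQUIRED_FIELDS: dict[str, set[str]] = {
    "paste": {"content"},
    "youtube": {"url"},
    "url": {"url", "source_format"},
    "upload": {"transcript_id"},
}


def build_body(source_type: str, **fields: Any) -> dict[str, Any]:
    """Return a POST /transcripts body, or raise ValueError if it is malformed."""
    if source_type not in REQUIRED_FIELDS:
        raise ValueError(f"unknown source_type: {source_type!r}")
    missing = REQUIRED_FIELDS[source_type] - set(fields)
    if missing:
        raise ValueError(f"{source_type} is missing: {sorted(missing)}")
    # source_type is the discriminator; the remaining keys depend on the branch.
    return {"source_type": source_type, **fields}
```

For example, `build_body("youtube", url="https://youtu.be/dQw4w9WgXcQ")` yields a body shaped like the YouTube example further down, while `build_body("url", url="https://storage.example.com/sermon.txt")` raises because `source_format` is required on that branch.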

<Callout variant="info">
  **Text only in v1.** FaithTranscripts does not transcribe audio or video. A
  URL that points to an audio file is rejected with `422 unprocessable` and a
  clear message. A YouTube video without captions fails asynchronously with
  `transcript.failed`, as described in the YouTube section below.
</Callout>

## 1. Paste

Send raw text directly in the request body. The deterministic pass runs
inline and the response contains the corrected transcript.

<EndpointBadge method="POST" path="/api/v1/transcripts" />

```json
{
  "source_type": "paste",
  "title": "Sunday sermon 2026-04-12",
  "content": "...plaintext or SRT/VTT/markdown...",
  "source_format": "txt"
}
```

| Field           | Required | Notes                                                   |
| --------------- | -------- | ------------------------------------------------------- |
| `content`       | Yes      | UTF-8 text. **Hard cap: 2 MB.**                         |
| `title`         | No       | Falls back to the first line if omitted.                |
| `source_format` | No       | One of `srt`, `vtt`, `txt`, `markdown`, `html`, `json`. |

## 2. YouTube

Submit a single video URL (or short `youtu.be` link). We'll fetch the
captions asynchronously.

```json
{
  "source_type": "youtube",
  "url": "https://youtube.com/watch?v=dQw4w9WgXcQ",
  "title": "Optional override"
}
```

If the video has no captions, the job ends in `transcript.failed` and the
error body explains why. If the URL points to a channel or playlist, we
reject it up front with `422 unprocessable`.

<Callout variant="warn">
  **Single videos only.** Channel and playlist URLs are not accepted in v1
  because the volume is hard to bound. Submit each video individually.
</Callout>
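
A client can catch most channel and playlist URLs before making a request. The sketch below is a heuristic, not the API's actual validation: the host list and the function name `looks_like_single_video` are assumptions. A `watch` URL that also carries a `list` parameter is treated as a single video here; this page does not say how the API handles that case.

```python
from urllib.parse import parse_qs, urlparse

# Hostnames accepted for full-length watch URLs (an assumed, non-exhaustive list).
YOUTUBE_HOSTS = {"youtube.com", "www.youtube.com", "m.youtube.com"}


def looks_like_single_video(url: str) -> bool:
    """Heuristic: True for watch?v=... and youtu.be/ID links, False otherwise."""
    parsed = urlparse(url)
    host = (parsed.hostname or "").lower()
    if host == "youtu.be":
        # Short links carry the video ID as the only path segment.
        video_id = parsed.path.strip("/")
        return bool(video_id) and "/" not in video_id
    if host in YOUTUBE_HOSTS and parsed.path == "/watch":
        # Full links carry the video ID in the "v" query parameter.
        return bool(parse_qs(parsed.query).get("v"))
    # Channel pages, playlists, and anything else fall through.
    return False
```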

## 3. URL (remote file)

Pull content from any publicly fetchable URL. This works well with presigned S3/R2/GCS URLs.

```json
{
  "source_type": "url",
  "url": "https://storage.example.com/sermon.txt",
  "source_format": "txt",
  "title": "Optional"
}
```

| Field           | Required | Notes                                                           |
| --------------- | -------- | --------------------------------------------------------------- |
| `url`           | Yes      | Must resolve to a public host. Private IPs are rejected.        |
| `source_format` | Yes      | One of `srt`, `vtt`, `txt`, `markdown`, `html`, `json`, `docx`. |

**Hard cap: 5 MB.** Redirects are followed for at most 3 hops, and every hop
must pass the same SSRF guard as the original URL (public hosts only).
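
To make the public-host and redirect rules concrete, here is a minimal sketch of such a guard using Python's standard `ipaddress` and `socket` modules. It is not the service's implementation: the function names are hypothetical, and treating "public" as `ip.is_global` is an assumption about what the server considers a private address.

```python
import ipaddress
import socket
from urllib.parse import urlparse

MAX_REDIRECT_HOPS = 3  # from the limit stated above


def is_public_ip(address: str) -> bool:
    """True if the address is globally routable (not private, loopback, link-local)."""
    # Drop any IPv6 zone suffix such as "%eth0" before parsing.
    return ipaddress.ip_address(address.split("%")[0]).is_global


def host_is_public(url: str) -> bool:
    """Resolve the URL's host and require every resolved address to be public."""
    host = urlparse(url).hostname
    if not host:
        return False
    try:
        infos = socket.getaddrinfo(host, None)
    except socket.gaierror:
        return False
    return all(is_public_ip(info[4][0]) for info in infos)


def chain_is_allowed(urls: list[str]) -> bool:
    """urls[0] is the submitted URL; each later entry is a redirect target."""
    hops = len(urls) - 1
    return hops <= MAX_REDIRECT_HOPS and all(host_is_public(u) for u in urls)
```

A check like this only reflects DNS at the moment it runs. A production guard would also connect to the address it validated, so that a later lookup cannot return a different, private address.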

## 4. Upload (presigned R2)

For files too large to paste or URL-fetch, upload directly to our storage
using a short-lived presigned URL. The flow has three steps:

### Step 1: Get a presigned URL

<EndpointBadge method="POST" path="/api/v1/transcripts/uploads" />

```bash
curl https://www.faithtranscripts.com/api/v1/transcripts/uploads \
  -H "Authorization: Bearer ft_live_YOUR_KEY" \
  -H "Content-Type: application/json" \
  -d '{ "filename": "sermon.vtt", "source_format": "vtt" }'
```

Response:

```json
{
  "upload_url": "https://r2.example.com/...signed...",
  "upload_headers": { "Content-Type": "text/vtt" },
  "transcript_id": "tr_01HW..."
}
```

### Step 2: Upload the file

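```bash
# UPLOAD_URL holds the "upload_url" value returned in Step 1.
# Send the headers listed in "upload_headers" from that response.
curl -X PUT "$UPLOAD_URL" \
  -H "Content-Type: text/vtt" \
  --data-binary @./sermon.vtt
```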

### Step 3: Confirm

<EndpointBadge method="POST" path="/api/v1/transcripts" />

```json
{
  "source_type": "upload",
  "transcript_id": "tr_01HW..."
}
```
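
The three steps can be chained in one function. The sketch below uses only Python's standard library and takes the HTTP call as a parameter so the flow can be exercised without network access. The endpoint paths, body fields, and response fields come from the examples above; the function names, and the assumption that Step 3 uses the same `Authorization` header as Step 1, are illustrative rather than documented.

```python
import json
import urllib.request
from typing import Callable

API_BASE = "https://www.faithtranscripts.com/api/v1"

# (method, url, headers, body) -> raw response bytes
Send = Callable[[str, str, dict[str, str], bytes], bytes]


def http_send(method: str, url: str, headers: dict[str, str], body: bytes) -> bytes:
    """Default transport: a plain urllib request. Raises on HTTP error statuses."""
    req = urllib.request.Request(url, data=body, headers=headers, method=method)
    with urllib.request.urlopen(req) as resp:
        return resp.read()


def upload_transcript(
    api_key: str,
    filename: str,
    source_format: str,
    data: bytes,
    send: Send = http_send,
) -> str:
    """Run the three-step upload and return the transcript ID."""
    auth = {
        "Authorization": f"Bearer {api_key}",
        "Content-Type": "application/json",
    }

    # Step 1: ask for a presigned URL.
    request_body = json.dumps({"filename": filename, "source_format": source_format})
    ticket = json.loads(
        send("POST", f"{API_BASE}/transcripts/uploads", auth, request_body.encode())
    )

    # Step 2: PUT the file to storage with the headers the API returned.
    send("PUT", ticket["upload_url"], ticket["upload_headers"], data)

    # Step 3: confirm the upload (assumed to need the same bearer token).
    confirm = {"source_type": "upload", "transcript_id": ticket["transcript_id"]}
    send("POST", f"{API_BASE}/transcripts", auth, json.dumps(confirm).encode())

    return ticket["transcript_id"]
```

Passing a custom `send` is also a convenient place to add retries or logging without touching the flow itself.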

---

See also: [Processing options](/docs/api/options) for per-request
configuration, and [Sync, poll, webhooks](/docs/api/async) to pick the right
interaction pattern.
