gitea/modules/markup/jupyter/jupyter_test.go
Karthik Bhandary e82352f156 feat(web): Add Jupyter Notebook (.ipynb) Rendering Support (#37433)
### Summary

Closes #37308

Adds native rendering support for Jupyter notebook files (`.ipynb`) in
Gitea using backend rendering, allowing users to view formatted
notebooks with code cells, markdown, outputs, and visualizations
directly in the repository browser.

### Motivation

Jupyter notebooks are widely used in data science, machine learning, and
scientific computing. Currently, Gitea displays `.ipynb` files as raw
JSON, making them difficult to read. This feature enables users to view
notebooks in a formatted, readable way similar to GitHub and GitLab.

### Implementation Approach

**Evolution:** The first implementation rendered notebooks in the
frontend using the `marked` and `Shiki` libraries. After review feedback,
rendering moved to the backend for better performance, security, and
consistency with Gitea's architecture.

#### Backend Rendering Advantages

- Server-side HTML generation eliminates client-side parsing overhead
- Integrates with Gitea's existing markup sanitizer for security
- Uses Chroma for syntax highlighting (consistent with code files)
- Uses Goldmark for markdown rendering (consistent with `.md` files)
- No additional frontend dependencies required
- Better performance for large notebooks

### Features

#### Supported Cell Types

- **Markdown cells:** Rendered with Goldmark (tables, lists, links, code
blocks, etc.)
- **Code cells:** Syntax-highlighted with Chroma, execution counts,
language detection from notebook metadata
- **Output cells:** Multiple output types in a single cell
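
In nbformat 4, a cell's `source` (and a stream output's `text`) may be stored either as one string or as a list of line strings, which is the case the `joinSource` tests in this file exercise. A minimal sketch of such a helper follows; the name `joinSourceSketch` and the empty-string fallback for unexpected types are assumptions for illustration, not taken from the renderer itself.

```go
package main

import (
	"fmt"
	"strings"
)

// joinSourceSketch flattens a notebook "source" or "text" field into one string.
// Hypothetical helper: the real renderer's joinSource may differ in detail.
func joinSourceSketch(v any) string {
	switch s := v.(type) {
	case string:
		// Already a single string; return it unchanged.
		return s
	case []any:
		// encoding/json decodes JSON arrays into []any; lines keep their own "\n".
		var sb strings.Builder
		for _, item := range s {
			if str, ok := item.(string); ok {
				sb.WriteString(str)
			}
		}
		return sb.String()
	default:
		// Assumption: anything else is treated as empty rather than an error.
		return ""
	}
}

func main() {
	fmt.Printf("%q\n", joinSourceSketch([]any{"line1\n", "line2\n", "line3"}))
}
```

Because each list element already carries its trailing newline, the elements are concatenated directly rather than joined with a separator.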

#### Supported Output Types

- Text/plain outputs
- Images (PNG, JPEG, SVG) with base64 data URIs
- HTML outputs (tables, DataFrames, formatted text)
- LaTeX/math equations (rendered as code blocks)
- Error outputs with traceback (styled in red)
- Stream outputs (`stdout`/`stderr`)
- ⚠️ Interactive widgets (Plotly, ipywidgets) show informative messages
- ⚠️ JavaScript outputs show security warning (disabled for safety)
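
A `display_data` or `execute_result` output carries a `data` bundle keyed by MIME type, often with several representations of the same value. One way a renderer could choose among them is a fixed preference list, sketched below. The order in `preferredMIMEs` and the function name `chooseOutput` are hypothetical; the source does not state which order the Gitea renderer uses.

```go
package main

import "fmt"

// preferredMIMEs is an assumed priority order, richest representation first.
// JavaScript and widget MIME types are deliberately absent, so they are never chosen.
var preferredMIMEs = []string{
	"text/html",
	"image/png",
	"image/jpeg",
	"image/svg+xml",
	"text/latex",
	"text/plain",
}

// chooseOutput returns the first preferred MIME type present in the bundle.
// ok is false when the bundle holds only unsupported types.
func chooseOutput(data map[string]any) (mime string, ok bool) {
	for _, m := range preferredMIMEs {
		if _, present := data[m]; present {
			return m, true
		}
	}
	return "", false
}

func main() {
	bundle := map[string]any{"text/plain": "<Figure>", "image/png": "aGk="}
	if mime, ok := chooseOutput(bundle); ok {
		fmt.Println("render as", mime)
	} else {
		fmt.Println("show an informative message instead")
	}
}
```

When `ok` is false, the caller would fall back to a notice, which matches the behaviour described above for interactive widgets and JavaScript outputs.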

#### Edge Cases Handled

- Empty notebooks or notebooks with no outputs
- Corrupted JSON with graceful error display
- Mixed output types in a single cell
- Large base64-encoded images
- Execution count of `null` or `0`
- `nbformat` version compatibility (only renders `nbformat 4+`, shows
message for older versions)
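
Several of these edge cases can be handled at decode time. The sketch below, using only `encoding/json`, shows one possible shape: corrupted JSON and old `nbformat` versions become errors the caller can turn into a message, and a `null` execution count is kept distinct from `0` by decoding into a pointer. The type and function names, and the `In [ ]:` label for a `null` count (the convention Jupyter itself uses), are assumptions rather than the renderer's actual code.

```go
package main

import (
	"encoding/json"
	"errors"
	"fmt"
)

// Hypothetical minimal view of a notebook; real notebooks carry many more fields.
type notebook struct {
	NBFormat int    `json:"nbformat"`
	Cells    []cell `json:"cells"`
}

type cell struct {
	CellType string `json:"cell_type"`
	// A pointer keeps JSON null (nil) distinct from an explicit 0.
	ExecutionCount *int `json:"execution_count"`
}

var errOldFormat = errors.New("notebook uses an nbformat older than 4")

// parseNotebook decodes the JSON and rejects formats older than nbformat 4.
func parseNotebook(raw []byte) (*notebook, error) {
	var nb notebook
	if err := json.Unmarshal(raw, &nb); err != nil {
		return nil, fmt.Errorf("invalid notebook JSON: %w", err)
	}
	if nb.NBFormat < 4 {
		return nil, errOldFormat
	}
	return &nb, nil
}

// promptLabel formats the input prompt; a nil count means the cell was never run.
func promptLabel(count *int) string {
	if count == nil {
		return "In [ ]:"
	}
	return fmt.Sprintf("In [%d]:", *count)
}

func main() {
	raw := []byte(`{"nbformat": 4, "cells": [{"cell_type": "code", "execution_count": null}]}`)
	nb, err := parseNotebook(raw)
	if err != nil {
		fmt.Println("cannot render:", err)
		return
	}
	for _, c := range nb.Cells {
		fmt.Println(promptLabel(c.ExecutionCount))
	}
}
```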

### Changes

#### Backend (Go)

- `modules/markup/jupyter/jupyter.go` (**NEW**)

  - Jupyter notebook renderer implementation
  - Parses `.ipynb` JSON structure and generates HTML
  - Integrates Chroma for code syntax highlighting
  - Integrates Goldmark for markdown cell rendering
  - Dynamic language detection from notebook metadata
  - Handles all standard Jupyter output types
  - Comprehensive error handling with user-friendly messages

- `modules/markup/renderer.go` (**MODIFIED**)

  - Registered Jupyter renderer in markup system

- `main.go` (**MODIFIED**)

  - Import Jupyter renderer package for initialization

#### Styling (CSS)

- `web_src/css/markup/jupyter.css` (**NEW**)

  - Comprehensive styling for notebook cells, code, outputs
  - Uses Gitea CSS variables for consistent theming
  - Responsive layout with proper spacing
  - Table styling for DataFrame outputs
- Removed parent container padding for consistency with other renderers

#### Sanitizer Rules

- `modules/markup/jupyter/jupyter.go` → `SanitizerRules()`

  - Configured HTML sanitization rules for safe rendering:
    - Cell structure (markdown, code, input/output wrappers)
    - Code highlighting (Chroma classes)
    - Images (base64 data URIs only)
    - Tables (DataFrames)
    - Markdown elements (headers, lists, links, etc.)

### Security Considerations

- Server-side rendering: No client-side JavaScript execution
- HTML sanitization: Strict allowlist for HTML elements and attributes
- Image security: Only base64 data URIs allowed (no external URLs)
- JavaScript disabled: `application/javascript` outputs show warning
- XSS protection: Gitea markup sanitizer handles all HTML output
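
The "base64 data URIs only" rule can be illustrated with a small check that runs before any `<img>` tag is emitted. This is a sketch under stated assumptions: the function `imageDataURI` and the allowlist `allowedImageTypes` are hypothetical names, and the allowlist is limited to PNG and JPEG for brevity. In Gitea the markup sanitizer remains the actual enforcement point.

```go
package main

import (
	"encoding/base64"
	"fmt"
	"strings"
)

// allowedImageTypes is an assumed allowlist for this sketch.
var allowedImageTypes = map[string]bool{
	"image/png":  true,
	"image/jpeg": true,
}

// imageDataURI builds a data URI only when the MIME type is allowlisted and the
// payload is valid standard base64. Anything else, including an external URL
// placed in the payload, is rejected.
func imageDataURI(mime, payload string) (string, bool) {
	if !allowedImageTypes[mime] {
		return "", false
	}
	// Notebooks may wrap base64 across lines; drop all whitespace first.
	cleaned := strings.Join(strings.Fields(payload), "")
	if _, err := base64.StdEncoding.DecodeString(cleaned); err != nil {
		return "", false
	}
	return "data:" + mime + ";base64," + cleaned, true
}

func main() {
	if uri, ok := imageDataURI("image/png", "aGk="); ok {
		fmt.Println(uri)
	}
	if _, ok := imageDataURI("image/png", "https://example.com/x.png"); !ok {
		fmt.Println("external URL rejected")
	}
}
```

Constructing the URI on the server from validated parts means the notebook never gets to supply the `src` attribute directly.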

### Testing

Manual testing performed with various notebooks:

- Markdown rendering (headers, lists, tables, links, code blocks)
- Code cells with execution counts and syntax highlighting
- Multiple output types (text, images, HTML, LaTeX, errors, streams)
- Error handling for edge cases
- Theme compatibility (light/dark mode)

### Screenshots

<img width="1080" height="553" alt="image"
src="https://github.com/user-attachments/assets/aef9afa7-ed96-434d-98b0-b160565fc967"
/>
<img width="1092" height="552" alt="image"
src="https://github.com/user-attachments/assets/6e61e792-4737-41c1-851e-5c375c1f932a"
/>
<img width="1104" height="622" alt="image"
src="https://github.com/user-attachments/assets/4ac630c1-3a75-4e1c-9bba-c0a27484d001"
/>
<img width="1104" height="529" alt="image"
src="https://github.com/user-attachments/assets/33750c47-70de-4ab2-893d-e5d09fa8d9c4"
/>
<img width="1111" height="343" alt="image"
src="https://github.com/user-attachments/assets/52107d9f-0e06-420b-9ab4-1603dcd676b1"
/>
<img width="1091" height="650" alt="image"
src="https://github.com/user-attachments/assets/0addae21-efa4-44bb-a56e-0418e3d4d227"
/>
<img width="1077" height="298" alt="image"
src="https://github.com/user-attachments/assets/a3a8c5be-638c-45ff-82f3-816264254ead"
/>

### Dependencies

No new dependencies required:

- Chroma (existing) - Syntax highlighting
- Goldmark (existing) - Markdown rendering
- Standard library - JSON parsing

### Key Design Decisions

- Backend rendering for performance and security
- Reuses existing Gitea infrastructure (Chroma, Goldmark, sanitizer)
- Consistent styling with other markup renderers
- Graceful degradation for unsupported features

---

**Development Note:** This PR was developed with assistance from Amazon
Q Developer and Claude AI for implementation, debugging, and testing.

---------

Signed-off-by: Karthik Bhandary <34509856+karthikbhandary2@users.noreply.github.com>
Co-authored-by: karthik.bhandary <karthik.bhandary@kfintech.com>
Co-authored-by: wxiaoguang <wxiaoguang@gmail.com>
Co-authored-by: bircni <bircni@icloud.com>
2026-06-14 15:52:37 +02:00


// Copyright 2026 The Gitea Authors. All rights reserved.
// SPDX-License-Identifier: MIT

package jupyter

import (
	"fmt"
	"strings"
	"testing"

	"gitea.dev/modules/markup"
	"gitea.dev/modules/test"

	"github.com/stretchr/testify/assert"
)

func TestRender(t *testing.T) {
	r := renderer{}

	t.Run("Basic notebook", func(t *testing.T) {
		input := `{
			"cells": [
				{
					"cell_type": "code",
					"execution_count": 1,
					"source": ["print('hello')"],
					"outputs": [
						{
							"output_type": "stream",
							"name": "stdout",
							"text": ["hello\n"]
						}
					]
				}
			],
			"metadata": {},
			"nbformat": 4
		}`

		var output strings.Builder
		ctx := &markup.RenderContext{}
		err := r.Render(ctx, strings.NewReader(input), &output)
		assert.NoError(t, err)

		result := output.String()
		assert.Contains(t, result, `<div class="jupyter-notebook">`)
		assert.Contains(t, result, `<div class="notebook-cell cell-type-code">`)
		assert.Contains(t, result, `In [1]:`)
		assert.Contains(t, result, `print`)
		assert.Contains(t, result, `hello`)
		assert.Contains(t, result, `stream-stdout`)
	})

	t.Run("Markdown cell with XSS Protection", func(t *testing.T) {
		input := `{
			"cells": [
				{
					"cell_type": "markdown",
					"source": [
						"# Title\n",
						"Some text\n",
						"[click me](javascript:alert(1))\n",
						"<script>alert('dangerous')</script>"
					]
				}
			],
			"metadata": {},
			"nbformat": 4
		}`

		var output strings.Builder
		ctx := markup.NewRenderContext(t.Context())
		err := r.Render(ctx, strings.NewReader(input), &output)
		assert.NoError(t, err)

		result := output.String()

		// Assert normal markup still renders correctly
		assert.Contains(t, result, `<div class="notebook-cell cell-type-markdown">`)
		assert.Contains(t, result, `Title`)
		assert.Contains(t, result, `Some text`)
		assert.Contains(t, result, `click me`)

		// CRITICAL SECURITY ASSERTIONS: Ensure XSS vectors are completely stripped
		assert.NotContains(t, result, `javascript:alert`)
		assert.NotContains(t, result, `<script>`)
	})

	t.Run("Cell limit truncation guardrail", func(t *testing.T) {
		// Generate an oversized notebook containing 105 cells dynamically
		var cellBlocks []string
		for range 105 {
			cellBlocks = append(cellBlocks, `{"cell_type": "markdown", "source": ["cell text"]}`)
		}
		input := fmt.Sprintf(`{"cells": [%s], "metadata": {}, "nbformat": 4}`, strings.Join(cellBlocks, ","))

		var output strings.Builder
		ctx := markup.NewRenderContext(t.Context())
		err := r.Render(ctx, strings.NewReader(input), &output)
		assert.NoError(t, err)

		result := output.String()

		// Verify it halts rendering gracefully and shows the truncation warning
		assert.Contains(t, result, "Output truncated.")
		assert.Contains(t, result, "This notebook contains too many cells to display efficiently.")

		// Count occurrences of the rendered cells to ensure it sliced down to exactly 100 elements
		assert.Equal(t, 100, strings.Count(result, `class="notebook-cell cell-type-markdown"`))
	})

	t.Run("Image output", func(t *testing.T) {
		input := `{
			"cells": [
				{
					"cell_type": "code",
					"execution_count": 1,
					"source": ["import matplotlib.pyplot as plt"],
					"outputs": [
						{
							"output_type": "display_data",
							"data": {
								"image/png": "iVBORw0KGgoAAAANSUhEUgAAAAEAAAABCAYAAAAfFcSJAAAADUlEQVR42mNk+M9QDwADhgGAWjR9awAAAABJRU5ErkJggg=="
							}
						}
					]
				}
			],
			"metadata": {},
			"nbformat": 4
		}`

		var output strings.Builder
		ctx := markup.NewRenderContext(t.Context())
		err := r.Render(ctx, strings.NewReader(input), &output)
		assert.NoError(t, err)

		result := output.String()
		assert.Contains(t, result, `<img src="data:image/png;base64,`)
		assert.Contains(t, result, `iVBORw0KGgoAAAANSUhEUgAAAAEAAAABCAYAAAAfFcSJAAAADUlEQVR42mNk+M9QDwADhgGAWjR9awAAAABJRU5ErkJggg==`)
	})

	t.Run("HTML output with style tag", func(t *testing.T) {
		input := `{
			"cells": [
				{
					"cell_type": "code",
					"execution_count": 1,
					"source": ["import pandas as pd"],
					"outputs": [
						{
							"output_type": "execute_result",
							"data": {
								"text/html": ["<style scoped>.dataframe tbody tr th { vertical-align: top; }</style><table class=\"dataframe\"><tr><td>1</td></tr></table>"]
							}
						}
					]
				}
			],
			"metadata": {},
			"nbformat": 4
		}`

		var output strings.Builder
		ctx := markup.NewRenderContext(t.Context())
		err := r.Render(ctx, strings.NewReader(input), &output)
		assert.NoError(t, err)

		result := output.String()
		assert.NotContains(t, result, `<style scoped>`)
		assert.Contains(t, result, `<table><tr><td>1</td></tr></table>`)
		assert.Contains(t, result, `<td>1</td>`)
	})

	t.Run("Error output", func(t *testing.T) {
		input := `{
			"cells": [
				{
					"cell_type": "code",
					"execution_count": 1,
					"source": ["raise ValueError('test error')"],
					"outputs": [
						{
							"output_type": "error",
							"ename": "ValueError",
							"evalue": "test error",
							"traceback": ["ValueError: test error"]
						}
					]
				}
			],
			"metadata": {},
			"nbformat": 4
		}`

		var output strings.Builder
		ctx := markup.NewRenderContext(t.Context())
		err := r.Render(ctx, strings.NewReader(input), &output)
		assert.NoError(t, err)

		result := output.String()
		assert.Contains(t, result, `ValueError: test error`)
		assert.Contains(t, result, `cell-output-error`)
	})

	t.Run("Old nbformat version", func(t *testing.T) {
		input := `{
			"cells": [],
			"metadata": {},
			"nbformat": 3
		}`

		var output strings.Builder
		ctx := markup.NewRenderContext(t.Context())
		err := r.Render(ctx, strings.NewReader(input), &output)
		assert.NoError(t, err)
		assert.Regexp(t, `<div class="ui info message">This notebook uses an older format.*</div>`, output.String())
	})
}

func TestJoinSource(t *testing.T) {
	tests := []struct {
		name     string
		input    any
		expected string
	}{
		{
			name:     "String input",
			input:    "hello world",
			expected: "hello world",
		},
		{
			name:     "Array input",
			input:    []any{"line1\n", "line2\n", "line3"},
			expected: "line1\nline2\nline3",
		},
		{
			name:     "Empty array",
			input:    []any{},
			expected: "",
		},
		{
			name:     "Single element array",
			input:    []any{"single"},
			expected: "single",
		},
	}

	for _, tt := range tests {
		t.Run(tt.name, func(t *testing.T) {
			result := joinSource(tt.input)
			assert.Equal(t, tt.expected, result)
		})
	}
}

func TestIntegrationAndSanitization(t *testing.T) {
	// A mock malicious Jupyter notebook containing an XSS injection attempt
	// inside a text/html output cell (e.g., pretending to be a poisoned Pandas DataFrame).
	maliciousNotebook := `{
		"nbformat": 4,
		"nbformat_minor": 2,
		"metadata": {},
		"cells": [
			{
				"cell_type": "code",
				"execution_count": 1,
				"metadata": {},
				"source": ["a=1"],
				"outputs": [
					{
						"output_type": "execute_result",
						"execution_count": 1,
						"data": {
							"text/html": [
								"<div><script>alert('XSS Vector')</script><table class=\"dataframe\"><tr><td>Safe Content</td></tr></table></div>"
							]
						},
						"metadata": {}
					}
				]
			}
		]
	}`

	var output strings.Builder
	ctx := markup.NewRenderContext(t.Context())
	ctx.RenderOptions.MarkupType = "jupyter-render"
	err := markup.Render(ctx, strings.NewReader(maliciousNotebook), &output)
	assert.NoError(t, err)

	// The expected HTML is compared after whitespace normalization on both sides.
	const expected = `
<div class="jupyter-notebook">
<div class="notebook-cell cell-type-code">
<div class="cell-line">
<div class="cell-left cell-prompt">In [1]:</div>
<div class="cell-right cell-input">
<pre><code class="chroma language-python">
<span class="n">a</span><span class="o">=</span><span class="mi">1</span>
</code></pre>
</div>
</div>
<div class="cell-line">
<div class="cell-left cell-prompt">Out [1]:</div>
<div class="cell-right cell-output">
<div class="cell-output-html">
<div><table><tbody><tr><td>Safe Content</td></tr></tbody></table></div>
</div>
</div>
</div>
</div>
</div>`
	assert.Equal(t, test.NormalizeHTMLSpaces(expected), test.NormalizeHTMLSpaces(output.String()))
}