feat(web): Add Jupyter Notebook (.ipynb) Rendering Support (#37433)

### Summary

Closes #37308

Adds native rendering support for Jupyter notebook files (`.ipynb`) in
Gitea using backend rendering, allowing users to view formatted
notebooks with code cells, markdown, outputs, and visualizations
directly in the repository browser.

### Motivation

Jupyter notebooks are widely used in data science, machine learning, and
scientific computing. Currently, Gitea displays `.ipynb` files as raw
JSON, making them difficult to read. This feature enables users to view
notebooks in a formatted, readable way similar to GitHub and GitLab.

### Implementation Approach

**Evolution:** Initially implemented frontend rendering using the `marked`
and `Shiki` libraries. After review feedback, migrated to backend
rendering for better performance, security, and consistency with Gitea's
architecture.

#### Backend Rendering Advantages

- Server-side HTML generation eliminates client-side parsing overhead
- Sanitizes `text/html` outputs with Gitea's existing markup sanitizer
- Uses Chroma for syntax highlighting (consistent with code files)
- Uses Goldmark for markdown rendering (consistent with `.md` files)
- No additional frontend dependencies required
- Bounded cost for large notebooks: only the first 100 cells are rendered,
followed by a truncation warning
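A minimal sketch of the cell cap, assuming only the limit of 100 cells used by the renderer; `capCells` is a hypothetical helper written for illustration, not a function from this PR:

```go
package main

import "fmt"

// maxRenderedCells matches the limit used by the renderer.
const maxRenderedCells = 100

// capCells (hypothetical) keeps at most maxRenderedCells items and reports
// whether any were dropped. Re-slicing shares the backing array, so no
// cells are copied.
func capCells[T any](cells []T) ([]T, bool) {
	if len(cells) > maxRenderedCells {
		return cells[:maxRenderedCells], true
	}
	return cells, false
}

func main() {
	shown, truncated := capCells(make([]string, 105))
	fmt.Println(len(shown), truncated) // 100 true
	if truncated {
		fmt.Println("Output truncated.")
	}
}
```

Slicing rather than copying keeps the cap essentially free, so the cost of a huge notebook is dominated by JSON decoding, which the caller is expected to bound by file size.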

### Features

#### Supported Cell Types

- **Markdown cells:** Rendered with Goldmark (tables, lists, links, code
blocks, etc.)
- **Code cells:** Syntax-highlighted with Chroma, execution counts,
language detection from notebook metadata
- **Output cells:** Multiple output types in a single cell
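Because nbformat 4 allows a cell's `source` to be either one string or a list of lines, decoding it into `any` yields a `string` or a `[]any`. The sketch below uses the standard `encoding/json` package and a trimmed-down `cell` type (an assumption for illustration); its `joinSource` follows the same logic as the helper in `jupyter.go`:

```go
package main

import (
	"encoding/json"
	"fmt"
	"strings"
)

// cell is a trimmed-down stand-in for the renderer's Cell type; only the
// polymorphic "source" field is kept for this sketch.
type cell struct {
	CellType string `json:"cell_type"`
	Source   any    `json:"source"` // string or []string in nbformat 4
}

// joinSource flattens a decoded "source" value. List items already carry
// their trailing "\n", so they are concatenated without a separator.
func joinSource(source any) string {
	switch v := source.(type) {
	case nil:
		return ""
	case string:
		return v
	case []any:
		var sb strings.Builder
		for _, part := range v {
			sb.WriteString(fmt.Sprint(part))
		}
		return sb.String()
	default:
		return fmt.Sprint(v)
	}
}

func main() {
	raw := `[{"cell_type": "code", "source": ["x = 1\n", "print(x)"]},
	         {"cell_type": "markdown", "source": "# Title"}]`
	var cells []cell
	if err := json.Unmarshal([]byte(raw), &cells); err != nil {
		panic(err)
	}
	for _, c := range cells {
		fmt.Printf("%s: %q\n", c.CellType, joinSource(c.Source))
	}
}
```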

#### Supported Output Types

- Text/plain outputs
- Images (PNG, JPEG, SVG) with base64 data URIs
- HTML outputs (tables, DataFrames, formatted text)
- LaTeX/math equations (delimiters trimmed and emitted as `language-math`
code blocks for Gitea's math post-processor)
- Error outputs with traceback (styled in red)
- Stream outputs (`stdout`/`stderr`)
- ⚠️ Interactive widgets (Plotly, ipywidgets) show informative messages
- ⚠️ JavaScript outputs show security warning (disabled for safety)
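An output's `data` map can hold several representations of the same value (a pandas DataFrame, for instance, usually carries both `text/html` and `text/plain`), and only one is shown. A minimal sketch of priority-based selection, modeled on the order of the renderer's `dataMimeHandlers` list; `pickMime` is hypothetical, and it ignores the renderer's extra fallback for payloads of an unexpected JSON type:

```go
package main

import "fmt"

// mimePriority follows the order of the renderer's dataMimeHandlers list:
// images first, then rich text, then text/plain, then the placeholders.
var mimePriority = []string{
	"image/png",
	"image/jpeg",
	"image/svg+xml",
	"text/html",
	"text/latex",
	"text/plain",
	"application/javascript",
	"application/vnd.plotly.v1+json",
	"application/vnd.jupyter.widget-view+json",
}

// pickMime returns the first MIME type in priority order that is present
// in an output's "data" map, or "" if none matches.
func pickMime(data map[string]any) string {
	for _, mime := range mimePriority {
		if _, ok := data[mime]; ok {
			return mime
		}
	}
	return ""
}

func main() {
	data := map[string]any{
		"text/plain": "   a\n0  1",
		"text/html":  "<table>...</table>",
	}
	fmt.Println(pickMime(data)) // text/html
}
```

An empty result means none of the known MIME types matched; the renderer then falls through to its stream, error, and generic text handling.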

#### Edge Cases Handled

- Empty notebooks or notebooks with no outputs
- Corrupted JSON with graceful error display
- Mixed output types in single cell
- Large base64-encoded images
- Execution count of `null` or `0`
- `nbformat` version compatibility (only renders `nbformat 4+`, shows
message for older versions)
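A sketch of the version gate and the prompt labels, assuming standard `encoding/json` decoding, where JSON numbers held in `any` become `float64` (the renderer itself converts counts with Gitea's `util.ToInt64`). `notebookHeader` and `promptLabel` are hypothetical names:

```go
package main

import (
	"encoding/json"
	"fmt"
)

// notebookHeader decodes only the fields needed for this sketch.
type notebookHeader struct {
	Nbformat int `json:"nbformat"`
	Cells    []struct {
		ExecutionCount any `json:"execution_count"` // number or null
	} `json:"cells"`
}

// promptLabel renders the "In [n]:" prompt; a null count (cell never
// executed) becomes "In [ ]:", while 0 is still shown as a number.
func promptLabel(count any) string {
	if n, ok := count.(float64); ok {
		return fmt.Sprintf("In [%d]:", int64(n))
	}
	return "In [ ]:"
}

func main() {
	raw := `{"nbformat": 4, "cells": [{"execution_count": 3}, {"execution_count": null}, {"execution_count": 0}]}`
	var nb notebookHeader
	if err := json.Unmarshal([]byte(raw), &nb); err != nil {
		panic(err)
	}
	if nb.Nbformat < 4 {
		fmt.Println("older nbformat: show an info message instead of rendering")
		return
	}
	for _, c := range nb.Cells {
		fmt.Println(promptLabel(c.ExecutionCount)) // In [3]:, In [ ]:, In [0]:
	}
}
```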

### Changes

#### Backend (Go)

- `modules/markup/jupyter/jupyter.go` (**NEW**)

  - Jupyter notebook renderer implementation
  - Parses `.ipynb` JSON structure and generates HTML
  - Integrates Chroma for code syntax highlighting
  - Integrates Goldmark for markdown cell rendering
  - Dynamic language detection from notebook metadata
  - Handles all standard Jupyter output types
  - Comprehensive error handling with user-friendly messages

- `modules/markup/renderer.go` (**MODIFIED**)

  - Registered Jupyter renderer in markup system

- `main.go` (**MODIFIED**)

  - Import Jupyter renderer package for initialization
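The renderer registers itself from its package's `init()`, which only runs once that package is imported. The sketch below models that registration and a lookup by file-name pattern in a single file; the `renderer` interface, `registry`, and `rendererFor` are simplified assumptions rather than Gitea's actual markup API, and matching the base name with `path.Match` is likewise an assumption:

```go
package main

import (
	"fmt"
	"path"
)

// renderer is a hypothetical, stripped-down interface; Gitea's real
// markup.Renderer has more methods.
type renderer interface {
	Name() string
	FileNamePatterns() []string
}

type jupyterRenderer struct{}

func (jupyterRenderer) Name() string { return "jupyter-render" }

func (jupyterRenderer) FileNamePatterns() []string { return []string{"*.ipynb"} }

// registry stands in for the package-level list that RegisterRenderer fills.
var registry []renderer

func registerRenderer(r renderer) { registry = append(registry, r) }

// In Gitea the equivalent call lives in the jupyter package's init(), which
// runs because main.go blank-imports that package.
func init() { registerRenderer(jupyterRenderer{}) }

// rendererFor returns the first registered renderer whose pattern matches
// the file's base name, or nil.
func rendererFor(filename string) renderer {
	base := path.Base(filename)
	for _, r := range registry {
		for _, pattern := range r.FileNamePatterns() {
			if ok, _ := path.Match(pattern, base); ok {
				return r
			}
		}
	}
	return nil
}

func main() {
	if r := rendererFor("analysis/report.ipynb"); r != nil {
		fmt.Println(r.Name()) // jupyter-render
	}
}
```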

#### Styling (CSS)

- `web_src/css/markup/jupyter.css` (**NEW**)

  - Comprehensive styling for notebook cells, code, outputs
  - Uses Gitea CSS variables for consistent theming
  - Responsive layout with proper spacing
  - Table styling for DataFrame outputs
  - Removes the parent container padding (`.markup.jupyter-render`) for
    consistency with other renderers

#### Sanitizer Rules

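- `modules/markup/jupyter/jupyter.go` → `SanitizerRules()` returns `nil`,
and `GetExternalRendererOptions()` sets `SanitizerDisabled: true`, so the
generic markup sanitizer does not post-process the notebook HTML. The
renderer guarantees safe output itself:

  - Plain-text values (cell source, stream and error text, `text/plain`
    data) are HTML-escaped by `htmlutil.HTMLWriter`
  - Code highlighting is generated by Chroma
  - `text/html` outputs (e.g. DataFrames) pass through `markup.Sanitize()`,
    which drops custom CSS classes and attributes
  - Images are emitted only as base64 data URIs
  - Markdown cells go through Gitea's markdown renderer (Goldmark)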

### Security Considerations

- Server-side rendering: No client-side JavaScript execution
- HTML sanitization: `text/html` outputs pass through Gitea's markup
sanitizer allowlist
- Image security: Only base64 data URIs allowed (no external URLs)
- JavaScript disabled: `application/javascript` outputs show warning
- XSS protection: all other text is HTML-escaped before it is written, and
markdown cells go through Gitea's markdown renderer
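PNG and JPEG payloads are already base64 in the notebook, while SVG markup is base64-encoded before being placed in a `data:` URI, so the browser loads the SVG as an image and its markup never becomes part of the page DOM. A minimal sketch of that step; `imageTag` and `svgTag` are hypothetical names mirroring the renderer's `renderImage` helper:

```go
package main

import (
	"encoding/base64"
	"fmt"
	"html/template"
)

// imageTag builds an <img> whose source is always a data: URI, so a
// notebook cannot make the browser fetch an external URL.
func imageTag(subtype, base64Payload string) string {
	src := "data:image/" + subtype + ";base64," + base64Payload
	return `<img src="` + template.HTMLEscapeString(src) + `">`
}

// svgTag base64-encodes raw SVG markup first; scripts inside an SVG loaded
// through <img> are not executed by browsers.
func svgTag(svg string) string {
	return imageTag("svg+xml", base64.StdEncoding.EncodeToString([]byte(svg)))
}

func main() {
	fmt.Println(svgTag(`<svg xmlns="http://www.w3.org/2000/svg"/>`))
}
```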

### Testing

In addition to the unit tests added in this PR, manual testing was
performed with various notebooks:

- Markdown rendering (headers, lists, tables, links, code blocks)
- Code cells with execution counts and syntax highlighting
- Multiple output types (text, images, HTML, LaTeX, errors, streams)
- Error handling for edge cases
- Theme compatibility (light/dark mode)

### Screenshots

<img width="1080" height="553" alt="image"
src="https://github.com/user-attachments/assets/aef9afa7-ed96-434d-98b0-b160565fc967"
/>
<img width="1092" height="552" alt="image"
src="https://github.com/user-attachments/assets/6e61e792-4737-41c1-851e-5c375c1f932a"
/>
<img width="1104" height="622" alt="image"
src="https://github.com/user-attachments/assets/4ac630c1-3a75-4e1c-9bba-c0a27484d001"
/>
<img width="1104" height="529" alt="image"
src="https://github.com/user-attachments/assets/33750c47-70de-4ab2-893d-e5d09fa8d9c4"
/>
<img width="1111" height="343" alt="image"
src="https://github.com/user-attachments/assets/52107d9f-0e06-420b-9ab4-1603dcd676b1"
/>
<img width="1091" height="650" alt="image"
src="https://github.com/user-attachments/assets/0addae21-efa4-44bb-a56e-0418e3d4d227"
/>
<img width="1077" height="298" alt="image"
src="https://github.com/user-attachments/assets/a3a8c5be-638c-45ff-82f3-816264254ead"
/>

### Dependencies

No new dependencies required:

- Chroma (existing) - Syntax highlighting
- Goldmark (existing) - Markdown rendering
- Gitea's existing `modules/json` wrapper - JSON parsing
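Detecting the notebook language needs only the decoded metadata. The sketch below uses the standard `encoding/json` package and follows the renderer's lookup order: `language_info.name`, then `kernelspec.language` (consulted only when `language_info` is absent), then `python`. `detectLanguage` is a hypothetical standalone version:

```go
package main

import (
	"encoding/json"
	"fmt"
)

// detectLanguage mirrors the renderer's metadata lookup.
func detectLanguage(metadata map[string]any) string {
	if info, ok := metadata["language_info"].(map[string]any); ok {
		if name, ok := info["name"].(string); ok {
			return name
		}
	} else if spec, ok := metadata["kernelspec"].(map[string]any); ok {
		if lang, ok := spec["language"].(string); ok {
			return lang
		}
	}
	return "python" // default
}

func main() {
	var nb struct {
		Metadata map[string]any `json:"metadata"`
	}
	raw := `{"metadata": {"kernelspec": {"language": "julia", "name": "julia-1.10"}}}`
	if err := json.Unmarshal([]byte(raw), &nb); err != nil {
		panic(err)
	}
	fmt.Println(detectLanguage(nb.Metadata)) // julia
}
```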

### Key Design Decisions

- Backend rendering for performance and security
- Reuses existing Gitea infrastructure (Chroma, Goldmark, sanitizer)
- Consistent styling with other markup renderers
- Graceful degradation for unsupported features

---

**Development Note:** This PR was developed with assistance from Amazon
Q Developer and Claude AI for implementation, debugging, and testing.

---------

Signed-off-by: Karthik Bhandary <34509856+karthikbhandary2@users.noreply.github.com>
Co-authored-by: karthik.bhandary <karthik.bhandary@kfintech.com>
Co-authored-by: wxiaoguang <wxiaoguang@gmail.com>
Co-authored-by: bircni <bircni@icloud.com>
Commit e82352f156 (parent aab9737651), authored by Karthik Bhandary and
committed by GitHub on 2026-06-14 19:22:37 +05:30: 10 changed files with
987 additions and 33 deletions.

```diff
@@ -17,6 +17,7 @@ import (
 	// register supported doc types
 	_ "gitea.dev/modules/markup/console"
 	_ "gitea.dev/modules/markup/csv"
+	_ "gitea.dev/modules/markup/jupyter"
 	_ "gitea.dev/modules/markup/markdown"
 	_ "gitea.dev/modules/markup/orgmode"
```

```diff
@@ -4,6 +4,7 @@
 package htmlutil
 
 import (
+	"errors"
 	"fmt"
 	"html/template"
 	"io"
@@ -88,6 +89,52 @@ func EscapeString(s string) template.HTML {
 	return template.HTML(template.HTMLEscapeString(s))
 }
 
+type HTMLWriter interface {
+	OriginWriter() io.Writer
+	WriteString(s string) HTMLWriter
+	WriteHTML(s template.HTML) HTMLWriter
+	WriteFormat(fmt template.HTML, args ...any) HTMLWriter
+	Err() error
+}
+
+type htmlWriter struct {
+	w    io.Writer
+	errs []error
+}
+
+func (h *htmlWriter) OriginWriter() io.Writer {
+	return h.w
+}
+
+func (h *htmlWriter) WriteString(s string) HTMLWriter {
+	if _, err := io.WriteString(h.w, template.HTMLEscapeString(s)); err != nil {
+		h.errs = append(h.errs, err)
+	}
+	return h
+}
+
+func (h *htmlWriter) WriteHTML(s template.HTML) HTMLWriter {
+	if _, err := io.WriteString(h.w, string(s)); err != nil {
+		h.errs = append(h.errs, err)
+	}
+	return h
+}
+
+func (h *htmlWriter) WriteFormat(fmt template.HTML, args ...any) HTMLWriter {
+	if _, err := HTMLPrintf(h.w, fmt, args...); err != nil {
+		h.errs = append(h.errs, err)
+	}
+	return h
+}
+
+func (h *htmlWriter) Err() error {
+	return errors.Join(h.errs...)
+}
+
+func NewHTMLWriter(w io.Writer) HTMLWriter {
+	return &htmlWriter{w: w}
+}
+
 type HTMLBuilder struct {
 	sb strings.Builder
 }
```

```diff
@@ -5,6 +5,7 @@ package htmlutil
 import (
 	"html/template"
+	"strings"
 	"testing"
 
 	"github.com/stretchr/testify/assert"
@@ -29,3 +30,11 @@ func TestHTMLBuilder(t *testing.T) {
 	assert.Equal(t, "&lt;<hr><span>&gt;&gt;</span>", b.String())
 	assert.Equal(t, template.HTML("&lt;<hr><span>&gt;&gt;</span>"), b.HTMLString())
 }
+
+func TestHTMLWriter(t *testing.T) {
+	sb := new(strings.Builder)
+	w := NewHTMLWriter(sb)
+	w.WriteString("<").WriteHTML("<hr>").WriteFormat("<span>%s%s</span>", ">", EscapeString(">"))
+	assert.Equal(t, "&lt;<hr><span>&gt;&gt;</span>", sb.String())
+	assert.NoError(t, w.Err())
+}
```

@@ -0,0 +1,74 @@

````json
{
  "metadata": {},
  "nbformat": 4,
  "cells": [
    {
      "cell_type": "code",
      "execution_count": 1,
      "source": ["print('very-looooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooong')"],
      "outputs": [
        {
          "output_type": "execute_result",
          "text": ["very-looooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooong ...\n"]
        },
        {
          "output_type": "stream",
          "name": "stdout",
          "text": ["stdout 1 ...\n", "stdout 2 ...\n"]
        },
        {
          "output_type": "stream",
          "name": "stderr",
          "text": ["stderr ...\n"]
        },
        {
          "data": {
            "text/plain": ["data text 1\n", "data text 2\n"]
          }
        },
        {
          "data": {
            "text/plain": true
          }
        },
        {
          "data": {
            "image/svg+xml": ["<svg xmlns=\"http://www.w3.org/2000/svg\" width=\"2000\" height=\"20\"><rect width=\"2000\" height=\"20\" x=\"0\" y=\"0\" rx=\"5\" ry=\"5\" fill=\"red\"/></svg>"]
          }
        },
        {
          "data": {
            "text/html": "<a href='/'>HTML Link</a>"
          }
        },
        {
          "data": {
            "text/latex": "$$a=1$$"
          }
        },
        {
          "data": {
            "text/plain": "plain text"
          }
        },
        {
          "output_type": "error",
          "ename": "Error Name",
          "traceback": ["stacktrace 1", "stacktrace 2"]
        }
      ]
    },
    {
      "cell_type": "unknown-cell"
    },
    {
      "cell_type": "markdown",
      "source": [
        "# h1\n", "## h2\n", "### h3\n", "\n", "paragraph 1\n", "\n",
        "very-looooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooong\n",
        "- list item 1\n", "- list item 2\n", "\n", "```python\n", "print('code block')\n", "```\n",
        "<table><tr><th>th1</th><th>th2</th></tr><tr><td>td1</td><td>td2</td></tr></table>\n"
      ]
    }
  ]
}
````

@@ -0,0 +1,393 @@

```go
// Copyright 2026 The Gitea Authors. All rights reserved.
// SPDX-License-Identifier: MIT

package jupyter

import (
	"encoding/base64"
	"fmt"
	"io"
	"strings"
	"sync"

	"gitea.dev/modules/highlight"
	"gitea.dev/modules/htmlutil"
	"gitea.dev/modules/json"
	"gitea.dev/modules/log"
	"gitea.dev/modules/markup"
	"gitea.dev/modules/markup/markdown"
	"gitea.dev/modules/setting"
	"gitea.dev/modules/util"
)

func init() {
	markup.RegisterRenderer(renderer{})
}

// renderer implements markup.Renderer for Jupyter notebooks
type renderer struct{}

var (
	_ markup.Renderer            = (*renderer)(nil)
	_ markup.PostProcessRenderer = (*renderer)(nil)
	_ markup.ExternalRenderer    = (*renderer)(nil) // FIXME: this is not an external renderer, need to refactor the framework in the future
)

type mimeHandler struct {
	Mime string
	Fn   func(w htmlutil.HTMLWriter, data string) error
}

func renderCellCodeOutputTextPlain(w htmlutil.HTMLWriter, text string) error {
	w.WriteFormat(`<div class="cell-output-text"><pre>%s</pre></div>`, text)
	return w.Err()
}

func renderCellCodeOutputUnsupported(w htmlutil.HTMLWriter, message string) error {
	w.WriteFormat(`<div class="cell-output-unsupported">%s</div>`, message)
	return w.Err()
}

var dataMimeHandlers = sync.OnceValue(func() []mimeHandler {
	renderImage := func(w htmlutil.HTMLWriter, subtype, payload string) error {
		w.WriteFormat(`<div class="cell-output-image"><img src="data:image/%s;base64,%s"></div>`, subtype, payload)
		return w.Err()
	}
	renderUnsupportedOutput := func(message string) func(htmlutil.HTMLWriter, string) error {
		return func(w htmlutil.HTMLWriter, _ string) error {
			return renderCellCodeOutputUnsupported(w, message)
		}
	}
	return []mimeHandler{
		// Images (PNG, JPEG, SVG)
		{"image/png", func(w htmlutil.HTMLWriter, d string) error {
			return renderImage(w, "png", d)
		}},
		{"image/jpeg", func(w htmlutil.HTMLWriter, d string) error {
			return renderImage(w, "jpeg", d)
		}},
		{"image/svg+xml", func(w htmlutil.HTMLWriter, d string) error {
			return renderImage(w, "svg+xml", base64.StdEncoding.EncodeToString(util.UnsafeStringToBytes(d)))
		}},
		// Rich & Math Layouts
		{"text/html", func(w htmlutil.HTMLWriter, d string) error {
			// To future developers: don't allow custom CSS classes or attributes,
			// because ".link-action" or "data-fetch-xxx" can send POST requests and lead to XSS.
			// If you'd really like to support more, do remember to correctly sanitize the values.
			w.WriteFormat(`<div class="cell-output-html">%s</div>`, markup.Sanitize(d))
			return w.Err()
		}},
		{"text/latex", func(w htmlutil.HTMLWriter, d string) error {
			w.WriteFormat(`<div class="cell-output-latex"><pre><code class="language-math display">%s</code></pre></div>`, trimMathDelimiters(d))
			return w.Err()
		}},
		{"text/plain", renderCellCodeOutputTextPlain},
		// Security Placeholders
		{"application/javascript", renderUnsupportedOutput("[JavaScript output - execution disabled for security]")},
		{"application/vnd.plotly.v1+json", renderUnsupportedOutput("[Plotly output - interactive plots not supported]")},
		{"application/vnd.jupyter.widget-view+json", renderUnsupportedOutput("[Jupyter widget - interactive widgets not supported]")},
	}
})

func (renderer) Name() string {
	return "jupyter-render"
}

func (renderer) NeedPostProcess() bool { return true }

func (renderer) GetExternalRendererOptions() markup.ExternalRendererOptions {
	return markup.ExternalRendererOptions{
		// HINT: the markup sanitizer is skipped because the output relies on many special CSS class names and inline attributes.
		// This renderer itself must guarantee that the output is safe (no XSS).
		SanitizerDisabled: true,
	}
}

func (renderer) FileNamePatterns() []string {
	return []string{"*.ipynb"}
}

func (renderer) SanitizerRules() []setting.MarkupSanitizerRule {
	return nil
}

// Notebook structures
type Notebook struct {
	Cells    []Cell         `json:"cells"`
	Metadata map[string]any `json:"metadata"`
	Nbformat int            `json:"nbformat"`
}

type Cell struct {
	CellType       string         `json:"cell_type"`
	Source         any            `json:"source"` // string or []string
	Outputs        []Output       `json:"outputs,omitempty"`
	ExecutionCount any            `json:"execution_count,omitempty"` // int or null
	Metadata       map[string]any `json:"metadata,omitempty"`
}

type Output struct {
	OutputType string         `json:"output_type"`
	Data       map[string]any `json:"data,omitempty"`
	Text       any            `json:"text,omitempty"` // string or []string
	Name       string         `json:"name,omitempty"`
	Traceback  any            `json:"traceback,omitempty"` // []string
	Ename      string         `json:"ename,omitempty"`
	Evalue     string         `json:"evalue,omitempty"`
}

// Render renders Jupyter notebook to HTML
func (renderer) Render(ctx *markup.RenderContext, input io.Reader, outputWriter io.Writer) error {
	htmlWriter := htmlutil.NewHTMLWriter(outputWriter)

	// the size is (should be) checked and/or limited by the caller to avoid OOM
	var notebook Notebook
	if err := json.NewDecoder(input).Decode(&notebook); err != nil {
		htmlWriter.WriteFormat(`<div class="ui error message">Failed to parse notebook JSON: %v</div>`, err)
		return htmlWriter.Err()
	}

	// Check nbformat version
	if notebook.Nbformat < 4 {
		htmlWriter.WriteFormat(
			`<div class="ui info message">This notebook uses an older format (nbformat %d). Only nbformat 4+ is supported for rendering. Please upgrade the notebook in Jupyter or view the raw JSON.</div>`,
			notebook.Nbformat,
		)
		return htmlWriter.Err()
	}

	// Detect language
	language := "python" // default
	if metadata, ok := notebook.Metadata["language_info"].(map[string]any); ok {
		if name, ok := metadata["name"].(string); ok {
			language = name
		}
	} else if kernelSpec, ok := notebook.Metadata["kernelspec"].(map[string]any); ok {
		if lang, ok := kernelSpec["language"].(string); ok {
			language = lang
		}
	}

	// Start rendering
	htmlWriter.WriteHTML(`<div class="jupyter-notebook">`)

	// limit the rendering to the first maxRenderedCells cells
	cells := notebook.Cells
	truncated := false
	const maxRenderedCells = 100
	if len(cells) > maxRenderedCells {
		cells = cells[:maxRenderedCells] // re-slicing shares the backing array, nothing is copied
		truncated = true
	}

	for _, cell := range cells {
		if err := renderCell(ctx, htmlWriter, cell, language); err != nil {
			log.Warn("Failed to render cell: %v", err) // TODO: RENDER-LOG-HANDLING: see other comments
			continue
		}
	}

	if truncated {
		htmlWriter.WriteHTML(`<div class="ui warning message">`)
		htmlWriter.WriteHTML(`<strong>Output truncated.</strong> This notebook contains too many cells to display efficiently.`)
		htmlWriter.WriteHTML(`</div>`)
	}

	htmlWriter.WriteHTML(`</div>`)
	return htmlWriter.Err()
}

func renderCellCode(output htmlutil.HTMLWriter, cell Cell, language string) error {
	source := joinSource(cell.Source)
	var executionCount *int64
	if cell.ExecutionCount != nil {
		if count, err := util.ToInt64(cell.ExecutionCount); err == nil {
			executionCount = &count
		}
	}

	output.WriteHTML(`<div class="cell-line">`)
	{
		if executionCount != nil {
			output.WriteFormat(`<div class="cell-left cell-prompt">In [%d]:</div>`, *executionCount)
		} else {
			output.WriteHTML(`<div class="cell-left cell-prompt">In [ ]:</div>`)
		}
		// Highlight code
		lexer := highlight.DetectChromaLexerByFileName("", language)
		output.WriteFormat(`<div class="cell-right cell-input"><pre><code class="chroma language-%s">`, strings.ToLower(language))
		output.WriteHTML(highlight.RenderCodeByLexer(lexer, source))
		output.WriteHTML("</code></pre></div>")
	}
	output.WriteHTML(`</div>`)

	// Render outputs
	if len(cell.Outputs) > 0 {
		hasExecutionResult := false
		for _, out := range cell.Outputs {
			if out.OutputType == "execute_result" {
				hasExecutionResult = true
				break
			}
		}
		output.WriteHTML(`<div class="cell-line">`)
		{
			if hasExecutionResult && executionCount != nil {
				output.WriteFormat(`<div class="cell-left cell-prompt">Out [%d]:</div>`, *executionCount)
			} else {
				output.WriteHTML(`<div class="cell-left cell-prompt"></div>`)
			}
			output.WriteHTML(`<div class="cell-right cell-output">`)
			for _, out := range cell.Outputs {
				renderCellCodeOutput(output, out)
			}
			output.WriteHTML(`</div>`)
		}
		output.WriteHTML(`</div>`)
	}
	return output.Err()
}

func renderCell(ctx *markup.RenderContext, output htmlutil.HTMLWriter, cell Cell, language string) error {
	switch cell.CellType {
	case "markdown":
		output.WriteHTML(`
			<div class="notebook-cell cell-type-markdown">
				<div class="cell-line">
					<div class="cell-left cell-prompt"></div>
					<div class="cell-right">`)
		if err := renderCellMarkdown(ctx, output, joinSource(cell.Source)); err != nil {
			return err
		}
		output.WriteHTML(`</div></div></div>`)
	case "code":
		output.WriteHTML(`<div class="notebook-cell cell-type-code">`)
		if err := renderCellCode(output, cell, language); err != nil {
			return err
		}
		output.WriteHTML(`</div>`)
	default:
		output.WriteFormat(`
			<div class="notebook-cell">
				<div class="cell-line">
					<div class="cell-left cell-prompt">Cell:</div>
					<div class="cell-right cell-prompt">[Cell type %s - unsupported, skipped]</div>
				</div>
			</div>`, cell.CellType)
	}
	return output.Err()
}

func renderCellMarkdown(rctx *markup.RenderContext, output htmlutil.HTMLWriter, source string) error {
	markdownCtx := markup.NewRenderContext(rctx)
	// make sure the markdown renderer uses the same options and helper to generate correct contents (e.g.: links)
	markdownCtx.RenderOptions = rctx.RenderOptions
	markdownCtx.RenderHelper = rctx.RenderHelper
	output.WriteHTML(`<div class="embedded-markdown">`)
	if err := markdown.Render(markdownCtx, strings.NewReader(source), output.OriginWriter()); err != nil {
		return err
	}
	output.WriteHTML(`</div>`)
	return output.Err()
}

func renderCellCodeOutput(output htmlutil.HTMLWriter, out Output) {
	if out.Data != nil {
		// Iterate through our priority list to find the best matching MIME handler available
		for _, h := range dataMimeHandlers() {
			if rawPayload, exists := out.Data[h.Mime]; exists {
				var stringPayload string
				// Flatten the polymorphic JSON input (string or []any) into a single clean string
				switch v := rawPayload.(type) {
				case string:
					stringPayload = v
				case []any:
					stringPayload = joinSource(v)
				default:
					_ = renderCellCodeOutputUnsupported(output, fmt.Sprintf("[Data output - unsupported data type %T for mime type %s]", rawPayload, h.Mime))
					continue
				}
				if err := h.Fn(output, stringPayload); err != nil {
					// TODO: RENDER-LOG-HANDLING: outputting the renderer's errors to the server's log is not a proper approach
					// The errors can be:
					// * unsupported element (cell, data, etc): it should render the message on the UI to tell users that the content is not supported, or ignore them if they are ignorable
					// * logic error: it should be reported to server logs
					// * network error: io.Writer tries to write to the HTTP connection, so the error can also be a network error, such error should be ignored
					log.Error("Jupyter rendering engine failed for MIME type %s: %v", h.Mime, err)
				}
				// Return immediately after rendering the top matching priority format
				return
			}
		}
	}

	// Stream output
	if out.OutputType == "stream" && out.Text != nil {
		streamName := util.Iif(out.Name == "stderr", "stderr", "stdout")
		output.WriteFormat(`<pre class="cell-output-stream stream-%s">%s</pre>`, streamName, joinSource(out.Text))
		return
	}

	// Error output
	if out.OutputType == "error" {
		traceback := ""
		if tb, ok := out.Traceback.([]any); ok {
			lines := make([]string, len(tb))
			for i, line := range tb {
				lines[i] = fmt.Sprint(line)
			}
			traceback = strings.Join(lines, "\n")
		}
		if traceback == "" && out.Ename != "" {
			traceback = fmt.Sprintf("%s: %s", out.Ename, out.Evalue)
		}
		output.WriteFormat(`<pre class="cell-output-error">%s</pre>`, traceback)
		return
	}

	// Generic text output
	if out.Text != nil {
		_ = renderCellCodeOutputTextPlain(output, joinSource(out.Text))
	}
}

func joinSource(source any) string {
	switch v := source.(type) {
	case nil:
		return ""
	case string:
		return v
	case []any:
		// the "source slice item" has EOL ("\n"), so just join them together
		parts := make([]string, len(v))
		for i, part := range v {
			parts[i] = fmt.Sprint(part)
		}
		return strings.Join(parts, "")
	default:
		return fmt.Sprint(v)
	}
}

// trimMathDelimiters strips a single pair of surrounding math delimiters ("$$...$$" or "$...$"),
// so the inner expression is handled by the math post-processor. Unlike strings.Trim, it does not
// eat unrelated "$" characters elsewhere in multi-expression content.
func trimMathDelimiters(s string) string {
	s = strings.TrimSpace(s)
	if t, ok := strings.CutPrefix(s, "$$"); ok {
		return strings.TrimSuffix(t, "$$")
	}
	if t, ok := strings.CutPrefix(s, "$"); ok {
		return strings.TrimSuffix(t, "$")
	}
	return s
}
```

@@ -0,0 +1,314 @@

```go
// Copyright 2026 The Gitea Authors. All rights reserved.
// SPDX-License-Identifier: MIT

package jupyter

import (
	"fmt"
	"strings"
	"testing"

	"gitea.dev/modules/markup"
	"gitea.dev/modules/test"

	"github.com/stretchr/testify/assert"
)

func TestRender(t *testing.T) {
	r := renderer{}

	t.Run("Basic notebook", func(t *testing.T) {
		input := `{
			"cells": [
				{
					"cell_type": "code",
					"execution_count": 1,
					"source": ["print('hello')"],
					"outputs": [
						{
							"output_type": "stream",
							"name": "stdout",
							"text": ["hello\n"]
						}
					]
				}
			],
			"metadata": {},
			"nbformat": 4
		}`
		var output strings.Builder
		ctx := &markup.RenderContext{}
		err := r.Render(ctx, strings.NewReader(input), &output)
		assert.NoError(t, err)
		result := output.String()
		assert.Contains(t, result, `<div class="jupyter-notebook">`)
		assert.Contains(t, result, `<div class="notebook-cell cell-type-code">`)
		assert.Contains(t, result, `In [1]:`)
		assert.Contains(t, result, `print`)
		assert.Contains(t, result, `hello`)
		assert.Contains(t, result, `stream-stdout`)
	})

	t.Run("Markdown cell with XSS Protection", func(t *testing.T) {
		input := `{
			"cells": [
				{
					"cell_type": "markdown",
					"source": [
						"# Title\n",
						"Some text\n",
						"[click me](javascript:alert(1))\n",
						"<script>alert('dangerous')</script>"
					]
				}
			],
			"metadata": {},
			"nbformat": 4
		}`
		var output strings.Builder
		ctx := markup.NewRenderContext(t.Context())
		err := r.Render(ctx, strings.NewReader(input), &output)
		assert.NoError(t, err)
		result := output.String()
		// Assert normal markup still renders correctly
		assert.Contains(t, result, `<div class="notebook-cell cell-type-markdown">`)
		assert.Contains(t, result, `Title`)
		assert.Contains(t, result, `Some text`)
		assert.Contains(t, result, `click me`)
		// CRITICAL SECURITY ASSERTIONS: Ensure XSS vectors are completely stripped
		assert.NotContains(t, result, `javascript:alert`)
		assert.NotContains(t, result, `<script>`)
	})

	t.Run("Cell limit truncation guardrail", func(t *testing.T) {
		// Generate an oversized notebook containing 105 cells dynamically
		var cellBlocks []string
		for range 105 {
			cellBlocks = append(cellBlocks, `{"cell_type": "markdown", "source": ["cell text"]}`)
		}
		input := fmt.Sprintf(`{"cells": [%s], "metadata": {}, "nbformat": 4}`, strings.Join(cellBlocks, ","))
		var output strings.Builder
		ctx := markup.NewRenderContext(t.Context())
		err := r.Render(ctx, strings.NewReader(input), &output)
		assert.NoError(t, err)
		result := output.String()
		// Verify it halts rendering gracefully and shows the truncation warning
		assert.Contains(t, result, "Output truncated.")
		assert.Contains(t, result, "This notebook contains too many cells to display efficiently.")
		// Count occurrences of the rendered cells to ensure it sliced down to exactly 100 elements
		assert.Equal(t, 100, strings.Count(result, `class="notebook-cell cell-type-markdown"`))
	})

	t.Run("Image output", func(t *testing.T) {
		input := `{
			"cells": [
				{
					"cell_type": "code",
					"execution_count": 1,
					"source": ["import matplotlib.pyplot as plt"],
					"outputs": [
						{
							"output_type": "display_data",
							"data": {
								"image/png": "iVBORw0KGgoAAAANSUhEUgAAAAEAAAABCAYAAAAfFcSJAAAADUlEQVR42mNk+M9QDwADhgGAWjR9awAAAABJRU5ErkJggg=="
							}
						}
					]
				}
			],
			"metadata": {},
			"nbformat": 4
		}`
		var output strings.Builder
		ctx := markup.NewRenderContext(t.Context())
		err := r.Render(ctx, strings.NewReader(input), &output)
		assert.NoError(t, err)
		result := output.String()
		assert.Contains(t, result, `<img src="data:image/png;base64,`)
		assert.Contains(t, result, `iVBORw0KGgoAAAANSUhEUgAAAAEAAAABCAYAAAAfFcSJAAAADUlEQVR42mNk+M9QDwADhgGAWjR9awAAAABJRU5ErkJggg==`)
	})

	t.Run("HTML output with style tag", func(t *testing.T) {
		input := `{
			"cells": [
				{
					"cell_type": "code",
					"execution_count": 1,
					"source": ["import pandas as pd"],
					"outputs": [
						{
							"output_type": "execute_result",
							"data": {
								"text/html": ["<style scoped>.dataframe tbody tr th { vertical-align: top; }</style><table class=\"dataframe\"><tr><td>1</td></tr></table>"]
							}
						}
					]
				}
			],
			"metadata": {},
			"nbformat": 4
		}`
		var output strings.Builder
		ctx := markup.NewRenderContext(t.Context())
		err := r.Render(ctx, strings.NewReader(input), &output)
		assert.NoError(t, err)
		result := output.String()
		assert.NotContains(t, result, `<style scoped>`)
		assert.Contains(t, result, `<table><tr><td>1</td></tr></table>`)
		assert.Contains(t, result, `<td>1</td>`)
	})

	t.Run("Error output", func(t *testing.T) {
		input := `{
			"cells": [
				{
					"cell_type": "code",
					"execution_count": 1,
					"source": ["raise ValueError('test error')"],
					"outputs": [
						{
							"output_type": "error",
							"ename": "ValueError",
							"evalue": "test error",
							"traceback": ["ValueError: test error"]
						}
					]
				}
			],
			"metadata": {},
			"nbformat": 4
		}`
		var output strings.Builder
		ctx := markup.NewRenderContext(t.Context())
		err := r.Render(ctx, strings.NewReader(input), &output)
		assert.NoError(t, err)
		result := output.String()
		assert.Contains(t, result, `ValueError: test error`)
		assert.Contains(t, result, `cell-output-error`)
	})

	t.Run("Old nbformat version", func(t *testing.T) {
		input := `{
			"cells": [],
			"metadata": {},
			"nbformat": 3
		}`
		var output strings.Builder
		ctx := markup.NewRenderContext(t.Context())
		err := r.Render(ctx, strings.NewReader(input), &output)
		assert.NoError(t, err)
		assert.Regexp(t, `<div class="ui info message">This notebook uses an older format.*</div>`, output.String())
	})
}

func TestJoinSource(t *testing.T) {
	tests := []struct {
		name     string
		input    any
		expected string
	}{
		{
			name:     "String input",
			input:    "hello world",
			expected: "hello world",
		},
		{
			name:     "Array input",
			input:    []any{"line1\n", "line2\n", "line3"},
			expected: "line1\nline2\nline3",
		},
		{
			name:     "Empty array",
			input:    []any{},
			expected: "",
		},
		{
			name:     "Single element array",
			input:    []any{"single"},
			expected: "single",
		},
	}

	for _, tt := range tests {
		t.Run(tt.name, func(t *testing.T) {
			result := joinSource(tt.input)
			assert.Equal(t, tt.expected, result)
		})
	}
}

func TestIntegrationAndSanitization(t *testing.T) {
	// A mock malicious Jupyter notebook containing an XSS injection attempt
	// inside a text/html output cell (e.g., pretending to be a poisoned Pandas DataFrame).
	maliciousNotebook := `{
		"nbformat": 4,
		"nbformat_minor": 2,
		"metadata": {},
		"cells": [
			{
				"cell_type": "code",
				"execution_count": 1,
				"metadata": {},
				"source": ["a=1"],
				"outputs": [
					{
						"output_type": "execute_result",
						"execution_count": 1,
						"data": {
							"text/html": [
								"<div><script>alert('XSS Vector')</script><table class=\"dataframe\"><tr><td>Safe Content</td></tr></table></div>"
							]
						},
						"metadata": {}
					}
				]
			}
		]
	}`

	var output strings.Builder
	ctx := markup.NewRenderContext(t.Context())
	ctx.RenderOptions.MarkupType = "jupyter-render"
	err := markup.Render(ctx, strings.NewReader(maliciousNotebook), &output)
	assert.NoError(t, err)

	const expected = `
<div class="jupyter-notebook">
	<div class="notebook-cell cell-type-code">
		<div class="cell-line">
			<div class="cell-left cell-prompt">In [1]:</div>
			<div class="cell-right cell-input">
				<pre><code class="chroma language-python">
					<span class="n">a</span><span class="o">=</span><span class="mi">1</span>
				</code></pre>
			</div>
		</div>
		<div class="cell-line">
			<div class="cell-left cell-prompt">Out [1]:</div>
			<div class="cell-right cell-output">
				<div class="cell-output-html">
					<div><table><tbody><tr><td>Safe Content</td></tr></tbody></table></div>
				</div>
			</div>
		</div>
	</div>
</div>`
	assert.Equal(t, test.NormalizeHTMLSpaces(expected), test.NormalizeHTMLSpaces(output.String()))
}
```

```diff
@@ -12,12 +12,16 @@ import (
 	"net/http"
 	"net/http/httptest"
 	"os"
+	"regexp"
+	"slices"
 	"strconv"
 	"strings"
+	"sync"
 
 	"gitea.dev/modules/json"
 	"gitea.dev/modules/util"
+	"golang.org/x/net/html"
 )
 
 // RedirectURL returns the redirect URL of a http response.
@@ -182,3 +186,48 @@ func ExternalServiceHTTP(t TestingT, envVarName, def string) string {
 	}
 	return val
 }
+
+var normalizeHTMLSpacesRegexp = sync.OnceValue(func() (ret struct {
+	afterRt, beforeLt *regexp.Regexp
+},
+) {
+	ret.afterRt = regexp.MustCompile(`>\s*`)
+	ret.beforeLt = regexp.MustCompile(`\s*<`)
+	return ret
+})
+
+func NormalizeHTMLSpaces(s string) string {
+	vars := normalizeHTMLSpacesRegexp()
+	s = vars.afterRt.ReplaceAllString(s, ">\n")
+	s = vars.beforeLt.ReplaceAllString(s, "\n<")
+	return strings.TrimSpace(s)
+}
+
+func NormalizeHTMLAttributes(t TestingT, s string) string {
+	nodes, err := html.Parse(strings.NewReader(s))
+	if err != nil {
+		t.Errorf("failed to parse HTML: %v", err)
+		return ""
+	}
+	var normalize func(n *html.Node)
+	normalize = func(n *html.Node) {
+		slices.SortFunc(n.Attr, func(a, b html.Attribute) int {
+			if cmp := strings.Compare(a.Namespace, b.Namespace); cmp != 0 {
+				return cmp
+			}
+			if cmp := strings.Compare(a.Key, b.Key); cmp != 0 {
+				return cmp
+			}
+			return strings.Compare(a.Val, b.Val)
+		})
+		for c := n.FirstChild; c != nil; c = c.NextSibling {
+			normalize(c)
+		}
+	}
+	normalize(nodes) // sort attributes so that attribute order does not affect comparisons
+	var sb strings.Builder
+	if err = html.Render(&sb, nodes); err != nil {
+		t.Errorf("failed to render HTML: %v", err)
+	}
+	return sb.String()
+}
```

```diff
@@ -5,13 +5,12 @@ package integration
 import (
 	"io"
-	"slices"
 	"strings"
 	"testing"
 
+	"gitea.dev/modules/test"
 	"github.com/PuerkitoBio/goquery"
 	"github.com/stretchr/testify/assert"
-	"golang.org/x/net/html"
 )
 
 // HTMLDoc struct
@@ -53,36 +52,10 @@ func AssertHTMLElement[T int | bool](t testing.TB, doc *HTMLDoc, selector string
 
 func assertHTMLEq(t testing.TB, expected, actual string) {
 	t.Helper()
-	if expected == actual {
+	if expected == actual { // fast path
 		return
 	}
-	exp, err := html.Parse(strings.NewReader(expected))
-	if !assert.NoError(t, err) {
-		return
-	}
-	act, err := html.Parse(strings.NewReader(actual))
-	if !assert.NoError(t, err) {
-		return
-	}
-	var normalize func(n *html.Node)
-	normalize = func(n *html.Node) {
-		slices.SortFunc(n.Attr, func(a, b html.Attribute) int {
-			if cmp := strings.Compare(a.Namespace, b.Namespace); cmp != 0 {
-				return cmp
-			}
-			if cmp := strings.Compare(a.Key, b.Key); cmp != 0 {
-				return cmp
-			}
-			return strings.Compare(a.Val, b.Val)
-		})
-		for c := n.FirstChild; c != nil; c = c.NextSibling {
-			normalize(c)
-		}
-	}
-	normalize(exp)
-	normalize(act)
-	var expNormalized, actNormalized strings.Builder
-	assert.NoError(t, html.Render(&expNormalized, exp))
-	assert.NoError(t, html.Render(&actNormalized, act))
-	assert.Equal(t, expNormalized.String(), actNormalized.String())
+	exp := test.NormalizeHTMLAttributes(t, expected)
+	act := test.NormalizeHTMLAttributes(t, actual)
+	assert.Equal(t, exp, act)
 }
```

```diff
@@ -52,6 +52,7 @@
 @import "./markup/content.css";
 @import "./markup/codeblock.css";
 @import "./markup/codepreview.css";
+@import "./markup/jupyter.css";
 @import "./font_i18n.css";
 @import "./base.css";
```

@@ -0,0 +1,93 @@

```css
.markup.jupyter-render {
  padding: 0;
}

.markup .jupyter-notebook {
  padding: 20px;
  background: var(--color-body);
  border-bottom-left-radius: var(--border-radius);
  border-bottom-right-radius: var(--border-radius);
  font-family: var(--fonts-monospace);
  display: flex;
  flex-direction: column;
  gap: 2em;
}

/* cell code */
.markup .jupyter-notebook .cell-line {
  display: flex;
  width: 100%;
  gap: 0.5em;
}

.markup .jupyter-notebook .cell-left {
  width: 100px;
  flex-shrink: 0;
}

.markup .jupyter-notebook .cell-right {
  flex: 1;
}

.markup .jupyter-notebook .cell-prompt {
  padding: 10px 0;
  color: var(--color-text-light-2);
  font-size: 13px;
}

.markup .jupyter-notebook .cell-left.cell-prompt {
  padding-left: 10px;
  text-align: right;
  white-space: nowrap;
  user-select: none;
}

.markup .jupyter-notebook .cell-right.cell-prompt {
  padding-right: 10px;
}

.markup .jupyter-notebook .cell-input,
.markup .jupyter-notebook .cell-output {
  overflow-x: auto;
}

.markup .jupyter-notebook .cell-input pre,
.markup .jupyter-notebook .cell-output pre {
  padding: 10px 16px;
  font-size: 13px;
  min-height: 40px;
  margin: 0;
}

.markup .jupyter-notebook .cell-input pre {
  background-color: var(--color-code-bg);
  white-space: pre-wrap;
  overflow-wrap: anywhere;
}

.markup .jupyter-notebook .cell-output {
  display: flex;
  flex-direction: column;
  gap: 1em;
}

.markup .jupyter-notebook .cell-type-code {
  display: flex;
  flex-direction: column;
  gap: 1em;
}

.markup .jupyter-notebook .cell-output-unsupported {
  color: var(--color-text-light-2);
  font-style: italic;
  font-size: 13px;
}

.markup .jupyter-notebook .cell-output-error {
  color: var(--color-red);
}

/* cell markdown */
.markup .jupyter-notebook .cell-right .embedded-markdown {
  padding: 0 16px; /* match cell code right padding */
}
```