37 changes: 37 additions & 0 deletions AGENTS.md
@@ -0,0 +1,37 @@
# ncnn - AI Agent Developer Guide

ncnn is Tencent's high-performance neural network inference framework, optimized for mobile and embedded platforms and released under the BSD-3-Clause license. It is written in C/C++ with minimal dependencies, supports x86, ARM, RISC-V, LoongArch, and MIPS CPUs as well as Vulkan-capable GPUs, and includes PNNX for converting PyTorch/ONNX models to ncnn.

## Repository Layout

```
src/ Core library (mat.h, net.h, layer.h, option.h, ...)
src/layer/ Generic layer implementations
src/layer/{x86,arm,riscv,loongarch,mips}/ Arch-optimized layers
src/layer/vulkan/ Vulkan GPU layers + shader/ (.comp GLSL shaders)
tools/pnnx/ PyTorch Neural Network eXchange converter
tools/{caffe,onnx}/ Legacy model converters
tests/ Unit tests (test_<layername>.cpp)
cmake/ Build modules (ncnn_add_layer.cmake)
toolchains/ Cross-compilation toolchain files
docs/ Documentation
.clang-format Code formatting (Allman, 4-space, C++03)
.github/workflows/ CI (build, test, coverage, format)
```

## Agent Documentation Index

Read these docs selectively based on the task at hand:

| Topic | Doc | When to read |
|---|---|---|
| Key data structures | [docs/agents/data-structures.md](docs/agents/data-structures.md) | Working with Mat, Layer, Net, Blob, ParamDict |
| Build and test | [docs/agents/build-and-test.md](docs/agents/build-and-test.md) | Building, testing, cross-compilation, coverage |
| Code style and portability | [docs/agents/code-style.md](docs/agents/code-style.md) | Writing code for src/ (C++03, simplestl, OpenMP rules) |
| CPU/GPU dispatch | [docs/agents/dispatch.md](docs/agents/dispatch.md) | Understanding layer registration, packing, Vulkan flow |
| PNNX architecture | [docs/agents/pnnx.md](docs/agents/pnnx.md) | Model conversion pipeline, IR, pass system |
| Task: Add ncnn operator | [docs/agents/task-add-operator.md](docs/agents/task-add-operator.md) | Adding a new layer to ncnn |
| Task: Add PNNX operator | [docs/agents/task-add-pnnx-operator.md](docs/agents/task-add-pnnx-operator.md) | Adding PyTorch op support to PNNX |
| Task: x86 SIMD optimization | [docs/agents/task-x86-optimization.md](docs/agents/task-x86-optimization.md) | SSE/AVX/AVX-512 layer optimization |
| Task: Vulkan optimization | [docs/agents/task-vulkan-optimization.md](docs/agents/task-vulkan-optimization.md) | GPU compute shader layer |
| Task: Cross-arch optimization | [docs/agents/task-cross-arch-optimization.md](docs/agents/task-cross-arch-optimization.md) | ARM NEON/SVE, RISC-V RVV, QEMU testing |
172 changes: 172 additions & 0 deletions docs/agents/build-and-test.md
@@ -0,0 +1,172 @@
# Build and Test

## Basic Build (Linux)

```bash
cd ncnn
mkdir build && cd build
cmake ..
cmake --build . -j$(nproc)
```

## Key CMake Options

| Option | Default | Description |
|--------|---------|-------------|
| `NCNN_VULKAN` | OFF | Enable Vulkan GPU support |
| `NCNN_OPENMP` | ON | Enable OpenMP multi-threading |
| `NCNN_BUILD_TESTS` | OFF | Build unit tests |
| `NCNN_BUILD_TOOLS` | ON* | Build converter tools |
| `NCNN_BUILD_EXAMPLES` | ON* | Build example programs |
| `NCNN_BUILD_BENCHMARK` | ON | Build benchmark tool |
| `NCNN_SHARED_LIB` | OFF | Build shared library |
| `NCNN_RUNTIME_CPU` | ON | Runtime CPU feature detection & dispatch |
| `NCNN_SSE2` | ON | x86 SSE2 support |
| `NCNN_AVX` | ON | x86 AVX support |
| `NCNN_AVX2` | ON | x86 AVX2/FMA support |
| `NCNN_AVX512` | ON* | x86 AVX-512 support |
| `NCNN_ARM82` | ON | AArch64 fp16 (ARMv8.2) |
| `NCNN_ARM82DOT` | ON | AArch64 dot product |
| `NCNN_ARM84BF16` | ON | AArch64 BFloat16 |
| `NCNN_ARM84I8MM` | ON | AArch64 Int8 matrix multiply |
| `NCNN_ARM86SVE` | ON | AArch64 SVE |
| `NCNN_RVV` | ON | RISC-V Vector extension |
| `NCNN_SIMPLEMATH` | OFF | Use built-in math (no libm) |
| `NCNN_SIMPLESTL` | OFF | Use built-in STL (no libstdc++) |
| `WITH_LAYER_xxx` | ON | Enable/disable individual layers |

\* `NCNN_BUILD_TOOLS` and `NCNN_BUILD_EXAMPLES` default to OFF when cross-compiling or targeting Android/iOS. `NCNN_AVX512` defaults to ON only when the compiler supports it and `NCNN_AVX2` is ON.

## Build with Vulkan

```bash
cmake -DNCNN_VULKAN=ON ..
cmake --build . -j$(nproc)
```

Requires the Vulkan SDK. The bundled `glslang/` submodule compiles GLSL shaders to SPIR-V at build time.

## Build with Tests

```bash
cmake -DNCNN_BUILD_TESTS=ON -DNCNN_BUILD_TOOLS=OFF -DNCNN_BUILD_EXAMPLES=OFF ..
cmake --build . -j$(nproc)
ctest --output-on-failure -j$(nproc)
```

## Cross-Compilation

Toolchain files are in `toolchains/`. Example for AArch64:

```bash
cmake -DCMAKE_TOOLCHAIN_FILE=../toolchains/aarch64-linux-gnu.toolchain.cmake \
-DNCNN_BUILD_TESTS=ON ..
cmake --build . -j$(nproc)
```

Run tests with QEMU:

```bash
TESTS_EXECUTABLE_LOADER=qemu-aarch64-static \
TESTS_EXECUTABLE_LOADER_ARGUMENTS="-L;/usr/aarch64-linux-gnu" \
ctest --output-on-failure -j8
```

For RISC-V with RVV:

```bash
export RISCV_ROOT_PATH=/path/to/riscv-toolchain
cmake -DCMAKE_TOOLCHAIN_FILE=../toolchains/riscv64-unknown-linux-gnu.toolchain.cmake \
-DNCNN_RVV=ON -DNCNN_BUILD_TESTS=ON ..
cmake --build . -j$(nproc)

# Test with QEMU (vlen=256)
TESTS_EXECUTABLE_LOADER=qemu-riscv64 \
TESTS_EXECUTABLE_LOADER_ARGUMENTS="-cpu;rv64,v=true,zfh=true,zvfh=true,vlen=256,elen=64,vext_spec=v1.0;-L;/path/to/sysroot" \
ctest --output-on-failure -j8
```

## Intel SDE for x86 ISA Testing

The CI uses Intel SDE to test advanced ISA extensions (AVX-512, AVX-VNNI, etc.) on machines that do not natively support them:

```bash
TESTS_EXECUTABLE_LOADER=/path/to/sde64 \
TESTS_EXECUTABLE_LOADER_ARGUMENTS="-spr;--" \
ctest --output-on-failure -j8
```

## Testing

Tests are in `tests/`. Each layer has a `test_<layername>.cpp` file.

### Test Pattern

Tests use `testutil.h`, which provides `test_layer()`. It creates a layer from the given `ParamDict` and weights, runs a random input through the naive (generic, non-optimized) implementation to produce a reference output, then runs the same input through the CPU-optimized and Vulkan paths (when available) and compares the results within numerical tolerance.

```cpp
// tests/test_relu.cpp
#include "testutil.h"

static int test_relu(const ncnn::Mat& a, float slope)
{
ncnn::ParamDict pd;
pd.set(0, slope);
std::vector<ncnn::Mat> weights(0);
int ret = test_layer("ReLU", pd, weights, a);
if (ret != 0)
fprintf(stderr, "test_relu failed a.dims=%d a=(%d %d %d %d) slope=%f\n",
a.dims, a.w, a.h, a.d, a.c, slope);
return ret;
}

int main()
{
SRAND(7767517);
return test_relu(RandomMat(5, 6, 7, 24), 0.f)
|| test_relu(RandomMat(128), 0.1f);
}
```

### Adding a New Test

1. Create `tests/test_<layername>.cpp`
2. Add to `tests/CMakeLists.txt`: `ncnn_add_test(test_<layername>)`
3. Test all dimension ranks (1D, 2D, 3D, 4D) with various sizes, including:
- Sizes divisible by common pack sizes (4, 8, 16)
- Non-aligned sizes to test remainder loops
- Multiple parameter combinations
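
Non-aligned sizes matter because optimized kernels typically process a fixed pack of elements per iteration and fall back to a scalar remainder loop for the tail. A minimal standalone sketch (illustration only, not ncnn code) of the loop structure such sizes exercise:

```cpp
#include <cassert>

// Pack-of-4 main loop plus scalar remainder, mirroring the shape of
// ncnn's optimized kernels. A size like 7 forces the remainder loop
// to run; a size like 8 never enters it.
static void relu_pack4_style(const float* in, float* out, int n)
{
    int i = 0;
    for (; i + 3 < n; i += 4) // main loop: 4 elements per iteration
    {
        for (int j = 0; j < 4; j++)
            out[i + j] = in[i + j] > 0.f ? in[i + j] : 0.f;
    }
    for (; i < n; i++) // remainder loop: only runs when n % 4 != 0
        out[i] = in[i] > 0.f ? in[i] : 0.f;
}
```

A test that only ever uses sizes divisible by 4 would leave the remainder loop completely unexecuted, which is why both aligned and non-aligned sizes belong in every test.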

## Code Coverage

CI runs code coverage on every push/PR (see `.github/workflows/test-coverage.yml`). It builds with `NCNN_COVERAGE=ON` which adds `-coverage -fprofile-arcs -ftest-coverage` flags and links `-lgcov`. After tests run, `lcov` collects the `.gcda` / `.gcno` data and uploads to Codecov.

When developing, you should measure coverage locally to ensure your new code is well tested:

```bash
# Build with coverage
mkdir build-coverage && cd build-coverage
cmake -DCMAKE_BUILD_TYPE=debug \
-DNCNN_COVERAGE=ON \
-DNCNN_RUNTIME_CPU=OFF \
-DNCNN_OPENMP=OFF \
-DNCNN_BUILD_TOOLS=OFF \
-DNCNN_BUILD_EXAMPLES=OFF \
-DNCNN_BUILD_TESTS=ON ..
cmake --build . -j$(nproc)

# Run tests
ctest --output-on-failure -j$(nproc)

# Collect coverage
lcov -d ./src -c -o lcov.info
lcov -r lcov.info '/usr/*' -o lcov.info
lcov -r lcov.info '*/build-coverage/*' -o lcov.info
lcov --list lcov.info

# (Optional) Generate HTML report
genhtml lcov.info --output-directory coverage-html
# Open coverage-html/index.html in a browser
```

Aim for high coverage of your new or modified code paths. The CI coverage matrix tests multiple configurations — x86 ISA variants (none/sse2/avx/avx2/avx512/avx512vnni), cross-compiled architectures (ARM, RISC-V RVV, MIPS, LoongArch, PowerPC) via QEMU, Vulkan GPU (llvmpipe and SwiftShader), and OpenMP on/off — so make sure your tests exercise the relevant branches.
62 changes: 62 additions & 0 deletions docs/agents/code-style.md

@@ -0,0 +1,62 @@
# Code Style and Portability

## Formatting

The project uses **Allman brace style** with 4-space indentation, no tabs. Defined in `.clang-format` and `.astylerc`.

Key conventions:
- **Indentation**: 4 spaces, no tabs
- **Braces**: Allman style (opening brace on new line for functions, classes, control statements)
- **Namespaces**: No indentation inside `namespace ncnn { ... }`
- **Pointers**: Left-aligned (`float* ptr`, not `float *ptr`)
- **Column limit**: None (no hard line length limit)
- **Includes**: Not sorted by clang-format
- **Naming**: `snake_case` for variables/functions, `PascalCase` for class names, `UPPER_CASE` for macros
- **Comments**: `//` style, minimal — code is expected to be self-explanatory
- **Copyright header**: Every ncnn-authored source file starts with `// Copyright YYYY Tencent` and `// SPDX-License-Identifier: BSD-3-Clause`
- **SIMD code**: Uses `#if __SSE2__` / `#if __AVX__` / `#if __ARM_NEON` preprocessor guards, nested from wider to narrower
- **OpenMP**: `#pragma omp parallel for num_threads(opt.num_threads)` on the outer channel loop
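
For orientation, here is a short hypothetical function showing several of these conventions together (copyright header, Allman braces, 4-space indent, left-aligned pointer, `snake_case` naming):

```cpp
// Copyright YYYY Tencent
// SPDX-License-Identifier: BSD-3-Clause

#include <cassert>

// Hypothetical helper, written in the ncnn house style.
static float sum_channel(const float* ptr, int size)
{
    float sum = 0.f;
    for (int i = 0; i < size; i++)
    {
        sum += ptr[i];
    }
    return sum;
}
```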

Format code with:
```bash
./codeformat.sh # runs clang-format + astyle twice for stable output
```

You do **not** need to run this locally before submitting. The GitHub CI workflow (`.github/workflows/code-format.yml`) automatically formats all C/C++ source files and GLSL shaders on every push/PR and commits the formatting changes back. Just write code following the conventions above, and CI will fix any minor formatting deviations.

## Code Portability (Core Library)

ncnn's core library (`src/`) is designed for maximum compiler and platform compatibility. Strict portability rules apply to all code under `src/`:

### Language Standard

- **C code**: C99
- **C++ code**: C++03 (`.clang-format` enforces `Standard: c++03`)
- **Do NOT use** C++11 or later features in `src/`: no `auto`, `nullptr`, range-based for loops, `constexpr`, `std::move`, lambda expressions, `override`/`final` keywords, uniform initialization `{}`, `<thread>`, `<mutex>`, `<atomic>`, etc.
- Use `0` instead of `nullptr`, explicit type declarations instead of `auto`, traditional for loops instead of range-for.
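
For example, a hypothetical helper written the C++03 way, with the disallowed C++11 spelling shown only in a comment:

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// C++11 (NOT allowed in src/):
//     for (auto& v : values) v *= s;
// C++03 replacement: explicit types and an index loop.
static void scale_all(std::vector<float>& values, float s)
{
    for (size_t i = 0; i < values.size(); i++)
    {
        values[i] *= s;
    }
}
```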

### STL Restrictions

ncnn provides its own minimal STL implementation in `src/simplestl.h` (enabled with `NCNN_SIMPLESTL=ON`) to support environments without a C++ standard library (bare-metal, some embedded systems). All core library code must be compatible with this subset:

- **Allowed**: `std::vector`, `std::string`, `std::pair`, `std::list`, `std::stack`, `std::swap`, `std::min`, `std::max`, `std::partial_sort`, `std::less`, `std::greater`
- **Not available in simplestl**: `std::map`, `std::set`, `std::unordered_map`, `std::shared_ptr`, `std::unique_ptr`, `<algorithm>` (beyond `partial_sort`), `<functional>`, `<iostream>`, streams, smart pointers, etc.
- When writing core library code, only use STL templates that are implemented in `simplestl.h`.

### Math Restrictions

ncnn also provides `src/simplemath.h` / `src/simplemath.cpp` (enabled with `NCNN_SIMPLEMATH=ON`) as a drop-in replacement for `<math.h>` / `<cmath>`, for platforms without a math library. Core code should stick to standard C99 math functions.

### OpenMP Restrictions

ncnn provides a minimal OpenMP runtime (`src/simpleomp.h` / `src/simpleomp.cpp`, enabled with `NCNN_SIMPLEOMP=ON`) that supports both the LLVM libomp ABI and the GCC libgomp ABI. Only the following OpenMP usage is allowed in the core library:

```cpp
#pragma omp parallel for num_threads(opt.num_threads)
```

Do not use any other OpenMP directives such as `critical`, `atomic`, `reduction`, `task`, `simd`, `sections`, or `barrier`. The `collapse(2)` clause is used in a few places but should be limited to simple cases.
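
A standalone sketch of the one permitted pattern, parallelizing the outer channel loop. Here `num_threads` is a plain parameter standing in for `opt.num_threads`, and the pragma is simply ignored when the code is compiled without OpenMP:

```cpp
#include <cassert>

// The only OpenMP directive allowed in the core library: a parallel-for
// over the outermost (channel) loop. Each iteration touches a disjoint
// slice of data, so no other synchronization directives are needed.
static void fill_channels(float* data, int channels, int cstep, float value, int num_threads)
{
    #pragma omp parallel for num_threads(num_threads)
    for (int q = 0; q < channels; q++)
    {
        float* ptr = data + q * cstep;
        for (int i = 0; i < cstep; i++)
        {
            ptr[i] = value;
        }
    }
}
```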

### Tools and PNNX — No Restriction

Code outside the core library — specifically `tools/pnnx/`, `tools/caffe/`, `tools/onnx/`, `examples/`, `tests/`, `python/` — is **not** subject to these portability restrictions. PNNX in particular uses **C++17** (or C++14 for PyTorch < 2.1) and freely uses modern C++ features, the full standard library, protobuf, etc.
97 changes: 97 additions & 0 deletions docs/agents/data-structures.md
@@ -0,0 +1,97 @@
# Key Data Structures

## Mat (`src/mat.h`)

The core tensor type. Supports 1D to 4D data with element packing for SIMD.

```cpp
class Mat {
void* data; // Raw data pointer
int* refcount; // Reference counting (NULL for external data)
size_t elemsize; // Bytes per element (4=fp32, 2=fp16, 1=int8 when elempack=1;
// equals scalar_size * elempack when packed, e.g., 16 for pack4 fp32)
int elempack; // Packed elements (1=scalar, 4=SSE/NEON, 8=AVX/fp16)
Allocator* allocator;
int dims; // 0=empty, 1=1D, 2=2D, 3=3D, 4=4D
int w, h, d, c; // Width, height, depth, channels
size_t cstep; // Channel stride (elements per channel)
};
```

Key concepts:
- **Element packing (`elempack`)**: Multiple elements stored together for SIMD. E.g., `elempack=4` means 4 floats packed as one unit (for SSE/NEON 128-bit). `elempack=8` for AVX 256-bit. Channel count `c` is divided by `elempack`.
- **Channel step (`cstep`)**: Aligned stride between channels for SIMD alignment.
- GPU variants: `VkMat` (Vulkan buffer), `VkImageMat` (Vulkan image).
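
To make the packing concrete, here is a simplified index calculation (an illustration, not ncnn code) for a pack4 fp32 Mat. It assumes `cstep` counts packed elements and ignores the extra alignment padding ncnn applies when computing the real `cstep`:

```cpp
#include <cassert>
#include <cstddef>

// Float offset of logical element (channel q, flat position i) in a
// pack4 layout: 4 consecutive logical channels are interleaved, so
// channel q lands in packed channel q/4 at lane q%4.
static size_t pack4_offset(int q, int i, size_t cstep)
{
    size_t packed_channel = q / 4; // which group of 4 channels
    size_t lane = q % 4;           // position inside the 4-wide pack
    return packed_channel * cstep * 4 + (size_t)i * 4 + lane;
}
```

So logical channels 0-3 interleave within packed channel 0: element (q=1, i=0) sits one float after (q=0, i=0), while (q=0, i=1) sits four floats after it.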

## Net (`src/net.h`)

The inference engine. Loads param (graph) and model (weights), creates `Extractor` for inference.

```cpp
class Net {
Option opt; // Runtime options
int load_param(const char*); // Load graph structure (.param)
int load_model(const char*); // Load weights (.bin)
Extractor create_extractor(); // Create inference session
};

class Extractor {
int input(const char* name, const Mat& in); // Set input
int extract(const char* name, Mat& out); // Get output (runs inference)
};
```

## Layer (`src/layer.h`)

Base class for all operators. Key behavioral flags set in constructor:

```cpp
class Layer {
bool one_blob_only; // Single input/output (e.g., ReLU)
bool support_inplace; // Can modify input in-place
bool support_packing; // Accepts packed Mat (elempack > 1)
bool support_vulkan; // Has Vulkan implementation
bool support_bf16_storage;
bool support_fp16_storage;
bool support_int8_storage;
bool support_any_packing; // Layer handles any elempack internally (skip auto packing conversion)
bool support_vulkan_any_packing; // Same as above, but for Vulkan path

// CPU forward
virtual int forward(const std::vector<Mat>& bottom_blobs, std::vector<Mat>& top_blobs, const Option& opt) const;
virtual int forward(const Mat& bottom_blob, Mat& top_blob, const Option& opt) const;
virtual int forward_inplace(std::vector<Mat>& bottom_top_blobs, const Option& opt) const;
virtual int forward_inplace(Mat& bottom_top_blob, const Option& opt) const;

// Vulkan forward
virtual int forward(const VkMat& bottom_blob, VkMat& top_blob, VkCompute& cmd, const Option& opt) const;
virtual int forward_inplace(VkMat& bottom_top_blob, VkCompute& cmd, const Option& opt) const;

virtual int load_param(const ParamDict& pd); // Load params from .param
virtual int load_model(const ModelBin& mb); // Load weights from .bin
virtual int create_pipeline(const Option& opt); // Setup (e.g., create Vulkan pipelines)
virtual int destroy_pipeline(const Option& opt);
virtual int upload_model(VkTransfer& cmd, const Option& opt); // Upload weights to GPU
};
```

Forward interface selection table:

| one_blob_only | support_inplace | Required interface |
|---|---|---|
| false | false | `forward(vector<Mat>, vector<Mat>)` |
| false | true | `forward_inplace(vector<Mat>)` (must), `forward(vector<Mat>, vector<Mat>)` (optional) |
| true | false | `forward(Mat, Mat)` |
| true | true | `forward_inplace(Mat)` (must), `forward(Mat, Mat)` (optional) |
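
The table can be summarized as a toy dispatch sketch (hypothetical code, loosely modeled on how `Net` picks a layer's entry point from its flags):

```cpp
#include <cassert>
#include <cstring>

// Stand-in for ncnn::Layer with only the two flags that drive dispatch.
struct FakeLayer
{
    bool one_blob_only;
    bool support_inplace;
};

// Returns the forward interface the engine would invoke.
static const char* pick_interface(const FakeLayer& layer)
{
    if (layer.one_blob_only)
        return layer.support_inplace ? "forward_inplace(Mat&)" : "forward(Mat, Mat&)";
    return layer.support_inplace ? "forward_inplace(vector<Mat>&)" : "forward(vector<Mat>, vector<Mat>&)";
}
```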

## Blob (`src/blob.h`)

A named tensor edge in the computation graph. Each blob has a producer layer and consumer layers.

## ParamDict (`src/paramdict.h`)

Key-value store for layer parameters. Keys are integers (0, 1, 2, ...). Values can be int, float, or arrays thereof. Used in `.param` files as `key=value`.
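
For illustration only, a minimal parse of one such `key=value` entry (the real ParamDict also distinguishes int from float values and supports arrays):

```cpp
#include <cassert>
#include <cstdlib>
#include <cstring>

// Hypothetical sketch of reading a .param entry such as "0=0.1":
// the integer before '=' is the key, the text after it is the value.
// Returns the value if the key matches, else 0 (ncnn's defaults come
// from the second argument of ParamDict::get instead).
static float parse_param_value(const char* entry, int key)
{
    int k = atoi(entry);
    const char* eq = strchr(entry, '=');
    if (!eq || k != key)
        return 0.f;
    return (float)atof(eq + 1);
}
```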

## Option (`src/option.h`)

Runtime configuration: `num_threads`, `use_vulkan_compute`, `use_fp16_packed`, `use_bf16_storage`, blob/workspace allocators, etc.