Sequence Models
Sequence-specific deep-learning models live under CSharpNumerics.ML.Sequence.
They keep the existing IModel contract by interpreting each Matrix row as a flattened (timesteps x features) sample.
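To make the flattened-row convention concrete, here is a minimal sketch of how a (timesteps x features) sequence could be packed into one row. It assumes timestep-major ordering (index = t * Features + f), which the text above suggests but does not state outright; the SequenceLayout helper is hypothetical and not part of the library.

```csharp
using System;

public static class SequenceLayout
{
    // Flattens a [timeSteps, features] sequence into a single row,
    // ASSUMING timestep-major order: index = t * features + f.
    public static double[] Flatten(double[,] sequence)
    {
        int timeSteps = sequence.GetLength(0);
        int features = sequence.GetLength(1);
        var row = new double[timeSteps * features];
        for (int t = 0; t < timeSteps; t++)
            for (int f = 0; f < features; f++)
                row[t * features + f] = sequence[t, f];
        return row;
    }

    // Reads feature f at timestep t back out of a flattened row.
    public static double At(double[] row, int features, int t, int f)
        => row[t * features + f];
}
```

Under this layout a model configured with TimeSteps = 128 and Features = 1 expects rows of 128 columns, and in general TimeSteps * Features columns.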
Currently available models:
- CNN1DClassifier in CSharpNumerics.ML.Sequence.Models.Classification
- CNN1DRegressor in CSharpNumerics.ML.Sequence.Models.Regression
- LSTMClassifier in CSharpNumerics.ML.Sequence.Models.Classification
- LSTMRegressor in CSharpNumerics.ML.Sequence.Models.Regression
- BiLSTMClassifier in CSharpNumerics.ML.Sequence.Models.Classification
- BiLSTMRegressor in CSharpNumerics.ML.Sequence.Models.Regression
- TCNClassifier in CSharpNumerics.ML.Sequence.Models.Classification
- TCNRegressor in CSharpNumerics.ML.Sequence.Models.Regression
Current sequence infrastructure:
- ISequenceModel in CSharpNumerics.ML.Sequence.Interfaces
- ConvolutionPaddingMode in CSharpNumerics.ML.Sequence.Enums - Valid, Same, Causal
- Conv1DLayer in CSharpNumerics.ML.Sequence.Layers - supports causal padding and dilation
- MaxPool1DLayer in CSharpNumerics.ML.Sequence.Layers
- GlobalAvgPool1DLayer in CSharpNumerics.ML.Sequence.Layers
- FlattenLayer in CSharpNumerics.ML.Sequence.Layers
- LSTMLayer in CSharpNumerics.ML.Sequence.Layers
- BiLSTMLayer in CSharpNumerics.ML.Sequence.Layers
- ActivationLayer in CSharpNumerics.ML.Sequence.Layers - parameter-free pointwise activation
- DropoutLayer in CSharpNumerics.ML.Sequence.Layers - inverted dropout (train only)
- BatchNorm1DLayer in CSharpNumerics.ML.Sequence.Layers - channel-wise normalisation
- ResidualBlock in CSharpNumerics.ML.Sequence.Layers - TCN residual block (causal + dilated)
- TCNBlock in CSharpNumerics.ML.Sequence.Layers - exponentially dilated residual stack

CNN1D Architecture
Default CNN1D architecture:
Conv1D -> GlobalAvgPool -> Dense(hidden) -> Dense(output)
Optional variants:
- UseMaxPooling = true inserts MaxPool1DLayer after the convolution.
- UseGlobalAveragePooling = false switches to FlattenLayer before the dense projection.
Shared CNN1D hyperparameters:
- TimeSteps
- Features
- Filters
- KernelSize
- ConvStride
- Padding (Same, Valid)
- UseMaxPooling
- PoolSize
- PoolStride
- UseGlobalAveragePooling
- HiddenUnits
- LearningRate
- Epochs
- BatchSize
- Activation

Additional regression hyperparameters:
- L2

LSTM Architecture
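The interaction between TimeSteps, KernelSize, ConvStride, Padding and the pooling settings determines how many timesteps reach the dense layers. The sketch below uses the conventional output-length formulas for Valid and Same padding and for unpadded pooling. These formulas are an assumption about the layers' behaviour, not something read from the library source, and Conv1DShapes is a hypothetical helper.

```csharp
using System;

public static class Conv1DShapes
{
    // Output length of a 1D convolution under the usual conventions
    // (assumed here): Same -> ceil(T / s), Valid -> floor((T - k) / s) + 1.
    public static int ConvOutputLength(int timeSteps, int kernelSize, int stride, bool samePadding)
    {
        if (stride <= 0)
            throw new ArgumentOutOfRangeException(nameof(stride));
        if (samePadding)
            return (timeSteps + stride - 1) / stride;
        if (timeSteps < kernelSize)
            throw new ArgumentException("Valid padding needs timeSteps >= kernelSize.");
        return (timeSteps - kernelSize) / stride + 1;
    }

    // Output length of an unpadded pooling window: floor((n - p) / s) + 1.
    public static int PoolOutputLength(int length, int poolSize, int poolStride)
    {
        if (poolStride <= 0 || length < poolSize)
            throw new ArgumentException("Pooling window does not fit the input.");
        return (length - poolSize) / poolStride + 1;
    }
}
```

For instance, under these assumptions TimeSteps = 128 with KernelSize = 5 and stride 1 stays at 128 steps with Same padding and shrinks to 124 with Valid padding.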
Default LSTM architecture:
LSTMLayer(returnSequences=false) -> Dense(hidden) -> Dense(output)
The LSTM layer implements the standard four-gate equations (forget, input, output, cell candidate) with full BPTT and gradient clipping. Key features:
- Forget gate bias initialized to 1.0 to reduce vanishing gradients
- Configurable ClipNorm for gradient clipping (default: 5.0)
- returnSequences=false outputs only the final hidden state
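For reference, the standard four-gate formulation is usually written as below. The symbols W, U and b are notation chosen for this sketch, not identifiers from the library, and the library's internal parameter layout may differ.

```latex
\begin{aligned}
f_t &= \sigma(W_f x_t + U_f h_{t-1} + b_f) && \text{(forget gate)} \\
i_t &= \sigma(W_i x_t + U_i h_{t-1} + b_i) && \text{(input gate)} \\
o_t &= \sigma(W_o x_t + U_o h_{t-1} + b_o) && \text{(output gate)} \\
\tilde{c}_t &= \tanh(W_c x_t + U_c h_{t-1} + b_c) && \text{(cell candidate)} \\
c_t &= f_t \odot c_{t-1} + i_t \odot \tilde{c}_t \\
h_t &= o_t \odot \tanh(c_t)
\end{aligned}
```

With b_f initialized to 1.0 and the other pre-activation terms near zero, f_t starts at about sigma(1) = 0.73, so most of the previous cell state is carried forward early in training.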
LSTM hyperparameters:
- TimeSteps
- Features
- HiddenSize - LSTM hidden/cell state dimension
- HiddenUnits - optional dense layer after LSTM
- ClipNorm - max gradient norm (default: 5.0)
- LearningRate
- Epochs
- BatchSize
- Activation
- L2

Bi-LSTM Architecture
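ClipNorm is described as a maximum gradient norm. A minimal sketch of norm-based clipping is shown below. Whether the library clips each parameter tensor separately or all gradients jointly is not stated here, so treat GradientClipping as a hypothetical illustration of the general technique only.

```csharp
using System;

public static class GradientClipping
{
    // Rescales the gradient in place so its L2 norm is at most clipNorm.
    // Returns the norm measured before clipping.
    public static double ClipByNorm(double[] gradient, double clipNorm)
    {
        double sumSquares = 0.0;
        foreach (double g in gradient)
            sumSquares += g * g;
        double norm = Math.Sqrt(sumSquares);

        if (norm > clipNorm)
        {
            // Same direction, shorter length.
            double scale = clipNorm / norm;
            for (int i = 0; i < gradient.Length; i++)
                gradient[i] *= scale;
        }
        return norm;
    }
}
```

Clipping by norm keeps the gradient's direction and only limits its length, which is why it is a common guard against exploding gradients in backpropagation through time.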
Default Bi-LSTM architecture:
BiLSTMLayer(returnSequences=false) -> Dense(hidden) -> Dense(output)
The Bi-LSTM layer composes two LSTMLayer instances - one processing the input forwards, one backwards - and concatenates their hidden states per timestep so that output dimension = 2 x HiddenSize.
When returnSequences=false, the output is [h_fwd_T | h_bwd_1].
Bi-LSTM hyperparameters are identical to LSTM (same HiddenSize, ClipNorm, etc.). The dense layer automatically adapts to the 2 x HiddenSize input width.
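The final-state output [h_fwd_T | h_bwd_1] can be sketched as a plain concatenation. The backward pass finishes at the first timestep, so its state at t = 1 is the one that has seen the whole sequence. BiLstmOutput is a hypothetical helper, and the arrays are assumed to be indexed by input timestep.

```csharp
using System;

public static class BiLstmOutput
{
    // forward[t] and backward[t] are hidden states aligned to input timestep t.
    public static double[] FinalState(double[][] forward, double[][] backward)
    {
        double[] hFwdLast = forward[forward.Length - 1]; // forward pass ends at t = T
        double[] hBwdFirst = backward[0];                // backward pass ends at t = 1
        var output = new double[hFwdLast.Length + hBwdFirst.Length];
        Array.Copy(hFwdLast, 0, output, 0, hFwdLast.Length);
        Array.Copy(hBwdFirst, 0, output, hFwdLast.Length, hBwdFirst.Length);
        return output;
    }
}
```

With HiddenSize = 32 in each direction, the concatenated vector has 64 entries, matching the 2 x HiddenSize width described above.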
TCN Architecture
A Temporal Convolutional Network stacks dilated causal convolutions inside residual blocks, giving an exponentially growing receptive field while preserving sequence length. Unlike an RNN it processes all timesteps in parallel, and unlike a plain CNN its causal padding guarantees no leakage from future timesteps.
Default TCN architecture:
TCNBlock -> GlobalAvgPool -> Dense(hidden) -> Dense(output)
Each TCNBlock is a stack of ResidualBlocks whose dilation doubles per level (1, 2, 4, 8, ...). A residual block is:
Conv1D(causal, dilated) -> BatchNorm -> ReLU -> Dropout
-> Conv1D(causal, dilated) -> BatchNorm -> Dropout -> (+ skip) -> ReLU
The skip connection uses a 1x1 convolution when the channel count changes, otherwise an identity. The receptive field of an L-level block with kernel size k is 1 + 2(k-1)(2^L - 1) timesteps; for example, 8 levels with k = 3 cover 1021 timesteps.
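The receptive-field formula follows from summing the reach of each level: every residual block holds two causal convolutions, and each one adds (k-1) times its dilation. The sketch below accumulates that sum level by level. TcnReceptiveField is a hypothetical helper for checking the arithmetic, not the library's ReceptiveField property.

```csharp
using System;

public static class TcnReceptiveField
{
    // Receptive field of a single dilated convolution: (k - 1) * d + 1.
    public static int Conv(int kernelSize, int dilation)
        => (kernelSize - 1) * dilation + 1;

    // Receptive field of a stack of residual blocks with two convolutions
    // per level and dilation 2^level: 1 + 2(k - 1)(2^L - 1).
    public static int Block(int kernelSize, int levels)
    {
        int field = 1;
        for (int level = 0; level < levels; level++)
        {
            int dilation = 1 << level;
            field += 2 * (kernelSize - 1) * dilation;
        }
        return field;
    }
}
```

By the same formula, 4 levels with k = 3 cover 61 timesteps, which is worth comparing against TimeSteps when choosing Levels.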
TCN hyperparameters:
- TimeSteps
- Features
- Channels - channel width of every residual block
- KernelSize
- Levels - number of residual blocks (dilation doubles each level)
- DropoutRate
- HiddenUnits - optional dense layer after global pooling
- Activation
- LearningRate
- Epochs
- BatchSize
- L2
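DropoutRate feeds the DropoutLayer, which is described as inverted dropout applied during training only. A minimal sketch of that technique follows. InvertedDropout is a hypothetical stand-alone function and does not reflect the layer's actual signature or random-number handling.

```csharp
using System;

public static class InvertedDropout
{
    // Training: zero each value with probability `rate` and scale survivors
    // by 1 / (1 - rate) so the expected activation is unchanged.
    // Inference: return a copy of the input untouched.
    public static double[] Apply(double[] input, double rate, bool training, Random rng)
    {
        if (rate < 0.0 || rate >= 1.0)
            throw new ArgumentOutOfRangeException(nameof(rate));

        var output = (double[])input.Clone();
        if (!training || rate == 0.0)
            return output;

        double keepScale = 1.0 / (1.0 - rate);
        for (int i = 0; i < output.Length; i++)
            output[i] = rng.NextDouble() < rate ? 0.0 : output[i] * keepScale;
        return output;
    }
}
```

Because the scaling happens at training time, no correction is needed at inference, which is what makes the variant "inverted".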
The dilated/causal layers are also usable standalone for custom architectures:
using CSharpNumerics.ML.Sequence.Layers;
using CSharpNumerics.ML.Sequence.Enums;
using CSharpNumerics.ML.Enums;

// Causal, dilated convolution: output[t] sees only inputs at or before t
var conv = new Conv1DLayer(
    inputChannels: 1, filters: 8, kernelSize: 3, stride: 1,
    padding: ConvolutionPaddingMode.Causal, activation: ActivationType.Linear, seed: 1, dilation: 4);
int rf = conv.ReceptiveField; // (3-1)*4 + 1 = 9

var tcn = new TCNBlock(inputChannels: 1, channels: 16, kernelSize: 3, levels: 8);
int reach = tcn.ReceptiveField; // 1021 timesteps

Example with SupervisedExperiment (CNN1D):
using CSharpNumerics.ML;
using CSharpNumerics.ML.Enums;
using CSharpNumerics.ML.Experiment;
using CSharpNumerics.ML.Sequence.Models.Classification;

var result = SupervisedExperiment
    .For(X, y)
    .WithGrid(new PipelineGrid()
        .AddModel<CNN1DClassifier>(g => g
            .Add("TimeSteps", 128)
            .Add("Features", 1)
            .Add("Filters", 8)
            .Add("KernelSize", 5)
            .Add("HiddenUnits", 16)
            .Add("LearningRate", 0.01)
            .Add("Epochs", 200)
            .Add("BatchSize", 16)
            .Add("Padding", CSharpNumerics.ML.Sequence.Enums.ConvolutionPaddingMode.Same)
            .Add("Activation", ActivationType.ReLU)))
    .WithCrossValidator(CrossValidatorConfig.KFold(folds: 5))
    .Run();

Example with SupervisedExperiment (LSTM):
using CSharpNumerics.ML;
using CSharpNumerics.ML.Enums;
using CSharpNumerics.ML.Experiment;
using CSharpNumerics.ML.Sequence.Models.Classification;

var result = SupervisedExperiment
    .For(X, y)
    .WithGrid(new PipelineGrid()
        .AddModel<LSTMClassifier>(g => g
            .Add("TimeSteps", 128)
            .Add("Features", 1)
            .Add("HiddenSize", 32)
            .Add("HiddenUnits", 16)
            .Add("LearningRate", 0.001)
            .Add("Epochs", 200)
            .Add("BatchSize", 16)
            .Add("ClipNorm", 5.0)
            .Add("Activation", ActivationType.ReLU)))
    .WithCrossValidator(CrossValidatorConfig.KFold(folds: 5))
    .Run();

Example with SupervisedExperiment (TCN regressor):
using CSharpNumerics.ML;
using CSharpNumerics.ML.Enums;
using CSharpNumerics.ML.Experiment;
using CSharpNumerics.ML.Sequence.Models.Regression;

var result = SupervisedExperiment
    .For(X, y)
    .WithGrid(new PipelineGrid()
        .AddModel<TCNRegressor>(g => g
            .Add("TimeSteps", 128)
            .Add("Features", 1)
            .Add("Channels", 16)
            .Add("KernelSize", 3)
            .Add("Levels", 4) // dilations 1, 2, 4, 8
            .Add("DropoutRate", 0.1)
            .Add("HiddenUnits", 16)
            .Add("LearningRate", 0.01)
            .Add("Epochs", 200)
            .Add("BatchSize", 16)))
    .WithCrossValidator(CrossValidatorConfig.KFold(folds: 5))
    .Run();

TimeSeries Integration - SequenceDataHelper
SequenceDataHelper bridges TimeSeries (from CSharpNumerics.Statistics.Data) to the sequence model pipeline by creating sliding-window samples.
using CSharpNumerics.ML.Sequence;
using CSharpNumerics.Statistics.Data;
// Load a light curve from CSV (columns: Time, Flux, Label)
var ts = TimeSeries.FromCsv("lightcurve.csv");
// Create windows of 128 timesteps, stride 1, using column 1 ("Label") as target
var (X, y) = SequenceDataHelper.CreateWindows(ts, windowSize: 128, labelColumnIndex: 1, stride: 1);
// X shape: [numWindows x 128] (1 feature: Flux)
// y shape: [numWindows] (label from last timestep in each window)
Overloads:
- CreateWindows(TimeSeries, windowSize, labelColumnIndex, stride) - extracts features and labels from a TimeSeries, excluding the label column from features.
- CreateWindows(double[][], double[], windowSize, stride) - works with raw column arrays when labels are computed separately.

Exoplanet-Transit Detection Example
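The windowing idea can be sketched for a single feature column as below, taking each window's label from its last timestep as in the example above. SlidingWindows is a hypothetical stand-in written for illustration. It is not the SequenceDataHelper implementation, and it returns jagged arrays rather than the library's Matrix type.

```csharp
using System;
using System.Collections.Generic;

public static class SlidingWindows
{
    // Cuts one feature series into overlapping windows. Each window's label
    // is the label at the window's last timestep.
    public static (double[][] X, double[] y) Create(
        double[] feature, double[] labels, int windowSize, int stride)
    {
        if (feature.Length != labels.Length)
            throw new ArgumentException("feature and labels must have equal length.");
        if (windowSize <= 0 || stride <= 0)
            throw new ArgumentException("windowSize and stride must be positive.");

        var rows = new List<double[]>();
        var targets = new List<double>();
        for (int start = 0; start + windowSize <= feature.Length; start += stride)
        {
            var window = new double[windowSize];
            Array.Copy(feature, start, window, 0, windowSize);
            rows.Add(window);
            targets.Add(labels[start + windowSize - 1]);
        }
        return (rows.ToArray(), targets.ToArray());
    }
}
```

A series of N points yields floor((N - windowSize) / stride) + 1 windows when N >= windowSize, so a larger stride trades sample count for less overlap between neighbouring windows.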
Synthetic Kepler-like light curve -> windowed samples -> CNN1D classification:
using CSharpNumerics.ML;
using CSharpNumerics.ML.Enums;
using CSharpNumerics.ML.Experiment;
using CSharpNumerics.ML.Sequence;
using CSharpNumerics.ML.Sequence.Models.Classification;
using CSharpNumerics.Statistics.Data;
// 1. Build a TimeSeries with flux and transit labels
var ts = new TimeSeries(times, new[] { flux, labels }, new[] { "Flux", "Label" });
// 2. Window into samples
var (X, y) = SequenceDataHelper.CreateWindows(ts, windowSize: 20, labelColumnIndex: 1, stride: 5);
// 3. Train a CNN1DClassifier with grid search
var result = SupervisedExperiment
    .For(X, y)
    .WithGrid(new PipelineGrid()
        .AddModel<CNN1DClassifier>(g => g
            .Add("TimeSteps", 20)
            .Add("Features", 1)
            .Add("Filters", 8)
            .Add("KernelSize", 5)
            .Add("HiddenUnits", 8)
            .Add("LearningRate", 0.02)
            .Add("Epochs", 150)
            .Add("BatchSize", 16)
            .Add("Activation", ActivationType.ReLU)))
    .WithCrossValidator(CrossValidatorConfig.KFold(folds: 3))
    .Run();

// result.BestScore -> transit detection accuracy

Neural Network Building Blocks
The neural-network stack now exposes reusable components for sequence-oriented architectures without changing the existing IModel contract.
Reusable dense/activation orchestration remains in CSharpNumerics.ML.NeuralNetwork, while sequence-specific layers and models live under CSharpNumerics.ML.Sequence.
Available infrastructure:
- Activations for reusable ReLU, Sigmoid, Tanh, Linear, and Softmax transforms
- ILayer for modular forward/backward layer composition
- DenseLayer for trainable fully connected sequence steps
- SequentialModel for stacking layers with shared forward/backward orchestration
These types are the reusable foundation for both generic feedforward models and the sequence-specific components in CSharpNumerics.ML.Sequence.
Example:
using CSharpNumerics.ML.Enums;
using CSharpNumerics.ML.NeuralNetwork;
using CSharpNumerics.ML.NeuralNetwork.Layers;
using CSharpNumerics.ML.Sequence.Models.Classification;
using CSharpNumerics.Numerics.Objects;
using CSharpNumerics.Numerics.Optimization.SingleObjective;

var model = new SequentialModel(
    new DenseLayer(4, 8, ActivationType.ReLU),
    new DenseLayer(8, 1, ActivationType.Linear));

var inputSequence = new[]
{
    new VectorN(new[] { 0.2, 0.4, 0.6, 0.8 })
};

VectorN prediction = model.ForwardSingle(inputSequence);
VectorN lossGradient = prediction - new VectorN(new[] { 1.0 });
model.BackwardSingle(lossGradient);
model.ApplyGradients(
    new GradientDescent(learningRate: 0.01),
    new GradientDescent(learningRate: 0.01),
    batchSize: 1);

var classifier = new CNN1DClassifier
{
    TimeSteps = 128,
    Features = 1,
    Filters = 8,
    KernelSize = 5,
    HiddenUnits = 16,
    LearningRate = 0.01,
    Epochs = 200
};