Optimization

🔌 Interfaces

| Interface | Key method | Purpose |
|---|---|---|
| `IOptimizer` | `VectorN Step(VectorN parameters, VectorN gradient)` | Single gradient-based update step |
| `IObjectiveFunction` | `double Evaluate(double[] x)`, `double[] Gradient(double[] x)` | Objective + gradient for `Minimizer` |
| `IConvergenceCriterion` | `bool HasConverged(int iteration, double loss, double gradNorm)` | Termination check |

🎯 Single-Objective Optimisers

GradientDescent — vanilla, momentum, Nesterov

var gd = new GradientDescent(learningRate: 0.01, momentum: 0.9, nesterov: true);

VectorN w = new VectorN(new double[] { 0, 0, 0 });
VectorN grad = ComputeGradient(w);
w = gd.Step(w, grad); // one update step
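The three variants differ only in how the velocity feeds into the update. A minimal Python sketch of the update rule (illustrative, not the library's implementation):

```python
import numpy as np

def momentum_step(w, grad, velocity, lr=0.01, momentum=0.9, nesterov=True):
    """One gradient-descent update with momentum.

    Vanilla:   w <- w - lr * grad
    Momentum:  v <- momentum * v - lr * grad;  w <- w + v
    Nesterov:  same v, but w takes the look-ahead step
               w <- w + momentum * v - lr * grad
    """
    velocity = momentum * velocity - lr * grad
    if nesterov:
        w = w + momentum * velocity - lr * grad
    else:
        w = w + velocity
    return w, velocity

# Minimise f(w) = ||w||^2 (gradient 2w): the iterates are driven to the origin
w = np.array([1.0, -2.0])
v = np.zeros_like(w)
for _ in range(300):
    w, v = momentum_step(w, 2 * w, v)
```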

Adam / AdamW — adaptive moment estimation

var adam = new Adam(learningRate: 0.001, beta1: 0.9, beta2: 0.999);
w = adam.Step(w, grad);

// AdamW (decoupled weight decay)
var adamw = new Adam(learningRate: 0.001, weightDecay: 0.01, decoupledWeightDecay: true);
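For reference, the Adam/AdamW update itself is only a few lines. A hedged Python sketch of the rule (the parameter names mirror the constructor above, but this is illustrative code, not the library source):

```python
import numpy as np

def adam_step(w, grad, m, v, t, lr=0.001, beta1=0.9, beta2=0.999,
              eps=1e-8, weight_decay=0.0, decoupled=False):
    """One Adam step at iteration t (1-based); decoupled=True gives AdamW."""
    if weight_decay and not decoupled:
        grad = grad + weight_decay * w           # classic L2: decay folded into gradient
    m = beta1 * m + (1 - beta1) * grad           # first-moment (mean) EMA
    v = beta2 * v + (1 - beta2) * grad ** 2      # second-moment EMA
    m_hat = m / (1 - beta1 ** t)                 # bias correction for zero init
    v_hat = v / (1 - beta2 ** t)
    w = w - lr * m_hat / (np.sqrt(v_hat) + eps)
    if weight_decay and decoupled:
        w = w - lr * weight_decay * w            # AdamW: decay bypasses the adaptive scaling
    return w, m, v

# Drive f(w) = ||w||^2 towards its minimum
w = np.array([1.0, -1.0])
m, v = np.zeros_like(w), np.zeros_like(w)
for t in range(1, 3001):
    w, m, v = adam_step(w, 2 * w, m, v, t, lr=0.01)
```

The decoupling matters because in plain Adam the L2 term is rescaled by the adaptive denominator, so effective decay varies per parameter; AdamW applies it uniformly.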

CoordinateDescent — L1/L2 regularised least-squares

Minimises $\frac{1}{2}\|Xw - y\|^2 + \lambda_1\|w\|_1 + \frac{1}{2}\lambda_2\|w\|^2$ via cyclic coordinate updates with soft-thresholding. Used by Lasso and ElasticNet.

var cd = new CoordinateDescent(maxIterations: 1000, tolerance: 1e-7);
double[] weights = cd.Solve(X, y, l1: 0.1, l2: 0.01, skipBiasRegularisation: true);
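The per-coordinate update has a closed form: soft-threshold the partial correlation, then divide by the (L2-augmented) coordinate curvature. An illustrative Python version of the maths above, not the library's `Solve`:

```python
import numpy as np

def soft_threshold(z, t):
    """Prox of t*|.|_1: shrink towards zero, clip to zero inside [-t, t]."""
    return np.sign(z) * np.maximum(np.abs(z) - t, 0.0)

def coordinate_descent(X, y, l1=0.1, l2=0.01, iters=500):
    """Cyclic coordinate descent for 1/2||Xw - y||^2 + l1||w||_1 + 1/2 l2||w||^2."""
    w = np.zeros(X.shape[1])
    col_sq = (X ** 2).sum(axis=0)                # per-coordinate curvature x_j . x_j
    for _ in range(iters):
        for j in range(X.shape[1]):
            r = y - X @ w + X[:, j] * w[j]       # residual with coordinate j removed
            w[j] = soft_threshold(X[:, j] @ r, l1) / (col_sq[j] + l2)
    return w

rng = np.random.default_rng(0)
X = rng.normal(size=(50, 3))
y = X @ np.array([2.0, 0.0, -1.0]) + 0.01 * rng.normal(size=50)
w = coordinate_descent(X, y)   # recovers roughly [2, 0, -1]
```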

Minimizer — convenience runner

Ties an IOptimizer, IConvergenceCriterion, and IObjectiveFunction together:

var minimizer = new Minimizer(
    new Adam(learningRate: 0.01),
    new MaxIterationsOrTolerance(maxIterations: 5000, tolerance: 1e-8));

VectorN x0 = new VectorN(new double[] { 5, -3 });
VectorN xMin = minimizer.Minimize(myObjective, x0);

Console.WriteLine($"Converged in {minimizer.IterationsUsed} iterations, loss = {minimizer.FinalLoss:E3}");
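The runner's control flow amounts to "step until converged". A minimal Python sketch of that loop, with a plain gradient-descent step and a gradient-norm criterion standing in for the three interfaces:

```python
import numpy as np

def minimize(grad_fn, x0, step_fn, tol=1e-8, max_iters=5000):
    """Repeat optimizer steps until ||grad|| < tol or the budget runs out."""
    x = np.asarray(x0, dtype=float)
    for it in range(1, max_iters + 1):
        g = grad_fn(x)
        if np.linalg.norm(g) < tol:       # convergence criterion
            return x, it
        x = step_fn(x, g)                 # optimizer step
    return x, max_iters

# f(x) = (x0 - 5)^2 + (x1 + 3)^2, minimised from (0, 0) by plain GD
grad = lambda x: 2 * (x - np.array([5.0, -3.0]))
x_min, iters = minimize(grad, [0.0, 0.0], lambda x, g: x - 0.1 * g)
```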

⚙️ Strategies

EarlyStopping — patience-based training halt

var es = new EarlyStopping(patience: 10, minDelta: 1e-4);

for (int epoch = 0; epoch < maxEpochs; epoch++)
{
    double valLoss = Evaluate(model, validationData);
    if (es.Check(valLoss))
    {
        Console.WriteLine($"Early stop at epoch {epoch}");
        break;
    }
}
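The patience logic itself is a handful of lines. An illustrative Python equivalent (the exact `Check` semantics here are an assumption based on the usage above):

```python
class EarlyStopping:
    """Signal a stop once the loss has failed to improve by more than
    min_delta for `patience` consecutive checks."""

    def __init__(self, patience=10, min_delta=1e-4):
        self.patience, self.min_delta = patience, min_delta
        self.best = float("inf")
        self.bad_checks = 0

    def check(self, loss):
        if loss < self.best - self.min_delta:   # meaningful improvement
            self.best, self.bad_checks = loss, 0
        else:                                   # stagnation or regression
            self.bad_checks += 1
        return self.bad_checks >= self.patience

es = EarlyStopping(patience=3, min_delta=1e-4)
losses = [1.0, 0.5, 0.4, 0.4, 0.4, 0.4]
stopped_at = next(i for i, l in enumerate(losses) if es.check(l))  # index 5
```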

LearningRateSchedule — lr adjustment over training

var schedule = new LearningRateSchedule(
    initialLr: 0.01,
    type: LearningRateSchedule.ScheduleType.CosineAnnealing,
    decaySteps: 1000);

for (int step = 0; step < 1000; step++)
{
    double lr = schedule.GetLearningRate(step);
    // use lr in optimizer
}

| Schedule | Formula |
|---|---|
| Constant | $\eta_0$ |
| StepDecay | $\eta_0 \cdot \gamma^{\lfloor t/s \rfloor}$ |
| ExponentialDecay | $\eta_0 \cdot \gamma^{t/s}$ |
| InverseTimeDecay | $\eta_0 / (1 + \gamma t)$ |
| CosineAnnealing | $\eta_{\min} + \frac{1}{2}(\eta_0 - \eta_{\min})(1 + \cos(\pi t / T))$ |
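Two of these formulas written out in Python (illustrative, not the library code), using $\eta_0 = 0.01$:

```python
import math

def cosine_annealing(t, lr0=0.01, lr_min=0.0, T=1000):
    """eta_min + 1/2 (eta_0 - eta_min)(1 + cos(pi t / T))"""
    return lr_min + 0.5 * (lr0 - lr_min) * (1 + math.cos(math.pi * t / T))

def step_decay(t, lr0=0.01, gamma=0.5, s=100):
    """eta_0 * gamma^floor(t / s): with gamma = 0.5, halve the rate every s steps."""
    return lr0 * gamma ** (t // s)

# Cosine annealing sweeps smoothly from lr0 at t = 0 down to lr_min at t = T
```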

🏹 Multi-Objective: Pareto Front

ParetoSolution holds decision variables and objective values. ParetoFront provides non-dominated sorting and crowding distance computation.

// Create solutions (all objectives are minimised)
var solutions = new List<ParetoSolution>
{
    new ParetoSolution(new[] { 1.0 }, new[] { 0.2, 0.8 }),
    new ParetoSolution(new[] { 2.0 }, new[] { 0.5, 0.5 }),
    new ParetoSolution(new[] { 3.0 }, new[] { 0.9, 0.1 }),
    new ParetoSolution(new[] { 4.0 }, new[] { 0.6, 0.7 }), // dominated by (0.5, 0.5)
};

// Non-dominated sorting: partition into ranked fronts
List<List<ParetoSolution>> fronts = ParetoFront.NonDominatedSort(solutions);
// fronts[0] = Pareto-optimal, fronts[1] = next rank, ...

// Or just get the Pareto-optimal front directly
List<ParetoSolution> optimal = ParetoFront.GetFront(solutions);

// Crowding distance (higher = more isolated = more diverse)
ParetoFront.ComputeCrowdingDistance(optimal);
double cd = optimal[0].CrowdingDistance;

Dominance check:

bool dominates = solutionA.Dominates(solutionB);
// true if A ≤ B on all objectives and A < B on at least one
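Pareto dominance and the non-dominated filter are easy to state in code. A self-contained Python sketch of the definitions used above (all objectives minimised):

```python
def dominates(a, b):
    """a dominates b iff a <= b on every objective and a < b on at least one."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))

def pareto_front(points):
    """Keep the points that no other point dominates."""
    return [a for a in points
            if not any(dominates(b, a) for b in points if b is not a)]

# Same objective vectors as the example above: (0.6, 0.7) is dominated by (0.5, 0.5)
points = [(0.2, 0.8), (0.5, 0.5), (0.9, 0.1), (0.6, 0.7)]
front = pareto_front(points)
```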

🧬 Multi-Objective: NSGA-II

NSGA-II (Deb et al., 2002) is a population-based evolutionary algorithm that converges towards the Pareto front while maintaining solution diversity via crowding distance. It uses simulated binary crossover (SBX) and polynomial mutation.
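SBX itself samples children around the parents with a polynomial spread controlled by a distribution index. A Python sketch of the operator (eta = 15 is a conventional choice, not necessarily what this implementation uses):

```python
import numpy as np

def sbx_crossover(p1, p2, eta=15, rng=None):
    """Simulated Binary Crossover: larger eta keeps children closer to parents."""
    if rng is None:
        rng = np.random.default_rng()
    u = rng.random(len(p1))                       # one draw per variable
    beta = np.where(u <= 0.5,
                    (2 * u) ** (1 / (eta + 1)),
                    (1 / (2 * (1 - u))) ** (1 / (eta + 1)))
    c1 = 0.5 * ((1 + beta) * p1 + (1 - beta) * p2)
    c2 = 0.5 * ((1 - beta) * p1 + (1 + beta) * p2)
    return c1, c2

p1, p2 = np.array([0.0, 0.0]), np.array([1.0, 1.0])
c1, c2 = sbx_crossover(p1, p2, rng=np.random.default_rng(42))
# The pair mean is preserved: c1 + c2 == p1 + p2
```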

// Bi-objective: minimise f₁(x) = x² and f₂(x) = (x-2)²
var nsga2 = new NSGA2(
    evaluate: x => new[] { x[0] * x[0], (x[0] - 2) * (x[0] - 2) },
    numVariables: 1,
    lowerBounds: new[] { -5.0 },
    upperBounds: new[] { 5.0 },
    populationSize: 100,
    generations: 200,
    crossoverRate: 0.9,
    mutationRate: 0.1,
    seed: 42);

NSGA2Result result = nsga2.Run();

Console.WriteLine($"Pareto front: {result.FrontSize} solutions");
foreach (var sol in result.ParetoFront)
    Console.WriteLine($"  x = {sol.Variables[0]:F3}, f₁ = {sol.Objectives[0]:F3}, f₂ = {sol.Objectives[1]:F3}");

Multi-variable, multi-objective (ZDT1 benchmark):

int n = 30;
var nsga2 = new NSGA2(
    evaluate: x =>
    {
        double f1 = x[0];
        double g = 1 + 9 * Enumerable.Range(1, n - 1).Sum(i => x[i]) / (n - 1);
        double f2 = g * (1 - Math.Sqrt(f1 / g));
        return new[] { f1, f2 };
    },
    numVariables: n,
    lowerBounds: Enumerable.Repeat(0.0, n).ToArray(),
    upperBounds: Enumerable.Repeat(1.0, n).ToArray(),
    populationSize: 200,
    generations: 500,
    seed: 123);

NSGA2Result result = nsga2.Run();

// result.ParetoFront contains the approximated Pareto front
// result.FinalPopulation contains the full final population
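ZDT1's analytic Pareto front is known, which is what makes it a useful benchmark: with the tail variables at zero, g = 1 and the front is f₂ = 1 − √f₁. A quick Python check of the objective above:

```python
import math

def zdt1(x):
    """ZDT1 benchmark objectives; all x_i in [0, 1]."""
    f1 = x[0]
    g = 1 + 9 * sum(x[1:]) / (len(x) - 1)
    f2 = g * (1 - math.sqrt(f1 / g))
    return f1, f2

# A point on the true front: tail variables zero => g = 1, f2 = 1 - sqrt(f1)
f1, f2 = zdt1([0.25] + [0.0] * 29)
```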

| Parameter | Default | Description |
|---|---|---|
| `populationSize` | 100 | Number of individuals (rounded up to even) |
| `generations` | 200 | Evolution cycles |
| `crossoverRate` | 0.9 | SBX crossover probability per pair |
| `mutationRate` | 0.1 | Polynomial mutation probability per variable |
| `mutationScale` | 0.1 | Mutation range relative to variable bounds |
| `seed` | 42 | Random seed for reproducibility |