Ever feel like every week there’s a new AI library promising to save you hours of work? You’re not alone. I spend a fair bit of time trying things out so you don’t have to. In this post I summarise the most useful, practical C# tooling I’ve used or researched recently and show tiny, runnable examples so you can try them quickly.
Introduction / context
You can reasonably build modern AI features in C# today without stepping outside the .NET ecosystem. There are two common trajectories:
- Cloud-first: use Azure OpenAI / OpenAI with managed embeddings, completions and assistive services.
- Local / hybrid: host models locally with ONNX or TorchSharp for private inference and use a vector DB for RAG (retrieval-augmented generation).
Both are valid. Which you choose depends on latency, privacy, and cost.
I’ll be pragmatic and honest: I’ll point out where a library is mature and where it still feels experimental. Try the small examples below and adapt them as you see fit. I’ve used several of these on and off in projects for a while now.
Quick checklist: what I picked and why
- Microsoft.SemanticKernel: orchestration and higher-level agent patterns for .NET
- Microsoft.Extensions.AI and Azure.AI.OpenAI: first-class Azure + .NET integration
- OpenAI .NET SDKs (community and official): when you want to call OpenAI directly
- LangChain.NET / AutoGen.Net: if you want chains, tools and agent workflows in .NET
- ONNX Runtime / TorchSharp: run local models or optimise inference
- Vector DB clients (Redis OM / NRedisStack, Qdrant, Pinecone): store and search embeddings
Packages and how they fit together
Below I summarise each option with a quick example and a short note on where it fits.
Microsoft.SemanticKernel
What it is: a Microsoft-built SDK for composing prompts, chaining prompts with code, and building agent-like flows in .NET. It’s great when you want to call models but keep logic and tooling inside your application.
Why use it: opinionated patterns for building Copilot-style assistants and good integration with Azure and local models.
NuGet:
dotnet add package Microsoft.SemanticKernel
Illustrative usage (very small):
// Build a kernel and ask a model for a completion (illustrative, Semantic Kernel 1.x style)
using Microsoft.SemanticKernel;
var kernel = Kernel.CreateBuilder()
    .AddOpenAIChatCompletion("gpt-4o-mini", Environment.GetEnvironmentVariable("OPENAI_API_KEY")!)
    .Build();
var result = await kernel.InvokePromptAsync("Write a short summary of Dependency Injection in C#");
Console.WriteLine(result);
Notes: Semantic Kernel is richer than this snippet shows; it provides memory, functions and plan-based orchestration.
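Functions are worth a quick look because they are how you expose your own C# code to the model and to planners. A minimal sketch (SK 1.x style; TimePlugin is a hypothetical example, not part of the SDK):
using System.ComponentModel;
using Microsoft.SemanticKernel;
var builder = Kernel.CreateBuilder();
builder.Plugins.AddFromType<TimePlugin>(); // expose the plugin's methods as kernel functions
var kernel = builder.Build();
// A hypothetical plugin: attributed methods become callable "native functions"
public class TimePlugin
{
    [KernelFunction, Description("Returns the current UTC date and time")]
    public string GetUtcNow() => DateTime.UtcNow.ToString("O");
}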
Microsoft.Extensions.AI and Azure.AI.OpenAI (Azure-first)
What they are: Microsoft.Extensions.AI is a set of .NET abstractions for AI services; Azure.AI.OpenAI is the official Azure client for OpenAI-style models on Azure.
Why use them: if you host models with Azure (OpenAI or Azure AI Model Catalog), these packages give you a safe, .NET-idiomatic way to interact with models and embeddings.
NuGet:
dotnet add package Azure.AI.OpenAI
dotnet add package Microsoft.Extensions.AI
Embeddings example (Azure OpenAI via the Azure SDK):
using Azure;
using Azure.AI.OpenAI;
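// Note: this snippet targets the Azure.AI.OpenAI 1.x (beta) client; the newer 2.x client
// exposes AzureOpenAIClient and GetEmbeddingClient(...) instead, so check which version you have installed.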
var endpoint = new Uri("https://your-azure-openai-endpoint");
var client = new OpenAIClient(endpoint, new AzureKeyCredential(Environment.GetEnvironmentVariable("AZURE_OPENAI_KEY")));
var input = "The quick brown fox jumps over the lazy dog";
var embeddingsResponse = await client.GetEmbeddingsAsync("text-embedding-3-small", new EmbeddingsOptions(input));
var vector = embeddingsResponse.Value.Data[0].Embedding;
Console.WriteLine($"Vector length: {vector.Count}");
Check docs: Azure SDK method names and types evolve; consult the official Azure.AI.OpenAI docs for exact signatures and model IDs.
OpenAI .NET SDKs (official & community)
What they are: client libraries for calling OpenAI’s APIs from .NET. There’s an official library maintained for .NET and a few community alternatives.
Why use them: if you want direct OpenAI usage (not Azure) with idiomatic async C#.
NuGet (example):
dotnet add package OpenAI
Tiny example (HTTP-style flow):
// A lightweight, package-agnostic approach using HttpClient to call the REST API
using System.Net.Http;
using System.Net.Http.Headers;
using System.Net.Http.Json;
var http = new HttpClient();
http.DefaultRequestHeaders.Authorization = new AuthenticationHeaderValue("Bearer", Environment.GetEnvironmentVariable("OPENAI_API_KEY"));
// gpt-4o-mini is a chat model, so call the chat completions endpoint with a messages array
var payload = new
{
    model = "gpt-4o-mini",
    messages = new[] { new { role = "user", content = "Explain SOLID in plain English" } },
    max_tokens = 300
};
var res = await http.PostAsJsonAsync("https://api.openai.com/v1/chat/completions", payload);
var json = await res.Content.ReadAsStringAsync();
Console.WriteLine(json);
This is the most portable approach and helps you understand the underlying API when things go sideways.
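If you install the official OpenAI package instead, the equivalent call is shorter. A sketch against the 2.x ChatClient (check the package README for current signatures):
using OpenAI.Chat;
var chatClient = new ChatClient(model: "gpt-4o-mini", apiKey: Environment.GetEnvironmentVariable("OPENAI_API_KEY"));
ChatCompletion completion = await chatClient.CompleteChatAsync("Explain SOLID in plain English");
Console.WriteLine(completion.Content[0].Text);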
LangChain.NET and AutoGen.Net / Orchestrators
What they are: .NET ports / implementations of orchestration frameworks (LangChain patterns, AutoGen). They provide higher-order constructs like chains, tool-using agents, and memory management.
Why use them: if you want pre-built patterns for retrieval chains, QA assistants, or multi-step agent workflows.
Status note: some of these projects are early-stage; check maturity before committing to them in production.
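If you just want the flavour of the "chain" idea before adopting a framework, it reduces to composing async steps. A library-agnostic sketch (the delegates are stand-ins for your own retrieval, prompt-building and model-call code):
// Each step is a function; the chain threads state from one step to the next.
async Task<string> RunRetrievalChainAsync(
    string question,
    Func<string, Task<IReadOnlyList<string>>> retrieveDocsAsync,
    Func<string, IReadOnlyList<string>, string> buildPrompt,
    Func<string, Task<string>> completeAsync)
{
    var docs = await retrieveDocsAsync(question);   // step 1: retrieve context
    var prompt = buildPrompt(question, docs);       // step 2: assemble the prompt
    return await completeAsync(prompt);             // step 3: call the model
}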
ONNX Runtime and TorchSharp: run models locally
What they are: runtimes for running ML models locally in .NET. Microsoft.ML.OnnxRuntime is stable and great for optimised inference. TorchSharp exposes libtorch for .NET and is useful for running certain models (Llama-family support via community tooling).
NuGet:
dotnet add package Microsoft.ML.OnnxRuntime
dotnet add package TorchSharp
ONNX Runtime example (inference skeleton):
using Microsoft.ML.OnnxRuntime;
using Microsoft.ML.OnnxRuntime.Tensors;
using var session = new InferenceSession("path/to/model.onnx");
// Create input tensors (the name and shape must match the model's input definition), then run
var inputs = new List<NamedOnnxValue>
{
    NamedOnnxValue.CreateFromTensor("input", new DenseTensor<float>(new[] { 1, 3, 224, 224 }))
};
using var results = session.Run(inputs);
// Handle outputs: each result exposes a name and a tensor you can read with AsTensor<T>()
Notes: converting large LLMs to ONNX and getting good performance takes work; for many workloads (embeddings, smaller specialised models) ONNX is an excellent choice.
Vector stores: Redis (NRedisStack / Redis OM), Qdrant, Pinecone
What they are: specialised stores for high-dimensional vectors with similarity search. Redis offers modules for vector search; Qdrant and Pinecone are popular vector-first DBs with .NET clients.
Why use them: fast similarity search for RAG, semantic search and recommendations.
Quick install examples:
dotnet add package NRedisStack
dotnet add package Qdrant.Client
dotnet add package Pinecone.Client
Example flow (conceptual):
- Create an embedding for a document with Azure.AI.OpenAI or the OpenAI SDK
- Upsert the vector into your vector DB (Redis / Qdrant / Pinecone)
- Query nearest neighbours for a user query and use the retrieved documents as prompt context
Small conceptual snippet for upsert/query (pseudo-C#):
var embedding = await GetEmbeddingAsync(text);
await vectorDb.UpsertAsync(id, embedding, metadata);
var neighbours = await vectorDb.SearchAsync(queryEmbedding, topK:5);
Each vector store has its own API; check the client docs for exact method names and types.
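To make that concrete, here is roughly what upsert and search look like with the official Qdrant.Client gRPC client. A sketch only: the collection name, vector size and IDs are illustrative, GetEmbeddingAsync is the same placeholder as above, and signatures can differ between client versions.
using Qdrant.Client;
using Qdrant.Client.Grpc;
var qdrant = new QdrantClient("localhost", 6334); // gRPC port of a local Qdrant instance
// Create a collection sized for your embedding model (1536 dimensions is illustrative)
await qdrant.CreateCollectionAsync("docs", new VectorParams { Size = 1536, Distance = Distance.Cosine });
// Upsert one document vector together with its metadata
float[] embedding = await GetEmbeddingAsync("Important text to store");
await qdrant.UpsertAsync("docs", new List<PointStruct>
{
    new() { Id = 1ul, Vectors = embedding, Payload = { ["text"] = "Important text to store" } }
});
// Query the nearest neighbours for a user question
float[] queryEmbedding = await GetEmbeddingAsync("How do I do X?");
var hits = await qdrant.SearchAsync("docs", queryEmbedding, limit: 5);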
Practical snippets: a quick RAG example
Below is a compact end-to-end sketch of a retrieval-augmented generation flow using Azure OpenAI for embeddings, Redis (NRedisStack) for vectors and an LLM for completion.
// 1. Create embedding for a document
var openAiClient = new OpenAIClient(new Uri(azureEndpoint), new AzureKeyCredential(azureKey));
var embedResp = await openAiClient.GetEmbeddingsAsync("text-embedding-3-small", new EmbeddingsOptions("Important text to store"));
var docVector = embedResp.Value.Data[0].Embedding;
// 2. Upsert into Redis (pseudocode; follow client docs)
// await redisClient.VectorUpsertAsync("docs_index", docId, docVector, metadata);
// 3. For a query: create query embedding and search
var qEmb = (await openAiClient.GetEmbeddingsAsync("text-embedding-3-small", new EmbeddingsOptions("How do I do X?"))).Value.Data[0].Embedding;
// var hits = await redisClient.VectorSearchAsync("docs_index", qEmb, topK:5);
// 4. Build a prompt using top hits and call the LLM for completion
// var context = string.Join("\n---\n", hits.Select(h => h.Metadata.text));
// var completion = await llmClient.CreateCompletionAsync(new { model = "gpt-4o-mini", prompt = $"Use the following context:\n{context}\nAnswer: {userQuery}" });
This is intentionally compact; the real work is in metadata, chunking strategy, prompt engineering and handling model costs and latency.
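As a tiny example of the chunking part, a naive fixed-size splitter with character overlap looks like this (the sizes are illustrative; real pipelines usually split on token or sentence boundaries):
// Naive fixed-size chunking with overlap; chunkSize and overlap are character counts here.
static IEnumerable<string> ChunkText(string text, int chunkSize = 1000, int overlap = 200)
{
    for (var start = 0; start < text.Length; start += chunkSize - overlap)
    {
        var length = Math.Min(chunkSize, text.Length - start);
        yield return text.Substring(start, length);
        if (start + length >= text.Length)
            yield break;
    }
}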
Where this fits in a real project
- Start small: add embeddings + a vector store and a simple retrieval chain. That gives immediate value (semantic search, FAQ bots).
- If you’re Azure-centric, Microsoft.Extensions.AI + Azure.AI.OpenAI + Semantic Kernel provide a smooth experience and patterns for production apps.
- If you need privacy and control, invest time in ONNX/TorchSharp workflows and host models locally. Expect engineering effort.
Edge cases to watch for:
- Cost and rate limits: embeddings and LLM calls add up; cache aggressively (see the sketch after this list).
- Vector freshness: decide when to re-embed documents and how to version vectors.
- Hallucination: always verify critical outputs and provide provenance (metadata) from retrieved sources.
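Caching can be as simple as memoising embeddings by input text. A minimal sketch using MemoryCache from Microsoft.Extensions.Caching.Memory (GetEmbeddingAsync is again a placeholder for your real embedding call):
using Microsoft.Extensions.Caching.Memory;
// Memoise embeddings by input text so repeated documents and queries don't hit the API again
var cache = new MemoryCache(new MemoryCacheOptions());
async Task<float[]> GetEmbeddingCachedAsync(string text) =>
    (await cache.GetOrCreateAsync(text, async entry =>
    {
        entry.SlidingExpiration = TimeSpan.FromHours(12); // tune to your re-embedding policy
        return await GetEmbeddingAsync(text);
    }))!;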
Comparison: cost, latency, privacy and maturity
Below is a compact table to help you pick a stack based on the constraints that matter to you.
| Stack | Cost | Latency | Privacy | Maturity |
|---|---|---|---|---|
| Cloud (Azure OpenAI / OpenAI) | Medium–High: per-call charges; pay-as-you-go | Low–Medium: depends on region and model size | Low–Medium: provider processes data; enterprise isolation possible | High: stable SDKs and SLAs |
| Cloud + Semantic Kernel / Microsoft.Extensions.AI | Medium–High: underlying model costs remain | Low–Medium: small orchestration overhead | Low–Medium: same provider caveats; better integration on Azure | High: Microsoft-supported patterns |
| Managed Vector DBs (Pinecone, Qdrant SaaS) | Medium: storage and query billing | Low: optimised for similarity search | Medium: vendor hosts data; check residency policies | Medium–High: Pinecone mature; Qdrant rapidly maturing |
| Self-hosted Vector DBs (Redis, Qdrant self-hosted) | Low–Medium: infra and ops costs | Low: fast when deployed near your app | High: you control residency and access | High for Redis; Qdrant adoption growing |
| Local models (ONNX Runtime, TorchSharp) | Low–Medium: no per-call fees; hardware costs apply | Very low with GPU; slower on CPU | Very high: data stays on your infra | Medium: ONNX stable; local LLMs need engineering |
| Orchestration / Chains (LangChain.NET, AutoGen.Net) | Varies: depends on model and storage | Varies: orchestration adds overhead | Varies: depends on hosting | Early–Medium: .NET ports maturing |
Quick recommendations:
- Production stability, minimal ops: choose cloud-first (Azure OpenAI + Semantic Kernel) and pair it with a managed vector DB.
- Privacy-first workloads: invest in local models and self-hosted vector stores (ONNX/TorchSharp + Redis/Qdrant).
- Middle ground: self-host your vector DB, use cloud models for heavy inference, and add caching and throttling to control costs.
Concluding remarks
There’s no single correct stack. If you want to move quickly and are comfortable with the cloud, start with Azure + Semantic Kernel. If you need to run local models for privacy, look into ONNX and TorchSharp and combine them with a vector store.