Ever feel like every week there’s a new AI library promising to save you hours of work? You’re not alone. I spend a fair bit of time trying things out so you don’t have to. In this post I summarise the most useful, practical C# tooling I’ve used or researched recently and show tiny, runnable examples so you can try them quickly.
Introduction / context
You can reasonably build modern AI features in C# today without stepping outside the .NET ecosystem. There are two common trajectories:
- Cloud-first: use Azure OpenAI / OpenAI with managed embeddings, completions and assistive services.
- Local / hybrid: host models locally with ONNX or TorchSharp for private inference and use a vector DB for RAG (retrieval-augmented generation).
Both are valid. Which you choose depends on latency, privacy, and cost.
I’ll be pragmatic and honest: I’ll point out where a library is mature and where it still feels experimental. Try the small examples below and adapt them as you see fit. I’ve used several of these on and off in projects for a while now.
Quick checklist: what I picked and why
- Microsoft.SemanticKernel: orchestration and higher-level agent patterns for .NET
- Microsoft.Extensions.AI and Azure.AI.OpenAI: first-class Azure + .NET integration
- OpenAI .NET SDKs (community and official): when you want to call OpenAI directly
- LangChain.NET / AutoGen.Net: if you want chains, tools and agent workflows in .NET
- ONNX Runtime / TorchSharp: run local models or optimise inference
- Vector DB clients (Redis OM / NRedisStack, Qdrant, Pinecone): store and search embeddings
Packages and how they fit together
Below I summarise each option with a quick example and a short note on where it fits.
Microsoft.SemanticKernel
What it is: a Microsoft-built SDK for composing prompts, chaining prompts with code, and building agent-like flows in .NET. It’s great when you want to call models but keep logic and tooling inside your application.
Why use it: opinionated patterns for building Copilot-style assistants and good integration with Azure and local models.
NuGet:
dotnet add package Microsoft.SemanticKernel
Illustrative usage (very small):
// Build a kernel and ask a model for a completion (illustrative, Semantic Kernel 1.x style)
using Microsoft.SemanticKernel;
var kernel = Kernel.CreateBuilder()
    .AddOpenAIChatCompletion("gpt-4o-mini", Environment.GetEnvironmentVariable("OPENAI_API_KEY")!)
    .Build();
var result = await kernel.InvokePromptAsync("Write a short summary of Dependency Injection in C#");
Console.WriteLine(result);
Notes: Semantic Kernel is richer than this snippet shows; it provides memory, functions and plan-based orchestration.
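Functions are worth a quick look because they are how you expose your own C# code to the model and to planners. A minimal sketch (SK 1.x style; TimePlugin is a hypothetical example, not part of the SDK):
using System.ComponentModel;
using Microsoft.SemanticKernel;
var builder = Kernel.CreateBuilder();
builder.Plugins.AddFromType<TimePlugin>(); // expose the plugin's methods as kernel functions
var kernel = builder.Build();
// A hypothetical plugin: attributed methods become callable "native functions"
public class TimePlugin
{
    [KernelFunction, Description("Returns the current UTC date and time")]
    public string GetUtcNow() => DateTime.UtcNow.ToString("O");
}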
Microsoft.Extensions.AI and Azure.AI.OpenAI (Azure-first)
What they are: Microsoft.Extensions.AI is a set of .NET abstractions for AI services; Azure.AI.OpenAI is the official Azure client for OpenAI-style models on Azure.
Why use them: if you host models with Azure (OpenAI or Azure AI Model Catalog), these packages give you a safe, .NET-idiomatic way to interact with models and embeddings.
NuGet:
dotnet add package Azure.AI.OpenAI
dotnet add package Microsoft.Extensions.AI
Embeddings example (Azure OpenAI via the Azure SDK):
using Azure;
using Azure.AI.OpenAI;
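// Note: this snippet targets the Azure.AI.OpenAI 1.x (beta) client; the newer 2.x client
// exposes AzureOpenAIClient and GetEmbeddingClient(...) instead, so check which version you have installed.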
var endpoint = new Uri("https://your-azure-openai-endpoint");
var client = new OpenAIClient(endpoint, new AzureKeyCredential(Environment.GetEnvironmentVariable("AZURE_OPENAI_KEY")));
var input = "The quick brown fox jumps over the lazy dog";
var embeddingsResponse = await client.GetEmbeddingsAsync("text-embedding-3-small", new EmbeddingsOptions(input));
var vector = embeddingsResponse.Value.Data[0].Embedding;
Console.WriteLine($"Vector length: {vector.Count}");
Check docs: Azure SDK method names and types evolve; consult the official Azure.AI.OpenAI docs for exact signatures and model IDs.
OpenAI .NET SDKs (official & community)
What they are: client libraries for calling OpenAI’s APIs from .NET. There’s an official library maintained for .NET and a few community alternatives.
Why use them: if you want direct OpenAI usage (not Azure) with idiomatic async C#.
NuGet (example):
dotnet add package OpenAI
Tiny example (HTTP-style flow):
// A lightweight, package-agnostic approach using HttpClient to call the REST API
using System.Net.Http;
using System.Net.Http.Headers;
using System.Net.Http.Json;
var http = new HttpClient();
http.DefaultRequestHeaders.Authorization = new AuthenticationHeaderValue("Bearer", Environment.GetEnvironmentVariable("OPENAI_API_KEY"));
// gpt-4o-mini is a chat model, so call the chat completions endpoint with a messages array
var payload = new
{
    model = "gpt-4o-mini",
    messages = new[] { new { role = "user", content = "Explain SOLID in plain English" } },
    max_tokens = 300
};
var res = await http.PostAsJsonAsync("https://api.openai.com/v1/chat/completions", payload);
var json = await res.Content.ReadAsStringAsync();
Console.WriteLine(json);
This is the most portable approach and helps you understand the underlying API when things go sideways.
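If you install the official OpenAI package instead, the equivalent call is shorter. A sketch against the 2.x ChatClient (check the package README for current signatures):
using OpenAI.Chat;
var chatClient = new ChatClient(model: "gpt-4o-mini", apiKey: Environment.GetEnvironmentVariable("OPENAI_API_KEY"));
ChatCompletion completion = await chatClient.CompleteChatAsync("Explain SOLID in plain English");
Console.WriteLine(completion.Content[0].Text);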
LangChain.NET and AutoGen.Net / Orchestrators
What they are: .NET ports / implementations of orchestration frameworks (LangChain patterns, AutoGen). They provide higher-order constructs like chains, tool-using agents, and memory management.
Why use them: if you want pre-built patterns for retrieval chains, QA assistants, or multi-step agent workflows.
Status note: some of these projects are early-stage; check maturity before committing to them in production.
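If you just want the flavour of the "chain" idea before adopting a framework, it reduces to composing async steps. A library-agnostic sketch (the delegates are stand-ins for your own retrieval, prompt-building and model-call code):
// Each step is a function; the chain threads state from one step to the next.
async Task<string> RunRetrievalChainAsync(
    string question,
    Func<string, Task<IReadOnlyList<string>>> retrieveDocsAsync,
    Func<string, IReadOnlyList<string>, string> buildPrompt,
    Func<string, Task<string>> completeAsync)
{
    var docs = await retrieveDocsAsync(question);   // step 1: retrieve context
    var prompt = buildPrompt(question, docs);       // step 2: assemble the prompt
    return await completeAsync(prompt);             // step 3: call the model
}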
ONNX Runtime and TorchSharp: run models locally
What they are: runtimes for running ML models locally in .NET. Microsoft.ML.OnnxRuntime is stable and great for optimised inference. TorchSharp exposes libtorch for .NET and is useful for running certain models (Llama-family support via community tooling).
NuGet:
dotnet add package Microsoft.ML.OnnxRuntime
dotnet add package TorchSharp
ONNX Runtime example (inference skeleton):
using Microsoft.ML.OnnxRuntime;
using Microsoft.ML.OnnxRuntime.Tensors;
using var session = new InferenceSession("path/to/model.onnx");
// Create input tensors (the name and shape must match the model's input definition), then run
var inputs = new List<NamedOnnxValue>
{
    NamedOnnxValue.CreateFromTensor("input", new DenseTensor<float>(new[] { 1, 3, 224, 224 }))
};
using var results = session.Run(inputs);
// Handle outputs: each result exposes a name and a tensor you can read with AsTensor<T>()
Notes: converting large LLMs to ONNX and getting good performance takes work; for many workloads (embeddings, smaller specialised models) ONNX is an excellent choice.
Vector stores: Redis (NRedisStack / Redis OM), Qdrant, Pinecone
What they are: specialised stores for high-dimensional vectors with similarity search. Redis offers modules for vector search; Qdrant and Pinecone are popular vector-first DBs with .NET clients.
Why use them: fast similarity search for RAG, semantic search and recommendations.
Quick install examples:
dotnet add package NRedisStack
dotnet add package Qdrant.Client
dotnet add package Pinecone.Client
Example flow (conceptual):
- Create an embedding for a document with Azure.AI.OpenAI or the OpenAI SDK
- Upsert the vector into your vector DB (Redis / Qdrant / Pinecone)
- Query nearest neighbours for a user query and use the retrieved documents as prompt context
Small conceptual snippet for upsert/query (pseudo-C#):
var embedding = await GetEmbeddingAsync(text);
await vectorDb.UpsertAsync(id, embedding, metadata);
var neighbours = await vectorDb.SearchAsync(queryEmbedding, topK:5);
Each vector store has its own API; check the client docs for exact method names and types.
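To make that concrete, here is roughly what upsert and search look like with the official Qdrant.Client gRPC client. A sketch only: the collection name, vector size and IDs are illustrative, GetEmbeddingAsync is the same placeholder as above, and signatures can differ between client versions.
using Qdrant.Client;
using Qdrant.Client.Grpc;
var qdrant = new QdrantClient("localhost", 6334); // gRPC port of a local Qdrant instance
// Create a collection sized for your embedding model (1536 dimensions is illustrative)
await qdrant.CreateCollectionAsync("docs", new VectorParams { Size = 1536, Distance = Distance.Cosine });
// Upsert one document vector together with its metadata
float[] embedding = await GetEmbeddingAsync("Important text to store");
await qdrant.UpsertAsync("docs", new List<PointStruct>
{
    new() { Id = 1ul, Vectors = embedding, Payload = { ["text"] = "Important text to store" } }
});
// Query the nearest neighbours for a user question
float[] queryEmbedding = await GetEmbeddingAsync("How do I do X?");
var hits = await qdrant.SearchAsync("docs", queryEmbedding, limit: 5);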
Practical snippets: a quick RAG example
Below is a compact end-to-end sketch of a retrieval-augmented generation flow using Azure OpenAI for embeddings, Redis (NRedisStack) for vectors and an LLM for completion.
// 1. Create embedding for a document
var openAiClient = new OpenAIClient(new Uri(azureEndpoint), new AzureKeyCredential(azureKey));
var embedResp = await openAiClient.GetEmbeddingsAsync("text-embedding-3-small", new EmbeddingsOptions("Important text to store"));
var docVector = embedResp.Value.Data[0].Embedding;
// 2. Upsert into Redis (pseudocode; follow client docs)
// await redisClient.VectorUpsertAsync("docs_index", docId, docVector, metadata);
// 3. For a query: create query embedding and search
var qEmb = (await openAiClient.GetEmbeddingsAsync("text-embedding-3-small", new EmbeddingsOptions("How do I do X?"))).Value.Data[0].Embedding;
// var hits = await redisClient.VectorSearchAsync("docs_index", qEmb, topK:5);
// 4. Build a prompt using top hits and call the LLM for completion
// var context = string.Join("\n---\n", hits.Select(h => h.Metadata.text));
// var completion = await llmClient.CreateCompletionAsync(new { model = "gpt-4o-mini", prompt = $"Use the following context:\n{context}\nAnswer: {userQuery}" });
This is intentionally compact; the real work is in metadata, chunking strategy, prompt engineering and handling model costs and latency.
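As a tiny example of the chunking part, a naive fixed-size splitter with character overlap looks like this (the sizes are illustrative; real pipelines usually split on token or sentence boundaries):
// Naive fixed-size chunking with overlap; chunkSize and overlap are character counts here.
static IEnumerable<string> ChunkText(string text, int chunkSize = 1000, int overlap = 200)
{
    for (var start = 0; start < text.Length; start += chunkSize - overlap)
    {
        var length = Math.Min(chunkSize, text.Length - start);
        yield return text.Substring(start, length);
        if (start + length >= text.Length)
            yield break;
    }
}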
Where this fits in a real project
- Start small: add embeddings + a vector store and a simple retrieval chain. That gives immediate value (semantic search, FAQ bots).
- If you’re Azure-centric, Microsoft.Extensions.AI + Azure.AI.OpenAI + Semantic Kernel provide a smooth experience and patterns for production apps.
- If you need privacy and control, invest time in ONNX/TorchSharp workflows and host models locally. Expect engineering effort.
Edge cases to watch for:
- Cost and rate limits: embeddings and LLM calls add up; cache aggressively (see the sketch after this list).
- Vector freshness: decide when to re-embed documents and how to version vectors.
- Hallucination: always verify critical outputs and provide provenance (metadata) from retrieved sources.
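Caching can be as simple as memoising embeddings by input text. A minimal sketch using MemoryCache from Microsoft.Extensions.Caching.Memory (GetEmbeddingAsync is again a placeholder for your real embedding call):
using Microsoft.Extensions.Caching.Memory;
// Memoise embeddings by input text so repeated documents and queries don't hit the API again
var cache = new MemoryCache(new MemoryCacheOptions());
async Task<float[]> GetEmbeddingCachedAsync(string text) =>
    (await cache.GetOrCreateAsync(text, async entry =>
    {
        entry.SlidingExpiration = TimeSpan.FromHours(12); // tune to your re-embedding policy
        return await GetEmbeddingAsync(text);
    }))!;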
Comparison: cost, latency, privacy and maturity
Below is a compact table to help you pick a stack based on the constraints that matter to you.
| Stack | Cost | Latency | Privacy | Maturity |
|---|---|---|---|---|
| Cloud (Azure OpenAI / OpenAI) | Medium–High: per-call charges; pay-as-you-go | Low–Medium: depends on region and model size | Low–Medium: provider processes data; enterprise isolation possible | High: stable SDKs and SLAs |
| Cloud + Semantic Kernel / Microsoft.Extensions.AI | Medium–High: underlying model costs remain | Low–Medium: small orchestration overhead | Low–Medium: same provider caveats; better integration on Azure | High: Microsoft-supported patterns |
| Managed Vector DBs (Pinecone, Qdrant SaaS) | Medium: storage and query billing | Low: optimised for similarity search | Medium: vendor hosts data; check residency policies | Medium–High: Pinecone mature; Qdrant rapidly maturing |
| Self-hosted Vector DBs (Redis, Qdrant self-hosted) | Low–Medium: infra and ops costs | Low: fast when deployed near your app | High: you control residency and access | High for Redis; Qdrant adoption growing |
| Local models (ONNX Runtime, TorchSharp) | Low–Medium: no per-call fees; hardware costs apply | Very low with GPU; slower on CPU | Very high: data stays on your infra | Medium: ONNX stable; local LLMs need engineering |
| Orchestration / Chains (LangChain.NET, AutoGen.Net) | Varies: depends on model and storage | Varies: orchestration adds overhead | Varies: depends on hosting | Early–Medium: .NET ports maturing |
Quick recommendations:
- Production stability, minimal ops: choose cloud-first (Azure OpenAI + Semantic Kernel) and pair it with a managed vector DB.
- Privacy-first workloads: invest in local models and self-hosted vector stores (ONNX/TorchSharp + Redis/Qdrant).
- Middle ground: self-host your vector DB, use cloud models for heavy inference, and add caching and throttling to control costs.
Concluding remarks
There’s no single correct stack. If you want to move quickly and are comfortable with the cloud, start with Azure + Semantic Kernel. If you need to run local models for privacy, look into ONNX and TorchSharp and combine them with a vector store.