Engineering·12 min read·Apr 20, 2026

Why We Chose AWS Bedrock Over OpenAI for Clinical Dental AI

Most dental AI tools default to OpenAI because it is the easiest button to press — the SDK is friendly, the docs are excellent, and a prototype is a weekend away. That choice looks defensible on a whiteboard right up until someone in legal asks for the Business Associate Agreement, the retention addendum, and the model-version pinning policy for a workload that will process thousands of PHI-laden clinical notes per day.

At NexV, the clinical AI infrastructure decision did not start with a model leaderboard. It started with a list of contractual and architectural guarantees we had to give every practice that handed us a patient record — and that list eliminated most options before any benchmark was run. What follows is the engineering rationale behind choosing AWS Bedrock over OpenAI's API for a HIPAA-grade dental AI platform processing real-time clinical decision support across practices and DSOs.

This is a specific capabilities argument for a specific constraint set — protected health information, sub-2-second latency ceilings, auditable model behavior, and compliance documentation that holds up under a third-party HIPAA audit. If you have read our coverage of dental AI data privacy and HIPAA compliance for AI systems, this is where theory becomes architecture.

Bedrock vs OpenAI for clinical dental AI: AWS Bedrock runs clinical dental AI entirely inside a HIPAA-covered VPC under a single AWS Business Associate Agreement, with zero data retention, audit-grade logging, and multi-model access through one API. OpenAI's API sends PHI to external servers, requires a separate enterprise BAA with narrower scope, retains inputs for 30 days by default, and offers GPT variants only.

Why did NexV choose AWS Bedrock over OpenAI?

NexV chose AWS Bedrock because it allows clinical dental AI to run entirely within a HIPAA-covered VPC under a single Business Associate Agreement, with zero data egress to third-party processors, multi-model flexibility for different clinical tasks, and deterministic model version pinning that prevents unannounced behavior changes in production clinical workflows.

Why the Default Choice Creates Compliance Risk

Most AI infrastructure discussions begin with an unstated assumption that the data being processed is either public or low-sensitivity. Dental clinical data is neither. A single patient interaction routinely generates clinical notes containing diagnoses, treatment histories, imaging findings, insurance identifiers, and demographics — all of which qualify as Protected Health Information under the HIPAA Privacy Rule and the HITECH Act.

Accordingly, the regulatory requirements are not theoretical. A dental AI vendor processing PHI without a BAA exposes every connected practice to HIPAA liability, and a vendor whose model provider retains training rights over clinical inputs creates a data-governance problem that no contract amendment can retroactively fix. These constraints filter the infrastructure options aggressively before anyone argues about model quality.

That said, a compliant architecture is a chain of guarantees across every hop the data takes, not a single checkbox. In practice, the weakest link is almost always the inference endpoint — that is where the richest clinical payloads leave the practice's trust boundary. Choosing the wrong endpoint silently undermines every other compliance control upstream and downstream of it.

What makes dental AI infrastructure different from general AI?

Dental AI processes protected health information on nearly every request — patient names, diagnoses, treatment plans, insurance identifiers — which demands HIPAA-compliant infrastructure with BAA coverage, encryption at rest and in transit, audit logging, and zero data retention by the model provider. General AI applications rarely face this combination of regulatory, latency, and data-residency constraints simultaneously.

What Bedrock Actually Gives You

AWS Bedrock is a managed service that provides API access to foundation models from multiple providers — Anthropic (Claude), Meta (Llama), Amazon (Titan), Cohere, and others — inside your existing AWS account, under the same BAA that already covers the rest of your HIPAA-eligible workload. The critical distinction, and the one that reshapes every downstream compliance conversation, is that Bedrock inference runs inside the AWS account boundary rather than on a third-party endpoint on the public internet.

Operationally, every inference request stays inside your trust boundary. No patient data traverses the public internet to reach a model provider's servers. No third party retains, logs, or trains on the clinical text you send. The model provider itself cannot see what you send, because the compute happens on AWS-managed infrastructure under the AWS BAA — not on infrastructure under the model maker's BAA, their terms, or their retention policy.

Bedrock Architecture Capabilities for Clinical AI

VPC-Native Deployment.

Bedrock endpoints are reachable via VPC PrivateLink, which means inference traffic never leaves your private network. Clinical notes travel from your application to the model and back without touching the public internet, which eliminates an entire category of data-exposure risks that external API calls introduce by definition.

Single BAA Coverage.

AWS offers a comprehensive BAA that covers Bedrock alongside the rest of your HIPAA-eligible AWS services. There is no separate BAA to negotiate with Anthropic, Meta, or any other model provider, because the AWS BAA already covers the entire inference path — which materially simplifies the compliance documentation that practices and DSOs have to produce when auditing their vendor chain.

Zero Data Retention.

Bedrock does not store input prompts or model outputs. There is no training on your data, no provider-side logging of clinical content, and no retention window — once inference completes, the only copy of that payload is the one you deliberately wrote to your own systems. This is a contractual guarantee under the AWS service terms rather than a policy that can quietly change with a terms-of-service update.

Multi-Model Access.

Bedrock exposes models from multiple providers through a single API, which lets us route Claude to clinical note generation and treatment-plan reasoning, specialized smaller models to structured data extraction, and Amazon Titan to embeddings — all within the same VPC, under the same BAA, with no additional vendor relationships, procurement cycles, or BAA negotiations to manage.

Where OpenAI Runs Out of Room

OpenAI builds excellent models. GPT-4 and its successors are capable across a wide range of tasks, clinical text processing among them, and we use them willingly in non-PHI contexts. The limitations we ran into are not about model quality; they are about the infrastructure and contractual framework wrapped around those models, and about how that framework interacts with a workload that has to defend itself in a HIPAA audit.

What are OpenAI's limitations for HIPAA-regulated dental AI?

OpenAI's primary limitations for clinical dental AI are architectural — data leaves your environment for inference on OpenAI-managed servers, BAA coverage requires a separate enterprise agreement with narrower scope than AWS, model versions change on a schedule that can shift clinical output formats without warning, and there is no VPC-native deployment option that keeps PHI inside your own network boundary.

Criterion	AWS Bedrock	OpenAI API
Data residency	Your VPC, your region, your account	OpenAI infrastructure (US-based)
BAA coverage	AWS BAA covers Bedrock + all HIPAA-eligible services	Separate BAA required, enterprise tier only
Data retention	Zero retention by default	30-day retention unless opted out via BAA
Model versioning	Pin exact model version indefinitely	Versions deprecated on schedule, silent updates
Network path	VPC PrivateLink (no public internet)	Public API endpoint (TLS encrypted)
Model selection	Claude, Llama, Titan, Cohere, others	GPT models only
Fine-tuning on dental data	Custom model training, data stays in your account	Fine-tuning available, data uploaded to OpenAI
Audit logging	CloudTrail + CloudWatch, native integration	API usage dashboard, limited granularity

The model-versioning row deserves its own paragraph, because the implications are easy to miss until they break production. In clinical AI, output consistency is a regulatory artifact, not a nice-to-have. When a model version changes underneath you, note format can shift, treatment-plan structure can reorder, and extracted entities can drift in ways that are visually subtle and functionally catastrophic for downstream validators. Bedrock lets us pin an exact model version indefinitely and promote the next one only after it clears our clinical validation suite — precisely the control plane a clinical workload needs.

The HIPAA Architecture in Practice

HIPAA compliance is not a checkbox; it is an architecture. Our Bedrock deployment follows a defense-in-depth model with multiple layers of protection around patient data at every stage of the inference pipeline, and every boundary is either inside the VPC or explicitly covered by the AWS BAA.

How Bedrock handles PHI: Under the AWS BAA, Bedrock processes clinical PHI inside your own VPC via PrivateLink, with AES-256 encryption at rest, TLS 1.3 in transit, and CloudTrail audit logging on every invocation. Inference runs on AWS-managed compute — model providers like Anthropic never receive, retain, or train on your patient data, and AWS itself does not use customer data for model training unless explicitly opted in.

HIPAA Compliance Layers

Encryption at rest (AES-256, AWS KMS)100%

Encryption in transit (TLS 1.3, PrivateLink)100%

Audit trail coverage (CloudTrail)100%

Data egress to third parties0%

Model provider data retention0%

IAM role-based access control100%

In practice, the AWS BAA covers the entire inference path — from API Gateway through the Lambda prompt builder to the Bedrock endpoint and back through the validation layer. There is no gap at which PHI exists outside a BAA-protected service, which is exactly the property an auditor looks for and exactly the property that a public-API model provider cannot offer by construction. For a deeper breakdown of how BAA coverage works across AI service chains, see our HIPAA compliance guide for dental AI.

Matching Models to Clinical Tasks

Different clinical tasks have different model requirements. Generating a treatment-plan narrative requires reasoning depth and long context. Extracting tooth numbers and CDT codes from a note requires precision, determinism, and sub-second latency. Embedding patient records for semantic search calls for a different architecture entirely. Bedrock lets us match models to tasks without managing multiple vendor relationships or threading PHI through multiple BAAs.

Our Model-to-Task Mapping

Claude (Anthropic) — Clinical Reasoning.

Claude handles tasks that require deep clinical reasoning: generating SOAP notes from ambient scribe transcripts, drafting treatment-plan narratives, explaining findings in patient-friendly language, and flagging contraindications against documented allergies and medications. Its long context window holds full patient history without truncation, and its instruction-following precision keeps output inside the structured formats our PMS integrations expect.

Amazon Titan — Embeddings and Semantic Search.

Titan Embeddings powers patient-record semantic search — when a clinician asks the system for similar cases to the patient in front of them, the retrieval layer returns records by clinical similarity rather than keyword match. Running embeddings on Titan inside the same VPC means vector representations of patient records never leave the AWS account boundary, which matters because embeddings, while compressed, are still PHI-adjacent.

Specialized Models — Structured Extraction.

For high-volume, low-latency tasks such as extracting tooth numbers, surface codes, and CDT codes from clinical text, we use smaller models fine-tuned on dental NLP extraction tasks. These models process an encounter note in under 200ms, which matters when a practice is running 80-plus encounters per day and the treatment-planning system needs structured input in real time.

With OpenAI, we would be limited to GPT variants across all of these tasks. Some workloads would be overserved by a frontier model, the cost curve would bend the wrong way, and embeddings would require a separate architecture because OpenAI embeddings run on OpenAI's infrastructure rather than inside our VPC. Moreover, a single-vendor stack eliminates the ability to route around a model-quality regression — and in clinical workloads, that routing headroom is its own form of uptime.

Cost at Dental-Practice Scale

Cost modeling for clinical AI is not a simple per-token calculation. The honest number has to account for full infrastructure — inference, data transfer, logging, encryption key management, compliance overhead — because any of those lines, ignored at planning, will later surface on a DSO's finance review and undercut the business case. We modeled costs for a mid-size DSO running 5,000 patient interactions per day across a representative mix of reasoning, extraction, and embedding tasks.

Monthly Cost Comparison: 5,000 Interactions/Day

Cost Component	AWS Bedrock	OpenAI API
Inference (clinical notes)	$2,800 - $3,400	$3,200 - $4,100
Inference (structured extraction)	$600 - $900	$1,100 - $1,600
Embeddings	$180 - $250	$120 - $180
VPC / PrivateLink	$150 - $200	N/A (public endpoint)
Compliance overhead (logging, KMS)	$300 - $400	$100 - $200 (less granular)
Total estimated monthly	$4,030 - $5,150	$4,520 - $6,080

Estimates based on an average note of 800-1,200 tokens input and 400-600 tokens output. Bedrock reflects Provisioned Throughput pricing; OpenAI reflects Enterprise-tier pricing with BAA.

The headline delta is modest — roughly 15% to 20% lower on Bedrock at this volume. After all, the savings on structured extraction are the most interesting line on that table, because we can route those requests to smaller, cheaper models for tasks that do not require frontier-model reasoning. The more important number the table cannot show is the cost of a compliance incident a public-internet inference path would expose you to — and that number is never modest.

Latency and Throughput Under Real Workloads

Clinical decision support is useless if it arrives after the clinician has already made the decision. Our latency budget is 2 seconds end-to-end for alerts and 5 seconds for full note generation. These are hard ceilings rather than soft targets — anything slower breaks the workflow, breaks clinician trust, and breaks the adoption curve that makes the platform economically viable.

What latency does clinical dental AI require?

Real-time clinical dental AI requires sub-2-second latency for decision-support alerts (contraindication warnings, missing-documentation flags) and sub-5-second latency for full clinical note generation. VPC-native deployment with Provisioned Throughput on AWS Bedrock delivers P95 latencies of 1.1 seconds for alerts and 3.8 seconds for note generation, compared to 1.4 seconds and 4.6 seconds through external API endpoints under comparable load.

Accordingly, Bedrock's Provisioned Throughput option earns its keep by guaranteeing dedicated compute capacity, which eliminates the latency spikes that shared endpoints exhibit at peak. When 200 practices finish morning huddles at 8:45 AM and simultaneously pull AI-generated day summaries, we cannot afford queuing delays, and a public multi-tenant endpoint cannot make that guarantee. Provisioned Throughput costs more per token, but it buys the consistency clinical workflows require — and in production, retry-avoidance alone more than pays for the uplift.

Production performance on Bedrock: Provisioned Throughput delivers P95 latencies of 1.1 seconds for clinical alerts and 3.8 seconds for full note generation, beating external API endpoints under peak-hour load while eliminating queuing-driven retries. After six months in production, routing simple extraction to smaller models cut average inference cost by 38 percent without sacrificing clinical accuracy benchmarks.

Fine-Tuning for Dental-Specific Terminology

Foundation models understand general medical language well, but dental terminology sits in a specialized corner of that space. CDT codes, tooth-numbering systems (Universal, Palmer, FDI), surface designations, periodontal classifications, and the abbreviation-heavy shorthand clinicians use in real notes all require domain adaptation before extraction accuracy clears the bar downstream validators expect.

In practice, Bedrock's custom model training keeps fine-tuning data inside the AWS account. The training dataset — de-identified dental clinical notes, annotated with entity labels and code mappings — never leaves the VPC. The resulting custom model is private to our account and cannot be accessed by other Bedrock customers or by AWS itself. This matters because dental training data is scarce and proprietary; routing it through a third-party fine-tuning pipeline would collapse the moat before the model ever served a request.

Fine-Tuning Impact on Dental Tasks

CDT code extraction (base model)78%

CDT code extraction (fine-tuned)93%

Periodontal classification (base model)71%

Periodontal classification (fine-tuned)91%

Clinical abbreviation expansion (base model)82%

Clinical abbreviation expansion (fine-tuned)96%

Fine-tuning on 12,000 de-identified dental clinical notes improved extraction accuracy by 12 to 20 percentage points across dental tasks, with the largest gains in periodontal classification where the base model frequently confused staging and grading terminology.

The Production Architecture We Actually Run

Diagrams on a slide deck can make any architecture look defensible. Implementation is where dental AI infrastructure either earns its compliance posture or quietly compromises it. Here is the actual production architecture we run for NexV's clinical AI platform, stage by stage, with the BAA boundary intact at every hop.

NexV Production Architecture

Step 1 — Ingestion Layer.

Clinical data enters through API Gateway with mutual TLS authentication. Each request is validated against the practice's API key and rate limits, and the payload — clinical note text, patient context, task type — is encrypted and placed on an SQS queue. Real-time decision-support requests bypass the queue and hit Lambda directly to preserve the sub-2-second latency budget.

Step 2 — Prompt Construction.

A Lambda function builds the prompt by combining clinical input with the system prompt for the task (note generation, code extraction, treatment planning, or clinical alert). Patient-history context is pulled from DynamoDB inside the VPC, and the full prompt is assembled inside the VPC boundary — no PHI touches any service outside the BAA-covered path at any point.

Step 3 — Bedrock Inference.

The prompt is sent to the Bedrock endpoint via VPC PrivateLink. The model — Claude for reasoning tasks, our fine-tuned specialized model for extraction tasks — processes the request and returns the output. Provisioned Throughput guarantees consistent latency regardless of platform-wide demand, and Bedrock Guardrails filter the output for obvious PII leaks or unsafe content before it hits the validation layer.

Step 4 — Post-Processing and Validation.

Model output passes through a validation layer that checks clinical consistency — are extracted tooth numbers valid (1-32 permanent, A-T primary)? Do CDT codes match documented surfaces? Is the treatment plan internally consistent with charted history? Failures trigger a retry or route the output to human review rather than letting clinically invalid content reach the provider's screen.

Step 5 — Delivery and Audit.

Validated output is returned to the practice's system and simultaneously logged to an encrypted S3 bucket for audit. CloudTrail captures every Bedrock invocation with timestamp, model version, token count, and request metadata. Clinical content is stored separately in the encrypted audit log, accessible only to authorized compliance personnel under IAM role-based access controls.

What Six Months in Production Has Taught Us

We have been running this architecture in production since October 2025. Six months at scale has surfaced a handful of lessons that no pre-deployment planning round could have predicted, because production clinical workloads break in ways that only production clinical workloads can break.

What lessons did NexV learn running Bedrock in production?

After six months processing clinical dental data at scale on AWS Bedrock, the key operational lessons are that model-version pinning is essential for clinical output stability, Provisioned Throughput eliminates the latency variability that breaks real-time workflows, prompt engineering outperforms fine-tuning for most clinical reasoning tasks, and the validation layer catches more clinically meaningful errors than the model itself introduces.

Production Lessons

Lesson 1 — Model Version Pinning Is Non-Negotiable.

A Claude model update once changed how the model formatted periodontal pocket-depth tables and broke our downstream parser in a way that was visually subtle and functionally catastrophic. Version pinning on Bedrock means we test every new version against our clinical validation suite before promoting it, and we run it in shadow mode for 72 hours against the pinned version before cutting the live workload over.

Lesson 2 — Provisioned Throughput Pays for Itself.

On-demand Bedrock pricing is cheaper per token, but P99 latency spikes in the morning-rush window (8:00-9:30 AM) produced timeouts that forced retries and effectively doubled cost-per-successful-request. Switching to Provisioned Throughput raised base cost 30 percent and eliminated retries, producing a net 12 percent reduction and a dramatically better clinician experience in the busiest hour of the day.

Lesson 3 — The Validation Layer Is the Product.

Our validation layer catches 3.2 percent of model outputs that contain clinical inconsistencies — an extracted CDT code mismatched to surface count, a tooth number outside the valid range, a plan referencing an already-extracted tooth, or a recommendation contradicting an active allergy. These catches matter more than raw model-accuracy improvements, because they prevent clinically meaningful errors from ever reaching the provider's screen.

Lesson 4 — Multi-Model Routing Reduces Cost Without Sacrificing Quality.

Not every request needs a frontier model. Routing simple extraction to smaller models and reserving Claude for complex reasoning reduced average inference cost 38 percent while maintaining accuracy benchmarks. The logic is straightforward — task type determines model — but it only works when you have multiple models inside the same compliant infrastructure, which a single-vendor API structurally cannot offer.

Lesson 5 — Prompt Engineering Outperforms Fine-Tuning for Reasoning.

Fine-tuning improved extraction accuracy, but for clinical reasoning tasks — treatment planning, contraindication detection, patient communication — well-crafted prompts with clinical few-shot examples outperformed fine-tuned models at a fraction of the operational overhead. We maintain a prompt library of 40-plus clinical task templates, version-controlled and regression-tested, and prompt updates ship in hours rather than the days a retraining cycle requires.

When OpenAI Is the Right Call

Engineering decisions are context-dependent. OpenAI's API is the better choice for dental AI applications that do not process PHI — marketing content generation, patient-education materials, or practice-management analytics on fully de-identified data. The API is simpler to integrate, the documentation is excellent, and the architectural friction of Bedrock would be wasted on those workloads.

Furthermore, for prototyping and research, OpenAI's playground and fine-tuning interface have a lower barrier to entry than Bedrock's setup, which requires AWS expertise on the team. For a solo developer exploring dental AI concepts before a production system, starting with OpenAI and migrating to Bedrock once you are ready to handle PHI at scale is a reasonable sequence — provided the migration plan is written down on day one, not the day before the first paying practice comes online.

What This Means for Practices Evaluating Dental AI

If your dental AI application processes PHI — and any application touching clinical notes, patient records, or treatment plans does, regardless of vendor positioning — the infrastructure decision should start with compliance and work backward to model selection. Bedrock's architecture makes compliance the default rather than an afterthought, which happens to make the vendor-evaluation exercise easier.

After all, the model-quality gap between providers has narrowed to the point where infrastructure, compliance, and operational characteristics matter more than benchmark scores at real clinical scale. The best-scoring model is the wrong choice if it cannot run inside a HIPAA-compliant architecture your legal team can defend in an audit — and it becomes the very wrong choice the moment a breach investigation begins and somebody has to reconstruct, request-by-request, where the PHI actually went.

To see how this architecture translates into a clinical workflow for your practice or DSO — from ambient capture through treatment planning to claims — book a technical demo. We walk through the actual infrastructure, not a slide deck. Explore the full platform capabilities or review pricing for your practice size.

Frequently Asked Questions

Can we switch from OpenAI to AWS Bedrock without rebuilding our dental AI application?

Migration requires changing the API integration layer but not the core clinical logic. Prompt formats differ between GPT and Claude, so templates need adaptation and re-validation against your clinical output suite. The larger engineering effort is setting up VPC infrastructure, PrivateLink endpoints, and IAM policies — typically 2-4 weeks for a team with AWS experience. Re-running the validation suite against the new model outputs adds another 1-2 weeks. A realistic total migration timeline is 4-6 weeks.

Does AWS Bedrock support real-time streaming for clinical note generation?

Yes. Bedrock supports streaming via its InvokeModelWithResponseStream API, which is critical for clinical note generation where clinicians need output appearing in real time rather than waiting for the full response. We stream notes token-by-token to the provider's screen, which reduces perceived latency from 3-5 seconds (full generation) to under 500ms (time to first token). The streaming connection stays inside the VPC through PrivateLink, preserving the same security posture as batch requests.

How does Bedrock handle model failover for high-availability clinical systems?

Bedrock provides cross-region inference profiles that automatically route requests to available regions if the primary region has issues. We configure clinical AI to fail over from us-east-1 to us-west-2, both covered under the same AWS BAA. For model-level failover, our routing layer automatically falls back to an alternative model with adapted prompts. Failover adds 200-400ms of latency but maintains service availability; in six months of production, we have triggered model failover twice, both under 15 minutes.

What is the minimum AWS infrastructure needed to run Bedrock for a single dental practice?

For a single practice, the footprint can be minimal — one VPC with a PrivateLink endpoint, a Lambda for prompt construction, an API Gateway for ingestion, and an S3 bucket for audit logs. This runs roughly $200-$400 per month before inference, and Bedrock inference for 30-50 patients per day adds $150-$300 per month depending on task complexity. The total is well under a part-time IT administrator, but it does require AWS expertise to set up correctly — which is why most single practices access Bedrock-powered AI through a platform like NexV rather than building from scratch.

Does using AWS Bedrock mean patient data is shared with Amazon?

No. Under the AWS BAA, Amazon acts as a Business Associate — they provide the infrastructure but do not access, use, or retain clinical data processed through Bedrock. This is the same relationship that already exists when a practice uses AWS to host its PMS. Model providers (Anthropic, Meta, and others) also do not receive or retain data processed through Bedrock, because inference runs on AWS-managed infrastructure rather than on the model provider's servers. AWS contractually commits to not using customer data for model training without explicit opt-in, which we do not enable.

How does Bedrock pricing compare for dental imaging AI versus clinical text AI?

Dental imaging AI and clinical text AI have different cost profiles on Bedrock. Text-based clinical AI (note generation, code extraction, treatment planning) is priced per token and typically costs $0.02-$0.08 per interaction depending on context length and model. Imaging AI requires multimodal or custom models trained on radiograph data, which cost more per inference — roughly $0.10-$0.25 per image analysis. However, imaging runs less frequently than text AI (radiographs are periodic, text runs every encounter), so text AI typically accounts for 70-80% of total Bedrock spend despite the lower per-request cost.

Can dental AI on Bedrock integrate with on-premise practice management systems?

Yes, but it requires a secure connection between the practice's on-premise network and the AWS VPC. We use AWS Site-to-Site VPN or AWS Direct Connect for on-premise PMS installations. Clinical data travels through an encrypted tunnel from the practice network to the VPC, is processed through Bedrock, and returns through the same tunnel. This adds 10-30ms of latency depending on geographic distance, which is negligible for clinical workflows. For cloud-hosted PMS systems, integration is simpler — API-to-API communication between the PMS cloud and our AWS infrastructure, secured with mutual TLS and API key authentication.