Why We Chose AWS Bedrock Over OpenAI for Clinical Dental AI

Why We Chose AWS Bedrock Over OpenAI for Clinical Dental AI
Most dental AI tools default to OpenAI because it is the easiest button to press — the SDK is friendly, the docs are excellent, and a prototype is a weekend away. That choice looks defensible on a whiteboard right up until someone in legal asks for the Business Associate Agreement, the retention addendum, and the model-version pinning policy for a workload that will process thousands of PHI-laden clinical notes per day.
At NexV, the clinical AI infrastructure decision did not start with a model leaderboard. It started with a list of contractual and architectural guarantees we had to give every practice that handed us a patient record — and that list eliminated most options before any benchmark was run. What follows is the engineering rationale behind choosing AWS Bedrock over OpenAI's API for a HIPAA-grade dental AI platform processing real-time clinical decision support across practices and DSOs.
This is a specific capabilities argument for a specific constraint set — protected health information, sub-2-second latency ceilings, auditable model behavior, and compliance documentation that holds up under a third-party HIPAA audit. If you have read our coverage of dental AI data privacy and HIPAA compliance for AI systems, this is where theory becomes architecture.
Why did NexV choose AWS Bedrock over OpenAI?
NexV chose AWS Bedrock because it allows clinical dental AI to run entirely within a HIPAA-covered VPC under a single Business Associate Agreement, with zero data egress to third-party processors, multi-model flexibility for different clinical tasks, and deterministic model version pinning that prevents unannounced behavior changes in production clinical workflows.
Why the Default Choice Creates Compliance Risk
Most AI infrastructure discussions begin with an unstated assumption that the data being processed is either public or low-sensitivity. Dental clinical data is neither. A single patient interaction routinely generates clinical notes containing diagnoses, treatment histories, imaging findings, insurance identifiers, and demographics — all of which qualify as Protected Health Information under the HIPAA Privacy Rule and the HITECH Act.
Accordingly, the regulatory requirements are not theoretical. A dental AI vendor processing PHI without a BAA exposes every connected practice to HIPAA liability, and a vendor whose model provider retains training rights over clinical inputs creates a data-governance problem that no contract amendment can retroactively fix. These constraints filter the infrastructure options aggressively before anyone argues about model quality.
That said, a compliant architecture is a chain of guarantees across every hop the data takes, not a single checkbox. In practice, the weakest link is almost always the inference endpoint — that is where the richest clinical payloads leave the practice's trust boundary. Choosing the wrong endpoint silently undermines every other compliance control upstream and downstream of it.
What makes dental AI infrastructure different from general AI?
Dental AI processes protected health information on nearly every request — patient names, diagnoses, treatment plans, insurance identifiers — which demands HIPAA-compliant infrastructure with BAA coverage, encryption at rest and in transit, audit logging, and zero data retention by the model provider. General AI applications rarely face this combination of regulatory, latency, and data-residency constraints simultaneously.
What Bedrock Actually Gives You
AWS Bedrock is a managed service that provides API access to foundation models from multiple providers — Anthropic (Claude), Meta (Llama), Amazon (Titan), Cohere, and others — inside your existing AWS account, under the same BAA that already covers the rest of your HIPAA-eligible workload. The critical distinction, and the one that reshapes every downstream compliance conversation, is that Bedrock inference runs inside the AWS account boundary rather than on a third-party endpoint on the public internet.
Operationally, every inference request stays inside your trust boundary. No patient data traverses the public internet to reach a model provider's servers. No third party retains, logs, or trains on the clinical text you send. The model provider itself cannot see what you send, because the compute happens on AWS-managed infrastructure under the AWS BAA — not on infrastructure under the model maker's BAA, their terms, or their retention policy.
Bedrock Architecture Capabilities for Clinical AI
VPC-Native Deployment.
Bedrock endpoints are reachable via VPC PrivateLink, which means inference traffic never leaves your private network. Clinical notes travel from your application to the model and back without touching the public internet, which eliminates an entire category of data-exposure risks that external API calls introduce by definition.
Single BAA Coverage.
AWS offers a comprehensive BAA that covers Bedrock alongside the rest of your HIPAA-eligible AWS services. There is no separate BAA to negotiate with Anthropic, Meta, or any other model provider, because the AWS BAA already covers the entire inference path — which materially simplifies the compliance documentation that practices and DSOs have to produce when auditing their vendor chain.
Zero Data Retention.
Bedrock does not store input prompts or model outputs. There is no training on your data, no provider-side logging of clinical content, and no retention window — once inference completes, the only copy of that payload is the one you deliberately wrote to your own systems. This is a contractual guarantee under the AWS service terms rather than a policy that can quietly change with a terms-of-service update.
Multi-Model Access.
Bedrock exposes models from multiple providers through a single API, which lets us route Claude to clinical note generation and treatment-plan reasoning, specialized smaller models to structured data extraction, and Amazon Titan to embeddings — all within the same VPC, under the same BAA, with no additional vendor relationships, procurement cycles, or BAA negotiations to manage.
Where OpenAI Runs Out of Room
OpenAI builds excellent models. GPT-4 and its successors are capable across a wide range of tasks, clinical text processing among them, and we use them willingly in non-PHI contexts. The limitations we ran into are not about model quality; they are about the infrastructure and contractual framework wrapped around those models, and about how that framework interacts with a workload that has to defend itself in a HIPAA audit.
What are OpenAI's limitations for HIPAA-regulated dental AI?
OpenAI's primary limitations for clinical dental AI are architectural — data leaves your environment for inference on OpenAI-managed servers, BAA coverage requires a separate enterprise agreement with narrower scope than AWS, model versions change on a schedule that can shift clinical output formats without warning, and there is no VPC-native deployment option that keeps PHI inside your own network boundary.
| Criterion | AWS Bedrock | OpenAI API |
|---|---|---|
| Data residency | Your VPC, your region, your account | OpenAI infrastructure (US-based) |
| BAA coverage | AWS BAA covers Bedrock + all HIPAA-eligible services | Separate BAA required, enterprise tier only |
| Data retention | Zero retention by default | 30-day retention unless opted out via BAA |
| Model versioning | Pin exact model version indefinitely | Versions deprecated on schedule, silent updates |
| Network path | VPC PrivateLink (no public internet) | Public API endpoint (TLS encrypted) |
| Model selection | Claude, Llama, Titan, Cohere, others | GPT models only |
| Fine-tuning on dental data | Custom model training, data stays in your account | Fine-tuning available, data uploaded to OpenAI |
| Audit logging | CloudTrail + CloudWatch, native integration | API usage dashboard, limited granularity |
The model-versioning row deserves its own paragraph, because the implications are easy to miss until they break production. In clinical AI, output consistency is a regulatory artifact, not a nice-to-have. When a model version changes underneath you, note format can shift, treatment-plan structure can reorder, and extracted entities can drift in ways that are visually subtle and functionally catastrophic for downstream validators. Bedrock lets us pin an exact model version indefinitely and promote the next one only after it clears our clinical validation suite — precisely the control plane a clinical workload needs.
The HIPAA Architecture in Practice
HIPAA compliance is not a checkbox; it is an architecture. Our Bedrock deployment follows a defense-in-depth model with multiple layers of protection around patient data at every stage of the inference pipeline, and every boundary is either inside the VPC or explicitly covered by the AWS BAA.
HIPAA Compliance Layers
In practice, the AWS BAA covers the entire inference path — from API Gateway through the Lambda prompt builder to the Bedrock endpoint and back through the validation layer. There is no gap at which PHI exists outside a BAA-protected service, which is exactly the property an auditor looks for and exactly the property that a public-API model provider cannot offer by construction. For a deeper breakdown of how BAA coverage works across AI service chains, see our HIPAA compliance guide for dental AI.
Matching Models to Clinical Tasks
Different clinical tasks have different model requirements. Generating a treatment-plan narrative requires reasoning depth and long context. Extracting tooth numbers and CDT codes from a note requires precision, determinism, and sub-second latency. Embedding patient records for semantic search calls for a different architecture entirely. Bedrock lets us match models to tasks without managing multiple vendor relationships or threading PHI through multiple BAAs.
Our Model-to-Task Mapping
Claude (Anthropic) — Clinical Reasoning.
Claude handles tasks that require deep clinical reasoning: generating SOAP notes from ambient scribe transcripts, drafting treatment-plan narratives, explaining findings in patient-friendly language, and flagging contraindications against documented allergies and medications. Its long context window holds full patient history without truncation, and its instruction-following precision keeps output inside the structured formats our PMS integrations expect.
Amazon Titan — Embeddings and Semantic Search.
Titan Embeddings powers patient-record semantic search — when a clinician asks the system for similar cases to the patient in front of them, the retrieval layer returns records by clinical similarity rather than keyword match. Running embeddings on Titan inside the same VPC means vector representations of patient records never leave the AWS account boundary, which matters because embeddings, while compressed, are still PHI-adjacent.
Specialized Models — Structured Extraction.
For high-volume, low-latency tasks such as extracting tooth numbers, surface codes, and CDT codes from clinical text, we use smaller models fine-tuned on dental NLP extraction tasks. These models process an encounter note in under 200ms, which matters when a practice is running 80-plus encounters per day and the treatment-planning system needs structured input in real time.
With OpenAI, we would be limited to GPT variants across all of these tasks. Some workloads would be overserved by a frontier model, the cost curve would bend the wrong way, and embeddings would require a separate architecture because OpenAI embeddings run on OpenAI's infrastructure rather than inside our VPC. Moreover, a single-vendor stack eliminates the ability to route around a model-quality regression — and in clinical workloads, that routing headroom is its own form of uptime.
Cost at Dental-Practice Scale
Cost modeling for clinical AI is not a simple per-token calculation. The honest number has to account for full infrastructure — inference, data transfer, logging, encryption key management, compliance overhead — because any of those lines, ignored at planning, will later surface on a DSO's finance review and undercut the business case. We modeled costs for a mid-size DSO running 5,000 patient interactions per day across a representative mix of reasoning, extraction, and embedding tasks.
Monthly Cost Comparison: 5,000 Interactions/Day
| Cost Component | AWS Bedrock | OpenAI API |
|---|---|---|
| Inference (clinical notes) | $2,800 - $3,400 | $3,200 - $4,100 |
| Inference (structured extraction) | $600 - $900 | $1,100 - $1,600 |
| Embeddings | $180 - $250 | $120 - $180 |
| VPC / PrivateLink | $150 - $200 | N/A (public endpoint) |
| Compliance overhead (logging, KMS) | $300 - $400 | $100 - $200 (less granular) |
| Total estimated monthly | $4,030 - $5,150 | $4,520 - $6,080 |
Estimates based on an average note of 800-1,200 tokens input and 400-600 tokens output. Bedrock reflects Provisioned Throughput pricing; OpenAI reflects Enterprise-tier pricing with BAA.
The headline delta is modest — roughly 15% to 20% lower on Bedrock at this volume. After all, the savings on structured extraction are the most interesting line on that table, because we can route those requests to smaller, cheaper models for tasks that do not require frontier-model reasoning. The more important number the table cannot show is the cost of a compliance incident a public-internet inference path would expose you to — and that number is never modest.
Latency and Throughput Under Real Workloads
Clinical decision support is useless if it arrives after the clinician has already made the decision. Our latency budget is 2 seconds end-to-end for alerts and 5 seconds for full note generation. These are hard ceilings rather than soft targets — anything slower breaks the workflow, breaks clinician trust, and breaks the adoption curve that makes the platform economically viable.
What latency does clinical dental AI require?
Real-time clinical dental AI requires sub-2-second latency for decision-support alerts (contraindication warnings, missing-documentation flags) and sub-5-second latency for full clinical note generation. VPC-native deployment with Provisioned Throughput on AWS Bedrock delivers P95 latencies of 1.1 seconds for alerts and 3.8 seconds for note generation, compared to 1.4 seconds and 4.6 seconds through external API endpoints under comparable load.
Accordingly, Bedrock's Provisioned Throughput option earns its keep by guaranteeing dedicated compute capacity, which eliminates the latency spikes that shared endpoints exhibit at peak. When 200 practices finish morning huddles at 8:45 AM and simultaneously pull AI-generated day summaries, we cannot afford queuing delays, and a public multi-tenant endpoint cannot make that guarantee. Provisioned Throughput costs more per token, but it buys the consistency clinical workflows require — and in production, retry-avoidance alone more than pays for the uplift.
Fine-Tuning for Dental-Specific Terminology
Foundation models understand general medical language well, but dental terminology sits in a specialized corner of that space. CDT codes, tooth-numbering systems (Universal, Palmer, FDI), surface designations, periodontal classifications, and the abbreviation-heavy shorthand clinicians use in real notes all require domain adaptation before extraction accuracy clears the bar downstream validators expect.
In practice, Bedrock's custom model training keeps fine-tuning data inside the AWS account. The training dataset — de-identified dental clinical notes, annotated with entity labels and code mappings — never leaves the VPC. The resulting custom model is private to our account and cannot be accessed by other Bedrock customers or by AWS itself. This matters because dental training data is scarce and proprietary; routing it through a third-party fine-tuning pipeline would collapse the moat before the model ever served a request.
Fine-Tuning Impact on Dental Tasks
Fine-tuning on 12,000 de-identified dental clinical notes improved extraction accuracy by 12 to 20 percentage points across dental tasks, with the largest gains in periodontal classification where the base model frequently confused staging and grading terminology.
The Production Architecture We Actually Run
Diagrams on a slide deck can make any architecture look defensible. Implementation is where dental AI infrastructure either earns its compliance posture or quietly compromises it. Here is the actual production architecture we run for NexV's clinical AI platform, stage by stage, with the BAA boundary intact at every hop.
NexV Production Architecture
Step 1 — Ingestion Layer.
Clinical data enters through API Gateway with mutual TLS authentication. Each request is validated against the practice's API key and rate limits, and the payload — clinical note text, patient context, task type — is encrypted and placed on an SQS queue. Real-time decision-support requests bypass the queue and hit Lambda directly to preserve the sub-2-second latency budget.
Step 2 — Prompt Construction.
A Lambda function builds the prompt by combining clinical input with the system prompt for the task (note generation, code extraction, treatment planning, or clinical alert). Patient-history context is pulled from DynamoDB inside the VPC, and the full prompt is assembled inside the VPC boundary — no PHI touches any service outside the BAA-covered path at any point.
Step 3 — Bedrock Inference.
The prompt is sent to the Bedrock endpoint via VPC PrivateLink. The model — Claude for reasoning tasks, our fine-tuned specialized model for extraction tasks — processes the request and returns the output. Provisioned Throughput guarantees consistent latency regardless of platform-wide demand, and Bedrock Guardrails filter the output for obvious PII leaks or unsafe content before it hits the validation layer.
Step 4 — Post-Processing and Validation.
Model output passes through a validation layer that checks clinical consistency — are extracted tooth numbers valid (1-32 permanent, A-T primary)? Do CDT codes match documented surfaces? Is the treatment plan internally consistent with charted history? Failures trigger a retry or route the output to human review rather than letting clinically invalid content reach the provider's screen.
Step 5 — Delivery and Audit.
Validated output is returned to the practice's system and simultaneously logged to an encrypted S3 bucket for audit. CloudTrail captures every Bedrock invocation with timestamp, model version, token count, and request metadata. Clinical content is stored separately in the encrypted audit log, accessible only to authorized compliance personnel under IAM role-based access controls.
What Six Months in Production Has Taught Us
We have been running this architecture in production since October 2025. Six months at scale has surfaced a handful of lessons that no pre-deployment planning round could have predicted, because production clinical workloads break in ways that only production clinical workloads can break.
What lessons did NexV learn running Bedrock in production?
After six months processing clinical dental data at scale on AWS Bedrock, the key operational lessons are that model-version pinning is essential for clinical output stability, Provisioned Throughput eliminates the latency variability that breaks real-time workflows, prompt engineering outperforms fine-tuning for most clinical reasoning tasks, and the validation layer catches more clinically meaningful errors than the model itself introduces.
Production Lessons
Lesson 1 — Model Version Pinning Is Non-Negotiable.
A Claude model update once changed how the model formatted periodontal pocket-depth tables and broke our downstream parser in a way that was visually subtle and functionally catastrophic. Version pinning on Bedrock means we test every new version against our clinical validation suite before promoting it, and we run it in shadow mode for 72 hours against the pinned version before cutting the live workload over.
Lesson 2 — Provisioned Throughput Pays for Itself.
On-demand Bedrock pricing is cheaper per token, but P99 latency spikes in the morning-rush window (8:00-9:30 AM) produced timeouts that forced retries and effectively doubled cost-per-successful-request. Switching to Provisioned Throughput raised base cost 30 percent and eliminated retries, producing a net 12 percent reduction and a dramatically better clinician experience in the busiest hour of the day.
Lesson 3 — The Validation Layer Is the Product.
Our validation layer catches 3.2 percent of model outputs that contain clinical inconsistencies — an extracted CDT code mismatched to surface count, a tooth number outside the valid range, a plan referencing an already-extracted tooth, or a recommendation contradicting an active allergy. These catches matter more than raw model-accuracy improvements, because they prevent clinically meaningful errors from ever reaching the provider's screen.
Lesson 4 — Multi-Model Routing Reduces Cost Without Sacrificing Quality.
Not every request needs a frontier model. Routing simple extraction to smaller models and reserving Claude for complex reasoning reduced average inference cost 38 percent while maintaining accuracy benchmarks. The logic is straightforward — task type determines model — but it only works when you have multiple models inside the same compliant infrastructure, which a single-vendor API structurally cannot offer.
Lesson 5 — Prompt Engineering Outperforms Fine-Tuning for Reasoning.
Fine-tuning improved extraction accuracy, but for clinical reasoning tasks — treatment planning, contraindication detection, patient communication — well-crafted prompts with clinical few-shot examples outperformed fine-tuned models at a fraction of the operational overhead. We maintain a prompt library of 40-plus clinical task templates, version-controlled and regression-tested, and prompt updates ship in hours rather than the days a retraining cycle requires.
When OpenAI Is the Right Call
Engineering decisions are context-dependent. OpenAI's API is the better choice for dental AI applications that do not process PHI — marketing content generation, patient-education materials, or practice-management analytics on fully de-identified data. The API is simpler to integrate, the documentation is excellent, and the architectural friction of Bedrock would be wasted on those workloads.
Furthermore, for prototyping and research, OpenAI's playground and fine-tuning interface have a lower barrier to entry than Bedrock's setup, which requires AWS expertise on the team. For a solo developer exploring dental AI concepts before a production system, starting with OpenAI and migrating to Bedrock once you are ready to handle PHI at scale is a reasonable sequence — provided the migration plan is written down on day one, not the day before the first paying practice comes online.
What This Means for Practices Evaluating Dental AI
If your dental AI application processes PHI — and any application touching clinical notes, patient records, or treatment plans does, regardless of vendor positioning — the infrastructure decision should start with compliance and work backward to model selection. Bedrock's architecture makes compliance the default rather than an afterthought, which happens to make the vendor-evaluation exercise easier.
After all, the model-quality gap between providers has narrowed to the point where infrastructure, compliance, and operational characteristics matter more than benchmark scores at real clinical scale. The best-scoring model is the wrong choice if it cannot run inside a HIPAA-compliant architecture your legal team can defend in an audit — and it becomes the very wrong choice the moment a breach investigation begins and somebody has to reconstruct, request-by-request, where the PHI actually went.
To see how this architecture translates into a clinical workflow for your practice or DSO — from ambient capture through treatment planning to claims — book a technical demo. We walk through the actual infrastructure, not a slide deck. Explore the full platform capabilities or review pricing for your practice size.