The Stack I Use for a Brick-and-Mortar Business Website, Part 2

This is part two of the stack write-up. Part one covered the technical side: frontend, UI, content API, database, auth, uploads, deployment, and tooling. This post covers the operational decisions that sit on top of that stack: how the inquiry flow actually works, what it costs, why I lean on CloudFront caching, the SLO and SLA I am willing to commit to, and the disaster recovery plan.

The shape of the project

The website is for a brick-and-mortar business that needs a proper web presence. The goal is quite practical:

Show catalog items, services, and offers clearly.
Let staff update content without touching code.
Let customers send an inquiry with enough detail.
Generate a reference code so WhatsApp conversations are easier to track.
Keep the system small enough that I can operate it myself.

That last point is important. I do not want to run a big platform for a small business website. I want something that I can understand when it breaks.

Inquiry flow: form, database, WhatsApp

The inquiry form is the main conversion feature.

Customers can choose a catalog item, an offer, a service, or a custom request. They can fill in quantity, size, option details, deadline, urgency, delivery method, contact details, and notes.

When the customer submits, the API stores the inquiry and generates a reference code in the form REF-YYYY-XXXXXX. The frontend then builds a WhatsApp message with the inquiry details and opens WhatsApp.

This is a small thing, but very useful. The customer still talks through WhatsApp, which is normal for local businesses here, but staff also get a reference code and a dashboard record. So the system does not force a new workflow. It improves the existing one.
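A minimal sketch of the reference code and the WhatsApp handoff, assuming the six-character suffix is random uppercase letters and digits (the post only fixes the REF-YYYY-XXXXXX shape). The `Inquiry` fields, the alphabet, and both function names are hypothetical. The link uses WhatsApp's click-to-chat format, `https://wa.me/<number>?text=<url-encoded message>`, where the number is written in international format without `+` or leading zeros.

```typescript
import { randomInt } from "node:crypto";

// Hypothetical inquiry shape; field names are illustrative, not the real schema.
export interface Inquiry {
  kind: "catalog" | "offer" | "service" | "custom";
  itemName?: string;
  quantity?: number;
  size?: string;
  deadline?: string; // e.g. "2025-03-01"
  deliveryMethod?: string;
  customerName: string;
  notes?: string;
}

// Assumed alphabet: uppercase letters and digits minus look-alikes (I, O, 0, 1),
// so a code retyped from a chat is less likely to be misread.
const ALPHABET = "ABCDEFGHJKLMNPQRSTUVWXYZ23456789";

// Returns a code shaped like REF-YYYY-XXXXXX using a cryptographic RNG and the UTC year.
export function generateReferenceCode(now: Date = new Date()): string {
  let suffix = "";
  for (let i = 0; i < 6; i++) {
    suffix += ALPHABET[randomInt(ALPHABET.length)];
  }
  return `REF-${now.getUTCFullYear()}-${suffix}`;
}

// Builds a wa.me link with the inquiry prefilled as the message text.
// businessNumber: international format, digits only (no "+", no leading zeros).
export function buildWhatsAppLink(businessNumber: string, ref: string, inquiry: Inquiry): string {
  const lines = [
    `Hi, I just sent inquiry ${ref}.`,
    `Type: ${inquiry.kind}`,
    inquiry.itemName ? `Item: ${inquiry.itemName}` : null,
    inquiry.quantity !== undefined ? `Quantity: ${inquiry.quantity}` : null,
    inquiry.size ? `Size: ${inquiry.size}` : null,
    inquiry.deadline ? `Deadline: ${inquiry.deadline}` : null,
    inquiry.deliveryMethod ? `Delivery: ${inquiry.deliveryMethod}` : null,
    inquiry.notes ? `Notes: ${inquiry.notes}` : null,
    `Name: ${inquiry.customerName}`,
  ].filter((line): line is string => line !== null);
  return `https://wa.me/${businessNumber}?text=${encodeURIComponent(lines.join("\n"))}`;
}
```

In the real flow the API generates the code, so it can also check new codes against existing rows. With 32 symbols in six positions there are about 1.07 billion combinations per year, so a collision is unlikely but still worth a retry on a unique-constraint error.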

The content API can also send a Telegram notification for each new inquiry, including the customer details, reference code, department, description, and a link to the admin panel.
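A minimal sketch of that notification using the Telegram Bot API's `sendMessage` method. The `InquiryNotice` shape and the function names are hypothetical, and the bot token and chat ID are assumed to come from the API's environment. The message is plain text, which sidesteps the escaping rules of Telegram's Markdown and HTML parse modes.

```typescript
// Hypothetical payload for a new-inquiry notification.
export interface InquiryNotice {
  ref: string;
  customerName: string;
  phone: string;
  department: string;
  description: string;
  adminUrl: string;
}

// Formats the notification as plain text, one field per line.
export function formatInquiryNotice(n: InquiryNotice): string {
  return [
    `New inquiry ${n.ref}`,
    `Customer: ${n.customerName} (${n.phone})`,
    `Department: ${n.department}`,
    `Description: ${n.description}`,
    `Admin: ${n.adminUrl}`,
  ].join("\n");
}

// Posts the message to a private chat or group through the Bot API.
export async function sendTelegramNotice(botToken: string, chatId: string, n: InquiryNotice): Promise<void> {
  const res = await fetch(`https://api.telegram.org/bot${botToken}/sendMessage`, {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({ chat_id: chatId, text: formatInquiryNotice(n) }),
  });
  if (!res.ok) {
    throw new Error(`Telegram sendMessage failed with HTTP ${res.status}`);
  }
}
```

Because the inquiry is already stored before the notification goes out, one reasonable choice is to call `sendTelegramNotice` after the insert and only log a failure, so a Telegram outage never blocks a customer's submission.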

Estimated cost

I think about cost the same way I think about reliability: small business numbers, not enterprise numbers. I run this project for one brick-and-mortar business, so the budget should fit a small monthly bill that I can pay myself without justifying it to a finance team.

Below is my rough estimate for the steady state. These are USD prices in ap-southeast-1, rounded to keep the list easy to read. Actual numbers depend on traffic, retention, and AWS price changes, so I check the bill once a month.

VPS for the content API. About USD 5 to 10 per month. A small VPS with 1 vCPU, 1 GB RAM, and 20 to 30 GB SSD is enough for Bun, SQLite, and Litestream. I treat this as a fixed cost.
AWS Lambda for the frontend (SST + TanStack Start). About USD 1 to 5 per month. With CloudFront in front, most requests are served at the edge. Lambda only runs for cache misses, SSR, and admin work.
CloudFront. About USD 1 to 3 per month for data transfer and requests at this scale. I do not enable Origin Shield unless traffic grows.
Route 53 hosted zone. About USD 0.50 per month. One hosted zone, small query volume.
S3 for Litestream backups. About USD 1 to 3 per month. A few GB of SQLite snapshots and WAL segments, infrequent restore requests, and the standard storage class in the same region.
S3 for catalog and inquiry uploads. About USD 1 to 3 per month. A small number of images and documents, mostly read through CloudFront.
Cognito. Usually USD 0. Cognito is priced by monthly active users, and a handful of staff accounts sits well inside the free tier. I only pay if staff count grows past the free tier, which I do not expect.
WhatsApp handoff. USD 0. The customer opens WhatsApp directly with the prebuilt message. There is no WhatsApp Business API cost on my side.
Telegram notification. USD 0. A bot token sending to a private channel or group is free.

Add it up and the steady-state cost is around USD 10 to 25 per month, with the VPS as the main fixed item. I round that to a working budget of about USD 30 per month so I have headroom for spikes, extra backup storage, or a small domain renewal.
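The arithmetic behind that range, as a tiny script. The figures are the rounded per-item estimates from the list above; the variable and function names are only for illustration.

```typescript
// [item, low USD/month, high USD/month], copied from the estimate list.
export const MONTHLY_COSTS: Array<[string, number, number]> = [
  ["VPS for the content API", 5, 10],
  ["AWS Lambda (frontend)", 1, 5],
  ["CloudFront", 1, 3],
  ["Route 53 hosted zone", 0.5, 0.5],
  ["S3 for Litestream backups", 1, 3],
  ["S3 for uploads", 1, 3],
  ["Cognito", 0, 0],
  ["WhatsApp handoff", 0, 0],
  ["Telegram notification", 0, 0],
];

// Sums the low and high ends separately to get the overall monthly range.
export function totalRange(rows: Array<[string, number, number]>): { low: number; high: number } {
  return rows.reduce(
    (acc, [, low, high]) => ({ low: acc.low + low, high: acc.high + high }),
    { low: 0, high: 0 },
  );
}
```

The totals come out to USD 9.5 and USD 24.5, which is where "around USD 10 to 25" comes from, and the USD 30 budget still leaves about USD 5.5 of headroom at the top of the range.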

There are a few things that can push the bill up, and I keep them in mind:

Traffic spike. If a campaign or viral post brings a lot of visitors, CloudFront data transfer and Lambda duration can grow. I do not fight that with bigger infrastructure; I let CloudFront absorb it.
Image storage growth. If staff keep uploading high-resolution images without cleanup, the S3 bill creeps up. I can add a simple lifecycle rule later.
Backup retention. Litestream can keep a long history of snapshots. I keep the retention short by default and only extend it if the business asks for longer history.
Region drift. Everything is in ap-southeast-1. If one resource accidentally ends up in another region, cross-region data transfer pushes the bill up. I keep the region consistent in SST and the Litestream config.

The exact cents will change. What matters is that the stack is small enough for me to estimate from memory, with traffic as the main variable. For a project of this size, I am happy with that shape.

Why CloudFront cache matters

This stack relies heavily on CloudFront caching, especially for public pages and assets.

For a brick and mortar business website, most visitors are reading. They open the home page, catalog pages, service pages, and maybe the inquiry form. They are not logging in. They are not doing heavy real-time work. So the best thing I can do is make the common path fast and cheap.

CloudFront helps with that:

static assets are served near the visitor
repeated public page requests do not always need to hit the origin
traffic spikes are absorbed at the edge
the API and VPS do less work for read-heavy pages
the site can still feel fast even if the origin is not very powerful

This is why I am okay with a small API server. The API should handle writes, admin work, and cache misses. It should not need to serve every public visitor from zero every time.

There is a trade-off. Cached pages can be stale for a short while after staff update content. For this project, that is acceptable. If staff update a description or offer, it does not need to appear globally within one second. I prefer cheaper and more reliable reads over instant cache invalidation for every small change.

For admin pages and inquiry submission, I do not rely on stale cache. Those flows need fresh API responses and proper writes. The caching is mainly for the public reading path.
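A minimal sketch of how the server-rendering side could label responses so CloudFront caches only the public reading path. The route prefixes, the five-minute TTL, and the stale windows are assumptions, not the actual configuration. The CloudFront cache policy's minimum and maximum TTL still bound whatever `s-maxage` says, and SST may already set its own headers for static assets.

```typescript
// Hypothetical policy: which Cache-Control header each response gets.
// Path prefixes are illustrative; match them to the real routes.
export function cacheControlFor(path: string, method: string = "GET"): string {
  // Writes such as inquiry submission or admin saves are never cached.
  if (method !== "GET" && method !== "HEAD") return "no-store";
  // Admin pages must always reflect fresh API data.
  if (path === "/admin" || path.startsWith("/admin/")) return "private, no-store";
  // Fingerprinted build assets never change under the same URL.
  if (path.startsWith("/assets/")) return "public, max-age=31536000, immutable";
  // Public pages: browsers revalidate, CloudFront keeps a copy for 5 minutes,
  // may serve it stale for 60 s while refetching, and for up to a day if the origin errors.
  return "public, max-age=0, s-maxage=300, stale-while-revalidate=60, stale-if-error=86400";
}
```

`s-maxage` applies to shared caches like CloudFront while `max-age=0` keeps browsers revalidating, so a staff edit shows up within a few minutes without a manual invalidation. `stale-if-error` is what would let CloudFront keep answering with a cached page while the origin is failing.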

Expected SLO and SLA targets

I think about reliability on two levels: what I aim for internally, and what I can honestly promise.

The internal SLO is:

public website availability: 99.9% monthly
inquiry submission availability: 99.5% monthly
admin availability: 99% monthly
public page p95 load time: under 2 seconds for cached pages
inquiry submission p95 response time: under 5 seconds
recovery point objective for the database: under 5 minutes
recovery time objective for API restore: under 2 hours

Translated into normal language, it looks like this. I am using a 30-day month for the downtime budget.

Public website availability: 99.9%. About 43 minutes down per month. The public site can be down for less than one lunch break per month.
Inquiry submission availability: 99.5%. About 3.6 hours down per month. The inquiry form has a bit more room because it depends on more pieces.
Admin availability: 99%. About 7.2 hours down per month. Staff tools can tolerate more downtime than the public website.
Public page p95 load time. 95% of cached page loads under 2 seconds. Most visitors should get a fast page when CloudFront has it cached.
Inquiry submission p95 response time. 95% of submissions under 5 seconds. Most customers should not feel stuck after pressing submit.
Database RPO. Lose at most about 5 minutes of data. If restore is needed, recent writes should be the only risky part.
API restore RTO. Restore service within 2 hours. If the API server dies, the target is same-day manual recovery.
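The downtime budgets are plain arithmetic over a 30-day month (43,200 minutes), and the p95 targets need a percentile computed from real latency samples. A minimal sketch of both; the nearest-rank percentile method and the function names are assumptions, since the post does not say how p95 is measured.

```typescript
// A 30-day month, matching the downtime budget used above.
const MINUTES_PER_MONTH = 30 * 24 * 60; // 43,200 minutes

// Allowed downtime in minutes for an availability target such as 99.9.
export function downtimeBudgetMinutes(targetPercent: number): number {
  return ((100 - targetPercent) / 100) * MINUTES_PER_MONTH;
}

// Nearest-rank p95: the smallest sample that at least 95% of samples do not exceed.
export function p95(samplesMs: number[]): number {
  if (samplesMs.length === 0) throw new Error("no latency samples");
  const sorted = [...samplesMs].sort((a, b) => a - b);
  const rank = Math.ceil((95 * sorted.length) / 100); // 1-based, integer arithmetic
  return sorted[rank - 1];
}

// True when the p95 latency is strictly under the target, e.g. 2000 ms for cached pages.
export function meetsLatencySlo(samplesMs: number[], targetMs: number): boolean {
  return p95(samplesMs) < targetMs;
}
```

`downtimeBudgetMinutes` returns roughly 43.2, 216, and 432 minutes for 99.9, 99.5, and 99, which match the 43 minutes, 3.6 hours, and 7.2 hours above.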

These numbers are not enterprise numbers. They are small business numbers. The public website should almost always be available because CloudFront, S3, and Lambda are doing the heavy lifting. The inquiry flow has more moving parts: frontend, API, SQLite, S3 backup, WhatsApp, and sometimes the Telegram notification. So I give it a slightly lower target.

For the SLA, I would not promise 99.99%. I am one person operating this. There is no 24/7 ops team. The honest external promise is best effort, with a target around 99.5% monthly availability for the full dynamic system.

This distinction matters. SLO is what I design and monitor against. SLA is what I am willing to be accountable for. For this project, I can design for good reliability without pretending it is a bank.

Disaster recovery plan

The disaster recovery plan is simple because the system is simple.

For frontend failure:

Redeploy the last known good commit with SST.
If the latest deploy is bad, roll back to the previous commit and deploy again.
CloudFront can keep serving cached assets and pages that are still within their cache lifetime while the origin is being fixed, depending on what failed.

For API server failure:

Provision a new small VPS.
Install Bun, SQLite, Litestream, and systemd service files.
Restore the SQLite database from Litestream/S3.
Pull or rsync the API code.
Start the service and point DNS or the reverse proxy at the new server.
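A minimal sketch of the restore step as a script, so the drill runs the same way every time. It only wraps the `litestream restore` command with its `-config`, `-o`, and `-timestamp` flags as documented for Litestream 0.3.x; the installed version's `litestream restore -h` is the authority. The paths and option names are hypothetical. The `timestamp` option also fits the bad-write case below, by restoring to a moment just before the mistake.

```typescript
import { spawnSync } from "node:child_process";

export interface RestoreOptions {
  dbPath: string; // database path as it appears in litestream.yml (hypothetical)
  outputPath?: string; // restore into a fresh file instead of the live path
  timestamp?: string; // RFC 3339 time, e.g. just before an accidental bad write
  configPath?: string; // Litestream falls back to /etc/litestream.yml when omitted
}

// Builds the argument list for `litestream restore`.
export function buildRestoreArgs(o: RestoreOptions): string[] {
  const args = ["restore"];
  if (o.configPath) args.push("-config", o.configPath);
  if (o.outputPath) args.push("-o", o.outputPath);
  if (o.timestamp) args.push("-timestamp", o.timestamp);
  args.push(o.dbPath);
  return args;
}

// Runs the restore with output shown in the terminal; call it with the API stopped.
export function runRestore(o: RestoreOptions): void {
  const result = spawnSync("litestream", buildRestoreArgs(o), { stdio: "inherit" });
  if (result.status !== 0) {
    throw new Error(`litestream restore exited with status ${result.status}`);
  }
}
```

Restoring into a separate output path keeps the original file around for inspection; once the restored copy checks out, it can be moved into place before the service starts.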

For database corruption or accidental bad write:

Stop the API.
Restore SQLite to the last good point: the latest good Litestream snapshot or generation, or a timestamp just before the bad write.
Run migrations if needed.
Start the API and verify key endpoints.
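A minimal sketch of "verify key endpoints" as a repeatable check instead of a manual click-through. The endpoint paths are hypothetical; `fetchFn` defaults to the global `fetch` (available in Bun and recent Node) and can be replaced with a fake in tests.

```typescript
// Minimal fetch-like signature, so a fake can be passed in during drills or tests.
type FetchLike = (url: string) => Promise<{ ok: boolean; status: number }>;

// Hypothetical read endpoints; replace with the API's real ones.
export const KEY_PATHS = ["/health", "/catalog", "/services", "/offers"];

// Calls each endpoint once and returns a human-readable list of failures.
export async function verifyEndpoints(
  baseUrl: string,
  paths: string[],
  fetchFn: FetchLike = fetch,
): Promise<string[]> {
  const failures: string[] = [];
  for (const path of paths) {
    try {
      const res = await fetchFn(new URL(path, baseUrl).toString());
      if (!res.ok) failures.push(`${path}: HTTP ${res.status}`);
    } catch (err) {
      failures.push(`${path}: ${(err as Error).message}`);
    }
  }
  return failures;
}
```

Usage would be something like `verifyEndpoints("http://localhost:3000", KEY_PATHS)`. An empty array means every endpoint answered with a 2xx status; anything else is a reason to stop and investigate before reopening traffic.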

If S3 or uploads have issues, the website can still serve catalog content and accept inquiries that do not require a file upload. Uploads are useful, but they should not be a single point of failure that makes the whole business unreachable.
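A minimal sketch of keeping uploads optional on the client: the inquiry is saved first, so the customer gets a reference code even if every upload fails. Both dependency functions are hypothetical stand-ins for the real API calls, and telling the customer to send the file over WhatsApp instead is an assumed fallback, not something the current app does.

```typescript
// Hypothetical stand-ins for the real API calls.
export interface SubmitDeps {
  submitInquiry: () => Promise<{ ref: string }>;
  uploadFile: (ref: string, file: Blob) => Promise<void>;
}

// Saves the inquiry, then tries each upload without letting a failure undo the inquiry.
export async function submitWithOptionalUploads(
  deps: SubmitDeps,
  files: Blob[],
): Promise<{ ref: string; failedUploads: number }> {
  // The inquiry record and its reference code do not depend on S3.
  const { ref } = await deps.submitInquiry();
  let failedUploads = 0;
  for (const file of files) {
    try {
      await deps.uploadFile(ref, file);
    } catch {
      // Assumed UX: the UI can ask the customer to send this file over WhatsApp.
      failedUploads++;
    }
  }
  return { ref, failedUploads };
}
```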

The main weakness is manual restore. I need a documented restore drill and I should test it occasionally. A backup that is never restored is just hope. I am okay with manual recovery for now, but I should not pretend it is automatic failover.

What I like, what I am not happy with, and why this fits

The main thing I like is that each part has a clear job. TanStack Start renders the website and admin UI. Hono serves the API. SQLite stores the content. Drizzle provides the schema and queries. Cognito handles staff identity. S3 stores files. Litestream backs up the database. SST deploys the frontend.

There is not much magic between them. The frontend calls VITE_CONTENT_API_URL. The API returns JSON. Admin calls include a Bearer token. Uploads use presigned URLs. It is boring in a good way.
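A minimal sketch of a presigned upload from the admin side, assuming a hypothetical `/admin/uploads/presign` endpoint that returns `{ uploadUrl, key }`. In the frontend, `baseUrl` would be the value of `VITE_CONTENT_API_URL` and `token` the staff member's Cognito token; the endpoint, its JSON shape, and the function name are illustrative.

```typescript
// Signature compatible with the global fetch, so tests can pass a fake.
type FetchFn = (url: string, init?: RequestInit) => Promise<Response>;

export async function uploadViaPresignedUrl(
  baseUrl: string,
  token: string,
  file: Blob,
  fileName: string,
  fetchFn: FetchFn = fetch,
): Promise<string> {
  // 1. Ask the API, with the Bearer token, for a short-lived presigned PUT URL.
  const presign = await fetchFn(new URL("/admin/uploads/presign", baseUrl).toString(), {
    method: "POST",
    headers: { Authorization: `Bearer ${token}`, "Content-Type": "application/json" },
    body: JSON.stringify({ fileName, contentType: file.type }),
  });
  if (!presign.ok) throw new Error(`presign failed: HTTP ${presign.status}`);
  const { uploadUrl, key } = (await presign.json()) as { uploadUrl: string; key: string };

  // 2. Send the bytes straight to S3. If the API signed the URL with a
  //    Content-Type, this header has to match it.
  const put = await fetchFn(uploadUrl, {
    method: "PUT",
    headers: { "Content-Type": file.type },
    body: file,
  });
  if (!put.ok) throw new Error(`upload failed: HTTP ${put.status}`);

  // 3. The object key is what the API stores next to the catalog item.
  return key;
}
```

The design point is that file bytes go straight from the browser to S3, so the small VPS only signs URLs and stores keys.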

I also like that local development is fast. The frontend runs on port 3001. The content API runs on port 3000. With the dev login bypass, I can test admin features end to end without setting up a real user.
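The dev login bypass is convenient, but it should be impossible to switch on in production. A minimal sketch of such a guard, assuming a hypothetical `DEV_AUTH_BYPASS` flag and a `verifyToken` callback standing in for real Cognito JWT verification, which is not shown. The bypass requires `NODE_ENV` to be exactly `development`, so a missing or unexpected value fails closed.

```typescript
// Hypothetical staff identity shape.
export interface Staff {
  id: string;
  email: string;
}

export interface AuthEnv {
  NODE_ENV?: string;
  DEV_AUTH_BYPASS?: string; // hypothetical flag name
}

// Resolves the caller from the Authorization header, or returns null.
export async function resolveStaff(
  authHeader: string | undefined,
  env: AuthEnv,
  verifyToken: (token: string) => Promise<Staff | null>,
): Promise<Staff | null> {
  // Bypass only when explicitly in development; an unset NODE_ENV does not qualify.
  if (env.DEV_AUTH_BYPASS === "1" && env.NODE_ENV === "development") {
    return { id: "dev", email: "dev@localhost" };
  }
  if (!authHeader?.startsWith("Bearer ")) return null;
  return verifyToken(authHeader.slice("Bearer ".length));
}
```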

Some parts are still in progress, though. The inquiry form works, but there is more conditional logic I can add later. The admin panel can edit the main content types, but add-ons, pricing ranges, turnaround times, and tags can be improved. Uploads work for images, but linking customer files to an inquiry could be better.

The infrastructure also lives in two worlds: SST for the frontend, and VPS/systemd for the API. I am okay with that now, but it means deployment knowledge lives in two places. I need to document it properly so future me does not waste time.

The other trade-off is cache invalidation. CloudFront makes public reads fast and resilient, but content updates may not be instantly visible everywhere. I can add targeted invalidation later, but I do not want to overbuild that before the business actually needs it.
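If targeted invalidation becomes necessary, most of the work is deciding which paths a content change touches. A minimal sketch, assuming a hypothetical URL scheme (`/catalog/<slug>`, `/services/<slug>`, `/offers/<slug>`) and that offers also appear on the home page. The resulting paths would then go into a CloudFront `CreateInvalidation` request, for example through `CreateInvalidationCommand` in `@aws-sdk/client-cloudfront`, which is not shown here.

```typescript
// Hypothetical content types and URL scheme for the public site.
type ContentType = "catalog" | "service" | "offer";

const LIST_PATH: Record<ContentType, string> = {
  catalog: "/catalog",
  service: "/services",
  offer: "/offers",
};

// Maps changed records to the deduplicated CloudFront paths worth invalidating.
export function invalidationPaths(changes: Array<{ type: ContentType; slug: string }>): string[] {
  const paths = new Set<string>();
  for (const { type, slug } of changes) {
    paths.add(LIST_PATH[type]); // the listing page shows the changed item
    paths.add(`${LIST_PATH[type]}/${slug}`); // the item's own page
    if (type === "offer") paths.add("/"); // assumed: offers are featured on the home page
  }
  return Array.from(paths).sort();
}
```

Keeping the list narrow also keeps the cost predictable: CloudFront includes a monthly allowance of free invalidation paths (1,000 at the time of writing) and charges per path beyond that.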

SQLite also keeps the API simple, but it means I am choosing vertical scaling and backup discipline instead of managed database clustering. If the write workload grows a lot, this is the first part I would reconsider.

I think the theme is simple: I want boring pieces with TypeScript around them. I do not want to run Kubernetes for a small business website. I also do not want a no-code CMS where the moment I need custom inquiry logic, I start fighting the tool.

This stack gives me enough control:

React for the website and admin
Hono for a small API
SQLite for simple data
S3 for uploads
Cognito for login
WhatsApp for the actual sales conversation

It is not perfect, but I can reason about it. When something breaks, I know where to look. For this kind of project, that is more valuable than using the most popular stack.