Lift Completed, Now Shift: Making the Insurance Hub Kubernetes-Native (Enough)
The Insurance Hub migration continues. Following Phase 1’s “lift”—provisioning clusters and infrastructure—we are now focusing on the “shift” required to make the legacy system run on this new foundation. This article summarizes the targeted changes needed to move our existing Java services into Kubernetes.
This “shift” is intentionally pragmatic: services are not being rewritten yet, but the biggest blockers preventing the legacy Java stack from behaving like a well-mannered Kubernetes workload are being removed. This begins with storage and state. Services currently writing to the local filesystem—or hiding blobs in the database—need to speak S3 instead, using a unified, S3-compatible SDK pointed at the MinIO tenants. In parallel, the convenience of in-memory H2 will be replaced with real PostgreSQL instances running in the cluster, ensuring that each service can be validated with the same persistence model it will use beyond a developer laptop.
With those foundations in place, the rest of the work will be focused on making the platform
operationally boring (the highest compliment in infrastructure). Container images will be validated
and tightened, each microservice will be deployed with a repeatable Kustomize base plus environment
overlays, and Consul will be decommissioned in favor of Kubernetes-native service discovery (internal DNS). Finally, the existing agent-portal-gateway service will be brought into the cluster as the initial entry point, and its routes will be wired to the new service endpoints. The guiding principle throughout is “minimal, configuration-driven change”: everything should be reproducible via Make targets, verifiable locally by running services from IntelliJ against MinIO/Postgres/MongoDB/Kafka in the Kind-based local-dev cluster, and exercised end-to-end through a working application in both the local-dev and qa Kubernetes environments.
Shift Scoping: Prioritizing Deliverables and Networking Surface
While the previous “lift” phase was characterized as infra-first and largely app-agnostic, the “shift” marks the transition to an app-first, infra-consuming approach. The foundational cluster work is complete; now, the legacy Java microservices must be modified to utilize these new cloud-native resources. This requires a deliberate move away from local-only shortcuts—such as in-memory databases and local file writes—toward durable, cluster-managed services.
To maintain momentum without introducing excessive instability, the shift has been organized into four primary deliverables. I have prioritized storage and state first, followed by networking and service discovery, ensuring that the most invasive changes are validated before the final application cutover.
| Ticket | Deliverable | Description |
|---|---|---|
| 7 | MinIO/S3 Integration | Modify payment-service and documents-service to use S3-compatible SDKs for file operations against MinIO. |
| 8 | PostgreSQL External Persistence | Transition services from in-memory H2 to dedicated PostgreSQL clusters running in the Kubernetes environments. |
| 9 | Gateway in K8s | Deploy the agent-portal-gateway as the initial ingress point, establishing the external routing boundary. |
| 10 | Per-service deploy + Consul removal | Decommission Consul; migrate inter-service communication to Kubernetes-native DNS and finalize per-service Kustomize deployments. |
The rationale for this risk-ordering is pragmatic: storage and database connectivity represent the highest-risk failure points. By resolving these first, the backend services can be validated as stable workloads. Once that foundation holds, inter-service discovery and gateway routing are configured to wire the system together into a functional whole.
Supporting a productive local development loop while interacting with a remote or Kind-based cluster
requires careful planning of the networking surface. Since all Micronaut-based Java services default
to port 8080, running multiple services locally via an IDE would inevitably lead to port conflicts.
To mitigate this, a deterministic port assignment table was established. These mappings are
implemented through application-local.yml configuration files, ensuring each service has a
non-conflicting home on the developer’s localhost.
| Service | Local Port | Service | Local Port |
|---|---|---|---|
| auth-service | 8081 | payment-service | 8086 |
| chat-service | 8082 | pricing-service | 8087 |
| dashboard-service | 8083 | product-service | 8088 |
| document-service | 8084 | policy-search-service | 8089 |
| policy-service | 8085 | agent-portal-gateway | 8090 |
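As an illustration, a minimal application-local.yml for the auth-service might pin the server port from the table and point the datasource at its forwarded PostgreSQL instance. This is a hypothetical sketch: the exact keys, database name, and file layout in the repository may differ.

```yaml
# Hypothetical application-local.yml sketch -- actual files may differ.
micronaut:
  server:
    port: 8081                                   # deterministic local port from the table above
datasources:
  default:
    url: jdbc:postgresql://localhost:5442/auth   # forwarded local-dev-postgres-auth-rw (see below)
```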
A similar logic was applied to the stateful components running inside the cluster. Because a “dedicated cluster per microservice” approach was chosen for PostgreSQL, unique local ports must be exposed to allow IDE-based services to reach their respective data stores.
| Postgres Service | Cluster Port Mapping |
|---|---|
| svc/local-dev-postgres-auth-rw | 5432 → localhost:5442 |
| svc/local-dev-postgres-document-rw | 5432 → localhost:5452 |
| svc/local-dev-postgres-payment-rw | 5432 → localhost:5462 |
| svc/local-dev-postgres-policy-rw | 5432 → localhost:5472 |
| svc/local-dev-postgres-pricing-rw | 5432 → localhost:5482 |
Finally, since two MinIO tenants were used (document and payment), local development MinIO endpoints were also assigned non-overlapping ports so that each service could be tested against its tenant deterministically during the S3 migration work.
| MinIO Tenant Service | Cluster Port Mapping |
|---|---|
| svc/local-dev-minio-document-hl | 9000 → localhost:9001 |
| svc/local-dev-minio-payment-hl | 9000 → localhost:9002 |
This upfront organization ensures that the environment’s infrastructure connectivity remains transparent, and the focus stays on the code modifications required for the shift.
Pragmatic Persistence: Adapting Storage for S3 and MinIO
The transition of stateful workloads began
with Ticket 7,
which centered on moving the storage layer from local filesystems and database blobs to
S3-compatible object storage. I targeted the documents-service and payment-service first, as
they represented the most significant dependencies on local persistence. Before applying any code
changes, I needed a stable local development loop; I added application-local.yml configurations to
ensure services could run in IntelliJ while interacting with the MinIO tenants in the local-dev
Kind cluster.
While Consul decommissioning was slated for a later phase, initial testing revealed an immediate
bottleneck. The micronaut-discovery-client was triggering startup failures by searching for a
non-existent server. I made the pragmatic choice to prune these dependencies immediately to unblock
the S3 refactoring.
Provisioning the MinIO tenants required more than just creating buckets; I chose to strictly adhere
to MinIO best practices to ensure security and scalability. This involved implementing
purpose-driven naming conventions, granular bucket sharding, and enforcing the principle of least
privilege through dedicated IAM policies. While these configurations can be managed via the MinIO
Console UI, doing so manually across local-dev and qa environments is error-prone and
inconsistent. To ensure deployments remained reproducible and “boring,” I automated the creation of
buckets, service users, and Kubernetes secrets via
new Makefile targets:
- `minio-svc-bucket-create` – Provisions a purpose-specific bucket for a microservice.
- `minio-svc-user-secret-create` – Generates the Opaque Kubernetes secret for service credentials.
- `minio-svc-user-with-policy-create` – Creates the MinIO user and attaches the necessary S3 IAM policy.
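A rough sketch of what the bucket-provisioning target can look like using the MinIO client (`mc`); the alias name, endpoint, and variable names here are illustrative assumptions, not the repository’s actual target body (and Makefile recipe lines must be tab-indented):

```makefile
# Hypothetical sketch of a bucket-provisioning target; names are assumptions.
minio-svc-bucket-create:
	mc alias set local-dev http://localhost:9001 $(MINIO_ROOT_USER) $(MINIO_ROOT_PASSWORD)
	mc mb --ignore-existing local-dev/$(SVC)-bucket
```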
In the Kotlin-based documents-service, I deliberately chose an approach that avoids a
database migration. To keep the PolicyDocument
class unchanged, I adapted the bytes field to serve a dual purpose: storing raw PDF binary data
for legacy records or storing the UTF-8 encoded MinIO object key for new documents. I encapsulated
this logic in a new PolicyDocumentService
that uses a regex-based pattern check to determine whether to return the database bytes directly or
fetch the content from S3 via the MinioClient.
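The decision logic can be sketched in shell form; the real PolicyDocumentService is Kotlin, and the key pattern below is an illustrative assumption rather than the actual regex:

```shell
# Hypothetical stand-in for the regex-based check: values that look like
# object keys are fetched from MinIO, anything else is treated as raw PDF bytes.
is_object_key() {
  printf '%s' "$1" | grep -Eq '^[A-Za-z0-9._-]+(/[A-Za-z0-9._-]+)*\.pdf$'
}

if is_object_key "documents/POLICY-2024-001.pdf"; then
  echo "fetch from MinIO"
fi
if ! is_object_key "%PDF-1.4 raw binary content"; then
  echo "serve database bytes directly"
fi
```

The point of the dual-purpose field is exactly this branch: one column, two interpretations, with the regex acting as the discriminator.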
The Java-based payment-service underwent a similar overhaul. I refactored the InPaymentRegistrationService to replace legacy java.io logic with S3 operations, using the MinIO SDK to verify the existence of the bank statement and stream the CSV content for processing. After a successful import, files are now marked as processed by copying them to a “processed” prefix within the bucket. I also simplified the BankStatementFile class, stripping out file-manipulation logic to focus solely on object key construction. Both services were then integrated with a MinioClientFactory for client lifecycle management, completing the shift toward a storage-agnostic architecture. For a closer look at the diffs and testing evidence, consult the associated pull request.
Externalizing the Relational Layer: From In-Memory H2 to Managed PostgreSQL
The transition from in-memory persistence to cluster-managed data stores, as outlined
in Ticket 8,
required a systematic overhaul of how the legacy Java services bootstrap their database connections.
I implemented a consistent architectural pattern across the auth, documents, payment,
policy, and pricing services by introducing a PostgresDatasourceConfigurationListener
and a corresponding PostgresDatasourceProperties
class. These components leverage Micronaut’s BeanCreatedEventListener to intercept the
DatasourceConfiguration at runtime—dynamically building the JDBC URL while supporting configurable
SSL modes and environment-specific credentials.
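Conceptually, the listener does little more than assemble a JDBC URL from environment-provided parts. The sketch below mirrors that assembly; the variable names follow the PG_* placeholders mentioned in the article, while the database name and sslmode value are assumptions:

```shell
# Hypothetical sketch of the JDBC URL the listener builds at runtime.
PG_HOST="local-dev-postgres-auth-rw.local-dev-all.svc.cluster.local"
PG_PORT="5432"
PG_DATABASE="auth"
PG_SSL_MODE="disable"

# Assemble the URL the same way the listener would from its properties.
JDBC_URL="jdbc:postgresql://${PG_HOST}:${PG_PORT}/${PG_DATABASE}?sslmode=${PG_SSL_MODE}"
echo "${JDBC_URL}"
```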
The configuration layer was split to support both cloud-native portability and a friction-free local
developer experience. In the main application.yml
files, I replaced hardcoded H2 connection strings with environment variable placeholders such as
${PG_HOST} and ${PG_PORT}. To allow Micronaut to still bootstrap the DataSource and JPA beans,
I kept a minimal default configuration that the listener subsequently overrides. Meanwhile, the new
application-local.yml
files define the specific port sharding mentioned earlier—mapping each service to its respective
local port (e.g., 8081 for auth, 8087 for pricing) and its dedicated PostgreSQL forward.
In the auth-service, I made a technical trade-off by refactoring the InsuranceAgent
component from a Java record to a standard @MappedEntity class using Lombok’s @Data and
@NoArgsConstructor. While records provide a clean syntax, the move to a persistent PostgreSQL
schema necessitated a more traditional entity structure to ensure full compatibility with Micronaut
Data JDBC and JPA mapping. I also implemented an explicit availableProductCodes helper to handle
serialization of semicolon-separated strings stored in the database, and updated the
InsuranceAgentsRepository
to remove its hardcoded H2 dialect reference.
The Kotlin-based documents-service saw similar adjustments to accommodate the relational storage
of binary objects. I updated the PolicyDocument
entity to include the @Lob annotation and a bytea column definition, ensuring that document
content is handled correctly by the PostgreSQL driver. Across all services, I updated the pom.xml
files to include the postgresql driver and micronaut-management dependencies. Crucially, I also
removed the micronaut-discovery-client dependencies; even though service discovery was a later
deliverable, the transition to external persistence forced an early decommission of the Consul-based
discovery logic to prevent startup failures. For a deep dive into the code changes, see
the associated pull request.
Deployment Readiness: Establishing Service Conventions
While the preceding “lift” work focused on standing up the cluster and its stateful dependencies,
the focus now shifts toward the deployment pipeline for the services themselves. When initially
planning the transition for the 11 legacy services, I considered creating individual tickets for
each to track granular progress. However, from a project management perspective, this threatened to
pollute the board with low-signal updates. I chose instead to consolidate the effort into two
primary deliverables: a dedicated ticket for the agent-portal-gateway (Ticket 9), given its role as the critical entry point and architectural anchor, and a grouping ticket for all remaining services (Ticket 10).
This allowed me to establish patterns on the gateway first, then apply those validated recipes to
the rest of the stack via individual pull requests.
For the Insurance Hub, a standardized implementation sequence was established to ensure consistency.
Each service follows a five-step lifecycle: validating or updating the Docker image, building
locally, loading the image into the cluster, implementing Kustomize manifests, and finally
triggering the deployment. I adopted a semantic naming convention for these images:
insurance-hub-<service-name>-<component-type>-legacy:<tag>. By sorting by service name first
(e.g., insurance-hub-policy-api-legacy rather than insurance-hub-api-policy-legacy), the registry
remains organized and searchable. Initially, until a full Flux-based GitOps workflow is implemented,
all image tags default to latest to simplify local iteration.
The transition to Kubernetes-native manifests required a rigorous adherence to best practices to avoid the “configuration drift” common in manual deployments. I established several mandatory patterns for Kustomize base and overlay manifests:
- Metadata and Selectors: I standardized on the `app.kubernetes.io` label set. Labels like `app.kubernetes.io/name`, `component`, and `part-of` are used consistently across Deployments and Services. I ensured that Deployment selectors exactly match the pod template labels to prevent orphaned pods—a common issue when manually editing manifests.
- Networking and Probes: A “Service on 80, App on 8080” pattern was implemented, where the Service `port: 80` maps to a `targetPort: 8080`. Every container now explicitly declares its `containerPort: 8080` and uses the Micronaut `/health` endpoint for liveness and readiness probes, with a measured `initialDelaySeconds` of 30–50s to account for JVM startup overhead.
- Environment Configuration: Sensitive data is strictly externalized via Kubernetes Secrets. I used Micronaut’s `MICRONAUT_ENVIRONMENTS` environment variable to trigger profile-specific behavior (e.g., `local-dev`), while environment-specific ConfigMaps are merged to override defaults.
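Put together, the “Service on 80, App on 8080” and probe conventions boil down to manifest fragments along these lines. The service name and the exact delay value are illustrative assumptions; the real manifests live under the Kustomize bases:

```yaml
# Hypothetical base-manifest fragments illustrating the conventions above.
apiVersion: v1
kind: Service
metadata:
  name: auth-api-legacy
spec:
  ports:
    - port: 80          # cluster-facing port
      targetPort: 8080  # Micronaut default
---
# Deployment pod-template excerpt (container-level fragment only)
ports:
  - containerPort: 8080
livenessProbe:
  httpGet:
    path: /health
    port: 8080
  initialDelaySeconds: 40   # generous allowance for JVM startup
```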
Looking ahead to the Go migration, I faced a structural challenge: how to run Java and Go versions
of the same service side-by-side without creating a mess of manifests. I chose a directory structure
that separates these implementations at both the Kustomize base and overlay levels. For the auth
service, the legacy Java manifests reside in k8s/apps/svc/auth/base/legacy/, while the new Go
implementation occupies the parent k8s/apps/svc/auth/base/.
This layout enables a pragmatic “Strangler Fig” approach; I can deploy either version independently or both simultaneously by targeting the specific overlay. When the legacy gateway is eventually replaced by Envoy Proxy, traffic routing between these versions will be managed via Ingress rules rather than invasive manifest changes, ensuring the underlying infrastructure remains stable and operationally transparent.
Automated Orchestration: Standardizing Service Lifecycle
Deployments remain reliable only when they are automated end-to-end. To automate the five-step lifecycle mentioned above and to enable repeatable builds for Java services, a set of Makefile Docker targets was created. These handle both individual service builds and bulk operations across the entire legacy stack:
- `docker-java-svc-build` – Builds a Docker image for a specific Java service.
- `docker-java-svc-all-build` – Builds Docker images for all Java services.
- `docker-frontend-build` – Builds the Vue frontend image.
To streamline service deployment across Kind (local-dev) and LXD/K3s (qa) environments, a
dedicated set of Makefile targets was created for loading and unloading service images and
orchestrating Kustomize deployments:
- `svc-image-local-dev-load` – Loads a locally built image into the `local-dev` Kind cluster.
- `svc-image-local-dev-unload` – Removes an image from Kind cluster nodes.
- `svc-image-qa-load` – Loads an image into QA LXD cluster nodes.
- `svc-deploy` – Deploys a service via Kustomize overlay.
- `svc-delete` – Purges a service deployment.
Implementing a multi-namespace architecture for the Insurance Hub—where infrastructure components
like PostgreSQL and MinIO reside in qa-data while services occupy qa-svc—introduced a specific
challenge: secret accessibility. During the infrastructure deployment phase, service-specific
credentials, such as MinIO access keys or Postgres user secrets, are generated within the
infrastructure’s own namespace to keep them local to the resource they protect. However, Kubernetes
strictly enforces namespace boundaries, preventing a Deployment in qa-svc from directly mounting a
Secret stored in qa-data. I had to decide whether to implement complex RBAC for cross-namespace
ServiceAccount access or adopt a more pragmatic, portable solution.
To bypass namespace boundaries without complex RBAC, I implemented “copy-on-create” and
“copy-after-create” patterns in the Make targets. For example, the minio-svc-user-secret-create
target pipes the generated YAML through sed to re-apply it from qa-data to qa-svc. It is a
portable, low-overhead solution for secret sharing.
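The rewrite itself is a one-line sed. It is simulated below on a literal manifest (the secret name is an illustrative assumption) so the namespace hop is visible without a cluster; the real target pipes the output of `kubectl get secret -o yaml` through the same substitution before re-applying it:

```shell
# Simulated "copy-after-create": rewrite the namespace of an exported
# Secret manifest so it can be re-applied into the services namespace.
secret_yaml='apiVersion: v1
kind: Secret
metadata:
  name: minio-document-user
  namespace: qa-data
type: Opaque'

# The sed substitution swaps the namespace in place.
printf '%s\n' "$secret_yaml" | sed 's/namespace: qa-data/namespace: qa-svc/'
```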
Before implementing the deployment of the agent-portal-gateway, I established the migration sequence: start with the core auth and edge services, then move outward to lower-risk, less-central services, wired via Kustomize overlays for local-dev and qa.
Gateway & Auth
| Seq. | Service | Description |
|---|---|---|
| 1 | agent-portal-gateway | Single entry point to backend services; enables early routing smoke tests. |
| 2 | auth-service | Required for most flows; external clients depend on it. |
| 3 | web-vue (frontend) | Validates the full browser → gateway → auth path once core edge/auth are stable. |
Supporting Services
| Seq. | Service | Description |
|---|---|---|
| 4 | document-service | Relatively isolated; not critical to core policy/payment flows. |
| 5 | product-service | Provides reference data; used by others but off the main transaction path initially. |
| 6 | policy-search-service | Read-only over Elasticsearch; no critical writes. |
| 7 | dashboard-service | Primarily read-heavy; safe once upstreams (policy/product/search) are in place. |
| 8 | chat-service | Can be validated independently (WebSocket + API). |
Core Transactional
| Seq. | Service | Description |
|---|---|---|
| 9 | policy-service | Central domain logic; depends on product, document, and search. |
| 10 | payment-service | Financially critical; should follow a stable policy path and upstreams. |
| 11 | pricing-service | Critical but narrower; depends on product/policy and can be proven via internal APIs first. |
With the decommissioning of Consul, service discovery has transitioned to a Kubernetes-native model,
leveraging the cluster’s internal DNS (CoreDNS). In this architecture, manual registration is
replaced by deterministic FQDNs following the standard <service>.<namespace>.svc.cluster.local
pattern. This shift allows services to resolve their dependencies without an external agent; for
instance, the payment-service in the local-dev environment now reaches its database at
local-dev-postgres-payment-rw.local-dev-all.svc.cluster.local and its Kafka bootstrap server at
local-dev-kafka-kafka-bootstrap.local-dev-all.svc.cluster.local.
I chose to implement these mappings through Kustomize configMapGenerator literals, which inject
the correct DNS endpoints directly into the services’ environment variables. By using these
cluster-internal records, I’ve eliminated the overhead of managing a separate discovery sidecar
while gaining the reliability of Kubernetes’ built-in service abstractions. This approach is not
only more robust but also operationally cleaner, as a simple kubectl get svc -A combined with a
bit of awk formatting provides a complete, real-time map of the cluster’s internal routing
surface. For example, the auth-service-related internal DNS records:
```shell
kubectl get svc -A --no-headers \
  | awk '{printf "%s.%s.svc.cluster.local\n", $2, $1}' \
  | grep auth
```

```
local-dev-auth-api-legacy.local-dev-all.svc.cluster.local
local-dev-postgres-auth-r.local-dev-all.svc.cluster.local
local-dev-postgres-auth-ro.local-dev-all.svc.cluster.local
local-dev-postgres-auth-rw.local-dev-all.svc.cluster.local
```
The technical “recipe” for a service deployment is best exemplified by the document-service
configuration in the QA environment. Within the k8s/overlays/qa/svc/document/legacy
folder, the kustomization.yaml
acts as the primary orchestrator. I configured it to target the qa-svc namespace and apply a qa-
name prefix and -legacy suffix, ensuring that these resources are distinct from the future Go
implementation. The accompanying deployment-patch.yaml
provides the environment-specific “glue”—it defines the resource requests and limits tailored for
the K3s cluster and mounts the necessary PostgreSQL and MinIO secrets. These secrets are accessed
locally within the qa-svc namespace, leveraging the “copy-on-create” Makefile automation to
maintain strict namespace isolation for the underlying data stores.
External dependency wiring is handled by a configMapGenerator in the overlay, which injects the
internal DNS FQDNs directly into the service’s environment. For the document-service, this
involves mapping variables like MINIO_TENANT_ENDPOINT and PG_HOST to their respective
cluster-internal records, such as qa-postgres-document-rw.qa-data.svc.cluster.local. I also used
this layer to manage environment-specific toggles, such as enabling Zipkin tracing and defining the
JSREPORT_HOST endpoint. By centralizing these connection strings and resource adjustments in the
overlay, the base manifests remain generic and portable, allowing the deployment logic to scale
across environments without invasive changes to the core application templates.
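In outline, the overlay’s kustomization.yaml combines these pieces roughly as follows. The relative resource path, ConfigMap name, and MinIO endpoint value are assumptions extrapolated from the naming patterns described above, not copied from the repository:

```yaml
# Hypothetical sketch of k8s/overlays/qa/svc/document/legacy/kustomization.yaml
namespace: qa-svc
namePrefix: qa-
nameSuffix: -legacy
resources:
  - ../../../../apps/svc/document/base/legacy   # assumed relative path to the base
patches:
  - path: deployment-patch.yaml                 # env-specific resources, limits, secrets
configMapGenerator:
  - name: document-config                       # assumed ConfigMap name
    literals:
      - PG_HOST=qa-postgres-document-rw.qa-data.svc.cluster.local
      - MINIO_TENANT_ENDPOINT=http://qa-minio-document-hl.qa-data.svc.cluster.local:9000
```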
Edge and Logic: Bridging the Kubernetes Gap
While the transition to S3 and PostgreSQL addressed the stateful requirements of the “shift,” the final hurdles involved the services’ networking and runtime behavior. Moving from the flat network of Docker Compose to the segmented, DNS-heavy environment of Kind and K3s surfaced several edge cases where the legacy configuration didn’t translate.
Standardizing NGINX Path Matching and Resolution
In the web-vue module, the UI itself remained healthy, but the navigation flow broke down under
Kubernetes networking. Redirects triggered during authentication—and certain API calls routed
through the gateway—failed to land the browser back on the expected routes. The root cause was NGINX
behaving subtly differently in a containerized environment compared to the simpler, earlier setup,
particularly around URI matching and upstream resolution when using variables in proxy_pass. In
the original configuration, proxy locations were defined broadly as /api and /login. While this
looked sufficient, it left enough ambiguity for edge cases—like trailing slash handling and route
precedence—to bounce the browser to the wrong paths or trigger unexpected fallbacks to the SPA index
route.
I tightened the configuration to make routing deterministic. First, I made the proxy locations
prefix-explicit and assigned them higher priority via location ^~ /api/ and location ^~ /login.
This prevents accidental interactions with the generic SPA handler and removes the path ambiguity
that often causes upstreams to respond with unexpected redirects. Second, I introduced a
Kubernetes-aware DNS strategy using resolver kube-dns... valid=5s combined with
set $api_upstream. This forces NGINX to reliably resolve service DNS names when they are sourced
from variables—a common “gotcha” where variable-based proxy_pass fails to re-resolve without an
explicit resolver, manifesting as flaky routing.
Finally, I addressed a related UX issue by adding a map for $connection_upgrade and a dedicated
^~ /ws/chat/ location. Including the necessary WebSocket headers fixes the handshake failures that
typically occur when real-time features move behind an NGINX ingress. Collectively, these changes
transformed the UI into a stable edge gateway with correct protocol handling and path matching,
which is exactly what a consistent login flow requires once running behind Kubernetes. For more
details, see nginx-app.conf.
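Paraphrased, the tightened location shape looks roughly like this; the upstream hostname and resolver address are illustrative assumptions, and the real directives live in nginx-app.conf:

```nginx
# Hypothetical paraphrase of the tightened proxy locations; hostnames are assumptions.
location ^~ /api/ {
    resolver kube-dns.kube-system.svc.cluster.local valid=5s;  # re-resolve every 5s
    set $api_upstream http://agent-portal-gateway;             # variable forces resolver use
    proxy_pass $api_upstream;
}

location ^~ /ws/chat/ {
    proxy_http_version 1.1;
    proxy_set_header Upgrade $http_upgrade;
    proxy_set_header Connection $connection_upgrade;           # value comes from the map block
    proxy_pass http://agent-portal-gateway;
}
```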
Mapping Micronaut Clients to Cluster DNS
After moving the legacy Micronaut 2 agent-portal-gateway into the cluster, routing failures
initially appeared to be a network-level issue. The product-service seemed unreachable from the
gateway, even though direct in-cluster connectivity tests succeeded. This pattern eventually
surfaced across all downstreams—document, policy, and policy-search. The clue was that while
Kubernetes DNS worked, the gateway couldn’t resolve logical service identifiers into actual URLs.
The root cause lay in how Micronaut’s declarative HTTP clients—configured with
@Client(id = "<service-id>")—behave. In Docker Compose, service discovery often “just happens” via
hostnames, but in Micronaut 2, the id is a logical name that requires either a discovery client or
an explicit URL mapping.
Initially, I tested the Micronaut Kubernetes discovery client. However, it required extensive RBAC
permissions to watch cluster resources, introducing more complexity than it solved. I switched to
explicit service URL mappings in application-local-dev.yml
and application-qa.yml
by defining micronaut.http.services entries and activating a dedicated Micronaut environment profile via MICRONAUT_ENVIRONMENTS in the gateway deployment for the local-dev and qa environments. It is a “dumb-but-reliable” approach that made routing deterministic without the RBAC overhead.
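In outline, the mapping looks like this; the service IDs and FQDNs are assumptions following the naming pattern described earlier, not the repository’s actual values:

```yaml
# Hypothetical application-qa.yml fragment for the gateway's declarative clients.
micronaut:
  http:
    services:
      product-service:
        url: http://qa-product-api-legacy.qa-svc.svc.cluster.local
      policy-service:
        url: http://qa-policy-api-legacy.qa-svc.svc.cluster.local
```

With these entries in place, `@Client(id = "product-service")` resolves to the configured URL without any discovery client.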
Tuning JVM Footprint and DNS Fallbacks
While validating services in the local-dev Kind cluster, I observed an unsettling pattern: healthy
pods would restart every 10–15 minutes. The deployments had modest memory limits—typically 512
MiB—but metrics showed the Java processes steadily climbing toward 1 GiB before being terminated.
Kubernetes was performing routine OOM enforcement, but the “mystery” was why the memory usage was so
aggressive.
I had fallen into a classic “lift-and-shift” trap: I reused the legacy Dockerfiles unchanged,
running the JVM without container-aware guardrails. A plain java -jar command is dangerous in
Kubernetes—it allows the JVM to size its heap based on host memory rather than pod limits. I
modernized the runtime by switching to the eclipse-temurin:17-jre-alpine base image and setting
container-aware JVM options, specifically -XX:+UseContainerSupport and -XX:MaxRAMPercentage,
which immediately stabilized memory consumption. As a bonus, the move to Temurin 17 significantly reduced
idle footprint—dropping typical steady-state usage from roughly 300–350 MiB down to 150–200 MiB,
which made the services far more compatible with “developer laptop” cluster constraints.
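The runtime change amounts to a Dockerfile along these lines; the jar path and percentage value are illustrative assumptions:

```dockerfile
# Hypothetical sketch of the modernized, container-aware runtime image.
FROM eclipse-temurin:17-jre-alpine
COPY target/service.jar /app/service.jar
# Size the heap relative to the pod's memory limit, not the host's RAM.
ENTRYPOINT ["java", \
  "-XX:+UseContainerSupport", \
  "-XX:MaxRAMPercentage=75.0", \
  "-jar", "/app/service.jar"]
```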
However, the Alpine-based Temurin 17 image wasn’t a universal win. For the agent-portal-gateway
and policy-service, I noticed IPv6 fallback timeouts of 30–45 seconds on outbound calls. Alpine’s
musl libc resolver tends to prefer IPv6 and only falls back to IPv4 after a timeout if the AAAA
path isn’t satisfied. In these specific cases, I chose to retain the older
adoptopenjdk/openjdk14:jre-14.0.2_12-alpine base image. Its resolver behavior avoided the fallback
delay and restored instant connectivity. The result was a service-by-service compromise: Temurin 17
for memory stability where possible, and the legacy base image where musl DNS behavior proved to be
a latency trap.
Visibility and Automation: Closing the Operational Gap
To facilitate proper application functionality and debugging, I made several targeted improvements to visibility and deployment logic. Initially, the legacy services were “black boxes” regarding their internal events and external interactions; however, after testing the system in Kind, it became clear that moving to a distributed environment required far more explicit telemetry.
To improve operational transparency, I added a LoggingFilter
to the agent-portal-gateway. Implementing this as a Micronaut HttpServerFilter allows for
non-invasive interception of all traffic passing through the gateway. By logging the HTTP method and
URI for every incoming request—and the resulting status code for every outgoing response—it provides
a clear audit trail of how the UI interacts with the backend. This was particularly useful for
debugging the NGINX redirect issues mentioned earlier; having a high-level view of every
request/response cycle within the cluster helped verify that the gateway was indeed receiving and
correctly forwarding traffic to the intended downstreams. To avoid polluting logs, the filter
explicitly ignores /health probe requests.
I extended this strategy to the asynchronous layer by addressing the lack of visibility into Kafka
message publishing. In the legacy policy-service, domain events were being emitted without any
trace in the logs, making it nearly impossible to verify if a policy update had actually been
dispatched. I chose a non-intrusive approach by implementing an
EventPublisherLoggingInterceptor
using Micronaut AOP. By creating this cross-cutting interceptor for methods annotated with
@LogEventPublisher, I was able to centrally capture the specific policy identifier and event type
for every outbound message. This ensures that every event—including any failures—is logged, allowing
me to confirm that messages are reaching Kafka before troubleshooting downstream consumers.
Beyond telemetry, I addressed the manual, out-of-band step of adding PDF templates to the jsreport
instance. Managing these templates manually is a fragile requirement when handling multiple
ephemeral environments. To ensure the deployment was fully automated, I added a
JsReportTemplateProvisioner
to the document-service. This Singleton component hooks into the Micronaut ServerStartupEvent
and interacts with the jsreport API to validate the presence of the “POLICY” template. If missing,
the provisioner automatically loads the definition from a local resource and creates it. This makes
the document-service self-bootstrapping and environment-agnostic, ensuring it functions
immediately upon deployment without manual UI configuration.
To tie the “shift” together, I transitioned from manual commands to a fully orchestrated deployment workflow. Managing 11 legacy services and a dozen stateful components is inherently fragile without a deterministic sequence. I consolidated the logic into high-level Makefile targets that handle specific dependencies—ensuring operators are reconciled before data stores are provisioned, and the observability is live before services begin emitting telemetry.
In the QA environment, I prioritized using LXD-based snapshot targets at critical milestones. By capturing the state of the cluster nodes after standing up the monitoring and infrastructure layers, I’ve facilitated a nearly “instant” rollback capability. This effectively makes the cluster disposable; if a service deployment or configuration experiment causes a deadlock, I can revert to a verified baseline in seconds rather than rebuilding the stack from zero. The automated deployment sequence now follows these primary Makefile targets:
- `java-all-build` – Orchestrates the sequential build of all legacy Java services and their Docker images.
- `cluster-qa-monitoring-deploy` – Provisions the production-like observability stack (Elasticsearch, Prometheus, Grafana, and Zipkin).
- `qa-nodes-snapshot` – Captures a named recovery point of the cluster VMs for rapid recreation.
- `cluster-infra-deploy` – Deploys the foundational infrastructure, including Kafka, PostgreSQL clusters, and MinIO tenants.
- `cluster-svc-deploy` – Executes the final application rollout, automating image loading and Kustomize-based deployment.
AI Implementation: Optimizing Service Recipes and Agent-Assisted Refactoring
The “shift” work reinforced a lesson from the previous phase: AI is an effective accelerator for research and implementation, provided it is governed by a disciplined routine. During this stage, the AI tools were instrumental in establishing service conventions and finalizing the technical “recipe” for my K8s workloads. Once a pattern was validated for a single service—covering Kustomize bases, environment-specific overlays, and resource constraints—producing the manifests for the remaining ten became a routine of feeding the AI assistant the specific service context and the established template.
My current workflow relies on a multi-tool strategy to balance cost and capability. For general implementation and development within the IDE, I primarily use the JetBrains AI Assistant in chat mode, configured with the Gemini 3 Flash model. This provides a pragmatic balance of speed and response quality. While JetBrains recently increased the monthly Pro credits from 10.00 to 12.46, this remains a finite resource that requires careful management. When these credits are exhausted, or when I need to perform deep-dive technical research into Kubernetes idioms or SDK behaviors, I switch to Perplexity Pro. While Perplexity is superior for synthesizing documentation, I find it more cumbersome for implementation tasks, as source code files must be manually attached to the prompt.
I also explored the more autonomous “agent” modes available within the IDE. While JetBrains offers
Junie and Claude-based agents, their high credit consumption initially made me hesitant to use them
for routine tasks. However, I took advantage of a recent promotion for the OpenAI Codex-based agent
to implement several cross-cutting service improvements. I used the agent to scaffold and refine the
LoggingFilter for the gateway, the AOP-based EventPublisherLoggingInterceptor for Kafka
visibility, and the JsReportTemplateProvisioner logic.
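The around-advice idea behind that Kafka-visibility interceptor can be illustrated without the framework. The real EventPublisherLoggingInterceptor uses Micronaut AOP; the sketch below shows the same mechanics with a JDK dynamic proxy, and the EventPublisher interface and log format are simplified assumptions rather than the production code:

```java
import java.lang.reflect.InvocationHandler;
import java.lang.reflect.Proxy;
import java.util.Arrays;
import java.util.List;

// Around-advice sketch: wrap an event publisher so every publication is
// logged before and after delegation. The production interceptor does this
// via Micronaut AOP; a JDK dynamic proxy demonstrates the same pattern.
class PublisherLogging {

    /** Minimal stand-in for the real Kafka publisher abstraction. */
    interface EventPublisher {
        void publish(String topic, String payload);
    }

    /** Returns a proxy that records every call into the given log. */
    static EventPublisher withLogging(EventPublisher target, List<String> log) {
        InvocationHandler handler = (proxy, method, args) -> {
            log.add("-> " + method.getName() + " " + Arrays.toString(args));
            Object result = method.invoke(target, args); // delegate to the real publisher
            log.add("<- " + method.getName() + " completed");
            return result;
        };
        return (EventPublisher) Proxy.newProxyInstance(
                EventPublisher.class.getClassLoader(),
                new Class<?>[] { EventPublisher.class },
                handler);
    }
}
```

The payoff is the same in both forms: publication logging lives in one place instead of being copy-pasted into every service that emits events.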
The AI was particularly effective at resolving the “stubborn” issues that surfaced during cluster
validation—such as the NGINX path matching inconsistencies and the Micronaut 2 discovery client RBAC
restrictions mentioned earlier. By providing the AI with the specific error logs and my existing
manifests, I was able to quickly iterate on the application-<env>.yml mappings and the resolver
configurations. As with the “lift” phase, I continue to maintain a transparent log of these
interactions, ensuring that while the AI assists with the heavy lifting of boilerplate and
debugging, the architectural rationale remains firmly under my control.
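For context, the kind of explicit mapping being iterated on looks roughly like the fragment below. The service names, port, and namespace are hypothetical; the point is that each Micronaut HTTP client service id is pinned to a fixed Kubernetes internal DNS name instead of relying on a Consul lookup:

```yaml
# Hypothetical excerpt from an application-qa.yml: clients resolve peers
# through fixed cluster-internal URLs rather than a discovery agent.
micronaut:
  http:
    services:
      policy-service:
        urls:
          - http://policy-service.insurance-hub.svc.cluster.local:8080
      document-service:
        urls:
          - http://document-service.insurance-hub.svc.cluster.local:8080
```

One such overlay per environment keeps the resolution behavior visible in Git instead of hidden in a registry.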
Shift Work: Closing Thoughts
The completion of the “shift” marks a significant milestone in the Insurance Hub modernization: the legacy Java stack is no longer a guest in Kubernetes but a well-mannered citizen. By stripping away local-only shortcuts—moving from H2 to external PostgreSQL and from local filesystems to S3-compatible MinIO—I’ve removed the primary blockers to scalability and state persistence. This phase was intentionally pragmatic; the goal wasn’t to achieve architectural perfection, but to ensure the existing services remained stable, observable, and reproducible within their new environment.
The technical “recipe” established during this phase—standardized Kustomize overlays, deterministic port assignments, and explicit Micronaut client mappings—has transformed the deployment process into something operationally boring. I chose to prioritize consistency over cleverness, favoring explicit URL mappings and “copy-on-create” secret automation via Make targets over more complex discovery agents or cross-namespace RBAC configurations. This approach ensured that the transition from Docker Compose to Kind and K3s was driven by observed behavior and logs rather than assumptions about framework-level “magic.”
Validating this shift required a rigorous local development loop. Being able to run services
directly from IntelliJ while they interact with cluster-managed infrastructure provided the
immediate feedback necessary to resolve the NGINX path-matching flakiness and the JVM memory
footprint issues. With the addition of the LoggingFilter and the
EventPublisherLoggingInterceptor, the system has moved from a “black box” to a transparent mesh
where requests and Kafka events are traceable across the distributed environment.
While the current system is stable and functional, it still relies on a manual “push” model for deployments. Maintaining 11 services and their supporting infrastructure through individual Make commands is a liability I intend to eliminate as the project scales. In the next post, I will introduce Flux-based GitOps. By moving to a declarative, pull-based model, the cluster state will finally be anchored in Git—enabling automated deployments to the QA environment whenever changes are merged into the repository and establishing a robust, version-controlled release workflow directly from GitHub.
Continue reading the series “Insurance Hub: The Way to Go”: