Published 1 week ago
Endava

Senior Observability Engineer

Salary: not specified
Estimate: 1,350 - 3,650 EUR gross / month · Based on 38 similar listings
Timișoara, TM, Romania
On-site
Full-time

Technologies

Job description

This position is located in Timișoara, TM, Romania.

We are looking for a Senior Observability Engineer.

We offer a full-time position.

Additional information

Job Description

The Senior Observability Engineer designs, implements, and continuously improves observability for live services (bespoke apps), enabling reliable operations and faster change. The role establishes practical monitoring standards and telemetry (metrics, logs, traces, synthetics, events), integrates observability into delivery pipelines, and uses data to drive incident reduction, faster diagnosis, improved performance, and measurable service outcomes.

This role operates under broad direction and is accountable for delivering improvements to reliability and operational performance, collaborating with engineering, platform, security, and service delivery stakeholders. The work aligns with Applications Management practices for live support, problem management, service performance, and continuous improvement.

Responsibilities:

  • Define and implement observability strategy per service: SLIs/SLOs, telemetry standards, ownership model, alerting principles, and runbook requirements.
  • Build and maintain telemetry across services and platforms: instrumentation, dashboards, alerts, and automated detection of abnormal behavior.
  • Improve incident and problem outcomes: reduce MTTR via better signals, correlation, and actionable alerts; support RCA and trend analysis; drive prevention and backlog items.
  • Operational performance and capacity: establish performance baselines, detect regressions, support capacity planning, and measure availability/performance against targets.
  • Embed observability into delivery: integrate instrumentation and quality gates into CI/CD; define “ready for ops” acceptance criteria related to monitoring and supportability.
  • Enable teams: coach engineers and support teams on using observability tooling, operational diagnostics, and effective on-call practices.
  • Stakeholder management: communicate clearly with technical and non-technical stakeholders on service health, risks, trade-offs, and improvement plans.

Ways of working and behaviors expected:

  • Analytical and structured approach to problem solving; comfortable working with incomplete data and iterating toward clarity.
  • Strong written and verbal communication and able to present service health and risks to both technical and non-technical audiences.
  • Proactive ownership: identifies gaps, drives improvement actions, and follows through to measurable outcomes.
  • Collaborative mindset: works effectively across engineering, platform, and service management functions.

Qualifications

At least 3 years of relevant hands-on experience in the following areas:

Observability engineering:

  • Ability to design an observability approach that covers metrics, logs, traces, and synthetics with clear intent (detection, diagnosis, prediction, and validation).
  • Strong capability in alert design (signal vs noise), routing, escalation, and maintaining an actionable on-call experience (runbooks, playbooks, ownership).
  • Practical experience defining SLIs/SLOs, error budgets, and service health reporting that supports operational decision-making.

Operations and service management alignment:

  • Strong experience working in (or closely with) live production support, including incident management, major incidents, and problem management.
  • Ability to translate operational pain into measurable improvement plans, and to track improvements using service metrics and trends.
  • Working knowledge of change/release impact and how observability supports safe deployment and early-life support.

Technical foundations:

  • Cloud and container platforms: hands-on with at least one major cloud provider and modern runtime patterns (Kubernetes, containers, managed services).
  • Distributed systems fundamentals: latency, saturation, errors, throughput, dependencies, and failure modes.
  • Automation / scripting: ability to automate repetitive diagnostics and observability configuration (e.g., Python, Bash, PowerShell, or similar).
  • Security awareness: safe handling of telemetry, access controls, data retention, and secure-by-default configuration.

Tooling (examples, not an exhaustive list):

  • Experience with one or more observability stacks such as:
      • Metrics: Prometheus, Grafana, Datadog, CloudWatch/Azure Monitor, etc.
      • Logs: ELK/OpenSearch, Splunk, Loki, etc.
      • Tracing/APM: OpenTelemetry, Jaeger, Tempo, New Relic, Dynatrace, Datadog APM, etc.
  • Experience integrating observability with ITSM / collaboration tooling (ex: ServiceNow/Jira, ChatOps, incident paging).

Nice-to-have:

  • Experience with observability-as-code (dashboards/alerts via Terraform, GitOps, Helm, API-driven configuration).
  • Experience with synthetics and RUM (web performance/user journey monitoring).
  • Experience building golden signal dashboards, dependency maps, or automated correlation.
  • Experience supporting regulated environments (ex: ISO controls, auditability, change governance).

Additional Information

Discover some of the global benefits that empower our people to become the best version of themselves:

  • Finance: Competitive salary package, share plan, company performance bonuses, value-based recognition awards, referral bonus;
  • Career Development: Career coaching, global career opportunities, non-linear career paths, internal development programmes for management and technical leadership;
  • Learning Opportunities: Complex projects, rotations, internal tech communities, training, certifications, coaching, online learning platforms subscriptions, pass-it-on sessions, workshops, conferences;
  • Work-Life Balance: Hybrid work and flexible working hours, employee assistance programme;
  • Health: Global internal wellbeing programme, access to wellbeing apps;
  • Community: Global internal tech communities, hobby clubs and interest groups, inclusion and diversity programmes, events and celebrations.

At Endava, we’re committed to creating an open, inclusive, and respectful environment where everyone feels safe, valued, and empowered to be their best. We welcome applications from people of all backgrounds, experiences, and perspectives, because we know that inclusive teams help us deliver smarter, more innovative solutions for our customers. Hiring decisions are based on merit, skills, qualifications, and potential. If you need adjustments or support during the recruitment process, please let us know.

Company Description

Technology is our how. And people are our why. For over two decades, we have been harnessing technology to drive meaningful change. By combining world-class engineering, industry expertise and a people-centric mindset, we consult and partner with leading brands from various industries to create dynamic platforms and intelligent digital experiences that drive innovation and transform businesses. From prototype to real-world impact - be part of a global shift by doing work that matters.

About the company: Endava

Endava is a leading provider of next-generation technology services, dedicated to enabling its customers to accelerate growth, tackle complex challenges and thrive in evolving markets.

Work modes
On-site
Offices in: Iași, IS, Romania; Cluj-Napoca, CJ, Romania; Brașov, BV, Romania; Timișoara, TM, Romania

Compensation

Not specified
Estimate: 1,350 - 3,650 EUR gross / month
Based on 38 similar listings

Contract details

Employment type: Full-time
Contract type: Full-time employee

Pre-application checklist

Quickly check whether the listing contains the essential information, so that you can compare offers fairly.

  • Is the salary gross or net, and for what period?
  • Is it a CIM (employee) or B2B (PFA/SRL) contract?
  • What does "remote/hybrid" mean in practice (days in the office, overlap hours)?
  • Are the scope and the seniority level clear?

Report the listing if data is missing or it contains errors.

How to evaluate this job (beyond the title)

A good application is more than "I know the stack". It is proof that you can deliver results in the specific context: the team setup, the constraints, the seniority expectations, and how success is measured. Use the checklist to decide whether to apply and what to highlight.

Clarify the scope and expectations

Many listings are intentionally general. Your task is to identify the core responsibilities and whether they match your current strengths.

  • Look for ownership signals: "design", "architecture", "lead", "on-call", "mentoring".
  • Check whether the role is feature delivery, platform/infra, or maintenance.
  • If the description is short, use the company size, industry, and stack to infer the typical workday.

Validate the work mode and collaboration

The remote/hybrid/on-site labels are not enough. The real constraints are the overlap hours, the days in the office, and the communication style.

  • Confirm whether "remote" means global, EU, or Romania only, and whether there are mandatory overlap hours.
  • For hybrid, ask how many days per week and whether they are fixed or flexible.
  • Check the language requirements and the cross-team dependencies (product, design, stakeholders).

Compare compensation realistically

To compare two offers, normalize everything to the same baseline and contract type. If the salary is not shown, build an indicative range and validate it early.

  • Normalize gross vs net and month vs year before comparing.
  • For B2B, account for taxes, accounting costs, unpaid days off, and risk.
  • Use market data as a sanity check, then negotiate with evidence (impact, scope, seniority).
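
The normalization step can be sketched in a few lines of Python. This is only an illustration: the `Offer` class, the function names, and every figure are hypothetical, taxes are not modelled at all, and unpaid time off is approximated by lowering the number of paid months per year.

```python
from dataclasses import dataclass


@dataclass
class Offer:
    """One offer as advertised. All values used below are made up."""
    label: str
    amount: float          # advertised amount in EUR
    period: str            # "month" or "year"
    basis: str             # "gross" or "net"
    paid_months: int = 12  # months actually paid per year (assumption)


def annual_amount(offer: Offer) -> float:
    """Yearly amount for an offer quoted per month or per year."""
    if offer.period == "year":
        return offer.amount
    if offer.period == "month":
        return offer.amount * offer.paid_months
    raise ValueError(f"unknown period: {offer.period!r}")


def monthly_equivalent(offer: Offer) -> float:
    """Yearly amount spread evenly over 12 calendar months."""
    return annual_amount(offer) / 12


def compare(a: Offer, b: Offer) -> float:
    """Monthly-equivalent difference (a minus b); refuses mixed bases."""
    if a.basis != b.basis:
        raise ValueError("convert both offers to gross or both to net first")
    return monthly_equivalent(a) - monthly_equivalent(b)


# Hypothetical offers: an employee contract paid 12 months a year, and a
# B2B contract with a higher rate but one unpaid month off per year.
cim = Offer("CIM", 3000.0, "month", "gross")
b2b = Offer("B2B", 3300.0, "month", "gross", paid_months=11)

print(monthly_equivalent(cim))  # 3000.0
print(monthly_equivalent(b2b))  # 3025.0
print(compare(b2b, cim))        # 25.0
```

In this made-up case a 300 EUR higher monthly rate shrinks to a 25 EUR difference once the unpaid month is counted, before taxes and accounting costs are even considered.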

Useful links for your decision

The pages below help you check salary ranges and contract choices (especially when moving between CIM and B2B).