Observability Analyst

  • R0016465
  • Pune, Maharashtra, India
  • Observability
  • Information Technology
  • Full-time

Job Ad

Location: Bangalore or Pune

Department: Data, Technology & Security

We’re seeking an experienced and proactive Observability Analyst (Level 2) to join Procore’s Data, Technology & Security team in our Bangalore or Pune office. In this role, you’ll play a critical part in enhancing our monitoring and observability practices across global IT and business systems. You will work closely with engineering, network, and application teams to ensure performance, reliability, and transparency across our enterprise platforms.

Reporting to the Director of End User Services & ITSM, Ganesh Annaswamy, you'll use your technical, analytical, and troubleshooting skills to maintain our IT services. Your contributions will be integral to our incident handling processes and overall team success.

What You’ll Do

Monitoring, Automation & Optimization

  • Design, implement, and maintain advanced monitoring and alerting solutions for cloud infrastructure, networks, and enterprise SaaS platforms (Workday, Salesforce, NetSuite, etc.)
  • Build and optimize dashboards to provide actionable insights into system health and performance
  • Automate repetitive monitoring tasks using scripts or API integrations (Python, PowerShell, or equivalent)
  • Continuously improve alert thresholds, correlation rules, and incident triage logic to reduce noise and improve detection accuracy

Incident Management & Root Cause Analysis

  • Act as the second line of defense for complex performance incidents, collaborating with Cloud, Network, and Application teams for resolution
  • Lead post-incident reviews and contribute to problem management processes
  • Develop and implement monitoring enhancements to prevent recurrence of major incidents
  • Provide mentorship and technical guidance to Level 1 Observability Associates

Performance Analysis & Continuous Improvement

  • Conduct trend analysis on performance metrics and system logs to identify potential bottlenecks and capacity issues
  • Partner with service owners and technical SMEs to improve observability coverage and service reliability
  • Propose and implement metric-based service-level indicators (SLIs) and service-level objectives (SLOs)
  • Evaluate and onboard new observability tools or features to enhance monitoring maturity

Documentation & Knowledge Sharing

  • Maintain up-to-date runbooks, SOPs, and architecture diagrams for observability systems
  • Develop internal knowledge articles and training materials for cross-functional teams
  • Contribute to continuous service improvement (CSI) initiatives within the ITSM framework

What We’re Looking For

  • Education:
    • Bachelor’s degree in computer science, IT, or a related discipline (or equivalent professional experience).
  • Experience:
    • 5+ years of experience in IT Operations, NOC, or Observability roles, with at least 2 years in a Level 2 capacity.
    • Demonstrated experience managing observability for hybrid (cloud/on-premises) environments.
  • Technical Skills:
    • Proficiency with monitoring and observability tools: Prometheus, Grafana, Datadog, New Relic, Splunk, ELK, or similar.
    • Strong understanding of networking, cloud infrastructure (AWS/Azure/GCP), and SaaS application monitoring.
    • Familiarity with APM (Application Performance Monitoring) and synthetic monitoring.
    • Scripting knowledge in Python, PowerShell, or Bash for automation and data processing.
    • Experience integrating observability tools with incident management systems (ServiceNow, Jira, PagerDuty, Opsgenie).
  • Problem-Solving:
    • Strong analytical and troubleshooting abilities
    • Ability to prioritize and manage tasks in a fast-paced environment
  • Professionalism:
    • Excellent analytical and problem-solving skills with a proactive mindset
    • Strong communication skills with the ability to convey technical insights to non-technical stakeholders
    • Proven ability to operate in a fast-paced, 24x5 global support environment
    • Working knowledge of ITIL/ITSM processes (Incident, Change, and Problem Management)
  • Collaboration:
    • Ability to work cross-functionally to identify trends and improve IT services
  • Work Environment:
    • Willingness to work in a 24x5 support environment
