Principal Engineer, Systems Reliability

T-Mobile

3mo ago

Applications are closed

  • Job
    Full-time
    Senior & Expert Level
  • Software Engineering
  • Atlanta, +1

Requirements

  • Experience working in an Agile and DevOps environment. (Preferred)
  • Experience with DevOps tools such as Ansible, Chef, and Puppet, and with Docker, Kubernetes, etc. (Preferred)
  • Experience in migrating to cloud or cloud-native environments. (Preferred)
  • Cloud computing (Preferred)
  • 7-10 years of validated experience. (Required)
  • Experience in one or more of C, C#, Java, Perl, Python, or Go, or scripting experience in Shell and Perl. (Required)
  • Experience with continuous integration/continuous delivery tools (Jenkins, CloudBees, etc.) and other automation tools. (Required)
  • Experience with an APM tool such as AppDynamics and a logging tool such as Splunk. (Required)
  • Experience in a cloud environment (public/private). (Required)
  • DevOps (Required)
  • Education:
  • Bachelor's degree in Computer Science or related field
  • Master's/advanced degree: in lieu of a Master’s degree, equivalent proven experience may be considered. (Required)
  • At least 18 years of age
  • Legally authorized to work in the United States

Responsibilities

  • Strategy:
  • Ideates and assists in crafting new designs, architectures, standards, and repeatable processes and methods for delivering software and managing operations, improving customer experience through continuous improvement of application operations.
  • Relationship and Leadership:
  • Leads and mentors a team of Systems Reliability Engineers and vendor resources, and leads improvement work or proofs of concept (POCs) as projects.
  • Technology and System:
  • Applies expert knowledge of emerging DevOps-centric automation tools and technologies for CI/CD, configuration management, etc., in non-production and production environments.
  • Contributes to future improvement of software delivery processes and operations, e.g., cloud enablement and use of microservices with containerization.
  • Delivers software to improve the availability, scalability, latency, and efficiency of T-Mobile’s services.
  • Performs environment management, automated server provisioning (VMs), and pipeline configuration.
  • Builds, manages, and uses dashboards for continuous monitoring and health checks of applications and the underlying infrastructure, and uses the monitoring feedback to improve the quality of services in non-production and production environments.

FAQs

What is the primary purpose of the Principal Engineer, Systems Reliability position?

The primary purpose of this position is to lead and mentor a team of Systems Reliability Engineers and to design and implement strategies that improve software delivery and operational processes, ultimately enhancing customer experience.

What are the key responsibilities of the Principal Engineer, Systems Reliability?

Key responsibilities include strategizing and crafting new designs and processes for software delivery, leading and mentoring a team, utilizing DevOps-centric automation tools, improving software delivery operations, managing environment provisioning, and building dashboards for monitoring application health.

What specific technologies and tools should candidates be familiar with for this role?

Candidates should be familiar with DevOps tools such as Ansible, Chef, Puppet, Docker, and Kubernetes; continuous integration/continuous delivery tools like Jenkins; APM tools like AppDynamics; and logging tools like Splunk. Experience in a public or private cloud environment is also required.

What programming languages or scripting skills are required for this position?

Candidates should have experience in one or more of the following programming languages: C, C#, Java, Perl, Python, or Go, or scripting experience in Shell and Perl.

What is the preferred educational background for candidates applying for this role?

The preferred educational background includes a Bachelor's degree in Computer Science or a related field. Equivalent proven experience may be considered in lieu of a Master’s degree.

How much experience is required for this position?

Candidates are required to have 7-10 years of validated experience in systems reliability and related fields.

Is experience in an Agile and DevOps environment necessary?

Experience working in an Agile and DevOps environment is preferred rather than strictly required for this position.

Are candidates required to be legally authorized to work in the United States?

Yes, candidates must be legally authorized to work in the United States to be considered for this position.

What leadership qualities are expected from someone in this role?

Candidates are expected to demonstrate strong leadership and mentorship capabilities, as they will be responsible for leading a team of Systems Reliability Engineers and guiding improvement projects.

What is the significance of cloud computing experience for this role?

Experience in cloud computing is significant as it relates to the continued improvement of software delivery processes and operations, including migrating to cloud or cloud-native environments.

Be Unstoppable With Us!

Industry: Telecommunications
Employees: 10,001+

Mission & Purpose

T-Mobile US, Inc. (NASDAQ: TMUS) is America’s supercharged Un-carrier, delivering an advanced 4G LTE and transformative nationwide 5G network that will offer reliable connectivity for all. T-Mobile’s customers benefit from its unmatched combination of value and quality, unwavering obsession with offering them the best possible service experience and undisputable drive for disruption that creates competition and innovation in wireless and beyond. Based in Bellevue, Wash., T-Mobile provides services through its subsidiaries and operates its flagship brands, T-Mobile and Metro by T-Mobile.