Engineer, Site Reliability - TPS at Standard Bank Group

Standard Bank is a firm believer in technical innovation, to help us guarantee exceptional client service and leading-edge financial solutions. Our growing global success reflects our commitment to the latest solutions, the best people, and a uniquely flexible and vibrant working culture. To help us drive our success into the future, we are looking for an experienced Resilience Engineer to join our team. Standard Bank is a leading African banking group focused on emerging markets globally. It has been a mainstay of South Africa's financial system for 150 years and now spans 16 countries across the African continent.

Job Purpose

Contribute to the resilience of Group Information Technology by improving the availability, reliability and performance of business-critical customer-facing systems, whilst building sustainable capability. This complex task is delivered in conjunction with the CIO and CTO communities.

Key Responsibilities/Accountabilities

Analyse, Report and structure strategies to ensure optimal systems stability

  • Design and document systems, including writing and reviewing code, to automate solutions across Group IT
  • Undertake measured, strict troubleshooting of complicated systems and resolve time-critical incidents
  • Develop software for the purposes of automating, monitoring and maintaining deployed infrastructure and services across Group IT
  • Solve problems relating to mission-critical services and build automation to prevent problem recurrence, with the goal of automating the response to all non-exceptional service conditions
  • Influence and create new designs, architectures, standards and methods for large-scale distributed systems across Group IT
  • Anticipate and respond to monitoring signals, continuously adapting processes, thresholds and configuration
  • Collaborate with product developers to ensure new features have the proper operational support and maintainability - provide deep technical guidance to development teams



Optimise, Provide Governance and Test Solutions to support systems stability

  • Manage readiness and preparedness of the network to deliver a resilient operating environment
  • Review and provide recommendations for and documentation of physical resiliency test plans, test harnesses and substantiation
  • Review and approve network standards as the network resilience subject expert
  • Facilitate resilience exercises and follow-ups with management and engineers
  • Design and utilize network diagnostic tools to review network incidents for validation of resiliency working as designed
  • Collaborate with network modelling experts to perform availability studies

Monitor, Design and deliver on plans to support and continuously build systems resilience

  • Maximise network performance by monitoring performance, troubleshooting network problems and outages, scheduling upgrades, and collaborating with network architects on network optimization
  • Drive the undertaking of data network fault investigations in local and wide area environments, using information from multiple sources
  • Secure network systems by establishing and enforcing policies, and defining and monitoring access
  • Support and administer firewall environments in line with IT Security Policy
  • Update job knowledge by participating in educational opportunities, reading professional publications, maintaining personal networks and participating in professional organizations
  • Report network operational status by gathering and prioritizing information
  • Configure routing and switching equipment and firewalls
  • Troubleshoot remotely and find faults when issues occur

Establish and mature the Group Information Technology Resilience and operational readiness capability to ensure system stability and adequate remediation

  • Establish and mature the organisational capability to sustain operational readiness by identifying the principles of IT resilience, doing predictive analysis (technical and mathematical techniques, across all domains) and planning for automation of environment discovery, systems mappings and remediation tasks to establish/re-establish resilience for key banking IT systems
  • Create the critical banking ecosystem view by identifying the key customer-facing systems and IT services in consultation with the CIO structures, Architecture community and technical functions.
  • Develop and maintain the application maturity map, in conjunction with the architects, system developers and/or vendor community that supply these systems. Influence the methods and processes required to raise (demand) and implement actions to align these critical systems to the target maturity state through coding, configuration and upgrade/replacement programmes and projects
  • Assess the maturity of the monitoring and management capability to support these key systems by assessing the landscape of tools implemented to establish this capability for appropriateness. Create a roadmap, in conjunction with architecture, to redress any gaps, shortcomings or overlaps
  • Establish and expand the monitoring capability for critical customer-facing systems by engaging with the relevant architecture and support functions and advising on, and motivating for, new tools and automation to be implemented, as appropriate
  • Drive out and develop the deployment of targeted monitoring tools as part of the critical systems remediation programme
  • Contribute to the stability of critical systems and services by driving the vertical assessment capability that will assess the system technical landscape of critical banking services for defects in deployment, configuration and coding issues
  • Research emerging tools, trends and industry norms and inject learnings and findings into the Group IT environment, frameworks, methods and tools to assist in targeting, achieving and maintaining resilience within targeted critical banking IT systems (Predictive analytics modelling and Preventative process modelling)
  • Grow technically capable resources and give direction to the team



Establish and manage the Critical Systems Remediation Programme portfolio and delivery roadmap to ensure system stability

  • Establish and manage the programme portfolio that focuses on stability, recoverability, availability, operational security and performance of critical customer-facing systems
  • Identify portfolios and programmes that will deliver resilience functionality and features as part of their scope, position with relevant stakeholders and include them in the backlogs (‘tagged’ projects)
  • Create and maintain the IT Resilience programme roadmap(s) by identifying and analysing dependencies between initiatives
  • Plan and drive delivery of remediation actions or enhancements where suitable projects or initiatives do not already exist, by creating projects and work streams in the programme, or adding them to the remediation backlog
  • Monitor the progress of delivery by other programmes, managing detailed resilience plan dependencies through engagement with other portfolios and programmes, and reporting on progress to the CIO communities
  • Forecast, secure and own the budget associated with the scope of the programmes by performing detailed planning and presenting to CIO teams, as appropriate for funding validation
  • Participate in the identification, selection and management of 3rd party suppliers that will support the IT resilience programme portfolio deliverables and objectives.

Lead and co-ordinate the end to end Critical Systems Remediation programme execution to ensure system stability

  • Direct and lead the vertical assessment of systems and incident data analysis (triage assessments) in order to identify the issues with critical systems
  • Establish, resource and maintain a continuous improvement, remediation and enhancement component team to prioritise, plan and manage the delivery of findings, issues, fixes and enhancements required to move critical banking IT systems to acceptable target states of resilience
  • Set up structures to ensure correct issues analysis, documentation and prioritisation of infrastructure and application remediation actions
  • Ensure consistent socialisation of issues backlog and priorities with stakeholders and escalate contention between business priorities to the BIO/CIO teams, as well as the project steering committee
  • Collaborate with other functions to identify point fixes and drive out required actions to incorporate fixes and enhancements into relevant feature or component teams.
  • Review the quality of deliverables ensuring alignment with the roadmap, quality and timeframe expectations
  • Manage the analysis of technical issues and challenges faced during project execution, including causal analysis of system fragility and the actions required to restore resilience or incorporate enhancements, in consultation with the technical stakeholders and CIO community
  • Establish and implement effective execution practices, ensuring cross-functional and cross-stream interaction and collaboration aligned to the ways of work in Group IT
  • Provide leadership and direction to teams involved in the programme

Manage the effective realisation of the objectives and benefits of the Resilience programme portfolio to ensure remediation

  • Strengthen Standard Bank’s ability to provide and maintain an acceptable level of service in the face of technical faults and challenges to normal operation by recommending appropriate priorities and remediation actions in consultation with the relevant CIOs, Architects and Chief Technical Officer
  • Contribute to operational information security by reviewing the information security findings and collaborating on the technical remediation actions required to address control weaknesses and vulnerabilities
  • Drive the resilience of Group IT by directing the team to find significant issues with all business-critical systems across the ecosystem

Provide Leadership and subject matter expertise to build communities of engineering excellence

  • Work closely with the architecture community to understand application maturity and technical lifecycle plans, thereby contributing to the operational readiness of critical systems
  • Direct and influence decisions relating to stability, recoverability, availability, performance and operational security by managing key stakeholder relationships in technical functions (Infrastructure, Operations and Support), Information Security, Architecture and CIO communities through engagement
  • Research and stay current on new developments and trends regarding IT resilience in order to provide leadership to the Group IT executive and management teams
  • Ensure strategic alignment of the portfolio of programmes to the objective of stabilising the critical systems and services for the Standard Bank Group by creating strategic partnerships within the business, Group IT and vendor management
  • Coach and mentor individuals looking to pursue a technical career path focusing on areas relating to IT resilience



Minimum Qualification and Experience

  • Site Reliability Engineering Training and skills
  • Complete understanding of CI/CD (from SCM tool, build, code quality, orchestration, monitoring and testing).
  • Versatile with scripting languages; Java (desirable)
  • AWS services, Jenkins, Terraform, Bamboo, Git, Bitbucket.
  • Kubernetes, Docker / Containers, Observability tools (e.g. Prometheus / Grafana)
  • Microservices / Event-driven Microservices (desirable)
  • Testing framework
  • Deployment Orchestration (Chef, Docker Compose or with Bamboo).
  • CI/CD automation experience.
  • Intermediate Linux admin skills (To maintain our environments and AWS instances).
  • AWS experience, preferably certified.
  • Understanding of Java and Angular/node (To maintain our builds).
  • Experience in handling Prod and environment incidents.
  • Team lead experience: guiding the team and understanding the strengths of each team member.
  • More than 10 years' experience in analysis and design - Experience Description: Experience in transformational projects with a strong technology platform component, demonstrating the realisation of the business objectives and affecting client experience. Experience in working with cross-functional business stakeholder groups in order to facilitate ideation and solution design, ensuring that initiatives have client and business relevance and can technically be solved using emerging technologies. Experience in ensuring the commercial viability of solutions and creating value for clients, shareholders and business.
  • 5-7 years' experience in Application Development and Support - Experience Description: Experience as a software engineer or operations engineer. Experience using large-scale production systems and technologies, for example load balancing, monitoring, distributed systems, microservices, and configuration management. Experience designing and executing small to medium scale systems automation projects with strong autonomy.
  • Familiar with and enthusiastic about software engineering best practices such as testing, continuous integration and continuous delivery. A strong focus on instrumentation and observability, and experience with monitoring and metrics collection tools such as AppDynamics, Prometheus, Nagios and Graphite
  • In-depth knowledge of operating systems (processes, threads, concurrency issues, locks, mutexes, semaphores, monitors and how they work)
  • Familiarity with algorithms, data structures and complexity analysis
  • Familiarity with systems and configuration management tools (e.g. Chef and Terraform)
  • Knowledge and experience with Software Version Control systems: SVN, Git
  • Experience maintaining automated build systems such as Bamboo and Jenkins
  • Experience implementing Continuous Integration or Continuous Delivery processes in engineering teams
  • Experience leading and integrating test automation into various points in a deployment pipeline
  • Linux system administration experience: ssh, monitoring processes, attaching storage, cleaning disk space, tailing logs
  • A desire to write tools and applications to automate work rather than do everything by hand
  • Proven problem-solving skills focused on quick outage recovery and root cause analysis
  • Excellent communication skills in English, both oral and written
  • A high aptitude for both learning and teaching
  • Systematic problem-solving approach, coupled with a strong sense of ownership and drive
  • 5-7 years' experience in Technology Business Partnering - Experience Description: Broad experience in translating business and functional requirements into technical specifications. Experience in engaging with delivery partners both internal and external to the organisation with a focus on optimising partner performance.



How To Apply

This Job Listing Has Expired

© 2020 - 2021 JobSearch South Africa - Jobs In South Africa
