Engineer, Site Reliability - TPS at Standard Bank Group

Standard Bank is a firm believer in technical innovation, to help us guarantee exceptional client service and leading-edge financial solutions. Our growing global success reflects our commitment to the latest solutions, the best people, and a uniquely flexible and vibrant working culture. To help us drive our success into the future, we are looking for an experienced Resilience Engineer to join our team. Standard Bank is a leading African banking group focused on emerging markets globally. It has been a mainstay of South Africa's financial system for 150 years and now spans 16 countries across the African continent.

Job Purpose

Contribute to the resilience of Group Information Technology by improving availability, reliability and performance of business-critical customer-facing systems, whilst building sustainable capability. This complex task is delivered in conjunction with the CIO and CTO communities.

Key Responsibilities/Accountabilities

Analyse, Report and structure strategies to ensure optimal systems stability

Design and document systems, including writing and reviewing code, to automate solutions across Group IT
Undertake measured, rigorous troubleshooting of complicated systems and resolve time-critical incidents
Develop software for the purposes of automating, monitoring and maintaining deployed infrastructure and services across Group IT
Solve problems relating to mission-critical services and build automation to prevent problem recurrence, with the goal of automating the response to all non-exceptional service conditions
Influence and create new designs, architectures, standards and methods for large-scale distributed systems across Group IT
Respond to and anticipate issues through monitoring, and continuously adapt processes, thresholds and configuration
Collaborate with product developers to ensure new features have the proper operational support and maintainability - provide deep technical guidance to development teams

Optimise, Provide Governance and Test Solutions to support systems stability

Manage readiness and preparedness of the network to deliver a resilient operating environment
Review and provide recommendations for and documentation of physical resiliency test plans, test harnesses and substantiation
Review and approve network standards as the network resilience subject expert
Facilitate resilience exercises and follow-ups with management and engineers
Design and utilize network diagnostic tools to review network incidents for validation of resiliency working as designed
Collaborate with network modelling experts to perform availability studies

Monitor, Design and deliver on plans to support and continuously build systems resilience

Maximise network performance by monitoring performance, troubleshooting network problems and outages, scheduling upgrades, and collaborating with network architects on network optimization
Drive the undertaking of data network fault investigations in local and wide area environments, using information from multiple sources
Secure network systems by establishing and enforcing policies, and defining and monitoring access
Support and administer firewall environments in line with IT Security Policy
Update job knowledge by participating in educational opportunities, reading professional publications, maintaining personal networks and participating in professional organizations
Report network operational status by gathering and prioritizing information
Configure routing and switching equipment and firewalls
Perform remote troubleshooting and fault finding when issues occur

Establish and mature the Group Information Technology Resilience and operational readiness capability to ensure system stability and adequate remediation

Establish and mature the organisational capability to sustain operational readiness by identifying the principles of IT resilience, doing predictive analysis (technical and mathematical techniques, across all domains) and planning for automation of environment discovery, systems mappings and remediation tasks to establish/re-establish resilience for key banking IT systems
Create the critical banking ecosystem view by identifying the key customer-facing systems and IT services in consultation with the CIO structures, Architecture community and technical functions.
Develop and maintain the application maturity map, in conjunction with the architects, system developers and/or vendor community that supply these systems. Influence the methods and processes required to raise (demand) and implement actions to align these critical systems to the target maturity state through coding, configuration and upgrade/replacement programmes and projects
Assess the maturity of the monitoring and management capability to support these key systems by assessing the landscape of tools implemented to establish this capability for appropriateness. Create a roadmap, in conjunction with architecture, to redress any gaps, shortcomings or overlaps
Establish and expand the monitoring capability for critical customer-facing systems by engaging with the relevant architecture and support functions and advising on, and motivating for, new tools and automation to be implemented, as appropriate
Drive out and develop the deployment of targeted monitoring tools as part of the critical systems remediation programme
Contribute to the stability of critical systems and services by driving the vertical assessment capability that will assess the system technical landscape of critical banking services for defects in deployment, configuration and coding issues
Research emerging tools, trends and industry norms and inject learnings and findings into the Group IT environment, frameworks, methods and tools to assist in targeting, achieving and maintaining resilience within targeted critical banking IT systems (Predictive analytics modelling and Preventative process modelling)
Grow technically capable resources and give direction to the team

Establish and manage the Critical Systems Remediation Programme portfolio and delivery roadmap to ensure system stability

Establish and manage the programme portfolio that focuses on stability, recoverability, availability, operational security and performance of critical customer-facing systems
Identify portfolios and programmes that will deliver resilience functionality and features as part of their scope, position with relevant stakeholders and include them in the backlogs (‘tagged’ projects)
Create and maintain the IT Resilience programme roadmap(s) by identifying and analysing dependencies between initiatives
Plan and drive delivery of remediation actions or enhancements where suitable projects or initiatives do not already exist, by creating projects and work streams in the programme, or adding them to the remediation backlog
Monitor the progress of delivery by other programmes, managing detailed resilience plan dependencies through engagement with other portfolios and programmes, and reporting on progress to the CIO communities
Forecast, secure and own the budget associated with the scope of the programmes by performing detailed planning and presenting to CIO teams, as appropriate for funding validation
Participate in the identification, selection and management of 3rd party suppliers that will support the IT resilience programme portfolio deliverables and objectives.

Lead and co-ordinate the end to end Critical Systems Remediation programme execution to ensure system stability

Direct and lead the vertical assessment of systems and incident data analysis (triage assessments) in order to identify the issues with critical systems
Establish, resource and maintain a continuous-improvement remediation and enhancement component team to prioritise, plan and manage the delivery of findings, issues, fixes and enhancements required to move critical banking IT systems to acceptable target states of resilience
Set up structures to ensure correct issues analysis, documentation and prioritisation of infrastructure and application remediation actions
Ensure consistent socialisation of issues backlog and priorities with stakeholders and escalate contention between business priorities to the BIO/CIO teams, as well as the project steering committee
Collaborate with other functions to identify point fixes and drive out required actions to incorporate fixes and enhancements into relevant feature or component teams.
Review the quality of deliverables ensuring alignment with the roadmap, quality and timeframe expectations
Manage the analysis of technical issues and challenges faced during project execution, including causal analysis of system fragility and the actions required to restore resilience or incorporate enhancements, in consultation with the technical stakeholders and CIO community
Establish and implement effective execution practices ensuring cross-functional, stream interaction and collaboration aligning to the ways of work in Group IT
Provide leadership and direction to teams involved in the programme

Manage the effective realisation of the objectives and benefits of the Resilience programme portfolio to ensure remediation

Strengthen Standard Bank’s ability to provide and maintain an acceptable level of service in the face of technical faults and challenges to normal operation by recommending appropriate priorities and remediation actions in consultation with the relevant CIOs, Architects and Chief Technical Officer
Contribute to operational information security by reviewing the information security findings and collaborating on the technical remediation actions required to address control weaknesses and vulnerabilities
Drive the resilience of Group IT by directing the team to find significant issues with all business-critical systems across the ecosystem

Provide Leadership and subject matter expertise to build communities of engineering excellence

Work closely with the architecture community understanding application maturity and technical lifecycle plans, therefore contributing to the operational readiness of critical systems
Direct and influence decisions relating to stability, recoverability, availability, performance and operational security by managing key stakeholder relationships in technical functions (Infrastructure, Operations and Support), Information Security, Architecture and CIO communities through engagement
Research and stay current on new developments and trends regarding IT resilience in order to provide leadership to the Group IT executive and management teams
Ensure strategic alignment of the portfolio of programmes to the objective of stabilising the critical systems and services for the Standard Bank Group by creating strategic partnerships within the business, Group IT and vendor management
Coach and mentor individuals looking to pursue a technical career path focusing on areas relating to IT resilience

Minimum Qualification and Experience

Site Reliability Engineering Training and skills
Complete understanding of CI/CD (from SCM tool, build, code quality, orchestration, monitoring and testing).
Versatile with scripting languages, Java (desirable)
AWS services, Jenkins, Terraform, Bamboo, GIT, Bitbucket.
Kubernetes, Docker / Containers, Observability Tools (e.g. Prometheus / Grafana)
Microservices / Event-driven Microservices (desirable)
Testing framework
Deployment Orchestration (Chef, Docker Compose or with Bamboo).
CI/CD automation experience.
Intermediate Linux admin skills (To maintain our environments and AWS instances).
AWS experience preferably certified.
Understanding of Java and Angular/node (To maintain our builds).
Experience in handling Prod and environment incidents.
Team lead experience: guiding the team and understanding each team member's strengths.
More than 10 years, Experience in Analysis and design - Experience Description: Experience in transformational projects with a strong technology platform component, demonstrating the realisation of the business objectives and affecting client experience. Experience in working with cross-functional business stakeholder groups in order to facilitate ideation and solution design, ensuring that initiatives have client and business relevance and can technically be solved using emerging technologies. Experience in ensuring the commercial viability of solution and creating value for clients, shareholders and business.
5-7 Years, experience in Application Development and Support - Experience Description: Experience as a software engineer or operations engineer. Experience using large-scale production systems and technologies, for example load balancing, monitoring, distributed systems, microservices, and configuration management. Experience designing and executing small to medium scale systems automation projects with strong autonomy.
Familiar with and enthusiastic about software engineering best practices such as testing, continuous integration and continuous delivery. A strong focus on instrumentation and observability, with experience of monitoring and metrics collection tools such as AppDynamics, Prometheus, Nagios and Graphite.
In-depth knowledge of operating systems (processes, threads, concurrency issues, locks, mutexes, semaphores, monitors and how they work)
Familiarity with algorithms, data structures and complexity analysis
Familiarity with systems and configuration management tools (e.g. Chef and Terraform)
Knowledge and experience with Software Version Control systems: SVN, GIT
Experience maintaining automated build systems such as Bamboo and Jenkins
Experience implementing Continuous Integration or Continuous Delivery processes in engineering teams.
Experience leading and integrating test automation into various points in a deployment pipeline
Linux system administration experience: ssh, monitoring processes, attaching storage, cleaning disk space, tailing logs.
A desire to write tools and applications to automate work rather than do everything by hand
Proven problem-solving skills focused on quick outage recovery and root cause analysis
Excellent communication skills in English, both oral and written.
A high aptitude for both learning and teaching
Systematic problem-solving approach, coupled with a strong sense of ownership and drive
5-7 Years, experience in Technology Business Partnering - Experience Description: Broad experience in translating business and functional requirements into technical specifications. Experience in engaging with delivery partners both internal and external to the organisation with a focus on optimising partner performance.
