Job Description: Sr. Hadoop Platform Admin
Roles & Responsibilities:
- Responsible for managing a large Hadoop cluster and ensuring 100% availability of its components:
- a) HDFS - check the NameNode UI for under-replicated/corrupt blocks and for DataNode availability;
- b) YARN - ResourceManager & NodeManager availability;
- c) Storm - Nimbus & Supervisor availability;
- d) HBase - HBase Master, RegionServer, and Phoenix Query Server availability;
- e) Hive - HiveServer2 availability; check jstat output and heap size to gauge response time and health;
- f) RabbitMQ - check inflow and processing rates; address any slowness;
- g) ZooKeeper / JournalNode availability;
- h) Ambari;
- i) Grafana;
- j) Spark & Kafka.
- Cluster maintenance, including adding and removing nodes and installing services on new nodes.
- Performance tuning (e.g. slow YARN scheduling, slow Tez jobs, slow data loading) and maintaining platform integrity.
- Review industry best practices and recommendations, and roll them out as appropriate.
- Manage alerts on the Ambari dashboard and take corrective and preventive actions.
- HDFS disk space management; produce a weekly disk-utilization report for capacity planning.
- User access management: set up new Hadoop users.
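The daily HDFS checks above (under-replicated/corrupt blocks, dead DataNodes, disk headroom) can be partially automated against the NameNode's JMX endpoint. A minimal sketch, assuming Hadoop 3.x defaults (NameNode HTTP on port 9870) and a placeholder hostname; the 80% fill threshold is an illustrative assumption:

```python
import json
from urllib.request import urlopen

# The metric names come from the NameNode's FSNamesystem JMX bean;
# the hostname is a placeholder for your cluster, and 9870 is the
# default NameNode HTTP port in Hadoop 3.x.
JMX_URL = ("http://namenode.example.com:9870/jmx"
           "?qry=Hadoop:service=NameNode,name=FSNamesystem")

def fetch_fsnamesystem(url=JMX_URL):
    """Fetch the FSNamesystem bean from the NameNode JMX endpoint."""
    with urlopen(url) as resp:
        return json.load(resp)["beans"][0]

def health_issues(bean):
    """Return a list of human-readable problems found in the metrics."""
    issues = []
    if bean.get("UnderReplicatedBlocks", 0) > 0:
        issues.append(f"{bean['UnderReplicatedBlocks']} under-replicated blocks")
    if bean.get("CorruptBlocks", 0) > 0:
        issues.append(f"{bean['CorruptBlocks']} corrupt blocks")
    if bean.get("NumDeadDataNodes", 0) > 0:
        issues.append(f"{bean['NumDeadDataNodes']} dead DataNodes")
    used, total = bean.get("CapacityUsed", 0), bean.get("CapacityTotal", 1)
    if total and used / total > 0.80:  # warn above 80% HDFS usage (assumed threshold)
        issues.append(f"HDFS {used / total:.0%} full")
    return issues
```

A cron job could call `health_issues(fetch_fsnamesystem())` each morning and page the on-call admin when the list is non-empty; anything the script cannot see (Storm, RabbitMQ, Ambari alerts) still needs the manual checks above.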
Security Admin:
- Manage & maintain layered access through Authentication, Authorization, and Auditing.
- Addition and maintenance of user access for both new and existing users.
- Maintain & manage High Availability.
- Manage permissions and roll over Ranger KMS keys.
- Monitor the automated audit-forwarding job.
- Audit log cleanup as directed by the security information and event management (SIEM) system.
- Manage and coordinate Hadoop-related trouble tickets with Hortonworks.
- New switch configuration on the FTP servers.
- Set up folders and permissions on the FTP servers.
- Monitor and manage file transfers from the FTP servers and their writes onto HDFS.
- Monitor & Manage data transfer from HDFS to Hive through RabbitMQ using Storm Processing.
- Monitor the dashboard to ensure data loading completes before aggregation kicks off.
- Act as the point of contact for vendor escalations.
- Familiarity with open-source configuration management and deployment tools such as Puppet or Chef, and with Linux scripting.
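The audit-log cleanup duty above is easy to script. A minimal sketch of the retention logic, assuming a hypothetical `hdfs-audit.YYYY-MM-DD.log` naming convention and a 30-day window; the real pattern and window should follow the SIEM team's direction:

```python
from datetime import date, datetime, timedelta

# Assumed retention window; in practice this comes from the SIEM policy.
RETENTION_DAYS = 30

def expired_audit_logs(filenames, today, retention_days=RETENTION_DAYS):
    """Return the audit-log files that fall outside the retention window."""
    cutoff = today - timedelta(days=retention_days)
    expired = []
    for name in filenames:
        try:
            # Filename pattern is an assumption; adjust to your log layout.
            stamp = datetime.strptime(name, "hdfs-audit.%Y-%m-%d.log").date()
        except ValueError:
            continue  # skip files that don't match the expected pattern
        if stamp < cutoff:
            expired.append(name)
    return expired
```

The returned names would then be removed with `hdfs dfs -rm` (or `os.remove` for local copies), after confirming the SIEM has already ingested them.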
Skills Required:
- Good troubleshooting skills; understanding of system capacity and bottlenecks, and of the basics of memory, CPU, OS, storage, and networking.
- Experience managing the Hortonworks distribution.
- Hadoop skills like HBase, Hive, Pig, Mahout, etc.
- Experience deploying Hadoop clusters, adding/removing nodes, tracking jobs, and monitoring critical parts of the cluster.
- Good knowledge of Linux, as Hadoop runs on Linux.
- Knowledge of troubleshooting core Java applications is a plus.
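As an illustration of the "check jstat & heap size" step and the Java-troubleshooting skill above: `jstat -gcutil <pid>` prints GC utilization columns, and an old generation that stays near 100% usually explains a slow HiveServer2. A sketch of the parsing, assuming JDK 8's `-gcutil` column layout and an illustrative 90% threshold:

```python
def old_gen_pressure(jstat_output, threshold=90.0):
    """Parse `jstat -gcutil` output; return (old_gen_pct, is_under_pressure).

    Assumes the JDK 8 column layout, where "O" is old-generation
    occupancy as a percentage of its capacity.
    """
    lines = jstat_output.strip().splitlines()
    header, last = lines[0].split(), lines[-1].split()
    old_pct = float(last[header.index("O")])
    return old_pct, old_pct >= threshold
```

In practice the input would come from something like `jstat -gcutil $(pgrep -f HiveServer2) 5000 12`, captured over a minute; a sustained reading above the threshold plus a climbing FGC count is the cue to take a heap dump rather than just restart the service.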