Principal Engineer - Gen AI training and inferencing
Job Description
Overview
At AMD, our mission is to build great products that accelerate next-generation computing experiences—from AI and data centers, to PCs, gaming and embedded systems. We push the limits of innovation to solve the world’s most important challenges—striving for execution excellence, while being direct, humble, collaborative, and inclusive of diverse perspectives. Join us as we shape the future of AI and beyond. Together, we advance your career.
The Role
The AI Models and Applications team at AMD is looking for a specialized Principal-level engineer who is passionate about enabling innovative and efficient Generative AI training and inference at scale. You will be part of a core team of highly talented specialists and work on scaling training and inference for the latest Generative AI models.
The Person
The ideal candidate has a deep technical understanding of the latest generative AI applications, such as large language models (LLMs), large multimodal models (LMMs), and image/video generation; has experience training models at scale; and is passionate about developing efficient approaches to distributed training and inference at scale on AMD devices.
Why Join Us?
- Exciting Opportunities: As a senior member of the team, you will be at the forefront of innovation, working with the latest Gen AI models and algorithms. You will have the opportunity to shape the future of AI model training and inference optimizations across a variety of applications.
- Talented Team: Join a team of highly skilled industry specialists who are passionate about pushing the boundaries of AI. Collaborate with like-minded professionals and learn from the best in the field.
- Cutting-edge Technology: Work with state-of-the-art GenAI algorithms and software, enabling you to stay ahead of the curve and drive advancements in AI model training and deployment at scale.
- Impactful Work: Your contributions will directly influence how cutting-edge Gen AI models are trained efficiently at scale and how inference is deployed to serve millions of customers, making a significant difference across industries and applications.
Key Responsibilities
- Propose and apply innovative techniques to support both training and inference, including transformer architectures, parallelism strategies for training on large clusters, and low-precision training.
- Implement novel, efficient architectures for Generative AI models for training and inference, and showcase their benefits on AMD.
- Work with open-source frameworks and communities (e.g., PyTorch, JAX, Hugging Face) to integrate AMD-optimized models and libraries and to publish training recipes.
- Collaborate with software and hardware teams to co-optimize end-to-end (E2E) performance on current and future AMD solutions.
- Publish and promote your work within AMD and at external venues.
Preferred Experience
- Strong technical expertise in Gen AI model training and inference, and familiarity with deep learning frameworks like PyTorch/JAX.
- Strong technical expertise in algorithmic innovation toward efficient Gen AI applications for both training and inference.
- Experience with publications in areas such as efficient model architectures, optimized training, or innovative parallelism strategies; publications at conferences such as NeurIPS, CVPR, ECCV/ICCV, ICML, and ICLR are a plus.
- Experience productizing generative AI models and training foundation models at scale.
- Excellent written, verbal, and presentation skills; ability to coordinate internally and externally.
- Several years of experience in AI, deep learning and related software development.
Academic Credentials
PhD or master’s degree with a major in CS, EE, Mathematics, or a related field.
Location
Markham, Ontario, Canada (Hybrid). Calgary, Alberta, Canada may also be considered.
Benefits
Benefits offered are described in AMD benefits at a glance.
AMD does not accept unsolicited resumes from headhunters, recruitment agencies, or fee-based recruitment services. AMD and its subsidiaries are equal opportunity, inclusive employers and will consider all applicants without regard to age, ancestry, color, marital status, medical condition, mental or physical disability, national origin, race, religion, political and/or third-party affiliation, sex, pregnancy, sexual orientation, gender identity, military or veteran status, or any other characteristic protected by law. We encourage applications from all qualified candidates and will accommodate applicants’ needs under the respective laws throughout all stages of the recruitment and selection process.
AMD may use Artificial Intelligence to help screen, assess or select applicants for this position. AMD’s “Responsible AI Policy” is available here. This posting is for an existing vacancy.
Frequently Asked Questions
Who is hiring?
This role is with AMD in Markham, Ontario.
Is this a remote position?
No. This is a hybrid role based in Markham, Ontario; Calgary, Alberta may also be considered.
What is the hiring process?
Applications are submitted through the employer's official site. You can typically expect to hear back within 1-2 weeks if shortlisted.