Xiaomi Vision and Image Algorithm Engineer - Internship
Internship / Part-time | Location: Beijing | Status: Hiring
Requirements
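1. Major in computer science, information engineering, electronic engineering, robotics, or a related field, with development experience in C++, Python, or Java;
2. Solid command of deep learning theory, including backpropagation (BP), neural networks, CNN, RNN, LSTM, and Transformer; hands-on experience with large-model architectures (BERT, GPT, etc.) is preferred;
3. Grasp of image processing fundamentals, such as image filtering, compression, and scaling algorithms; familiarity with subjective and objective image quality assessment metrics;
4. Understanding of machine learning theory, including SVM, KNN, decision trees, random forests, naive Bayes, and probability and statistics;
5. Proficiency in at least one deep learning framework, such as TensorFlow or PyTorch;
6. Knowledge of mainstream network models such as ResNet, MobileNet, and UNet; experience in face recognition, OCR, or similar areas is preferred.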
Responsibilities
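1. Develop on-device image algorithms for mobile phones, such as face recognition, OCR text detection, and visual SLAM;
2. Optimize and deploy algorithms on phones, including model compression, quantization, and acceleration, making use of the phone's compute units (CPU, GPU, and NPU);
3. Handle training-data preprocessing, including image data collection, annotation, augmentation, and cleaning;
4. Take part in pre-research and productization of new technologies, keep up with leading algorithms in the field, design better algorithms, and write related papers and patents.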
Includes English-language materials
C++
https://www.learncpp.com/
LearnCpp.com is a free website devoted to teaching you how to program in modern C++.
https://www.youtube.com/watch?v=ZzaPdXTrSb8
Python
https://liaoxuefeng.com/books/python/introduction/index.html
In Chinese, free, for absolute beginners, with complete examples, based on the latest Python 3 release.
https://www.learnpython.org/
a free interactive Python tutorial for people who want to learn Python, fast.
https://www.youtube.com/watch?v=K5KVEU3aaeQ
Master Python from scratch 🚀 No fluff—just clear, practical coding skills to kickstart your journey!
https://www.youtube.com/watch?v=rfscVS0vtbw
This course will give you a full introduction into all of the core concepts in python.
Java
https://www.youtube.com/watch?v=eIrMbAQSU34
Master Java – a must-have language for software development, Android apps, and more! ☕️ This beginner-friendly course takes you from basics to real coding skills.
Deep Learning
https://d2l.ai/
Interactive deep learning book with code, math, and discussions.
Algorithms
https://roadmap.sh/datastructures-and-algorithms
Step by step guide to learn Data Structures and Algorithms in 2025
https://www.hellointerview.com/learn/code
A visual guide to the most important patterns and approaches for the coding interview.
https://www.w3schools.com/dsa/
LSTM
https://colah.github.io/posts/2015-08-Understanding-LSTMs/
Humans don’t start their thinking from scratch every second.
https://d2l.ai/chapter_recurrent-modern/lstm.html
The term “long short-term memory” comes from the following intuition.
https://developer.nvidia.com/discover/lstm
A Long short-term memory (LSTM) is a type of Recurrent Neural Network specially designed to prevent the neural network output for a given input from either decaying or exploding as it cycles through the feedback loops.
https://www.youtube.com/watch?v=YCzL96nL7j0
Basic recurrent neural networks are great, because they can handle different amounts of sequential data, but even relatively small sequences of data can make them difficult to train.
Transformer
https://huggingface.co/learn/llm-course/en/chapter1/4
Breaking down how Large Language Models work, visualizing how data flows through.
https://poloclub.github.io/transformer-explainer/
An interactive visualization tool showing you how transformer models work in large language models (LLM) like GPT.
https://www.youtube.com/watch?v=wjZofJX0v4M
Breaking down how Large Language Models work, visualizing how data flows through.
Large Models
https://www.youtube.com/watch?v=xZDB1naRUlk
You will build projects with LLMs that will enable you to create dynamic interfaces, interact with vast amounts of text data, and even empower LLMs with the capability to browse the internet for research papers.
https://www.youtube.com/watch?v=zjkBMFhNj_g
GPT
https://www.youtube.com/watch?v=kCc8FmEb1nY
We build a Generatively Pretrained Transformer (GPT), following the paper "Attention is All You Need" and OpenAI's GPT-2 / GPT-3.
Image Processing
https://opencv.org/blog/computer-vision-and-image-processing/
This fascinating journey involves two key fields: Computer Vision and Image Processing.
https://www.geeksforgeeks.org/python/image-processing-in-python/
Image processing involves analyzing and modifying digital images using computer algorithms.
https://www.youtube.com/watch?v=kSqxn6zGE0c
In this Introduction to Image Processing with Python, kaggle grandmaster Rob Mulla shows how to work with image data in python!
Machine Learning
https://www.youtube.com/watch?v=0oyDqO8PjIg
Learn about machine learning and AI with this comprehensive 11-hour course from @LunarTech_ai.
https://www.youtube.com/watch?v=i_LwzRVP7bg
Learn Machine Learning in a way that is accessible to absolute beginners.
https://www.youtube.com/watch?v=NWONeJKn6kc
Learn the theory and practical application of machine learning concepts in this comprehensive course for beginners.
https://www.youtube.com/watch?v=PcbuKRNtCUc
Learn about all the most important concepts and terms related to machine learning and AI.
TensorFlow
https://www.youtube.com/watch?v=tpCFfeUEGs8
Ready to learn the fundamentals of TensorFlow and deep learning with Python? Well, you’ve come to the right place.
https://www.youtube.com/watch?v=ZUKz4125WNI
This part continues right where part one left off so get that Google Colab window open and get ready to write plenty more TensorFlow code.
PyTorch
https://datawhalechina.github.io/thorough-pytorch/
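PyTorch is an important tool for data science research with deep learning. It offers considerable advantages in flexibility, readability, and performance, and in recent years it has become the framework most commonly used in academia to implement deep learning algorithms.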
https://www.youtube.com/watch?v=V_xro1bcAuA
Learn PyTorch for deep learning in this comprehensive course for beginners. PyTorch is a machine learning framework written in Python.
OCR
https://www.ibm.com/think/topics/optical-character-recognition
Optical character recognition (OCR) is a technology that uses automated data extraction to quickly convert images of text into a machine-readable format.
https://www.youtube.com/watch?v=or8AcS6y1xg
Optical character recognition (OCR) is sometimes referred to as text recognition.
Related Positions
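Internship
1. Take part in R&D on image and video generation and explore frontier directions in visual generation
2. Take part in research on image quality enhancement, controllable video generation, multimodal visual generation, and reinforcement learning for visual generation
3. Analyze and resolve quality and performance problems that arise when productizing algorithms
4. Take part in academic research and produce results with industry impact
Updated 2025-05-23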
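Internship
1. Take part in R&D on image and video generation and explore frontier directions in visual generation
2. Take part in research on image quality enhancement, controllable video generation, multimodal visual generation, and reinforcement learning for visual generation
3. Analyze and resolve quality and performance problems that arise when productizing algorithms
4. Take part in academic research and produce results with industry impact
Updated 2025-02-19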
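Internship
1. Take part in R&D on image and video generation and explore frontier directions in visual generation
2. Take part in research on image generation and editing, controllable video generation, multimodal visual generation, and reinforcement learning for visual generation
3. Analyze and resolve quality and performance problems that arise when productizing algorithms
4. Take part in academic research and produce results with industry impact
Updated 2025-09-01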
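Internship
1. Frontier algorithm R&D
• Lead R&D of core computer vision and AIGC algorithms (detection, segmentation, generation, multimodal, etc.), bring super-resolution, restoration, and beautification technologies into business scenarios, and improve both quality and efficiency.
• Explore innovative applications of generative models such as Stable Diffusion, optimizing key technologies such as image generation and intelligent editing (e.g. text-driven editing, semantic inpainting) to fit business needs.
2. Engineering and deployment
• Carry algorithms from prototype to product across the full pipeline, addressing engineering challenges such as model compression (quantization/pruning), inference acceleration (TensorRT/MNN deployment), and cross-platform adaptation.
• Build high-accuracy, low-latency CV pipelines covering practical needs such as image rectification, denoising, and OCR.
3. Forward-looking research
• Track developments at top conferences such as CVPR and ICML, and develop frontier models such as diffusion models and Vision Transformers to build a technical moat.
Updated 2025-08-21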