ByteDance AI Security Engineer | AI Security Researcher - Privacy Innovation Lab - Jindouyun (筋斗云) Talent Program

Campus recruitment · Full-time · A79045A · Location: Singapore · Status: Hiring

Requirements


1. A PhD degree in artificial intelligence, computer science, mathematics, or a related field is preferred;
2. A solid foundation and strong coding ability in generative AI; candidates who have published papers at top conferences and journals such as ICLR/NeurIPS/ICML are preferred;
3. Familiar with industry trends in large models, with a quick learning …

Responsibilities


Team Introduction:
The Privacy Innovation Lab explores cutting-edge technologies and theories in data privacy and security. It provides technology consulting that offers insight into industry trends, along with innovative technical solutions, to support the rapid growth of ByteDance's global businesses. The lab has a long-term vision and commitment in data security, with research directions covering digital sovereignty, compliance intelligence, and the protection of personal privacy data in large models. As privacy compliance regulations grow stricter and awareness of multipolar digital sovereignty emerges, our team draws on practical knowledge from both academia and industry. By introducing state-of-the-art technologies and theories, we provide comprehensive and efficient data privacy and security safeguards for Internet services with a large user base and vast amounts of data, helping to overcome compliance barriers and drive continuous business innovation.

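Research Project Background:
Generative AI has shown great potential in the creative industries, education, healthcare, law, and other fields.
However, as these technologies develop, privacy concerns have gradually surfaced. Generative AI models learn from large amounts of training data to generate new content, and that data may contain a great deal of sensitive personal information. If the training data or the training process is not adequately privacy-protected, generated content may leak private information from the training data. For example, generated text may inadvertently contain sensitive personal details from the training data, and image generation models may reconstruct real people's faces or locations, or even generate an individual's biometric features.
Therefore, how to harness the power of generative AI models without leaking personal privacy has become a pressing problem. Designing generative AI that guarantees privacy protection while preserving generation quality and model performance is becoming a frontier research direction in this field.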

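Research Challenges:
1. Privacy leakage risk: Training generative AI models depends on large amounts of data, especially in natural language processing and image generation. During training, a model may memorize specific pieces of training data, which it may later reproduce. For example, GPT-style language models may inadvertently generate text containing a person's identity information, address, or other sensitive data from the training set. Ensuring that generative models do not leak such information is a major challenge for privacy protection.
2. Data perturbation vs. model quality: To prevent privacy leakage, common privacy-preserving techniques (such as differential privacy) typically perturb the training data or inject noise. However, this perturbation can cost the generative model its ability to model the data precisely, degrading the quality of generated content. In generation tasks in particular, model quality directly determines the usefulness and creativity of the output, so preserving high generation quality while protecting privacy is a pressing open problem.
3. Model "memorization" and "reuse": Generative AI models learn generation rules from large amounts of data, but they may also "memorize" details of that data during training. In some cases this shows up as "memorization leakage": model outputs inadvertently reproduce specific fragments of the training set, especially on small or highly sensitive datasets. Preventing generative AI models from memorizing and reusing specific personal information, so that they learn only the "patterns" or "features" of the data, is an essential consideration when designing privacy-preserving mechanisms.
4. Compliance and cross-border data flows: Countries have different privacy laws; for example, GDPR and CCPA impose strict requirements on how personal data is processed and transferred. For cross-border data flows, ensuring that generative AI training complies with the data privacy regulations of different regions, especially where sensitive personal information is involved, is a complex legal and technical challenge. Moreover, generative models may draw on multiple data sources and on user data from multiple countries, and balancing privacy protection with compliance in such settings also deserves attention.
5. Transparency and explainability of generated content: Although generative AI models have impressive generation capabilities, they often lack transparency, making it hard for users to understand why a given result was produced. In a privacy-protection context, making generative models more explainable, so that users can understand how the model produced particular content and whether that content involves private information, is key to building user trust. This challenge is not only technical but also ethical and social.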
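The privacy/quality trade-off in challenge 2 can be made concrete with the classic Gaussian mechanism from differential privacy. The sketch below is only an illustration under stated assumptions, not the lab's method: it privatizes a single statistic (a mean of clipped scalars, with the record count n treated as public) and uses the textbook noise bound σ = Δ·sqrt(2 ln(1.25/δ))/ε, which holds only for 0 < ε < 1. The helper names `gaussian_sigma` and `private_mean` are hypothetical.

```python
import math
import random
from typing import Optional, Sequence


def gaussian_sigma(sensitivity: float, epsilon: float, delta: float) -> float:
    """Noise std-dev for the classic (epsilon, delta) Gaussian mechanism.

    Uses sigma = sensitivity * sqrt(2 * ln(1.25 / delta)) / epsilon,
    a textbook bound that is only valid for 0 < epsilon < 1.
    """
    if not 0 < epsilon < 1:
        raise ValueError("this bound requires 0 < epsilon < 1")
    if not 0 < delta < 1:
        raise ValueError("delta must lie in (0, 1)")
    return sensitivity * math.sqrt(2 * math.log(1.25 / delta)) / epsilon


def private_mean(
    values: Sequence[float],
    clip: float,
    epsilon: float,
    delta: float,
    rng: Optional[random.Random] = None,
) -> float:
    """Noisy mean of scalars, treating the record count n as public.

    Clipping each value to [-clip, clip] bounds how far replacing one
    record can move the mean: at most 2 * clip / n (the sensitivity).
    """
    if not values:
        raise ValueError("need at least one value")
    if rng is None:
        rng = random.Random()
    n = len(values)
    clipped = [max(-clip, min(clip, v)) for v in values]
    true_mean = sum(clipped) / n
    sigma = gaussian_sigma(2 * clip / n, epsilon, delta)
    return true_mean + rng.gauss(0.0, sigma)


if __name__ == "__main__":
    # Smaller epsilon (stronger privacy) means larger noise: the utility cost.
    for eps in (0.9, 0.5, 0.1):
        print(f"epsilon={eps}: sigma={gaussian_sigma(1.0, eps, 1e-5):.2f}")
```

Running the script shows σ growing as ε shrinks, which is exactly the loss of modeling precision described above. DP-SGD applies the same clip-then-add-Gaussian-noise idea to per-example gradients during training rather than to a single released statistic.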
Includes English materials
NeurIPS+
Related Positions

ByteDance
Campus recruitment · A238623

Team Introduction: TikTok Content Security Algorithm Research Team. The International Content Safety Algorithm Research Team is dedicated to maintaining a safe and trustworthy environment for users of ByteDance's international products. We develop and iterate on machine learning models and information systems to identify risks earlier, respond to incidents faster, and monitor potential threats more effectively. The team also leads the development of foundational large models for these products. In the R&D process, we tackle key challenges such as data compliance, model reasoning capability, and multilingual performance optimization. Our goal is to build secure, compliant, and high-performance foundation models that empower business scenarios across the platform, including content moderation, search, and recommendation.

Research Project Background: In recent years, Large Language Models (LLMs) have achieved remarkable progress across natural language processing (NLP) and artificial intelligence. These models have demonstrated impressive capabilities in tasks such as language generation, question answering, and text translation. However, reasoning remains a key area for improvement. Current approaches to enhancing reasoning often rely on large amounts of Supervised Fine-Tuning (SFT) data, and acquiring such high-quality SFT data is expensive, which poses a significant barrier to scalable model development and deployment. To improve reasoning, OpenAI's o1 series of models made progress by lengthening the Chain-of-Thought (CoT) reasoning process. While this technique has proven effective, how to scale it efficiently at test time remains an open question. Other research has explored methods such as Process-based Reward Models (PRMs), Reinforcement Learning (RL), and Monte Carlo Tree Search (MCTS) to improve reasoning.
However, these approaches still fall short of the general reasoning performance of OpenAI's o1 series. Notably, the recent DeepSeek R1 paper reports that pure RL can enable LLMs to develop reasoning skills autonomously without relying on expensive SFT data. Together, these works reveal the substantial potential of RL for advancing LLM capabilities.

Research Challenges:
1. Reward model design: In reinforcement learning, designing an appropriate reward model is key. The reward model must accurately reflect how well the reasoning process works and guide the model to improve its reasoning step by step. This requires not only precise evaluation criteria for different tasks, but also a reward model that can adjust dynamically during training to keep pace with the model's changing and improving performance.
2. Stable training: Without high-quality SFT data, keeping RL training stable is a major challenge. RL typically involves extensive exploration and trial and error, which can make training unstable or even degrade model performance. Robust training methods are needed to keep the model stable and effective throughout training.
3. Extending from math and code tasks to natural-language tasks: Existing reasoning-enhancement methods are mainly applied to math and code, where CoT data is relatively abundant. Natural-language tasks, however, are more open-ended and complex. Extending successful RL strategies from these relatively simple tasks to natural language processing requires in-depth research and innovation in both data processing and RL methods to achieve general, cross-task reasoning ability.
4. Improving reasoning efficiency: Improving reasoning efficiency while maintaining reasoning performance is another important challenge, since efficiency directly affects a model's usability and cost in real applications. One option is knowledge distillation, which transfers knowledge from a complex model to a smaller one to reduce compute. Another potential approach is to use Long Chain-of-Thought (Long-CoT) techniques to improve Short-CoT models, increasing reasoning speed while preserving reasoning quality.
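The knowledge-distillation option in challenge 4 can be illustrated with the widely used temperature-scaled soft-target loss. This is a minimal, framework-free sketch under stated assumptions, not the team's method: it scores a single output position from hypothetical teacher and student logits over a shared vocabulary, and the `distillation_loss` helper and its default temperature of 2.0 are illustrative choices.

```python
import math
from typing import Sequence


def softmax(logits: Sequence[float], temperature: float = 1.0) -> list[float]:
    """Temperature-scaled softmax, shifted by the max logit for stability."""
    scaled = [z / temperature for z in logits]
    m = max(scaled)
    exps = [math.exp(z - m) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def distillation_loss(
    teacher_logits: Sequence[float],
    student_logits: Sequence[float],
    temperature: float = 2.0,
) -> float:
    """KL(teacher || student) on softened distributions, scaled by T^2.

    The T^2 factor keeps gradient magnitudes comparable across temperatures,
    as is conventional for soft-target distillation.
    """
    if len(teacher_logits) != len(student_logits):
        raise ValueError("teacher and student must share a vocabulary")
    p = softmax(teacher_logits, temperature)  # soft targets from the large model
    q = softmax(student_logits, temperature)  # predictions of the small model
    kl = sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)
    return (temperature ** 2) * kl


if __name__ == "__main__":
    teacher = [2.0, 1.0, 0.1]
    print(distillation_loss(teacher, teacher))          # identical logits
    print(distillation_loss(teacher, [0.1, 1.0, 2.0]))  # mismatched student
```

Identical teacher and student logits give a loss of zero, and the loss grows as the student's distribution drifts from the teacher's. In practice this term is usually averaged over token positions and combined with the student's ordinary cross-entropy on hard labels.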

Updated 2025-05-26 · Singapore
NVIDIA
Experienced hire

N/A

Updated 2025-09-24 · Shanghai | Beijing | Shenzhen
Microsoft
Experienced hire · Technolo

• Drive technical sales with decision makers, using demos and PoCs to influence solution design and enable production deployments.
• Lead hands-on engagements (hackathons, code-with sessions, and architecture workshops) to accelerate adoption of Microsoft's developer tools and cloud platforms.
• Build trusted relationships with developers and platform leads, co-designing secure, scalable architectures and solutions.
• Resolve technical blockers and objections, collaborating with engineering to share insights and improve products.
• Maintain deep expertise in AI Foundry & App architecture (Agentic AI framework, Semantic Kernel, Foundry SDK, Responsible AI) and App architecture/cloud native dev (APIs, containerization, microservices, event-driven, Python, Java or .NET).
• Maintain and grow expertise in AI Management & Security (Gen AI Ops, Sentinel, orchestrator, monitoring).
• Represent Microsoft through thought leadership in developer communities and customer forums.

Updated 2025-09-26 · Shenzhen
Microsoft
Experienced hire · Technolo

Build Strategy
Builds competitive knowledge, documents compete patterns, and shares them within the community to drive change and escalations for competitive selling strategies. Acts as a subject matter expert on a particular competitor. Delivers competitive knowledge back to product and engineering teams. Works with local account and marketing teams to shape strategic win and customer success plans, tailored to the audience in local markets, using knowledge of Microsoft offerings, their context in the competitive landscape, and broader market trends. Where applicable, builds consumption plans with moderately complex requirements in coordination with Partner and Industry Solutions Delivery teams after customer sign-off. Proactively develops strategic cross-workload/subsidiary and account-level responses to specific market threats by identifying market patterns and delivering feedback to the business group on trends and needs.

Education
Builds readiness plans for peers and proactively identifies gaps and new opportunities for learning. Provides insight to corporate, business, and product groups on sales strategy and business reviews for impact. Acts as a technical thought leader by sharing best practices (e.g., architectures, materials) and regularly delivering content at Microsoft events (e.g., TechReady). Provides insight into how to identify opportunities to increase solutions/portfolio understanding. Monitors, responds to, and acts as a thought leader on internal tech community posts; establishes and leads vibrant tech communities, including community calls, sessions, and hackathons; and acts as a mentor to the community.

Leverage Partner Ecosystem
Scales wins through partners in a sell-with environment by promoting the partner within the Microsoft ecosystem (e.g., account teams) and developing deep partner relationships. Supports partner technical capacity by monitoring and analyzing resources through interactions, communicating with managers, and identifying new partnership opportunities to build subsidiary strategy.

Scale Customer Engagements
Leverages knowledge of resources (e.g., roles, Microsoft Technology Center [MTC], demo sites, virtual sites, Value Based Delivery [VBD], Customer Success Unit [CSU]) and proactively engages product teams (e.g., engineering) to remediate escalated technical blockers by conveying impact and by anticipating and addressing future potential blockers based on needs. Leads and ensures complex technical wins (e.g., cross-workload, cross-team, cross-geo, subsidiary-level impact) by establishing rules of engagement (e.g., role boundaries, handoff strategies), coaching others (e.g., technical sellers, account teams), and leveraging knowledge of processes (e.g., Managed Service Provider [MSP], Managed Certified Professional [MCP]), tools, and programs (e.g., FastTrack, End Customer Investment Funds [ECIFs]). Ensures alignment of Microsoft technologies with future sector standards and requirements by working with industry boards and driving customer case studies and references. Proactively identifies and engages with key customer technical decision makers and influencers while engaging the sales team and helping lead sales strategy. Uses knowledge of customer context, along with deep technical, domain, and industry knowledge, to build credibility with customers.

Solution Design and Proof
Demonstrates and oversees demonstrations (e.g., architectural design sessions, proof of concept [POC] sessions, pilots, hackathons) of solutions based on multiple products, and positions solutions against competitors.
Leverages partner/customer teams as needed to prove capabilities and a programmatic framework for re-use by the business. Adapts and extends architecture patterns to accommodate complex customer requirements and drive integration solutions with an industry flavor. Delivers assets that can be leveraged by others in the business. Applies advanced sales methodologies (e.g., challenger sales) to guide customers through digital transformation solutions, and uses innovation to challenge solutions against changing technology (e.g., Power Apps).

Updated 2025-10-10 · Beijing