![Binance logo](https://static.remoteliz.com/static/companies/company-binance-logo.jpeg)
Data Scientist (NLP)
Binance
Job Summary
We are seeking a Data Scientist (NLP) to join Binance's Risk AI team. As a data scientist, you will use internal data to train models and develop applications based on these models. You will collaborate with engineers, data analysts, business operations, and product/marketing managers to define and build solutions, features, algorithms, and products.

Your responsibilities will include applying Natural Language Processing (NLP) techniques to preprocess, analyse, and extract insights from large textual datasets. You will also develop and fine-tune Large Language Models (LLMs) and multimodal models to derive actionable insights and enhance business decision-making processes.

We are looking for someone with a Master's degree or higher in Computer Science, Data Science, Statistics, Mathematics, Computational Linguistics, or a related field, with at least 3 years of relevant industry experience in AI/ML and Natural Language Processing. You should be proficient in big data technologies such as Apache Spark, Apache Hadoop, Apache Kafka, and VectorDB, and have a solid understanding of modern machine learning techniques and their mathematical underpinnings. Additionally, you should have demonstrated experience in handling severely imbalanced datasets and proficiency in programming languages such as Python, Java, or similar, with experience in machine learning (ML) and natural language processing (NLP) libraries and deep learning frameworks.
Responsibilities:
- Apply Natural Language Processing (NLP) techniques to preprocess, analyse, and extract insights from large textual datasets. Develop and fine-tune Large Language Models (LLMs) and multimodal models to derive actionable insights and enhance business decision-making processes.
- Work closely with business units to identify opportunities for leveraging company data and AI models to drive innovative business solutions and improve decision-making processes.
- Perform data cleaning, transformation, and preprocessing to create high-quality datasets for analysis and modelling. Ensure data integrity and consistency throughout the process.
- Conduct exploratory data analysis to uncover patterns, trends, and relationships within the data. Generate visualisations and summaries to effectively communicate findings to stakeholders and support data-driven decision-making.
- Stay abreast of the latest developments in artificial intelligence, with a particular focus on advancements in multimodal AI, to ensure the integration of cutting-edge technologies and methodologies into our data-driven solutions.
- Develop and apply feature engineering techniques to create meaningful features that improve the performance of models. This includes deriving new features from raw data, selecting relevant features, and transforming existing features to enhance model accuracy and efficiency.
Requirements:
- Holds a Master's degree or higher in Computer Science, Data Science, Statistics, Mathematics, Computational Linguistics, or a related field.
- A minimum of 3 years of relevant industry experience in AI/ML and Natural Language Processing is required. Experience in multimodal AI is highly preferred.
- Proficient in big data technologies such as Apache Spark, Apache Hadoop, Apache Kafka, and VectorDB.
- Deep understanding of modern machine learning techniques and their mathematical underpinnings, such as classification, neural networks, and hyperparameter optimisation.
- Solid understanding of and practical experience with deep learning architectures, including transformer models (e.g., BERT, GPT). Ability to implement, optimise, and fine-tune these models for various tasks using techniques such as LoRA.
- Proficiency in programming languages such as Python, Java, or similar, with experience in machine learning (ML) and natural language processing (NLP) libraries and deep learning frameworks such as TensorFlow, PyTorch, scikit-learn, spaCy, and NLTK.
- Demonstrated experience in handling severely imbalanced datasets. Knowledge of techniques and strategies to address imbalances in data.