
#dailychallenge DeepSeek R1
- An open-source LLM whose reasoning is trained with reinforcement learning only, combined with MoE and CoT
- Trained with AI scoring AI, with no human interference
- Enables a low-cost, high-performance LLM
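The "AI scoring AI" idea can be sketched as an automatic reward function: each sampled answer is scored programmatically (or by another model) instead of by a human labeler, and the policy is reinforced toward higher-reward reasoning. The snippet below is a minimal, hypothetical illustration, not DeepSeek's actual pipeline; `generate_answers`, `update_policy`, and the scoring rule are assumptions.

```python
# Minimal sketch (not DeepSeek's actual code): RL-style training where
# rewards come from automatic scoring rather than human labels.

def score_answer(answer: str, reference: str) -> float:
    """Hypothetical rule-based 'AI scoring AI' reward: 1.0 if the final
    answer ends with the verifiable reference answer, else 0.0."""
    return 1.0 if answer.strip().endswith(reference) else 0.0

def rl_step(prompt: str, reference: str, generate_answers, update_policy):
    """One toy policy-update step.
    generate_answers(prompt, n) -> list[str]   # assumed sampler
    update_policy(samples, rewards)            # assumed RL update (PPO/GRPO-like)
    """
    samples = generate_answers(prompt, n=8)                   # sample several CoT answers
    rewards = [score_answer(s, reference) for s in samples]   # no human in the loop
    update_policy(samples, rewards)                           # reinforce higher-reward reasoning
    return rewards
```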
#dailychallenge day19
PondAI
- A sleeper AI infra project: a crypto AI layer that runs incentive-driven competitions for onchain prediction models, powering smarter DeFAI, security, and trading agents
- The best models receive incentives, and model developers retain ownership, provided the models are integrated with real-time data and inference infrastructure

Gaianet
- A decentralized computing infrastructure that lets anyone create, deploy, scale, and monetize AI agents that reflect their own style, values, knowledge, and expertise
- Supports AI-powered dApps and smart contracts
- Open-source framework
#dailychallenge CoT
- Chain-of-Thought (CoT) is a reasoning technique in which a model breaks a complex problem down into intermediate steps before arriving at the final answer; put simply, it thinks step by step instead of deriving the answer directly.
- This enhances the reasoning capabilities of large language models (LLMs), especially on multi-step problems. ChatGPT o1 and o3 use it, and DeepSeek R1 follows the same methodology.
- It helps the model to
  - improve logical reasoning and problem-solving skills
  - handle mathematical, coding, and reasoning-based tasks better
  - become more interpretable by showing the intermediate reasoning steps
- The recent trend is to spend effort on reasoning, not only on training.
- In short: a reasoning framework where a model decomposes complex tasks into a logical sequence of intermediate steps before reaching a conclusion, improving its ability to solve problems that require multi-step thinking rather than direct retrieval.
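A minimal sketch of the idea at the prompt level: the same question is asked directly and with an explicit "think step by step" instruction. `call_llm` is a hypothetical stand-in for whatever chat/completions API you use; the prompts are the point, not the client.

```python
# Hypothetical client: replace call_llm with your actual LLM API call.
def call_llm(prompt: str) -> str:
    raise NotImplementedError("plug in your chat/completions client here")

question = "A train travels 60 km in 45 minutes. What is its average speed in km/h?"

# Direct prompting: the model must jump straight to the answer.
direct_prompt = f"{question}\nAnswer with a single number."

# Chain-of-Thought prompting: ask for intermediate steps before the answer.
cot_prompt = (
    f"{question}\n"
    "Let's think step by step: first convert the time to hours, "
    "then divide distance by time, and finally state the answer."
)

# answer_direct = call_llm(direct_prompt)
# answer_cot = call_llm(cot_prompt)  # typically more reliable on multi-step problems
```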
#dailychallenge LLM Fine-tuning Methodology
- LoRA Tuning, P-Tuning, Instruction Tuning: a pre-trained model is fine-tuned with task-specific data.
- LoRA is best for parameter-efficient fine-tuning; P-Tuning optimizes learned (soft) prompts rather than the model weights; Instruction Tuning focuses on making models better at understanding and executing human instructions.
- LoRA Tuning and P-Tuning are both PEFT (Parameter-Efficient Fine-Tuning): they don't update all of the model's parameters but train a small number of additional parameters.
- Instruction Tuning is used to enhance the model itself, using a dataset of instruction-and-answer pairs.
- While PEFT adapts the model to a specific task, instruction tuning is used for overall capability enhancement: PEFT is tuning, instruction tuning is training (see the LoRA sketch below).
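As one concrete PEFT example, the sketch below attaches LoRA adapters to a pre-trained causal LM with Hugging Face's peft library. The base model ("gpt2") and the target modules are assumptions for illustration; a real run would follow this with an ordinary training loop on task data.

```python
# Minimal LoRA (PEFT) sketch with Hugging Face transformers + peft.
# Assumption: "gpt2" stands in for whatever pre-trained model you fine-tune.
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import LoraConfig, get_peft_model, TaskType

base = AutoModelForCausalLM.from_pretrained("gpt2")
tokenizer = AutoTokenizer.from_pretrained("gpt2")

# LoRA: freeze the base weights and train small low-rank adapter matrices.
lora_config = LoraConfig(
    task_type=TaskType.CAUSAL_LM,
    r=8,                       # rank of the low-rank update
    lora_alpha=16,             # scaling factor
    lora_dropout=0.05,
    target_modules=["c_attn"], # attention projection in GPT-2; model-specific
)

model = get_peft_model(base, lora_config)
model.print_trainable_parameters()  # only a small fraction of parameters is trainable
```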