This document describes a reinforcement learning approach that enables humanoid robots with multi-fingered hands to perform dexterous manipulation tasks from visual input. It addresses four core challenges: bridging the simulation-to-real gap, designing effective reward functions, improving the sample efficiency of policy learning, and handling object perception. For each challenge it introduces a corresponding technique: an automated real-to-sim tuning module, a generalizable reward scheme based on contact goals and object goals, more sample-efficient policy learning through initialization and distillation, and an object representation that combines sparse and dense features. These methods show promising results on real-world tasks such as grasping, box lifting, and bimanual handover, which suggests that robust generalization and high performance are achievable without human demonstrations.