Reinforcement learning (RL), inspired by learning behavior in nature, is a goal-oriented learning strategy in which an agent learns a policy that optimizes a pre-defined reward by interacting with its environment. Owing to its data-driven nature, its effectiveness in reaching optimal behavior, and its adaptability to uncertain environments, RL has undergone rapid progress in the control community. In this talk, we first discuss RL-based disturbance rejection control for uncertain nonlinear systems with a known nominal part. An extended state observer is designed to estimate the system state and the total uncertainty. Based on the observer output, the controller compensates for the total uncertainty in real time and, simultaneously, approximates the optimal policy for the compensated system online using a simulation-of-experience-based RL technique. The approach requires neither a persistence of excitation (PE) condition nor probing signals. We then extend the study to systems with an unknown nominal part, where a novel concurrent adaptive extended observer is developed to jointly estimate the system parameters and state, and a simulation-of-experience-based RL technique is again used to approximate the optimal policy.
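To make the disturbance-rejection idea concrete, the following is a minimal sketch of an extended state observer (ESO) on a second-order plant x1' = x2, x2' = f(x) + d(t) + u, where the extended state z3 estimates the "total uncertainty" f + d and the control cancels it in real time. The plant nonlinearity, the disturbance, the bandwidth w0, and the PD gains below are all illustrative assumptions, not the method presented in the talk; the talk's RL component (learning the optimal policy for the compensated system) is replaced here by a fixed PD law for brevity.

```python
import numpy as np

def simulate_eso(T=10.0, dt=1e-3):
    """Sketch: linear ESO with bandwidth parameterization, observer
    poles placed at -w0. Returns the final plant state and the peak
    estimation error of the total uncertainty after the transient."""
    w0 = 30.0                          # observer bandwidth (assumed)
    b1, b2, b3 = 3 * w0, 3 * w0**2, w0**3

    x = np.array([1.0, 0.0])           # true plant state [x1, x2]
    z = np.zeros(3)                    # observer state [z1, z2, z3]
    kp, kd = 4.0, 4.0                  # nominal PD gains (assumed)

    errs = []
    for k in range(int(T / dt)):
        t = k * dt
        f = -0.5 * x[1] * abs(x[1])    # unknown nonlinearity (assumed)
        d = np.sin(t)                  # external disturbance (assumed)
        total = f + d                  # "total uncertainty" the ESO estimates

        # control: PD feedback on the estimate, minus the estimated
        # total uncertainty (real-time compensation)
        u = -kp * z[0] - kd * z[1] - z[2]

        e = x[0] - z[0]                # output estimation error (y = x1)
        # ESO dynamics (Euler step)
        z = z + dt * np.array([z[1] + b1 * e,
                               z[2] + u + b2 * e,
                               b3 * e])
        # plant dynamics (Euler step)
        x = x + dt * np.array([x[1], total + u])

        if t > T / 2:                  # record error after the transient
            errs.append(abs(total - z[2]))

    return x, max(errs)

x_final, est_err = simulate_eso()
```

In this sketch the bandwidth w0 trades estimation speed against noise sensitivity; with compensation active, the plant output is regulated near zero while z3 tracks the slowly varying disturbance closely.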