Deep reinforcement learning (DRL), which combines reinforcement learning (RL) with deep neural networks, has garnered significant attention for job shop scheduling problems (JSP), flexible job shop scheduling problems (FJSP), and their various extensions. This paper provides a comprehensive review and critical commentary on recent advances in the application of DRL to JSP, FJSP, and related variants. It summarizes design approaches for state and action spaces, with a focus on the choice of neural network architectures, such as multi-layer perceptrons (MLPs), convolutional neural networks (CNNs), recurrent neural networks (RNNs), attention mechanisms, Transformers, and graph neural networks (GNNs), for effective state feature extraction. The paper also introduces the attention-based alignment model (e.g., the pointer network) for end-to-end learning. Furthermore, it categorizes benchmark-based training into three main approaches, instance-by-instance, instance-class, and size-agnostic, and provides a comparative analysis of results from classical JSP studies. Finally, it discusses future research directions and emerging trends. This review serves as a valuable reference for further research on DRL-based production scheduling, particularly regarding the selection of neural network architectures, the design of state and action spaces, the formulation of the RL method, and the experimental setup.
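As a concrete illustration of one common state/action-space design discussed in the review, the sketch below shows a toy JSP environment in which the state is a hand-crafted feature vector (machine availability times and remaining work per job) and each discrete action selects a priority dispatching rule applied at the next decision point. This is a minimal, illustrative sketch only: the class name `JspEnv`, the particular rule set, the feature choices, and the makespan-based reward shaping are assumptions for demonstration, not the method of any specific paper surveyed.

```python
import numpy as np

# Illustrative dispatching rules used as the discrete action space (an assumption
# for this sketch; surveyed papers use a variety of rule sets or direct operation selection).
RULES = ["SPT", "LPT", "MWKR", "FIFO"]

class JspEnv:
    """Toy JSP simulator: each job is a fixed machine routing with processing times.

    State  : hand-crafted feature vector (machine availability times, remaining work per job).
    Action : index of the dispatching rule applied at the next decision point.
    Reward : negative increase in makespan, so maximizing return minimizes makespan.
    """

    def __init__(self, proc_times, routings):
        self.proc_times = np.asarray(proc_times, dtype=float)  # shape (n_jobs, n_ops)
        self.routings = np.asarray(routings, dtype=int)        # machine index per operation
        self.n_jobs, self.n_ops = self.proc_times.shape
        self.n_machines = int(self.routings.max()) + 1
        self.reset()

    def reset(self):
        self.next_op = np.zeros(self.n_jobs, dtype=int)   # next unscheduled operation per job
        self.job_ready = np.zeros(self.n_jobs)             # time each job becomes available
        self.mach_ready = np.zeros(self.n_machines)        # time each machine becomes available
        return self._features()

    def _features(self):
        remaining = np.array([self.proc_times[j, self.next_op[j]:].sum()
                              for j in range(self.n_jobs)])
        return np.concatenate([self.mach_ready, remaining])

    def _candidates(self):
        return [j for j in range(self.n_jobs) if self.next_op[j] < self.n_ops]

    def step(self, action):
        jobs = self._candidates()
        rule = RULES[action]
        p = np.array([self.proc_times[j, self.next_op[j]] for j in jobs])
        if rule == "SPT":            # shortest processing time
            j = jobs[int(np.argmin(p))]
        elif rule == "LPT":          # longest processing time
            j = jobs[int(np.argmax(p))]
        elif rule == "MWKR":         # most work remaining
            work = [self.proc_times[j, self.next_op[j]:].sum() for j in jobs]
            j = jobs[int(np.argmax(work))]
        else:                        # FIFO: earliest-ready job
            j = jobs[int(np.argmin(self.job_ready[jobs]))]
        o = self.next_op[j]
        m = self.routings[j, o]
        old_makespan = self.mach_ready.max()
        start = max(self.job_ready[j], self.mach_ready[m])
        self.job_ready[j] = self.mach_ready[m] = start + self.proc_times[j, o]
        self.next_op[j] += 1
        reward = -(self.mach_ready.max() - old_makespan)
        done = not self._candidates()
        return self._features(), reward, done

# Rollout with a random policy as a stand-in for a trained DRL agent.
if __name__ == "__main__":
    rng = np.random.default_rng(0)
    env = JspEnv(proc_times=[[3, 2, 2], [2, 1, 4], [4, 3, 1]],
                 routings=[[0, 1, 2], [0, 2, 1], [1, 2, 0]])
    state, done = env.reset(), False
    while not done:
        action = rng.integers(len(RULES))   # a trained policy network would act here
        state, reward, done = env.step(action)
    print("makespan:", env.mach_ready.max())
```

In a full DRL approach, the random action in the rollout would be replaced by a policy network (e.g., an MLP or a GNN over disjunctive-graph features) trained with an algorithm such as DQN or PPO, and the action space could instead select operations directly rather than dispatching rules.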