This paper proposes a distributed, multiagent, multiband sensing policy based on reinforcement learning for cognitive radio ad hoc networks. The policy relies on secondary user (SU) collaboration through local interactions. The goal is to maximize the amount of available spectrum found for secondary use given a desired diversity order, i.e., a desired number of SUs simultaneously sensing each frequency band. To identify unused spectrum locally, each SU makes local decisions based on its own and its neighbors' local test statistics or decisions. The network thus builds a locally available map of spectrum occupancy over its geographical area. Simulation results show that the proposed sensing policy finds significantly more available spectrum for secondary use than a random sensing policy.
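
To make the notion of a diversity order concrete, the following minimal Python sketch shows one way a band selection step with a per-band cap could look. It is an illustration under stated assumptions, not the policy proposed in the paper: the function names `choose_bands` and `update_q`, the epsilon-greedy rule, the value table `q_values`, and the step size are all hypothetical, and a single sequential loop stands in for the local neighbor interactions of a truly distributed network.

```python
import random


def choose_bands(q_values, diversity_order, epsilon=0.1, rng=None):
    """Assign each SU one band, aiming for `diversity_order` SUs per band.

    q_values[su][band] is a hypothetical learned estimate of how often
    `band` has been found free by `su` (an assumption of this sketch).
    """
    rng = rng or random.Random()
    num_bands = len(q_values[0])
    load = [0] * num_bands  # number of SUs already assigned to each band
    assignment = []
    for su_q in q_values:
        # Bands that still need sensors to reach the diversity order.
        open_bands = [b for b in range(num_bands) if load[b] < diversity_order]
        if not open_bands:
            # Every band already meets the diversity order: allow extras.
            open_bands = list(range(num_bands))
        if rng.random() < epsilon:
            band = rng.choice(open_bands)  # explore a random open band
        else:
            band = max(open_bands, key=lambda b: su_q[b])  # exploit best estimate
        load[band] += 1
        assignment.append(band)
    return assignment


def update_q(q, band, found_free, step=0.1):
    """Move the estimate for `band` toward the reward (1 if found free, else 0)."""
    reward = 1.0 if found_free else 0.0
    q[band] += step * (reward - q[band])
```

With `epsilon=0` the selection is purely greedy, so three SUs that all prefer band 0 under a diversity order of 2 are split two to one: the third SU is pushed to band 1 once band 0 has reached its cap.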