In this paper, we present a vision-based method for instant global localization from a given aerial image. The approach mimics how humans localize themselves on a map by using the spatial layout of the semantic elements it shows. Unlike other matching and localization methods that rely on visual appearance or feature matching, our method relies on robust and consistently detectable semantic elements that are invariant to illumination, temporal variations, and occlusions. We use the buildings on the map and in the given aerial query image as our semantic elements. Spatial relations between these elements are efficiently stored and queried using a hierarchical, semantic version of the Geometric Hashing algorithm that is inherently rotation and scale invariant. We also present a method for obtaining building locations from a given query image using image classification and image processing techniques. Overall, this approach provides fast and robust localization over large areas. We present experimental results for localizing satellite image tiles on a dense city map covering 16.5 km² and containing over 7,000 buildings.
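As a rough illustration of why Geometric Hashing is rotation and scale invariant, the following minimal sketch implements the classic point-based form of the algorithm, not the hierarchical semantic version described above. It assumes each building is reduced to a single 2D point (for example its centroid) stored as a complex number; the function names, the `cell` bin size, and the sample coordinates are all hypothetical.

```python
import cmath
from collections import defaultdict
from itertools import permutations


def invariant_coords(points, i, j):
    """Express every other point in the frame of the basis pair (i, j).

    With points as complex numbers, (r - p) / (q - p) is unchanged by any
    translation, rotation, or uniform scaling applied to all points.
    """
    p, q = points[i], points[j]
    base = q - p
    return [(r - p) / base for k, r in enumerate(points) if k not in (i, j)]


def quantize(z, cell=0.1):
    """Map an invariant coordinate to a discrete hash-table bin."""
    return (round(z.real / cell), round(z.imag / cell))


def build_table(points, cell=0.1):
    """Offline step: hash all points under every ordered basis pair."""
    table = defaultdict(set)
    for i, j in permutations(range(len(points)), 2):
        for z in invariant_coords(points, i, j):
            table[quantize(z, cell)].add((i, j))
    return table


def vote(table, query, cell=0.1):
    """Online step: use the first two query points as a basis and vote."""
    votes = defaultdict(int)
    for z in invariant_coords(query, 0, 1):
        for basis in table.get(quantize(z, cell), ()):
            votes[basis] += 1
    return dict(votes)


# Hypothetical map of six building centroids.
map_pts = [0 + 0j, 4 + 1j, 1 + 3j, 5 + 5j, 2 + 6j, 7 + 2j]
table = build_table(map_pts)

# Query: map points 2, 4, 5, 0 seen after an unknown rotation, scale, shift.
transform = 2.5 * cmath.exp(0.7j)
query = [transform * map_pts[k] + (10 - 3j) for k in (2, 4, 5, 0)]
votes = vote(table, query)
best = max(votes, key=votes.get)  # map basis pair with the most votes
```

In this sketch the map basis pair that collects the most votes indicates which map points correspond to the first two query points, from which the similarity transform between query and map could be recovered. Building the table costs on the order of n³ insertions for n points, which is one reason a hierarchical organization is attractive for large maps.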