The object people perceive in an image can depend on its orientation relative to the scene it is in (its reference frame). For example, the images of the symbols × and + differ by a 45 degree rotation. Although real scenes have multiple images and reference frames, psychologists have focused on scenes with only one reference frame. We propose an ideal observer model based on nonparametric Bayesian statistics for inferring the number of reference frames in a scene and their parameters. When an ambiguous image could be assigned to two conflicting reference frames, the model predicts two factors should influence the reference frame inferred for the image: The image should be more likely to share the reference frame of the closer object (proximity) and it should be more likely to share the reference frame containing the most objects (alignment). We confirm people use both cues using a novel methodology that allows for easy testing of human reference frame inference.