This paper addresses the determination of the rigid transformation between the camera and object reference frames from a pair of intensity images and a known scene model. Two difficult parts of this problem deserve particular attention: the matching between image and model features, and the matching of image features between the stereo views. We propose the use of planar regions as features, which makes both problems simpler. The former is handled by an invariant-based approach, for which a less complex base can be adopted, and the latter by applying the epipolar constraint to the lower and upper bounds of the region coordinates. The presented approach may be useful in applications where camera-based tracking requires automatic initialization.