Recognizing object classes in real-world images is a long-standing goal in computer vision. Conceptually, this is challenging because of the large appearance variations among object instances belonging to the same class. Additionally, distortions from background clutter, scale, and viewpoint variation can make even the same object instance look vastly different. Further challenges arise from interclass similarity, in which instances from different classes appear very similar. A model for an object class must be flexible enough to handle this variability while still sieving out true object instances in cluttered images.
Identifying an object in an image typically starts with image-processing techniques such as noise removal, followed by low-level feature extraction to locate lines, regions, and areas with particular textures.
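The two stages named above can be sketched on a toy example. This is only an illustration under stated assumptions: the 3x3 median filter and the central-difference gradient are stand-ins chosen here for "noise removal" and "low-level feature extraction", and the function names and the toy image are hypothetical.

```python
from statistics import median

def median_filter(img):
    """3x3 median filter; border pixels are copied unchanged."""
    h, w = len(img), len(img[0])
    out = [row[:] for row in img]
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            window = [img[y + dy][x + dx]
                      for dy in (-1, 0, 1) for dx in (-1, 0, 1)]
            out[y][x] = median(window)
    return out

def horizontal_gradient(img):
    """Absolute central difference along x; large values mark vertical edges."""
    h, w = len(img), len(img[0])
    grad = [[0] * w for _ in range(h)]
    for y in range(h):
        for x in range(1, w - 1):
            grad[y][x] = abs(img[y][x + 1] - img[y][x - 1])
    return grad

# Toy 5x6 grayscale image (assumed data): dark left half, bright right half,
# plus a single noisy pixel at row 2, column 1.
image = [
    [10, 10, 10, 200, 200, 200],
    [10, 10, 10, 200, 200, 200],
    [10, 255, 10, 200, 200, 200],
    [10, 10, 10, 200, 200, 200],
    [10, 10, 10, 200, 200, 200],
]

denoised = median_filter(image)        # step 1: remove unwanted noise
edges = horizontal_gradient(denoised)  # step 2: locate a line-like feature
```

After filtering, the isolated bright pixel is replaced by its neighbourhood median, and the gradient responds only along the boundary between the dark and bright halves.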
The clever bit is to interpret collections of these shapes as single objects, e.g. cars on a road, boxes on a conveyor belt, or cancerous cells on a microscope slide. A major difficulty in this AI problem is that an object can appear very different when viewed from different angles or under different lighting. Another difficulty is deciding which features belong to which object and which are background, shadows, etc. The human visual system performs these tasks mostly unconsciously, but a computer requires skillful programming and a lot of processing power to approach human performance. An image is usually interpreted as a two-dimensional array of brightness values and is most familiarly represented by patterns such as those of a photographic print, slide, television screen, or movie screen. An image can be processed optically or digitally with a computer.
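To make the lighting problem concrete, the sketch below treats an image as a two-dimensional array of brightness values and compares the same patch under two lighting conditions. The lighting change is modeled, purely as an assumption, by a gain and an offset, and the zero-mean, unit-variance normalization is one simple remedy chosen for illustration; the patch values and names are hypothetical.

```python
from statistics import mean, pstdev

def normalize(img):
    """Shift to zero mean and scale to unit variance."""
    pixels = [p for row in img for p in row]
    mu, sigma = mean(pixels), pstdev(pixels)
    return [[(p - mu) / sigma for p in row] for row in img]

def max_abs_diff(a, b):
    """Largest per-pixel difference between two equally sized images."""
    return max(abs(p - q) for ra, rb in zip(a, b) for p, q in zip(ra, rb))

# Hypothetical 3x3 patch: a bright vertical bar on a dark background.
patch = [[20, 120, 20],
         [20, 120, 20],
         [20, 120, 20]]

# The same patch under brighter lighting (assumed model: gain 2, offset 10).
brighter = [[2 * p + 10 for p in row] for row in patch]

raw_diff = max_abs_diff(patch, brighter)
norm_diff = max_abs_diff(normalize(patch), normalize(brighter))
```

The raw brightness values differ substantially between the two patches, while the normalized values agree up to floating-point error. Real lighting changes are rarely this simple, which is part of why recognition remains difficult.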
The generic nature of line segments and ellipses gives them the ability to represent complex shapes and structures. While individually less distinctive, a number of these primitives can be combined so that the combination is sufficiently discriminative. Here, each combination is a two-layer abstraction of primitives: pairs of primitives (termed shape tokens) at the first layer, and a learned number of shape tokens at the second layer. The number of shape tokens is not constrained to be fixed; it is allowed to adapt automatically and flexibly to an object class. This number influences a combination’s ability to represent shapes: simple shapes favor fewer shape tokens than complex ones. Consequently, discriminative combinations of varying complexity can be exploited to represent an object class. Shape constraints describe the visual aspect of shape tokens, while geometric constraints describe their spatial layout (configuration). Structural constraints enforce possible poses/structures of an object through the relationships between shape tokens.
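The two-layer abstraction can be pictured as a small data structure. The sketch below is only one possible encoding, not the representation used in this work: the class names, the fields, and the centroid-distance descriptor standing in for a geometric constraint are all assumptions made for illustration.

```python
from dataclasses import dataclass
from math import hypot
from typing import List, Tuple

@dataclass(frozen=True)
class Primitive:
    kind: str                    # "line" or "ellipse"
    center: Tuple[float, float]  # (x, y) position in the image

@dataclass(frozen=True)
class ShapeToken:
    """First layer: a pair of primitives."""
    first: Primitive
    second: Primitive

    def centroid(self) -> Tuple[float, float]:
        (x1, y1), (x2, y2) = self.first.center, self.second.center
        return ((x1 + x2) / 2, (y1 + y2) / 2)

@dataclass
class Combination:
    """Second layer: a variable number of shape tokens."""
    tokens: List[ShapeToken]

    def layout(self) -> List[float]:
        """Distances between consecutive token centroids
        (a toy stand-in for a geometric constraint)."""
        cs = [t.centroid() for t in self.tokens]
        return [hypot(b[0] - a[0], b[1] - a[1]) for a, b in zip(cs, cs[1:])]

# Hypothetical tokens built from line and ellipse primitives.
token_a = ShapeToken(Primitive("line", (0.0, 0.0)), Primitive("ellipse", (2.0, 0.0)))
token_b = ShapeToken(Primitive("line", (3.0, 3.0)), Primitive("line", (5.0, 5.0)))

simple_shape = Combination([token_a])            # fewer tokens for a simple shape
complex_shape = Combination([token_a, token_b])  # more tokens for a complex one
```

Because `tokens` is an ordinary list, the number of shape tokens per combination is not fixed, which mirrors the idea that simple shapes need fewer tokens than complex ones.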