Supported file types in Data Security Investigations

Microsoft Purview Data Security Investigations supports Optical Character Recognition (OCR) text extraction for certain image file types. The following table lists the currently supported image file types and indicates whether each one is supported for file identification, metadata extraction, and OCR text extraction.

Image

Images are extracted when you add items to an investigation scope. Automatic OCR processing doesn't incur an additional charge for your organization.

| MIME type | File identification | Metadata extraction | OCR text extraction | Possible extensions |
|---|---|---|---|---|
| image/bmp | Yes | Yes | Yes | .bmp |
| image/emf | Yes | Yes | Yes | .emf |
| image/gif | Yes | Yes | Yes | .gif |
| image/jpeg | Yes | Yes | Yes | .jpeg; .jpg |
| image/png | Yes | Yes | Yes | .png |
| image/svg+xml | Yes | Yes | Yes | .svg |
| image/tiff | Yes | Yes | Yes | .tif |
| image/vnd.dwg | Yes | Yes | Yes | .dwg; .dxf |
| image/wmf | Yes | Yes | Yes | .wmf |
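
If you want to check locally whether a file's extension appears in the table above before adding it to an investigation scope, a small lookup can help. The following Python sketch is illustrative only: it isn't part of Data Security Investigations or any Microsoft API, and the names `SUPPORTED_IMAGE_TYPES` and `supported_image_mime_type` are hypothetical. It assumes that matching on the file extension alone is good enough for a pre-check; the service itself identifies files by its own means.

```python
from pathlib import PurePath
from typing import Optional

# Extension -> MIME type, transcribed from the table above.
# Hypothetical helper data; not an official list exposed by any API.
SUPPORTED_IMAGE_TYPES = {
    ".bmp": "image/bmp",
    ".emf": "image/emf",
    ".gif": "image/gif",
    ".jpeg": "image/jpeg",
    ".jpg": "image/jpeg",
    ".png": "image/png",
    ".svg": "image/svg+xml",
    ".tif": "image/tiff",
    ".dwg": "image/vnd.dwg",
    ".dxf": "image/vnd.dwg",
    ".wmf": "image/wmf",
}


def supported_image_mime_type(filename: str) -> Optional[str]:
    """Return the MIME type listed for the file's extension, or None.

    The comparison is case-insensitive and looks only at the final
    extension, so the file's actual contents are never inspected.
    """
    suffix = PurePath(filename).suffix.lower()
    return SUPPORTED_IMAGE_TYPES.get(suffix)
```

For example, `supported_image_mime_type("Scan_001.JPG")` returns `"image/jpeg"`, while `supported_image_mime_type("notes.txt")` returns `None`. The lookup deliberately contains only the extensions that appear in the table, so an extension the table doesn't list is reported as unsupported.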

Note

A Yes in the OCR text extraction column indicates that text can be extracted from that image format when data is automatically vectorized. OCR text extraction occurs automatically during data preparation, and the extracted text is then vectorized for use in AI-based analysis tools.