- October 25, 2022
- Qiming Wang, SEO Specialist
By its own account, Google works to organize all the information in the world and make it accessible to everyone. Along the way, it inevitably encounters files that aren’t HTML, such as spreadsheets, presentations, and PDFs. In those cases, it works to extract the relevant content and index it for search results. Not surprisingly, these file types can be difficult to index. What guidelines apply to them? And what happens if you don’t want Google to index them?
Google has compiled the most frequently asked questions about PDF indexing. Here’s what it has to say:
Q. Can Google index PDF files?
A: Google can generally index textual content, written in any language, from PDF files, as long as the files are not password protected or encrypted. If the text is embedded as images, Google may process the images with OCR algorithms to extract it. As a rule of thumb, if you can copy text from a PDF document and paste it into a standard text document, Google should be able to index that text.
Q. How are links in PDF documents treated?
A: Links in PDF files can pass PageRank and other indexing signals, and Google may follow them once it has crawled the file. It is currently not possible to mark links within a PDF file as nofollow.
Q. How can I stop my PDF files appearing in search results?
A: To keep PDF documents out of search results, add an X-Robots-Tag: noindex header to the HTTP response that serves the file. PDF documents that are already indexed will drop out of the results over time once the header is in place. For faster removal, you can also use the URL removal tool in Google Search Console (formerly Google Webmaster Tools).
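As a minimal sketch of the header described above, assuming the files are served by a Python application: the hypothetical helper `response_headers` adds `X-Robots-Tag: noindex` to any response whose path ends in `.pdf`, and the small WSGI app shows where it would plug in. The function names and paths are illustrative assumptions, not part of Google’s guidance. Many sites would set the same header in their web server or CDN configuration instead.

```python
def response_headers(path: str) -> list[tuple[str, str]]:
    """Return HTTP response headers for a request path (hypothetical helper)."""
    if path.lower().endswith(".pdf"):
        return [
            ("Content-Type", "application/pdf"),
            # Ask search engines not to index this PDF.
            ("X-Robots-Tag", "noindex"),
        ]
    # Non-PDF responses are left indexable in this sketch.
    return [("Content-Type", "text/html; charset=utf-8")]


def app(environ, start_response):
    """Tiny WSGI app that applies the headers above to every request."""
    start_response("200 OK", response_headers(environ.get("PATH_INFO", "/")))
    return [b""]  # Body omitted; a real app would send the file contents here.
```

After deploying something like this, it is worth requesting a PDF URL and confirming that the response really carries the header, since the header must be present on the response itself for Google to see it.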
Q. Can PDF files rank high in search results?
A: Sure! They generally rank much like other webpages. PDF documents that rank highly in Google’s search results do so because of their content and the way they are embedded in and linked from other pages.
Q. Is it duplicate content if I have my pages in both HTML and PDF?
A: Google recommends serving only one copy of your content whenever possible. If you can’t, make sure you indicate your preferred version. You can do this by including the preferred URL in your sitemap, or by specifying the canonical version in the HTML or in the HTTP headers of the PDF resource.
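One way to specify the canonical version in the HTTP headers is to send a `Link` header with `rel="canonical"` on the PDF response. The sketch below, assuming a Python application builds the response headers, formats that header. The helper name `canonical_link_header` and the example.com URL are hypothetical.

```python
def canonical_link_header(preferred_url: str) -> tuple[str, str]:
    """Build a Link header naming the preferred (canonical) URL.

    Hypothetical helper: attach the returned pair to the HTTP response
    that serves the PDF copy of the content.
    """
    return ("Link", f'<{preferred_url}>; rel="canonical"')


# Example: the PDF response points at the HTML version as canonical.
header = canonical_link_header("https://www.example.com/guide.html")
```

Here the PDF is served with a header naming the HTML page as the preferred version. To prefer the PDF instead, the HTML page would declare the PDF URL as its canonical.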
Q. How can I change the title of my PDF document that appears in search results?
A: Google uses two main elements to determine the title it displays: the title metadata contained in the file and the anchor text of links pointing to the PDF file. Google recommends updating both to give its algorithms a strong signal about the correct title to use.