Duplicate post check? | CyberSEO Pro

February 6, 2025
1:34 am

egeekbiz

Member

Members

Forum Posts: 9

Member Since:
April 15, 2024

Offline

Hey there,

I recognize that duplicate post detection works only on a feed-by-feed level, but is there any way to apply it across all feeds? For example, if Feed A and Feed B have the same article but titled slightly differently, I end up with both articles. As far as I know, there's no way to pass parameters into the Content Filtering step. Is that correct?

Open to any suggestions on how I might avoid this. It's causing a lot of unnecessary clutter and duplication across multiple feeds in the same industry.

Thanks!

February 6, 2025
8:34 am

CyberSEO

Admin

Forum Posts: 4077

Member Since:
July 2, 2009

Offline

The uniqueness of the imported article is determined by the uniqueness of the link to the original text. Thus, if you have two feeds containing the same article, the plugin will add it only once, and will not create a copy. As an alternative or in addition to checking for uniqueness by link, you can also check for uniqueness by article title.
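
As a rough illustration of the link-based check described above (not the plugin's actual code, which is not shown in this thread), the Python sketch below keeps one set of seen links and, optionally, one set of seen titles shared across all feeds. The names `normalize_title`, `is_new_item`, `seen_links`, and `seen_titles` are hypothetical.

```python
def normalize_title(title: str) -> str:
    """Lowercase and collapse whitespace so trivial differences don't matter."""
    return " ".join(title.lower().split())


def is_new_item(link, title, seen_links, seen_titles, check_title=False):
    """Return True and record the item if it has not been seen before.

    Uniqueness is decided by the link. When check_title is True, an exact
    match on the normalized title also counts as a duplicate.
    """
    key = normalize_title(title)
    if link in seen_links:
        return False
    if check_title and key in seen_titles:
        return False
    # Assumption: both sets are shared by every feed, so the check is global.
    seen_links.add(link)
    seen_titles.add(key)
    return True
```

Because the title check here is an exact match after normalization, two differently worded titles for the same story would still both pass.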

February 10, 2025
9:54 am

egeekbiz

Unfortunately, unless the title is exactly the same, the system doesn't understand the article itself. For example, "New Jurassic Park Trailer" and "Watch Jurassic Park Trailer" would be interpreted as two separate articles, despite being the same thing.

I've semi-solved it using custom PHP code, but it runs after the AI generation (and token usage), so there's a fair amount of waste. Is there a way to do that kind of check before the AI is engaged to rewrite?
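
To make the ordering concern concrete, here is a minimal Python sketch of the idea of checking first and rewriting second. It is not CyberSEO Pro's API: `import_items`, `is_duplicate`, and `rewrite` are hypothetical names, and `rewrite` stands in for whatever costly AI call is being made.

```python
def import_items(items, is_duplicate, rewrite):
    """Run the cheap duplicate check first; rewrite only items that pass it.

    items:        list of dicts with "title" and "body" keys (assumed shape)
    is_duplicate: callable(item, published) -> bool, the cheap check
    rewrite:      callable(body) -> str, the costly step (hypothetical)
    """
    published = []
    for item in items:
        if is_duplicate(item, published):
            continue  # skipped before the costly rewrite is ever called
        published.append({**item, "body": rewrite(item["body"])})
    return published
```

With this ordering, the number of `rewrite` calls equals the number of items that survive the duplicate check, rather than the number of items in the feeds.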

February 10, 2025
10:19 am

CyberSEO

If you have a vision of how this could be done in practice, please describe the algorithm for checking article text for uniqueness. Just keep in mind that the script can't check the text of each imported article against the text of every post in your WP database. Your database may contain many thousands of quite large articles, and each one would have to be compared against every item from every imported feed, so such a resource-intensive check can easily overload your server.

Most users run the plugin on shared hosting, and even checking for uniqueness by title or link alone can put a heavy load on virtual hardware if there are many items to check. This is why it is recommended to use one such simplified check, if possible, rather than doing both title and link checks in one pass.

February 11, 2025
6:20 pm

egeekbiz

Oh I understand. I definitely don't expect anything to be able to absorb an entire article to compare, but it would be helpful if the duplicate check had a sliding scale (right now, it's 100% match. If it were a 70-80% match, I'd be in better shape).
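
One way such a sliding scale could look, sketched in Python with the standard library's `difflib.SequenceMatcher`. This is only an illustration of a thresholded title match, not a plugin feature; `title_similarity`, `is_near_duplicate`, and the 0.75 default are assumptions chosen for the example.

```python
from difflib import SequenceMatcher


def title_similarity(a: str, b: str) -> float:
    """Return a similarity ratio between 0.0 and 1.0, ignoring case."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def is_near_duplicate(title, existing_titles, threshold=0.75):
    """True if any existing title is at least `threshold` similar to `title`.

    Assumption: existing_titles is a small list (for example, recent posts
    only), since this compares the new title against every entry in turn.
    """
    return any(title_similarity(title, t) >= threshold for t in existing_titles)
```

With these defaults, "New Jurassic Park Trailer" and "Watch Jurassic Park Trailer" are treated as the same story. The cost still grows with the number of titles compared, so limiting `existing_titles` matters on a large site.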

For example, here's the code I'm running now. It basically filters anything that uses 3 of the same words. It's not ideal, and needs some tweaking, but it's been helpful so far. However, as I said, it's running AFTER generating the AI rewrite, and it'd be more helpful if I could run it on the initial check.

[Code snippet hidden; login required to view]
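
The poster's snippet is not visible here. Purely as an illustration of the rule described (flagging titles that share three or more words), a Python sketch might look like the following; `title_words` and `shares_words` are hypothetical names, and this is not the poster's PHP code.

```python
import re


def title_words(title: str) -> set:
    """Split a title into a set of lowercase alphanumeric words."""
    return set(re.findall(r"[a-z0-9]+", title.lower()))


def shares_words(a: str, b: str, min_shared: int = 3) -> bool:
    """True if the two titles have at least `min_shared` words in common."""
    return len(title_words(a) & title_words(b)) >= min_shared
```

A rule like this counts every word equally, so common words such as "the" or "new" can push unrelated titles over the limit; ignoring such words is one possible tweak.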

February 12, 2025
6:09 am

CyberSEO

egeekbiz said
Oh I understand. I definitely don't expect anything to be able to absorb an entire article to compare, but it would be helpful if the duplicate check had a sliding scale (right now, it's 100% match. If it were a 70-80% match, I'd be in better shape).

This can't be done with a standard MySQL query, so a partial-match comparison would take even more time than a full-text comparison.

As for the snippet: if you want to tweak it a bit, I suggest using our GPT assistant, which is familiar with the documentation: [link hidden; login required to view]
