This is not a problem with CyberSEO but with WordPress itself. When the plugin syndicates a feed in auto mode, it does so via the WordPress pseudo cron, which is triggered when somebody opens your blog in a browser. The problem arises when several instances of your blog are opened simultaneously. Unfortunately, the WordPress pseudo cron is not protected against simultaneous executions. Thus one instance doesn't know that another one is already syndicating the same post that it is about to syndicate itself.
However, there is an easy way to avoid this problem. Just switch the RSS pull mode to "by cron job or manually" and set up a real cron job at your host to make CyberSEO pull the feeds, say, once an hour.
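To illustrate the race described above, here is a minimal Python sketch of an exclusive lock that lets only one run proceed at a time. It is not WordPress or CyberSEO code. The lock file path and the function names (`try_acquire`, `release`, `pull_feeds_once`) are made up for the example.

```python
import os
import tempfile

# Hypothetical lock file path; neither WordPress nor CyberSEO creates this file.
LOCK_PATH = os.path.join(tempfile.gettempdir(), "feed_pull_example.lock")


def try_acquire(path=LOCK_PATH):
    """Return True if this run got the lock, False if another run holds it."""
    try:
        # O_CREAT | O_EXCL fails atomically if the file already exists.
        fd = os.open(path, os.O_CREAT | os.O_EXCL | os.O_WRONLY)
    except FileExistsError:
        return False
    os.close(fd)
    return True


def release(path=LOCK_PATH):
    """Remove the lock so that the next scheduled run can proceed."""
    try:
        os.remove(path)
    except FileNotFoundError:
        pass


def pull_feeds_once(pull):
    """Call pull() only if no other run is in progress; report whether it ran."""
    if not try_acquire():
        return False
    try:
        pull()
        return True
    finally:
        release()
```

Without such a guard, two visitors arriving at the same moment can both start a pull, and each one adds the same post before the other has saved it. A real cron job that runs at fixed intervals avoids the overlap as long as one run finishes before the next one starts.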
Yes, it does. Here are some more examples:
Examples
The following line specifies that the Apache error log is to be cleared at one minute past midnight (00:01) every day, assuming that the default shell for the cron user is Bourne shell compliant:
1 0 * * * printf "" > /www/apache/logs/error_log
The following line causes the user program test.pl (possibly a Perl script) to be run every two hours, namely at midnight, 2am, 4am, 6am, 8am, and so on:
0 */2 * * * /home/username/test.pl
Predefined scheduling definitions
There are several special predefined values which can be used to substitute the CRON expression.
Entry | Description | Equivalent To |
---|---|---|
@yearly | Run once a year, midnight, Jan. 1st | 0 0 1 1 * |
@monthly | Run once a month, midnight, first of month | 0 0 1 * * |
@weekly | Run once a week, midnight on Sunday | 0 0 * * 0 |
@daily | Run once a day, midnight | 0 0 * * * |
@hourly | Run once an hour, beginning of hour | 0 * * * * |
@reboot | Run at startup | |
* * * * *  command to be executed
┬ ┬ ┬ ┬ ┬
│ │ │ │ │
│ │ │ │ │
│ │ │ │ └───── day of week (0 - 6) (0 is Sunday, or use names)
│ │ │ └────────── month (1 - 12)
│ │ └─────────────── day of month (1 - 31)
│ └──────────────────── hour (0 - 23)
└───────────────────────── min (0 - 59)
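To check what a given schedule means, the five time fields can be expanded into the actual values they match. The following Python sketch is only an illustration and not a full cron parser: it handles `*`, single numbers, ranges, comma lists and `/step`, and it does not support names such as `sun` or the `@hourly` style shortcuts. The function names `expand_field` and `describe` are made up for this example.

```python
def expand_field(field, low, high):
    """Expand one cron field (e.g. '*/2', '1-5', '0,30') into a sorted list."""
    values = set()
    for part in field.split(","):
        step = 1
        if "/" in part:
            part, step_text = part.split("/")
            step = int(step_text)
        if part == "*":
            start, end = low, high
        elif "-" in part:
            start_text, end_text = part.split("-")
            start, end = int(start_text), int(end_text)
        else:
            start = end = int(part)
        values.update(range(start, end + 1, step))
    return sorted(values)


def describe(expression):
    """Take the five time fields of a crontab line and expand each one."""
    minute, hour, day_of_month, month, day_of_week = expression.split()[:5]
    return {
        "minute": expand_field(minute, 0, 59),
        "hour": expand_field(hour, 0, 23),
        "day_of_month": expand_field(day_of_month, 1, 31),
        "month": expand_field(month, 1, 12),
        "day_of_week": expand_field(day_of_week, 0, 6),
    }


if __name__ == "__main__":
    print(describe("0 */2 * * * /home/username/test.pl")["hour"])
```

For the `0 */2 * * *` line, the hour field expands to the even hours from 0 to 22 and the minute field to 0 only, which matches "every two hours, on the hour".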
3:50 am
All right. Well, I just switched to cron and set it to run every 30 minutes (the feeds are checked every 60 minutes, but since it usually can't do all of them at once I set it to run twice an hour), and I'm still getting a lot of duplicates (10 - 20 per day).
I have tried all settings in the feeds: Guid + Title, Guid only and Title only. Nothing seems to fix it.
Right now I have them set to title only, since the duplicate checker finds them very accurately when I check for duplicates with the 'search by title' setting.
So, what am I missing here?
I am also using the 'schedule-function' to randomly publish the grabbed posts within 0 and 120 minutes from syndication time. Could that have something to do with it? Does CyberSEO only 'see' published posts and only check for duplicates among the posts that are published?
-----
I have 'Cloak GUIDs' enabled. Could that have something to do with it? What are the consequences, for example for PageRank and/or the position of syndicated posts, if the GUIDs are not cloaked?
First of all, you may use the built-in "Post date adjustment range" feature to randomly publish the grabbed posts within 0 and 120 minutes from syndication time.
CyberSEO checks each new post for duplicates against all posts, including the scheduled and deleted ones (those that were moved to the trash).
I can't say why you are still getting duplicates if, as you said above, you are using cron. Please email me the login/password to your blog's control panel and I'll take a look at your settings.
No, the duplicate checker sees unpublished posts without any problems. You can do a simple experiment. Pull a post from any feed, change its status to "Pending Review" or set its date to the future, then pull the feed again. You'll see that the post will not be syndicated again.
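The behaviour described here can be pictured with a small Python sketch. It is not the plugin's code, only a model of a check that ignores the post status. The `Post` class, the `is_duplicate` function and the mode names are assumptions made for the example. The combined "Guid and Title" mode is left out because the thread does not say how the two checks are combined.

```python
from dataclasses import dataclass


@dataclass
class Post:
    guid: str
    title: str
    status: str  # e.g. "publish", "future", "pending", "trash"


def is_duplicate(item_guid, item_title, existing, mode="guid"):
    """Return True if a feed item matches any stored post, whatever its status.

    mode is "guid" or "title" (hypothetical names for this sketch).
    """
    if mode not in ("guid", "title"):
        raise ValueError("mode must be 'guid' or 'title'")
    for post in existing:
        # The status field is deliberately never consulted here.
        if mode == "guid" and post.guid == item_guid:
            return True
        if mode == "title" and post.title == item_title:
            return True
    return False


if __name__ == "__main__":
    stored = [
        Post("http://example.com/?p=1", "First post", "future"),
        Post("http://example.com/?p=2", "Second post", "trash"),
    ]
    print(is_duplicate("http://example.com/?p=1", "Anything", stored, "guid"))
    print(is_duplicate("http://example.com/?p=9", "Second post!", stored, "title"))
```

In this model a scheduled or trashed post still blocks a second import, while an item whose title differs even slightly is treated as new when checking by title only.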
I've checked your blog for duplicated posts but don't see any. I mean there are almost similar posts but they have some differences (slightly different titles etc).
It seems you are pulling the feeds from other blogs very often. As a result, posts sometimes get pulled before they are rewritten, modified or replaced in the original feed that you are syndicating from. This doesn't look like a problem with the plugin.
4:50 am
All right. So I think I fixed it by doing the following:
1) First I did a run in PhpMyAdmin to remove all the "empty" files that were added over time by crappy, free plugins that I used to use to auto create featured images and auto schedule posts: DELETE FROM `wp_posts` WHERE `post_title` = ''
2) Then I set the duplicates check for every post to "Guid and Title"
3) Then I DISABLED the Fake GUID's option from the CyberSEO > Tools-screen.
It's been running for 3 hours now and did 6 crons and pulled multiple feeds. I've seen 6 posts being added and none of them were duplicates. So I'm pretty sure it's fixed now.
Some other good tips (from CyberSEO himself):
Lower the max. number of posts to add to 1 or 2 (I have it set at 2) and randomize the time windows in which all feeds should be pulled, in order to not have them pulled all together.
Thanks again for all your help, you've been a great help.
-----
Sidenote: why doesn't the Fake GUID option work here? Is that because of the feeds, or is it some bug? I'm running a similar site (same setup, server, plugins, everything, only different feeds) on the same server, and there I have it enabled without ever having duplicates. Your thoughts?
I don't think that "Fake GUIDs" can affect the post syndication routine. The "Fake GUIDs" option is run-time only and it affects your own feeds only. When it's enabled, all the post GUIDs in your feed get cloaked. When it's disabled, the actual post GUIDs are shown in the feed (this may reveal the original post source to search engines).
6:27 am
Okay, so with Fake GUIDs enabled and the duplicates check set to GUID only, my database was flooded with duplicates. So I guess the system has a hard time comparing the fake GUIDs to the actual GUIDs. Does it 'remember' the original GUID of each post? If not, then this result isn't surprising.
Could it be that my site has so many duplicates because it has a giant database (10.000+ posts)? In that case, wouldn't it be handy to have a setting to limit the search for duplicates to a period, e.g. max. 1 week ago, 1 month ago, etc.?
Right now, I have set duplicates check to 'title only' since that seemed to give the best results before. I will let you know how it goes...
No, the plugin does not compare any fake GUIDs and they are not stored anywhere. As I said above, it's just a run-time thing. When fake GUIDs are enabled and your RSS feed is rendered (e.g. when somebody opens yoursite.com/feed in the browser), the plugin replaces the actual syndicated post GUIDs with fake ones. It does not change anything in the database and it does not use these fake GUIDs anywhere else.
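As a rough picture of what "run-time only" means, here is a Python sketch in which the stored GUIDs are never modified and a cloaked value is computed only while the feed is being rendered. It is not the plugin's code. The `render_feed_guids` function, the MD5-based format of the fake GUID and the `yoursite.example` URL are all assumptions for the example.

```python
import hashlib


def render_feed_guids(stored_guids, cloak, site_url="https://yoursite.example"):
    """Return the GUIDs as they would appear in the rendered feed.

    stored_guids is only read, never changed, so nothing a duplicate
    check would look at is affected by the cloak flag.
    """
    rendered = []
    for guid in stored_guids:
        if cloak:
            # Hypothetical cloaking scheme: hide the source URL behind a hash.
            digest = hashlib.md5(guid.encode("utf-8")).hexdigest()
            rendered.append(f"{site_url}/?guid={digest}")
        else:
            rendered.append(guid)
    return rendered


if __name__ == "__main__":
    database = ["http://source-blog.example/?p=101"]
    print(render_feed_guids(database, cloak=True))
    print(database)  # still holds the original GUID
```

In a setup like this, turning the option on or off changes only what readers of your feed see, so it cannot by itself create or prevent duplicates during syndication.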