May 4, 2011

duplicates | CyberSEO Pro | Support Forum

duplicates

March 25, 2012, 10:36 pm
null (Member)

Hi,

I use CyberSEO. It's nice, but not much different from FeedWordPress. I bought CyberSEO because I hoped it wouldn't import as many duplicates (from one RSS feed it sometimes creates 3-4 identical posts).

I have the GUID + title checks enabled. Does anyone have a better or additional solution?

Thanks!

March 25, 2012, 10:57 pm
CyberSEO (Admin)

This is not a problem with CyberSEO but with WordPress itself. When the plugin syndicates a feed in auto mode, it does so via the WordPress pseudo-cron, which runs when somebody opens your blog in a browser. The problem arises when several visitors open your blog at the same time. Unfortunately, the WordPress pseudo-cron is not protected against simultaneous executions, so one instance doesn't know that another one is already syndicating the same post it is about to syndicate itself.
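The race described above can be sketched in a few lines of Python. This is an illustration under stated assumptions, not CyberSEO or WordPress code: the function names are hypothetical and a plain list stands in for the posts table. A `threading.Barrier` forces two simultaneous runs to finish their duplicate check before either one inserts, which reproduces the duplicate deterministically; doing the check and the insert under one lock prevents it.

```python
import threading

ITEM = "feed-item-42"  # hypothetical GUID of one feed item


def syndicate_unsafe(store, guid, barrier):
    """Check-then-insert with no locking, like two overlapping pseudo-cron runs."""
    already_there = guid in store
    # Make both runs finish their check before either inserts (worst-case timing).
    barrier.wait()
    if not already_there:
        store.append(guid)


def syndicate_locked(store, guid, lock):
    """The same logic, but the check and the insert happen as one atomic step."""
    with lock:
        if guid not in store:
            store.append(guid)


def run_twice(worker, sync_object):
    """Start two concurrent 'runs' of worker on the same store and return it."""
    store = []
    threads = [
        threading.Thread(target=worker, args=(store, ITEM, sync_object))
        for _ in range(2)
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return store


print(run_twice(syndicate_unsafe, threading.Barrier(2)))  # two copies of the item
print(run_twice(syndicate_locked, threading.Lock()))      # one copy of the item
```

Moving the pull to a single scheduled job, as suggested below, removes the many page-view triggers that create these overlapping runs.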

However, there is an easy way to avoid this problem: switch the RSS pull mode to "by cron job or manually" and set up a real cron job on your host to make CyberSEO pull the feeds, say, once an hour.

March 25, 2012, 11:09 pm
null (Member)

Yes, I sometimes have 100-150 visitors at the same time.

I used DirectAdmin to set up the cron job. Does this pull the feeds every hour?

Minute  Hour  Day of Month  Month  Day of Week  Command
0       *     *             *      *            /usr/bin/curl --silent [link hidden]..7824678xds

March 25, 2012, 11:58 pm
CyberSEO (Admin)

Yes, it does. More examples from [link hidden]:

Examples

The following line specifies that the Apache error log is to be cleared at one minute past midnight (00:01) of every day of the month, of every day of the week, assuming that the default shell for the cron user is Bourne shell compliant:

1 0 * * *  printf > /www/apache/logs/error_log

The following line causes the user program [code hidden], possibly a Perl script, to be run every two hours, namely at midnight, 2am, 4am, 6am, 8am, and so on:

0 */2 * * *  /home/username/test.pl

Predefined scheduling definitions

There are several special predefined values which can be used to substitute the CRON expression.

Entry     Description                                  Equivalent To
[hidden]  Run once a year, midnight, Jan. 1st          [hidden]
[hidden]  Run once a month, midnight, first of month   [hidden]
[hidden]  Run once a week, midnight on Sunday          [hidden]
[hidden]  Run once a day, midnight                     [hidden]
[hidden]  Run once an hour, beginning of hour          [hidden]
[hidden]  Run at startup


*    *    *    *    *  command to be executed
┬    ┬    ┬    ┬    ┬
│    │    │    │    │
│    │    │    │    │
│    │    │    │    └───── day of week (0 - 6) (0 is Sunday, or use names)
│    │    │    └────────── month (1 - 12)
│    │    └─────────────── day of month (1 - 31)
│    └──────────────────── hour (0 - 23)
└───────────────────────── min (0 - 59)
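
The five fields can be made concrete with a minimal Python sketch that checks whether a given minute matches a cron expression. It is an illustrative helper, not part of cron, DirectAdmin or CyberSEO, and it deliberately handles only `*`, `*/n`, plain numbers and comma lists; ranges, month and day names, and cron's special rule for when both day fields are restricted are left out.

```python
from datetime import datetime

# Lowest allowed value for each field, in crontab order:
# minute, hour, day of month, month, day of week
FIELD_MINIMUMS = (0, 0, 1, 1, 0)


def field_matches(field: str, value: int, minimum: int) -> bool:
    """Match one field: '*', '*/n', a number, or a comma list of those."""
    for part in field.split(","):
        if part == "*":
            return True
        if part.startswith("*/"):
            # Steps count from the field's lowest value (e.g. day of month starts at 1).
            if (value - minimum) % int(part[2:]) == 0:
                return True
        elif int(part) == value:
            return True
    return False


def cron_matches(expr: str, when: datetime) -> bool:
    """Return True if `when` (to the minute) matches a 5-field cron expression."""
    fields = expr.split()
    if len(fields) != 5:
        raise ValueError("expected 5 fields: min hour day-of-month month day-of-week")
    # Python's weekday() has Monday == 0; cron's day of week has Sunday == 0.
    cron_weekday = (when.weekday() + 1) % 7
    values = (when.minute, when.hour, when.day, when.month, cron_weekday)
    return all(
        field_matches(f, v, lo) for f, v, lo in zip(fields, values, FIELD_MINIMUMS)
    )


# The hourly job from this thread fires at minute 0 of every hour.
print(cron_matches("0 * * * *", datetime(2012, 3, 25, 23, 0)))   # True
print(cron_matches("0 * * * *", datetime(2012, 3, 25, 23, 30)))  # False
```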

March 26, 2012, 12:25 am
null (Member)

I've set up the cron job; I'll let you know how it goes.

Thanks for your super fast responses!

March 26, 2012, 8:00 pm
null (Member)

Tested for one day: zero duplicates. I usually had about 10-20 duplicates every day, so it seems cron is the way to go if you have a moderately busy site (15,000-20,000 visitors per day).

Thanks again!

March 26, 2012, 8:20 pm
CyberSEO (Admin)

You're welcome. With cron, duplicates are absolutely ruled out.

August 12, 2012, 3:50 am
Downloadz Portal (Guest)

All right. I just switched to cron and set it to run every 30 minutes (the feeds are checked every 60 minutes, but since the plugin usually can't process all of them in one run, I set it to run twice an hour), and I'm getting a lot of duplicates (10-20 per day).

I have tried every duplicate-check setting in the feeds: GUID + title, GUID only and title only. Nothing seems to fix it.

Right now I have them set to title only, since the duplicate checker finds the duplicates very accurately when I search with the 'search by title' setting.

So, what am I missing here?

I am also using the 'schedule' function to randomly publish the grabbed posts within 0 to 120 minutes of syndication time. Could that have something to do with it? Does CyberSEO only 'see' published posts and only check for duplicates among them?

-----

I have 'Cloak GUIDs' enabled. Could that have something to do with it? What would the consequences be, for example for PageRank and/or the ranking of syndicated posts, if the GUIDs were not cloaked?

August 15, 2012, 6:21 am
Downloadz Portal (Guest)

I have tried different settings since the last post and nothing seems to work.

Could you please help me out here?

August 16, 2012, 12:21 am
CyberSEO (Admin)

First of all, you may use the built-in "Post date adjustment range" feature to randomly publish the grabbed posts within 0 and 120 minutes from syndication time.

CyberSEO checks each new post for duplicates against all posts, including scheduled and deleted ones (those moved to the trash).
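The behaviour described above, where every stored post counts regardless of its status, can be sketched in Python. The `Post` type, the status strings and the `check_guid`/`check_title` switches are hypothetical stand-ins for CyberSEO's GUID and title options, and treating a hit on either enabled field as a duplicate is an assumption, not confirmed plugin behaviour.

```python
from dataclasses import dataclass


@dataclass
class Post:
    guid: str
    title: str
    status: str  # e.g. "publish", "future", "pending", "trash"


def is_duplicate(existing, guid, title, check_guid=True, check_title=True):
    """Return True if an incoming feed item matches any stored post.

    Posts of every status are compared, so scheduled, pending and trashed
    posts block re-syndication just like published ones. A match on any
    enabled field counts as a duplicate (an assumption for this sketch).
    """
    for post in existing:
        if check_guid and post.guid == guid:
            return True
        if check_title and post.title == title:
            return True
    return False


stored = [
    Post("http://example.com/?p=1", "Release 1.0 is out", "future"),
    Post("http://example.com/?p=2", "Weekly roundup", "trash"),
]
print(is_duplicate(stored, "http://example.com/?p=1", "Anything"))        # True: scheduled post
print(is_duplicate(stored, "http://example.com/?p=9", "Weekly roundup"))  # True: trashed post
```

Because the comparison is exact, an item whose title was edited slightly in the source feed is not matched by title, even though it looks like the same post to a human reader.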

I can't say why you are still getting duplicates if, as you said above, you are using cron. Please email me the login and password for your blog's control panel and I'll take a look at your settings.

August 16, 2012, 3:04 am
Downloadz Portal (Guest)

I have emailed you the login details for my control panel and created an admin account for you in WordPress; you should have received its login details in your inbox.

Hope you can figure it out and thanks for all your help!

August 18, 2012, 2:07 am
Downloadz Portal (Guest)

Have you had a chance to look into it yet?

I already made an account for you.

August 18, 2012, 4:53 am
CyberSEO (Admin)

Yes, I did, and I sent you an email asking which feed exactly produces the duplicates on your blog (you have many feeds).

August 19, 2012, 10:32 am
Downloadz Portal (Guest)

The crazy thing is that the post slugs are exactly the same (only the post IDs differ), and still the plugin doesn't recognize them as duplicates.

By the way, if the posts aren't published, the duplicate checker doesn't 'see' them.

August 19, 2012, 9:19 pm
CyberSEO (Admin)

No, the duplicate checker sees unpublished posts without any problems. You can do a simple experiment: pull a post from any feed, change its status to "Pending Review" or set its date in the future, then pull the feed again. You'll see that the post is not syndicated again.

I've checked your blog for duplicate posts but don't see any. There are near-identical posts, but they differ in small ways (slightly different titles, etc.).

It seems you are pulling the feeds from other blogs very often, so posts sometimes get pulled before they have been rewritten, modified or replaced in the original feed you are syndicating from. This doesn't look like a problem with the plugin.

August 21, 2012, 4:50 am
Downloadz Portal (Guest)

All right. I think I fixed it by doing the following:

1) First I ran a query in phpMyAdmin to remove all the "empty" entries that had been added over time by crappy free plugins I used to use to auto-create featured images and auto-schedule posts: DELETE FROM `wp_posts` WHERE `post_title` = '';

2) Then I set the duplicate check for every feed to "GUID and title".

3) Then I DISABLED the Fake GUIDs option on the CyberSEO > Tools screen.

It's been running for 3 hours now, with 6 cron runs pulling multiple feeds. I've seen 6 posts being added and none of them were duplicates, so I'm pretty sure it's fixed now.

Some other good tips (from CyberSEO himself):

Lower the maximum number of posts added per pull to 1 or 2 (I have it set to 2) and randomize the time windows in which the feeds are pulled, so that they are not all pulled at once.
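The two tips can be pictured with a small Python sketch: each feed gets its own random start minute inside the hour, and each pull keeps at most a fixed number of new items. The function and parameter names are hypothetical; in CyberSEO itself these are plugin settings, and this is not its API.

```python
import random


def plan_pulls(feeds, window_minutes=60, seed=None):
    """Assign every feed its own random start minute within the window."""
    rng = random.Random(seed)
    return {feed: rng.randrange(window_minutes) for feed in feeds}


def take_new_items(items, max_posts=2):
    """Keep at most `max_posts` items (in feed order) from one pull."""
    return items[:max_posts]


schedule = plan_pulls(["feed-a", "feed-b", "feed-c"], seed=1)
print(schedule)  # start minute per feed; values depend on the seed
print(take_new_items(["item 1", "item 2", "item 3"]))  # ['item 1', 'item 2']
```

Spreading start times means fewer feeds are processed in the same run, and a small per-pull cap keeps each run short.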

Thanks again for all your help, you've been a great help.

-----

Side note: why doesn't the Fake GUIDs option work here? Is that because of the feeds, or is it a bug? I'm running a similar site (same setup, server and plugins, everything except the feeds) on the same server, and there I have it enabled without ever getting duplicates. Your thoughts?

August 21, 2012, 11:09 pm
CyberSEO (Admin)

I don't think the "Fake GUIDs" option can affect the post syndication routine. It is a run-time-only option and it affects only your own feeds. When it's enabled, all post GUIDs in your feed get cloaked. When it's disabled, the actual post GUIDs are shown in the feed (this may reveal the original post source to search engines).

August 21, 2012, 11:20 pm
Downloadz Portal (Guest)

Well, we don't want that. I've turned it back on and will see how it goes.

I checked for duplicates this morning and the checker found three, so I guess my fix didn't work entirely.

August 22, 2012, 6:27 am
Downloadz Portal (Guest)

OK, so with Fake GUIDs enabled and the duplicate check set to GUID only, my database was flooded with duplicates. I guess the system has a hard time comparing the fake GUIDs to the actual GUIDs. Does it 'remember' the original GUID of each post? If not, this result isn't surprising.

Could my site be getting so many duplicates because it has a giant database (10,000+ posts)? If so, wouldn't it be handy to have a setting that limits the duplicate search to a recent period, e.g. the last week or the last month?
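The time-limited search suggested above could look roughly like this Python sketch. It is hypothetical, not an existing CyberSEO option; posts are plain dicts with assumed `title` and `date` keys, and only exact title matches are checked.

```python
from datetime import datetime, timedelta


def is_recent_duplicate(posts, title, now, days=7):
    """Return True if a post with this exact title was stored within `days` of `now`."""
    cutoff = now - timedelta(days=days)
    return any(p["title"] == title and p["date"] >= cutoff for p in posts)


now = datetime(2012, 8, 22, 6, 0)
posts = [
    {"title": "Fresh item", "date": datetime(2012, 8, 20)},
    {"title": "Old item", "date": datetime(2012, 6, 1)},
]
print(is_recent_duplicate(posts, "Fresh item", now))  # True
print(is_recent_duplicate(posts, "Old item", now))    # False: outside the 7-day window
```

The trade-off is that an item older than the window which reappears in a source feed would no longer be caught, so the window has to cover how long items stay in the feeds being pulled.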

Right now I have set the duplicate check to 'title only', since that seemed to give the best results before. I will let you know how it goes...

August 22, 2012, 7:59 am
CyberSEO (Admin)

No, the plugin does not compare any fake GUIDs, and they are not stored anywhere. As I said above, it's just a run-time thing. When fake GUIDs are enabled and your RSS feed gets rendered (e.g. when somebody opens yoursite.com/feed in a browser), the plugin replaces the actual GUIDs of the syndicated posts with fake ones. It does not change anything in the database and does not use these fake GUIDs anywhere else.
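The run-time behaviour described here can be sketched in Python: the outgoing feed gets substituted GUIDs while the stored posts are left untouched, so a GUID-based duplicate check only ever sees the real values. The fake-GUID format (`site_url/?p=ID`) and all names are hypothetical; the thread does not say how CyberSEO builds its fake GUIDs.

```python
def feed_guids(posts, site_url, fake_guids=True):
    """Return the GUIDs to print in the outgoing RSS feed.

    Only the rendered output changes; the stored post dicts are never
    modified, mirroring the 'run-time only' behaviour described above.
    """
    if not fake_guids:
        return [post["guid"] for post in posts]
    # Hypothetical fake format: a local URL built from the post ID.
    return [f"{site_url}/?p={post['id']}" for post in posts]


stored = [{"id": 101, "guid": "http://source-blog.example/original-post"}]
print(feed_guids(stored, "http://yoursite.com"))  # ['http://yoursite.com/?p=101']
print(stored[0]["guid"])  # still the original GUID
```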

Forum Timezone: Europe/Amsterdam
