Yahoo and the great SPAM ripoff

In my private life I run a mail server, have done so for almost 20 years.  Recently (as in the last 12 months) we moved to a VPS because I no longer had (or wanted) the infrastructure to run a server in-house.  Since then I've been fighting a losing battle to get legitimate email to users on Yahoo particularly and some other ISPs.  Why? Well Yahoo won't tell me. Apparently they don't really want to fix the SPAM issue, they just want me to go away.  I have yet to get one piece of information that I could actually use to figure out why I'm being blocked.  Yet the biggest spammers I see hitting my SPAM mailbox are from Yahoo.  So while they don't mind their own users sending SPAM (and OK, some of it is just masquerading as coming from Yahoo, but still...) they will block legitimate mail to their users with apparently no real recourse apart from telling you to check everything you've already checked and fix the problems.

Here's a hint Yahoo.  If you don't tell me what the problems are, I can't fix them.  A basic tenet of support. If you keep me in the dark then I will have to start banning users with yahoo.com* addresses as it is just too hard to deal with you.  I'll have to tell them all to get a real mail account, like Gmail, so they don't have to bitch about not getting important emails. Perhaps this is why you are losing users?

While I was researching the possible reasons for by blocking I found that there are some really dodgy practices going on out there.  I checked my Reputation Score with a number of sources, and found that it was pretty good, except for one, Cisco's SenderBase.  And I can't send email to them about the problem from the problematic domain. Mind you one had information on email that went back 4 years and showed no recent activity - and I didn't own that IP address 4 years ago.   There is one, Barracuda, that suggests you sign up at EmailReg.org for $20 per year to bypass their reputation filter - um, that is uncomfortably close to blackmail for my liking.  Then there is the abuse.net system.  It appears that many reputation scores will reduce your reputation if you haven't taken the trouble to register a contact address for your domains at abuse.net.  WTF? The internet standard is that there is always a postmaster@ address that must go to a real person.  In recent times it has also become common practice to have an abuse@ address.  Since when does it become mandatory to register this somewhere?  How does that in any way contribute to the removal of SPAM?

Abuse.net also has some quirks.  With the rise of SPAM many people who set up a new domain take the precaution of using a hidden registration option offered by many domain registrars, to avoid having your contact details harvestable by spammers.  And guess what, you can't register these at abuse.net because they believe that you are likely to be a spammer if you want anonymity. Catch 22.

So the upshot appears to be don't send email - use a carrier pigeon instead.

See you at LCA2012

Linux.conf.au this year kicks off next week in Ballarat, just down the road from me.  I'll be there and even have a speaking gig, not in the main conference but in the HA and Distributed Storage mini-conf before the main event.

I'll be talking about MySQL®, the companies that now exist to support it, and the third party products that are starting to proliferate in what appears to be a community effort to address perceived shortcomings in the Oracle offerings.  Many of these offerings are in the HA space, and there have been some pretty amazing developments recently.

So if you want to find some history of Oracle's effect on MySQL's development (and a hint, it started well before the Sun acquisition), or you want to find out what is out there to assist you in developing HA and Distributed database systems, come along.  I'll be wearing my SkySQL shirt so I should be easy to pick.

MySQL 101 - Replication

So far we've looked at many aspects of MySQL, not in any great depth, but hopefully with enough information to get you started and whet your appetite for more.  Now we start to look into areas that aren't in the basic tutorials.

Replication is the technology that allows data to be stored on multiple servers. Typically this is used in "scale out" applications.  "Scale out" is used in contrast to "Scale up" where to scale a solution you buy a bigger box to run it on, where "scale out" means you buy more boxes.  Each has its benefits and drawbacks, with the usual benefit of scale out being that you get more bang for your buck.

The way replication works in MySQL is pretty simple.  One server is identified as the master, and writes every transaction to a file, the binary log.  Other servers (and there may be many) act as slaves and request information from the master. The slave keeps track of where it got up to and asks the master for the next transaction in the file.  In general the master doesn't know or care where the slave is up to, it just sends out the requested transaction to whoever has the right credentials to ask for it.

On the slave there are two threads running, one that requests events from the master binlog (binary log) and writes them to the relay log (which in reality is just another binlog), and the second thread that reads the relay log and executes the queries. In order to avoid non-deterministic outcomes the SQL thread is just that, a single thread.  Long running events can result in the replication lagging well behind the state of the master.

This is the traditional asynchronous replication.  Newer versions of MySQL now support semi-synchronous replication as well.  In this mode the master will not return control back to the querying process until at least one slave reports that it has received the transaction events and written them to its own relay log.  Note that this is written to the relay log but not necessarily written to the database.

You may wonder under what circumstance replication would be of use.  If your application is mainly read-intensive (as many web applications are) then replication gives you the ability to spread reads across multiple slaves to reduce contention.  If your application is mainly write intensive, then replication to multiple slaves will not, of itself, give you any relief.  Indeed, because the replication is single-threaded and all writes on the master must be written to the slaves, you can in fact hurt performance with a replicated setup.

What about more than one master?  This is possible but requires careful thought and planning.  MySQL replication keeps track of which server initiated the transaction, so "circular replication" where a slave acts also as a master should not cause problems, however there are some serious issues to consider.  Take for instance a customer table with the customer ID being auto incremented.  What happens if there are writes to both masters in a master<->master replication setup?  Both will grab the next ID in sequence, then the other will pull that transaction and try and will end up with a duplicate index error, causing replication to fail.  You can get around this by setting the initial auto_increment value on each server and setting the auto_increment_increment value to the number of masters in use.  Remebering however that both masters now have to write all traffic that goes to the pair, there is not a great deal of benefit in such an arrangement.  Other technologies, such as sharding, turn out to offer better scale out opportunities for multi-master deployments.

There are, however, some tools to make multi-master work more easily.  MMM (Multi-Master Replication Manager for MySQL) is one.  MHA (Master High Availability for MySQL) is somewhat different, allowing easy management of master failover.

That is probably enough theory, lets look at how to build a replication system.

For the master, the only requirement is that it writes its transactions to a binary log, and that it has a server ID set.  To do this we need to add the following to the my.cnf file:

server-id = 1
log_bin = /var/log/mysql/mysql-bin.log

The server-id is an integer that must be unique for all servers in the same replication set.  This is because of the tracking of transactions to avoid circular replication errors.  The log_bin is the path to the binary log file to use.  Each file will be appended with a sequence number to allow the server to rotate log files when they grow too large, or a flush request is received.

One other value that is useful is to set an expiry on the binlogs so that they don't take over all of your disk.  You need to ensure that you have enough binlog capacity for normal replication lag and for handling backup/recovery, but beyond that you can get MySQL to automatically delete the older binlogs.

expire_log_days = 14

Now the master will at least be writing transactions to the binary logs.  Now we have to set up to allow slaves to replicate.  First step is to create a user that the slaves will connect with.  This user must have REPLICATION SLAVE permissions to be able to read the binlog.  Note that this user is created on the master. As a privileged user, in the MySQL client:

CREATE USER 'replicator'@'192.168.%' IDENTIFIED BY 'mypass';
GRANT REPLICATION SLAVE ON *.* TO 'replicator'@'192.168.%';

The first command creates our user, the second grants the required permissions.  Note that I've used a wildcarded network address, you can also use a hostname wildcard.

For the slave we need a few things, first we need to set a server-id that is not the same as the master (or any other slave we are creating). Then we need a copy of the data on the master at the point at which the slave is to be started, and we need to then tell the slave about the master.

We can create the server-id in the same manner as we did for the master - but we don't need to specify a binlog (unless we are acting as a relay master).

server-id = 2

To get our first copy of the master data, the best idea is to use mysqldump.  Before we do, we need to stop the master from writing to the binary log and record its current position.  From the mysql command line:

FLUSH TABLES WITH READ LOCK;
SHOW MASTER STATUS;
+------------------+----------+--------------+------------------+
| File | Position | Binlog_Do_DB | Binlog_Ignore_DB |
+------------------+----------+--------------+------------------+
| mysql-bin.000092 | 106 | | |
+------------------+----------+--------------+------------------+
1 row in set (0.00 sec)

We need to keep this session open so that the lock remains in place, and from another terminal run a mysqldump.

mysqldump --all-databases > all_db.sql

Once the backup is completed we can either drop the lock session or issue:

UNLOCK TABLES;

Great, we now have a backup and we have the position in the binary log at the point at which the backup occurred.  These two together provide us with all we need to create our slave.

First step is to load the above SQL file into the slave.   On the slave you can use the mysql command line client with the SQL file copied from the master.

mysql < all_db.sql

On the slave server we now need to issue a CHANGE MASTER command to tell it where the master is, and where to start replicating from.  This information will be written to the master.info file that is read at startup and updated by the replication process so that the slave will be able to resync with the master on restart.

CHANGE MASTER TO MASTER_HOST = 'master'
MASTER_USER = 'replicator'
MASTER_PASSWORD = 'mypass'
MASTER_LOG_FILE = 'mysql-bin.000092'
MASTER_LOG_POS = 106;

The MASTER_HOST, MASTER_USER and MASTER_PASSWORD must match the master and the user created above with replication slave permission.  The MASTER_LOG_FILE and MASTER_LOG_POS come from the SHOW MASTER STATUS used when we did the backup of the master.

Now all that is left to do is to start the slave processes:

START SLAVE;

We can use the same backup and the same processes above to create as many slaves as we need.

If you have to set up a complex replication scenario that is not covered by the above, you may find the SkySQL Reference Architecture provisioning system useful.  This service creates master/slave high availability configurations and packages software and configurations together so that you can easily install a system from scratch that meets all your replication needs.

Disclaimer

I am employed by SkySQL Ab, which provides support, training and services for MySQL, MariaDB and Drizzle and is staffed by many ex-MySQL AB employees. While I try to be unbiased in all of my public opinions, and my opinions are all my own and not those of my employer, I work for SkySQL because I believe in what they are doing and therefore cannot be truly unbiased.

MySQL is a registered trademark of Oracle Corporation,  MariaDB is a trademark of MontyProgram, and Drizzle is a trademark.

From MySQL to SkySQL - Reflections

This month marked SkySQL's first anniversary.  It also marks my 6 months with the company, after being with MySQL AB (and then Sun and finally Oracle) for over 5 years. As good a time as any for a look back and some musings of the future.

In late 2005 I took a position in the web team (a part of Marketing) at MySQL AB.  It was a great company to work for, with a great team of people and a truly great spirit.  The idea that they were making a difference every day was palpable. I then saw it taken over by Sun Microsystems and shortly after by Oracle Corporation.  We went from 500 employees to 30,000 and then to 100,000.  My job satisfaction started to decline.  I felt more like I was battling for every concession rather than revelling in what we were able to achieve.  I needed a change.

When I joined SkySQL it was like going home - I knew pretty much everyone at the company and that great spirit of family, that sense of being able to achieve great things, the coming together of like minds with a shared vision was what drew me there and continues to inspire me.  It is, after all, about the people.  The team that created the world class support that MySQL AB was renowned for, the team that developed the training courses that brought MySQL knowledge to thousands, the management that inspired me in the early days at MySQL AB were all here, doing what they do best.

One of the prime reasons I moved from Oracle was that I saw from the inside the change in focus on MySQL. I saw that the team that manages the mysql.com domains cut from 6 down to 2. I saw the desperate backroom struggles to ensure the MySQL community sites were not taken down. I saw some truly stupid management decisions based solely on personal animosity rather than sound business sense. I saw the sales team starved of leads because of that stupidity. I saw the inter-departmental rivalry that instead of being a force for excellence was a block to cooperation and a stifling of innovation. I saw great talent leaving the company in an air of pessimism.

What I see in SkySQL is an optimism and a sense of shared goal, a vitality and a willingness to go beyond the normal to achieve excellence. I see what we used to call the "can-do" spirit.  That ethos is what keeps me showing up at work each day, trying to do my part in the bigger picture and making SkySQL the first place you think of when you think open databases.

There are some exciting things happening at SkySQL and the future is looking very bright.  I feel honoured to be a part of it all.

MySQL 101 - More Transactions

In our last episode we looked at transactions and how to create them.  In this episode I'll look at some of the implications of transactions, especially in a web application.

Transactions and Replication

We will discuss replication in depth later in the series, however it is sufficient for the moment to say that replication allows you to copy data in near real-time between MySQL servers and keep them synchronised.  What gets transferred are the changes that are made to your tables and data. So what about rolled-back (aborted) transactions?  Since the state after the rollback is essentially the same as the state before the transaction started, there seems little point in replicating those statements.  Indeed they are not replicated. Only completed transactions are.

I briefly mentioned that there is an autocommit setting that means that without using a transaction block, every query becomes its own transaction.  In the light of replication this makes a lot of sense.

Transaction Duration

Transactions may need to lock resources so that concurrent changes don't clobber each other, while allowing other queries access to the data that may be subject to a transaction, in a consistent manner. How this is done depends on the transaction isolation level, but in all levels you can be sure that inserting into a primary key will lock that index value out from other transactions doing the same thing.  In general this means that the second transaction will block until the first transaction is either committed or rolled back.  For web, having a locked resource hanging around for any length of time is a real killer.  This can be avoided by keeping transactions as short as possible, and to be careful of using range-based operations.

Let us say that we have two people trying to buy the same book.  As you saw from the previous episode the transaction we've defined does a number of operations, all of which need to complete before the book can be bought.  They are:

  • Create a purchase order record
  • Create purchase order item records for the book(s) purchased
  • Update the summary field on the purchase order record
  • Decrement the stock count for the book(s) purchased

If you look at those steps, you could see how a web application designer might get themselves into trouble.  If you created a purchase order and allow multiple books to be bought, then you may be tempted to keep the transaction open while the user browses and gets all their books purchased.  This would be a complete disaster.  It would mean there was an open-ended time the transaction was held, with locks on things like the book record (for decrementing the count) and therefore the entire system would almost certainly become deadlocked at some point.  This is where you need to be very aware of what the logic of the application is and how to bound the transactions.

In the above case we could handle this in a number of ways.  The easiest would be to create the purchase order record as a separate transaction, then for each book purchased, do the remaining three steps in another transaction.  Another way would be to separate the actual purchase order creation from the selection process.  You'll have seen this in process in a number of sites - where you have a Shopping Cart that you add books to and it isn't until checkout that it all ends up as a single purchase order.

No matter which way you do it - remember that you should never create a transaction that is unbounded in time. And on the web that means never creating a transaction that needs user interaction within the transaction.

Short Circuited Transactions

There are some operations that, if placed within a transaction, will perform an implicit commit on the statements that have preceeded it, effectively short-circuiting the transaction and creating a multiple transaction set.  Things like DDL (Data Definition Language) statements that modify the table structure or add indexes will be run in their own transaction, causing an implicit commit of any statements preceeding them.  DDL statements, in general, cannot be rolled back as they run in their own transaction.

More information on the caveats with transactions can be found in the online manual reference.

Disclaimer

I am employed by SkySQL Ab, which provides support, training and services for MySQL, MariaDB and Drizzle and is staffed by many ex-MySQL AB employees. While I try to be unbiased in all of my public opinions, and my opinions are all my own and not those of my employer, I work for SkySQL because I believe in what they are doing and therefore cannot be truly unbiased.

MySQL is a registered trademark of Oracle Corporation,  MariaDB is a trademark of MontyProgram, and Drizzle is a trademark.

:: Next >>