Welcome!


Newsfeeds
Planet MySQL
Planet MySQL - https://planet.mysql.com

  • Using a trigger to parse non-conforming data in MySQL
    In another post, I explained how to use the MySQL Query Re-write Plugin to manipulate data that didn’t exactly match SQL standards of MySQL. In this post, I am going to give you another example on how to use a trigger to parse non-conforming data so the data can be written to a MySQL database. A customer came to me with a problem. They were using third-party software which produced multiple rows of comma-separated data (like a .csv file), but all of the data wasn’t on a single row. Instead, the data consisted of multiple rows of data, separated by commas and with line feeds after each row of data. Here is a shortened example of the data (the original data was 32-lines long): Header information which may, or may not, contain one or more commas Item Name,Type,Description,Value Potato Chips - Sealed Bag,,, Manifest Number,Number,MANIFEST_NUMBER,8480014 Manufacturer Code,Number,BRAND_CODE,80375993 Information 1,Number,INFO_ONE,15869563 Information 2,Number,INFO_TWO,6569569665 Information 3,Number,INFO_THREE,562 Planned Start Date,String,PROD_START_DATE,9/13/2018 Planned Start Time,String,PROD_START_TIME,081234 For some reason, the output couldn’t be separated, but the software could use an ODBC connection. They needed to be able to insert this data into a MySQL database, but they only needed certain values – they didn’t need a lot of the descriptive information. Of the example above, they only wanted to insert the information in yellow: “Header information which may, or may not, contain one or more commas.” Item Name Type Description Value Potato Chips – Sealed Bag Manifest Number Number MANIFEST_NUMBER 8480014 Manufacturer Code Number BRAND_CODE 80375993 Information 1 Number INFO_ONE 15869563 Information 2 Number INFO_TWO 6569569665 Information 3 Number INFO_THREE 562 Planned Start Date String PROD_START_DATE 9/13/2018 Planned Start Time String PROD_START_TIME 081234 At first, I suggested using a script (Python, Perl, PHP, etc.) to parse the data before inserting into the database. But, the software which produced the data had the ability to insert the data directly into a MySQL database – but all of the data could only be inserted into one field. They didn’t have an option to add an extra step outside of the database connection – and they were new to MySQL, so they needed some help. I decided to write a trigger for the table, in order to parse the data as it came into the database, but before it was inserted into a row. I also wanted to keep the original data in a column as well. The first step was to create a database and a table. I mapped the column names to the first value of each row, skipping the header information which wasn’t needed. 
The column names are in blue, and the data is in yellow: “Header information which may, or may not, contain one or more commas.” Item Name Type Description Value Potato Chips – Sealed Bag Manifest Number Number MANIFEST_NUMBER 8480014 Manufacturer Code Number BRAND_CODE 80375993 Information 1 Number INFO_ONE 15869563 Information 2 Number INFO_TWO 6569569665 Information 3 Number INFO_THREE 562 Planned Start Date String PROD_START_DATE 9/13/2018 Planned Start Time String PROD_START_TIME 081234 I can then match the columns I want to the data values: “Header information which may, or may not, contain one or more commas.” Item Name Potato Chips – Sealed Bag Manifest Number 8480014 Manufacturer Code 80375993 Information 1 15869563 Information 2 6569569665 Information 3 562 Planned Start Date 9/13/2018 Planned Start Time 081234 Now, I can create the database and the table. I substituted the spaces in the field names with underscores (_). I also added a primary key column (id_MANIFEST_DATA) and a proper DATETIME column, as I want to combine the two columns PROD_START_DATE and PROD_START_TIME into one column to match MySQL’s DATETIME format. This way, they can perform a search on this column later (if necessary). mysql> create database MANIFEST_DATA_DB; Query OK, 1 row affected (0.00 sec) mysql> use MANIFEST_DATA_DB; Database changed mysql> CREATE TABLE `MANIFEST_DATA_DB`.`MANIFEST_DATA_TABLE` ( -> `id_MANIFEST_DATA` INT NOT NULL AUTO_INCREMENT, -> `MANIFEST_DATA` VARCHAR(4096) NULL, -> `ITEM_NAME` VARCHAR(1024) NULL, -> `MANIFEST_NUMBER` INT NULL, -> `MANUFACTURER_CODE` VARCHAR(1024) NULL, -> `INFO_ONE` CHAR(32) NULL, -> `INFO_TWO` CHAR(32) NULL, -> `INFO_THREE` CHAR(32) NULL, -> `PROD_START_DATE` CHAR(10) NULL, -> `PROD_START_TIME` CHAR(6) NULL, -> `PROD_TIMESTAMP` DATETIME NULL, -> PRIMARY KEY (`id_MANIFEST_DATA`)) -> AUTO_INCREMENT = 1000000; Query OK, 0 rows affected (0.00 sec) The initial “non-conforming data” will be inserted into the MANIFEST_DATA field, so all I have to do is create a trigger to parse this field before the data is inserted. Even though the data is on separate lines, parsing this data will be relatively easy, since the data is comma-delimited. I can use the SUBSTRING_INDEX function to treat the data like an array. But, since the first line of the data may or may not contain a comma, instead of counting commas from the beginning of the data, I will start at the end. Also, in this example, they don’t need the first line of data, as it is header information. Let’s take a look at why I want to count backwards. Here are three rows of data – where the first column may or may not contain a comma – or it might contain two or more commas. I really only want to capture the last two columns of data. "Hello, it's me",12345,ABCDE "Hello it's me",67890,FGHIJ "Hello, yes, it's me",112233,A1B2C3 If I parse the data based upon commas and count from the beginning, I will get different results whenever the first line of data contains a comma. And I only want the data in green: 1 2 3 4 5 “Hello it’s me” 67890 FGHIJ “Hello it’s me” 12345 ABCDE “Hello yes it’s me” 112233 A1B2C3 But if I count backwards, I will get the same result set regardless of how many commas are in the first line of data: -5 -4 -3 -2 -1 “Hello it’s me” 67890 FGHIJ “Hello it’s me” 12345 ABCDE “Hello yes it’s me” 112233 A1B2C3 In the actual data I want to use, I don’t want to store the first row of data anyway – so it can be ignored.
If I did need the first line, I would have to search for the quotes and parse that column separately. Since the initial data will contain line feeds, I will want to replace the line feeds with a comma, so I can have all of the data on a single line and be able to use the SUBSTRING_INDEX function. Here is the original data again, but this time, I have added a column for the line feeds. Also, I am going to count backwards from the last data value. The numbers are in (bold): (-37) “Header information which may, or may not, contain one or more commas.” (-36) Item Name (-35) Type (-34) Description (-33) Value -line feed- (-32) Potato Chips – Sealed Bag (-31) (-30) (-29) -line feed- (-28) Manifest Number (-27) Number (-26) MANIFEST_NUMBER (-25) 8480014 -line feed- (-24) Manufacturer Code (-23) Number (-22) BRAND_CODE (-21) 80375993 -line feed- (-20) Information 1 (-19) Number (-18) INFO_ONE (-17) 15869563 -line feed- (-16) Information 2 (-15) Number (-14) INFO_TWO (-13) 6569569665 -line feed- (-12) Information 3 (-11) Number (-10) INFO_THREE (-9) 562 -line feed- (-8) Planned Start Date (-7) String (-6) PROD_START_DATE (-5) 9/13/2018 -line feed- (-4) Planned Start Time (-3) String (-2) PROD_START_TIME (-1) 081234 Of course, you don’t have to map out the fields like I did – you can just count backwards. The SUBSTRING_INDEX function works similarly to grabbing a value out of an array – except the count value is where you want to stop grabbing data. You specify the string you want to parse, the delimiter you want to use, and the count of the value from the string where you want to stop grabbing data. The format is: SUBSTRING_INDEX(string,delimiter,count) Note: When the count value is a positive number, the count value is the ending position within the string. It doesn’t select the element itself. Here is an example using the phrase “Hello, I must be going, see you later, bye.” which contains four values separated by commas. I want to capture the third value, “see you later”. 1 2 3 4 Hello I must be going See you later bye To select “see you later”, I choose the third column as the place where I will stop grabbing data – but, since this is the stopping point, I will get the rest of the phrase up until the third column. mysql> select SUBSTRING_INDEX("Hello, I must be going, see you later, bye.",",",3) as PARSE; +---------------------------------------+ | PARSE | +---------------------------------------+ | Hello, I must be going, see you later | +---------------------------------------+ 1 row in set (0.00 sec) I don’t want all of the phrase – I only want “see you later”. With a positive count value of three, I am getting the ending point – it stops at the third delimited value. If I use a negative count value, then I am getting the starting point of the string. If I count backwards, which would give me a negative count value (and I want to do this anyway because the first line of data in the main example may have multiple commas), I can use the value of negative two (-2) and get this: mysql> select SUBSTRING_INDEX("Hello, I must be going, see you later, bye.",",",-2) as PARSE; +----------------------+ | PARSE | +----------------------+ | see you later, bye. | +----------------------+ 1 row in set (0.00 sec) That gives me a starting point for grabbing “see you later”, but I don’t want any data (the single word “bye”) after this. So, I can wrap an additional SUBSTRING_INDEX around the first one, and then only choose the first delimited set of data from the inner SUBSTRING_INDEX.
It looks like this: The statement select SUBSTRING_INDEX("Hello, I must be going, see you later, bye.",",",-2) grabs both of these columns in blue: 1 2 3 4 Hello I must be going See you later bye But then I can use the second SUBSTRING_INDEX command to only select the first column of the remaining two columns: SUBSTRING_INDEX(SUBSTRING_INDEX("Hello, I must be going, see you later, bye.",",",-2), ",", 1) 1 2 See you later bye I can test it: mysql> select SUBSTRING_INDEX(SUBSTRING_INDEX("Hello, I must be going, see you later, bye.",",",-2), ",", 1) as PARSE; +----------------+ | PARSE | +----------------+ | see you later | +----------------+ 1 row in set (0.01 sec) This is essentially how I am going to parse the long line of data (after I convert the line feeds to commas). I will go backwards to the starting point, and then only select the first column beginning at that starting point. I can then use the table I created to map out the data values I want to extract. Column Name Backwards Value ITEM_NAME -32 MANIFEST_NUMBER -25 MANUFACTURER_CODE -21 INFO_ONE -17 INFO_TWO -13 INFO_THREE -9 PROD_START_DATE -5 PROD_START_TIME -1 Example: To extract the PROD_START_TIME, which is at position -1, I will use this in the trigger: SET NEW.PROD_START_TIME = SUBSTRING_INDEX(SUBSTRING_INDEX(_parse_MANIFEST_DATA, ',', -1), ",", 1); I can now create my trigger to parse all of the data being inserted into the MANIFEST_DATA column. In the trigger, I will use some temporary variables to parse the MANIFEST_DATA, and I will also use the SUBSTRING function to parse the PROD_START_DATE and PROD_START_TIME fields to rearrange them into the correct MySQL DATETIME format. I will use the CONCAT function to combine them into a new DATETIME field. I have added some notes in the trigger to help you figure out what I am doing.
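Before looking at the full trigger, here is a quick way to sanity-check the date and time conversion on its own. This is a minimal sketch using the sample values from the data above (it is not part of the original post):

mysql> SELECT STR_TO_DATE('9/13/2018', '%c/%d/%Y') AS prod_date,
    ->        CONCAT(STR_TO_DATE('9/13/2018', '%c/%d/%Y'), ' ',
    ->               SUBSTRING('081234', 1, 2), ':',
    ->               SUBSTRING('081234', 3, 2), ':',
    ->               SUBSTRING('081234', 5, 2)) AS prod_timestamp;

This should return 2018-09-13 for prod_date and 2018-09-13 08:12:34 for prod_timestamp – the same value that ends up in the PROD_TIMESTAMP column in the output further down.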
DELIMITER $$ CREATE TRIGGER _convert_MANIFEST_DATA BEFORE INSERT ON MANIFEST_DATA_DB.MANIFEST_DATA_TABLE FOR EACH ROW BEGIN # Declare temporary variables # This is the temporary field which contains all of the data DECLARE _parse_MANIFEST_DATA varchar(4096); # The temporary date and time fields needed to create the DATETIME field named PROD_TIMESTAMP DECLARE _parse_TIME_DATE_DATE char(10); DECLARE _parse_TIME_DATE_TIME char(10); DECLARE _parse_TIME_DATE_date_final char(10); DECLARE _parse_TIME_DATE_hour char(2); DECLARE _parse_TIME_DATE_minutes char(2); DECLARE _parse_TIME_DATE_sec char(2); # Take the incoming large dataset which has line feeds and # Replace the line feeds with a comma set _parse_MANIFEST_DATA = replace(NEW.MANIFEST_DATA, "\n", ","); # Parse each new column from the temporary field SET NEW.ITEM_NAME = SUBSTRING_INDEX(SUBSTRING_INDEX(_parse_MANIFEST_DATA, ',', -35), ",", 1); SET NEW.MANIFEST_NUMBER = SUBSTRING_INDEX(SUBSTRING_INDEX(_parse_MANIFEST_DATA, ',', -25), ",", 1); SET NEW.MANUFACTURER_CODE = SUBSTRING_INDEX(SUBSTRING_INDEX(_parse_MANIFEST_DATA, ',', -21), ",", 1); SET NEW.INFO_ONE = SUBSTRING_INDEX(SUBSTRING_INDEX(_parse_MANIFEST_DATA, ',', -17), ",", 1); SET NEW.INFO_TWO = SUBSTRING_INDEX(SUBSTRING_INDEX(_parse_MANIFEST_DATA, ',', -13), ",", 1); SET NEW.INFO_THREE = SUBSTRING_INDEX(SUBSTRING_INDEX(_parse_MANIFEST_DATA, ',', -9), ",", 1); SET NEW.PROD_START_DATE = SUBSTRING_INDEX(SUBSTRING_INDEX(_parse_MANIFEST_DATA, ',', -5), ",", 1); SET NEW.PROD_START_TIME = SUBSTRING_INDEX(SUBSTRING_INDEX(_parse_MANIFEST_DATA, ',', -1), ",", 1); # Get the values from these two fields in order to combine them into a DATETIME field SET _parse_TIME_DATE_DATE = SUBSTRING_INDEX(SUBSTRING_INDEX(_parse_MANIFEST_DATA, ',', -5), ",", 1); SET _parse_TIME_DATE_TIME = SUBSTRING_INDEX(SUBSTRING_INDEX(_parse_MANIFEST_DATA, ',', -1), ",", 1); # Convert the date from MM/DD/YYYY to YYYY-MM-DD SET _parse_TIME_DATE_date_final = STR_TO_DATE(_parse_TIME_DATE_DATE, '%c/%d/%Y'); # Parse the time so we can add colons between the hour, minutes and seconds SET _parse_TIME_DATE_hour = SUBSTRING(_parse_TIME_DATE_TIME, 1, 2); SET _parse_TIME_DATE_minutes = SUBSTRING(_parse_TIME_DATE_TIME, 3, 2); SET _parse_TIME_DATE_sec = SUBSTRING(_parse_TIME_DATE_TIME, 5, 2); # Combine the newly-formatted date with the newly-formatted time so it conforms to the DATETIME format SET NEW.PROD_TIMESTAMP = CONCAT(_parse_TIME_DATE_date_final, " ", _parse_TIME_DATE_hour, ":", _parse_TIME_DATE_minutes, ":", _parse_TIME_DATE_sec) ; END$$ DELIMITER ; Now I can insert the sample data into the database – where all of the data will go into the MANIFEST_DATA column, and the trigger will populate the rest of the columns by parsing the data. mysql> INSERT INTO MANIFEST_DATA_TABLE (MANIFEST_DATA) VALUES ('Header information which may, or may not, contain one or more commas '> Item Name,Type,Description,Value '> Potato Chips - Sealed Bag,,, '> Manifest Number,Number,MANIFEST_NUMBER,8480014 '> Manufacturer Code,Number,BRAND_CODE,80375993 '> Information 1,Number,INFO_ONE,15869563 '> Information 2,Number,INFO_TWO,6569569665 '> Information 3,Number,INFO_THREE,562 '> Planned Start Date,String,PROD_START_DATE,9/13/2018 '> Planned Start Time,String,PROD_START_TIME,081234'); Query OK, 1 row affected (0.07 sec) When I look at the contents of the table, I can see the trigger executed successfully. 
All of the “non-conforming” data is stored in the MANIFEST_DATA field, but the other fields were populated: mysql> select * from MANIFEST_DATA_TABLE\G *************************** 1. row *************************** id_MANIFEST_DATA: 1000000 MANIFEST_DATA: Header information which may, or may not, contain one or more commas Item Name,Type,Description,Value Potato Chips - Sealed Bag,,, Manifest Number,Number,MANIFEST_NUMBER,8480014 Manufacturer Code,Number,BRAND_CODE,80375993 Information 1,Number,INFO_ONE,15869563 Information 2,Number,INFO_TWO,6569569665 Information 3,Number,INFO_THREE,562 Planned Start Date,String,PROD_START_DATE,9/13/2018 Planned Start Time,String,PROD_START_TIME,081234 ITEM_NAME: Type MANIFEST_NUMBER: 8480014 MANUFACTURER_CODE: 80375993 INFO_ONE: 15869563 INFO_TWO: 6569569665 INFO_THREE: 562 PROD_START_DATE: 9/13/2018 PROD_START_TIME: 081234 PROD_TIMESTAMP: 2018-09-13 08:12:34 1 row in set (0.00 sec) Of course, this is only a short example of all of the parsing you can do with triggers in MySQL. And, if you want to look at your trigger, you can query the INFORMATION_SCHEMA.TRIGGERS table: mysql> SELECT * FROM INFORMATION_SCHEMA.TRIGGERS WHERE TRIGGER_NAME='_convert_MANIFEST_DATA'\G *************************** 1. row *************************** TRIGGER_CATALOG: def TRIGGER_SCHEMA: MANIFEST_DATA_DB TRIGGER_NAME: _convert_MANIFEST_DATA EVENT_MANIPULATION: INSERT EVENT_OBJECT_CATALOG: def EVENT_OBJECT_SCHEMA: MANIFEST_DATA_DB EVENT_OBJECT_TABLE: MANIFEST_DATA_TABLE ACTION_ORDER: 1 ACTION_CONDITION: NULL ACTION_STATEMENT: BEGIN # Declare temporary variables # This is the temporary field which contains all of the data DECLARE _parse_MANIFEST_DATA varchar(4096); # The temporary date and time fields needed to create the DATETIME field named PROD_TIMESTAMP DECLARE _parse_TIME_DATE_DATE char(10); DECLARE _parse_TIME_DATE_TIME char(10); DECLARE _parse_TIME_DATE_date_final char(10); DECLARE _parse_TIME_DATE_hour char(2); DECLARE _parse_TIME_DATE_minutes char(2); DECLARE _parse_TIME_DATE_sec char(2); # Take the incoming large dataset which has line feeds and # Replace the line feeds with a comma set _parse_MANIFEST_DATA = replace(NEW.MANIFEST_DATA, " ", ","); # Parse each new column from the temporary field SET NEW.ITEM_NAME = SUBSTRING_INDEX(SUBSTRING_INDEX(_parse_MANIFEST_DATA, ',', -35), ",", 1); SET NEW.MANIFEST_NUMBER = SUBSTRING_INDEX(SUBSTRING_INDEX(_parse_MANIFEST_DATA, ',', -25), ",", 1); SET NEW.MANUFACTURER_CODE = SUBSTRING_INDEX(SUBSTRING_INDEX(_parse_MANIFEST_DATA, ',', -21), ",", 1); SET NEW.INFO_ONE = SUBSTRING_INDEX(SUBSTRING_INDEX(_parse_MANIFEST_DATA, ',', -17), ",", 1); SET NEW.INFO_TWO = SUBSTRING_INDEX(SUBSTRING_INDEX(_parse_MANIFEST_DATA, ',', -13), ",", 1); SET NEW.INFO_THREE = SUBSTRING_INDEX(SUBSTRING_INDEX(_parse_MANIFEST_DATA, ',', -9), ",", 1); SET NEW.PROD_START_DATE = SUBSTRING_INDEX(SUBSTRING_INDEX(_parse_MANIFEST_DATA, ',', -5), ",", 1); SET NEW.PROD_START_TIME = SUBSTRING_INDEX(SUBSTRING_INDEX(_parse_MANIFEST_DATA, ',', -1), ",", 1); # Get the values from these two fields in order to combine them into a DATETIME field SET _parse_TIME_DATE_DATE = SUBSTRING_INDEX(SUBSTRING_INDEX(_parse_MANIFEST_DATA, ',', -5), ",", 1); SET _parse_TIME_DATE_TIME = SUBSTRING_INDEX(SUBSTRING_INDEX(_parse_MANIFEST_DATA, ',', -1), ",", 1); # Convert the date from MM/DD/YYYY to YYYY-MM-DD SET _parse_TIME_DATE_date_final = STR_TO_DATE(_parse_TIME_DATE_DATE, '%c/%d/%Y'); # Parse the time so we can add colons between the hour, minutes and seconds SET _parse_TIME_DATE_hour = 
SUBSTRING(_parse_TIME_DATE_TIME, 1, 2); SET _parse_TIME_DATE_minutes = SUBSTRING(_parse_TIME_DATE_TIME, 3, 2); SET _parse_TIME_DATE_sec = SUBSTRING(_parse_TIME_DATE_TIME, 5, 2); # Combine the newly-formatted date with the newly-formatted time so it conforms to the DATETIME format SET NEW.PROD_TIMESTAMP = CONCAT(_parse_TIME_DATE_date_final, " ", _parse_TIME_DATE_hour, ":", _parse_TIME_DATE_minutes, ":", _parse_TIME_DATE_sec) ; END ACTION_ORIENTATION: ROW ACTION_TIMING: BEFORE ACTION_REFERENCE_OLD_TABLE: NULL ACTION_REFERENCE_NEW_TABLE: NULL ACTION_REFERENCE_OLD_ROW: OLD ACTION_REFERENCE_NEW_ROW: NEW CREATED: 2018-09-20 22:13:28.54 SQL_MODE: ONLY_FULL_GROUP_BY,STRICT_TRANS_TABLES,NO_ZERO_IN_DATE,NO_ZERO_DATE,ERROR_FOR_DIVISION_BY_ZERO,NO_ENGINE_SUBSTITUTION DEFINER: root@localhost CHARACTER_SET_CLIENT: utf8mb4 COLLATION_CONNECTION: utf8mb4_0900_ai_ci DATABASE_COLLATION: utf8mb4_0900_ai_ci 1 row in set (0.02 sec)   Tony Darnell is a Principal Sales Consultant for MySQL, a division of Oracle, Inc. MySQL is the world’s most popular open-source database program. Tony may be reached at info [at] ScriptingMySQL.com and on LinkedIn. Tony is the author of Twenty Forty-Four: The League of Patriots Visit http://2044thebook.com for more information. Tony is the editor/illustrator for NASA Graphics Standards Manual Remastered Edition Visit https://amzn.to/2oPFLI0 for more information.

  • How to check and fix MySQL replication inconsistencies ?
    There are several ways to end up with inconsistent MySQL replication; it could be accidental or intentional. In this blog I would like to discuss how to identify slaves that are inconsistent with the master, and how to fix them. I used pt-table-checksum (to find the differences between master and slave) and pt-table-sync (to synchronize a MySQL master and slave) from Percona Toolkit; the detailed documentation of Percona Toolkit is available here for your reference. I recommend being careful (as I mentioned above, sometimes records are inserted or deleted on a MySQL slave intentionally) before using pt-table-sync to sync a slave with its master, because rolling back such a change is even more expensive. The objective of this blog is to show you how to find differences between master and slave in a MySQL replication infrastructure; if you have decided to sync the slave with the master, then proceed with the pt-table-sync tool. Both pt-table-checksum and pt-table-sync are highly customizable tools; I have used a very simple form of them in the examples below: Master – 192.168.56.3, Slave – 192.168.56.4, Percona Toolkit Version – 3.0.12, MySQL Version – MySQL GA 5.7.23. I have created a user on the master to check and repair: GRANT ALL ON *.* to chksum@'%' identified by 'Password@2018'; In case you use non-default ports (the default is 3306) for MySQL, please set the following variables on the slaves: # my.cnf [mysqld] report_host = slave report_port = 3307 Confirm the data inconsistencies between Master (192.168.56.3) and Slave (192.168.56.4): Master mysql> select count(1) from titles; +----------+ | count(1) | +----------+ | 443308 | +----------+ 1 row in set (0.09 sec) Slave mysql> select count(1) from titles; +----------+ | count(1) | +----------+ | 443311 | +----------+ 1 row in set (0.09 sec) Check data inconsistencies using pt-table-checksum: Check for data inconsistencies by executing the following command on the Master: [root@localhost ~]# pt-table-checksum h=192.168.56.3,u=chksum,p=Password@2018,P=3306 --set-vars innodb_lock_wait_timeout=30 --no-check-binlog-format --databases=employees --tables=titles Checking if all tables can be checksummed ... Starting checksum ...
TS ERRORS DIFFS ROWS DIFF_ROWS CHUNKS SKIPPED TIME TABLE 09-20T22:53:02 0 2 443308 5 6 0 1.319 employees.titles Fixing data inconsistencies using pt-table-sync We can preview the changes needed to bring the Slave (192.168.56.4) back in sync with the Master (192.168.56.3) by executing the following command on the Master: [root@localhost ~]# pt-table-sync --sync-to-master h=192.168.56.4,u=chksum,p=Password@2018,P=3306 --databases=employees --tables=titles --print REPLACE INTO `employees`.`titles`(`emp_no`, `title`, `from_date`, `to_date`) VALUES ('10144', 'Senior Staff', '1992-10-14', '1993-08-10') /*percona-toolkit src_db:employees src_tbl:titles src_dsn:P=3306,h=192.168.56.3,p=...,u=chksum dst_db:employees dst_tbl:titles dst_dsn:P=3306,h=192.168.56.4,p=...,u=chksum lock:1 transaction:1 changing_src:1 replicate:0 bidirectional:0 pid:3789 user:root host:localhost.localdomain*/; REPLACE INTO `employees`.`titles`(`emp_no`, `title`, `from_date`, `to_date`) VALUES ('10144', 'Staff', '1985-10-14', '1992-10-14') /*percona-toolkit src_db:employees src_tbl:titles src_dsn:P=3306,h=192.168.56.3,p=...,u=chksum dst_db:employees dst_tbl:titles dst_dsn:P=3306,h=192.168.56.4,p=...,u=chksum lock:1 transaction:1 changing_src:1 replicate:0 bidirectional:0 pid:3789 user:root host:localhost.localdomain*/; DELETE FROM `employees`.`titles` WHERE `emp_no`='87000' AND `title`='Staff Engineer' AND `from_date`='1990-01-01' LIMIT 1 /*percona-toolkit src_db:employees src_tbl:titles src_dsn:P=3306,h=192.168.56.3,p=...,u=chksum dst_db:employees dst_tbl:titles dst_dsn:P=3306,h=192.168.56.4,p=...,u=chksum lock:1 transaction:1 changing_src:1 replicate:0 bidirectional:0 pid:3789 user:root host:localhost.localdomain*/; DELETE FROM `employees`.`titles` WHERE `emp_no`='97000' AND `title`='Database Engineer' AND `from_date`='1991-01-01' LIMIT 1 /*percona-toolkit src_db:employees src_tbl:titles src_dsn:P=3306,h=192.168.56.3,p=...,u=chksum dst_db:employees dst_tbl:titles dst_dsn:P=3306,h=192.168.56.4,p=...,u=chksum lock:1 transaction:1 changing_src:1 replicate:0 bidirectional:0 pid:3789 user:root host:localhost.localdomain*/; DELETE FROM `employees`.`titles` WHERE `emp_no`='97500' AND `title`='Project Manager' AND `from_date`='1983-04-11' LIMIT 1 /*percona-toolkit src_db:employees src_tbl:titles src_dsn:P=3306,h=192.168.56.3,p=...,u=chksum dst_db:employees dst_tbl:titles dst_dsn:P=3306,h=192.168.56.4,p=...,u=chksum lock:1 transaction:1 changing_src:1 replicate:0 bidirectional:0 pid:3789 user:root host:localhost.localdomain*/; DELETE FROM `employees`.`titles` WHERE `emp_no`='97501' AND `title`='Project Manager' AND `from_date`='1983-04-11' LIMIT 1 /*percona-toolkit src_db:employees src_tbl:titles src_dsn:P=3306,h=192.168.56.3,p=...,u=chksum dst_db:employees dst_tbl:titles dst_dsn:P=3306,h=192.168.56.4,p=...,u=chksum lock:1 transaction:1 changing_src:1 replicate:0 bidirectional:0 pid:3789 user:root host:localhost.localdomain*/; DELETE FROM `employees`.`titles` WHERE `emp_no`='97502' AND `title`='Project Engineer' AND `from_date`='1993-04-11' LIMIT 1 /*percona-toolkit src_db:employees src_tbl:titles src_dsn:P=3306,h=192.168.56.3,p=...,u=chksum dst_db:employees dst_tbl:titles dst_dsn:P=3306,h=192.168.56.4,p=...,u=chksum lock:1 transaction:1 changing_src:1 replicate:0 bidirectional:0 pid:3789 user:root host:localhost.localdomain*/; [root@localhost ~]#
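In addition to the --print preview, you can see which tables pt-table-checksum flagged by querying the checksum results it stores on the replica. A minimal sketch, assuming the default --replicate table percona.checksums, to be run on the slave:

mysql> SELECT db, tbl, SUM(this_cnt) AS total_rows, COUNT(*) AS chunks
    ->   FROM percona.checksums
    ->  WHERE (master_cnt <> this_cnt
    ->     OR master_crc <> this_crc
    ->     OR ISNULL(master_crc) <> ISNULL(this_crc))
    ->  GROUP BY db, tbl;

Any table listed here differs from the master; an empty result set means the checksummed tables match.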
To actually fix the inconsistencies and sync the Slave against the Master, execute the following command on the Master: [root@localhost ~]# pt-table-sync --sync-to-master h=192.168.56.4,u=chksum,p=Password@2018,P=3306 --databases=employees --tables=titles --execute Confirm the data inconsistencies between Master (192.168.56.3) and Slave (192.168.56.4) are fixed: Master mysql> select count(1) from titles; +----------+ | count(1) | +----------+ | 443308 | +----------+ 1 row in set (0.09 sec) Slave mysql> select count(1) from titles; +----------+ | count(1) | +----------+ | 443308 | +----------+ 1 row in set (0.09 sec) Conclusion I recommend and encourage my customers to run the pt-table-checksum exercise regularly in their MySQL replication infrastructure, to avoid unpleasant surprises caused by data consistency issues. The post How to check and fix MySQL replication inconsistencies ? appeared first on MySQL Consulting, Support and Remote DBA Services.

  • This Week in Data with Colin Charles 53: It’s MariaDB Week PLUS Percona Live Europe Update
    Join Percona Chief Evangelist Colin Charles as he covers happenings, gives pointers and provides musings on the open source database community. This week is clearly what I’d call a “MariaDB week” — plenty of announcements coming from MariaDB Corporation and MariaDB Foundation. It started with Alibaba Cloud and MariaDB Announce the Launch of ApsaraDB RDS for MariaDB TX, which makes Alibaba Cloud the first public cloud to offer the enterprise offering of MariaDB, MariaDB TX 3.0. It is not available yet as of this announcement for rolling out from the interface, but I expect it will be soon. Exciting, as you can already get MariaDB Server on Amazon RDS for MariaDB, and you can join the waitlist preview for Azure. MariaDB Corporation has received more funding from ServiceNow Ventures in the Series C round, and has gained a new board member in Pat Casey. ServiceNow is a user of MariaDB, and “ServiceNow’s platform runs on up to 85,000 MariaDB databases that serve more than 25 billion queries per hour.” There was an excellent keynote session at M|18 about how ServiceNow uses MariaDB. The Register refers to this as “protecting ServiceNow’s toolchain”. For good measure, MariaDB acquired Clustrix as well. This is the second acquisition after MammothDB earlier in the year. It is worth reading the TechCrunch take on this. Clustrix, a Y Combinator company, has been around since 2006 and raised $72 million. The price of the acquisition was not announced. For a bit of behind the scenes chatter from ex-employee shareholders, Hacker News delivers. From a MariaDB Foundation standpoint, we see Otto Kekäläinen, the MariaDB Foundation CEO stepping down. Thanks for all your hard work Otto! And maybe you missed it, but not long ago, Percona Became a Bronze Sponsor of MariaDB Foundation. Speaking of conferences, the tutorial schedule and a sneak peek of sessions for Percona Live Europe Frankfurt have been announced. In addition, the Call for Papers – 2019 Annual MariaDB User Conference closes October 31, 2018. Releases SQLite Release 3.25.1 now comes with enhancements to ALTER TABLE and window functions. Percona XtraDB Cluster 5.6.41-28.28 Percona Server for MongoDB 3.2.21-3.12 and Percona Server for MongoDB 3.6.7-1.5 Link List MongoDB 4.1.3 is ready for testing and MMAPv1 has been removed. A very apt goodbye, “sometimes a piece of software has served its purpose and it’s time to move on. The original MongoDB storage engine, MMAPv1, was deprecated in 4.0 and will no longer be an option in 4.2.” Postgres 11 – a First Look — don’t forget to check out PostgreSQL 11 Beta 4. Mark Callaghan on durability debt, defined as “the amount of work that can be done to persist changes that have been applied to a database.“ Upcoming Appearances Open Source Summit Europe 2018 – 22-24 October 2018 Feedback I look forward to feedback/tips via e-mail at colin.charles@percona.com or on Twitter @bytebot.   The post This Week in Data with Colin Charles 53: It’s MariaDB Week PLUS Percona Live Europe Update appeared first on Percona Database Performance Blog.

  • SQL Order of Operations – SQL Query Order of Execution
    Knowing the bits and bytes of an SQL query’s order of operations can be very valuable, as it can ease the process of writing new queries and is also very beneficial when trying to optimize an SQL query. If you’re looking for the short version, this is the logical order of operations, also known as the order of execution, for an SQL query: FROM (including JOINs), WHERE, GROUP BY, HAVING, WINDOW functions, SELECT, DISTINCT, UNION, ORDER BY, LIMIT and OFFSET. But the reality isn’t that simple or straightforward. As we said, the SQL standard defines the order of execution for the different SQL query clauses. That said, modern databases already challenge that default order by applying optimization tricks which might change the actual order of execution, though they must end up returning the same result as if they were running the query in the default execution order. Why would they do that? Well, it would be wasteful if the database first fetched all of the data mentioned in the FROM clause (including the JOINs) before looking into the WHERE clause and its indexes. Those tables can hold lots of data, so you can imagine what would happen if the database’s optimizer stuck to the traditional order of operations of an SQL query. Let’s look into each of the SQL query parts according to their execution order. FROM and JOINs The tables specified in the FROM clause (including JOINs) will be evaluated first, to determine the entire working set which is relevant for the query. The database will merge the data from all tables, according to the JOINs ON clauses, while also fetching data from the subqueries, and might even create some temporary tables to hold the data returned from the subqueries in this clause. In many cases though, the database’s optimizer will choose to evaluate the WHERE part first, to see which part of the working set can be left out (preferably using indexes), so it won’t inflate the data set too much if it doesn’t really have to. WHERE clause The WHERE clause will be the second to be evaluated, after the FROM clause. We have the working data set in place, and now we can filter the data according to the conditions in the WHERE clause. These conditions can include references to the data and tables from the FROM clause, but cannot include any references to aliases defined in the SELECT clause, as those aliases may not yet ‘exist’ in that context, because that clause hasn’t been evaluated yet by the database. Also, a common pitfall for the WHERE clause is trying to filter out aggregated values in the WHERE clause, for example with this clause: “WHERE sum(available_stock) > 0”. This statement will fail the query execution, because aggregations will be evaluated later in the process (see the GROUP BY section below). To apply a filtering condition on aggregated data, you should use the HAVING clause and not the WHERE clause. GROUP BY clause Now that we have filtered the data set using the WHERE clause, we can aggregate the data according to one or more columns appearing in the GROUP BY clause. Grouping the data is actually splitting it into different chunks or buckets, where each bucket has one key and a list of rows that match that key. Not having a GROUP BY clause is like putting all rows in one huge bucket. Once you group the data, you can use aggregation functions to return a per-group value for each of the buckets. Such aggregation functions include COUNT, MIN, MAX, SUM and others.
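To make the WHERE versus HAVING distinction concrete, here is a small sketch against a hypothetical products table (the table and column names are invented for illustration; they are not from the article above):

-- This fails: aggregate functions cannot be evaluated in the WHERE clause
SELECT warehouse_id, SUM(available_stock) AS total_stock
FROM products
WHERE SUM(available_stock) > 0   -- error: invalid use of group function
GROUP BY warehouse_id;

-- This works: filter the groups after aggregation, using HAVING
SELECT warehouse_id, SUM(available_stock) AS total_stock
FROM products
GROUP BY warehouse_id
HAVING SUM(available_stock) > 0;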
HAVING clause Now that we have grouped the data using the GROUP BY clause, we can use the HAVING clause to filter out some buckets. The conditions in the HAVING clause can refer to the aggregation functions, so the example which didn’t work in the WHERE clause above, will work just fine in the HAVING clause: “HAVING sum(available_stock) > 0″. As we’ve already grouped the data, we can no longer access the original rows at this point, so we can only apply conditions to filter entire buckets, and not single rows in a bucket. Also, as we mentioned in previous sections, aliases defined in the SELECT clause cannot be accessed in the section either, as they weren’t yet evaluated by the database (this is true in most databases). Window functions If you are using Window functions, this is the point where they’ll be executed. Just like the grouping mechanism, Window functions are also performing a calculation on a set of rows. The main difference is that when using Window functions, each row will keep its own identity and won’t be grouped into a bucket of other similar rows. Window functions can only be used in either the SELECT or the ORDER BY clause. You can use aggregation functions inside the Window functions, for example: SUM(COUNT(*)) OVER () SELECT clause Now that we are done with discarding rows from the data set and grouping the data, we can select the data we want to be fetched from the query to the client side. You can use column names, aggregations and subqueries inside the SELECT clause. Keep in mind that if you’re using a reference to an aggregation function, such as COUNT(*) in the SELECT clause, it’s merely a reference to an aggregation which already occurred when the grouping took place, so the aggregation itself doesn’t happen in the SELECT clause, but this is only a reference to its result set. DISTINCT keyword The syntax of the DISTINCT keyword is a bit confusing, because the keyword takes its place before the column names in the SELECT clause. But, the actual DISTINCT operation takes place after the SELECT. When using the DISTINCT keyword, the database will discard rows with duplicate values from the remaining rows left after the filtering and aggregations took place. UNION keyword The UNION keyword combines the result sets of two queries into one result set. Most databases will allow you to choose between UNION DISTINCT (which will discard duplicate rows from the combined result set) or UNION ALL (which just combines the result sets without applying any duplication check). You can apply sorting (ORDER BY) and limiting (LIMIT) on the UNION’s result set, the same way you can apply it on a regular query. ORDER BY clause Sorting takes place once the database has the entire result set ready (after filtering, grouping, duplication removal). Once we have that, the database can now sort the result set using columns, selected aliases, or aggregation functions, even if they aren’t part of the selected data. The only exception is when using the DISTINCT keyword, which prevents sorting by a non-selected column, as in that case the result set’s order will be undefined. You can choose to sort the data using a descending (DESC) order or an ascending (ASC) order. The order can be unique for each of the order parts, so the following is valid: ORDER BY firstname ASC, age DESC LIMIT and OFFSET In most use cases (excluding a few like reporting), we would want to discard all rows but the first X rows of the query’s result. 
The LIMIT clause, which is executed after sorting, allows us to do just that. In addition, you can choose which row to start fetching from and how many rows to return, using a combination of the LIMIT and OFFSET keywords: OFFSET skips a number of rows, and LIMIT caps how many rows are returned after that. The following example will skip the first 100 rows and fetch the next 50: LIMIT 50 OFFSET 100
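To tie the whole list together, here is one hypothetical query with each clause annotated with its logical evaluation step (the orders and customers tables and their columns are invented for illustration):

SELECT customer_id, COUNT(*) AS order_count   -- 5. SELECT (aggregates were computed during grouping)
FROM orders o                                 -- 1. FROM and JOINs
JOIN customers c ON c.id = o.customer_id
WHERE o.status = 'shipped'                    -- 2. WHERE
GROUP BY customer_id                          -- 3. GROUP BY
HAVING COUNT(*) > 3                           -- 4. HAVING
ORDER BY order_count DESC                     -- 6. ORDER BY (the SELECT alias is visible here)
LIMIT 10 OFFSET 0;                            -- 7. LIMIT and OFFSET

Window functions, DISTINCT and UNION, when present, slot into the same sequence shown in the list at the top of the article.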

  • Become a ClusterControl DBA: Operational Reports for MySQL, MariaDB, PostgreSQL & MongoDB
    The majority of DBA’s perform health checks on their databases every now and then. Usually, it would happen on a daily or weekly basis. We previously discussed why such checks are important and what they should include. To make sure your systems are in a good shape, you’d need to go through quite a lot of information - host statistics, MySQL statistics, workload statistics, state of backups, database packages, logs and so forth. Such data should be available in every properly monitored environment, although sometimes it is scattered across multiple locations - you may have one tool to monitor MySQL state, another tool to collect system statistics, maybe a set of scripts, e.g., to check the state of your backups. This makes health checks much more time-consuming than they should be - the DBA has to put together the different pieces to understand the state of the system. Integrated tools like ClusterControl have an advantage that all of the bits are located in the same place (or in the same application). It still does not mean they are located next to each other - they may be located in different sections of the UI and a DBA may have to spend some time clicking through the UI to reach all the interesting data. The whole idea behind creating Operational Reports is to put all of the most important data into a single document, which can be quickly reviewed to get an understanding of the state of the databases. Operational Reports are available from the menu Side Menu -> Operational Reports: Once you go there, you’ll be presented with a list of reports created manually or automatically, based on a predefined schedule: If you want to create a new report manually, you’ll use the 'Create' option. Pick the type of report, cluster name (for per-cluster report), email recipients (optional - if you want the report to be delivered to you), and you’re pretty much done: The reports can also be scheduled to be created on a regular basis: At this time, 5 types of reports are available: Availability report - All clusters. Backup report - All clusters. Schema change report - MySQL/MariaDB-based cluster only. Daily system report - Per cluster. Package upgrade report - Per cluster. Availability Report Availability reports focuses on, well, availability. It includes three sections. First, availability summary: You can see information about availability statistics of your databases, the cluster type, total uptime and downtime, current state of the cluster and when that state last changed. Another section gives more details on availability for every cluster. The screenshot below only shows one of the database cluster: We can see when a node switched state and what the transition was. It’s a nice place to check if there were any recent problems with the cluster. Similar data is shown in the third section of this report, where you can go through the history of changes in cluster state. Backup Report The second type of the report is one covering backups of all clusters. It contains two sections - backup summary and backup details, where the former basically gives you a short summary of when the last backup was created, if it completed successfully or failed, backup verification status, success rate and retention period: ClusterControl also provides examples of backup policy if it finds any of the monitored database cluster running without any scheduled backup or delayed slave configured. 
Next are the backup details: You can also check the list of backups executed on the cluster with their state, type and size within the specified interval. This is as close you can get to be certain that backups work correctly without running a full recovery test. We definitely recommend that such tests are performed every now and then. Good news is ClusterControl supports MySQL-based restoration and verification on a standalone host under Backup -> Restore Backup. Daily System Report This type of report contains detailed information about a particular cluster. It starts with a summary of different alerts which are related to the cluster: Next section is about the state of the nodes that are part of the cluster: You have a list of the nodes in the cluster, their type, role (master or slave), status of the node, uptime and the OS. Another section of the report is the backup summary, same as we discussed above. Next one presents a summary of top queries in the cluster: Finally, we see a “Node status overview” in which you’ll be presented with graphs related to OS and MySQL metrics for each node. As you can see, we have here graphs covering all of the aspects of the load on the host - CPU, memory, network, disk, CPU load and disk free. This is enough to get an idea whether anything weird happened recently or not. You can also see some details about MySQL workload - how many queries were executed, which type of query, how the data was accessed (via which handler)? This, on the other hand, should be enough to pick most of the issues on MySQL side. What you want to look at are all spikes and dips that you haven’t seen in the past. Maybe a new query has been added to the mix and, as a result, handler_read_rnd_next skyrocketed? Maybe there was an increase of CPU load and a high number of connections might point to increased load on MySQL, but also to some kind of contention. An unexpected pattern might be good to investigate, so you know what is going on. Package Upgrade Report This report gives a summary of packages available for upgrade by the repository manager on the monitored hosts. For an accurate reporting, ensure you always use stable and trusted repositories on every host. In some undesirable occasions, the monitored hosts could be configured with an outdated repository after an upgrade (e.g, every MariaDB major version uses different repository), incomplete internal repository (e.g, partial mirrored from the upstream) or bleeding edge repository (commonly for unstable nightly-build packages). The first section is the upgrade summary: It summarizes the total number of packages available for upgrade as well as the related managed service for the cluster like load balancer, virtual IP address and arbitrator. Next, ClusterControl provides a detailed package list, grouped by package type for every host: This report provides the available version and can greatly help us plan our maintenance window efficiently. For critical upgrades like security and database packages, we could prioritize it over non-critical upgrades, which could be consolidated with other less priority maintenance windows. Schema Change Report This report compares the selected MySQL/MariaDB database changes in table structure which happened between two different generated reports. In the MySQL/MariaDB older versions, DDL operation is a non-atomic operation (pre 8.0) and requires full table copy (pre 5.6 for most operations) - blocking other transactions until it completes. 
Schema changes could become a huge pain once your tables get a significant amount of data and must be carefully planned especially in a clustered setup. In a multi-tiered development environment, we have seen many cases where developers silently modify the table structure, resulting in significant impact to query performance. In order for ClusterControl to produce an accurate report, special options must be configured inside CMON configuration file for the respective cluster: schema_change_detection_address - Checks will be executed using SHOW TABLES/SHOW CREATE TABLE to determine if the schema has changed. The checks are executed on the address specified and is of the format HOSTNAME:PORT. The schema_change_detection_databases must also be set. A differential of a changed table is created (using diff). schema_change_detection_databases - Comma separated list of databases to monitor for schema changes. If empty, no checks are made. In this example, we would like to monitor schema changes for database "myapp" and "sbtest" on our MariaDB Cluster with cluster ID 27. Pick one of the database nodes as the value of schema_change_detection_address. For MySQL replication, this should be the master host, or any slave host that holds the databases (in case partial replication is active). Then, inside /etc/cmon.d/cmon_27.cnf, add the two following lines: schema_change_detection_address=10.0.0.30:3306 schema_change_detection_databases=myapp,sbtest Restart CMON service to load the change: $ systemctl restart cmon For the first and foremost report, ClusterControl only returns the result of metadata collection, similar to below: With the first report as the baseline, the subsequent reports will return the output that we are expecting for: Related resources  An Overview of Database Operational Reporting in ClusterControl  Database Security Monitoring for MySQL and MariaDB  Become a ClusterControl DBA: Performance and Health Monitoring Take note only new tables or changed tables are printed in the report. The first report is only for metadata collection for comparison in the subsequent rounds, thus we have to run it for at least twice to see the difference. With this report, you can now gather the database structure footprints and understand how your database has evolved across time. Final Thoughts Operational report is a comprehensive way to understand the state of your database infrastructure. It is built for both operational or managerial staff, and can be very useful in analysing your database operations. The reports can be generated in-place or can be delivered to you via email, which make things conveniently easy if you have a reporting silo. We’d love to hear your feedback on anything else you’d like to have included in the report, what’s missing and what is not needed. Tags:  MySQL MariaDB PostgreSQL MongoDB galera operations reporting
