Bash script to check for final closing /data xml tag in file

Bash script to check for final closing </data> xml tag in file:

if [ -f file.xml ]
then
    # if grep -1iq '</data>' file.xml    # simplest syntax
    # if tac file.xml | grep -1iq '</data>'   # faster syntax (tac is reverse cat)
    if tail -100 file.xml | grep -1iq '</data>'   # fastest syntax, assumes at most 100 trailing lines of whitespace/etc. at end of file.
    then
        echo file ready to be loaded
        # code to load and archive file here
    else
        echo file partially transferred, so no action taken.
    fi
else
    echo file not yet transferred, so no action taken.
fi

This could be put into an infinite loop:

while :
do
    # code to do stuff here
    sleep 1  # sleep for 1 second before checking again, to avoid wasting cpu.
done
Posted in Linux | Leave a comment

List oracle indexes and their columns with listagg

List indexes on large tables, with listagg to show the indexed columns in a comma separated line, and outer join to include tables with no indexes:

SELECT t.owner
     , t.table_name
     , ROUND ( t.blocks * 8 / 1024 / 1024 ) AS gb  -- assuming each block is default 8kb
     , t.num_rows
     , i.index_name
     , LISTAGG ( ic.column_name || ' , ' ) WITHIN GROUP ( ORDER BY ic.column_position ) AS ind_cols
     , i.index_type
     , i.uniqueness
     , t.partitioned AS table_partitioned
     , i.partitioned AS index_partitioned
  FROM dba_tables t
  LEFT OUTER JOIN dba_indexes i
    ON i.table_owner = t.owner
   AND i.table_name = t.table_name
  LEFT OUTER JOIN dba_ind_columns ic
    ON ic.index_owner = i.owner
   AND ic.index_name = i.index_name
 WHERE t.blocks > 1024 * 1024 / 8  -- 1gb, assuming each block is default 8kb
 GROUP BY t.owner
     , t.table_name
     , t.blocks
     , t.num_rows
     , i.index_name
     , i.index_type
     , i.uniqueness
     , t.partitioned
     , i.partitioned
 ORDER BY t.blocks DESC , t.owner , t.table_name , i.index_name
;
Posted in Performance tuning | Leave a comment

Oracle distributed query join over remote database link slow performance leading driving_site hint fix

Oracle Optimizer doesn’t cope so well with distributed queries, joining to tables in different databases.

1) The leading and/or driving_site hints can dramatically improve distributed query performance.

SELECT /*+ leading(tab1) driving_site(tab2) */ ... FROM table1 AS tab1 JOIN table2@db2 AS tab2 ...

Driving_site specifies the database you want doing the bulk of the work, likely the largest database involved. Specify a table alias rather than a database name in the hint.

Leading specifies the table (alias) you want to start the query off at, likely the most restrictive table involved.

For my problem query, I found leading was better for consistent performance than driving_site, and that response time improved from 28 minutes without hint, down to 27 seconds with leading hint. The driving_site hint was very much cache dependent: 2 minutes 20 seconds initially when run from cold, 7 seconds if rerun while all data was still cached. Hinting both ways at the same time had the same effect in my case as just using the driving_site hint, because my leading table was not in the driving database.

For normal, non distributed queries, it would be better to try to improve optimizer stats before resorting to hints. Where possible, avoid adding hints blindly, but use only in specific cases after testing the effects. Re-testing should be done after a major upgrade, because Oracle continually improve the optimizer, and your hint may eventually prevent improvements from taking effect.

2) If you have a lot of distributed queries like this, you should consider merging the databases into a single database (can keep them in separate schemas).

3) Another option, especially for reporting systems, would be to replicate the remote tables, using e.g. materialized views or transportable tablespaces.

Jonathan Lewis has full details on this in an article in Oracle Scene Magazine Spring 2015 Issue 56. The full magazine is here, or just the four pages of his specific article here.

Posted in Performance tuning | Leave a comment

Oracle User Group Scotland Conference 2015

Oracle User Group Scotland Conference 2015

1) SQLcl (SQL command line) new enhanced SQL*Plus, easy to output index info, DDL, output/load data in csv, xml. Early adopter (beta) version available for download now, production version expected soon and to be bundled by default into Oracle 12cR2.

More at Kris Rice.

2) Relax and Recover Martin Nash

A free tool to make a bootable .iso file for fast bare metal recovering of any server, onto which the normal backup files could then be restored. Free GPL licence. Linux only.

3) Who changed my Plan? Presanth Kothuri
Continue reading

Posted in Oracle User Group, Performance tuning | 1 Comment

execute immediate quote escape oracle pl/sql

To run this via execute immediate:

CREATE TABLE t2 AS SELECT * FROM t1 WHERE col1 = '1' ;

Either use new style quote format:

EXECUTE IMMEDIATE q'[CREATE TABLE t2 AS SELECT * FROM t1 WHERE col1 = '1']' ;

Or older style still works also:

EXECUTE IMMEDIATE 'CREATE TABLE t2 AS SELECT * FROM t1 WHERE col1 = ''1''' ;
Posted in Uncategorized | Leave a comment

Partition an existing oracle table example

I used this SQL to convert an existing non-partitioned table to be partitioned. Table was mostly queried on snapshot_id=’Current’, which had the most up to date data, but had much more historic data that was rarely queried against, all flagged with snapshot_id = ‘YYYYMMDD’ format.

-- 1) Create an empty table with one partition
CREATE TABLE a1 TABLESPACE targtabs_big
PARTITION BY RANGE ( snapshot_id ) (
    PARTITION adm_fact_all VALUES LESS THAN ( MAXVALUE ) TABLESPACE targtabs_big
)
AS SELECT * FROM adm_fact WHERE 1 = 2 ;
-- 2) Swap adm_fact into that empty table with one partition
ALTER TABLE a1 EXCHANGE PARTITION adm_fact_all WITH TABLE adm_fact ;
-- 3) Flip adm_fact to this thing we have just built. (beware grants, indexes, constraints, triggers, etc.)
DROP TABLE adm_fact ;  -- make sure to take a note of grants, indexes, constraints, triggers, etc. before doing this!
RENAME a1 TO adm_fact ;
-- 4) Split the partition to put Current by itself
ALTER TABLE adm_fact SPLIT PARTITION adm_fact_all AT ( '29990000' ) INTO (
     PARTITION adm_fact_old TABLESPACE targtabs_big COMPRESS -- AD 2999 = everything except 'Current'
   , PARTITION adm_fact_current TABLESPACE targtabs_big -- 'Current'
) ;
-- 5) Add the indexes back in - they all disappeared at step 3 above (leave out LOCAL if want global non-partitioned index).
CREATE INDEX i_af_pc on adm_fact ( programme_code , programme_discipline_code )
LOCAL tablespace targindx_med ;
CREATE INDEX i_af_si on adm_fact ( snapshot_id )
LOCAL tablespace targindx_med ;
-- 6) Add grants, constraints, triggers, etc. back in - they all disappeared at step 3 above.
GRANT SELECT ON adm_fact TO role ;
-- 7) Gather optimizer stats
EXEC dbms_stats.gather_database_stats_job_proc
-- Or for 10gR2 or older instead use: EXEC dbms_scheduler.run_job ( 'GATHER_STATS_JOB' )
-- Optionally check things are working ok.
SELECT snapshot_id , COUNT(*) FROM adm_fact GROUP BY snapshot_id ORDER BY 1 ;
SELECT snapshot_id , COUNT(*) FROM adm_fact PARTITION ( adm_fact_old ) GROUP BY snapshot_id ORDER BY 1 ;
SELECT snapshot_id , COUNT(*) FROM adm_fact PARTITION ( adm_fact_current ) GROUP BY snapshot_id ORDER BY 1 ;
-- 9) (Optional) by default updating snapshot_id is not allowed, so e.g.:
-- UPDATE adm_fact SET snapshot_id = 'Current' WHERE snapshot_id != 'Current' AND ROWNUM < 11 ;
-- would fail with error ""ORA-14402: updating partition key column would cause a partition change"
-- You can change that behaviour with:
ALTER TABLE adm_fact ENABLE ROW MOVEMENT ;

Dbms_redefinition could be used instead to reduce the outage time to almost nothing, but the above method is fast, was good enough for my outage timescales.

Indexes or their partitions can be compressed as well as table partitions, but often less benefit from that.

Posted in Uncategorized | Leave a comment

SQL to display date duration as number of years and months

Convert date duration (here customer age at time of first application) in years and months, e.g. ‘3408’ for 34 years and 8 months old:

  WITH customers AS ( SELECT SYSDATE AS appl_date , TO_DATE ( '01-AUG-1980' , 'DD-MON-YYYY' ) AS date_of_birth FROM DUAL )
SELECT TO_CHAR ( GREATEST ( 0 , LEAST ( 99 , FLOOR ( ( c.appl_date - c.date_of_birth ) / 365.25 ) ) ) , 'fm00' )  -- years
          || TO_CHAR ( GREATEST ( 0 , FLOOR ( 12 * MOD ( ( ( c.appl_date - c.date_of_birth ) / 365.25 ) , 1 ) ) ) , 'fm00' ) AS years_months
     , ( c.appl_date - c.date_of_birth ) / 365.25 AS years
     , 12 * MOD ( ( ( c.appl_date - c.date_of_birth ) / 365.25 ) , 1 ) AS months
  FROM customers c ;
Posted in Uncategorized | Leave a comment

Rebuild table indexes lobs in oracle

Rebuilds are beneficial in some rare circumstances such as after a mass delete operation. The below method is safer and faster than export/import.

Example table:

CREATE TABLE tab1 ( col1 VARCHAR2(1) , col2 CLOB ) TABLESPACE ts1 ;
CREATE INDEX tab1i ON tab1 ( col1 ) TABLESPACE ts1 ;

List objects and tablespaces:

SELECT tablespace_name FROM user_tables WHERE table_name = 'TAB1' ;
SELECT tablespace_name FROM user_lobs WHERE table_name = 'TAB1' ;
SELECT tablespace_name , index_name FROM user_indexes WHERE table_name = 'TAB1' ;

Rebuild:

ALTER TABLE tab1 MOVE ;
ALTER TABLE tab1 MOVE LOB ( col2 ) STORE AS ( TABLESPACE ts1 ) ;
ALTER INDEX tab1i REBUILD ;

Update optimizer stats:

EXEC dbms_scheduler.run_job ( 'GATHER_STATS_JOB' )

For large tables, it may be better to rebuild them online using dbms_redefinition.

H/t Anand.

Posted in Uncategorized | 1 Comment

Oracle data transform using analytic lag row_number case

So I needed to transform this source data:

Event CategoryStart WeekEnd Week
LEC11
LEC111
LEC111
LEC33
SEM22
SEM46
SEM99
TUT112
TUT46

Into this output format:

Event CategoryNumber of OccurrencesStart WeekEnd Week
LEC311
LEC222
LEC333
LEC2411
SEM122
SEM146
SEM199
TUT113
TUT246
TUT1712

Seems simple, but required a combination of CASE, LAG, and ROW_NUMBER – as Tom Kyte steps through in detail in this article from 2004.

Here is the solution:

SELECT q5.event_category , q5.num , MIN ( q5.week ) AS start_week , MAX ( q5.week ) AS end_week
  FROM (
       SELECT q4.week , q4.event_category , q4.num
            , MAX ( q4.rn ) OVER ( PARTITION BY q4.event_category ORDER BY q4.week ) AS max_rn
        FROM (
             SELECT q3.week , q3.event_category , q3.num
                  , CASE
                       WHEN q3.num != q3.num_prev THEN q3.row_num
                       WHEN q3.week != q3.week_prev + 1 THEN q3.row_num
                       END AS rn  -- mark the beginning of each group
               FROM (
                    SELECT q2.week , q2.event_category , q2.num
                         , LAG  ( q2.num  , 1 , 0 ) OVER ( PARTITION BY q2.event_category ORDER BY q2.week ) AS num_prev
                         , LAG  ( q2.week , 1 , 0 ) OVER ( PARTITION BY q2.event_category ORDER BY q2.week ) AS week_prev
                         , ROW_NUMBER ( ) OVER ( PARTITION BY q2.event_category ORDER BY q2.week ) AS row_num
                      FROM (
                           SELECT t.event_category , q1.week , COUNT(*) as num
                             FROM timetable t
                             JOIN (
                                  SELECT LEVEL AS week
                                    FROM DUAL
                                 CONNECT BY LEVEL <= 52
                                  ) q1
                               ON q1.week BETWEEN t.startw AND t.endw
                            GROUP BY t.event_category , q1.week
                           ) q2
                    ) q3
             ) q4
       ) q5
 GROUP BY q5.event_category , q5.num , q5.max_rn
 ORDER BY q5.event_category , MIN ( q5.week )
;

Sample data:

CREATE TABLE timetable ( startw NUMBER , endw NUMBER , event_category VARCHAR2(3) ) ;
INSERT INTO timetable ( startw , endw , event_category ) VALUES ( 1 , 11  ,'LEC' ) ;
INSERT INTO timetable ( startw , endw , event_category ) VALUES ( 1 , 11  ,'LEC' ) ;
INSERT INTO timetable ( startw , endw , event_category ) VALUES ( 1 , 1 , 'LEC' ) ;
INSERT INTO timetable ( startw , endw , event_category ) VALUES ( 3 , 3 , 'LEC' ) ;
INSERT INTO timetable ( startw , endw , event_category ) VALUES ( 2 , 2 , 'SEM' ) ;
INSERT INTO timetable ( startw , endw , event_category ) VALUES ( 4 , 6 , 'SEM' ) ;
INSERT INTO timetable ( startw , endw , event_category ) VALUES ( 9 , 9 , 'SEM' ) ;
INSERT INTO timetable ( startw , endw , event_category ) VALUES ( 1 , 12 , 'TUT' ) ;
INSERT INTO timetable ( startw , endw , event_category ) VALUES ( 4 , 6 , 'TUT' ) ;
Posted in Uncategorized | Leave a comment

oracle update select merge

If you want to update one table based on the values in another table, use ‘UPDATE = (SELECT) WHERE (SELECT)’ or ‘MERGE WHERE’, as in examples (3) and (3a) below.

Sample data:

DROP TABLE a ;
CREATE TABLE a ( id VARCHAR2(10) , text1 VARCHAR2(10) , text2 VARCHAR2(10) ) ;
INSERT INTO a ( id , text1 , text2 ) VALUES ( '1' , 'yes' , NULL ) ;
INSERT INTO a ( id , text1 , text2 ) VALUES ( '2' , NULL , NULL ) ;
INSERT INTO a ( id , text1 , text2 ) VALUES ( '3' , NULL , NULL ) ;
INSERT INTO a ( id , text1 , text2 ) VALUES ( '4' , NULL , NULL ) ;
INSERT INTO a ( id , text1 , text2 ) VALUES ( '5' , NULL , NULL ) ;
INSERT INTO a ( id , text1 , text2 ) VALUES ( '5' , NULL , NULL ) ;
INSERT INTO a ( id , text1 , text2 ) VALUES ( '7' , NULL , NULL ) ;
INSERT INTO a ( id , text1 , text2 ) VALUES ( '8' , NULL , NULL ) ;
INSERT INTO a ( id , text1 , text2 ) VALUES ( '9' , NULL , NULL ) ;
INSERT INTO a ( id , text1 , text2 ) VALUES ( '0' , NULL , NULL ) ;
DROP TABLE b ;
CREATE TABLE b ( id VARCHAR2(10) , text1 VARCHAR2(10) , text2 VARCHAR2(10) ) ;
INSERT INTO b ( id , text1 , text2 ) VALUES ( '1' , 'yes' , NULL ) ;
INSERT INTO b ( id , text1 , text2 ) VALUES ( '2' , 'maybe' , 'yes' ) ;
INSERT INTO b ( id , text1 , text2 ) VALUES ( '3' , NULL , NULL ) ;
INSERT INTO b ( id , text1 , text2 ) VALUES ( '4' , NULL , 'maybe' ) ;
INSERT INTO b ( id , text1 , text2 ) VALUES ( 'ne' , 'maybe' , 'maybe' ) ;
COMMIT ;

Two of the rows in table a – rows 2 and 4 – have different values than their equivalent rows in table b. So ideally I want to update just those 2 rows in table a.

1) Simplest method works ok in terms of getting to the correct end result, but updates the whole of table a, all 10 rows. That doesn’t matter for the end result, because is updating columns that already match to the same value, but is an unnecessary performance overhead.

UPDATE a SET ( a.text1 , a.text2 ) = ( SELECT b.text1 , b.text2 FROM b WHERE a.id = b.id ) ;

2) Adding an IN check excludes the rows in a that don’t have a counterpart in b, now down to 4 updates.

UPDATE a SET ( a.text1 , a.text2 ) = ( SELECT b.text1 , b.text2 FROM b WHERE a.id = b.id )
WHERE a.id IN ( SELECT b.id FROM b ) ;

EXISTS could be used instead of IN, except it requires slightly more code. The second line would be a longer:

WHERE EXISTS ( SELECT NULL FROM b WHERE a.id = b.id ) ;

EXISTS and IN are functionally the same and perform the same; unlike NOT EXISTS and NOT IN which handle NULLs differently.

2b) This MERGE statement is identical, 4 rows merged:

MERGE INTO a USING ( SELECT * FROM b ) b
ON ( a.id = b.id )
WHEN MATCHED THEN UPDATE SET a.text1 = b.text1 , a.text2 = b.text2 ;

Merge can do more: INSERT when not matched, delete based on a where clause; so becomes useful when that is needed.

2c) Using an inline view is also identical, 4 rows updated, but this needs a primary or unique constraint to be in place on the table which is not being updated. Without that it will fail with ORA-01779: cannot modify a column which maps to a non key-preserved table.

ALTER TABLE b ADD CONSTRAINT b_pk PRIMARY KEY ( id ) ;
UPDATE (
   SELECT a.text1 AS a_text1 , a.text2 AS a_text2 , b.text1 AS b_text1 , b.text2 AS b_text2
     FROM a JOIN b ON a.id = b.id
   ) v
SET v.a_text1 = v.b_text1 , v.a_text2 = v.b_text2 ;

3) Adding more conditions to the IN clause gets us to the desired point, 2 rows updated the first time it is run, no rows updated on subsequent runs.

UPDATE a SET ( a.text1 , a.text2 ) = ( SELECT b.text1 , b.text2 FROM b WHERE a.id = b.id )
WHERE a.id IN (
   SELECT b.id FROM b
    WHERE NVL ( a.text1 , 'x' ) != NVL ( b.text1 , 'x' )
       OR NVL ( a.text2 , 'x' ) != NVL ( b.text2 , 'x' )
) ;

The NVLs are needed because ‘something=NULL’ always evaluates to false.

3b) Again, MERGE can do the same thing, 2 rows merged initially, none subsequently:

MERGE INTO a USING ( SELECT * FROM b ) b
ON ( a.id = b.id )
WHEN MATCHED THEN UPDATE SET a.text1 = b.text1 , a.text2 = b.text2
   WHERE NVL ( a.text1 , 'x' ) != NVL ( b.text1 , 'x' )
      OR NVL ( a.text2 , 'x' ) != NVL ( b.text2 , 'x' ) ;

That works ok for a 1:1 join, but where the second table returns multiple values then this will either toggle the update around the different values each time it runs, orfail altogether with ORA-30926: unable to get a stable set of rows in the source tables depending on the number of multiples. A workaround such as a MAX function or similar will ensure unique values are selected

INSERT INTO b ( id , text1 , text2 ) VALUES ( '4' , 'duplicate' , 'duplicate' ) ;
MERGE INTO a USING ( SELECT b.id , MAX ( b.text1 ) AS text1 , MAX ( b.text2 ) AS text2 FROM b GROUP BY b.id ) b
ON ( a.id = b.id )
WHEN MATCHED THEN UPDATE SET a.text1 = b.text1 , a.text2 = b.text2
   WHERE NVL ( a.text1 , 'x' ) != NVL ( b.text1 , 'x' )
      OR NVL ( a.text2 , 'x' ) != NVL ( b.text2 , 'x' ) ;

NULL values of id are handled ok without workarounds like this.
3c) And again, using an inline view can do the same thing, 2 rows updated initially, none subsequently.

ALTER TABLE b ADD CONSTRAINT b_pk PRIMARY KEY ( id ) ;
UPDATE (
   SELECT a.text1 AS a_text1 , a.text2 AS a_text2 , b.text1 AS b_text1 , b.text2 AS b_text2
     FROM a JOIN b ON a.id = b.id
    WHERE NVL ( a.text1 , 'x' ) != NVL ( b.text1 , 'x' )
       OR NVL ( a.text2 , 'x' ) != NVL ( b.text2 , 'x' )
   ) v
SET v.a_text1 = v.b_text1 , v.a_text2 = v.b_text2 ;

Those statements only tell you the number of rows changed. If you want to know the details of what was changed, and you know for sure that no one else is modifying the data at that time, then you can run a report before the update/merge:

SELECT a.id , a.text1 AS a_text1 , a.text2 AS a_text2 , b.text1 AS b_text1 , b.text2 AS b_text2
  FROM a JOIN b ON a.id = b.id
 WHERE NVL ( a.text1 , 'x' ) != NVL ( b.text1 , 'x' )
    OR NVL ( a.text2 , 'x' ) != NVL ( b.text2 , 'x' ) ;

More likely you need certainty that what you are reporting on is really changed, that means a begin..end block is needed to ensure transaction isolation from any other data modifications at that time. That means writing the report to a log table rather than just selecting the information:

BEGIN
   INSERT INTO log_table
   SELECT a.id , a.text1 AS a_text1 , a.text2 AS a_text2 , b.text1 AS b_text1 , b.text2 AS b_text2 , SYSDATE
     FROM a JOIN b ON a.id = b.id
    WHERE NVL ( a.text1 , 'x' ) != NVL ( b.text1 , 'x' )
       OR NVL ( a.text2 , 'x' ) != NVL ( b.text2 , 'x' )
   ;
   -- main UPDATE or MERGE statement goes here.
END ;

PL/SQL can be a better alternative, provided is not a large number of updates where performance is needed.

set serverout on
BEGIN
   FOR r1 IN (
      SELECT a.id , a.text1 AS a_text1 , a.text2 AS a_text2 , b.text1 AS b_text1 , b.text2 AS b_text2
        FROM a JOIN b ON a.id = b.id
       WHERE NVL ( a.text1 , 'x' ) != NVL ( b.text1 , 'x' )
          OR NVL ( a.text2 , 'x' ) != NVL ( b.text2 , 'x' )
      )
   LOOP
      dbms_output.put_line ( 'Updating ' || r1.id || ' from ' || r1.a_text1 || ' , ' || r1.a_text2 || ' to ' || r1.b_text1 || ' , ' || r1.b_text2 ) ;
      UPDATE a SET a.text1 = r1.b_text1 , a.text2 = r1.b_text2 WHERE a.id = r1.id ;
   END LOOP ;
END ;
Posted in Scripts | 1 Comment