Background

HP is posting many challenges on TopCoder to advertise their new technology Haven platform.
I am currently involved in this mapping tutorial challenge, which requires the setup of a local version of Vertica DB.
The data provided by TC contains two files. The first is a schema.sql, which drops a table first and then creates it. The second is the data insertion file, which contains over 200,000 entries of INSERT statements, following a DELETE FROM table cleaning statement.
I created the database named NYTrafficCollision using adminTools, and connected to it. Then I executed the schema.sql file, and then executed the data insertion SQL file. (For the record, the data insertion was quite slow. I started the execution of the data insertion SQL file at about 23:30, and went to bed to read a book. When I finished the book in about 1:00 am, I got up and checked. The data was still being inserted and I just left my PC on all night.)

Problem description

The problem was found when I tried to fetch records from the DB using JDBC Vertica Driver. The ResultSet returned from the Connection was always containing 0 records.
Before this tutorial challenge, I have previously written another tutorial project that connected to Vertica performing CRUD actions, which was fine. This time when I tried to connect to the DB and perform only reading actions, however, no record was returned at all.
Nevertheless, when I executed the command below the result got me feeling very weird.

    SELECT COUNT(*) FROM nypd_motor_vehicle_collisions;

The return result was

 count
-------- 
 204709
(1 row)

How come the data was stored in the table, and I could not get any records from it using JDBC?

Struggling

Feeling weird of this bug, I tried the following things to make things work.

  1. I checked whether the connection to the DB was successful, and as it turns out the connection was good.
  2. I printed the returned ResultSet size using a while(rs.next()) loop, and the print result was always 0.
  3. I used Statement instead of PreparedStatement, no luck.
  4. I got rid of try-with-resource statement introduced in Java 7, no luck.
  5. I tried to open my old project (the one teaching how to fetch Tweets from Twitter and how to get sentiment analysis results using HP IDOLOnDemand). It cannot connect to the DB called TwitterAnalysis.
    So I turned off the database NYTrafficCollisions and turned on TwitterAnalysis instead. The query against the database was good.
    It seemed the problem was in the project or in the database.
  6. I switched back to the new project, and tried to use the project to connect to the old DB TwitterAnalysis and the printed size of returned ResultSet was 474.
    Finally.
  7. I stopped the database TwitterAnalysis and started NYTrafficCollisions again.
  8. This time, very weird things happened. When I executed the following command again, the return result was no longer 204709.
            SELECT COUNT(*) FROM nypd_motor_vehicle_collisions;
        

The return result was

     count
    -------- 
     0

I didn’t know why this was the case. Was the schema.sql automatically executed again? Because there is a DROP TABLE statement in the head of it.

Then I realized maybe restarting the database was the cause. I did a search on Google and found this post. Fortunately(Unfortunately?), someone had been in the same place where I was. There is a reply in the post describing a potential cause for this problem.

When you create the database, the schema (design) only contains the dbadmin user definition. Any additional objects you create using DDL (e.g. tables, projections, users, roles) is automatically committed and persistent. Any data you add using DML may or may not be persistent depending on the commit options used:
- insert is not committed by default, you have to issue a “commit;” - copy is committed by default, but it has a “no commit” option to alter that behaviour

Resolution

I executed the data insertion SQL file again, but this time I didn’t wait for it to finish. I interrupted using CTRL + C and executed the SELECT COUNT(*) statement. The count was 47 this time.

I issued a COMMIT; command in the database terminal, and tried to fetch data using JDBC again.

Voila! The data was presented in the Webpage.

After debugging

I checked on Google again to see the commit settings of Vertica. I found this post on Stack Overflow. The OP was basically going through the same problem with me. (Why haven’t I found this post earlier!)

The accepted answer described the default setting of Vertica. Apparently, vsql is in transaction mode by default. Thus everything we insert in vsql would not be commited automatically.

We can use

    \set AUTOCOMMIT on

and

    \set AUTOCOMMIT on

to configure this setting.

Conclusion

First conclusion: It’s so stupid of me not thinking of COMMITTING A TRANSACTION. Nonetheless, there will not be a next time that I’ll go through this kind of mistake again.

This debugging process cost me about 3 hours. The process was quite frustrating, especially when I found all the data lost when I restarted the database.