1   What is it?
2   What is required to get it to work?
3   How does it work?
3.1   Extracting from a directory without a spooler and without any filter
3.2   Extracting from a directory without a spooler but with white- and blacklist
3.3   White- and blacklist for extraction
3.4   Overview of parameters for an extractor
3.5   Start connected spooling to a mysql-server
 
3.5.1   Creating a database
3.5.2   Overview of parameters for a spooler
3.6   Spooling to a mysql-server using two tables without filtering
3.7   Spooling to a mysql-server using four tables with black- and whitelist
4   Querying a freedb2mysql-database
4.1   Overview of parameters for queries
4.2   Examples for queries
5   What database-schemata are used?
6   Auxiliary function: Merging
7   Notes
8   Links

What is it?

freedb2mysql is a java-package that reads files or an entire directory tree from the freeDB-project, extracts definitions of compact discs (CDs), and spools and migrates this data to a mysql-database-server.

Besides spooling, you can use the package to perform some basic validation of the freeDB-data. If you think of the package as two layers, you will find a lower layer underneath the upper spooling-facilities.

This lower layer (the so-called extractor) reads files or an entire directory and tries to extract valid cd-data. You do not need to spool this data at all; you can run the extractor alone and let it check the data it reads.

Finally this package provides a few classes that retrieve disc-data either from a file or directory containing freeDB-datafiles or from a MySQL-database that has been built by this package.

The freedb2mysql-package is freeware under the General Public License (Version 2, June 1991). The package is neither complete nor perfect. Database-interaction relies on two database-schemata, which are not designed to keep table-files small. These schemata serve an educational purpose; they are not optimized for quick queries or rapid data retrieval.

These and other drawbacks should be outweighed by the fact that you can implement your own schemata. If you are an averagely experienced java-programmer you will hopefully be able to utilize the included sources to implement your own schemata or to refine the extractor to your needs.

Important: If you want to use this tool please stick to the jar-file provided on SourceForge; the jar linked and introduced on this site is outdated, and the documentation here is kept for historical reasons only. The notes-section below is up to date and serves as a kind of project-blog. If you are just looking for a quick start, take a look at "How to import the FreeDB-database (textfiles) into MySQL".

What is required to get it to work?

The first thing you will need is the data from freeDB. Second, you need a java-runtime-environment, version 1.4 or above. If you want to spool the data to a mysql-server straight away you also need a jdbc-driver. This package has been tested with Connector/J 3.1; other drivers might work as well.

Finally you need to download the freedb2mysql-package. This package as well as the jdbc-driver should appear in your CLASSPATH. This documentation assumes that

  1. java is installed
  2. standard-java-library (such as src.zip or rt.jar) is in CLASSPATH
  3. freedb2mysql.jar is in CLASSPATH
  4. a jdbc-driver (such as mysql-connector-java-3.1.6-bin.jar) is in CLASSPATH.
  5. the directory T:\public_html\freedb2mysql contains the freedb2mysql-package and the jdbc-driver
  6. a local copy of the freeDB-database resides in G:\Install\freedb\freedb-complete-20090301.merged

If you plan to connect to a mysql-server and stick to Connector/J 3.1, you should rename any version of this driver to mysql-connector-java-bin.jar, because this is the name under which java tries to load the jdbc-driver (see manifest).

The freedb2mysql-package is built to be started via java -jar freedb2mysql.jar. If the CLASSPATH is set correctly the commands java -jar freedb2mysql.jar and java freedb2mysql.Starter are equivalent. This documentation uses the call java -jar freedb2mysql.jar.

How does it work?

This section shows a few ways to use freedb2mysql's starter via the call java -jar freedb2mysql.jar.

Extracting from a directory without a spooler and without any filter

You want to find out how many valid disc-definitions exist in the genre named "Rock". The files in the subdirectory of the same name, rock, should contain these definitions. The following command will cause some harddisc-activity and take a while:

T:\public_html\freedb2mysql>java -jar freedb2mysql.jar -noDataStringValidation -skipLangAnno -eval -extractFrom G:\Install\freedb\freedb-complete-20090301.merged\rock

G:\Install\freedb\freedb-complete-20090301.merged\rock.EmptyValidator.profiler saved

... many lines omitted ...

Type of Extractor                                                                      ProfilingExtractor
Extracted from                                     G:\Install\freedb\freedb-complete-20090301.merged\rock
Files processed                                                                                       134
Lines processed                                                                                  39687664
Discs dropped                                                                                       31332
Discs extracted                                                                                    692045
Extracting took                                                                              3m:58s:368ms
Profiling took                                                                               7m:18s:388ms

Exceptions                           31332 (100.0%)
-------------------------------------------------------
CDCreatedWithInvalidPlayTime          2201 (  7.02476%)
CDCreatedWithoutValidDiscIdException     7 (  0.02234%)
ExtractedTitleArtistCountMissmatch   16390 ( 52.31073%)
ExtractedTitleOffsetCountMissmatch   10976 ( 35.03127%)
TrackCreatedWithInvalidPlayTime        290 (  0.92557%)
TrackCreatedWithoutTitleException     1468 (  4.6853%)

The starter launched an extractor that traversed the directory freedb-complete-20090301.merged\rock and interpreted every file found as a definition from the freeDB-database. Because some of the entries found do not fit the required format, these entries are dropped.

Note that the sum of valid and invalid cds does not represent the number of cd-definitions in a file or a directory. Since multiple disc-ids may be set for the same definition, that definition is regarded as a complete definition for every id given. The freeDB-data-format allowed this kind of definition.

If there is something wrong with a definition read from a particular file and this definition is recognized as invalid, the extractor will drop that definition and increase the count of invalid definitions by one. If every definition comes with just a single id associated, this calculation is perfectly right.

But if a single definition is intended to serve more than just one disc and this definition is dropped, the summary does not describe the 'real' number of definitions given in a particular file and checked by the extractor.

As long as all definitions are valid the count of extracted definitions is correct, because the extractor clones a (valid) definition for every disc-id associated with it.

Extracting from a directory without a spooler but with white- and blacklist

You checked files in the rock-directory and found that they specify genres you are not interested in. Let's say you like to rock but hate to roll. On the other hand there is a slightly different genre that you are even more interested in than rock, but you still want to limit yourself to the files listed in the rock-directory. So you look for anything classified as metal.

First you create a plaintext file to contain your whitelist and another one to contain your blacklist. Additionally you want a report in the current directory. You issue

T:\public_html\freedb2mysql>java -jar freedb2mysql.jar -skipLangAnno -eval -extractFrom G:\Install\freedb\freedb-complete-20090301.merged\rock -reportExtractionToFile -genresBlackListFile smallBlackList.txt -genresWhiteListFile smallWhiteList.txt

G:\Install\freedb\freedb-complete-20090301.merged\rock.CascadedValidator.profiler saved

... many lines omitted ...

Type of Extractor                                                                      ProfilingExtractor
Extracted from                                     G:\Install\freedb\freedb-complete-20090301.merged\rock
Files processed                                                                                       134
Lines processed                                                                                  39687664
Discs dropped                                                                                       31332
Discs extracted                                                                                    692045
Extracting took                                                                              1m:32s:907ms
Profiling took                                                                                  23s:215ms

Exceptions                                      686744 (100.0%)
--------------------------------------------------------------------
CDCreatedWithInvalidPlayTime                      2201 (  0.320497%)
CDCreatedWithoutValidDiscIdException                 7 (  0.001019%)
ExtractedTitleArtistCountMissmatch               16390 (  2.386624%)
ExtractedTitleOffsetCountMissmatch               10976 (  1.598266%)
TrackCreatedWithInvalidPlayTime                    290 (  0.042228%)
TrackCreatedWithoutTitleException                 1468 (  0.213762%)
ValidationFailedForGenreDoesMatchBlackList       13806 (  2.010356%)
ValidationFailedForGenreDoesNotMatchWhiteList   584685 ( 85.138712%)
ValidationFailedForIrregularArtistNameException  10234 (  1.49022%)
ValidationFailedForIrregularTitleException       46350 (  6.749239%)
ValidationFailedForIrregularUseOfParenthesis        11 (  0.001601%)
ValidationFailedForTooManyNonLatinChars            326 (  0.04747%)

This result differs from the first try. This time the extractor dropped most of the extracted cds because their genre was not whitelisted. Another remarkable number of cds was dropped because their genre matched the blacklisted roll.

The report-file Extracting from rock.txt lists every dropped entry and notes the reason why it was dropped. In this run the file grew to 76 megabytes.

Note that the sum of valid and invalid cds does not match the count of the example above. The reason: an invalid, dropped definition might represent more than a single definition. If you drop many definitions you will surely drop some that are intended to represent multiple definitions (see above).

White- and blacklist for extraction

Working with white- and blacklist provides a very simple filter-mechanism. It works on genre titles only. If you want to use any of the lists you should keep three things in mind. First: The extractor does not check both lists for intersection. Second: Any line of such a list is treated as a single word, and a match is given if this word appears anywhere in the genre of the definition in question. Third: Matching is case-insensitive.

Roll   matches   ROCK´N´ROLL  
Rol   matches   pop and roll  
Roll   does not match   rock and rol  
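
Expressed as a SQL analogy, the matching rule corresponds roughly to a case-insensitive LIKE on the genre title. The statement below is illustrative only and uses the genres-table of the four-tabled schema dumped further down; the extractor itself performs this check in Java on the genre string, not in SQL.

-- Illustration only: the keyword 'Roll' matches any genre whose title contains it, case-insensitively
SELECT gtitle
FROM freedbsmall.genres
WHERE LOWER(gtitle) LIKE CONCAT('%', LOWER('Roll'), '%');
-- would match ROCK´N´ROLL above, but not rock and rol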

Overview of parameters for an extractor

-acceptDuplicateDiscs  an extractor started with this switch accepts duplicate definitions instead of dropping them before possibly spooling them. By default this switch is not set (turned off) and duplicate definitions are neither parsed, validated and annotated nor passed to spoolers
-easyExtraction  tells the extractor to be less strict. By default the only character-sequence that may separate an artist and a title is " / ". Any line that might contain an artist as well as a title is searched for the last appearance of this sequence. Using the switch -easyExtraction lets the extractor also look for " - " if it does not find the standard-sequence
-eval  launches an extractor to parse, validate and annotate discs without any further treatment
-extractFrom  specifies the source of extraction. This source can either be a textfile or a directory. The latter will be traversed and every single file in this path will be read and interpreted as a freeDB-formatted disc-definition
-genresBlackListFile  tells the extractor where to read blacklisted genre-keywords
-genresExclusiveWhiteListFile  tells the extractor where to read whitelisted genre-keywords. Only genres read from this file will be taken as genres
-genresWhiteListFile  tells the extractor where to read whitelisted genre-keywords
-noDataStringValidation  an extractor started with this switch does not validate the data read. If this switch is not set, the extractor validates the data read from files and passed to the discs' constructors. If any string (title, artist or genre) of a definition cannot be validated the disc's definition will be rejected. This validation is meant to drop most of the garbage before it gets spooled and passed to the mysql-server. Validation is performed by default
-rejectDiscsWithoutReleaseYear  an extractor started with this switch will drop any disc-definition that comes without a release year set. By default this switch is not set (turned off) and definitions without release year are considered to be valid
-reportExtracting  tells the extractor to issue any message and any error encountered. The log is sent to screen and may produce a tremendous load of messages. If you want to examine these messages, consider the following switch
-reportExtractionToFile  if you set -reportExtractionToFile the extractor reports to a logfile named Extracting from file.txt where file is replaced by the name of the file (or directory) the extractor reads from
-skipLangAnno  tells the extractor neither to guess the language of a disc's meta-data (title, artist's name, genre and titles of tracks) nor to annotate the CD with a language-code. As language-annotation is rather time-consuming, using this switch will speed up extraction

Start connected spooling to a mysql-server

A spooler needs an underlying extractor that reads a file or directory, parses and validates the data, tries to annotate discs with a language-code and finally passes the retrieved disc-definitions on to the spooler. The spooler tries to store these discs in a MySQL-database. Quite often discs that were extracted without problems cause exceptions while the spooler migrates them to the database. That is a consequence of the database-layout and should not bother you - usually several discs the extractor considers to be valid at first are not transferred to the server.

Before you can migrate the (valid) data to the mysql-server you will have to establish the connection. This section tells you what to do to get a mysql-connector going.

Creating a database

The spooler will create your database. But it needs an account that is allowed to carry out operations such as CREATE or DROP on the mysql-server you plan to connect to. There are two kinds of databases: one is based on a two-tabled schema, the other on a four-tabled schema. If you want to do the initial database-creation 'by hand' on your own and need to know the corresponding sql-statements, just start java:

T:\public_html\freedb2mysql>java -jar freedb2mysql.jar -dbName freedbsmall -dumpDBschema

CREATE DATABASE IF NOT EXISTS freedbsmall;
DROP TABLE IF EXISTS freedbsmall.artists;
CREATE TABLE IF NOT EXISTS freedbsmall.artists (
  aid BIGINT(10) UNSIGNED NOT NULL AUTO_INCREMENT,
  aname VARCHAR(255) NOT NULL,
  PRIMARY KEY(aid),
  UNIQUE(aname)
) TYPE = MyISAM;
DROP TABLE IF EXISTS freedbsmall.genres;
CREATE TABLE IF NOT EXISTS freedbsmall.genres (
  gid BIGINT(10) UNSIGNED NOT NULL AUTO_INCREMENT,
  gtitle VARCHAR(255) NOT NULL,
  PRIMARY KEY(gid),
  UNIQUE(gtitle)
) TYPE = MyISAM;
DROP TABLE IF EXISTS freedbsmall.discs;
CREATE TABLE IF NOT EXISTS freedbsmall.discs (
  did BIGINT(10) UNSIGNED NOT NULL AUTO_INCREMENT,
  freedbdiscid BIGINT(10) UNSIGNED NOT NULL,
  aid BIGINT(10) UNSIGNED NOT NULL,
  dtitle VARCHAR(241) NOT NULL,
  gid BIGINT(10) UNSIGNED NOT NULL,
  dreleased YEAR(4) DEFAULT NULL,
  dtracks INT(4) UNSIGNED NOT NULL,
  dseconds INT(4) UNSIGNED NOT NULL,
  dlang ENUM ('deu', 'fra', 'ita', 'por', 'spa', 'eng') DEFAULT NULL,
  UNIQUE(aid, dtitle, dtracks),
  PRIMARY KEY(did)
) TYPE = MyISAM;
DROP TABLE IF EXISTS freedbsmall.tracks;
CREATE TABLE IF NOT EXISTS freedbsmall.tracks (
  did BIGINT(10) UNSIGNED NOT NULL,
  tnumber INT(4) UNSIGNED NOT NULL,
  aid BIGINT(10) UNSIGNED NOT NULL,
  ttitle VARCHAR(255) NOT NULL,
  tseconds INT(4) UNSIGNED NOT NULL,
  PRIMARY KEY(did, tnumber)
) TYPE = MyISAM;

CREATE INDEX artists_index ON freedbsmall.artists (
  aname
);
CREATE INDEX genres_index ON freedbsmall.genres (
  gtitle
);
CREATE INDEX discs_index ON freedbsmall.discs (
  freedbdiscid, aid, dtitle, gid, dreleased, dseconds
);
CREATE INDEX tracks_index ON freedbsmall.tracks (
  ttitle
);

If you prefer a database with only two tables you issue:

T:\public_html\freedb2mysql>java -jar freedb2mysql.jar -createLargeTables -dbName freedblarge -dumpDBschema

CREATE DATABASE IF NOT EXISTS freedblarge;
DROP TABLE IF EXISTS freedblarge.discs;
CREATE TABLE IF NOT EXISTS freedblarge.discs (
  did BIGINT(10) UNSIGNED NOT NULL AUTO_INCREMENT,
  freedbdiscid BIGINT(10) UNSIGNED NOT NULL,
  aname VARCHAR(203) NOT NULL,
  dtitle VARCHAR(187) NOT NULL,
  gtitle VARCHAR(106) NOT NULL,
  dreleased YEAR(4) DEFAULT NULL,
  dtracks INT(4) UNSIGNED NOT NULL,
  dseconds INT(4) UNSIGNED NOT NULL,
  dlang ENUM ('deu', 'fra', 'ita', 'por', 'spa', 'eng') DEFAULT NULL,
  UNIQUE(aname, dtitle, dtracks),
  PRIMARY KEY(did)
) TYPE = MyISAM;
DROP TABLE IF EXISTS freedblarge.tracks;
CREATE TABLE IF NOT EXISTS freedblarge.tracks (
  did BIGINT(10) UNSIGNED NOT NULL,
  tnumber INT(4) UNSIGNED NOT NULL,
  ttitle VARCHAR(255) NOT NULL,
  aname VARCHAR(203) NOT NULL,
  tseconds INT(4) UNSIGNED NOT NULL,
  PRIMARY KEY(did, tnumber)
) TYPE = MyISAM;

CREATE INDEX discs_index ON freedblarge.discs (
  freedbdiscid, aname, dtitle, gtitle, dreleased, dseconds
);
CREATE INDEX tracks_index ON freedblarge.tracks (
  ttitle
);

Overview of parameters for a spooler

Before you start spooling to the newly created database take a look at the parameters for any mysql-connector.

-continueSpoolingToPreExistingTables  tells the connector not to drop existing tables. Instead of a complete (re-)creation the spooler assumes that the needed tables already exist in the database and tries to fill them. If the database or the needed tables are not found the spooler won't work. Omitting this parameter makes the spooler drop any existing table and (re-)create the database from scratch
-createLargeTables  tells the spooler to use the two-tabled database-schema. Omitting this parameter makes the spooler use the default schema with four tables
-dbHost  specifies the name of the (remote) host running the mysql-server. This parameter must be specified even if you connect to your own computer via -dbHost localhost
-dbName  specifies the name of the database used on the mysql-server. Omitting this parameter makes the spooler use the default freedb
-dbPassword  specifies user´s password
-dbPort  specifies the number of the port on the (remote) host. Omitting this parameter makes the spooler use default 3306
-dbUser  specifies the name of the user that is connecting to the server
-dumpDBschema  dumps database-schema used
-jdbcDriver  specifies the name of the jdbc-driver. Omitting this parameter makes the spooler load the default com.mysql.jdbc.Driver
-limitArtistCacheTo  tells the four-tabled mysql-connector to limit its internal artist-cache. Don't mess around with this unless you have encountered an OutOfMemoryError (see the note on OutOfMemoryErrors at the end of the spooling examples)
-limitGenreCacheTo  tells the four-tabled mysql-connector to limit its internal genre-cache. Don't mess around with this unless you have encountered an OutOfMemoryError (see the note on OutOfMemoryErrors at the end of the spooling examples)
-reportSpooling  tells the spooler to issue any message and any error encountered. The log is sent to screen and may produce a tremendous load of messages. If you want to examine these messages, consider the following switch
-reportSpoolingToFile  if you set -reportSpoolingToFile the spooler reports to a logfile named Spooling to database.txt where database is replaced by the name of the database the connector spools to
-skipColumnsWidthsCalc  use some default-widths for columns in database instead of determining them from files extracted and profiled
-skipIndexing  tells the connector not to create indices on the spooled tables when spooling is done. Omitting this parameter leads to default-behaviour and the connector instructs the server to create indices after all discs are spooled to database
-spool  tells the extractor to employ a connector and spool discs

Spooling to a mysql-server using two tables without filtering

You are still in the same starting position as above. In addition we assume that a mysql-server is running on localhost and that the user root can connect without a password.

Now you decide to spool any soundtrack:

T:\public_html\freedb2mysql>java -jar freedb2mysql.jar -extractFrom G:\Install\freedb\freedb-complete-20090301.merged\soundtrack -spool -createLargeTables -dbHost localhost -dbUser root

G:\Install\freedb\freedb-complete-20090301.merged\soundtrack.DefaultDiscValidator.profiler saved
G:\Install\freedb\freedb-complete-20090301.merged\soundtrack.DefaultDiscValidator.optimizer saved

Length to optimize 3 stringprofiles to:     496

CD Artist (maximal length, optimal length): 148, 190
CD Title (maximal length, optimal length):  148, 190
CD Genre (maximal length, optimal length):   73, 116

Type of Extractor                                                                       SpoolingExtractor
Extracted from                               G:\Install\freedb\freedb-complete-20090301.merged\soundtrack
Files processed                                                                                        25
Lines processed                                                                                   6840910
Discs dropped                                                                                        9486
Discs extracted                                                                                     94538
Extracting took                                                                                 31s:157ms

Type of CDAdapter                                                             ValidatingAnnotatingAdapter
Discs recieved                                                                                      94538
Discs validated                                                                                     81516
Valid discs changed                                                                                 31835
Type of Comparator                                    Eurocentric profiles (deu, eng, fra, ita, por, spa)
Discs annotated                                                                                     18886
Language guessing                                                                            3m: 1s:716ms
Spooled to                                                                               freedb@localhost
Discs failed to spool                                                                                4941
Discs spooled                                                                                       76575
Discs in database                                                                                   76575
DB-Creation                                                                                         312ms
Spooling to DB                                                                               3m:24s: 33ms
Release-Year-Updates                                                                             1s:513ms
Validation                                                                                      41s:713ms
Index-Creation                                                                                  32s:328ms

Exceptions                                      80593 (100.0%)
------------------------------------------------------------------
CDSpoolerFailedWithDiscInsertion                 4941 (  6.1308%)
ComparedProfileDiffersTooMuch                   46545 ( 57.75315%)
ComparedProfileTooShort                         16085 ( 19.9583%)
ValidationFailedForIrregularArtistNameException  1811 (  2.24709%)
ValidationFailedForIrregularGenreTitleException   164 (  0.20349%)
ValidationFailedForIrregularTitleException      11047 ( 13.70714%)

We focus on the last part of the report above. A remarkable count of discs (about 13,000) has been dropped by the built-in default-validator (which can be switched off using -noDataStringValidation), but another question arises: Why are so many discs (about 4,900) not spooled to the database? The common reason: a disc that is not spooled has been rejected by the database because the database already contains that disc. The database does not use the original disc-id from freeDB to identify the spooled discs.

Both database-schemata used by this package declare the combination "artist - title of disc - number of tracks on disc" to be unique. So if the spooler tries to store two discs with the same title by the same artist with the same count of tracks, an instance of the exception CDSpoolerFailedWithDiscInsertion occurs and the second disc will not be spooled to the database.

The original freeDB-discid is stored in the database, but it is not used to determine whether a disc has already been spooled to the database or not.
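
This behaviour can be pictured with two hand-written INSERT-statements against the two-tabled schema dumped above. The values are made up for illustration; the spooler generates its own statements and may set further columns.

INSERT INTO freedblarge.discs (freedbdiscid, aname, dtitle, gtitle, dtracks, dseconds)
  VALUES (1111111111, 'Some Artist', 'Some Title', 'rock', 10, 2500);
-- A second definition with the same artist, title and track count violates
-- UNIQUE(aname, dtitle, dtracks) and is rejected, even though its freedbdiscid differs:
INSERT INTO freedblarge.discs (freedbdiscid, aname, dtitle, gtitle, dtracks, dseconds)
  VALUES (2222222222, 'Some Artist', 'Some Title', 'rock', 10, 2500);
-- MySQL answers with a duplicate-key error (error 1062), which the spooler
-- reports as CDSpoolerFailedWithDiscInsertion.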

Spooling to a mysql-server using four tables with black- and whitelist

We assume that you dropped the tables created above or start over with a freshly installed mysql-server. With this next pass you try to get any rock that relates to metal but drop any rock that rolls. You can reuse the lists you already created (see above) and issue

T:\public_html\freedb2mysql>java -jar freedb2mysql.jar -extractFrom G:\Install\freedb\freedb-complete-20090301.merged\rock -spool -dbHost localhost -dbUser root -genresBlackListFile smallBlackList.txt -genresWhiteListFile smallWhiteList.txt

G:\Install\freedb\freedb-complete-20090301.merged\rock.CascadedValidator.profiler saved
G:\Install\freedb\freedb-complete-20090301.merged\rock.CascadedValidator.optimizer saved

Length to optimize 3 stringprofiles to:     496

CD Artist (maximal length, optimal length):  69, 158
CD Title (maximal length, optimal length):  107, 196
CD Genre (maximal length, optimal length):   52, 142

Type of Extractor                                                                       SpoolingExtractor
Extracted from                                     G:\Install\freedb\freedb-complete-20090301.merged\rock
Files processed                                                                                       134
Lines processed                                                                                  39687664
Discs dropped                                                                                       31332
Discs extracted                                                                                    692045
Extracting took                                                                              2m: 1s:169ms

Type of CDAdapter                                                             ValidatingAnnotatingAdapter
Discs recieved                                                                                     692045
Discs validated                                                                                     36633
Valid discs changed                                                                                  7131
Type of Comparator                                    Eurocentric profiles (deu, eng, fra, ita, por, spa)
Discs annotated                                                                                      8543
Language guessing                                                                            1m: 3s:295ms
Spooled to                                                                               freedb@localhost
Discs failed to spool                                                                                4750
Discs spooled                                                                                       31883
Discs in database                                                                                   31883
DB-Creation                                                                                      2s:391ms
Spooling to DB                                                                               2m:33s:732ms
Release-Year-Updates                                                                             1s:508ms
Validation                                                                                      15s:712ms
Index-Creation                                                                                   5s:891ms

Exceptions                                      688252 (100.0%)
--------------------------------------------------------------------
CDSpoolerFailedWithDiscInsertion                  4750 (  0.690154%)
ComparedProfileDiffersTooMuch                    12463 (  1.810819%)
ComparedProfileTooShort                          15627 (  2.270534%)
ValidationFailedForGenreDoesMatchBlackList       13806 (  2.005951%)
ValidationFailedForGenreDoesNotMatchWhiteList   584685 ( 84.952168%)
ValidationFailedForIrregularArtistNameException  10234 (  1.486955%)
ValidationFailedForIrregularTitleException       46350 (  6.734451%)
ValidationFailedForIrregularUseOfParenthesis        11 (  0.001598%)
ValidationFailedForTooManyNonLatinChars            326 (  0.047366%)

Using such a four-tabled connector might yield OutOfMemoryErrors. If you encounter one, either limit the connector's internal caches or increase the memory of java's VM using switches like -Xms128m -Xmx512m.

Querying a freedb2mysql-database

Having built a database is nice - querying it is even neater. The retrieval-API of this package is barely worked out and a user-interface does not exist; even querying can only be done via the command-line. Anyhow it works and offers access to previously spooled data on a mysql-server. This section shows how you retrieve discs selected by an artist's name, a disc's title or freeDB's discId.

Overview of parameters for queries

To gain access to a mysql-server and its freedb2mysql-database, you have to use the parameters that configure a spooler mentioned above (Overview of parameters for a spooler). Beyond this you can use the following switches and parameters for queries:

-queryServerForArtist   specifies the name of an artist whose discs will be dumped. The name might contain SQL-wildcards such as % or _
-queryServerForDisc   specifies the title of discs that will be dumped. The title might contain SQL-wildcards such as % or _
-queryServerForFreeDBDiscId   specifies a hexadecimal freeDB-discId. This freeDB-discId might represent a single disc as well as multiple discs. Please note that this package uses the freeDB-discId merely as one attribute amongst others. Discs in a freedb2mysql-database are not identified by the original freeDB-discId. As the package performs some filtering while the freedb2mysql-database is built, you will surely not find all freeDB-discIds in a freedb2mysql-database - the latter is just a (smaller) port of the original freeDB
-serverDump   this switch dumps the complete content of the database connected
-showTracks   this switch makes the output of results more detailed

Examples for queries

Let's say you want to examine different (mis-)spellings of the name AC/DC, your freedb2mysql-database is named freedblarge and the mysql-server operates on localhost. Using the SQL-wildcard % for any kind and any count of characters in between, you type

T:\public_html\freedb2mysql>java -jar freedb2mysql.jar -dbHost localhost -queryServerForArtist AC%DC -dbName freedblarge

Querying freedblarge (2239351 entries) for discs by artists named AC%DC

Back in Black by AC - DC (10 tracks, 2523 sec, id 7809db0a, genre: rock)
... many lines omitted ...
Rare, Rarer, Rarities by AC~DC (14 tracks, 3522 sec, released 2002, id cc0dc20e, genre: Rock)

810 discs found
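
A query like this presumably boils down to a LIKE-comparison on the discs-table. The following statement is a hand-written approximation against the two-tabled schema, not the statement the package actually generates:

-- Approximate equivalent of -queryServerForArtist AC%DC on the two-tabled schema
SELECT aname, dtitle, dtracks, dseconds, dreleased, gtitle, freedbdiscid
FROM freedblarge.discs
WHERE aname LIKE 'AC%DC';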

Later on you wonder what disc(s) and tracks might be identified by the freeDB-discId c80ac00f in the database freedbsmall, so you type

T:\public_html\freedb2mysql>java -jar freedb2mysql.jar -dbHost localhost -dbName freedbsmall -queryServerForFreeDBDiscId c80ac00f -showTracks

Querying freedbsmall (2239351 entries) for id c80ac00f

Machine 15 by Millencolin (15 tracks, 2752 sec, released 2008, id c80ac00f, genre: Punk)
1	 Machine 15 (by Millencolin, 149 sec)
2	 Done is Done (by Millencolin, 230 sec)
3	 Detox (by Millencolin, 218 sec)
4	 Vicious Circle (by Millencolin, 251 sec)
5	 Broken World (by Millencolin, 188 sec)
6	 Come On (by Millencolin, 218 sec)
7	 Centerpiece (by Millencolin, 11 sec)
8	 Who's Laughing Now (by Millencolin, 187 sec)
9	 Brand New Game (by Millencolin, 208 sec)
10	 Ducks & Drakes (by Millencolin, 198 sec)
11	 Turnkey Paradise (by Millencolin, 195 sec)
12	 Route One (by Millencolin, 210 sec)
13	 Danger For Stranger (by Millencolin, 179 sec)
14	 Saved By Hell (by Millencolin, 218 sec)
15	 End Piece (by Millencolin, 92 sec)

Abred by Andarta (15 tracks, 2752 sec, released 1980, id c80ac00f, genre: Celtic)
1	 Abred (by Andarta, 215 sec)
2	 Laeron Avel (by Andarta, 129 sec)
3	 Distro Marzhin (by Andarta, 320 sec)
4	 Eun Den Yaouank (by Andarta, 93 sec)
5	 Eun De A Oen E Pourmen (by Andarta, 198 sec)
6	 Bale Arzur 1 (by Andarta, 103 sec)
7	 Bale Arzur 2 (by Andarta, 247 sec)
8	 Melusine (by Andarta, 208 sec)
9	 Ton D'Ar Ger (by Andarta, 188 sec)
10	 Ar Re C'Hlaz (by Andarta, 156 sec)
11	 Soniou Da Zansal (by Andarta, 185 sec)
12	 Evel An Avel (by Andarta, 160 sec)
13	 Ton Bale (by Andarta, 89 sec)
14	 Ar Falc'Hon (by Andarta, 199 sec)
15	 Falskredenn (by Andarta, 262 sec)

2 discs found
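
Against the four-tabled schema such a lookup is essentially a join of discs, artists and genres (plus tracks for -showTracks). The sketch below additionally assumes that the freedbdiscid column stores the numeric value of the hexadecimal freeDB-discId; that is an assumption made for illustration, and the statement the package really issues may differ.

-- Hand-written approximation of -queryServerForFreeDBDiscId c80ac00f on the four-tabled schema
SELECT d.dtitle, a.aname, g.gtitle, d.dtracks, d.dseconds, d.dreleased
FROM freedbsmall.discs d
JOIN freedbsmall.artists a ON a.aid = d.aid
JOIN freedbsmall.genres g ON g.gid = d.gid
WHERE d.freedbdiscid = CONV('c80ac00f', 16, 10);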

What database-schemata are used?

There are two database-schemata. The two-tabled version provides a scenario to practice basic SELECT-statements, while the second distributes the same data among four tables and is more normalized. The latter schema is intended to provide a background for practicing more complex SELECT-statements.

I did not examine the databases too carefully, but one similarity is obvious: both databases use MyISAM-tables and they both grow big. If you spool a complete database from freeDB and create indices, the corresponding directories need about 2.3 to 2.6 gigabytes of discspace. The four-tabled schema produces slightly smaller databases than the two-tabled schema, but both grow big anyway.
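
As an example of the kind of 'more complex' statement the four-tabled schema invites, the following hand-written query (illustrative only, not part of the package) lists the ten artists with the most discs by joining discs and artists:

-- Which artists have the most discs in a four-tabled freedb2mysql-database?
SELECT a.aname, COUNT(*) AS disc_count
FROM freedbsmall.discs d
JOIN freedbsmall.artists a ON a.aid = d.aid
GROUP BY a.aname
ORDER BY disc_count DESC
LIMIT 10;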

Auxiliary function: Merging

The database downloadable from freeDB comes as a huge tar-ball containing an incredible number of tiny text-files (2,649,632 by February 2009, distributed amongst just eleven subdirectories). This number of files is not sensibly manageable using a file-system such as NTFS. The first approach is therefore to unite many files into one, i.e. to merge them into only a few big files.

Merging freeDB-files is not just an accommodation for Windows: a pleasant side effect of working on a few big files as the database to extract from is a significantly increased speed when retrieving data from these large text-files. Iterating over and parsing just 532 files takes an extractor less than 10 minutes. An extractor working on the original set of text-files needs about 10 hours to do the same.

But there is also a drawback to merging files: about 4000 parsable discs get lost. As these are just about 0.2 percent of the original data, merging the data before spooling still seems to be a worthwhile trade-off.

Because there are so many tiny files in these freeDB-directories you might end up with an Exception in thread "main" java.lang.OutOfMemoryError: Java heap space. An easy way to get rid of this problem is to increase java's heap-space with a call like java -Xms64m -Xmx256m -jar freedb2mysql.jar -merge -freeDBSourceDir freedb-complete-20090301, which starts java with an initial heap size of 64 megabytes and a maximum heap size of 256 megabytes.

To configure merging you use these options:

-freeDBSourceDir   specifies the directory that contains the freeDB-data
-freeDBTargetDir   specifies the directory the concatenated data is written to. If this directory is not given the data will be written in a newly created directory with the same name as freeDBSourceDir and the extension .merged
-merge   starts merging before any other extracting / spooling begins
-quiet   suppresses output of regular merging-operation
-targetFileSize   specifies the count of bytes for a target file within the .merged-directories. Such a target-text-file will be filled up with single files from freeDB until its byte-count reaches the count specified here. If you do not set a different count the default 5,000,000 (5 megabytes) is used
-verifyMerge   verifies merged data by comparison

Notes

19.3.2010   ICU 4.4 is integrated and tested with Windows XP as well as Linux; it became the default detector (again).
11.3.2010   Shifted the tests from the inner source-package to an external dir and grouped them into a JUnit-Suite. Doing so messed up the entire SVN, which has since been restructured.
28.1.2010   Spooling to two-tabled (large) databases (LATIN2-encoded) results in a data-set that contains 2,002,415 entries. The same directory freedb-20091201 transferred into a four-tabled (small) database (UTF8-encoded) ends up with just 1,616,090 discs stored - the UTF8-dataset loses discs in the range of 20% compared to LATIN2-encoding.
18.12.2009   As David Alimian pointed out, the importer assumed the wrong encoding for the text-files read from FreeDB and only got a small amount of data right. To circumvent this problem of mistaking the encoding and losing much data the importer now integrates Chardet and ICU.
For now, merging text-files assumes these files to be UTF-8-encoded and uses ICU to check for another encoding. If ICU detects another encoding and the VM supports it, this other encoding is used when parsing the text.
Along with these changes on the text-file-side, changes on the database-side became necessary. The two-tabled (large) databases use a static row-width and stick to LATIN2-encoding. The four-tabled (small) database changed its encoding to UTF8.
8.11.2009   So far the GUI seems to be stable enough to announce it right now: freedbimporter has a (rather coarse) Swing-GUI! The sources available here should be considered frumpy stuff, kept for historical reasons only. So if you want to employ this converter please stick to the jars provided by SourceForge.
9.9.2009   The svn-code freedbimporter now includes a minimal cddb-server. If the webserver wispa.gotdns.org welcomes you, the same address will provide cddb-access to the four-tabled freedbsmall via port 8880.
3.4.2009   This upload is presumably going to be the last hand-crafted freedb2mysql-package as I try to cope with SourceForge's SVN. Try svn co https://freedbimporter.svn.sourceforge.net/svnroot/freedbimporter freedbimporter
24.2.2009   Started to implement generics.
9.11.2008   After removing a last tiny but devastating handbrake (HashMap) language-annotation became default.
22.9.2008   The implementation of a TrackicalDisciminator entailed several slight changes that have not been tested completely.
2.9.2008   Slightly changed sql-statements CREATE INDEX with the intention to develop a freedb-server utilizing this package someday.
2.7.2008   Slightly changed the tracks-tables with the intention to develop a freedb-server utilizing this package someday.
28.3.2008   Switched the IDE to eclipse and slightly changed the layout of FreeDB2MySQL's jar. The spooler has also changed a bit and will validate transferred discs with the next pass.
20.1.2008   Changed the database-schemata and got rid of the freeDBdiscs-table. freeDB's discId is now an attribute that is stored within the discs-table. The retrieval-API developed over the past months is now accessible via the command-line.
2.8.2007   The latest refactoring ripped the script-generator out of the repository. There is just one kind of spooler left that builds a mysql-database from the freeDB-files. This connector is meant to carry out data-retrieval from the database in the future.
13.6.2007   An attempt to get rid of data-garbage led to the introduction of different validators. Along with their introduction the extractors and the spoolers underwent some changes.
10.5.2006   While testing the latest 5-something-version of MySQL I came across the necessity to tell a server explicitly that all tables to create are MyISAM-tables. This is reflected in the latest version of this package and should give some more MySQL-compatibility.
16.3.2006   Starter modified along with the introduction of a utility-class that optimizes column-widths in the schema of a three-tabled database. Optimizing itself is not optimized and takes about 20 minutes.
24.10.2005   Finished a bit of refactoring with the intention that one day it should be possible to profile the data of a disc to an extent that one can judge by the frequency of characters in a cd's artist-name and titles whether this cd is a German, an English, a French or a Spanish cd.
22.5.2005   Released Disc-Information-Service (DIS) that utilizes databases created by this package.
12.4.2005   While spooling freedb-win-20050408 and its misc-dir an OutOfMemoryError occurred.
29.3.2005   After revising this page its URL has been registered at some search-engines.
27.1.2005   After removing some handbrakes the spooling works acceptably, if slowly. Spooling a complete freeDB-dataset requires eight to ten hours on an elderly athlon 500.
23.1.2005   After several devastating attempts to spool freeDB-databases the MySQLConnectors have been modified. Testing still goes on but the performance stays bad. Either the database-schemata are useless or about 1.5 million discs and 15 million tracks cannot be spooled any faster than they are spooled today.
10.8.2004   Released the first version of the package without elaborated API-documentation and with only little testing.

Links

E-Mail: Christian Kruggel, Version 9.4, 21.8.2013