This document is continually changing! One of the ways it gets changed is by people communicating with me using comments. In the future, I will host a wiki for this purpose – but for the time being please help me out by posting your suggested changes/improvements as comments!
In my previous post I described the troubles I had with standalone document management software. Many of the issues I had were related to a lack of flexibility and a lack of integration with my CMS of choice: Drupal.
At first glance (and even after looking quite hard), Drupal seems weak when it comes to document management. But as with any Drupal solution, a careful examination of the available modules might turn up the ingredients for the perfect recipe!
In this article, I’m going to describe the steps required to get off the ground with a Drupal-based document management solution that provides:
- Organization of documents
- Revision control
- WebDAV access
- Rich metadata
- Indexing for search
- In-browser display of documents
- Document conversion services
- All the goodness you get from building it inside Drupal
- Free authentication
- Free administration interface
- Integration with other Drupal modules (Views, anyone?)
- Awesome community of developers
Getting started
I’d recommend testing this out on a fresh install of Drupal 6.6: should you run into difficulty, the number of modules on an established site could make troubleshooting harder. Once you’ve got it down, you can move on to your active development site.
Thanks to Arto Bendiken, Miglius Alaburda, Justin Miller, Ben Lavender, Frank Febbraro, and of course Moshe Weitzman.
This article is based on “Setting up your system for file conversions with File Framework”. Ben gives a very helpful and accurate rundown of what it takes to get going under CentOS. Since I was trying it out under Ubuntu, I thought I’d spend the time documenting my troubles and include instructions for some extra bells and whistles.
System stuff
First things first, let’s get all the packages we need:
sudo apt-get install php5 php5-dev php-pear make php-getid3 libmagic-dev clamav swftools unrtf poppler-utils catdoc ghostscript tzdata tzdata-java alsa-tools alsa-utils libx11-6 libxext6 libxi6 libxtst6 asoundconf-gtk libfreetype6 libpng12-0 libjpeg62 giflib-tools libsm6 openjdk-6-jdk openoffice.org openoffice.org-headless code2html pstotext
sudo pecl install Fileinfo
sudo pear install http://download.pear.php.net/package/HTTP_WebDAV_Server-1.0.0RC4.tgz
sudo pear install http://download.pear.php.net/package/HTTP_WebDAV_Client-1.0.0.tgz
If you have trouble installing the PEAR packages, the version has probably changed; visit the HTTP packages page on pear.php.net to find the current release.
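Before moving on, a quick sanity check can save some head-scratching later. The sketch below is my own helper (the `missing_tools` name is hypothetical) that prints every command it cannot find on your PATH and returns non-zero if anything is missing.

```bash
#!/bin/bash
# Hypothetical helper: print each named command that is not on the PATH.
# Returns 0 when every command is present, 1 otherwise.
missing_tools() {
    local status=0 tool
    for tool in "$@"; do
        if ! command -v "$tool" >/dev/null 2>&1; then
            printf '%s\n' "$tool"
            status=1
        fi
    done
    return "$status"
}
```

For example, `missing_tools php pear pecl unrtf pdftotext catdoc gs clamscan pdf2swf code2html pstotext java` should print nothing once the packages above are installed; the list of binaries is my assumption of what those packages provide, so adjust it to taste. For the PECL extension, `php -m | grep -i fileinfo` should list Fileinfo; if it doesn’t, php.ini probably still needs an `extension=fileinfo.so` line.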
JODConverter
We also need JODConverter. It’s a few .jar files that we’ll put in a directory under /opt. JODConverter is the piece that actually manages the conversion process through OpenOffice.
cd /opt && sudo wget http://internap.dl.sourceforge.net/sourceforge/jodconverter/jodconverter-2.2.1.zip && sudo unzip jodconverter-2.2.1.zip && sudo mv jodconverter-2.2.1 jodconverter
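Once the OpenOffice service described below is running, converting a document from the shell is a handy way to check that JODConverter can reach it. This is a sketch under assumptions: that the 2.2.1 zip ships a command-line jar at lib/jodconverter-cli-2.2.1.jar, and that the CLI connects to OpenOffice on port 8100 by default. The `jod_convert` function and the `JOD_HOME`/`JAVA` variables are my own, not part of JODConverter.

```bash
#!/bin/bash
# Hypothetical wrapper around the JODConverter 2.x command-line jar.
# Assumes the jar lives at $JOD_HOME/lib/jodconverter-cli-2.2.1.jar (JOD_HOME defaults to /opt/jodconverter).
# Usage: jod_convert input-file output-extension, e.g. jod_convert report.doc pdf
# Set JAVA=echo to print the java command instead of running it.
jod_convert() {
    local input="$1" ext="$2" base out jar
    base="${input##*/}"
    case "$base" in
        *.*) out="${input%.*}.$ext" ;;  # report.doc -> report.pdf (same directory)
        *)   out="$input.$ext" ;;       # no extension: just append one
    esac
    jar="${JOD_HOME:-/opt/jodconverter}/lib/jodconverter-cli-2.2.1.jar"
    ${JAVA:-java} -jar "$jar" "$input" "$out"
}
```

Running `JAVA=echo jod_convert /tmp/report.doc pdf` previews the command; dropping `JAVA=echo` actually converts the file, which only works while soffice is listening.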
Run OpenOffice as a service
Long story short: use a version later than 2.3 to avoid problems running OpenOffice ‘headless’, which is essential for the file conversion process.
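To check whether your OpenOffice is new enough, you can compare version strings in the shell. A minimal sketch, assuming GNU sort with version ordering (`sort -V`); `version_gt` is a hypothetical helper name.

```bash
#!/bin/bash
# Hypothetical helper: succeed if version $1 is strictly newer than version $2.
# Relies on GNU coreutils' "sort -V" to order version strings naturally.
version_gt() {
    [ "$1" != "$2" ] &&
        [ "$(printf '%s\n' "$1" "$2" | sort -V | tail -n 1)" = "$1" ]
}
```

On Ubuntu, `dpkg-query -W -f='${Version}\n' openoffice.org-core` prints the installed package version, which may carry an epoch prefix such as `1:`; strip that and test with something like `version_gt 2.4.1 2.3 && echo "new enough"`.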
edit: I realized that the OpenOffice service really needs to run as www-data (the web server user), so an init script like this one is necessary.
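#!/bin/bash
#
# description: OpenOffice headless service for File Framework conversions
#
WEBUSER=www-data
SOFFICE=/opt/openoffice.org3/program/soffice
export LANG=en_US.UTF-8

start() {
    echo -n "Starting OpenOffice service: "
    # Run soffice as the web server user, accepting UNO connections on localhost:8100
    sudo -u "$WEBUSER" "$SOFFICE" -headless -accept="socket,host=127.0.0.1,port=8100;urp" -nofirststartwizard &
    echo "OpenOffice started"
}

stop() {
    echo -n "Stopping OpenOffice service: "
    # Only kill soffice processes owned by the web server user
    pkill -u "$WEBUSER" soffice
    echo "OpenOffice stopped"
}

show_status() {
    # Ubuntu has no RedHat-style "status" helper, so look for the process directly
    if pgrep -u "$WEBUSER" soffice > /dev/null; then
        echo "OpenOffice is running"
    else
        echo "OpenOffice is not running"
        return 3
    fi
}

case "$1" in
    start)
        start
        ;;
    stop)
        stop
        ;;
    status)
        show_status
        exit $?
        ;;
    restart|reload|condrestart)
        stop
        sleep 2    # give soffice a moment to release port 8100
        start
        ;;
    *)
        echo "Usage: $0 {start|stop|restart|reload|status}"
        exit 1
        ;;
esac
exit 0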
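To install the script, save it as something like /etc/init.d/openoffice (the name is my choice), make it executable with chmod, and register it with `sudo update-rc.d openoffice defaults` so it starts at boot. Because soffice takes a few seconds to come up, a conversion attempted right after `start` may fail. The hypothetical `wait_for_port` helper below polls until something accepts connections on a port; it is a sketch that relies on bash’s /dev/tcp redirection, so run it with bash rather than sh.

```bash
#!/bin/bash
# Hypothetical helper: wait until host:port accepts TCP connections.
# Usage: wait_for_port host port [tries]  (tries defaults to 10, one second apart)
# Uses bash's /dev/tcp/host/port redirection; under plain sh it always fails.
wait_for_port() {
    local host="$1" port="$2" tries="${3:-10}" i=0
    while :; do
        # Open (and immediately close) a connection in a subshell;
        # success means something is listening.
        if (exec 3<>"/dev/tcp/$host/$port") 2>/dev/null; then
            return 0
        fi
        i=$((i + 1))
        [ "$i" -ge "$tries" ] && return 1
        sleep 1
    done
}
```

For example: `sudo /etc/init.d/openoffice start && wait_for_port 127.0.0.1 8100 30 && echo "OpenOffice is ready"`.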
If you want OpenOffice 3 like I’m using, you might want to remove the 2.4 packages with apt-get remove, then download the .deb packages from openoffice.org. I installed them by extracting the archive, cd’ing into the folder, and running
sudo dpkg -i *.deb
then doing the same in the desktop-integration folder. I can’t really recommend OOo3, though: the Ubuntu folks don’t have it in the repos, and the GUI is very crash-happy.
Drupal stuff
Clean URLs
If you don’t have clean URLs working already, pop over to the Drupal.org page describing how to set them up. Clean URLs shouldn’t be necessary, but due to a bug currently in Bitcache, they are.
Install Drush
If you aren’t using the Drush module, I highly recommend it. Although it isn’t related to or necessary for this project, it has become one of my favorite modules since I discovered it a day ago. It provides a familiar way to install and update your packages, and a number of other modules extend its functionality.
- Install the Drush module by downloading the tarball to your modules directory (sites/all/modules) and extracting it.
- Go to your modules page in Drupal and enable Drush and its associated modules. You won’t be able to turn on the simpletest runner module; that’s fine. Also, I wasn’t able to use the CVS support, so I have that disabled as well.
One last thing: you need to add a symlink to drush.php somewhere in your PATH. I just echoed the PATH variable and picked the directory that looked best. Change the paths below to whatever works on your system.
% echo $PATH
/home/hopkinsju/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:/usr/games
% ln -s /var/www/drupal/sites/all/modules/drush/drush.php /home/hopkinsju/bin/drush
Now you should be able to type ‘drush’ and the computer will know what you’re talking about.
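If you’d rather not eyeball the PATH, a small helper can pick the first directory in it that exists and that you can write to. This is a hypothetical sketch (`pick_bin_dir` is my name for it), and the first writable entry may not be the one you want, so check what it prints before linking.

```bash
#!/bin/bash
# Hypothetical helper: print the first entry of a colon-separated PATH-style
# string that is an existing, writable directory. Returns 1 if none qualifies.
pick_bin_dir() {
    local IFS=':' dir
    for dir in $1; do
        if [ -d "$dir" ] && [ -w "$dir" ]; then
            printf '%s\n' "$dir"
            return 0
        fi
    done
    return 1
}
```

Usage, only linking if a directory was found: `dir=$(pick_bin_dir "$PATH") && ln -s /var/www/drupal/sites/all/modules/drush/drush.php "$dir/drush"`.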
Add required modules with Drush
Now we just do this to get all the modules we need:
drush pm install bitcache cck dav fileframework rdf views fileserver #FTW!
Drush will go out and grab the latest version of each module and extract it into your ‘sites/all/modules’ directory.
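Once Drush finishes, you can confirm that each project landed where expected. A sketch, assuming each project extracts into a directory named after it under sites/all/modules (true for the names above as far as I know, but treat it as an assumption); `missing_modules` is a hypothetical helper.

```bash
#!/bin/bash
# Hypothetical helper: print each module name that has no directory under the given modules dir.
# Usage: missing_modules /var/www/drupal/sites/all/modules bitcache cck dav ...
missing_modules() {
    local dir="$1" status=0 mod
    shift
    for mod in "$@"; do
        if [ ! -d "$dir/$mod" ]; then
            printf '%s\n' "$mod"
            status=1
        fi
    done
    return "$status"
}
```

For example, `missing_modules /var/www/drupal/sites/all/modules bitcache cck dav fileframework rdf views fileserver` should print nothing after a successful download.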
note: As of this writing, bitcache-alpha3 has a bug. Using alpha3 results in the error “Fatal error: Unsupported operand types in serverpath/includes/common.inc on line 1546”. To resolve this, use either the alpha2 or the dev version of the Bitcache module.
A few other bits
File Framework can extract metadata from, and play, Flash and MP3 files. You only need to add a couple of things to the File Framework vendor folder:
edit: The commands below should get you going without a problem, but I want to clarify two things. You MUST use the ‘slim’ version of the XSPF player. Also, the path to getID3 should be /vendor/getid3, and it should contain a directory /vendor/getid3/getid3 holding the different modules.
update: there are new versions of getID3 and Flowplayer as of Mar 18, 2009, and you also need to make folders for them. I’ll update the lines in a bit.
cd /path/to/drupal/sites/all/modules/fileframework/vendor
wget http://voxel.dl.sourceforge.net/sourceforge/getid3/getid3-1.7.9.zip
unzip getid3-1.7.9.zip
wget http://flowplayer.org/releases/flowplayer/flowplayer-3.0.7.zip
unzip flowplayer-3.0.7.zip
wget http://voxel.dl.sourceforge.net/sourceforge/musicplayer/xspf_player_slim-correct-0.2.3.zip
unzip xspf_player_slim-correct-0.2.3.zip
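Since the archives may not unpack into the folders File Framework expects, it’s worth confirming the getID3 layout from the edit note above before enabling anything. This is a hypothetical check (`check_getid3_layout` is my name) that only tests for the vendor/getid3/getid3 directory; move the extracted files around until it passes.

```bash
#!/bin/bash
# Hypothetical check of the getID3 layout described above:
# <vendor>/getid3 must contain a getid3 subdirectory holding the library modules.
check_getid3_layout() {
    local vendor="$1"
    if [ -d "$vendor/getid3/getid3" ]; then
        echo "ok: $vendor/getid3/getid3"
        return 0
    fi
    echo "missing: $vendor/getid3/getid3" >&2
    return 1
}
```

Run it from anywhere with the vendor path, e.g. `check_getid3_layout /path/to/drupal/sites/all/modules/fileframework/vendor`.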
Enable the modules
Visit your modules page and enable the modules you need. When I first attempted this, I ran into an error because I had enabled a module without first enabling the modules it required (I think it was the RDF API module that needed to be enabled before File formats). You’ll want to actually look at what you’re installing rather than just checking all the boxes, of course. But basically: check all the boxes ;)
Drupal admin area things
- Visit admin/settings/dav/dav_fs and save the page to create the DAV directory
- Enable the DAV server at admin/settings/dav
- If you want HTML highlighting for text files, configure it at admin/settings/file/format/text
- Enable antivirus scanning at admin/settings/file/antivirus (I chose to run it as a program)
- Enable file formats at admin/settings/file/format
- Tell the Fileserver module to use the ‘Files’ vocabulary. This enables automatic creation of file nodes when items are added to that folder via WebDAV.
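Before pointing the antivirus setting at ClamAV, it’s worth confirming that clamscan works from the shell. clamscan exits with 0 when no virus is found, 1 when one is found, and 2 on errors; the hypothetical `scan_file` wrapper below just translates those codes into words. The `CLAMSCAN` override is my own addition so the function can be tried without ClamAV installed.

```bash
#!/bin/bash
# Hypothetical wrapper: run clamscan on one file and report the result as a word.
# clamscan exit codes: 0 = no virus found, 1 = virus found, anything else = error.
scan_file() {
    local rc=0
    ${CLAMSCAN:-clamscan} --no-summary "$1" >/dev/null 2>&1 || rc=$?
    case "$rc" in
        0) echo clean ;;
        1) echo infected ;;
        *) echo "error ($rc)" ;;
    esac
    return "$rc"
}
```

Running `scan_file` on an ordinary upload should print `clean`, while the standard EICAR test file should come back as `infected`. Since Drupal runs the scanner as the web server user, try it as www-data too (for example via `sudo -u www-data clamscan somefile`).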
Please post your comments if you can improve on what I’ve done!
Happy document managing!