Improving Magento ImportExport module with Better file format and Google Docs integration

September 16, 2012

In this post I’ll run through two ways that we improve the default Magento ImportExport module when importing products to Magento stores. The two improvements I’ll cover are: importing directly from a Google Docs spreadsheets and improving the multi-line format to make generating the import files easier.

Background to the Magento ImportExport Module

The ImportExport module itself is such a drastic improvement on the old DataFlow module it’s almost unbelievable, so let me first start by saying it’s not half bad to start with. Andreas does a brilliant job of describing the current file format, it’s in German but Google Translate plus the excellent example tables will make it quite readable. Vinai’s presentation at Magento Imagine 2011 is also required reading to understand how the new module works, and most importantly how easy it is to extend.

The two things I didn’t really like about the current functionality were the need to get a single local file to the server for processing and the multi-line format for configurable products and multi-store/site/category/image products. Here’s the changes made to improve on these.

Extending the ImportExport

If you did the background reading, you’ll know from Vinai’s presentation that extending the ImportExport is really easy. If you put a file named Blah.php into app/code/local/Mage/ImportExport then any file uploaded with the suffix .blah will be processed by your code – simple eh? This is the technique we’ll use below.

For example, here’s the Blah.php override – let’s assume it does nothing but log to the Magento log file if it works. You can follow along at home if you put this file into app/code/local/Mage/ImportExport and import a csv file called anything.blah.

class Mage_ImportExport_Model_Import_Adapter_Blah extends Mage_ImportExport_Model_Import_Adapter_Csv {
 
	/**
	* 
	* Dummy override of ImpotExport _init()
	*/
	protected function _init() {
		Mage::log("Blah...");	
		return parent::_init();
	}
 
}

So now we know how we’ll hook into the importing, let’s look at what we need to change to get the file from GoogleDocs.

Fetching the file from Google

We are going to ‘publish’ the spreadsheet as csv, and consume it from within Magento with a simple file_get_contents(). To do that we need to publish the Google Doc and then capture the csv URL, here’s a screenshot showing what you need to do.

Once you have that published URL, put it in text file called test.google. Now let’s make a quick override that handles the .google file.

class Mage_ImportExport_Model_Import_Adapter_Google extends Mage_ImportExport_Model_Import_Adapter_Csv {
 
	protected function _init() {
 
		// Read the url from the .google file
		$this->_fileHandler = fopen($this->_source, 'r');
		$csvUrl = fgets($this->_fileHandler);
 
		// fetch the actual csv and save it into a _downloaded.csv file
		$csv = file_get_contents(trim($csvUrl));
		file_put_contents($this->_source."_downloaded.csv", $csv);
 
		// update the source to point to the downloaded file
		$this->_source = $this->_source."_downloaded.csv";
 
		return parent::_init();
	}
}

Now upload your test.google file and see that it fetches the correct csv data from the Google Doc and imports it using the normal process. Don’t worry, you don’t always have to upload it manually through the admin screen, it works the same if you import automatically from a script too.

Right, so that covers importing from a remote source, now let’s look at how we can improve the format of the csv.

Changing the CSV Format

To make changes to the csv format in a way that does not require core code changes, we’re going to read in and dynamically translate the csv data from our special format back the core Magento format. That way we benefit from any improvements or bug fixes the core team does to the original import process.

Firstly, what’s the format change? Currently if you have a product in mutliple websites in your template it has to look like this snippet. The same goes for any of the multi-value fields, for example the image gallery and categories.

sku name _product_websites
TEST001 Test Product 01 website1
website2



This is painful if you do things like mass product naming using a spreadsheet formula based on some other data source, for example brandA1&” “&nameB1&”, “&sizeC1. The formula in the second row, is not the product in the second row – hopefully that makes sense.

A better format (for us at least) is to have it all on one line, like so:

sku name _product_websites
TEST001 Test Product 01 website1||website2



* Note I have used || as a sub-separator, you could use anything – just make sure it’s not something likely to be in the field.

So how do we transform our format into the original? We process each line of the imported csv, and convert it on the fly to one or more translated rows. Here’s the code where the magic happens.

 
// Translate the custom format to the original format
 
$file = fopen($downloadedFileName, 'r');
$filteredFileName = $this->_source."_filtered.csv";
$filteredFile = fopen($filteredFileName, 'w');
 
while (($line = fgetcsv($file)) !== FALSE) {
 
	$newLines = $this->_getNewLines($line);
 
	// if we got new lines, write them, or write the existing line
	if (sizeof($newLines) == 0) {
		fputcsv($filteredFile, $line);
	} else {
		// put the new lines instead
		foreach ($newLines as $newLine) {
			fputcsv($filteredFile, $newLine);
		}
	}
}
 
public function _getNewLines($line) {
 
	$newLines = array();
 
	if ($line == null) {
		return $newLines;
	}
 
	$cellCount = 0;
	$numCells = sizeof($line);
 
	foreach ($line as $cell) {
 
		if (strpos($cell, $this::SPECIAL_FILTER_STRING) !== false) {
 
			// this line is a special one
			$splitCell = explode($this::SPECIAL_FILTER_STRING, $cell);
 
			$splitCount = 0;
			foreach ($splitCell as $cellPart) {
 
				if (!isset($newLines[$splitCount])) {
					// the first newLine is a copy of the current line, all others are empty
					$newLines[$splitCount] = ($splitCount == 0 ? $line : array_fill(0, $numCells, ""));
				}
 
				// set the cell of the line to the split value
				$newLines[$splitCount][$cellCount] = $cellPart;
				$splitCount++; // next split
			}
		}
 
		$cellCount++;
	}
 
	return $newLines;
}

I have put the Google.php class on GitHub that combines both the remote Google fetch, and the transformation, along with test cases of the format conversion code – so you can see some examples. I also have a demo spreadsheet here – with the old and new versions of a demo product. Also might be a useful starting point for your own imports.

What Next

With this functionality in place you can set up a cron script like that proposed by Vinai in his presentation, to actually sync the Magento products from a Google Docs spreadsheet every day or week (or whatever). We actually have several .google files under source control, that each point to a set of products in a Google spreadsheet.

Just don’t forget to also re-index in your script, have fun!

Ashley

Posts

Hi! I'm Ashley Schroder, a Software Engineer from New Zealand - this is a collection of notes on my experiences with Ecommerce Web Development, particularly Magento Development.

Are Your Customers Getting The Magento Emails You Send?

MageSend Magento Email sending with Amazon SES

I have made a premium Magento Extension called MageSend. MageSend makes it simple and easy to send using Amazon's highly reliable SES email service.

The extension is $99, and comes with a complete money back guarantee, please check it out, I am confident it will resolve any email sending problems with your Magento store.

7 responses to Improving Magento ImportExport module with Better file format and Google Docs integration

  1. Thanks for referencing my article about the ImportExport file format. There is an English version of the article at http://www.avs-webentwicklung.de/fileadmin/documents/20120512_Produktimport_ImportExport_eng.pdf.

    Then, if you don’t want to import from file, there is my array adapter for ImportExport which accepts arbitrary data sources as long as they can be transformed to an array. You can find it at https://github.com/avstudnitz/AvS_FastSimpleImport on github. See the readme for some simple examples. There are some improvements to ImportExport included, like automatic retrieval of images by http, partial indexing of the imported/deleted products only and some more.

    I like your changes in the file format, possibly I’ll integrate that into my Array Adapter too. Your format should make some tasks easier. Thanks for your work!

  2. Awesome post mate. Definitely something I’m gonna play around with. =]

  3. Hello Ashley, you are doing great job by sharing your experience on magento. But i am in trouble. I want your help to add webservices in magento.

  4. Hmm is anyone else encountering problems with the images on this blog loading?
    I’m trying to find out if its a problem on my end or if it’s the blog.
    Any feed-back would be greatly appreciated.

  5. According to this, not easy task for me, I can say- there always some problem occurs with csv products file. I have found this tool that claim to be able to do it .
    Have no idea how well it works and there may be more out there.
    http://www.magentocommerce.com/magento-connect/file2cart-csv-import-to-magento-9090.html

  6. Hello Guys,

    can you please suggest me how to call google file.

    Thanks in advance!!!!

    Manish

  7. Thank you for sharing this I recently had an issue with the magento import and export feature it ended up breaking my index management, I will definitely have to try this out on one of my magento installs